Idle Banter: For non SV and non bike related chat (and the odd bit of humour - but if any post isn't suitable it'll get deleted real quick).
#11
Guest
Posts: n/a
I hate computers.
We've had some database "issues" over the past few days. Unfortunately, just before it all went tits-up, the outsource boys in India made an undocumented and unapproved change (for which they received a proper bollocking from me). Anyway, the application support bods got wind of this, and of course this had to be the cause of the incident. (Of course it could never be the prehistoric version of Oracle or the ****ty application causing a problem :P )

I tried to explain that me putting the kettle on in Nottingham and causing a power spike in the data centre 50 miles away was just as likely to have caused the problem as the Mumbai lads adding an extra line to syslog.conf, but they wouldn't have it. What are the sentencing guidelines for premeditated ABH?

Sorry for the derail/rant.
#12
Guest
Posts: n/a
OK, for the geek contingent on the site, the issue is now fixed (I think) so the system is running. It's far from perfect, but it's running.
Just a bit of background information first, I think. Unfortunately I'm tied to an NDA, so I'll have to keep details vague. Of the two clients affected yesterday, one is a large car manufacturer (quite expensive cars too), the other is a worldwide distribution/logistics company.

For the car manufacturer, our applications control everything from production line robotics right the way through to their own distribution & GPS tracking systems. Because they had downtime, and I mean complete downtime, everything was powered down for safety reasons (ever seen a robotic rivet gun getting fed random data when there's a chance of people being nearby doing inspections? NOT GOOD). For the logistics company, all their route planning systems, GPS tracking, automated warehousing, the lot, down. This meant that everything for them had to revert to manual, which isn't quite as big a deal as for the car manufacturer, but still slows down operations.

fizzwheel, I understand fully what you were getting at. For the car manufacturer, I simply shut down the servers. They have to power down their hardware anyway, so what's the point of having servers commanding hardware that has no power? Also, two clients down, which one to fix first? I can't be on two systems at the same time. The logistics company, well, it works out that if they have to stop operations completely, they lose just short of £41,700 every minute!

So, the setup. Each client has 4 servers (at least): a database server and 3 application servers. Of those application servers, 2 deal with users, the other with automated processes (EDI stuff mainly). All of this runs on various Windows platforms (we don't care, it's up to them to choose), with IIS (yuck) and WebObjects (nice, but rather limiting). Our applications are written in Java, and interface like most other web based applications: the user sees HTML, and this is fed back through WebObjects to our application.
OK, so the steps to fix it were basically:

1) Get call from clients, have a brief look at both systems, realise that the car manufacturer has the potential to actually hurt people if it's left running, so shut that down. Screw the money they lose, policy states you don't put profit ahead of people, ever.

2) Investigate the logistics client a little more, and realise that this is a MAJOR issue. Call the client & get authorisation to "do whatever is needed to ensure productivity". Basically, I now have the green light to do everything up to and including re-imaging live servers in-situ. It's at this point that I turn around to my boss, explain what I see before me, and his response is literally "OK, you'd best deal with it" as he walks out of the office. He never came back, I've no idea where he went, and I really don't care. He can explain himself to those who pay his wages. It was at this point that I emailed him, CC'd the directors of our company & the client & basically said "This is what I plan on doing, if anyone has any objections I'm on extension 229, you've got 30 seconds before I start. If my first thoughts don't work, I've been left alone to deal with it, it'll be dealt with however I see fit."

3) Get a call from the client asking if I'm going to be able to fix this within critical SLA (4 hours). I explain that I'm not sure, as this is something no-one has ever encountered, and he says he'll call me back in an hour to get an update. If I'm still not sure, he'll arrange for transport to site (which means a private jet).

4) See that the server is running, and I can keep a connection to it, but our application isn't talking to anything, either via IIS, standalone EDI file transactions, or our own port communications. Start scratching my head, decide to shut down the entire thing temporarily & restart it. Not having any of it.
5) Run some of our database integrity checking tools, which tell me that the DB has 'inconsistencies' (this could mean anything, realistically), so drop to resilience & run the same scripts, same result. Bugger. Live DB, no backup system available. Oh ****.

6) OK, so shutdown & restart hasn't killed this thing. I've got no access to the applications. Need to start thinking now. I know nothing about it, so I need to know something, how do I do that? Aha, I have a Java compiler. So I quickly knock out a rough application to iterate the running processes, the memory segments they reside in etc etc, run it on one of the servers & start ruling out legitimate services/applications.

7) Repeat 6 on another server, and cross-reference results. This leaves one standing out from the rest. I take this to be a virus (it might not be, but it damn well looks like it) so call a college friend of mine who just happens to work as a code monkey for an AV company.

9) Damn, it seems there's a pattern to its processing. That's good. My friend emails me a modified Windows NTx kernel, which I send to one of the application servers and reboot it (there's another application server, so things will keep running). With the new kernel in place, I now have the power to overwrite any memory segments I choose (but so does the virus). So I quickly fire up a memory hex editor, and fill in a few NOPs. It takes a few attempts, but I manage to commit these NOPs at the right time, and kill the damn thing. (For those that know about it, I basically used the old NOP sled technique from buffer overflow attacks, but a customized version.)

10) Restore the original kernel, reboot the server, test things (after locking it away from the rest of the LAN by firewall rules), good, she's running.

11) Repeat one server at a time, until the client is sorted out. Thank Allah for that!

12) Report back to the client, tell him to cancel my transport, and do everything above for the other client.
By now my head hurts, a lot.

13) Everything fixed, everyone happy, I head home, not too late either.

Then I get a call from the logistics client at 10pm last night: "We've got the same issue again"... OH ****! So I spend half the night fixing it again. Now I have to fill in reports about what happened, and try to find out where this thing came from. My college friend will probably come in handy for that, and as a reward, his company will get as much information as I can provide (without breaching the NDA) about the attack. I'm also going to spell out that I'd like a chocolate fireguard as my new boss when I file the reports. It'd have been much more useful!

So there you have it, folks.
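
Steps 6 and 7 above (list what is running, rule out what you recognise, then cross-reference a second server) can be sketched roughly as below. This is only an illustration, not the tool described in the post: the class name `ProcessCrossCheck`, the methods `snapshot` and `unexplained`, and the idea of a "known good" list are all made up for the example. It also assumes Java 9 or later for `ProcessHandle`, which would not have existed on the servers in question, and it only looks at process command paths, not memory segments.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper, not the tool from the post: it only illustrates the
// "rule out what you recognise, then cross-reference servers" idea.
final class ProcessCrossCheck {

    // Snapshot the command paths of the processes visible to this JVM (Java 9+).
    // Processes whose command the OS will not reveal are skipped.
    static Set<String> snapshot() {
        return ProcessHandle.allProcesses()
                .map(ph -> ph.info().command().orElse(""))
                .filter(cmd -> !cmd.isEmpty())
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Names that show up on every server but are not on the known-good list.
    static Set<String> unexplained(List<Set<String>> perServer, Set<String> knownGood) {
        if (perServer.isEmpty()) {
            return new TreeSet<>();
        }
        Set<String> common = new TreeSet<>(perServer.get(0));
        for (Set<String> server : perServer.subList(1, perServer.size())) {
            common.retainAll(server); // keep only what both servers share
        }
        common.removeAll(knownGood);  // rule out the legitimate services
        return common;
    }

    public static void main(String[] args) {
        // Assumption: an empty known-good list, so this prints everything
        // running locally. In practice you would fill it in by hand.
        Set<String> knownGood = Set.of();
        unexplained(List.of(snapshot()), knownGood).forEach(System.out::println);
    }
}
```

Taking only the names common to every server is one reading of "cross-reference"; if the suspect process might be on just one box, you would compare each server against the known-good list separately instead.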
#13
Member
Mega Poster
Join Date: Aug 2003
Location: HomeBound
Posts: 3,302
Sounds like you did a ![]()
Cheers
Ben