My work on Thanksgiving ended up a hell of a lot more interesting than I planned. I ended up working from 10 AM to 9 AM the next day, with a break for dinner. Ouch. Seems that the network had a bit of a timebomb waiting in it, and it picked a day the mill was down to finally go off.
It snowed on Wednesday. Well, it’s Wisconsin, so that’s not nearly as surprising as it snowing in Texas! I forgot my coat. I brought long-sleeved shirts, so I’ve just been wearing layers.
Woke up on TDay and had breakfast with the mill manager and a couple of his guys after they got the mill all shut down. One of the local restaurants that we eat at from time to time (Baker Street) had an awesome breakfast for Thanksgiving: an omelet bar where you picked out your fixin’s and he made it right there for ya, just about any pastry you could think of laid out, and tons of breakfast stuff like biscuits and gravy. And oddly enough for breakfast, they had already started serving turkey and fixin’s too. Yeah, I ate too much for breakfast, enough that I skipped lunch.
Thanksgiving dinner was pretty cool. Dominic had invited me to come eat with him and his family. I’ve met his wife and kids before, and they are all cut from the same cloth as Dominic – lots of sarcasm and laughter. So yeah, I fit in pretty well 🙂
And of course I found time to follow the holiday tradition – I listened to Alice’s Restaurant by Arlo Guthrie. Of course I did that part alone (though I had threatened Tammy with calling her up and forcing her to listen to it with me 🙂)
(Prepare for a bit of this to get geeky. Sorry if your eyes glaze over and ya fall asleep 😉)
When I showed up on Tuesday, the scope of work changed a bit. The milling network has always been isolated from the rest of the world, but the plan was to connect three network segments together and then connect all that to the Internet for remote support for multiple contractors (of which I’m one that needs access.)
I built a new router from old spare parts (an IPCop box) that handles a lot of port forwarding, security, and access controls. Their network is a bit weird – along with being isolated, they are in a spot where they can’t get a reasonable Internet connection for support. No cable modem or DSL, and phone connections have been notoriously bad. So they use a satellite Internet connection, which is a bit slow at times (but I managed to improve the speed considerably. Hehehhehehhe)
So, on Wednesday I connected the Internet up to the full network. A couple of quick tests, and all looks ok. My office space up there is off in another room from the satellite modem, so I can’t see what’s going on. For a while, that was an important problem – what I didn’t see was that after you’d been connected for about 5 minutes, the line would go down.
While I was in the control room in the mill, the internal portion of the network went down. (There are 13 computers grand total, spread out across two buildings and 6 floors.) That’s odd. I did a little exploring and rebooted one of the machines – the network came back up at almost the same time. Hrm. Then a miller walked in with a burnt-out relay, and he assumed that the two events were connected. But I wasn’t convinced.
I went back to my task of getting the remote support stuff back up and running. Testing was interesting – I’ve got an iPhone that can do VNC, so all I have to do is VNC into the port-forwarded router. Turn off the WiFi capability, and now I’m standing in the same building as my target network, but talking to it from the outside world via EDGE. Pretty slick.
Except it didn’t work for some odd reason. It should have worked right off the bat, but it didn’t. I could see the router (pinging it from the iPhone – having a command line & unix network utilities on something that fits in your pocket is bizarre, but useful as hell), but it was responding SLLLLLLOOOOOOOOOWWWWWWWLLLLLLLLLYYYYYYYYY. WTF? Traceroute shows the phone isn’t the problem – something weird is happening when you get to the satellite modem.
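For anyone who wants to repeat that kind of outside-in check from a laptop instead of a phone, here is a minimal Python sketch that times a TCP connection to a forwarded port. It is only an illustration, not the tooling used that day: the function name `tcp_connect_ms` is made up, and port 5900 in the usage note is simply the usual VNC default.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=5.0):
    """Return how long a TCP connect to host:port takes, in milliseconds.

    Returns None if the connection is refused, unreachable, or times out.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts
        return None
    return (time.monotonic() - start) * 1000.0
```

Calling something like `tcp_connect_ms("router.example.net", 5900)` (a placeholder hostname) from outside the network a few times in a row gives a rough feel for whether the far end is merely slow or not answering at all. Over a satellite link, even a healthy connect will take a large fraction of a second.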
Well, I built the router from spare parts. So I take it offline, replace it with a Linksys wireless router they had originally. I’m assuming that it’s a bad network card somewhere on the built up router, which is very possible since it’s old retired hardware that I’m reusing.
I throw the Linksys online, and all is well. Now, I’m not troubleshooting just one thing – I’ve got multiple things going on, so I’m not throwing my full attention behind the network issue. Honestly, it’s mainly being done right now to make sure the second and third network segments don’t have problems (since those are other people’s equipment that I don’t normally work on, and I would have to call them up, etc. to have things fixed), and to show proof of concept on the port forwarding for the mill manager. The bulk of the port forwarding and firewall configuration is going to be done on the weekend.
I move on to other things for a while, and the end of the day rolls around – time to head home.
Thursday morning I get here and the network is down again. Odd. A little checking around, and it’s back up on its own. Hrm.
I settle into working on the PLCs, doing the stuff that has to be done while the plant is down. I’m doing well, and since Dominic isn’t in his office that day, it’s much easier to just set up in his office and have access to my computer and the mill programming computer all in one place. It’s also where the box for the satellite modem is, sitting right in view of the desk.
I hacked on the various PLCs until around 4 PM when I left to go hit Dominic’s house for TDay. My last action was to check my email real quick – I hadn’t bothered to look at the Internet for most of the day. Huh. It’s offline. Well… crap, I’ll check that out when I get back, otherwise I’ll be late.
When I get back, I look, and it’s still offline. In fact, the satellite modem has only one light on – power. If we had lost satellite feed, then the LAN and Power lights would be on. Something isn’t right at all! And I suddenly have a horrifying idea what it might be.
I pull the plug on the local network – all that’s hooked up is the satellite modem, the wireless router (with nothing hooked up on the LAN side), and my computer. Satellite modem reboots itself. Everything works well.
Plug the network back in, and wait a couple of minutes. Suddenly I see a large spike of traffic, and the modem freaks out and the only light left is the power light.
The network is very segmented, thankfully, so I begin confirming my theory – I snag a fresh copy of ClamAV for Windows, and diagnose the closest machine (one of the two servers.) Yep – we’ve got a worm. And looking at the datestamp, it’s been floating around for a while now.
The mill network environment is interesting – it’s a very closed system from the outside world, and standalone. Inside however, it’s very open between machines. When something gets loose, it’s got access to everything else. This was a customer choice from ages ago to reduce maintenance. (After this event, I’ve managed to convince them it’s time to move on past that theory 🙂)
Some worms are interesting beasts – they’ll infect their neighbors over and over until the network is saturated. This one was a bit different – it infected as many machines as it could (8 out of 13) then went silent, cross-infecting its neighbors very quietly. But it was stuck here – it couldn’t reach the Internet. Part of its job was to turn the local network into a botnet owned by someone else by downloading an updateable set of local exploits to own the boxes. The timebomb had been there for quite a while, just waiting for someone to give it access to the outside world.
Once Internet access was established, the 8 infected machines went insane, oversaturating the line. Satellite Internet connections suck balls – slow, with extremely high ping times and latency (because you’re bouncing a signal off of something that at its closest is 22,000 miles away, then another 22,000 miles back down to earth, then the response has to take the same route. All told, to send out one packet and get a response the data travels at minimum 88,000 miles!) Since bandwidth is so limited, the modems protect the line a bit – if things get too intense, the fucker will just shut down during a worm storm! Which is what we had going outbound from the network.
I had spent 3 hours at Dominic’s house, so it was about 7 when I got back. All of this was starting to go into full swing about 8 PM. The mill is going to begin restarting at 7 AM the next day.
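To put a number on that 88,000 miles, here is a quick back-of-the-envelope sketch in Python. It uses the rounded 22,000-mile figure from above and the speed of light in a vacuum, and it ignores everything else that adds delay (modem processing, ground station routing, queuing), so treat the result as a floor rather than a measurement.

```python
SPEED_OF_LIGHT_MILES_PER_S = 186_282  # speed of light in a vacuum, miles per second
ONE_HOP_MILES = 22_000                # ground to satellite, the rounded figure used above


def min_round_trip(one_hop_miles=ONE_HOP_MILES):
    """Return (total miles, minimum seconds) for one request/response pair."""
    # Request goes up and down, response goes up and down: four hops in all.
    total_miles = 4 * one_hop_miles
    seconds = total_miles / SPEED_OF_LIGHT_MILES_PER_S
    return total_miles, seconds


miles, seconds = min_round_trip()
print(f"{miles:,} miles, at least {seconds * 1000:.0f} ms per round trip")
```

That works out to a bit under half a second per round trip before any real-world overhead is added, which is why even a healthy satellite link feels sluggish, and why a flood of outbound worm traffic chokes it so quickly.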
It’s gonna be a long night.
I spent time finding the extent of the infection, then finding the cure. The first attempt was just to use ClamWin, and let it quarantine any infected files.
That’s how I blew up the first server.
Luckily the servers are mirrored, so I just ghosted an image back on and away I went. But that burned 2 hours between the failed disinfection and reghosting.
Second attempt was to use a cleaner that was designed specifically for this worm. I’m also using read-only media (a burned CD with a closed session), and have disconnected the network (ALL machines, not just the one I’m working on). In the end it took three runs of the first cleaning program with Windows XP running in safe mode, and a run of a second cleaning program. 3 HOURS to clean one machine. But it’s clean. This makes it about 1 AM. I’ve got around 5 to 6 hours before people show up to run the mill.
I set up a prioritized list of which machine in which order, and away I go. Lots of walking back and forth that night – each machine had to be visited a minimum of 5 times to complete disinfection. Meanwhile, nothing was allowed back on the network until not only it was disinfected, but everything else was disinfected too. Since it took around 3 hours to clean up a machine, I couldn’t just do one and move to the next one – I was hitting a machine, starting its cleanup, moving to the next, etc., then going back to the first machine and starting the cycle again. I got one hell of a workout!
At 6:30 AM, I re-hooked up the servers and the disinfected machines. Two of ’em couldn’t go back online – they were older machines that resisted every attempt to disinfect them. I’ll have to rebuild those two from scratch. At 7 AM Dominic walked in, and they began slowly restarting the mills.
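The reason for the lap-running is easy to see with some rough arithmetic. The sketch below is only an illustration: the 3 hours per machine and 5 visits come from above, but the count of 7 machines left to clean and the minutes of hands-on time per visit are guesses for the sake of the example, and the interleaved figure is a best-case lower bound that ignores walking between buildings.

```python
def sequential_hours(machines, hours_per_machine):
    """Clean one machine start to finish before touching the next."""
    return machines * hours_per_machine


def interleaved_hours(machines, hours_per_machine, visits, minutes_per_visit):
    """Kick off a step on each machine, then move on while it runs unattended.

    Best case: the night is as long as whichever is bigger, one machine's
    full cleaning time or the total hands-on time across all visits.
    """
    hands_on = machines * visits * minutes_per_visit / 60.0
    return max(hours_per_machine, hands_on)


# Assumed numbers: 7 machines left, 5 minutes of hands-on work per visit.
print(sequential_hours(7, 3))         # one at a time
print(interleaved_hours(7, 3, 5, 5))  # rotating between machines
```

With those assumed numbers, one-at-a-time cleaning needs 21 hours while rotating needs about 3 in the best case. Bump the hands-on time to 10 minutes per visit and the rotation stretches to nearly 6 hours, which is about all the time there was before the mill restart.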
A little after 9 AM I finally got the chance to go back to the hotel and get some sleep.
Ya know, I was planning on working for Thanksgiving. I didn’t plan on working THAT much for Thanksgiving!
When I was done with all that, I hooked my spare-parts router back up, and everything is going smoothly 🙂 Since then, no re-infections and no line connection issues. The IPCop box lets me have really nice monitoring of traffic, and things are extremely quiet. This is good – it means I managed to clean every machine properly. Otherwise, well…. I would have had to clean this mess all over again!