What happened yesterday?


jgcoastie
July 2, 2012, 08:31 PM
Sorry if this has been asked and answered, but I didn't see anything about it...

I couldn't access TFL from about 0900 yesterday morning until at least 2200 last night.

Was it a problem with the site? Or did something go wrong on my end?
(all other websites I frequent worked fine)

scrubcedar
July 2, 2012, 08:36 PM
I had the same trouble so it wasn't just your imagination.

hoytinak
July 2, 2012, 08:40 PM
It was the site. I couldn't get on till about 1300 today... worst night of my life. :eek:

ScottRiqui
July 2, 2012, 08:46 PM
For a while, they had their DNS entry re-mapped to a page explaining what happened. If I remember correctly, they had a server go down as a result of the Linux "leap second" error. I'm not sure whether they went the route of configuring a new server and reloading it from backups, or if they just worked with the existing server until they got it working. Regardless, I'm glad it's back up and running!

Mike Irwin
July 2, 2012, 09:06 PM
The gnomes that live in the server went on strike.

There was a failure during a reboot, and it took a while to recover.

tyme
July 2, 2012, 11:15 PM
The leap second issue was only indirectly responsible. There were at least a couple of leap second bugs in the Linux kernel: the first would have caused a crash on Saturday, but didn't affect this server. The second caused excessive CPU utilization starting at midnight July 1st.
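For anyone wondering why one extra second can confuse software at all, here is a minimal Python sketch. It does not reproduce the kernel bugs themselves; it only shows that POSIX-style timestamps have no slot for the inserted second 23:59:60 UTC on June 30, 2012. The helper name leap_second_representable is made up for illustration.

```python
import calendar
from datetime import datetime

# Last ordinary second of June 30, 2012 (UTC) and the first second of July 1.
before = calendar.timegm((2012, 6, 30, 23, 59, 59))
after = calendar.timegm((2012, 7, 1, 0, 0, 0))

# POSIX time treats every day as exactly 86400 seconds, so these two
# timestamps are adjacent and 23:59:60 gets no timestamp of its own.
gap = after - before


def leap_second_representable():
    """Hypothetical helper: True if datetime accepts second=60."""
    try:
        datetime(2012, 6, 30, 23, 59, 60)
        return True
    except ValueError:
        # datetime only allows seconds in the range 0..59.
        return False


if __name__ == "__main__":
    print("gap in POSIX seconds:", gap)
    print("23:59:60 representable:", leap_second_representable())
```

Because the extra second has nowhere to go, the system clock has to be stepped or adjusted around it, and that adjustment is where the bugs showed up.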

While rebooting to try to clear up that leap second CPU bug, the server hung while trying to start MySQL, before the web server or SSH had started. I'm still not sure why it didn't eventually time out and continue with the startup, but it didn't. I haven't had a chance today to poke around and look.
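To illustrate the "time out and continue" behavior, here is a rough Python sketch of a startup sequence where each step gets a time limit, so one hung service cannot block the ones after it. This is not how the server's actual init scripts work; run_step, start_all, and DEMO_STEPS are hypothetical names, and the demo commands are stand-ins rather than real service commands.

```python
import subprocess
import sys


def run_step(cmd, timeout_s):
    """Run one startup step; return True on success, False on failure or timeout."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so we can move on.
        return False


def start_all(steps, timeout_s=30):
    """Run each (name, cmd) step in order and report which ones succeeded."""
    return {name: run_step(cmd, timeout_s) for name, cmd in steps}


# Stand-in steps: one that hangs (like the database did) and one that is fine.
DEMO_STEPS = [
    ("database", [sys.executable, "-c", "import time; time.sleep(10)"]),
    ("remote-login", [sys.executable, "-c", "pass"]),
]

if __name__ == "__main__":
    print(start_all(DEMO_STEPS, timeout_s=2))
```

With something like this, the hung database step would be abandoned after the limit and remote login would still come up, which is what makes a remote fix possible.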

Ideally the problem would have been fixed quickly, but the colo situation is less than ideal.