More tech info:
The disk/raid driver was getting stuck, and the kernel wasn't rebooting from the subsequent soft lockup like it was supposed to.
After last weekend's fiasco we had a new linux kernel ready, so we're on 2.6.28 now.
I've also converted 2 of the 3 filesystems that were xfs over to ext3 (the database/web/root directories have always been ext3, but a few non-critical directories were xfs to speed things up). There's a recent xfs bug report I'm worried might have been the cause of these lockups.
__________________
“The egg hatched...” “...the egg hatched... and a hundred baby spiders came out...” (blade runner)
“Who are you?” “A friend. I'm here to prevent you from making a mistake.” “You have no idea what I'm doing here, friend.” “In specific terms, no, but I swore an oath to protect the world...” (continuum)
“It's a goal you won't understand until later. Your job is to make sure he doesn't achieve the goal.” (bsg)
|