January 20, 2009, 12:20 AM   #46
Join Date: October 13, 2001
Posts: 3,302
XFS apparently wasn't the problem. The remaining suspects are the kernel software RAID driver, LVM2, and the kernel's softlockup detection/reboot code itself. If the kernel incorrectly detects a softlockup, tries to reboot, and fails (which seems unlikely), that could also explain the symptoms.

Secondhand reports from the colo staff were that, a week ago Sunday, sysrq+t indicated the system was stuck in software RAID code. However, I haven't been able to get the sysrq hotkeys to work after the crashes on either Friday or today. There was just a message about a detected softlockup, and then one saying it's rebooting in 6 seconds... which it obviously didn't.

For now I've disabled the kernel panic and reboot after a softlockup. Since it wasn't successfully rebooting anyway, there's no sense in having that turned on.
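For anyone wanting to check the same thing on their own box, here is a rough sketch that reads the relevant settings and says what the kernel would do on a softlockup. It assumes the kernel exposes kernel.softlockup_panic and kernel.panic under /proc/sys; whether this server's kernel has both, and how the setting was actually changed here, isn't stated above. The function names are made up for illustration.

```python
from pathlib import Path


def softlockup_settings(proc_sys="/proc/sys"):
    """Read the panic-related sysctls as ints (None if a file is missing or unreadable)."""
    # Assumption: these are the sysctl paths on the running kernel.
    names = {
        "softlockup_panic": "kernel/softlockup_panic",
        "panic": "kernel/panic",
    }
    values = {}
    for key, rel in names.items():
        try:
            values[key] = int((Path(proc_sys) / rel).read_text().strip())
        except (OSError, ValueError):
            values[key] = None
    return values


def describe(values):
    """Summarize what happens when a soft lockup is detected."""
    lockup = values.get("softlockup_panic")
    timeout = values.get("panic")
    if lockup is None:
        return "softlockup_panic not available"
    if lockup == 0:
        return "soft lockups are logged only"
    if timeout is None:
        return "soft lockups panic; reboot timeout unknown"
    if timeout == 0:
        return "soft lockups panic; no automatic reboot"
    if timeout < 0:
        return "soft lockups panic; immediate reboot"
    return f"soft lockups panic; reboot after {timeout} seconds"
```

Calling describe(softlockup_settings()) on a live system gives a one-line summary. With softlockup_panic at 0 the kernel only logs the lockup, which matches the "no panic, no reboot" behavior described above.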

The attachment upload issues and usercp problems were due to bad permissions on the temporary directories after switching them from XFS to ext3. Both sets of permissions are now fixed.
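A quick way to catch this kind of thing after moving a temp directory to a new filesystem is a small permission check like the sketch below. It assumes the directory should be world-writable with the sticky bit set (mode 1777, as is typical for a shared /tmp); the modes actually used on this server aren't stated here, and the function names are hypothetical.

```python
import os
import stat


def mode_problems(mode):
    """List what is wrong with a temp directory's permission bits.

    Assumption: a shared temp directory should look like mode 1777.
    """
    problems = []
    if not mode & stat.S_IWOTH:
        problems.append("not world-writable")
    if not mode & stat.S_IXOTH:
        problems.append("not world-searchable")
    if not mode & stat.S_ISVTX:
        problems.append("sticky bit not set")
    return problems


def check_temp_dir(path):
    """Return a list of problems for the directory at path (empty if it looks fine)."""
    st = os.stat(path)
    if not stat.S_ISDIR(st.st_mode):
        return ["not a directory"]
    return mode_problems(stat.S_IMODE(st.st_mode))
```

If the web server runs as a dedicated user with its own private temp directory, a stricter owner-only mode would be the right target instead, so adjust the expected bits to match the setup.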

Originally Posted by peetzakilla
FiringLine needs to move to Mac!
FreeBSD is a possibility. It's a trade-off between the prospect of future downtime due to crashes and forced immediate downtime to switch OSes.
“The egg hatched...” “...the egg hatched... and a hundred baby spiders came out...” (blade runner)
“Who are you?” “A friend. I'm here to prevent you from making a mistake.” “You have no idea what I'm doing here, friend.” “In specific terms, no, but I swore an oath to protect the world...” (continuum)
“It's a goal you won't understand until later. Your job is to make sure he doesn't achieve the goal.” (bsg)
tyme