Need help debugging a hard lock under heavy disk activity
I am struggling to find the cause of a hard lock under heavy I/O. First things first:
lshw output attached; in short:
ASUS NCCH-DL (dual Xeon, i875 chipset)
Nvidia Quadro FX 4000
Adaptec 2120S U320 SCSI RAID controller on PCI-X 66 MHz, 4 drives in RAID0
The system often hangs under heavy disk I/O (e.g. dd if=/dev/zero of=test bs=1M count=1024, or installing a package). The keyboard and mouse are completely dead; even Num Lock and Caps Lock do not respond. It happens both in the GUI and on a virtual terminal. In the GUI it looks like a complete system freeze, but on a virtual terminal I can still see the cursor blinking, which makes me suspect a kernel deadlock rather than a full system freeze.
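The dd reproducer can also be written as a small script, which makes it easy to vary the block size and count between runs. This is a minimal sketch, assuming Python 3 and that the target file sits on the RAID0 volume; the `write_zeros` helper is hypothetical. Unlike the plain dd command, it calls os.fsync at the end (roughly what GNU dd's conv=fsync does), so the data is actually pushed to the controller before the function returns.

```python
import os


def write_zeros(path, block_size=1024 * 1024, count=1024, sync=True):
    """Write `count` blocks of zero bytes to `path`.

    Hypothetical reproducer helper, approximating
    dd if=/dev/zero of=PATH bs=1M count=1024 (plus an fsync).
    Returns the number of bytes written.
    """
    block = bytes(block_size)  # one block of zeros, like a read from /dev/zero
    written = 0
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
            written += block_size
        if sync:
            f.flush()             # hand Python's buffer to the kernel page cache
            os.fsync(f.fileno())  # force the page cache out to the disk
    return written
```

Calling write_zeros("test") corresponds to the dd command above; lowering count or block_size can help narrow down how much I/O is needed before the hang appears.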
My efforts so far:
It is not distribution-specific: I have tested kernels from 2.6.24 onwards and always hit the problem.
It is not a hardware failure, since the system is rock solid on PC-BSD. I tried three different RAID controllers, from both Adaptec and LSI, and it made no difference. I also tested the RAM with memtest and found no problems.
I cannot find any sign of an impending failure in any system log. I tried enabling nmi_watchdog from GRUB, but the kernel log stays empty and the system still hangs.
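One way to check whether the NMI watchdog is actually running, rather than merely requested on the GRUB command line, is to look at the NMI: line in /proc/interrupts on x86: when the watchdog is active, its per-CPU counts should keep increasing. This is a minimal sketch, assuming an x86 kernel that exposes that line; the helper names are hypothetical, and rising counts only suggest (not prove) that the watchdog is firing.

```python
import time


def nmi_counts(text):
    """Return the per-CPU counts from the 'NMI:' line of /proc/interrupts text."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # remaining fields are the description text
                counts.append(int(field))
            return counts
    return []  # no NMI line found


def read_nmi_counts(path="/proc/interrupts"):
    with open(path) as f:
        return nmi_counts(f.read())


def watchdog_looks_active(interval=2.0):
    """Sample the NMI counts twice; any rising count suggests NMIs are arriving."""
    before = read_nmi_counts()
    time.sleep(interval)
    after = read_nmi_counts()
    return bool(before) and any(new > old for old, new in zip(before, after))
```

If the counts stay at zero, the nmi_watchdog option probably did not take effect on this machine.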
I'd be really happy if anyone could give me a hint on what the cause may be, or on how I can collect useful information that would allow me to open a bug in the kernel Bugzilla.
Thanks in advance!