Huge lock contention when doing I/O
I have a Fedora 16 system with an Intel i7 970 processor and 12GB RAM, which sometimes seems to succumb to incredibly bad lock contention problems.
Symptoms: any process attempting to read anything from the filesystem uses 100% system time for between several seconds and several minutes. When the problem gets bad, reading 10MB from /dev/zero can take several minutes. The system eventually becomes pretty much unusable.
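For anyone who wants to try reproducing the symptom, a rough sketch of how the /dev/zero read could be timed is below. The function name `time_read` and the 1MB chunk size are my own choices for illustration, not anything taken from the system in question. It reports wall-clock, user and system time separately, so it should be visible whether the time is going into the kernel.

```python
import os
import time

def time_read(path="/dev/zero", total_bytes=10 * 1024 * 1024, chunk=1024 * 1024):
    """Read up to total_bytes from path.

    Returns (bytes_read, wall_seconds, user_seconds, system_seconds).
    """
    cpu_before = os.times()
    wall_before = time.monotonic()
    bytes_read = 0
    # buffering=0 gives an unbuffered file, so each f.read() maps to a read() call
    with open(path, "rb", buffering=0) as f:
        while bytes_read < total_bytes:
            data = f.read(min(chunk, total_bytes - bytes_read))
            if not data:  # end of file reached before total_bytes
                break
            bytes_read += len(data)
    wall_after = time.monotonic()
    cpu_after = os.times()
    return (
        bytes_read,
        wall_after - wall_before,
        cpu_after.user - cpu_before.user,
        cpu_after.system - cpu_before.system,
    )

if __name__ == "__main__":
    n, wall, user, system = time_read()
    print(f"read {n} bytes: wall={wall:.3f}s user={user:.3f}s sys={system:.3f}s")
```

Running this once with VMWare started and once with it stopped should make the difference in system time easy to compare.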
This has been happening for a long time - I think since I upgraded from F13 - but it seems to be getting worse with recent kernels.
The problem seems to occur only when VMWare Workstation is running, which suggests it's involved in some way. Quitting VMWare makes the problem go away until it's started again. (Unfortunately I need to run it most of the time.)
When the problem is happening, perf top always shows this kind of output:
49.14% [kernel] [k] mutex_spin_on_owner
15.05% [kernel] [k] get_index
7.92% [kernel] [k] prio_tree_next
6.20% [kernel] [k] prio_tree_left
5.98% [kernel] [k] prio_tree_right
1.28% [kernel] [k] iter_walk_down
It's always these calls, with broadly similar percentages. Drilling down into mutex_spin_on_owner shows a call to static inline bool owner_running(), so presumably there's some hideous amount of lock contention going on.
Attempting to attach a debugger to a process in this state hangs until the process finishes its read (which may take several minutes). The processes seem to be genuinely stuck in the read() call, but top shows them as running and using 100% system time, rather than in the "D" (device wait) state they would usually be in if stuck in a read() call.
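In case it helps anyone check the same thing without a debugger, here is a small sketch that reads the state letter and CPU times for a process straight from /proc/<pid>/stat. The helper names `parse_stat` and `describe` are made up for this example. The field positions (state is field 3, utime field 14, stime field 15) follow the proc(5) man page.

```python
import os
import sys

def parse_stat(text):
    """Return (state, utime_ticks, stime_ticks) from the text of /proc/<pid>/stat."""
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split on the last ')' rather than on whitespace from the start.
    _, _, rest = text.rpartition(")")
    fields = rest.split()
    # fields[0] is field 3 (state), so overall field N is fields[N - 3]
    return fields[0], int(fields[11]), int(fields[12])

def describe(pid):
    """Return (state, user_seconds, system_seconds) for the given pid."""
    with open(f"/proc/{pid}/stat") as f:
        state, utime, stime = parse_stat(f.read())
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    return state, utime / ticks_per_second, stime / ticks_per_second

if __name__ == "__main__":
    # With no argument, inspect this script's own process
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    state, user_s, sys_s = describe(pid)
    print(f"pid {pid}: state={state} user={user_s:.2f}s sys={sys_s:.2f}s")
```

Calling it repeatedly on a stuck process should show state "R" with the system time climbing, as opposed to "D" with the times standing still.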
Does anyone know what might be happening here, if there's some setting which might help, or how I could find out more about what's going on? I posted on the Fedora and VMWare forums a while back, but got no answers.
Any help/advice much appreciated.