I've recently been investigating a problem in which N runnable CPU-bound tasks on an N-way machine run on only N-1 CPUs. The remaining CPU is almost 100% idle. I have seen it occur with both the CFS and O(1) schedulers.

I've traced this down to what seems to be a quirk in the SMP balancer, whereby a high-priority thread which spends most of its time sleeping can artificially inflate the CPU load average calculated for one processor. Most of the time this CPU is idle (nr_running==0), yet its CPU load average is much higher than that of any other CPU.

Please find attached a sample program which demonstrates this behaviour on a 2-way SMP machine. It creates three threads: two are CPU-bound and run at the default priority; the third spends most of its time sleeping and runs at an elevated priority. It wakes up frequently (using /dev/rtc) and randomly generates some CPU load.

On my machine (2-way Opteron with a vanilla 2.6.23.1 kernel) this test program will reliably put the scheduler into a state where one CPU has both of the busy-looping processes in its runqueue, and the other CPU is usually idle. The usually-idle CPU will have a very high cpu_load, as reported by /proc/sched_debug.

Your mileage may vary. On some machines, this test program will only enter the "bad" state for a few seconds. Sometimes it bounces back and forth between good and bad states every few seconds. In all cases, removing the priority elevation fixes the balancing problem.

Is this a behaviour any of the scheduler developers are aware of? I would be very grateful if anyone could shed some light on the root cause behind the inflated cpu_load average. If this turns out to be a real bug, I would be happy to work on a patch.

Thanks in advance,
Micah Dowty
