Nick Piggin has released version w25 of his scheduler policy patches, primarily targeting improvements for SMP and NUMA servers. Testing his changes on a 16-way NUMAQ at OSDL, Nick notes, "I have done some tweaking of the 'domains', and found some pretty impressive performance improvements." The improvements show up most clearly in his dbench results, included in the email below.
Nick notes that SMT (simultaneous multithreading, aka hyperthreading) support is currently broken, and suggests it will remain that way until he finds an SMP P4 HT system to test with. Read on for the full description of his changes and numerous benchmark comparisons.
From: Nick Piggin [email blocked]
To: "Martin J. Bligh" [email blocked]
Subject: [RFC] Further SMP / NUMA scheduler improvements
Date: Sun, 30 Nov 2003 20:35:57 +1100

http://www.kerneltrap.org/~npiggin/w25/

Been sorting some bugs out. Its pretty stable. Although CONFIG_SMT is still apparently broken. It will probably stay that way due to not having an SMP P4 to test with and being a bit busy with other things.

I have done some tweaking of the "domains", and found some pretty impressive performance improvements. The 16-way NUMAQ at OSDL is running out of steam now (ie. I've got quite a bit of low hanging fruit), and I've only got a couple more days with it anyway.

So I'd like other architectures especially to try it if interested. I could assist in building architecture specific scheduling descriptions if anyone would like to try it on a non traditional SMP / NUMA, however I think something might be broken with SMT handling. Probably active migration. I can't be bothered fixing it unless I can find an SMP P4 HT to test with :P

(dbench is most significantly improved)

System is dev16-000 at OSDL. total/idle ticks are profiler ticks.
16GB ram, 4x4 nodes NUMAQ
model name : Pentium III (Katmai)
cpu MHz    : 495.274
cache size : 512 KB

dbench
            8       16       32       64      128
bk19   470.84   433.47   360.04   351.96   359.86
w22    477.35   439.82   387.77   378.65   367.07
w25    473.10   587.74   503.35   532.83   524.61

total/idle ticks
bk19   3598603/124601
w22    3425876/227225
w25    2408853/232176

tbench
            8       16       24       32
bk19    46.17    58.74    60.78    59.79
w22     47.39    58.72    57.73    57.86
w25     53.59    58.60    65.80    63.35

total/idle ticks
bk19   7603448/1115754
w22    7808203/1897589
w25    7150680/2038258

kernbench (make -j, 5 runs)
         real     user      sys
bk19   82.231  997.021  152.044
w22    81.384  973.653  140.246
w25    80.650  970.900  131.833

total/idle ticks
bk19   2218739/1442831
w22    2110820/1398357
w25    2062930/1393573

hackbench (3 runs)
            1      100
bk19    0.591   37.578
w22     0.386   31.954
w25     0.365   33.289

total/idle ticks
bk19   1948913/319060
w22    1655610/360777
w25    1721178/496178

reaim 256
       parent time   child stime   child utime       jpm
bk19        247.33        373.57       3568.12   6396.64
w22         250.37        332.70       3614.94   6318.97
w25         258.67        311.85       3684.09   6116.21
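
The dbench numbers are easier to compare as relative changes against the bk19 baseline. The following is a minimal sketch, not part of Nick's email, that does this arithmetic in Python. The figures are copied from the dbench table in the email. The names `DBENCH`, `COLUMNS`, `percent_change` and `format_row` are made up for illustration, and reading the column headings as dbench client counts is an assumption.

```python
# dbench figures copied from the email, keyed by kernel.
# COLUMNS holds the table's column headings; treating them as
# dbench client counts is an assumption, the email does not say.
COLUMNS = [8, 16, 32, 64, 128]
DBENCH = {
    "bk19": [470.84, 433.47, 360.04, 351.96, 359.86],
    "w22": [477.35, 439.82, 387.77, 378.65, 367.07],
    "w25": [473.10, 587.74, 503.35, 532.83, 524.61],
}


def percent_change(baseline, candidate):
    """Per-column change of candidate relative to baseline, in percent."""
    return [100.0 * (c - b) / b for b, c in zip(baseline, candidate)]


def format_row(label, values):
    """Render one kernel's changes as a fixed-width text row."""
    return label.ljust(8) + "".join(f"{v:+8.1f}%" for v in values)


if __name__ == "__main__":
    # Header row, then one row per patched kernel relative to bk19.
    print("column".ljust(8) + "".join(f"{n:>9}" for n in COLUMNS))
    for kernel in ("w22", "w25"):
        print(format_row(kernel, percent_change(DBENCH["bk19"], DBENCH[kernel])))
```

On these figures w25 is roughly level with bk19 in the first column and well ahead in every later one, which matches Nick's remark that dbench is the most significantly improved benchmark.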
Very Cool
But will it negatively affect interactivity? It would be nice to have both. :)
Interactivity
Should not be changed. In fact there should be no changes at all with a UP kernel. Only movement of tasks from one CPU to another is changed (ie SMP balancing).