* Rob Hussey <robjhussey@gmail.com> wrote:

thanks for the update! The unbound results are harder to compare because CFS changed SMP balancing to saturate multiple cores better - but this can result in a micro-benchmark slowdown if the other core is idle (and one of the benchmark tasks runs on one core while the other runs on the other core). This affects lat_ctx and pipe-test. (I'll have a look at the hackbench behavior.)

the bound results are the more comparable (apples to apples) tests. Usually the most stable of them is pipe-test:

so -ck1 is 0.8% faster in this particular test. (but still, there can be caching effects in either direction - so I usually run the test on both cores/CPUs to see whether there's any systematic spread in the results. The cache-layout related random spread can be as high as 10% on some systems!)

many things happened between 2.6.22-ck1 and 2.6.23-cfs-devel that could affect the performance of this test. My initial guess would be sched_clock() overhead. Could you send me your system's 'dmesg' output when running a 2.6.22 (or -ck1) kernel? Chances are that your TSC got marked unstable; this turns on a much less precise but also faster sched_clock() implementation. CFS uses the TSC even if the time-of-day code marked it as unstable - going for the more precise but slightly slower variant.

To test this theory, could you apply the patch below to cfs-devel (if you are interested in further testing this)? It changes the cfs-devel version of sched_clock() to have a low-resolution fallback like v2.6.22 does. Does this result in any measurable increase in performance?

(there's also a new sched-devel.git tree out there - if you update to it you'll need to re-pull it against a pristine Linus git head.)

	Ingo

---
 arch/i386/kernel/tsc.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/i386/kernel/tsc.c
===================================================================
--- linux.orig/arch/i386/kernel/tsc.c
+++ linux/arch/i386/kernel/tsc.c
@@ -110,9 +110,9 @@ unsigned long long native_sched_clock(vo
 	 *   very important for it to be as fast as the platform
 	 *   can achive it. )
 	 */
-	if (unlikely(!tsc_enabled && !tsc_unstable))
+	if (1 || unlikely(!tsc_enabled && !tsc_unstable))
 		/* No locking but a rare wrong value is not a big deal: */
-		return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
+		return jiffies_64 * (1000000000 / HZ);
 
 	/* read the Time Stamp Counter: */
 	rdtscll(this_offset);
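
To make the "low-resolution" part concrete, here is a minimal userspace sketch of the arithmetic in the fallback branch of the patch above, jiffies_64 * (1000000000 / HZ). It is not kernel code: the helper names ns_per_tick() and jiffies_to_sched_ns() are hypothetical, and the timer frequency is passed in as a parameter because the HZ setting of the kernel under test is not stated in this mail.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Nanoseconds covered by one timer tick at frequency hz.
 * Hypothetical helper; corresponds to the (1000000000 / HZ) term.
 */
static inline uint64_t ns_per_tick(unsigned int hz)
{
	assert(hz > 0);
	return NSEC_PER_SEC / hz;
}

/*
 * Jiffies-based clock value in nanoseconds, mirroring the patched
 * fallback: jiffies_64 * (1000000000 / HZ). The result only advances
 * in steps of ns_per_tick(hz), which is what makes it low resolution.
 */
static inline uint64_t jiffies_to_sched_ns(uint64_t jiffies, unsigned int hz)
{
	return jiffies * ns_per_tick(hz);
}
```

Assuming, purely for illustration, a timer frequency of 250, ns_per_tick(250) is 4000000, so such a clock moves in 4 ms steps, while at 1000 it moves in 1 ms steps. That is far coarser than a TSC read, but the computation is a single multiply with no rdtscll().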
