I don't know if this has been noticed before. I was benchmarking my
page table relocation code and I noticed that on 2.6.25-rc9 page
faults take 10% more time than on 2.6.23. This is using lmbench
running on an Intel x86_64 system. The good news is that the page
table relocation code now only adds a 1.6% slowdown to page faults.
Ross
2.6.25-rc9:
File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host      OS            0K File       10K File      Mmap    Prot  Page    100fd
                        Create Delete Create Delete Latency Fault Fault   selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
ipnn2     Linux 2.6.25-   13.9 7.6111  103.4 9.7453   926.0 0.711 2.14250 2.552
ipnn2     Linux 2.6.25-   13.7 7.6243  310.7 9.6574   932.0 0.750 2.15970 2.555
ipnn2     Linux 2.6.25-   13.9 7.6831  192.5   10.0   927.0 0.760 2.21310 2.553
ipnn2     Linux 2.6.25-   13.9 7.5739   98.4 9.5330   927.0 0.703 2.17610 2.554
ipnn2     Linux 2.6.25-   14.6 7.6429   39.1   10.8   935.0 0.763 2.17250 2.552
ipnn2     Linux 2.6.25-   14.1 7.8777  129.8 9.9375   930.0 0.782 2.26460 2.559
ipnn2     Linux 2.6.25-   14.8 7.9639  623.8 8.2042   927.0 0.773 2.21510 2.557
ipnn2     Linux 2.6.25-   14.4 7.5842  622.3 8.3272   920.0 0.745 2.22210 2.558
ipnn2     Linux 2.6.25-   14.2 7.6339   45.7   10.2   935.0 0.675 2.23860 2.554
ipnn2     Linux 2.6.25-   14.1 7.7175  263.7   10.1   929.0 0.762 2.22350 2.556
ipnn2     Linux 2.6.25-   13.9 8.1230  378.2 9.4343   975.0 0.752 2.25920 2.554
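When comparing kernels on a single column, it helps to know how much the runs vary among themselves before reading a 10% difference as a regression. The following is a minimal sketch in C, using only the "Page Fault" values from the 2.6.25-rc9 runs above; the function names are illustrative and are not part of lmbench.

```c
#include <assert.h>
#include <stddef.h>

/* "Page Fault" column (microseconds) from the 2.6.25-rc9 runs above. */
const double page_fault_us[] = {
    2.14250, 2.15970, 2.21310, 2.17610, 2.17250, 2.26460,
    2.21510, 2.22210, 2.23860, 2.22350, 2.25920,
};
#define N_RUNS (sizeof page_fault_us / sizeof page_fault_us[0])

/* Arithmetic mean of n samples. */
double mean_us(const double *v, size_t n)
{
    assert(v && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* (max - min) as a percentage of the mean: a crude run-to-run spread. */
double spread_percent(const double *v, size_t n)
{
    assert(v && n > 0);
    double lo = v[0], hi = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    return 100.0 * (hi - lo) / mean_us(v, n);
}

/* Relative change of candidate versus baseline, in percent. */
double percent_change(double baseline, double candidate)
{
    assert(baseline > 0.0);
    return 100.0 * (candidate - baseline) / baseline;
}
```

For these eleven runs the mean is about 2.21 microseconds and the fastest and slowest runs differ by roughly 5.5% of the mean, so a comparison against another kernel should use the mean of several runs via percent_change() rather than a single pair of numbers.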
2.6.23:
File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host      OS            0K File       10K File      Mmap    Prot  Page    100fd
                        Create Delete Create Delete Latency Fault Fault   selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- ...

Do you have CONFIG_CGROUP_MEM_RES_CTLR=y in 2.6.25? That added about 20%
to my lmbench "Page Fault" tests (with adverse effect on several others,
e.g. the fork, exec, sh group). Try the same kernel with boot option
"cgroup_disable=memory", that should recoup most (but not quite all) of
the slowdown; or rebuild with n to CGROUP_MEM_RES_CTLR.

But your "Mmap Latency" went up 425% ??
--
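To confirm that a boot option such as "cgroup_disable=memory" actually reached the running kernel, one can look for it in the contents of /proc/cmdline. The following is a minimal sketch in C that matches only whole whitespace-delimited tokens; the helper name is hypothetical, and it does not handle a comma-separated list of controllers.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if `param` appears in `cmdline` as a whole token delimited
 * by whitespace (or by the start/end of the string), 0 otherwise.
 * Hypothetical helper, not a kernel or libc API.
 */
int cmdline_has_param(const char *cmdline, const char *param)
{
    assert(cmdline && param && *param);
    size_t n = strlen(param);
    const char *p = cmdline;

    while ((p = strstr(p, param)) != NULL) {
        /* The match must start at a token boundary ... */
        int starts = (p == cmdline) || p[-1] == ' ' || p[-1] == '\t';
        /* ... and end at one, so "memoryx" does not match "memory". */
        char end = p[n];
        int ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';
        if (starts && ends)
            return 1;
        p += 1; /* keep scanning after a partial match */
    }
    return 0;
}
```

In practice the first argument would be the text read from /proc/cmdline, and the second would be the literal string "cgroup_disable=memory".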
I don't have config cgroups set. I do have fake numa on, but I'm
pretty sure it was on for 2.6.23 as well.
# CONFIG_CGROUPS is not set
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_RT_GROUP_SCHED is not set
CONFIG_USER_SCHED=y
# CONFIG_CGROUP_SCHED is not set
C
Ross
--
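The excerpt above uses the two forms a kernel .config takes for a boolean option: "CONFIG_FOO=y" when it is set and "# CONFIG_FOO is not set" when it is explicitly disabled. The following is a minimal sketch in C of telling these apart, along with the case where an option does not appear at all; the enum and function names are illustrative, and the sample text is copied from the excerpt.

```c
#include <assert.h>
#include <string.h>

enum opt_state { OPT_ABSENT = -1, OPT_NOT_SET = 0, OPT_SET = 1 };

/* Config lines quoted in the message above. */
const char sample_config[] =
    "# CONFIG_CGROUPS is not set\n"
    "CONFIG_GROUP_SCHED=y\n"
    "CONFIG_FAIR_GROUP_SCHED=y\n"
    "# CONFIG_RT_GROUP_SCHED is not set\n"
    "CONFIG_USER_SCHED=y\n"
    "# CONFIG_CGROUP_SCHED is not set\n";

/* Classify a single line with respect to `option`. */
static enum opt_state classify_line(const char *line, const char *option)
{
    size_t n = strlen(option);

    /* "CONFIG_FOO=<value>": the option is set. */
    if (strncmp(line, option, n) == 0 && line[n] == '=')
        return OPT_SET;

    /* "# CONFIG_FOO is not set": explicitly disabled. */
    if (strncmp(line, "# ", 2) == 0 &&
        strncmp(line + 2, option, n) == 0 &&
        strncmp(line + 2 + n, " is not set", 11) == 0)
        return OPT_NOT_SET;

    return OPT_ABSENT;
}

/* Scan the whole config text line by line for `option`. */
enum opt_state config_state(const char *config_text, const char *option)
{
    assert(config_text && option);
    const char *line = config_text;

    while (*line) {
        enum opt_state s = classify_line(line, option);
        if (s != OPT_ABSENT)
            return s;
        const char *nl = strchr(line, '\n');
        if (!nl)
            break;
        line = nl + 1;
    }
    return OPT_ABSENT;
}
```

With this excerpt, CONFIG_CGROUPS is reported as not set and CONFIG_CGROUP_MEM_RES_CTLR as absent, which matches the statement that the memory controller is not built into this kernel.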
Hmm.. strange.. I don't remember the overhead being so bad (I'll ...
Balbir
--
On Tue, Apr 29, 2008 at 10:52 AM, Balbir Singh
I'm checking 2.6.24 now. A quick run of 2.6.25-rc9 without fake numa
showed no real change.
Ross
--
Worth checking 2.6.24, yes. But you've already made it clear that you
do NOT have mem cgroups in your 2.6.25-rc9, so Balbir (probably) need
not worry about your regression: my guess was wrong on that.
Hugh
--
2.6.24 is slower as well. I can't say for sure it's the full 10%
without more work than it's worth. But it is definitely significantly
slower than 2.6.23.
Ross
--
Aah.. Yes... but I am definitely interested in figuring out the root
cause for the regression.
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
On Tue, Apr 29, 2008 at 12:42 PM, Balbir Singh
I can't reproduce the 2.6.23 results. I'm going to run the benchmarks
a few more times, but I'm suspecting something changed with the
hardware.
Ross
--
The 2.6.23 results have been consistent with the 2.6.24 results, and
lmbench has crashed my test machine at least once. I'm guessing some
sort of memory error is causing a lot of ECC corrections and slowing
things down.
Ross
--
On Tue, 29 Apr 2008 09:10:36 -0400
It seems lmbench's pagefault program uses 'page fault by READ', so this
patch affects it (the patch was added at 2.6.24-rc?):
==
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=557ed1fa26...
==
With it, ZERO_PAGE is no longer used for page faults in anonymous
mappings, so this seems to be an expected result.

Thanks,
--
I'd wondered about that one too, but no: lmbench lat_pagefault uses a
shared mmap of an ordinary file (not /dev/zero), so the ZERO_PAGE
changes should have no effect on it whatsoever.

I notice that test is expecting msync(,,MS_INVALIDATE) to do something
it's never done on Linux (a kind of drop caches for the range). We've
never done anything with MS_INVALIDATE, beyond permitting the flag: I
think you'd find problems however you tried to go about implementing it
(and it might even originate from a UNIX which couldn't do shared mmap
coherently). So I wonder if that test is erratic because of it.
Hugh
--
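The access pattern described above (a shared mapping of an ordinary file, an msync() call carrying MS_INVALIDATE, then one read per page) can be sketched in C as follows. This is an illustration under stated assumptions, not lmbench's actual source: the function name is hypothetical, timing is left out, and MS_SYNC is combined with MS_INVALIDATE because POSIX asks for one of MS_SYNC or MS_ASYNC.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map `path` with MAP_SHARED, call msync() with MS_INVALIDATE on the
 * range, then read one byte from every page of the mapping.
 * Returns the number of pages touched, or -1 on error.
 */
long touch_shared_file_pages(const char *path)
{
    assert(path);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    long page = sysconf(_SC_PAGESIZE);

    unsigned char *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    /*
     * Linux accepts the flag, but per the discussion above it does not
     * drop cached pages for the range, so the reads below may or may
     * not take major faults depending on what is already cached.
     */
    (void)msync(map, len, MS_SYNC | MS_INVALIDATE);

    volatile unsigned char sink = 0;
    long touched = 0;
    for (size_t off = 0; off < len; off += (size_t)page) {
        sink ^= map[off]; /* read access: faults the page in if needed */
        touched++;
    }
    (void)sink;

    munmap(map, len);
    return touched;
}
```

Because the page cache state before the loop is not controlled by msync() here, repeated runs over the same file can measure quite different mixes of faults, which is one way such a test could give erratic numbers.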
