On Wed, 2010-04-07 at 00:10 +0200, Eric Dumazet wrote:
Thanks. I also found that. Normally, my script runs hackbench 3 times and
takes the average. To reduce the variation, I use
'./hackbench 100 process 200000' to get a more stable result.
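As a rough illustration of the averaging described above (not the actual script), a minimal Python sketch could look like this. It assumes hackbench prints a line of the form "Time: <seconds>"; the helper names parse_time and average_runs are made up for this example.

```python
import re
import statistics
import subprocess

# Assumption: hackbench reports its result as a line like "Time: 12.345".
TIME_RE = re.compile(r"^Time:\s*([0-9.]+)", re.MULTILINE)


def parse_time(output):
    """Extract the elapsed time in seconds from one hackbench run's output."""
    match = TIME_RE.search(output)
    if match is None:
        raise ValueError("no 'Time:' line found in hackbench output")
    return float(match.group(1))


def average_runs(cmd, runs=3, run=None):
    """Run cmd `runs` times and return the mean of the reported times.

    `run` is a hypothetical hook that takes the command and returns its
    stdout; by default it executes the command with subprocess.
    """
    if run is None:
        def run(c):
            return subprocess.run(
                c, capture_output=True, text=True, check=True
            ).stdout
    return statistics.mean(parse_time(run(cmd)) for _ in range(runs))


if __name__ == "__main__":
    # Same invocation as quoted in the mail; needs ./hackbench in the cwd.
    print(average_runs(["./hackbench", "100", "process", "200000"]))
```

The `run` hook is only there so the parsing and averaging can be checked without a hackbench binary.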
By default, slub_min_order=3 on my Nehalem machines. I also tried several
larger slub_min_order values, but they didn't help.
I collected retired instructions, dTLB misses, and LLC misses.
Below is the LLC miss data.
Kernel 2.6.33:
# Samples: 11639436896 LLC-load-misses
#
# Overhead Command Shared Object Symbol
# ........ ............... ...................................................... ......
#
20.94% hackbench [kernel.kallsyms] [k] copy_user_generic_string
14.56% hackbench [kernel.kallsyms] [k] unix_stream_recvmsg
12.88% hackbench [kernel.kallsyms] [k] kfree
7.37% hackbench [kernel.kallsyms] [k] kmem_cache_free
7.18% hackbench [kernel.kallsyms] [k] kmem_cache_alloc_node
6.78% hackbench [kernel.kallsyms] [k] kfree_skb
6.27% hackbench [kernel.kallsyms] [k] __kmalloc_node_track_caller
2.73% hackbench [kernel.kallsyms] [k] __slab_free
2.21% hackbench [kernel.kallsyms] [k] get_partial_node
2.01% hackbench [kernel.kallsyms] [k] _raw_spin_lock
1.59% hackbench [kernel.kallsyms] [k] schedule
1.27% hackbench hackbench [.] receiver
0.99% hackbench libpthread-2.9.so [.] __read
0.87% hackbench [kernel.kallsyms] [k] unix_stream_sendmsg
Kernel 2.6.34-rc3:
# Samples: 13079611308 LLC-load-misses
#
# Overhead Command Shared Object Symbol
# ........ ............... .................................................................... ......
#
18.55% hackbench [kernel.kallsyms] [k] copy_user_generic_string
13.19% hackbench [kernel.kallsyms] [k] unix_stream_recvmsg
11.62% hackbench [kernel.kallsyms] [k] kfree
8.54% hackbench [kernel.kallsyms] [k] kmem_cache_free
7.88% hackbench [kernel.kallsyms] [k] __kmalloc_node_track_caller
6.54% hackbench [kernel.kallsyms] [k] kmem_cache_alloc_node
5.94% hackbench [kernel.kallsyms] [k] kfree_skb
3.48% hackbench [kernel.kallsyms] [k] __slab_free
2.15% hackbench [kernel.kallsyms] [k] _raw_spin_lock
1.83% hackbench [kernel.kallsyms] [k] schedule
1.82% hackbench [kernel.kallsyms] [k] get_partial_node
1.59% hackbench hackbench [.] receiver
1.37% hackbench libpthread-2.9.so [.] __read
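Because the two reports have different totals, the overhead percentages are not directly comparable between kernels. A small Python sketch can convert them to estimated miss counts first. It assumes the number in each "# Samples:" header is the total LLC-load-misses count and that each symbol's share is simply percentage times total; the function names are made up for this example.

```python
# Header totals copied from the two perf reports above.
TOTAL_MISSES = {"2.6.33": 11639436896, "2.6.34-rc3": 13079611308}

# Overhead percentages copied from the reports: (2.6.33, 2.6.34-rc3).
# unix_stream_sendmsg is left out because it appears in only one listing.
OVERHEAD_PCT = {
    "copy_user_generic_string": (20.94, 18.55),
    "unix_stream_recvmsg": (14.56, 13.19),
    "kfree": (12.88, 11.62),
    "kmem_cache_free": (7.37, 8.54),
    "kmem_cache_alloc_node": (7.18, 6.54),
    "kfree_skb": (6.78, 5.94),
    "__kmalloc_node_track_caller": (6.27, 7.88),
    "__slab_free": (2.73, 3.48),
    "get_partial_node": (2.21, 1.82),
    "_raw_spin_lock": (2.01, 2.15),
    "schedule": (1.59, 1.83),
    "receiver": (1.27, 1.59),
    "__read": (0.99, 1.37),
}


def estimated_misses(total, pct):
    """Estimate a symbol's miss count from the report total and its share."""
    return total * pct / 100.0


def growth_by_symbol():
    """Return the 2.6.34-rc3 / 2.6.33 ratio of estimated misses per symbol."""
    old_total = TOTAL_MISSES["2.6.33"]
    new_total = TOTAL_MISSES["2.6.34-rc3"]
    return {
        sym: estimated_misses(new_total, new) / estimated_misses(old_total, old)
        for sym, (old, new) in OVERHEAD_PCT.items()
    }


if __name__ == "__main__":
    total_ratio = TOTAL_MISSES["2.6.34-rc3"] / TOTAL_MISSES["2.6.33"]
    print(f"total: x{total_ratio:.3f}")
    ranked = sorted(growth_by_symbol().items(), key=lambda kv: -kv[1])
    for sym, ratio in ranked:
        print(f"{sym:30s} x{ratio:.3f}")
```

Sorting by the ratio puts the symbols whose estimated misses grew the most at the top, which the raw percentage columns do not show directly.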
--