Just in case people wonder _why_, here is a profile from before and after.
Note how big a deal page faulting, unmapping (TLB flushes etc.), and
fork() (copy_page_range()) are.
And notice how the biggest user-space cost, even after the change, is
do_lookup_x() in the dynamic loader. But before the change it was the very
top entry, and you had things like strcmp and _dl_relocate_object pretty
high too. Not to mention that you simply got a lot _more_ munmaps and page
faults:
- before:
4.51% git /lib64/ld-2.10.1.so [.] do_lookup_x
3.17% git [kernel] [k] unmap_vmas
2.75% git [kernel] [k] page_fault
1.48% git [kernel] [k] copy_page_c
1.43% git /lib64/ld-2.10.1.so [.] strcmp
1.30% git [kernel] [k] _spin_lock
1.12% git /lib64/ld-2.10.1.so [.] _dl_relocate_object
0.99% git-svn [kernel] [k] copy_page_range
0.99% git [kernel] [k] kmem_cache_alloc
0.97% git [kernel] [k] get_page_from_freelist
0.92% git [kernel] [k] copy_page_range
0.88% git [kernel] [k] clear_page_c
0.80% git [kernel] [k] find_vma
0.79% git /lib64/ld-2.10.1.so [.] _dl_lookup_symbol_x
0.68% git [kernel] [k] handle_mm_fault
0.68% git /lib64/libc-2.10.1.so [.] _int_malloc
0.63% git /bin/bash 0x00000000046e96
0.57% git /lib64/libc-2.10.1.so [.] __GI__dl_addr
0.51% git [kernel] [k] release_pages
- after:
3.02% git [kernel] [k] unmap_vmas
2.74% git [kernel] [k] page_fault
1.32% git [kernel] [k] copy_page_c
1.23% git [kernel] [k] _spin_lock
1.17% git-svn [kernel] [k] copy_page_range
1.06% git [kernel] [k] copy_page_range
0.99% git /lib64/ld-2.10.1.so [.] do_lookup_x
0.95% git /lib64/libc-2.10.1.so [.] _int_malloc
0.83% git [kernel] [k] get_page_from_freelist
0.83% git /lib64/libc-2.10.1.so [.] __GI__dl_addr
0.82% git [kernel] [k] clear_page_c
0.70% git [kernel] [k] kmem_cache_alloc
0.65% git [kernel] [k] handle_mm_fault
0.62% git /usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/libperl.so [.] Perl_yyparse
0.60% git-svn /usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/libperl.so [.] Perl_yyparse
0.59% git /usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/libperl.so [.] Perl_yylex
0.58% git [kernel] [k] release_pages
0.58% git [kernel] [k] page_remove_rmap
0.57% git /bin/bash 0x0000000004c2df
0.55% git-svn /usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/libperl.so [.] Perl_yylex
0.52% git [kernel] [k] find_vma
0.52% git [kernel] [k] strnlen_user
0.52% git-svn /lib64/libc-2.10.1.so [.] _int_malloc
It's interesting to see how, after the change, perl now shows up as a
fairly big part.
The big picture (not per-function, but per-program split by code segment:
kernel, executable, library) shows the same thing. git does have a high
kernel component in general, but something like "make test" makes it even
bigger, since most of the costs are really forking a _lot_ of git
programs:
- before:
33.23% git [kernel]
11.93% git /lib64/ld-2.10.1.so
7.55% git-svn /usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/libperl.so
6.82% git /usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/libperl.so
4.83% git /lib64/libc-2.10.1.so
3.28% git-svn [kernel]
1.82% sh [kernel]
1.57% git /bin/bash
1.52% perl /usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/libperl.so
1.37% git-svn /lib64/libc-2.10.1.so
1.28% tput [kernel]
1.26% git-filter-bran [kernel]
0.98% rm [kernel]
0.97% sed [kernel]
0.82% git-rebase--int [kernel]
0.71% git-bisect [kernel]
0.64% git ./git
0.62% grep [kernel]
0.55% cat [kernel]
- after:
30.30% git [kernel]
10.62% git-svn /usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/libperl.so
9.77% git /usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/libperl.so
6.31% git /lib64/libc-2.10.1.so
4.31% git-svn [kernel]
3.49% git /lib64/ld-2.10.1.so
2.17% git /bin/bash
2.10% perl /usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/libperl.so
1.93% git-svn /lib64/libc-2.10.1.so
1.90% sh [kernel]
1.40% git-filter-bran [kernel]
1.24% tput [kernel]
0.95% sed [kernel]
0.91% rm [kernel]
0.89% git ./git
0.84% git-rebase--int [kernel]
0.82% git-filter-bran /usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/libperl.so
0.75% sh /usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/libperl.so
0.59% sh /lib64/libc-2.10.1.so
0.57% grep [kernel]
0.57% git-bisect [kernel]
0.55% cat [kernel]
Note how the biggest user-space component used to be the dynamic loader.
Now it's down there way below the perl overhead.
And notice how, while the dynamic loader was "just" 12% of all overhead
(and is still 3.5% after the fix), performance has improved by 30%. The
reason is that the dynamic loader has a _huge_ kernel overhead due to the
whole mmap/munmap/mprotect/page-fault-to-COW/etc. code.
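A back-of-the-envelope Amdahl's-law check makes the same point. This is plain arithmetic on the "before" numbers above, assuming the perf percentages are shares of all samples in the run. Suppose the change had removed every user-space cycle spent in /lib64/ld-2.10.1.so and nothing else. The 11.93% share would then cap the gain well below 30%, under either reading of "30%" (time saved or speedup):

```latex
\[
f_{\text{ld.so}} = 0.1193, \qquad
T_{\text{after}} \;\ge\; (1 - f_{\text{ld.so}})\,T_{\text{before}} = 0.8807\,T_{\text{before}},
\qquad
\frac{T_{\text{before}}}{T_{\text{after}}} \;\le\; \frac{1}{0.8807} \approx 1.135 .
\]
```

So at most about 12% of the time saved (about a 13.5% speedup) can come from the loader's own user-space code. The rest has to come from the kernel-side mapping, COW and page-fault work that the loader triggers.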
Linus