Re: git log filtering

To: Linus Torvalds <torvalds@...>
Cc: <git@...>
Date: Thursday, February 8, 2007 - 2:16 am

On Wed, Feb 07, 2007 at 01:53:18PM -0800, Linus Torvalds wrote:


The patch is delightfully simple (though a real patch would probably be
conditional):

diff --git a/Makefile b/Makefile
index aca96c8..cf391dc 100644
--- a/Makefile
+++ b/Makefile
@@ -323,7 +323,7 @@ BUILTIN_OBJS = \
 	builtin-pack-refs.o
 
 GITLIBS = $(LIB_FILE) $(XDIFF_LIB)
-EXTLIBS = -lz
+EXTLIBS = -lz -lpcreposix -lpcre
 
 #
 # Platform specific tweaks
diff --git a/git-compat-util.h b/git-compat-util.h
index c1bcb00..a6c77f9 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -40,7 +40,7 @@
 #include <sys/poll.h>
 #include <sys/socket.h>
 #include <assert.h>
-#include <regex.h>
+#include <pcreposix.h>
 #include <netinet/in.h>
 #include <netinet/tcp.h>
 #include <arpa/inet.h>
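
As a rough sketch of what the "conditional" version could look like, the
include can be guarded by a build-time macro. USE_PCRE is a hypothetical
name chosen for illustration (it is not in the patch above), and
line_matches() is only a small helper to show that the same POSIX calls
compile against either header:

```c
/* Hypothetical build knob: compile with -DUSE_PCRE (and link with
 * -lpcreposix -lpcre) to use pcre's POSIX wrapper; otherwise fall
 * back to the C library's regex.h. USE_PCRE is an assumed name. */
#ifdef USE_PCRE
#include <pcreposix.h>
#else
#include <regex.h>
#endif
#include <stddef.h>

/* Return 1 if pattern matches anywhere in line, 0 if it does not,
 * and -1 if the pattern fails to compile. Only regcomp(), regexec()
 * and regfree() are used, which both headers declare. */
static int line_matches(const char *pattern, const char *line)
{
	regex_t re;
	int hit;

	if (regcomp(&re, pattern, 0) != 0)
		return -1;
	hit = (regexec(&re, line, 0, NULL, 0) == 0);
	regfree(&re);
	return hit;
}
```

The Makefile side would presumably add the two pcre libraries to EXTLIBS
only when the same knob is set.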


A few numbers, all from a fully packed kernel repository:

# glibc, trivial regex
$ /usr/bin/time git grep --cached foo >/dev/null
10.07user 0.15system 0:10.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36617minor)pagefaults 0swaps

# glibc, complex regex
$ /usr/bin/time git grep --cached '[a-z][0-9][a-z][0-9][a-z]'  >/dev/null
24.42user 0.15system 0:24.60elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36210minor)pagefaults 0swaps

# pcre, trivial regex
$ /usr/bin/time git grep --cached foo >/dev/null
7.82user 0.12system 0:08.00elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36571minor)pagefaults 0swaps

# pcre, complex regex
$ /usr/bin/time git grep --cached '[a-z][0-9][a-z][0-9][a-z]'  >/dev/null
36.51user 0.13system 0:36.65elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36583minor)pagefaults 0swaps


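So the winner seems to vary based on the complexity of the pattern: pcre
is about 22% faster on the trivial regex (7.82s vs. 10.07s of user time)
but about 50% slower on the complex one (36.51s vs. 24.42s).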
There are some less rudimentary but non-git performance tests here:

  http://www.boost.org/libs/regex/doc/gcc-performance.html

In every case there, pcre has either comparable performance, or simply
blows away glibc.
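
For anyone who wants to compare the two libraries without git's object
access in the picture, a tiny harness along these lines might do. It is
only a sketch: count_matching_lines() and time_pattern() are made-up
names, the input lines are whatever the caller supplies, and clock()
measures CPU time rather than wall-clock time:

```c
#include <regex.h>
#include <stddef.h>
#include <time.h>

/* Count how many of the n lines match pattern (POSIX basic syntax).
 * Returns -1 if the pattern does not compile. */
static long count_matching_lines(const char *pattern,
				 const char *const *lines, size_t n)
{
	regex_t re;
	long hits = 0;
	size_t i;

	if (regcomp(&re, pattern, 0) != 0)
		return -1;
	for (i = 0; i < n; i++)
		if (regexec(&re, lines[i], 0, NULL, 0) == 0)
			hits++;
	regfree(&re);
	return hits;
}

/* CPU seconds spent running count_matching_lines() `rounds` times.
 * Each round recompiles the pattern, so compile cost is included. */
static double time_pattern(const char *pattern,
			   const char *const *lines, size_t n, int rounds)
{
	clock_t start = clock();
	int r;

	for (r = 0; r < rounds; r++)
		count_matching_lines(pattern, lines, n);
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Building it once against regex.h and once against pcreposix.h (linking
-lpcreposix -lpcre) and timing both patterns from above should show
whether the difference comes from the regex library itself.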

One final note that caused some confusion during my testing: git-grep
still uses external grep for working tree greps (i.e., 'git grep foo').
This meant that 'git grep' and 'git grep --cached' produced wildly
different results once I was using pcre internally. Something to look
out for if we switch to pcre (or any other library which doesn't exactly
match external grep behavior!).
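
To illustrate how two engines can disagree on the very same pattern,
here is a small sketch using only glibc's regex.h. It is an assumed
example, not necessarily the difference I hit: in POSIX basic syntax
(grep's default) '\(' opens a group, while in extended syntax it is a
literal parenthesis, which is also how Perl-style syntax reads it.
matches_with_flags() is a made-up helper name:

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if pattern, compiled with the given regcomp() cflags,
 * matches somewhere in text; 0 if it does not; -1 if the pattern
 * fails to compile. */
static int matches_with_flags(const char *pattern, const char *text,
			      int cflags)
{
	regex_t re;
	int hit;

	if (regcomp(&re, pattern, cflags) != 0)
		return -1;
	hit = (regexec(&re, text, 0, NULL, 0) == 0);
	regfree(&re);
	return hit;
}
```

With the pattern '\(foo\)', passing 0 (basic syntax) matches the text
"foo", whereas passing REG_EXTENDED only matches when the text contains
a literal "(foo)".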

-Peff
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html