I was curious to know what the easiest way is to filter info inside a commit message. For example, say I wanted to find out what patches Joe User has submitted to the git project. I know I can do something like 'git log | grep -B2 "^Author: Joe User"' and it will output the matches and the commit id. However, if I wanted to filter on something like "Signed-off-by: Joe User", then it is a little harder to dig for the commit id. Is there a better way of doing this? Or should I accept the fact that git wasn't designed to filter info like this very quickly? I guess what I was looking to do was embed some metadata inside the commit message and parse through it at a later time (e.g. a bugzilla number or something). Any thoughts/tips/tricks would be helpful. Cheers, Don -
There are two ways:

 - "git log" can itself do a lot of filtering. Both on date, on revisions, on "modifies files/directories X, Y and Z" _and_ on strings. See "man git-rev-list" for more (it doesn't apply to just "git log", it applies to just about any revision listing, including gitk etc).

   For example,

	git log [--author=pattern] [--committer=pattern] [--grep=pattern]

   will likely do exactly what you want. You can do

	git log --grep="Signed-off-by:.*akpm"

   on the kernel archive to see which ones were signed off by Andrew.

   So the above works, and catches *most* uses. But it has problems if you want to do something fancier (and I think that includes something as simple as doing a case-insensitive grep). So the other approach is:

 - The hacky way: use "git log --pretty -z", and GNU grep -z:

	git log --pretty -z | grep -i -z Signed-off-by:.*junkio | tr '\0' '\n'

   which allows you to do anything you want with grep (or other unix tools).

Git definitely was designed to do it. The "-z" option in particular is very much designed for any generic UNIX scripting, but the *easy* cases git does internally.

Linus -
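For readers without GNU grep, the "hacky way" can be approximated in a few lines of Python. This is only a sketch: it treats its input as NUL-terminated records and keeps those that match a regular expression, which is roughly what "grep -i -z" does in the pipeline above. The function name `grep_records` and the sample data are made up for illustration, and Python's regex dialect is not identical to grep's.

```python
import re


def grep_records(stream, pattern, ignore_case=False):
    """Return the NUL-terminated records in `stream` that match `pattern`.

    `stream` is bytes, e.g. captured output of a NUL-terminated log.
    """
    flags = re.MULTILINE | (re.IGNORECASE if ignore_case else 0)
    regex = re.compile(pattern, flags)
    # Split on NUL and drop the empty piece after the last terminator.
    records = [r.decode("utf-8", "replace") for r in stream.split(b"\0") if r]
    return [r for r in records if regex.search(r)]


# Hypothetical sample standing in for NUL-terminated log output.
SAMPLE = (
    b"commit 1111111\nAuthor: Joe User <joe@example.com>\n\n"
    b"    Fix frobnication\n\n"
    b"    Signed-off-by: J Maintainer <junkio@example.com>\n\0"
    b"commit 2222222\nAuthor: A U Thor <author@example.com>\n\n"
    b"    Add feature\n\n"
    b"    Signed-off-by: Joe User <joe@example.com>\n\0"
)

if __name__ == "__main__":
    for record in grep_records(SAMPLE, r"signed-off-by:.*junkio", ignore_case=True):
        print(record)
```

Because each record is a whole commit, a match prints the commit id together with the line that matched, which is what the plain line-based grep could not do.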
Cool. The hidden little options. :-) This is exactly what I was looking for. Thanks. I didn't see these options in the man pages. Might be worth putting in there?? Cheers, Don -
Well, they really _are_ there, indirectly:

	The command takes options applicable to the git-rev-list(1) command to control what is shown and how, and options applicable to the git-diff-tree(1) commands to control how the change each commit introduces are shown.

so you have to look at both git-rev-list and git-diff-tree to get all the options. It then goes on to say:

	This manual page describes only the most frequently used options.
	                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

so technically it's complete and true. But yeah, maybe we could include all the options there.

Linus -
Gaah. If all you want is normal logs, you don't need the "--pretty", of course, since that's the default. Just "git log -z" will give you zero-terminated logs. But if you want to grep on committer, you'd need to use "--pretty=full" or something, of course, so the "--pretty=xyz" thing is indeed often applicable for things like this. Also, I just checked, and we have a bug. Merges do not have the ending zero in "git log -z" output. It seems to be connected to the fact that we handle the "always_show_header" commits differently (the ones that we wouldn't normally show because they have no diffs associated with them). The obvious fix for that failed. I'll look at it some more. Linus -
For commit messages, we should really put the "line_termination" when we output the character in between different commits, *not* between the commit and the diff. The diff goes hand-in-hand with the commit, it shouldn't be separated from it with the termination character.

So this:

 - uses the termination character for true inter-commit spacing
 - uses a regular newline between the commit log and the diff

We had it the other way around.

For the normal case where the termination character is '\n', this obviously doesn't change anything at all, since we just switched two identical characters around. So it's very safe - it doesn't change any normal usage, but it definitely fixes "git log -z".

By fixing "git log -z", you can now also do insane things like

	git log -p -z | grep -z "some patch expression" | tr '\0' '\n' | less -S

and you will see only those commits that have the "some patch expression" in their commit message _or_ their patches.

(This is slightly different from 'git log -S"some patch expression"', since the latter requires the expression to literally *change* in the patch, while the "git log -p -z | grep .." approach will see it if it's just an unchanged _part_ of the patch context)

Of course, if you actually do something like the above, you're probably insane, but hey, it works!

Try the above command line for a demonstration (of course, you need to change the "some patch expression" to be something relevant). The old behaviour of "git log -p -z" was useless (and got things completely wrong for log entries without patches).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
Actually, the obvious fix was right, I just did the *wrong* obvious fix at first ;)

 log-tree.c | 7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index d8ca36b..85acd66 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -143,7 +143,7 @@ void show_log(struct rev_info *opt, const ...
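The two bullet points in the patch description can be modelled with a toy formatter. This is a sketch of the described semantics only, not of the code in log-tree.c: `format_log` and the sample commits are hypothetical, and it assumes the termination character follows every commit, including the last.

```python
def format_log(commits, termination="\n"):
    """Join (message, diff) pairs using the semantics the patch describes.

    A regular newline separates a commit log from its diff; the
    termination character is only emitted once per commit.
    """
    out = []
    for message, diff in commits:
        entry = message
        if diff:
            entry += "\n" + diff          # log and diff stay in one record
        out.append(entry + termination)   # inter-commit spacing
    return "".join(out)


# Hypothetical commits: one with a patch, one without (e.g. a merge).
COMMITS = [
    ("commit aaa\n\n    first change",
     "diff --git a/f b/f\n+some patch expression"),
    ("commit bbb\n\n    merge, no diff", ""),
]

if __name__ == "__main__":
    # With NUL as the terminator, splitting on NUL yields one record per
    # commit, and a commit's diff travels with its log message.
    for record in format_log(COMMITS, "\0").split("\0"):
        if "some patch expression" in record:
            print(record)
```

With the default '\n' terminator the output contains the same characters in the same places whichever of the two positions is called "the terminator", which is why the change is invisible for normal usage.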
Gaah. I have already applied this but I think this has fallout for existing users of "-z --raw".

Nothing in-tree uses "git log" as the upstream of a pipe as far as I know, because in-tree stuff tends to stick to plumbing when it comes to scripting, but I think your patch would affect the plumbing level as well.

Scripts that read from "-z --raw" have been expecting to get a record whose first 7 bytes are "commit " to be a log, which is followed by an arbitrary number of records whose first byte is ":" (and then it needs variable number of records to complete one diff record). This patch removes the separator NUL between the log message and the first diff record. -
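A reader written against the record layout described here might look like the following. It is a sketch under stated assumptions: the sample stream is hand-made rather than captured from git, `parse_raw_z` is a hypothetical name, and the rule that rename and copy entries (status R or C) carry two path records while other entries carry one is my reading of the raw format, not something stated in this thread.

```python
def parse_raw_z(stream):
    """Group NUL-separated records into (log, [(meta, paths), ...]) tuples.

    Models the layout scripts relied on before the patch: a "commit "
    record, then ":" records each followed by one or two path records.
    """
    tokens = [t for t in stream.split("\0") if t]
    commits = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("commit "):
            commits.append((tok, []))
            i += 1
        elif tok.startswith(":") and commits:
            status = tok.split(" ")[-1]
            npaths = 2 if status[0] in "RC" else 1   # assumption, see above
            paths = tokens[i + 1:i + 1 + npaths]
            commits[-1][1].append((tok, paths))
            i += 1 + npaths
        else:
            raise ValueError("unexpected record: %r" % tok)
    return commits


# Hand-made sample in the pre-patch layout (NUL after the log message).
OLD_STREAM = (
    "commit aaa\n\n    change two files\n\0"
    ":100644 100644 1234567... 89abcde... M\0Makefile\0"
    ":100644 100644 1111111... 2222222... R100\0old.c\0new.c\0"
    "commit bbb\n\n    no diff here\n\0"
)

if __name__ == "__main__":
    for log, diffs in parse_raw_z(OLD_STREAM):
        print(log.splitlines()[0], [paths for _meta, paths in diffs])
```

Once the NUL between the log message and the first ":" record goes away, the first diff record is glued to the end of the "commit " record, so a reader like this one no longer sees it as a separate record.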
I think the new semantics for -z ("inter-record termination is
NUL") makes a lot more sense for "-p -z" format that shows
commit log message and the patch text. It makes filtering the
output with "grep -z" feel much more natural.
The new semantics is however quite inconsistent with the other
formats: --raw, --name-only and --name-status. These already
use NUL for separating pathnames and fields when -z is given, in
order to allow scripts to deal sensibly with pathnames that contain
funny characters (e.g. LF and HT). Nobody is likely to feed
their output to "grep -z", but one problematic case I see is to
use this:
git log -z --raw -r --pretty=raw $commit
or its equivalent:
git rev-list $commit |
git diff-tree --stdin --raw -r --pretty=raw
to prepare data to feed something like fast-import.
But such newly written scripts can read from non -z and unwrap
paths themselves just as easily (the pathname safety with NUL
was invented before we started using c-quote consistently), so
it might be Ok to leave them (slightly) broken.
So, I give up.
-... well, it just occurred to me that it might make sense not to
let this new "use NUL as inter-commit separator for grep -z"
semantics hijack existing -z option, but introduce another
option, say, -Z. Then you could even do something like:
git log -Z -r --numstat |
grep -z -e '^[1-9][0-9][0-9][0-9]* '
to find commits that have 100 or more lines of additions to a
file (or use --stat and grep for '| *[1-9][0-9][0-9][0-9]* ' to
look for the sum of additions and deletions).
Hmmmm.
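The same filter can be tried out on plain strings without any new option. The sketch below assumes each NUL-terminated record holds one commit followed by numstat lines of the form "added<TAB>deleted<TAB>path"; the function name and the sample records are made up. Note that the pattern requires at least three digits at the start of a line, so it matches 100 or more added lines.

```python
import re

# Three or more leading digits followed by whitespace: 100+ added lines.
BIG_ADD = re.compile(r"^[1-9][0-9][0-9][0-9]*\s", re.MULTILINE)


def commits_with_big_additions(stream):
    """Return the NUL-terminated records containing a large numstat entry."""
    records = [r for r in stream.split("\0") if r]
    return [r for r in records if BIG_ADD.search(r)]


# Hypothetical records: commit text plus numstat lines.
NUMSTAT_SAMPLE = (
    "commit aaa\n\n    big rewrite\n\n120\t3\tbig.c\n2\t1\tMakefile\n\0"
    "commit bbb\n\n    small fix\n\n5\t100\tsmall.c\n\0"
)

if __name__ == "__main__":
    for record in commits_with_big_additions(NUMSTAT_SAMPLE):
        print(record.splitlines()[0])
```

The second sample commit deletes 100 lines but adds only 5, so it is not reported; the anchor at the start of the line keeps the deletion column out of the match.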
-I don't think I disagree, but I do suspect it's not worth it.

Yes, we really do have two "line_termination" characters: the one between commits, and the one we use within raw diffs. However, I don't think the *combination* ever makes sense any more (*), so using the same flag doesn't seem to really be a problem.

And the -z "line_termination" already got hijacked a long time ago for inter-commit messages too, so while adding a "-Z" would perhaps avoid a certain ambiguity, it would actually potentially break stuff that just did

	git-rev-list -z --pretty .. | ...

which is actually _more_ likely than the "multiple commit messages _and_ raw output _and_ '-z'" combination.

So I would suggest leaving it as-is, especially since I don't think anybody has actually even noticed (ie nobody probably used that combination), and the new semantics in many ways are both more useful and more logical.

Linus

(*) It may well have made sense a year and a half ago, I don't think it makes much sense any more. -
Works for me. :) And I thought I had a handle on a lot of the Unix commands. That -z stuff just threw me for a loop. It's pretty neat to be able to grep commits and have the output display the whole commit and diff. Cheers, Don -
The whole "-z" flag to grep is a GNU extension, as far as I know. I don't
think it's portable.
Even for GNU grep, it's not mentioned in the man-page. Whether that is
just due to the normal inane FSF rules ("man-pages are evil, you should
use those idiotic info pages") or whether it is a conscious effort to not
document nonstandard features, I don't know.
Linus
-Ah, I was looking at other minor issues and then came up with
this one liner. But obviously "termination should be the true
inter-commit spacing" is the right direction, so I'll chuck this
one.
diff --git a/log-tree.c b/log-tree.c
index d8ca36b..410f90f 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -354,6 +354,8 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
if (!shown && opt->loginfo && opt->always_show_header) {
log.parent = NULL;
show_log(opt, "");
+ if (!opt->diffopt.line_termination)
+ putchar(0);
shown = 1;
}
opt->loginfo = NULL;
-Hi,
[TIC PATCH] revision.c: accept "-i" to make --grep case insensitive
When calling
git log --grep=blabla -i --grep=blublu
the expression "blabla" is grepped case _sensitively_, but "blublu"
case _insensitively_.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
revision.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/revision.c b/revision.c
index 42ba310..843aa8e 100644
--- a/revision.c
+++ b/revision.c
@@ -9,6 +9,8 @@
#include "grep.h"
#include "reflog-walk.h"
+static int case_insensitive_grep = 0;
+
static char *path_name(struct name_path *path, const char *name)
{
struct name_path *p;
@@ -742,6 +744,8 @@ static void add_grep(struct rev_info *revs, const char *ptn, enum grep_pat_token
opt->status_only = 1;
opt->pattern_tail = &(opt->pattern_list);
opt->regflags = REG_NEWLINE;
+ if (case_insensitive_grep)
+ opt->regflags |= REG_ICASE;
revs->grep_filter = opt;
}
append_grep_pattern(revs->grep_filter, ptn,
@@ -1042,6 +1046,11 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
add_header_grep(revs, "committer", arg+12);
continue;
}
+ if (!strcmp(arg, "-i") ||
+ !strcmp(arg, "--case-insensitive")) {
+ case_insensitive_grep = 1;
+ continue;
+ }
if (!strncmp(arg, "--grep=", 7)) {
add_message_grep(revs, arg+7);
continue;
-This is very tempting but, ... hmmmm... -
I would actually prefer to have it be some marker on the expression itself. We already do that '^' handling by hand for "author"/"committer" things. We could do other things like that. Although I guess the downside of not doing standard regexps would be too big. Linus -
Use Perl's regexps? the pcre library packs them, and they have all sorts of goodies like markers in the expression itself. -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 2654431 Universidad Tecnica Federico Santa Maria +56 32 2654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 2797513 -
We could go pcre and let you say "(?i)". That would all be post 1.5.0, though. -
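To illustrate what "a marker on the expression itself" buys you, here is a small sketch using Python's `re` module rather than PCRE. Python happens to accept the same "(?i)" syntax at the start of a pattern, so it serves as a stand-in; nothing here reflects how git would wire this up, and `message_matches` is a hypothetical helper.

```python
import re


def message_matches(pattern, message):
    """Return True if `pattern` matches anywhere in the commit message."""
    return re.search(pattern, message, re.MULTILINE) is not None


# Hypothetical commit message.
MESSAGE = "Fix frobnication\n\nSigned-off-by: Joe User <joe@example.com>\n"

if __name__ == "__main__":
    # Same pattern text, with and without the inline marker.
    print(message_matches(r"signed-off-by:.*joe user", MESSAGE))
    print(message_matches(r"(?i)signed-off-by:.*joe user", MESSAGE))
```

The appeal is that case sensitivity is chosen per pattern by whoever writes the pattern, with no extra command-line option whose position relative to the patterns matters.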
Hmm. PCRE is probably wide-spread enough that it could be an option. What's PCRE performance like? I'd hate to make "git grep" slower, and it would be stupid and confusing to use two different regex libraries..

Maybe somebody could test - afaik, PCRE has a regex-compatible (from an API standpoint, not from a regex standpoint!) wrapper thing, and it might be interesting to hear if doing "git grep" is slower or faster..

(I realize that the performance thing depends heavily on the patterns and the working set they are used on, but I guess _I_ personally only care about fairly simple patterns on the kernel ;)

Linus -
The patch is delightfully simple (though a real patch would probably be conditional):

diff --git a/Makefile b/Makefile
index aca96c8..cf391dc 100644
--- a/Makefile
+++ b/Makefile
@@ -323,7 +323,7 @@ BUILTIN_OBJS = \
 	builtin-pack-refs.o
 
 GITLIBS = $(LIB_FILE) $(XDIFF_LIB)
-EXTLIBS = -lz
+EXTLIBS = -lz -lpcreposix -lpcre
 
 #
 # Platform specific tweaks
diff --git a/git-compat-util.h b/git-compat-util.h
index c1bcb00..a6c77f9 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -40,7 +40,7 @@
 #include <sys/poll.h>
 #include <sys/socket.h>
 #include <assert.h>
-#include <regex.h>
+#include <pcreposix.h>
 #include <netinet/in.h>
 #include <netinet/tcp.h>
 #include <arpa/inet.h>

A few numbers, all from a fully packed kernel repository:

# glibc, trivial regex
$ /usr/bin/time git grep --cached foo >/dev/null
10.07user 0.15system 0:10.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36617minor)pagefaults 0swaps

# glibc, complex regex
$ /usr/bin/time git grep --cached '[a-z][0-9][a-z][0-9][a-z]' >/dev/null
24.42user 0.15system 0:24.60elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36210minor)pagefaults 0swaps

# pcre, trivial regex
$ /usr/bin/time git grep --cached foo >/dev/null
7.82user 0.12system 0:08.00elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36571minor)pagefaults 0swaps

# pcre, complex regex
$ /usr/bin/time git grep --cached '[a-z][0-9][a-z][0-9][a-z]' >/dev/null
36.51user 0.13system 0:36.65elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36583minor)pagefaults 0swaps

So the winner seems to vary based on the complexity of the pattern. There are some less rudimentary but non-git performance tests here:

http://www.boost.org/libs/regex/doc/gcc-performance.html

In every case there, pcre has either comparable performance, or simply blows away glibc.

One final note tha...
Hi,
So I tested this against external grep. For completeness' sake, I tested
these against each other: GNU regex-0.12, Git _without_ external grep
(relies on glibc's regex), Git _with_ external grep ("original"), pcre,
and for good measure, pcre with NO_MMAP=1 (to test if disk access is the
problem).
Here are the numbers:
grep-gnu-regex:
21.41user 1.08system 0:22.52elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7210minor)pagefaults 0swaps
21.40user 1.06system 0:22.47elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7209minor)pagefaults 0swaps
21.61user 1.06system 0:22.68elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7209minor)pagefaults 0swaps
21.30user 1.10system 0:22.48elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7210minor)pagefaults 0swaps
21.30user 1.08system 0:22.43elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7209minor)pagefaults 0swaps
grep-no-external-grep:
6.98user 1.17system 0:08.16elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7120minor)pagefaults 0swaps
7.07user 1.16system 0:08.27elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7121minor)pagefaults 0swaps
6.98user 1.12system 0:08.11elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7121minor)pagefaults 0swaps
7.00user 1.18system 0:08.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7121minor)pagefaults 0swaps
grep-original:
0.82user 1.15system 0:01.97elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7090minor)pagefaults 0swaps
0.94user 1.03system 0:01.97elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7099minor)pagefaults 0swaps
0.89user 1.07system 0:01.96elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7092minor)pagefaults 0swaps
0.81user 1.15system 0:01.97elapsed 99%CPU (0avgtext+0avgda...

Indeed GNU regex 0.12 loses, and that's why it was rewritten for (IIRC) glibc 2.3. Older glibc's use code derived from GNU regex 0.12; but the old GNU regex code is dead in general (maybe it survives in Emacs -- but I don't remember), and the glibc regex code can be used by external programs via gnulib.

glibc is slower than PCRE mostly because it is internationalized. So for example it supports things like stra[.ss.]e matching both strasse and stra
Hi, May I register a complaint? This is yet _another_ dependency. Ciao, Dscho -
Unlike other dependencies, I think it's quite natural to make it a conditional dependency. If you have pcre, you get more featureful regular expressions. If you don't, you get posix regular expressions. Do you object to a few extra lines in the Makefile? -Peff -
Hi, Yes, I do. Not because of the extra lines, but because of the inconsistent interface. We included libxdiff _exactly_ to ensure consistency between different git installations (remember, diff behaves quite differently on different platforms, and even GNU diff behaves differently depending on which version you use). So no, I do not like the idea of using git on some random box, only to realize that what I have grown used to does not work. Ciao, Dscho -
OK, so we may either:

 1. always use the lowest common denominator (i.e., no pcre support)
 2. force a dependency for new features (i.e., require pcre)
 3. have inconsistency between builds (i.e., conditional dependency)
 4. include all dependencies, or re-write them natively

I agree that 4 can make some sense in limited situations, but I worry that it will eventually cease to be scalable (we don't get improvements or bugfixes automatically from other packages, we potentially re-invent the wheel).

We already have '3' for other things: openssl, curl, expat, even perl.

-Peff -
Hi, The difference, of course, is that with the "other things", we either have no alternative (if you do not have curl, you cannot use HTTP transport), or we have workalikes (if you don't use openssl, the (possibly slower) SHA1 replacements take effect). We _used_ to rely on external "diff" and "merge", but have them as inbuilt components, exactly to avoid "if you have a slightly differing setup, git behaves differently". Ciao, Dscho -
I'm not a pcre expert, but I thought most of the additions to posix extended regular expressions were expressed through constructs that would otherwise be invalid patterns. For example, '(?i)' doesn't make any sense as a pattern. Thus you would only see different behavior when inputting nonsense. Of course, we're not currently using extended regexps, but that could be made the default without additional

But you're OK with "if you didn't build against curl, http transport just doesn't work." So what if there is a '--pcre' option and a corresponding config option? Thus you get the same results always, unless you use --pcre and it's not built, in which case git dies. That seems to be the moral equivalent of the curl situation.

At any rate, you didn't address my original point, which is that _all_ of those options have drawbacks. I think the drawbacks of re-writing or re-packaging a regular expression library outweigh those of adding the dependency (or even having slightly irregular behavior).

-Peff -
Hi,

So, once pcre is used, you can use these constructs. Even in scripts. Which just so happen to break on platforms where git is not compiled with pcre support. Or do you suggest checking (in git!) if the pattern is a pcre special or

This is only because you do not really have problems with dependencies. You just install, or compile, the dependent thing, which happens to be no hassle, since you use Linux. And you can compile & install things.

Once everybody runs Linux, and is allowed to compile & install things, I will no longer complain about trillions of dependencies.

Ciao, Dscho -
pcre is covered by the BSD license. Can we ship it with git, like we ship libxdiff? I want to say Apache ships with pcre, but they use the Apache License so it might be easier for them to do so. -- Shawn. -
If you do this, please do not forget to add a way to use the system copy of libpcre instead of the bundled version.
Hi, If we bundle it like we do with libxdiff, I do not have any objections. It would also help MinGW. Ciao, Dscho -
[Cc: git@vger.kernel.org] You can use "git log --grep=<pattern>" for that, instead. This greps the raw commit message. You can use --author and --committer to grep those headers. -- Jakub Narebski Warsaw, Poland ShadeHawk on #git -
What about
Maybe:
git log | awk -v sob="Joe User" '$1 == "commit" {commit = $2} /Signed-off-by:/ {if (match($0, sob)) print commit}'
Best regards
Uwe
--
Uwe Kleine-K
Hi, On Wed, 7 Feb 2007, Uwe Kleine-K
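For comparison, the awk one-liner above translates fairly directly into Python. This is a sketch working on plain "git log" text held in a string; `commits_signed_off_by` and the sample log are hypothetical. It differs from the awk version in two small ways: the name is matched as a plain substring rather than as a regular expression, and a line only counts as a commit header if it starts with "commit " in column one, so indented message text is never mistaken for one.

```python
def commits_signed_off_by(log_text, name):
    """Return the ids of commits whose message has a sign-off by `name`."""
    commit = None
    found = []
    for line in log_text.splitlines():
        if line.startswith("commit "):
            # Remember the most recent commit header seen so far.
            commit = line.split()[1]
        elif "Signed-off-by:" in line and name in line and commit is not None:
            found.append(commit)
    return found


# Hypothetical sample standing in for the output of "git log".
LOG_SAMPLE = """\
commit 1111111
Author: Joe User <joe@example.com>

    Fix frobnication

    Signed-off-by: A U Thor <author@example.com>

commit 2222222
Author: A U Thor <author@example.com>

    Add feature

    Signed-off-by: Joe User <joe@example.com>
"""

if __name__ == "__main__":
    print(commits_signed_off_by(LOG_SAMPLE, "Joe User"))
```

The first sample commit is authored by Joe User but signed off by someone else, so only the second id is reported.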
