Hi, I've written this perl script that takes a patch as input and prints the authors/committers of the affected lines, using git-blame as the back end. (The purpose of this is of course to find out whom to send patches to.) There are some caveats: - If I've understood correctly, git-blame incremental output doesn't split commits when a newer one is found, so we currently possibly take into account more than just the last patch to touch a line. This might not be a disadvantage, however... - The patch must apply to the current working tree. I suppose there is some way to use the index information in the patch to determine what to run git-blame against, but this is currently beyond my git knowledge. - It's a bit slow, particularly for large files. But doing the same thing by hand would be slower, so I suppose it's an overall improvement. Running this on a random -mm patch, for example http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.26-rc2/2.6.26-rc2-m... gives the following output: $ perl whomto2.pl acpi-fix-fadt-parsing.patch Running git-blame on drivers/acpi/tables/tbfadt.c... To: (Committers) 48 Len Brown <len.brown@intel.com> Cc: (Authors) 44 Bob Moore <robert.moore@intel.com> 2 Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com> 2 Len Brown <len.brown@intel.com> Maybe this tool can be useful? :-) (Improvements are of course also welcome.) Vegard #! /usr/bin/perl use strict; use warnings; for my $file (@ARGV) { check($file); } sub git_apply { my $filename = shift; my @args = ( 'git-apply', '--check', $filename, ); open(my $fh, '-|', @args) || die $!; my @b = <$fh>; close $fh; return $? ? undef : 1; } sub git_blame { my $filename = shift; my @args = ( 'git-blame', '--incremental', '--', $filename, ); open(my $fh, '-|', @args) || die $!; chomp(my @b = <$fh>); close ...
Nice enough script. It's unfortunate that it can't output the appropriate mailing lists. I think the shell script that Linus gave awhile ago: http://lkml.org/lkml/2007/8/14/276 #!/bin/sh git log --since=6.months.ago -- "$@" | grep -i '^ [-a-z]*by:.*@' | sort | uniq -c | sort -r -n | head (Maybe you want to add a grep -v '\(Linus Torvalds\)\|\(Andrew Morton\)' might work just as well. I still prefer the file pattern match in MAINTAINERS, or another external file, and/or data stored directly into GIT via gitattributes approaches. This script can give maintainer, mailing lists, and git contact information for patches or files. http://lkml.org/lkml/2007/8/20/352 The script works with git-send-email to cc the appropriate parties. This script and git repository is very old and probably doesn't apply... git pull git://repo.or.cz/linux-2.6/trivial-mods.git get_maintainer --
<snip> The script is nice, but I'd wish it looked at a few other things as well. When I personally need to determine who to send patches to I do use 'git blame' for some of the addresses, but in addition to that I also check; - The comments at the top of the file. Sometimes there are email addresses there for relevant people (sometimes just names, but addresses can then usually be found for those people in CREDITS or MAINTAINERS). - Entries in MAINTAINERS that are relevant to the subsystem and/or file I'm modifying. - Entries in CREDITS that look relevant to the subsystem and/or file I'm modifying. - Names/email addresses in files in Documentation/ that are relevant to the subsystem/file I'm modifying. If the script could be made to check all (or just some) of those sources as well it would be really great. -- Jesper Juhl <jesper.juhl@gmail.com> Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html Plain text mails only, please http://www.expita.com/nomime.html --
"git blame" does not give irrelevant commits in its output, with or without --incremental. Perhaps you were thinking about the "oops, earlier one was wrong, here are the corrections" behaviour of "git log --early-output", which is an unrelated mechanism in a different command. But I have to wonder why you used --incremental and not --porcelain format, the latter of which is more compact and is designed for parsing by tools. I also have to wonder why you did not use -M, -C, and/or -w, if you used blame to find the true origin of lines that are involved. Unless the patch is truly about a narrow region of a handful files (e.g. micro-optimizing the implementation of a single function without changing its external interface at all, or fixing an off-by-one error in a group of functions that do similar things), I suspect that it would make more sense to use "git shortlog --no-merges -- paths" to get the list of people who are involved in the general area, even though they may not have been involved in particular _lines_ that the patch touches. For example, if a patch changes the lines in a function's implementation, you would want input not only from the people who improved the implementation of the function over the years, but more from the people who care about the calling sites of that function the patch is touching. --
Hi, This comment was based on my observation that several (sometimes different) commits would span the same line numbers. Though it seems to also happen that no line is spanned by any commit at all. I There were some different reasons. I found --incremental easier to parse, and I didn't really want the actual lines of the file. Maybe I I haven't used these options before and didn't know if it would really make sense to use them in this context. I guess I could allow them to Yes, it seems that log/shortlog is the most common (and probably faster) way of doing what I'm trying to do, based on all the feedback I had :-) However, I use git-blame myself and so I wanted to automate that task in particular. I did not intend for the tool to be fully automatic, however; it outputs a ranked list of names and e-mails. The user (well, me ;-)) is still expected to pick the sensible entries and leave out the rest. For instance, I bet half the patches run through the script on the linux kernel sources would turn Linus up, even though you don't want to send patches directly there in most of the cases. And this is simply because he did the initial commit and a lot of code may not have changed since that... Thanks for the comments. Vegard -- "The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation." -- E. W. Dijkstra, EWD1036 --
Minor fix: do not git-blame /dev/null in patches that add new files.
-Andrea
diff -urpN linux/whomto.orig.pl linux/whomto.pl
--- linux/whomto.orig.pl 2008-05-30 09:43:08.000000000 +0200
+++ linux/whomto.pl 2008-05-30 09:49:26.000000000 +0200
@@ -101,6 +101,7 @@ sub parse_patch {
for (@p) {
if (m/^--- .*?\/([^\s~]*)/) {
+ next if ($1 eq 'dev/null');
$file = $files{$1} = {
chunks => [],
};
--
I missed that, thanks :-) (Other diff programs may also use other paths for new files, so I'm also adding an -f check.) Vegard -- "The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation." -- E. W. Dijkstra, EWD1036 --
Based on Linus' script to get the email address of a maintainer, I wrote
this bash script to get
an indication of relevant lists. Maybe you can make use of the part that
parses the
MAINTAINERS file for relevant lists?
---
git log --since="1 year ago" "$@" | sed -n "s/^ .[-a-z]*by: \(.*\) <.*$/\1/p" |
sort | uniq | sort -n -r | while read -r name; do
sed -n "/^P:[ \t]*.*$name/,/^$/{
s/^L:[ \t]*\(.*\)$/\1/p
}" ./MAINTAINERS
done | sort | uniq -c | sort -n -r | while read -r nr list; do
tot=`grep -c "^L:\W*.*$list.*" ./MAINTAINERS`
echo "`expr $nr / \( $tot + 1 \)` $nr $tot $list"
done | sort -r | cut -d " " -f2- | while read -r nr tot list; do
echo -e "$nr/$tot Acks were commited by maintainers of list $list"
done
--
