Re: whomto.pl -- finding out whom to send patches to

Previous thread: none

Next thread: [GIT PATCH] USB fixes for 2.6.26-rc4 by Greg KH on Thursday, May 29, 2008 - 2:21 pm. (11 messages)
From: Vegard Nossum
Date: Thursday, May 29, 2008 - 2:00 pm

Hi,

I've written this perl script that takes a patch as input and prints the
authors/committers of the affected lines, using git-blame as the back end.

(The purpose of this is of course to find out whom to send patches to.)

There are some caveats:

- If I've understood correctly, git-blame incremental output doesn't split
  commits when a newer one is found, so we currently possibly take into
  account more than just the last patch to touch a line. This might not be
  a disadvantage, however...

- The patch must apply to the current working tree. I suppose there is
  some way to use the index information in the patch to determine what to
  run git-blame against, but this is currently beyond my git knowledge.

- It's a bit slow, particularly for large files. But doing the same thing
  by hand would be slower, so I suppose it's an overall improvement.

Running this on a random -mm patch, for example
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.26-rc2/2.6.26-rc2-m...
gives the following output:

  $ perl whomto2.pl acpi-fix-fadt-parsing.patch
  Running git-blame on drivers/acpi/tables/tbfadt.c...

  To: (Committers)
      48 Len Brown <len.brown@intel.com>
  Cc: (Authors)
      44 Bob Moore <robert.moore@intel.com>
       2 Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
       2 Len Brown <len.brown@intel.com>

Maybe this tool can be useful? :-)

(Improvements are of course also welcome.)


Vegard


#! /usr/bin/perl

use strict;
use warnings;

for my $file (@ARGV) {
	check($file);
}

sub git_apply {
	my $filename = shift;

	my @args = (
		'git-apply',
		'--check',
		$filename,
	);

	open(my $fh, '-|', @args) || die $!;
	my @b = <$fh>;
	close $fh;

	return $? ? undef : 1;
}

sub git_blame {
	my $filename = shift;

	my @args = (
		'git-blame',
		'--incremental',
		'--',
		$filename,
	);

	open(my $fh, '-|', @args) || die $!;
	chomp(my @b = <$fh>);
	close ...
From: Joe Perches
Date: Thursday, May 29, 2008 - 11:20 am

Nice enough script.
It's unfortunate that it can't output the appropriate mailing lists.

I think the shell script that Linus gave awhile ago:
http://lkml.org/lkml/2007/8/14/276

        #!/bin/sh
               git log --since=6.months.ago -- "$@" |
                       grep -i '^    [-a-z]*by:.*@' |
                       sort | uniq -c |
                       sort -r -n | head
        
         (Maybe you want to add a
               grep -v '\(Linus Torvalds\)\|\(Andrew Morton\)'
        
might work just as well.

I still prefer the file pattern match in MAINTAINERS, or
another external file, and/or data stored directly into GIT
via gitattributes approaches.

This script can give maintainer, mailing lists, and git
contact information for patches or files.
http://lkml.org/lkml/2007/8/20/352
The script works with git-send-email to cc the appropriate parties.

This script and git repository is very old and probably doesn't apply...
git pull git://repo.or.cz/linux-2.6/trivial-mods.git get_maintainer



--

From: Jesper Juhl
Date: Thursday, May 29, 2008 - 3:19 pm

<snip>

The script is nice, but I'd wish it looked at a few other things as well.

When I personally need to determine who to send patches to I do use
'git blame' for some of the addresses, but in addition to that I also
check;

- The comments at the top of the file.  Sometimes there are email
addresses there for relevant people (sometimes just names, but
addresses can then usually be found for those people in CREDITS or
MAINTAINERS).

- Entries in MAINTAINERS that are relevant to the subsystem and/or
file I'm modifying.

- Entries in CREDITS that look relevant to the subsystem and/or file
I'm modifying.

- Names/email addresses in files in Documentation/ that are relevant
to the subsystem/file I'm modifying.

If the script could be made to check all (or just some) of those
sources as well it would be really great.

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
--

From: Junio C Hamano
Date: Thursday, May 29, 2008 - 4:33 pm

"git blame" does not give irrelevant commits in its output, with or
without --incremental.  Perhaps you were thinking about the "oops, earlier
one was wrong, here are the corrections" behaviour of "git log
--early-output", which is an unrelated mechanism in a different command.

But I have to wonder why you used --incremental and not --porcelain
format, the latter of which is more compact and is designed for parsing by
tools.

I also have to wonder why you did not use -M, -C, and/or -w, if you used
blame to find the true origin of lines that are involved.

Unless the patch is truly about a narrow region of a handful files
(e.g. micro-optimizing the implementation of a single function without
changing its external interface at all, or fixing an off-by-one error in a
group of functions that do similar things), I suspect that it would make
more sense to use "git shortlog --no-merges -- paths" to get the list of
people who are involved in the general area, even though they may not have
been involved in particular _lines_ that the patch touches.  For example,
if a patch changes the lines in a function's implementation, you would
want input not only from the people who improved the implementation of the
function over the years, but more from the people who care about the
calling sites of that function the patch is touching.

--

From: Vegard Nossum
Date: Saturday, May 31, 2008 - 3:33 am

Hi,


This comment was based on my observation that several (sometimes
different) commits would span the same line numbers. Though it seems
to also happen that no line is spanned by any commit at all. I

There were some different reasons. I found --incremental easier to
parse, and I didn't really want the actual lines of the file. Maybe I

I haven't used these options before and didn't know if it would really
make sense to use them in this context. I guess I could allow them to

Yes, it seems that log/shortlog is the most common (and probably
faster) way of doing what I'm trying to do, based on all the feedback
I had :-) However, I use git-blame myself and so I wanted to automate
that task in particular.

I did not intend for the tool to be fully automatic, however; it
outputs a ranked list of names and e-mails. The user (well, me ;-)) is
still expected to pick the sensible entries and leave out the rest.
For instance, I bet half the patches run through the script on the
linux kernel sources would turn Linus up, even though you don't want
to send patches directly there in most of the cases. And this is
simply because he did the initial commit and a lot of code may not
have changed since that...

Thanks for the comments.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: Andrea Righi
Date: Friday, May 30, 2008 - 12:58 am

Minor fix: do not git-blame /dev/null in patches that add new files.

-Andrea

diff -urpN linux/whomto.orig.pl linux/whomto.pl
--- linux/whomto.orig.pl	2008-05-30 09:43:08.000000000 +0200
+++ linux/whomto.pl	2008-05-30 09:49:26.000000000 +0200
@@ -101,6 +101,7 @@ sub parse_patch {
 
 	for (@p) {
 		if (m/^--- .*?\/([^\s~]*)/) {
+			next if ($1 eq 'dev/null');
 			$file = $files{$1} = {
 				chunks => [],
 			};
--

From: Vegard Nossum
Date: Saturday, May 31, 2008 - 3:03 am

I missed that, thanks :-)

(Other diff programs may also use other paths for new files, so I'm
also adding an -f check.)


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: Roel
Date: Friday, May 30, 2008 - 2:29 am

Based on Linus' script to get the email address of a maintainer, I wrote 
this bash script to get
an indication of relevant lists. Maybe you can make use of the part that 
parses the
MAINTAINERS file for relevant lists?

---

git log --since="1 year ago"  "$@" | sed -n "s/^    .[-a-z]*by: \(.*\) <.*$/\1/p" |
sort | uniq | sort -n -r | while read -r name; do
        sed -n "/^P:[ \t]*.*$name/,/^$/{
                s/^L:[ \t]*\(.*\)$/\1/p
        }" ./MAINTAINERS
done | sort | uniq -c | sort -n -r | while read -r nr list; do
        tot=`grep -c "^L:\W*.*$list.*" ./MAINTAINERS`
        echo "`expr $nr / \( $tot + 1 \)` $nr $tot $list"
done | sort -r | cut -d " " -f2- | while read -r nr tot list; do
        echo -e "$nr/$tot Acks were commited by maintainers of list $list"
done


--

Previous thread: none

Next thread: [GIT PATCH] USB fixes for 2.6.26-rc4 by Greg KH on Thursday, May 29, 2008 - 2:21 pm. (11 messages)