Signed-off-by: Ryan Anderson <ryan@michonline.com>
---
I think this version is mostly ready to go.
Junio, the post you pointed me at was very helpful (once I got around to
listening to it), but the code it links to is missing - if that's a
better partial implementation than this, can you resurrect it
somewhere? I'd be happy to reintegrate it together.
Makefile | 1
git-annotate.perl | 291 +++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 292 insertions(+), 0 deletions(-)
create mode 100755 git-annotate.perl
86fa163e7fd1bee2929b7946456407dbc7745193
diff --git a/Makefile b/Makefile
index 5c32934..8d24660 100644
--- a/Makefile
+++ b/Makefile
@@ -117,6 +117,7 @@ SCRIPT_SH = \
SCRIPT_PERL = \
git-archimport.perl git-cvsimport.perl git-relink.perl \
git-shortlog.perl git-fmt-merge-msg.perl git-rerere.perl \
+ git-annotate.perl \
git-svnimport.perl git-mv.perl git-cvsexportcommit.perl
SCRIPT_PYTHON = \
diff --git a/git-annotate.perl b/git-annotate.perl
new file mode 100755
index 0000000..a3ea201
--- /dev/null
+++ b/git-annotate.perl
@@ -0,0 +1,291 @@
+#!/usr/bin/perl
+# Copyright 2006, Ryan Anderson <ryan@michonline.com>
+#
+# GPL v2 (See COPYING)
+#
+# This file is licensed under the GPL v2, or a later version
+# at the discretion of Linus Torvalds.
+
+use warnings;
+use strict;
+
+use Data::Dumper;
+
+my $filename = shift @ARGV;
+
+
+my @stack = (
+	{
+		'rev' => "HEAD",
+		'filename' => $filename,
+	},
+);
+
+our (@lineoffsets, @pendinglineoffsets);
+our @filelines = ();
+open(F,"<",$filename)
+	or die "Failed to open filename: $!";
+
+while(<F>) {
+	chomp;
+	push @filelines, $_;
+}
+close(F);
+our $leftover_lines = @filelines;
+our %revs;
+our @revqueue;
+our $head;
+
+my $revsprocessed = 0;
+while (my $bound = pop @stack) {
+	my @revisions = git_rev_list($bound->{'rev'}, $bound->{'filename'});
+	foreach my $revinst (@revisions) {
+		my ($rev, @parents) = @$revinst;
+		$head ||= ...
-
Does it depend on some earlier patch? I get this:

git]$ git-annotate diff-delta.c
Undefined subroutine &main::all_lines_claimed called at /home/peter/bin/git-annotate line 124.

The patch was applied to: git version 1.1.6.gd19e-dirty.

Peter
-
Hi,
Just add a function like
-- snip --
sub all_lines_claimed {
return ($leftover_lines == 0);
}
-- snap --
and you're done.
However, it does not yet do the correct thing: it does not show the root
commit. For example, if you do "git annotate git-am.sh" it should show
"d1c5f2a4" for the first lines, not "a1451104" as it does.
Ciao,
Dscho
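Dscho's point about the first lines can be illustrated with a small bookkeeping sketch. It assumes the script records, for each line of the final file, which commit claimed it. `@claimed`, `init_claims`, `claim_line` and `claim_remaining` are hypothetical names; only `$leftover_lines` and `all_lines_claimed` come from this thread. The rule it encodes: once the walk reaches the commit that created the file (one with no parents, or whose parents do not have the file), every line that no later commit claimed must have come from that commit.

```perl
use strict;
use warnings;

# Hypothetical bookkeeping: $claimed[$i] is the commit blamed for line
# $i of the final file, or undef while that line is still unclaimed.
our @claimed;
our $leftover_lines = 0;

sub init_claims {
	my ($nlines) = @_;
	@claimed = (undef) x $nlines;
	$leftover_lines = $nlines;
}

sub all_lines_claimed {
	return ($leftover_lines == 0);
}

# Blame line $i on $rev unless a more recent commit already claimed it.
sub claim_line {
	my ($i, $rev) = @_;
	return if defined $claimed[$i];
	$claimed[$i] = $rev;
	$leftover_lines--;
}

# Call for the commit that created the file: it introduced every line
# that nobody else has claimed.
sub claim_remaining {
	my ($rev) = @_;
	claim_line($_, $rev) for 0 .. $#claimed;
}
```

With this rule, "git annotate git-am.sh" would report the creating commit for lines untouched since then, rather than the last commit the walk happened to look at.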
-
another perl script :(
Thanks
--
Franck
-
Hi,

Yes. Do not try to introduce unnecessary dependencies. But if it is the
right tool to do the job, you should use it.

As of now, we have perl, python and Tcl/Tk.

Hth,
Dscho
-
Very well said. That is the current stance.
-
The dependency on Python 2.4 already is a problem for installation on
some systems ...

Ralf
-
Not many though. Since Python is only required on the workstation where
the developer does his/her work it's not a very cumbersome requirement.
The same holds for Perl, btw. It's not a requirement on the server
hosting the public repositories, unless some of the scripts are used
from the hooks (git shortlog is used from the default update-hook, but
that can be changed with no trouble at all).

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-
I understand that in the environments where the Python dependency is a
problem it is probably not due to the specific version. However, if
WITH_OWN_SUBPROCESS is defined in the Makefile then Python 2.3 should
work fine too (this is actually automatically detected now, so you
shouldn't have to do anything special to use Python 2.3).

- Fredrik
-
>>>>> "Franck" == Franck Bui-Huu <vagabon.xyz@gmail.com> writes:

Franck> another perl script :(
Franck> Are there any rules on the choice of the script language ?

I could argue that they should all be Perl. :)

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
-
Brave thing to do among such a bunch of hardcore C hackers. ;)

So long as we never involve ruby, java or DCL, I'm a happy fellow.

-- 
Andreas Ericsson
-
I agree too, but my point was more: why not use only Python scripts?
Why are some scripts written in Perl when Python could be used, and
vice versa?
Thanks
--
Franck
-
Perl is better suited for some tasks, Python for others. Mostly it's
because the contributor (one out of 137 to date) thought the language
appropriate for the tool he/she set out to write and felt comfortable
with it.

I personally abhor the syntax of Perl and the block indentation of
Python but I happily embrace both if the alternative is to rewrite all
the script tools in C.

That said, some tools have been rewritten in the past (mostly scripts
have been replaced by C code versions), but I don't think Junio will
accept replacement tools just because they're in one particular
language. If anything, it would be to replace the two python scripts
with Perl versions, since more tools are implemented in Perl than in
Python (so we could drop one dependency), Perl exists on more platforms
(so git becomes more portable), and Perl is used inline in four of the
shell-scripts (which means we can't get rid of the Perl dependency
without major hackery anyway).

-- 
Andreas Ericsson
-
Hmm.. I get

	[torvalds@g5 git]$ ./git-annotate Makefile
	fatal: 'e83c5163316f89bfbde7d9ab23ca2e25604af290^1..e83c5163316f89bfbde7d9ab23ca2e25604af290': No such file or directory
	Undefined subroutine &main::all_lines_claimed called at ./git-annotate line 124.

where that fatal error is because e83c51.. doesn't _have_ a parent, it's
the root (so doing ^1 on it doesn't work).

After fixing the "all_lines_claimed" problem as outlined by Dscho, I get
a lot of

	Skipping diff-parse - i = filelines)

and no actual output. Doing it on a file that didn't exist in the root
commit still gives those "Skipping" messages, but at least it did
actually output something.

However, what it output was clearly not correct, so there's still some
tweaking to do. For example, doing

	./git-annotate apply.c

annotates most of that file to Junio's commit 1c15afb9, which is totally
incorrect: that commit actually only changed a few lines.

So it looks like there's still some work to be done on this..

		Linus
-
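The fatal error in Linus's report comes from naming `$rev^1` for a commit that has no parents. One way around it, sketched with hypothetical helper names (`parse_parents_line`, `rev_list_args`) rather than the patch's own `git_rev_list`: read each commit's parents from `git rev-list --parents` output, where every line is a commit followed by its parents, and build the exclusion arguments only from parents that actually exist. Unlike `^1`, this excludes every parent of a merge, which is a deliberate design choice in the sketch.

```perl
use strict;
use warnings;

# Split one line of "git rev-list --parents" output, which names a
# commit followed by its parents, into (commit, parents...).
sub parse_parents_line {
	my ($line) = @_;
	chomp $line;
	return split ' ', $line;
}

# Arguments that make git rev-list show $rev but nothing reachable from
# its parents. A root commit has no parents, so the list is just ($rev)
# and no nonexistent "$rev^1" is ever named.
sub rev_list_args {
	my ($rev, @parents) = @_;
	return ($rev, map { "^$_" } @parents);
}
```

The resulting list could be handed to git rev-list in place of the `"$rev^1..$rev"` range string.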
I still have it, but I stopped circulating it because I found
that on some inputs it did not work correctly as intended. Not
that the algorithm was necessarily broken, but the implementation
certainly was.

Unlike yours, mine reads and interprets diff output to find which
lines are common and which lines are added, and I think the diff
interpretation logic has various corner cases wrong. I wrote the
combine-diff.c diff interpreter without looking at my
'git-blame', so I do not remember where I got it wrong,
though...
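As a rough illustration of what interpreting diff output to separate common lines from added ones involves, here is a sketch that reads one unified diff and reports the new-side line numbers of added lines. It is not how Junio's script works (that script reads `git-diff-index -z` raw output). `parse_hunk_header` and `added_lines` are hypothetical names. The header format `@@ -a,b +c,d @@`, where an omitted count means 1, is standard unified diff. The per-hunk counts are what tell the parser where a hunk ends, which is one of the corner cases that is easy to get wrong.

```perl
use strict;
use warnings;

# Parse "@@ -a,b +c,d @@"; an omitted count means 1.
sub parse_hunk_header {
	my ($line) = @_;
	return unless $line =~ /^\@\@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? \@\@/;
	return {
		old_start => $1,
		old_count => defined $2 ? $2 : 1,
		new_start => $3,
		new_count => defined $4 ? $4 : 1,
	};
}

# New-side line numbers that a single-file unified diff marks as added.
# Context (" ") lines are common to both sides; "-" lines exist only in
# the old file. The hunk's counts say when the hunk is finished.
sub added_lines {
	my ($diff) = @_;
	my @added;
	my ($new_lno, $old_left, $new_left) = (0, 0, 0);
	for my $line (split /\n/, $diff) {
		if ($old_left == 0 && $new_left == 0) {
			# Between hunks: only a hunk header matters here.
			if (my $h = parse_hunk_header($line)) {
				($new_lno, $old_left, $new_left) =
					@$h{qw(new_start old_count new_count)};
			}
			next;
		}
		my $tag = substr($line, 0, 1);
		if ($tag eq '+') {
			push @added, $new_lno++;
			$new_left--;
		} elsif ($tag eq '-') {
			$old_left--;
		} elsif ($tag eq ' ') {
			$new_lno++;
			$old_left--;
			$new_left--;
		}
	}
	return @added;
}
```

For example, a hunk `@@ -1,3 +1,4 @@` that replaces line `b` with `B` and `C` yields added lines 2 and 3.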
It's been a while since I looked at it the last time so it may
not even work with the current git, but here it is..
--
#!/usr/bin/perl -w
use strict;

package main;

$::debug = 0;

sub read_blob {
	my $sha1 = shift;
	my $fh = undef;
	my $result;
	local ($/) = undef;
	open $fh, '-|', 'git-cat-file', 'blob', $sha1
		or die "cannot read blob $sha1";
	$result = join('', <$fh>);
	close $fh
		or die "failure while closing pipe to git-cat-file";
	return $result;
}

sub read_diff_raw {
	my ($parent, $filename) = @_;
	my $fh = undef;
	local ($/) = "\0";
	my @result = ();
	my ($meta, $status, $sha1_1, $sha1_2, $file1, $file2);

	print STDERR "* diff-index --cached $parent $filename\n" if $::debug;
	my $has_changes = 0;
	open $fh, '-|', 'git-diff-index', '--cached', '-z', $parent, $filename
		or die "cannot read git-diff-index $parent $filename";
	while (defined ($meta = <$fh>)) {
		$has_changes = 1;
	}
	close $fh
		or die "failure while closing pipe to git-diff-index";
	if (!$has_changes) {
		return ();
	}

	$fh = undef;
	print STDERR "* diff-index -B -C --find-copies-harder --cached $parent\n" if $::debug;
	open($fh, '-|', 'git-diff-index', '-B', '-C', '--find-copies-harder',
	     '--cached', '-z', $parent)
		or die "cannot read git-diff-index with $parent";
	while (defined ($meta = <$fh>)) {
		chomp($meta);
		(undef, undef, $sha1_1, $sha1_2, $status) = split(/ ...
-
I tried that approach at first, and it was much, much more confusing to
keep track of. The problem Linus found (that of a missing
"all_lines_claimed()") was related to that code.

This implementation is simple, though it has to have some problems with
guessing at duplicated

I'll take a look through this in greater detail later; hopefully your
approach can be applied. Diff-analyzing is apparently tricky.

-- 
Ryan Anderson
sometimes Pug Majere
-
Reading a diff is tricky, but I was too lazy to match up the lines
by hand, which is also real work ;-).
There are a few things I should add to that ancient code:
- It wants old ls-tree behaviour. The command line used in the
"sub find_file" needs to be updated to something like this:
	open $fh, '-|', 'git-ls-tree', '-z', '-r', $commit->{TREE}, $path
		or die "cannot read git-ls-tree $commit->{TREE}";
- It only cares about the line numbers and its output is meant
to be postprocessed with the contents from the latest blob.
- It predates the recent rev-list that skips commits that do
not change the specified paths, and it literally follows each
parent, with a hand-written optimization to avoid diffing
against uninteresting parents.
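The updated `git-ls-tree -z -r` invocation prints one NUL-terminated record per entry, of the form `<mode> SP <type> SP <object> TAB <path>`. A sketch of picking the blob for one path out of that output; `find_blob_in_ls_tree` and `blob_for_path` are hypothetical names, and the dashed `git-ls-tree` command follows the thread's era (current git spells it `git ls-tree`).

```perl
use strict;
use warnings;

# Pick the blob for $path out of "git-ls-tree -z -r" output, where each
# record is "<mode> <type> <object>\t<path>" terminated by NUL.
sub find_blob_in_ls_tree {
	my ($output, $path) = @_;
	for my $rec (split /\0/, $output) {
		my ($info, $name) = split /\t/, $rec, 2;
		next unless defined $name && $name eq $path;
		my ($mode, $type, $sha1) = split / /, $info;
		return $sha1 if $type eq 'blob';
	}
	return;    # path not present in this tree
}

# Run the ls-tree command line from the list for one tree and path.
sub blob_for_path {
	my ($tree, $path) = @_;
	open my $fh, '-|', 'git-ls-tree', '-z', '-r', $tree, $path
		or die "cannot read git-ls-tree $tree";
	my $output = do { local $/; <$fh> };
	close $fh or die "git-ls-tree failed for $tree";
	return find_blob_in_ls_tree(defined $output ? $output : '', $path);
}
```

Because `-z` disables path quoting, the path can be compared byte for byte without unescaping.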
I suspect if you go with the diff-reading approach, it might be
easy to convert it to C (or even write the initial version in C)
using the machinery similar to what is in combine-diff.c.
The algorithm combine-diff.c uses keeps the lines discarded from
each parent in an lline structure linked to the sline structure
(which keeps track of the lines in the final version), but for
your annotate purposes what you care about is only what the
child adds to the parent (IOW, we do not care about the lines
that do not appear in the final version), so the logic and the
data structure could be greatly simplified. You only need to
keep the "flag" element in the sline structure, and maybe bol and
len, which point at the contents of the resulting line from the
final version. In addition, you would need to store "the
current suspect commit" (which starts as the final revision and is
updated as you pass the blame along) and another bool that says
whether "the current suspect" is known to be the guilty party or
the true culprit is one of its ancestors (the capital vs. lowercase
difference in that explanatory note).
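A Perl sketch of that simplified per-line state, with hypothetical names throughout: each line of the final version carries only its current suspect, its line number in the suspect's version, and a guilty flag. Blame is passed from a child to its parent using the hunks of a `diff -U0` (no context lines) of parent against child, given as `[old_start, old_count, new_start, new_count]`. It assumes a single parent; merges and the file's first commit would need the extra handling described above.

```perl
use strict;
use warnings;

# Hypothetical per-line state for the final version of the file:
#   suspect => commit currently blamed for the line
#   lno     => the line's number (1-based) in the suspect's version
#   guilty  => 1 once the suspect is known to have added the line
sub init_lines {
	my ($final, $nlines) = @_;
	return [ map { +{ suspect => $final, lno => $_, guilty => 0 } } 1 .. $nlines ];
}

# Map child line $n to the parent's numbering, given sorted -U0 hunks
# (old = parent, new = child). Returns undef if the child added line $n.
sub map_to_parent {
	my ($n, @hunks) = @_;
	my $delta = 0;    # parent line number minus child line number so far
	for my $h (@hunks) {
		my (undef, $ocount, $nstart, $ncount) = @$h;
		# With a zero count, $nstart names the line before the change.
		my $first = $ncount ? $nstart : $nstart + 1;
		return $n + $delta if $n < $first;    # before this hunk
		return if $n < $first + $ncount;      # added by the child
		$delta += $ocount - $ncount;
	}
	return $n + $delta;                       # after the last hunk
}

# Pass blame from $child to its single parent: lines the parent also
# has move to it, the rest are marked as the child's own work.
sub pass_blame {
	my ($lines, $child, $parent, @hunks) = @_;
	for my $l (@$lines) {
		next if $l->{guilty} || $l->{suspect} ne $child;
		my $plno = map_to_parent($l->{lno}, @hunks);
		if (defined $plno) {
			@$l{qw(suspect lno)} = ($parent, $plno);
		} else {
			$l->{guilty} = 1;
		}
	}
}
```

Only the final version's lines are ever stored, which matches the point that lines absent from the final version do not matter for annotation.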
-
Reading a diff is tricky, yes, but if you're willing to just throw RAM
at the problem, it might not be quite as bad as I was trying at first.
My current thought on how to get it more correct is this:
	foreach my $rev (@revqueue) {
		foreach my $parent (@{$revs{$rev}{parents}}) {
			# Shallow copy: the line records themselves are shared.
			my @templines = @{$revs{$rev}{lines}};
			$revs{$parent}{lines} = apply_diff(\@templines);
		}
	}
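The loop leaves open how `apply_diff()` finds the diff to apply. In the following sketch (hypothetical, not Ryan's code), the caller passes `$rev` and the already-parsed hunks of a `diff -U0` of parent against child explicitly. Each hunk is assumed to be `{ new_start, new_count, old_lines => [...] }`, with `old_lines` holding the parent-only text. Unchanged records are shared by reference. Lines the child added are claimed by `$rev` and dropped. Parent-only lines become fresh records with `in_original => 0`.

```perl
use strict;
use warnings;

# Hypothetical apply_diff: turn $rev's line list into its parent's.
# $child is an array ref of records { text, in_original, claimed_by };
# hunks come from "diff -U0" of parent against child, sorted.
sub apply_diff {
	my ($rev, $child, @hunks) = @_;
	my @parent;
	my $next = 1;    # first child line (1-based) not yet handled
	for my $h (@hunks) {
		# With a zero count, new_start names the line before the change.
		my $first = $h->{new_count} ? $h->{new_start} : $h->{new_start} + 1;
		# Unchanged lines are shared by reference, not copied.
		push @parent, @{$child}[$next - 1 .. $first - 2];
		# Lines the child added: claim them for $rev and leave them out.
		for my $l (@{$child}[$first - 1 .. $first + $h->{new_count} - 2]) {
			$l->{claimed_by} = $rev unless defined $l->{claimed_by};
		}
		# Lines only the parent has become new, non-original records.
		push @parent, map { +{ text => $_, in_original => 0, claimed_by => undef } }
			@{ $h->{old_lines} };
		$next = $first + $h->{new_count};
	}
	push @parent, @{$child}[$next - 1 .. $#$child];
	return \@parent;
}
```

Since the walk starts at the newest revision, the first claim on a record wins, so older commits never overwrite a more recent claim.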
The @lines arrays that get built will be made up entirely of hash or
array references, so the same line records just get reused for each
successive version of the file.
When apply_diff() deletes a line from the new copy, it should mark that
line as "claimed" by the current rev.
I'm thinking that each element of @lines will look like this:
{
text => $text,
in_original => [0 | 1],
claimed_by => $rev,
}
at least to start.
This method can sanity-check itself by calling git cat-file, actually
reading in each version of the file, and comparing it against the
generated copy, aborting if the two get out of sync.
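The sanity check could look roughly like this. `check_in_sync` and `read_blob_text` are hypothetical names, and the records are assumed to carry a `text` field as in the structure proposed here. The blob is read with `git cat-file blob <rev>:<path>`, a blob-naming syntax that current git supports. The check dies at the first difference, so the reconstruction cannot silently drift.

```perl
use strict;
use warnings;

# Compare reconstructed line records against the real file text and
# die at the first difference.
sub check_in_sync {
	my ($rev, $lines, $blob_text) = @_;
	my @real = split /\n/, $blob_text, -1;
	pop @real if @real && $real[-1] eq '';    # trailing newline
	die "$rev: expected " . scalar(@real) . " lines, built " . scalar(@$lines) . "\n"
		if @real != @$lines;
	for my $i (0 .. $#real) {
		die "$rev: line " . ($i + 1) . " differs\n"
			if $real[$i] ne $lines->[$i]{text};
	}
	return 1;
}

# Read the file as stored in a revision, e.g. "HEAD:git-annotate.perl".
sub read_blob_text {
	my ($spec) = @_;
	open my $fh, '-|', 'git', 'cat-file', 'blob', $spec
		or die "cannot run git cat-file: $!";
	my $text = do { local $/; <$fh> };
	close $fh or die "git cat-file failed for $spec\n";
	return defined $text ? $text : '';
}
```

A debug mode could call `check_in_sync($parent, $revs{$parent}{lines}, read_blob_text("$parent:$filename"))` after every `apply_diff()`, at the cost of one extra git process per revision.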
I'll see about implementing something along these lines this weekend,
time permitting.
--
Ryan Anderson
sometimes Pug Majere
-
