Re: Git's database structure

To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Jon Smirl <jonsmirl@...>, Julian Phillips <julian@...>, Andreas Ericsson <ae@...>, Theodore Tso <tytso@...>, Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Date: Thursday, September 6, 2007 - 2:14 pm

Johannes Schindelin wrote:

And in fact, you can do this today, without modifying git-blame at all, 
by (ab)using its "-S" option (which lets you specify a custom ancestry 
chain to search). By coincidence, I was just showing some people at my 
office how to do this yesterday. I'll cut-and-paste from the email I 
sent them. I am not claiming this is nearly as desirable as a built-in, 
auto-updated secondary index, but it proves the concept, anyway.

Fast-to-generate version:

git rev-list HEAD -- main.c | awk '{if (last) print last " " $0; last=$0;}' > /tmp/revlist

This speeds things up a lot, because git blame only has to examine the 
revisions listed in the file instead of walking the full history:

time git blame main.c
   1.56s user 0.30s system 99% cpu 1.868 total
time git blame -S /tmp/revlist main.c
   0.21s user 0.03s system 96% cpu 0.249 total
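
To make the contents of /tmp/revlist concrete, here is a small sketch of what the awk step does on its own. The short IDs (c3, b2, a1) and the function name pair_revs are made up for illustration; real input would be the 40-character hashes printed by git rev-list, newest first. Each output line pairs a revision with the next older one in the list, which, as far as I understand the -S file format, is read as "commit, then its parent".

```shell
# pair_revs: hypothetical helper wrapping the awk one-liner from above.
# Reads one revision per line (newest first) and prints "<rev> <next older rev>".
pair_revs() {
    awk '{ if (last) print last " " $0; last = $0 }'
}

# Made-up stand-in for `git rev-list HEAD -- main.c` output.
printf '%s\n' c3 b2 a1 | pair_revs
# prints:
#   c3 b2
#   b2 a1
```

Note that the oldest revision (a1 here) never starts a line of its own, so it ends up with no listed parent.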

The bad news is that generating that revision list is a bit slow, and if 
you do it the naive way I suggested above, you can't use the rev list 
with the -M option (to follow renames). The good news is that it's 
possible to have that too if you generate a list of revisions that 
includes the renames:

# Generate a list of all revisions in the right order (only need to do
# this once, not once per file).
git rev-list HEAD > /tmp/all-revs
# Generate a list of the revisions that touched this file, following
# copies/renames.
# Could do this in fewer commands but this is hopefully easier to follow.
git blame --porcelain -M main.c | \
   grep -E '^[0-9a-f]{40}' | \
   cut -d' ' -f1 | \
   grep -F -f - /tmp/all-revs | \
   awk '{if (last) print last " " $0; last=$0;}' > /tmp/revlist

Then -M is fast too:

time git blame -M main.c
   1.72s user 0.27s system 89% cpu 2.219 total
time git blame -M -S /tmp/revlist main.c
   0.29s user 0.03s system 93% cpu 0.341 total
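
If you want to repeat this for several files, the rename-aware steps could be wrapped in a small shell function. This is only a sketch under a few assumptions: the name make_revlist and its two arguments (the file to blame and the output path) are invented here, it uses a temporary file rather than /tmp/all-revs, and it regenerates the full revision list on every call, which gives up the "only once" saving in exchange for being self-contained.

```shell
# make_revlist FILE OUT: hypothetical helper, not part of git.
# Writes "<commit> <next older commit>" pairs for FILE into OUT,
# following renames via `git blame -M`.
make_revlist() {
    file=$1
    out=$2
    all_revs=$(mktemp) || return 1

    # Every revision reachable from HEAD, newest first.
    git rev-list HEAD > "$all_revs" &&
    # Commits that blame attributes lines to, kept in rev-list order.
    git blame --porcelain -M -- "$file" |
        grep -E '^[0-9a-f]{40} ' |
        cut -d' ' -f1 |
        sort -u |
        grep -F -f - "$all_revs" |
        awk '{ if (last) print last " " $0; last = $0 }' > "$out"
    status=$?

    rm -f "$all_revs"
    return $status
}
```

Usage would then look like `make_revlist main.c /tmp/revlist` followed by `git blame -M -S /tmp/revlist main.c`.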

Oddly, if you use the -S option, "git blame -C" actually gets 
significantly *slower*. I am not sure why.

-Steve
Re: Git's database structure, Steven Grimm, (Thu Sep 6, 2:14 pm)