Re: Git and GCC

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Linus Torvalds
Date: Thursday, December 6, 2007 - 12:39 pm

On Thu, 6 Dec 2007, Jon Loeliger wrote:

Well, in a very real sense, what the delta code does is:
 - just list every single object in the whole repository
 - walk over each object, trying to find another object that it can be 
   written as a delta against
 - write out the result as a pack-file

That's simplified: we may not walk _all_ objects, for example: only a 
global repack does that (and most pack creations are actually for pushign 
and pulling between two repositories, so we only walk the objects that are 
in the source but not the destination repository).

The interesting phase is the "walk each object, try to find a delta" part. 
In particular, you don't want to try to find a delta by comparing each 
object to every other object out there (that would be O(n^2) in objects, 
and with a fairly high constant cost too!). So what it does is to sort the 
objects by a few heuristics (type of object, base name that object was 
found as when traversing a tree and size, and how recently it was found in 
the history).

And then over that sorted list, it tries to find deltas between entries 
that are "close" to each other (and that's where the "--window=xyz" thing 
comes in - it says how big the window is for objects being close. A 
smaller window generates somewhat less good deltas, but takes a lot less 
effort to generate).

The source is in git/builtin-pack-objects.c, with the core of it being

 - try_delta() - try to generate a *single* delta when given an object 
   pair.

 - find_deltas() - do the actual list traversal

 - prepare_pack() and type_size_sort() - create the delta sort list from 
   the list of objects.

but that whole file is probably some of the more opaque parts of git.


It's certainly not a simple chain, it's more of a set of acyclic directed 
graphs in the object list. And yes, it's weigted by the size of the delta 
between objects, and the optimization problem is kind of akin to finding 
the smallest spanning tree (well, forest - since you do *not* want to 
create one large graph, you also want to make the individual trees shallow 
enough that you don't have excessive delta depth).

There are good algorithms for finding minimum spanning trees, but this one 
is complicated by the fact that the biggest cost (by far!) is the 
calculation of the weights itself. So rather than really worry about 
finding the minimal tree/forest, the code needs to worry about not having 
to even calculate all the weights!

(That, btw, is a common theme. A lot of git is about traversing graphs, 
like the revision graph. And most of the trivial graph problems all assume 
that you have the whole graph, but since the "whole graph" is the whole 
history of the repository, those algorithms are totally worthless, since 
they are fundamentally much too expensive - if we have to generate the 
whole history, we're already screwed for a big project. So things like 
revision graph calculation, the main performance issue is to avoid having 
to even *look* at parts of the graph that we don't need to see!)

			Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Re: Git and GCC, David Miller, (Wed Dec 5, 7:52 pm)
Re: Git and GCC, Daniel Berlin, (Wed Dec 5, 8:47 pm)
Re: Git and GCC, David Miller, (Wed Dec 5, 9:20 pm)
Re: Git and GCC, Harvey Harrison, (Wed Dec 5, 9:25 pm)
Re: Git and GCC, Harvey Harrison, (Wed Dec 5, 9:28 pm)
Re: Git and GCC, Daniel Berlin, (Wed Dec 5, 9:32 pm)
Re: Git and GCC, David Miller, (Wed Dec 5, 9:48 pm)
Re: Git and GCC, Linus Torvalds, (Wed Dec 5, 9:54 pm)
Re: Git and GCC, Harvey Harrison, (Wed Dec 5, 10:04 pm)
Re: Git and GCC, Daniel Berlin, (Wed Dec 5, 10:11 pm)
Re: Git and GCC, Harvey Harrison, (Wed Dec 5, 10:15 pm)
Re: Git and GCC, Daniel Berlin, (Wed Dec 5, 10:17 pm)
Re: Git and GCC, Linus Torvalds, (Wed Dec 5, 11:09 pm)
Re: Git and GCC, Jon Smirl, (Wed Dec 5, 11:47 pm)
Re: Git and GCC, Jeff King, (Thu Dec 6, 12:15 am)
Re: Git and GCC, Harvey Harrison, (Thu Dec 6, 12:49 am)
Re: Git and GCC, David Brown, (Thu Dec 6, 1:11 am)
Re: Git and GCC, Johannes Schindelin, (Thu Dec 6, 4:57 am)
[PATCH] gc --aggressive: make it really aggressive, Johannes Schindelin, (Thu Dec 6, 5:03 am)
Re: Git and GCC, Ismail , (Thu Dec 6, 5:04 am)
Re: [PATCH] gc --aggressive: make it really aggressive, Theodore Tso, (Thu Dec 6, 6:42 am)
Re: Git and GCC, Nicolas Pitre, (Thu Dec 6, 7:01 am)
Re: [PATCH] gc --aggressive: make it really aggressive, Nicolas Pitre, (Thu Dec 6, 7:15 am)
Re: Git and GCC, Nicolas Pitre, (Thu Dec 6, 7:18 am)
Re: [PATCH] gc --aggressive: make it really aggressive, Pierre Habouzit, (Thu Dec 6, 7:22 am)
Re: [PATCH] gc --aggressive: make it really aggressive, Harvey Harrison, (Thu Dec 6, 8:30 am)
Re: [PATCH] gc --aggressive: make it really aggressive, Johannes Schindelin, (Thu Dec 6, 8:55 am)
Re: [PATCH] gc --aggressive: make it really aggressive, Johannes Schindelin, (Thu Dec 6, 8:56 am)
Re: [PATCH] gc --aggressive: make it really aggressive, Linus Torvalds, (Thu Dec 6, 9:19 am)
Re: [PATCH] gc --aggressive: make it really aggressive, David Kastrup, (Thu Dec 6, 10:05 am)
Re: Git and GCC, Jeff King, (Thu Dec 6, 10:39 am)
Re: Git and GCC, Nicolas Pitre, (Thu Dec 6, 11:02 am)
Re: Git and GCC, Daniel Berlin, (Thu Dec 6, 11:04 am)
Re: Git and GCC, NightStrike, (Thu Dec 6, 11:24 am)
Re: Git and GCC, Linus Torvalds, (Thu Dec 6, 11:29 am)
Re: Git and GCC, Linus Torvalds, (Thu Dec 6, 11:35 am)
Re: Git and GCC, Linus Torvalds, (Thu Dec 6, 11:45 am)
Re: Git and GCC, Jon Smirl, (Thu Dec 6, 11:55 am)
Re: Git and GCC, Nicolas Pitre, (Thu Dec 6, 12:08 pm)
Re: Git and GCC, Jon Loeliger, (Thu Dec 6, 12:12 pm)
Re: Git and GCC, Linus Torvalds, (Thu Dec 6, 12:39 pm)
Re: Git and GCC, Junio C Hamano, (Thu Dec 6, 1:04 pm)
Re: Git and GCC, Junio C Hamano, (Thu Dec 6, 2:02 pm)
Re: Git and GCC, Jon Smirl, (Thu Dec 6, 2:39 pm)
Re: Git and GCC, Nicolas Pitre, (Thu Dec 6, 3:08 pm)
Re: Git and GCC, Jon Smirl, (Thu Dec 6, 3:11 pm)
Re: Git and GCC, Jon Smirl, (Thu Dec 6, 3:22 pm)
Re: Git and GCC, David Kastrup, (Thu Dec 6, 3:26 pm)
Re: Git and GCC, Nicolas Pitre, (Thu Dec 6, 3:30 pm)
[OT] Re: Git and GCC, Randy Dunlap, (Thu Dec 6, 3:38 pm)
Re: Git and GCC, Jon Smirl, (Thu Dec 6, 3:44 pm)
Re: Git and GCC, Jakub Narebski, (Thu Dec 6, 5:29 pm)
Re: Git and GCC, Harvey Harrison, (Thu Dec 6, 7:42 pm)
Re: Git and GCC, Linus Torvalds, (Thu Dec 6, 8:01 pm)
Re: Git and GCC, David Miller, (Thu Dec 6, 8:31 pm)
Re: Git and GCC, Jon Smirl, (Thu Dec 6, 9:06 pm)
Re: Git and GCC, Nicolas Pitre, (Thu Dec 6, 9:21 pm)
Re: Git and GCC, Linus Torvalds, (Thu Dec 6, 10:21 pm)
Re: Git and GCC, NightStrike, (Thu Dec 6, 10:36 pm)
Re: Git and GCC, Jeff King, (Thu Dec 6, 11:38 pm)
Re: Git and GCC, Jeff King, (Thu Dec 6, 11:50 pm)
Re: Git and GCC, Jon Smirl, (Fri Dec 7, 12:08 am)
Re: Git and GCC, Jon Smirl, (Fri Dec 7, 12:10 am)
Re: Git and GCC, Jeff King, (Fri Dec 7, 12:27 am)
Re: Git and GCC, Jeff King, (Fri Dec 7, 12:31 am)
Re: Git and GCC, David Miller, (Fri Dec 7, 5:53 am)
Re: Git and GCC, Linus Torvalds, (Fri Dec 7, 10:23 am)
Re: Git and GCC, Nicolas Pitre, (Fri Dec 7, 12:36 pm)
Re: Git and GCC, Giovanni Bajo, (Fri Dec 7, 1:26 pm)
Re: Git and GCC, Jakub Narebski, (Fri Dec 7, 3:14 pm)
Re: Git and GCC, Luke Lu, (Fri Dec 7, 4:04 pm)
Re: Git and GCC, Giovanni Bajo, (Fri Dec 7, 4:14 pm)
Re: Git and GCC, Daniel Berlin, (Fri Dec 7, 4:33 pm)
Re: Git and GCC, Harvey Harrison, (Fri Dec 7, 5:47 pm)
Re: Git and GCC, David Miller, (Fri Dec 7, 6:55 pm)
Re: Git and GCC, Johannes Schindelin, (Sat Dec 8, 5:00 am)
Re: Git and GCC, Gabriel Paubert, (Mon Dec 10, 2:54 am)
Re: Git and GCC, David Miller, (Mon Dec 10, 2:57 am)
Re: Git and GCC, Nicolas Pitre, (Mon Dec 10, 8:35 am)
Re: [PATCH] gc --aggressive: make it really aggressive, Johannes Schindelin, (Wed Mar 18, 9:01 am)
Re: [PATCH] gc --aggressive: make it really aggressive, Teemu Likonen, (Wed Mar 18, 9:27 am)
Re: [PATCH] gc --aggressive: make it really aggressive, Nicolas Pitre, (Wed Mar 18, 11:02 am)