On Thu, 6 Dec 2007, Jon Loeliger wrote:Well, in a very real sense, what the delta code does is: - just list every single object in the whole repository - walk over each object, trying to find another object that it can be written as a delta against - write out the result as a pack-file That's simplified: we may not walk _all_ objects, for example: only a global repack does that (and most pack creations are actually for pushign and pulling between two repositories, so we only walk the objects that are in the source but not the destination repository). The interesting phase is the "walk each object, try to find a delta" part. In particular, you don't want to try to find a delta by comparing each object to every other object out there (that would be O(n^2) in objects, and with a fairly high constant cost too!). So what it does is to sort the objects by a few heuristics (type of object, base name that object was found as when traversing a tree and size, and how recently it was found in the history). And then over that sorted list, it tries to find deltas between entries that are "close" to each other (and that's where the "--window=xyz" thing comes in - it says how big the window is for objects being close. A smaller window generates somewhat less good deltas, but takes a lot less effort to generate). The source is in git/builtin-pack-objects.c, with the core of it being - try_delta() - try to generate a *single* delta when given an object pair. - find_deltas() - do the actual list traversal - prepare_pack() and type_size_sort() - create the delta sort list from the list of objects. but that whole file is probably some of the more opaque parts of git. It's certainly not a simple chain, it's more of a set of acyclic directed graphs in the object list. And yes, it's weigted by the size of the delta between objects, and the optimization problem is kind of akin to finding the smallest spanning tree (well, forest - since you do *not* want to create one large graph, you also want to make the individual trees shallow enough that you don't have excessive delta depth). There are good algorithms for finding minimum spanning trees, but this one is complicated by the fact that the biggest cost (by far!) is the calculation of the weights itself. So rather than really worry about finding the minimal tree/forest, the code needs to worry about not having to even calculate all the weights! (That, btw, is a common theme. A lot of git is about traversing graphs, like the revision graph. And most of the trivial graph problems all assume that you have the whole graph, but since the "whole graph" is the whole history of the repository, those algorithms are totally worthless, since they are fundamentally much too expensive - if we have to generate the whole history, we're already screwed for a big project. So things like revision graph calculation, the main performance issue is to avoid having to even *look* at parts of the graph that we don't need to see!) Linus - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
| Glauber de Oliveira Costa | [PATCH 5/25] [PATCH] native versions for system.h functions |
| Paul Menage | Re: [RFC][PATCH 6/7] Account for the number of tasks within container |
| Tejun Heo | [PATCHSET] CUSE: implement CUSE |
| Al Boldi | Re: CFS review |
git: | |
| Ken Pratt | pack operation is thrashing my server |
| Linus Torvalds | Re: git and time |
| Michael Witten | Re: Proposed git mv behavioral change |
| Johannes Schindelin | Re: I'm a total push-over.. |
| GVG GVG | ssh_exchange_identification: Connection closed by remote host |
| Bertram Scharpf | First install: Grub doesn't find partitions |
| Chris Bullock | OpenBSD isakmpd and pf vs Cisco PIX or ASA |
| Axton | Re: rouge IPs / user |
| hooanon05 | [PATCH 62/67] aufs magic sysrq handler |
| David Howells | [PATCH 06/17] BLOCK: Move bdev_cache_init() declaration to headerfile [try #2] |
| Miklos Szeredi | [PATCH] update ctime and mtime for mmaped write |
| Linus Torvalds | Re: silent semantic changes with reiser4 |
