On Thu, 10 Jan 2008, Pierre Habouzit wrote:

This is really the big point here. Git uses _lots_ of *small* objects, usually much smaller than 12KB. For example, my copy of the gcc repository has an average of 270 _bytes_ per compressed object, and objects must be individually compressed. Performance with really small objects should be the basis for any Git compression algorithm comparison.

The delta heads, though, are far from being the most frequently accessed objects. First, they're clearly in the minority, and they are often cached in the delta base cache. Remember that delta objects represent the vast majority of all objects. For example, my kernel repo currently has 555015 delta objects out of 677073 objects, or 82% of the total. There are actually only 25869 non-deltified blob objects, which are likely to be the larger objects, but they represent only 4% of the total.

But let's just try not compressing delta objects, to check your assertion, with the following hack:

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index a39cb82..252b03e 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -433,7 +433,10 @@ static unsigned long write_object(struct sha1file *f,
 		}
 		/* compress the data to store and put compressed length in datalen */
 		memset(&stream, 0, sizeof(stream));
-		deflateInit(&stream, pack_compression_level);
+		if (obj_type == OBJ_REF_DELTA || obj_type == OBJ_OFS_DELTA)
+			deflateInit(&stream, 0);
+		else
+			deflateInit(&stream, pack_compression_level);
 		maxsize = deflateBound(&stream, size);
 		out = xmalloc(maxsize);
 		/* Compress it */

You then only need to run 'git repack -a -f -d' with and without the above patch. Here are my rather surprising results:

My kernel repo pack size without the patch: 184275401 bytes
Same repo with the above patch applied: 205204930 bytes

So it is only 11% larger. I was expecting much more. I'll let someone else do profiling/timing comparisons.

Right.
Abstracting the zlib code and having different compression algorithms tested in the Git context is the only way to do meaningful comparisons.

Nicolas
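
The percentages quoted in the message can be rechecked from the raw counts it gives. What follows is only a minimal sketch in C: the helper names percent_of and percent_growth are made up for this illustration and are not part of Git or zlib.

```c
/* Hypothetical helpers for rechecking the figures quoted in the message.
 * Neither function exists in Git; the names are illustrative only. */

/* Share of `part` within `whole`, in percent. */
double percent_of(double part, double whole)
{
	return 100.0 * part / whole;
}

/* Growth from `before` to `after`, in percent of `before`. */
double percent_growth(double before, double after)
{
	return 100.0 * (after - before) / before;
}
```

With the numbers from the message, percent_of(555015, 677073) comes out just under 82, percent_of(25869, 677073) a little below 4, and percent_growth(184275401, 205204930) at roughly 11.4, which agrees with the rounded figures quoted above.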
