On Thu, 10 Jan 2008, Pierre Habouzit wrote:

This is really the big point here. Git uses _lots_ of *small* objects, usually much smaller than 12KB. For example, my copy of the gcc repository has an average of 270 _bytes_ per compressed object, and objects must be individually compressed. Performance with really small objects should be the basis for any Git compression algorithm comparison.

The delta heads, though, are far from being the most frequently accessed objects. First, they're clearly in the minority, and they are often cached in the delta base cache.

Remember that delta objects represent the vast majority of all objects. For example, my kernel repo currently has 555015 delta objects out of 677073 objects, or 82% of the total. There are actually only 25869 non-deltified blob objects, which are likely to be the larger objects, but they represent only 4% of the total.

But let's just try not compressing delta objects, to check your assertion, with the following hack:

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index a39cb82..252b03e 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -433,7 +433,10 @@ static unsigned long write_object(struct sha1file *f,
 	}
 	/* compress the data to store and put compressed length in datalen */
 	memset(&stream, 0, sizeof(stream));
-	deflateInit(&stream, pack_compression_level);
+	if (obj_type == OBJ_REF_DELTA || obj_type == OBJ_OFS_DELTA)
+		deflateInit(&stream, 0);
+	else
+		deflateInit(&stream, pack_compression_level);
 	maxsize = deflateBound(&stream, size);
 	out = xmalloc(maxsize);
 	/* Compress it */

You then only need to run 'git repack -a -f -d' with and without the above patch.

Here are my rather surprising results:

My kernel repo pack size without the patch: 184275401 bytes
Same repo with the above patch applied:     205204930 bytes

So it is only 11% larger. I was expecting much more. I'll let someone else do profiling/timing comparisons.

Right.
Abstracting the zlib code and having different compression algorithms tested in the Git context is the only way to do meaningful comparisons.

Nicolas
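
For anyone who wants to recheck the percentages quoted in the message, the arithmetic can be sketched as below. This is only an illustrative helper, not part of the patch or of Git; the function names percent_of and percent_growth are made up for this example.

```c
/*
 * Hypothetical helpers (not from Git) for rechecking the ratios
 * quoted in the message above.
 */

/* Share of `part` in `total`, expressed as a percentage. */
static double percent_of(unsigned long long part, unsigned long long total)
{
	return 100.0 * (double)part / (double)total;
}

/* Growth from `before` to `after`, as a percentage of `before`. */
static double percent_growth(unsigned long long before,
			     unsigned long long after)
{
	return 100.0 * ((double)after - (double)before) / (double)before;
}
```

With the figures from the message, percent_of(555015, 677073) comes to roughly 82, percent_of(25869, 677073) to roughly 4, and percent_growth(184275401, 205204930) to roughly 11, which matches the rounded numbers given above.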
