Linus Torvalds wrote:

Well, my figures agree with Pierre I think - 6-10% time savings for 'git annotate'.

I think Pierre has hit the nail on the head - that skipping compression for small objects is a clear win. He saw the obvious criterion, really. I've knocked it up as a config option that doesn't change the default behaviour below.

I can't help but speculate what benefits having a range of one or two of the most elite compression algorithms (e.g. lzop, or even lzma for the larger blobs) available would be, in general. E.g., if gzip takes a stream longer than X kB to offer substantial benefits over lzop, lzop the ones shorter than that.

If the uncompressed objects are clustered in the pack, then they might stream compress a lot better, should they be transmitted over an HTTP transport with gzip encoding.

In packs which should be as small as possible, with a format change they could be distributed as one compressed resource. The ordering of the objects would ideally be selected such that it results in optimum compression - which could add a saving akin to bzip2 vs gzip, at the expense of having to scan the small objects for mini-deltas and arrange them so that objects which share these mini-deltas are clustered.

Well, interesting ideas anyway :)

Subject: [PATCH] pack-objects: add compressionMinSize option

Objects smaller than a page don't save much space when compressed, and
cause some overhead. Allow the user to specify a minimum size for
objects before they are compressed.

Credit: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Sam Vilain <sam.vilain@catalyst.net.nz>
---
 Documentation/config.txt |    5 +++++
 builtin-pack-objects.c   |    7 ++++++-
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1b6d6d6..245121e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -734,6 +734,11 @@ pack.compression::
 	compromise between speed and compression (currently equivalent
 	to level 6)."
 
+pack.compressionMinSize::
+	Objects smaller than this are not compressed. This can make
+	operations that deal with many small objects (such as log)
+	faster.
+
 pack.deltaCacheSize::
 	The maximum memory in bytes used for caching deltas in
 	linkgit:git-pack-objects[1].
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index a39cb82..316b809 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -76,6 +76,7 @@ static int num_preferred_base;
 static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 static int pack_compression_seen;
+static int compression_min_size = 0;
 
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 0;
@@ -433,7 +434,7 @@ static unsigned long write_object(struct sha1file *f,
 		}
 		/* compress the data to store and put compressed length in datalen */
 		memset(&stream, 0, sizeof(stream));
-		deflateInit(&stream, pack_compression_level);
+		deflateInit(&stream, size >= compression_min_size ? pack_compression_level : 0);
 		maxsize = deflateBound(&stream, size);
 		out = xmalloc(maxsize);
 		/* Compress it */
@@ -1841,6 +1842,10 @@ static int git_pack_config(const char *k, const char *v)
 		pack_compression_seen = 1;
 		return 0;
 	}
+	if (!strcmp(k, "pack.compressionminsize")) {
+		compression_min_size = git_config_int(k, v);
+		return 0;
+	}
 	if (!strcmp(k, "pack.deltacachesize")) {
 		max_delta_cache_size = git_config_int(k, v);
 		return 0;
-- 
1.5.3.7.2095.gb2448-dirty
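
For anyone who wants to poke at the threshold logic outside of git, here is a minimal standalone sketch of the level selection the patch passes to deflateInit(). The helper name choose_level() and the local constants are made up for illustration and are not part of the patch; the constants stand in for zlib's Z_NO_COMPRESSION (0) and Z_DEFAULT_COMPRESSION (-1) so that the snippet builds without zlib. The threshold is taken as an unsigned long here, which is an assumption on my part: the patch stores it in an int, so a negative config value would be converted to a very large unsigned number in the comparison with size.

```c
/* Local stand-ins for zlib's Z_NO_COMPRESSION and Z_DEFAULT_COMPRESSION,
 * so this sketch compiles without zlib headers. */
enum { NO_COMPRESSION = 0, DEFAULT_COMPRESSION = -1 };

/*
 * Hypothetical helper (not in the patch): returns the zlib level to use
 * for an object of `size` bytes.  Objects below `min_size` get level 0,
 * i.e. they are stored in the deflate stream without being compressed;
 * everything else keeps the configured level.
 */
static int choose_level(unsigned long size, unsigned long min_size, int level)
{
	return size >= min_size ? level : NO_COMPRESSION;
}
```

With an assumed threshold of 4096 bytes (one page on many systems), choose_level(120, 4096, DEFAULT_COMPRESSION) gives 0, while choose_level(4096, 4096, DEFAULT_COMPRESSION) gives the configured level back. With min_size at 0, the patch's default, every object keeps the configured level, so behaviour is unchanged.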
