Re: Decompression speed: zip vs lzo

To: Linus Torvalds <torvalds@...>
Cc: Nicolas Pitre <nico@...>, Pierre Habouzit <madcoder@...>, Git Mailing List <git@...>, Johannes Schindelin <Johannes.Schindelin@...>, Marco Costalba <mcostalba@...>, Junio C Hamano <gitster@...>
Date: Friday, January 11, 2008 - 9:52 pm

Linus Torvalds wrote:

Well, my figures agree with Pierre's, I think: 6-10% time savings for
'git annotate'.

I think Pierre has hit the nail on the head: skipping compression for
small objects is a clear win.  He saw the obvious criterion, really.
The patch below adds it as a config option that doesn't change the
default behaviour.
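
To spell out the criterion, here is a minimal sketch of the level
selection on its own.  The helper name effective_level() is
hypothetical and does not appear in the patch, which inlines the same
comparison into the deflateInit() call.  Taking the threshold as an
unsigned long is also my assumption; the patch stores it in an int.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): pick the zlib level for
 * an object of `size` bytes.  Level 0 asks deflate to emit stored
 * (uncompressed) blocks, so objects below `min_size` skip compression.
 */
static int effective_level(unsigned long size, unsigned long min_size,
			   int level)
{
	return size >= min_size ? level : 0;
}
```

With min_size left at 0 every object satisfies size >= min_size, so the
configured level is always used and the default behaviour is unchanged.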

I can't help but speculate about what the benefits would be, in
general, of having one or two of the best compression algorithms
available (eg, lzop, or even lzma for the larger blobs).  For example,
if gzip only offers substantial benefits over lzop for streams longer
than X kB, then lzop the ones shorter than that.

If the uncompressed objects are clustered in the pack, then they might
stream compress a lot better, should they be transmitted over an HTTP
transport with gzip encoding.  In packs which should be as small as
possible, with a format change they could be distributed as one
compressed resource.  The ordering of the objects would ideally be
selected such that it results in optimum compression, which could add
savings akin to bzip2 vs gzip, at the expense of having to scan the
small objects for mini-deltas and arrange them so that objects which
share these mini-deltas are clustered together.

Well, interesting ideas anyway :)

Subject: [PATCH] pack-objects: add compressionMinSize option

Objects smaller than a page don't save much space when compressed, and
cause some overhead.  Allow the user to specify a minimum size for
objects before they are compressed.

Credit: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Sam Vilain <sam.vilain@catalyst.net.nz>
---
 Documentation/config.txt |    5 +++++
 builtin-pack-objects.c   |    7 ++++++-
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1b6d6d6..245121e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -734,6 +734,11 @@ pack.compression::
 	compromise between speed and compression (currently equivalent
 	to level 6)."
 
+pack.compressionMinSize::
+	Objects smaller than this are not compressed.  This can make
+	operations that deal with many small objects (such as log)
+	faster.
+
 pack.deltaCacheSize::
 	The maximum memory in bytes used for caching deltas in
 	linkgit:git-pack-objects[1].
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index a39cb82..316b809 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -76,6 +76,7 @@ static int num_preferred_base;
 static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 static int pack_compression_seen;
+static int compression_min_size = 0;
 
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 0;
@@ -433,7 +434,7 @@ static unsigned long write_object(struct sha1file *f,
 		}
 		/* compress the data to store and put compressed length in datalen */
 		memset(&stream, 0, sizeof(stream));
-		deflateInit(&stream, pack_compression_level);
+		deflateInit(&stream, size >= compression_min_size ? pack_compression_level : 0);
 		maxsize = deflateBound(&stream, size);
 		out = xmalloc(maxsize);
 		/* Compress it */
@@ -1841,6 +1842,10 @@ static int git_pack_config(const char *k, const char *v)
 		pack_compression_seen = 1;
 		return 0;
 	}
+	if (!strcmp(k, "pack.compressionminsize")) {
+		compression_min_size = git_config_int(k, v);
+		return 0;
+	}
 	if (!strcmp(k, "pack.deltacachesize")) {
 		max_delta_cache_size = git_config_int(k, v);
 		return 0;
-- 
1.5.3.7.2095.gb2448-dirty
