Re: Decompression speed: zip vs lzo

To: Pierre Habouzit <madcoder@...>
Cc: Sam Vilain <sam@...>, Git Mailing List <git@...>, Johannes Schindelin <Johannes.Schindelin@...>, Marco Costalba <mcostalba@...>, Junio C Hamano <gitster@...>
Date: Thursday, January 10, 2008 - 4:39 pm

On Thu, 10 Jan 2008, Pierre Habouzit wrote:


This is really the big point here.  Git uses _lots_ of *small* objects, 
usually much smaller than 12KB.  For example, my copy of the gcc 
repository has an average of 270 _bytes_ per compressed object, and 
objects must be individually compressed.

Performance with really small objects should be the basis for any 
Git compression algorithm comparison.


The delta heads, though, are far from being the most frequently accessed 
objects.  First, they're clearly in the minority, and they are often 
cached in the delta base cache.


Remember that delta objects represent the vast majority of all objects. 
For example, my kernel repo currently has 555015 delta objects out of 
677073 objects, or 82% of the total.  There are actually only 25869 
non-deltified blob objects, which are likely to be the larger objects, 
but they represent only 4% of the total.
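
For anyone wanting to redo this arithmetic on their own repository, here 
is a minimal sketch in C.  The helper names percent_of() and 
print_object_shares() are made up for illustration; only the three 
object counts come from the figures above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: share of `part` in `total`, as a percentage. */
static double percent_of(unsigned long part, unsigned long total)
{
	assert(total != 0);
	return 100.0 * (double)part / (double)total;
}

/* Print the object breakdown using the kernel repo counts quoted above. */
static void print_object_shares(void)
{
	const unsigned long total = 677073;   /* all objects */
	const unsigned long deltas = 555015;  /* delta objects */
	const unsigned long blobs = 25869;    /* non-deltified blobs */

	printf("delta objects:       %.1f%%\n", percent_of(deltas, total));
	printf("non-deltified blobs: %.1f%%\n", percent_of(blobs, total));
}
```

Calling print_object_shares() from a main() should give roughly the 82% 
and 4% quoted above; substitute your own counts to compare another 
repository.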

But let's just try not compressing delta objects, to check your 
assertion, with the following hack:

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index a39cb82..252b03e 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -433,7 +433,10 @@ static unsigned long write_object(struct sha1file *f,
 		}
 		/* compress the data to store and put compressed length in datalen */
 		memset(&stream, 0, sizeof(stream));
-		deflateInit(&stream, pack_compression_level);
+		if (obj_type == OBJ_REF_DELTA || obj_type == OBJ_OFS_DELTA)
+			deflateInit(&stream, 0);
+		else
+			deflateInit(&stream, pack_compression_level);
 		maxsize = deflateBound(&stream, size);
 		out = xmalloc(maxsize);
 		/* Compress it */

You then only need to run 'git repack -a -f -d' with and without the 
above patch.

Here are my rather surprising results:

My kernel repo pack size without the patch:	184275401 bytes
Same repo with the above patch applied:		205204930 bytes

So it is only 11% larger.  I was expecting much more.
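
The 11% figure follows directly from the two pack sizes.  A small sketch 
of that calculation, with hypothetical helper names growth_percent() and 
print_pack_growth() (the two sizes are the ones measured above):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: relative growth from `before` to `after`, in percent. */
static double growth_percent(unsigned long long before, unsigned long long after)
{
	assert(before != 0);
	return 100.0 * ((double)after - (double)before) / (double)before;
}

/* Compare the two pack sizes reported above. */
static void print_pack_growth(void)
{
	const unsigned long long unpatched = 184275401ULL; /* deltas deflated as usual */
	const unsigned long long patched = 205204930ULL;   /* deltas stored at level 0 */

	printf("extra bytes: %llu\n", patched - unpatched);
	printf("growth:      %.1f%%\n", growth_percent(unpatched, patched));
}
```

The difference is 20929529 bytes, which rounds to the 11% stated above.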

I'll let someone else do profiling/timing comparisons.


Right.  Abstracting the zlib code and having different compression 
algorithms tested in the Git context is the only way to do meaningful 
comparisons.


Nicolas