On Thu, Jan 10, 2008 at 07:02:39AM +0000, Sam Vilain wrote:
Well, lzma is excellent for *big* chunks of data, but not that impressive for
small files:
$ ll git.c git.c.gz git.c.lzma git.c.lzop
-rw-r--r-- 1 madcoder madcoder 12915 2008-01-09 13:47 git.c
-rw-r--r-- 1 madcoder madcoder 4225 2008-01-10 10:00 git.c.gz
-rw-r--r-- 1 madcoder madcoder 4094 2008-01-10 10:00 git.c.lzma
-rw-r--r-- 1 madcoder madcoder 5068 2008-01-10 09:59 git.c.lzop
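To make the listing above easier to compare, here is a small Python sketch that turns the byte counts into ratios. The sizes are copied from the listing; the variable and function names are only illustrative.

```python
# Sizes in bytes, copied from the directory listing above.
original_size = 12915
compressed_sizes = {"gzip": 4225, "lzma": 4094, "lzop": 5068}


def ratio_percent(compressed, original):
    """Compressed size expressed as a percentage of the original size."""
    return 100.0 * compressed / original


ratios = {
    name: ratio_percent(size, original_size)
    for name, size in compressed_sizes.items()
}

# Print the codecs from smallest to largest output.
for name, pct in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {pct:.1f}% of original")

# How much smaller the lzma output is than the gzip output for this file.
lzma_gain_over_gzip = (
    100.0
    * (compressed_sizes["gzip"] - compressed_sizes["lzma"])
    / compressed_sizes["gzip"]
)
print(f"lzma saves {lzma_gain_over_gzip:.1f}% over gzip")
```

On this one small file the lzma output is only about 3% smaller than the gzip output, which is the point being made.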
And lzma performs really badly if you have little memory available. The "big"
secret of lzma is that it basically works with a huge window to check for
repetitive data, and even decompression needs quite a fair amount of memory,
making it a really bad choice for git IMNSHO.
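The window effect can be illustrated with a rough sketch using Python's standard `lzma` module, which lets you set the LZMA2 dictionary (window) size through a filter chain. The input here is synthetic and chosen by me to exaggerate the effect: random bytes repeated once, so the only redundancy sits 256 KiB back in the stream. It says nothing about git objects specifically.

```python
import lzma
import os


def xz_size(data, dict_size):
    """Compressed size of data with LZMA2 limited to the given dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


# 256 KiB of incompressible bytes, repeated once: the only redundancy
# is a match located 256 KiB earlier in the stream.
block = os.urandom(256 * 1024)
data = block + block

small_window = xz_size(data, 64 * 1024)    # window shorter than the repeat distance
large_window = xz_size(data, 1024 * 1024)  # window long enough to reach the first copy

print(f"input:       {len(data)} bytes")
print(f"64 KiB dict: {small_window} bytes")
print(f"1 MiB dict:  {large_window} bytes")
```

With the small dictionary the second copy cannot be matched and the output stays about as large as the input; with the large one it shrinks to roughly half. The decoder has to keep a history buffer of about the dictionary size, which is where the decompression memory cost comes from.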
Though I don't agree with you (and some others) that gzip is fast enough. It's
clearly a bottleneck in many log-related commands where you would expect to be
IO bound rather than CPU bound. LZO seems like a fairer choice, especially
since most of what we gain comes from compressing the biggest blobs, aka the
delta chain heads. It's really unclear to me whether we gain anything by
compressing the deltas, trees, and other smallish pieces of information.
And when it comes to timings, for a file big enough to give numbers, here are
the decompression times (best of 10 runs, smaller is better, second number is
the size of the packed data, original data was 7.8MB):
* lzma: 0.374s (2.2MB)
* gzip: 0.127s (2.9MB)
* lzop: 0.053s (3.2MB)
For a 300k original file:
* lzma: 0.022s (124KB)
* gzip: 0.008s (144KB)
* lzop: 0.004s (156KB) /* most of the samples were actually 0.005 */
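For anyone who wants to repeat this kind of measurement, here is a minimal "best of N runs" sketch in Python. It is not how the numbers above were produced: it uses the standard-library `zlib` and `lzma` modules rather than the command-line tools, it leaves out lzop (which has no standard-library binding), and the sample input is a made-up stand-in of roughly 300 KB.

```python
import lzma
import time
import zlib


def best_decompress_time(decompress, packed, runs=10):
    """Return the fastest wall-clock time, in seconds, over `runs` decompressions."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        decompress(packed)
        best = min(best, time.perf_counter() - start)
    return best


def benchmark(data, runs=10):
    """Map codec name -> (best decompression time in seconds, packed size in bytes)."""
    codecs = {
        "lzma": (lzma.compress, lzma.decompress),
        "zlib": (zlib.compress, zlib.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        assert decompress(packed) == data  # sanity-check the round trip
        results[name] = (best_decompress_time(decompress, packed, runs), len(packed))
    return results


# Hypothetical stand-in input: roughly 300 KB of source-like text.
sample = b"".join(
    b"static int cmd_%d(int argc, const char **argv);\n" % i for i in range(6000)
)

for name, (seconds, size) in benchmark(sample).items():
    print(f"* {name}: {seconds:.4f}s ({size} bytes)")
```

Taking the minimum rather than the mean filters out scheduling noise, which matters when individual runs last only a few milliseconds.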
What is obvious to me is that lzop seems to take about 10% more space than
gzip, while being around 1.5 to 2 times faster. Of course this is very sketchy
and a real test with git would be better.
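The trade-off can be recomputed from the figures quoted above with a few lines of Python. The numbers are copied from the two lists; the helper name `compare` is only illustrative.

```python
# (decompression seconds, packed size) copied from the measurements above.
# Sizes are kept in whatever unit each list uses, since only ratios matter.
big_file = {"lzma": (0.374, 2.2), "gzip": (0.127, 2.9), "lzop": (0.053, 3.2)}
small_file = {"lzma": (0.022, 124), "gzip": (0.008, 144), "lzop": (0.004, 156)}


def compare(results, candidate, baseline="gzip"):
    """Return (speedup, extra size in percent) of candidate relative to baseline."""
    cand_time, cand_size = results[candidate]
    base_time, base_size = results[baseline]
    speedup = base_time / cand_time
    extra_size_pct = 100.0 * (cand_size - base_size) / base_size
    return speedup, extra_size_pct


for label, results in (("big file", big_file), ("small file", small_file)):
    speedup, extra = compare(results, "lzop")
    print(f"{label}: lzop is {speedup:.1f}x faster than gzip, {extra:+.1f}% size")
```

From these figures lzop comes out roughly 8 to 10% larger than gzip. The speedup is about 2.4x on the big file and 2x on the small one, or 1.6x if you use the more typical 0.005s sample mentioned in the note.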
-- 
·O·  Pierre Habouzit
··O  madcoder@debian.org
OOO  http://www.madism.org