Well, the thing is, for things that really don't compress, zlib shouldn't
add much overhead on decompression. It *should* just end up being a
single "memcpy()" after you've done:
- check the header for size and mode ("plain data")
- check the adler32 checksum (which is *really* nice - we've found real
corruption this way!).
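The "header, then memcpy, then adler32" path from the list above can be sketched in C. This is a minimal sketch, assuming the stream holds exactly one final stored deflate block (RFC 1950 header, a BFINAL=1/BTYPE=00 byte, LEN/NLEN, the raw bytes, then a big-endian Adler-32). The names inflate_stored_only() and adler32_sum() are hypothetical, not zlib API, and any other stream shape (compressed blocks, several blocks, a preset dictionary) is simply rejected, where a real inflate() would go on to handle it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain Adler-32 (RFC 1950): A = 1 + sum of bytes, B = sum of the A values, both mod 65521. */
static uint32_t adler32_sum(const unsigned char *p, size_t n)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < n; i++) {
        a = (a + p[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/*
 * Hypothetical fast path: decode a zlib stream that is exactly one final
 * stored (uncompressed) deflate block. Returns the payload length, or -1 if
 * the stream has any other shape or fails a check.
 */
static long inflate_stored_only(const unsigned char *in, size_t in_len,
                                unsigned char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);

    /* 2-byte zlib header + block byte + LEN + NLEN + 4-byte Adler-32 trailer */
    if (in_len < 2 + 1 + 4 + 4)
        return -1;

    /* zlib header: CM = 8 (deflate), CINFO <= 7, no preset dictionary, FCHECK ok */
    if ((in[0] & 0x0f) != 8 || (in[0] >> 4) > 7 || (in[1] & 0x20) ||
        ((in[0] << 8) | in[1]) % 31 != 0)
        return -1;

    /* "mode": BFINAL = 1 and BTYPE = 00 (stored); the other 5 bits are padding */
    if ((in[2] & 0x07) != 0x01)
        return -1;

    /* "size": LEN and its one's complement NLEN, both little-endian */
    size_t len = (size_t)in[3] | ((size_t)in[4] << 8);
    size_t nlen = (size_t)in[5] | ((size_t)in[6] << 8);
    if ((len ^ nlen) != 0xffff || len > out_cap || in_len != 7 + len + 4)
        return -1;

    /* the one memcpy() that should be the whole cost */
    memcpy(out, in + 7, len);

    /* Adler-32 of the uncompressed bytes, stored big-endian after the block */
    const unsigned char *t = in + 7 + len;
    uint32_t expect = ((uint32_t)t[0] << 24) | ((uint32_t)t[1] << 16) |
                      ((uint32_t)t[2] << 8) | (uint32_t)t[3];
    return adler32_sum(out, len) == expect ? (long)len : -1;
}
```

For "abc" the stream is 78 01 | 01 03 00 fc ff | 61 62 63 | 02 4d 01 27 and the function returns 3; flip a bit in the payload and the Adler-32 comparison makes it return -1.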
The adler32 checksumming may sound unnecessary when you already have the
SHA1 checksum, but the thing is, we normally don't actually *check* the
SHA1 except when doing a full fsck. So I actually like the fact that
object unpacking always checks at least the adler32 checksum at each
stage, which you get "for free" when you use zlib.
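Why the check is close to "for free" shows in what Adler-32 actually computes: two running sums per byte, reduced modulo 65521. A minimal sketch follows; adler32_update() is a hypothetical name (zlib's own entry point is adler32(adler, buf, len)), and deferring the modulo for up to 5552 bytes mirrors the bound zlib uses so that the 32-bit sums cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u /* largest prime below 2^16 */
#define ADLER_NMAX 5552   /* max bytes before the 32-bit sums could overflow */

/*
 * Adler-32 (RFC 1950), incremental: pass 1 as the initial value.
 * Per byte this is two additions; the modulo is deferred to once every
 * ADLER_NMAX bytes, the same trick zlib uses.
 */
static uint32_t adler32_update(uint32_t adler, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t a = adler & 0xffff;
    uint32_t b = adler >> 16;

    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) {
            a += *buf++; /* A: running sum of the bytes */
            b += a;      /* B: running sum of A */
        }
        a %= ADLER_BASE;
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}
```

adler32_update(1, "Wikipedia", 9) gives 0x11E60398. Changing any single byte shifts A by a nonzero amount smaller than 65521, so a one-byte corruption always changes the checksum.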
So not using compression at all doesn't just get rid of the
compression, it also gets rid of a good safety valve - something that may not
be immediately obvious when you don't think about what all zlib entails.
People think of zlib as just compressing, but I think the checksumming is
almost as important, which is why it isn't obviously a good idea to skip
compressing small objects just because you don't win on size!
Remember: stability and safety of the data is *the* #1 objective here. The
git SHA1 checksums guarantee that we can find any corruption, but in
everyday git usage, the adler32 checksum is the one that generally would
*notice* the corruption and cause us to say "uhhuh, need to fsck".
Everything else is totally secondary to the goal of "your data is secure".
Yes, performance is a primary goal too, but it's always "performance with
correctness guarantees"!
But I just traced through a simple 60-byte incompressible zlib stream. It's
painful. This should be *the* simplest case, and it should really just be
the memcpy and the adler32 check. But:
[torvalds@woody ~]$ grep '<inflate' trace | wc -l
460
[torvalds@woody ~]$ grep '<adler32' trace | wc -l
403
[torvalds@woody ~]$ grep '<memcpy' trace | wc -l
59
i.e., we spend *more* instructions on just the stupid setup in "inflate()"
than we spend on the adler32 (460 vs. 403), let alone on the actual 60-byte
memcpy of the incompressible data (59).
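To put numbers on the 60-byte case: a zlib stream holding a single stored block is 2 header bytes, a 5-byte block header (the BFINAL/BTYPE byte plus LEN and NLEN), the payload, and a 4-byte Adler-32, so 60 + 11 = 71 bytes. The sketch below builds that layout by hand; zlib_store() is a hypothetical helper, and it only assumes the traced input had this shape (0x78 0x01 is the header zlib writes at its lowest levels with the default 32K window), not that it matches the traced object byte for byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain Adler-32 (RFC 1950), computed byte by byte. */
static uint32_t adler32_sum(const unsigned char *p, size_t n)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < n; i++) {
        a = (a + p[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/*
 * Hypothetical helper: wrap len bytes in a zlib stream consisting of one
 * final stored deflate block. `out` must have room for len + 11 bytes.
 * Returns the number of bytes written.
 */
static size_t zlib_store(const unsigned char *data, size_t len, unsigned char *out)
{
    assert(len <= 0xffff); /* a stored block holds at most 65535 bytes */
    uint32_t ad = adler32_sum(data, len);
    size_t i = 0;

    out[i++] = 0x78;                                 /* CMF: deflate, 32K window */
    out[i++] = 0x01;                                 /* FLG: FLEVEL 0, no dictionary, FCHECK */
    out[i++] = 0x01;                                 /* BFINAL = 1, BTYPE = 00 (stored) */
    out[i++] = (unsigned char)(len & 0xff);          /* LEN, little-endian */
    out[i++] = (unsigned char)(len >> 8);
    out[i++] = (unsigned char)(~len & 0xff);         /* NLEN = one's complement of LEN */
    out[i++] = (unsigned char)((~len >> 8) & 0xff);
    memcpy(out + i, data, len);                      /* the payload, untouched */
    i += len;
    out[i++] = (unsigned char)(ad >> 24);            /* Adler-32, big-endian */
    out[i++] = (unsigned char)(ad >> 16);
    out[i++] = (unsigned char)(ad >> 8);
    out[i++] = (unsigned char)ad;
    return i;
}
```

For a 60-byte input this returns 71, and everything a decoder has to verify lives in those 11 framing bytes plus one Adler-32 pass over the payload.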
I dunno. I don't mind the adler32 that much. The rest seems to be
pretty annoying, though.
Linus