On Fri, 16 Mar 2007, Nicolas Pitre wrote:

Well, the thing is, for things that really don't compress, zlib shouldn't
add much of an overhead on uncompression. It *should* just end up being a
single "memcpy()" after you've done:

 - check the header for size and mode ("plain data")

 - check the adler checksum (which is *really* nice - we've found real
   corruption this way!).

The adler32 checksumming may sound unnecessary when you already have the
SHA1 checksum, but the thing is, we normally don't actually *check* the
SHA1 except when doing a full fsck. So I actually like the fact that
object unpacking always checks at least the adler32 checksum at each
stage, which you get "for free" when you use zlib.

So not using compression at all actually not only gets rid of the
compression, it gets rid of a good safety valve - something that may not
be immediately obvious when you don't think about what all zlib entails.
People think of zlib as just compressing, but I think the checksumming is
almost as important, which is why it isn't an obviously good thing to not
compress small objects just because you don't win on size!

Remember: stability and safety of the data is *the* #1 objective here. The
git SHA1 checksums guarantee that we can find any corruption, but in
every-day git usage, the adler32 checksum is the one that generally would
*notice* the corruption and cause us to say "uhhuh, need to fsck".

Everything else is totally secondary to the goal of "your data is secure".
Yes, performance is a primary goal too, but it's always "performance with
correctness guarantees"!

But I just traced through a simple 60-byte incompressible zlib thing. It's
painful. This should be *the* simplest case, and it should really just be
the memcpy and the adler32 check.

But:

	[torvalds@woody ~]$ grep '<inflate' trace | wc -l
	460
	[torvalds@woody ~]$ grep '<adler32' trace | wc -l
	403
	[torvalds@woody ~]$ grep '<memcpy' trace | wc -l
	59

ie we spend *more* instructions on just the stupid setup in "inflate()"
than we spend on the adler32 (or, obviously, on the actual 60-byte memcpy
of the actual incompressible data).

I dunno. I don't mind the adler32 that much. The rest seems to be pretty
annoying, though.

		Linus
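
For readers who want to poke at the "plain data" case themselves, here is a minimal sketch in Python using its standard zlib module. It is not git's code and not from the message above. The 60-byte payload is an arbitrary stand-in, the helper names `stored_stream_layout` and `detects_corruption` are hypothetical, and compression level 0 is used only as a convenient way to force zlib to emit an uncompressed ("stored") block. The sketch splits the stream into header, stored-block header, raw body and adler32 trailer, then flips one bit in the body to show that decompression refuses the data.

```python
import zlib


def stored_stream_layout(payload: bytes) -> dict:
    """Compress at level 0 (forces a "stored" block) and split the stream."""
    blob = zlib.compress(payload, 0)
    return {
        "blob": blob,
        "zlib_header": blob[:2],      # CMF/FLG bytes of the zlib wrapper
        "block_header": blob[2:7],    # 1 byte block type + LEN + ~LEN
        "body": blob[7:-4],           # the payload, byte for byte
        "adler32": int.from_bytes(blob[-4:], "big"),  # big-endian trailer
    }


def detects_corruption(blob: bytes, offset: int) -> bool:
    """Flip one bit at `offset` and report whether decompression fails."""
    damaged = bytearray(blob)
    damaged[offset] ^= 0x01
    try:
        zlib.decompress(bytes(damaged))
    except zlib.error:
        return True
    return False


# Arbitrary 60-byte stand-in for a small object that does not compress.
payload = bytes((i * 37 + 11) % 256 for i in range(60))
parts = stored_stream_layout(payload)

print(len(parts["blob"]) - len(payload))           # framing overhead: 11
print(parts["body"] == payload)                    # True
print(parts["adler32"] == zlib.adler32(payload))   # True
print(detects_corruption(parts["blob"], 7 + 10))   # True
```

This only illustrates the framing and the checksum acting as a safety valve. It says nothing about how many instructions inflate() spends on setup, which is what the trace above measures.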
