Martin Langhoff wrote:

Nightmare indeed. I actually wrote a proof of concept for this idea for gzip.

http://git.catalyst.net.nz/gw?p=git.git;a=shortlog;h=archive-blobs

(see also http://planet.catalyst.net.nz/blog/2006/07/17/samv/xteddy_caught_consuming_rampant_amo...)

I usually warn people that this undertaking is "slightly insane".

My implementation was designed to be called like "git-hash-object". What it did was look at the input stream, and detect quickly whether it looked like a gzip stream. If it was, it would decompress it and then try to compress the first few blocks using different compression libraries and settings to determine what settings were used. If it could find the right settings for the first meg or so, then it would bank on the rest being identical as well, record which compressor and what settings were used and write the uncompressed object, as well as the information needed to reconstruct the gzip header, to a new type of object called an "archive" object. If the stream could not be reproduced then it would save the raw stream instead.

For something like a Debian archive, it is very likely that all compressed streams will be reproducible, because they will almost all be compressed using the same implementation of gzip.

For tar and .ar files, this can be slightly more deterministic of course. It doesn't even need to be particularly savvy of what all the fields are - just locate the files in the .tar, write out a tree, and then write a TOC that lists tree entries and contains any extra data (ie headers, etc).

In hindsight, making a new object type was probably a mistake. If I were to re-undertake this I would not go down that path, though I'd certainly consider using tag objects for the extra data, and throwing them in the tree like submodules.

It would also be essential in a "real" solution to bundle reference copies of the zlib and gzip compressors (yes, their output streams differ with longer inputs and even some short ones).

Sam.
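For readers who want to see the shape of the detection step described above, here is a rough sketch in Python. It is not the code from the archive-blobs branch; the names `split_gzip`, `find_settings` and `rebuild` are made up for illustration. It also simplifies in two ways: it recompresses and compares the whole stream rather than only the first meg or so, and it only tries the zlib that ships with Python, whereas a real solution would have to try several compressor implementations.

```python
import struct
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def split_gzip(stream):
    """Split a single-member gzip stream into (header, deflate_body).

    Returns None if the input does not look like gzip. Assumes exactly one
    member and no trailing garbage after the 8-byte CRC32/ISIZE trailer.
    """
    if len(stream) < 18 or stream[:2] != GZIP_MAGIC or stream[2] != 8:
        return None
    flags = stream[3]
    pos = 10  # fixed part of the header
    if flags & 0x04:  # FEXTRA: 2-byte length, then that many bytes
        pos += 2 + struct.unpack_from("<H", stream, pos)[0]
    for bit in (0x08, 0x10):  # FNAME, FCOMMENT: NUL-terminated strings
        if flags & bit:
            end = stream.find(b"\x00", pos)
            if end < 0:
                return None
            pos = end + 1
    if flags & 0x02:  # FHCRC: 2-byte header checksum
        pos += 2
    if pos > len(stream) - 8:
        return None
    return stream[:pos], stream[pos:-8]


def find_settings(stream):
    """Try to find zlib settings that reproduce the stream byte for byte.

    Returns a dict with the header, raw data and settings, or None if the
    stream is not gzip or cannot be reproduced (caller keeps the raw stream).
    """
    parts = split_gzip(stream)
    if parts is None:
        return None
    header, body = parts
    try:
        raw = zlib.decompress(body, -15)  # -15 selects raw deflate
    except zlib.error:
        return None
    for level in range(1, 10):
        for mem_level in (8, 9):
            c = zlib.compressobj(level, zlib.DEFLATED, -15, mem_level)
            if c.compress(raw) + c.flush() == body:
                return {"header": header, "raw": raw,
                        "level": level, "mem_level": mem_level}
    return None


def rebuild(settings):
    """Reconstruct the original gzip stream from the recorded settings."""
    raw = settings["raw"]
    c = zlib.compressobj(settings["level"], zlib.DEFLATED, -15,
                         settings["mem_level"])
    body = c.compress(raw) + c.flush()
    trailer = struct.pack("<II", zlib.crc32(raw), len(raw) & 0xFFFFFFFF)
    return settings["header"] + body + trailer
```

If `find_settings` returns None, the caller would store the raw stream unchanged. Note that the level it reports is only the first one that reproduces the bytes; for small inputs several levels can give identical output, which is fine since only the reproduction matters.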
