Re: Git as a filesystem

To: Martin Langhoff <martin.langhoff@...>
Cc: Dmitry Potapov <dpotapov@...>, Peter Stahlir <peter.stahlir@...>, Karl Hasselström <kha@...>, Johannes Schindelin <Johannes.Schindelin@...>, <git@...>
Date: Friday, September 21, 2007 - 11:09 pm

Martin Langhoff wrote:

Nightmare indeed.  I actually wrote a proof of concept for this idea for
gzip.

http://git.catalyst.net.nz/gw?p=git.git;a=shortlog;h=archive-blobs
(see also
http://planet.catalyst.net.nz/blog/2006/07/17/samv/xteddy_caught_consuming_rampant_amo...)

I usually warn people that this undertaking is "slightly insane".

My implementation was designed to be called like "git-hash-object". 
What it did was look at the input stream, and detect quickly whether it
looked like a gzip stream.  If it was, it would decompress it and then
try to compress the first few blocks using different compression
libraries and settings to determine what settings were used.  If it
could find the right settings for the first meg or so, it would bank on
the rest being identical as well.  It would then record which compressor
and settings were used and write the uncompressed object, along with the
information needed to reconstruct the gzip header, to a new type of
object called an "archive" object.  If the stream could not be
reproduced, it would save the raw stream instead.  For something
like a Debian archive, it is very likely that all compressed streams
will be reproducible, because they will almost all be compressed using
the same implementation of gzip.
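
To make the idea concrete, here is a rough Python sketch of that
detect / decompress / probe / record loop.  It is not the code from the
archive-blobs branch: it only tries zlib at levels 1-9 with default
memory settings, assumes a single gzip member with nothing after the
trailer, and all the names (archive_or_raw, split_gzip, PROBE_BYTES)
are made up for illustration.  Unlike the approach described above, it
also re-checks the whole stream rather than banking on the probe alone.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"
PROBE_BYTES = 1 << 20  # "the first meg or so" (assumed value)


def looks_like_gzip(data):
    # Magic bytes plus compression method 8 (deflate).
    return len(data) >= 18 and data[:2] == GZIP_MAGIC and data[2] == 8


def split_gzip(data):
    """Split a single-member gzip stream into (header, deflate body, trailer)."""
    flags = data[3]
    pos = 10  # fixed part of the header
    if flags & 0x04:  # FEXTRA: 2-byte length, then payload
        pos += 2 + int.from_bytes(data[pos:pos + 2], "little")
    if flags & 0x08:  # FNAME: NUL-terminated
        pos = data.index(b"\x00", pos) + 1
    if flags & 0x10:  # FCOMMENT: NUL-terminated
        pos = data.index(b"\x00", pos) + 1
    if flags & 0x02:  # FHCRC: 2-byte header CRC
        pos += 2
    return data[:pos], data[pos:-8], data[-8:]  # trailer = CRC32 + ISIZE


def deflate(plain, level, finish=True):
    # Raw deflate (wbits=-15), i.e. no zlib or gzip wrapper.
    c = zlib.compressobj(level, zlib.DEFLATED, -15)
    out = c.compress(plain)
    return out + c.flush() if finish else out


def archive_or_raw(data, probe=PROBE_BYTES):
    """Return ("archive", meta, plain) if reproducible, else ("raw", None, data)."""
    if not looks_like_gzip(data):
        return ("raw", None, data)
    header, body, trailer = split_gzip(data)
    try:
        plain = zlib.decompress(body, -15)
    except zlib.error:
        return ("raw", None, data)
    for level in range(1, 10):
        # Cheap probe: compress only the start and compare prefixes.
        if not body.startswith(deflate(plain[:probe], level, finish=False)):
            continue
        # Full check before trusting the candidate settings.
        if deflate(plain, level) == body:
            meta = {"level": level, "header": header, "trailer": trailer}
            return ("archive", meta, plain)
    return ("raw", None, data)
```

The original stream would then be rebuilt as
meta["header"] + deflate(plain, meta["level"]) + meta["trailer"].
A stream produced by a different gzip implementation simply falls
through to the "raw" case.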

For tar and .ar files, this can be slightly more deterministic of
course.  It doesn't even need to be particularly savvy of what all the
fields are - just locate the files in the .tar, write out a tree, and
then write a TOC that lists tree entries and contains any extra data
(i.e. headers, etc.).
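
A minimal sketch of the tar case, again in Python and again only an
illustration: a dict of blobs keyed by git blob id stands in for a real
tree, each TOC entry keeps the raw header bytes verbatim so no fields
need to be understood, and the names (explode_tar, rebuild_tar) are
hypothetical.  It assumes zero-filled padding and no sparse files; a
real tool would compare the rebuilt archive against the input and keep
the raw stream if they differ.

```python
import hashlib
import io
import tarfile

BLOCK = 512  # tar block size


def git_blob_id(data):
    # Same hash git uses for a blob: sha1("blob <size>\0" + contents).
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def explode_tar(raw):
    """Split an uncompressed tar into (blobs, toc, trailer)."""
    blobs = {}  # stand-in for the tree: blob id -> file contents
    toc = []    # one entry per member, in archive order
    end = 0
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
        for m in tar:
            # Raw header bytes, including any extended-header blocks.
            header = raw[m.offset:m.offset_data]
            oid = None
            size = 0
            if m.isreg():
                data = raw[m.offset_data:m.offset_data + m.size]
                size = len(data)
                oid = git_blob_id(data)
                blobs[oid] = data
            toc.append({"name": m.name, "header": header, "blob": oid})
            # Data is padded up to the next block boundary.
            end = m.offset_data + (size + BLOCK - 1) // BLOCK * BLOCK
    # Whatever follows the last member (end-of-archive blocks, padding).
    return blobs, toc, raw[end:]


def rebuild_tar(blobs, toc, trailer):
    """Reassemble the archive from the TOC, assuming zero padding."""
    out = bytearray()
    for entry in toc:
        out += entry["header"]
        if entry["blob"] is not None:
            data = blobs[entry["blob"]]
            out += data + b"\x00" * (-len(data) % BLOCK)
    return bytes(out) + trailer
```

Because the file contents end up as ordinary blobs, identical files
across many archives would be stored once, and only the per-archive TOC
is unique.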

In hindsight, making a new object type was probably a mistake.  If I
were to re-undertake this I would not go down that path, though I'd
certainly consider using tag objects for the extra data, and throwing
them in the tree like submodules.  It would also be essential in a
"real" solution to bundle reference copies of the zlib and gzip
compressors (yes, their output streams differ with longer inputs and
even some short ones).

Sam.