But that canonical pack could be any particular pack.
Sure, but that's not sufficient. All this 20-byte SHA1 gives you is a
set of objects. That says nothing about their encoding.
Ordering matters a great deal. Since object IDs are the SHA1 of their
content, those IDs are totally random. So if you store objects
according to their sorted IDs, then the placement of objects belonging
to, say, the top commit will be totally random. And since you are the
filesystem expert, I don't have to tell you what performance impacts
this random access of small segments of data scattered throughout a
400MB file will have on a checkout operation.
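
To make the scattering concrete, here is a rough sketch in Python. It is a toy model, not Git's pack code: the "commits" and object contents are made up, and the IDs are a plain SHA1 of the content without the type/size header Git actually hashes. It compares how far apart the objects of the top commit end up in a logical layout versus a layout sorted by ID.

```python
import hashlib

def object_id(content):
    """Hex SHA1 of the content (Git also hashes a type/size header; omitted here)."""
    return hashlib.sha1(content).hexdigest()

def build_objects(n_commits=100, per_commit=10):
    """Hypothetical data: each 'commit' owns per_commit objects, in logical order."""
    objs = []
    for c in range(n_commits):
        for i in range(per_commit):
            content = f"commit {c} object {i}".encode()
            objs.append((c, object_id(content)))
    return objs

def span_of_commit(layout, commit):
    """Number of slots between the first and last object of `commit`, inclusive."""
    slots = [pos for pos, (c, _) in enumerate(layout) if c == commit]
    return slots[-1] - slots[0] + 1

objects = build_objects()                      # grouped by commit
by_id = sorted(objects, key=lambda o: o[1])    # "canonical" order by object ID

logical_span = span_of_commit(objects, 0)
sorted_span = span_of_commit(by_id, 0)
print("span in logical order:", logical_span)
print("span when sorted by ID:", sorted_span)
```

In the logical layout the ten objects of commit 0 sit in ten adjacent slots. Sorted by ID, they should be spread over most of the 1000 slots, which is the random access pattern described above.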
Well, this would still be a non-negligible maintenance cost. And for
what purpose, again? What is the real advantage?
Sure. But I don't think it is worth making Git less flexible just for
the purpose of ensuring that people could independently create identical
packs. I'd advocate for "no code to write at all" instead, and simply
have one person create and seed the reference pack.
And if you are willing to participate in the seeding of such a torrent,
then you had better not be bandwidth limited, meaning that you certainly can
afford to download that reference pack in the first place.
And that reference pack doesn't have to change that often either. If
you update it only on every major kernel release, then you'll need to
fetch it about once every 3 months. Incremental updates from those
points should be relatively small.
Yet... it should be possible in practice to produce identical packs,
given that the Git version is specified, the zlib version is specified,
the number of threads for the repack is equal to 1, the -f flag is used
meaning a full repack is performed, the delta depth and window size are
specified, and the head branches are specified. Given that torrents are
also identified by a hash of their content, it should be pretty easy to
see if the attempt to reproduce the reference pack worked, and start
seeding right away if it did.
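
A minimal sketch of that check, in Python, under a few assumptions: the window and depth values are placeholders that the publisher of the reference pack would have to specify, the helper names are made up for illustration, and a plain SHA1 of the whole file stands in for the torrent's own content hash. The repack command only uses options I am sure exist (pack.threads, and -a, -d, -f, --window, --depth on git repack).

```python
import hashlib

def repack_command(window, depth):
    """Argv for a single-threaded full repack with explicit delta settings.

    The window/depth values must match whatever the reference pack used.
    """
    return [
        "git", "-c", "pack.threads=1",   # one thread, for determinism
        "repack", "-a", "-d", "-f",      # -f: recompute all deltas (full repack)
        f"--window={window}", f"--depth={depth}",
    ]

def file_sha1(path, chunk_size=1 << 20):
    """SHA1 of a file, read in chunks so a 400MB pack is not loaded at once."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_reference(path, reference_hex):
    """True if the locally produced pack is byte-identical to the reference."""
    return file_sha1(path) == reference_hex.lower()
```

One would run the command from repack_command() in a clone with the specified Git and zlib versions and head branches, then call matches_reference() on the resulting pack file. Start seeding only if it returns True; otherwise just download the reference pack.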
But again, I don't think it is worth freezing the pack format into a
canonical encoding for this purpose.
Nicolas