I might have sounded as if I was looking for failure reports, but
success stories are of course welcome ;-). It's always good to
hear their git experiences first-hand from people in the top
echelon of public projects.
I think that 40% sounds about right. My understanding of the
underlying format CVS uses, RCS, is that it stores a full copy
of the tip of trunk uncompressed, and other versions of the file
are represented as incremental deltas from that. The packed git
format does not favor any particular version based on its distance
from the tip, and stores either a compressed full copy, or a
delta from some other revision (which may not necessarily be
represented as a full copy). When we store something as a delta
from something else, we limit the length of the delta chain to a
full copy to 10 (by default), so that you can get to a specific
object with at most 10 applications of delta on top of a full
copy.
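To make the chain limit concrete, here is a toy sketch in Python.
It is not how git-pack-objects is implemented: the names
(MAX_DEPTH, chain_depth, add_object, the dict used as a store) are
made up for illustration, and no real delta is computed, only the
bookkeeping that decides between "full" and "delta".

```python
# Toy model of a delta-chain depth limit; NOT git's actual pack format.
MAX_DEPTH = 10  # the default chain limit mentioned above


def chain_depth(store, name):
    """Count how many deltas must be applied to rebuild `name`."""
    depth = 0
    while store[name][0] == "delta":
        name = store[name][1]  # follow the link to the base object
        depth += 1
    return depth


def add_object(store, name, content, base=None):
    """Store `name` as a delta on `base` unless the chain would get too long."""
    if base is not None and chain_depth(store, base) < MAX_DEPTH:
        # `content` stands in for the delta; we do not compute one here.
        store[name] = ("delta", base, content)
    else:
        store[name] = ("full", None, content)


store = {}
prev = None
for i in range(25):
    add_object(store, f"v{i}", f"rev {i}", base=prev)
    prev = f"v{i}"

full_copies = [n for n in store if store[n][0] == "full"]
deepest = max(chain_depth(store, n) for n in store)
```

In this simplified model every revision is deltified against its
immediate predecessor, so a full copy is forced each time the
chain reaches the limit. The real packer is free to pick other
bases, which is why this is only a rough picture.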
Comparing these two formats for storage efficiency is tricky:
- A full copy of the version at the tip in CVS is not
compressed, but in git a full copy is compressed. zlib gives
about 50% for typical text sources, so git has some advantage here.
- Because of the delta-chain length limit, we store a full copy,
albeit compressed [*1*], every ten or so versions. This trades off
storage efficiency for run-time efficiency.
- CVS storage records most things as deltas for a long-lived
project, and deltas are less compressible (IOW, you could
think of them as already compressed somewhat), so it is not
_that_ inefficient to begin with.
- Delta representation is used only when representing something
as a delta from something else buys us more space reduction
than compressing it as a full copy in git. This is a pure
improvement over the CVS format.
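The last point, keeping a delta only when it beats the compressed
full copy, can be sketched roughly as follows. This is an
assumption-laden illustration, not git's code: text_delta and
choose_representation are hypothetical names, and a unified diff
from Python's difflib stands in for git's binary delta encoding.

```python
import difflib
import zlib


def text_delta(base: str, target: str) -> bytes:
    """Stand-in delta: a unified diff with no context lines.

    git uses its own binary delta encoding; this is only a substitute.
    """
    diff = difflib.unified_diff(
        base.splitlines(keepends=True),
        target.splitlines(keepends=True),
        n=0,
    )
    return "".join(diff).encode()


def choose_representation(base: str, target: str):
    """Return ("delta", data) only if it is smaller than the compressed full copy."""
    full = zlib.compress(target.encode())
    delta = zlib.compress(text_delta(base, target))
    if len(delta) < len(full):
        return "delta", delta
    return "full", full


# A one-line change to a 200-line file, and two unrelated short texts.
old = "".join(f"line {i}: some text\n" for i in range(200))
new = old.replace("line 100: some text\n", "line 100: changed text\n")
kind_small_change, _ = choose_representation(old, new)
kind_unrelated, _ = choose_representation(
    "alpha\nbeta\n", "completely different\ntext here\n"
)
```

With a small change the delta wins easily; with unrelated content
the diff has to carry the whole target plus overhead, so the
compressed full copy is kept instead.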
[Footnote]
*1* You could make a different trade-off by using the --depth flag
when running git-pack-objects.