I actually worked through last night at the day job, have
urgent stuff piling up in my freelance work queue, and the next thing
I need to finish for git is some smart stuff for delta packing.
So it's unlikely I'll get to _that_ anytime soon. However, I had a
hilarious idea on the way home that kept me rather amused (perhaps my
programmer's humour is affected by sleep deprivation).

I was annoyed at needing double the space because of having to keep
score of both hash and file number. So I came up with a rather cute
manner to avoid this: first do all files in isolation with full
precision, but store the resulting list of hashes as differences from
the previous value. When merging the data of 2^k and 2^k (or somewhat fewer)
files, we multiply the values by two (this will not carry except for
utterly improbable cases or very small data sets which we can do
differently) and add one bit of identification. When we have just a
single sequence remaining, undeltifying will tell us about collisions
in the high bits, and the affected files in the low bits.

Of course, using a merge-like algorithm means that we temporarily need
double space anyway, which takes away some of the fun.
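
The scheme described above could be sketched roughly as follows. This is
only an illustration in Python, not code from git: all function names are
made up, files are paired with their neighbours (an assumption, which means
the identification bits come out in reversed order and have to be flipped
back at the end), and Python integers never overflow, so the carry concern
for fixed-width values does not show up here. Like any merge, it needs
temporary extra space.

```python
from heapq import merge
from itertools import groupby

def to_deltas(sorted_values):
    """Store a sorted sequence as differences from the previous value."""
    prev, out = 0, []
    for v in sorted_values:
        out.append(v - prev)
        prev = v
    return out

def from_deltas(deltas):
    """Undo to_deltas: yield the running sums."""
    total = 0
    for d in deltas:
        total += d
        yield total

def merge_tagged(left, right):
    """Merge two delta lists, doubling each value and adding one side bit."""
    tagged_left = (2 * v for v in from_deltas(left))
    tagged_right = (2 * v + 1 for v in from_deltas(right))
    return to_deltas(merge(tagged_left, tagged_right))

def reverse_bits(x, width):
    """Reverse the lowest `width` bits of x."""
    r = 0
    for _ in range(width):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def find_collisions(hashes_per_file):
    """Return {hash: [file indices]} for every hash that occurs more than once."""
    lists = [to_deltas(sorted(h)) for h in hashes_per_file]
    if not lists:
        return {}
    levels = 0
    while len(lists) > 1:
        if len(lists) % 2:
            lists.append([])  # pad so that every list has a partner
        lists = [merge_tagged(lists[i], lists[i + 1])
                 for i in range(0, len(lists), 2)]
        levels += 1
    mask = (1 << levels) - 1
    result = {}
    # High bits hold the hash, low bits the (bit-reversed) file number.
    for h, group in groupby(from_deltas(lists[0]), key=lambda v: v >> levels):
        files = [reverse_bits(v & mask, levels) for v in group]
        if len(files) > 1:
            result[h] = sorted(files)
    return result
```

Calling find_collisions([[10, 42, 7], [42, 99], [7, 100], [5]]) reports
hash 7 in files 0 and 2 and hash 42 in files 0 and 1.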
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum