Right. I think it would be a good idea to extend the delta format as
well to allow for larger offsets in pack v4.
One thing that could be done with really large blobs is to create a
sparser index, i.e. have a larger step than 16. Because the delta match
loop scans backward after a match, the sparse index shouldn't affect
compression that much on large blobs and the index could be
significantly smaller.
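To put rough numbers on the index size, here is a minimal sketch. It assumes one index entry per full block of `step` bytes in the reference buffer; `index_entries` is a hypothetical helper for illustration, not a function from diff-delta.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of index entries when one entry is
 * recorded for every full block of `step` bytes in a reference
 * buffer of `bufsize` bytes. This is an assumption made for the
 * estimate, not the actual indexing code.
 */
static size_t index_entries(size_t bufsize, size_t step)
{
	assert(step > 0);
	return bufsize / step;
}
```

Under that assumption a 1 MiB blob needs 65536 entries with a step of 16 and 16384 entries with a step of 64, i.e. a 4x smaller index.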
I'm surprised that your patch makes so much of a difference. Normally
the first window should always match in the case you're trying to
optimize and the current code should already perform more or less the
same as your common prefix match does.
Ah, no, actually what your patch does is a pessimisation of the matching
code by not considering other and possibly better matches elsewhere in
the reference buffer whenever there is a match at the beginning of both
buffers. I don't think this is a good idea in general.
What you should try instead if you want to make the process faster is to
lower the threshold used to consider a match sufficiently large to stop
searching. That has the potential for even faster processing as the
"optimization" would then be effective throughout the buffer and not
only at the beginning.
Currently the threshold is implicit and equal to 65536. Please consider
this patch instead of yours for testing:
diff --git a/diff-delta.c b/diff-delta.c
index 9f998d0..755c0a9 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -315,6 +315,9 @@ create_delta(const struct delta_index *index,
 		/* this is our best match so far */
 		msize = ref - entry->ptr;
 		moff = entry->ptr - ref_data;
+		/* a sufficiently large match is good enough */
+		if (msize >= 4096)
+			break;
 	}
 }
You could experiment with that value to determine the best speed vs size
compromise.
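To illustrate the trade-off outside of diff-delta.c, here is a simplified, self-contained sketch of a "good enough" cutoff in a candidate-matching loop. The names `match_len`, `best_match`, `cand` and `good_enough` are hypothetical and the loop is much simpler than the real one; it only shows how the cutoff trades match length for fewer candidates examined.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the common run of bytes starting at a and b. */
static size_t match_len(const unsigned char *a, const unsigned char *a_end,
			const unsigned char *b, const unsigned char *b_end)
{
	size_t n = 0;
	while (a + n < a_end && b + n < b_end && a[n] == b[n])
		n++;
	return n;
}

/*
 * Try each candidate offset into ref in order and return the longest
 * match against the start of tgt. Stop as soon as the best match so
 * far reaches good_enough. *tried reports how many candidates were
 * actually examined.
 */
static size_t best_match(const unsigned char *ref, size_t ref_len,
			 const unsigned char *tgt, size_t tgt_len,
			 const size_t *cand, size_t ncand,
			 size_t good_enough, size_t *tried)
{
	size_t best = 0;

	*tried = 0;
	for (size_t i = 0; i < ncand; i++) {
		size_t len;

		assert(cand[i] <= ref_len);
		len = match_len(ref + cand[i], ref + ref_len,
				tgt, tgt + tgt_len);
		(*tried)++;
		if (len > best) {
			/* this is our best match so far */
			best = len;
			/* a sufficiently large match is good enough */
			if (best >= good_enough)
				break;
		}
	}
	return best;
}
```

With the reference "abcdefgh_abcdefghij", the target "abcdefghij" and candidate offsets 0 and 9, a cutoff of 8 stops after the first candidate with an 8-byte match, while a cutoff of 65536 examines both candidates and finds the 10-byte match at offset 9.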
Nicolas