On Mon, Feb 15, 2010 at 8:05 AM, Nicolas Pitre <nico@fluxnic.net> wrote:
Probably, you are right. Heuristic is a bad idea. Still, we may want to
add an option to disable mmap() during hash calculation if we preserve
mmap() here. Though, I don't like keeping mmap() there if we go for #2..
See below...
I have not had time to look closely at this, but there is one problem
that I noticed -- the header of any git object contains the blob length.
We know this length in advance (without reading all data) only for
regular files and only if they do not have any filter to be applied.
In all other cases, it seems we cannot do much better than we do now,
assuming that we do not want to change the storage format...
If so, the question remains what to do about regular files with some
filter. Currently, we use mmap() for the original data but store the
processed data in memory anyway. The question is whether want to keep
this use of mmap() here? Considering that it is a potential source of a
repository corruption and these filters should not be used for big files
because they take a lot of memory anyway, I think we should get rid of
mmap() in hashing file completely, once we can process regular files
without filters in chunks.
Dmitry
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html