Re: [PATCH] don't use mmap() to hash files

From: Dmitry Potapov
Date: Monday, February 15, 2010 - 5:23 am

On Mon, Feb 15, 2010 at 8:05 AM, Nicolas Pitre <nico@fluxnic.net> wrote:

You are probably right. A heuristic is a bad idea. Still, if we keep
mmap() here, we may want to add an option to disable it during hash
calculation. That said, I don't like keeping mmap() there if we go for #2.
See below...


I have not had time to look closely at this, but there is one problem
that I noticed -- the header of any git object contains the blob length.
We know this length in advance (without reading all data) only for
regular files and only if they do not have any filter to be applied.
In all other cases, it seems we cannot do much better than we do now,
assuming that we do not want to change the storage format...

If so, the question remains what to do about regular files with some
filter. Currently, we use mmap() for the original data but store the
processed data in memory anyway. The question is whether we want to keep
this use of mmap() here. Considering that it is a potential source of
repository corruption, and that these filters should not be used for big
files because they take a lot of memory anyway, I think we should get rid
of mmap() completely when hashing files, once we can process regular files
without filters in chunks.
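
To illustrate the chunked case for regular files without filters, here is
a rough sketch in Python rather than git's own C code. It relies on the
blob id being the SHA-1 of the header "blob <length>\0" followed by the
content, so the length has to be known before the first byte of data is
hashed. The function name and the 64 KiB buffer size are my own
assumptions, not anything from the patch.

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # assumed buffer size, not taken from the patch


def hash_blob_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Return the hex SHA-1 that git assigns to the file's content as a blob.

    Hypothetical helper: only valid for regular files with no filter,
    because the size from fstat() must equal the number of bytes hashed.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        h = hashlib.sha1()
        # The object header carries the length, so it is hashed first.
        h.update(b"blob %d\0" % size)
        remaining = size
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                # The file got shorter after fstat(); the header is now wrong.
                raise IOError("file shrank while hashing: %s" % path)
            h.update(chunk)
            remaining -= len(chunk)
        return h.hexdigest()
```

Reading at most `size` bytes and failing on a short read means a file that
is truncated during hashing produces an error instead of a bad object id.
It does not detect content that is rewritten in place while being read.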


Dmitry
