Re: 'git add' corrupts repository if the working directory is modified as it runs

From: Thomas Rast
Date: Saturday, February 13, 2010 - 7:39 am

On Saturday 13 February 2010 14:39:52 Ilari Liusvaara wrote:

That is still racy.  The real problem is that the file is mmap()ed,
and git first computes the SHA1 of that buffer and then compresses
it.[*]
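A minimal Python model of that ordering follows. It assumes the mapped file can be represented by a mutable `bytearray`, and that another process's write can be simulated by a hypothetical `concurrent_write` callback that runs between the two passes. The helper names are illustrative and are not git's. Only the object layout follows git's loose-object format: SHA1 over `blob <size>\0` plus the contents, then zlib.

```python
import hashlib
import zlib

def racy_store(buf, concurrent_write):
    """Hash the buffer, then compress it in a second, separate read."""
    # First pass: the object's name is computed from the buffer as it is now.
    name = hashlib.sha1(b"blob %d\0" % len(buf) + bytes(buf)).hexdigest()
    # Hypothetical stand-in for another process writing to the mapped file.
    concurrent_write(buf)
    # Second pass: the stored data is read from the buffer again, after the write.
    stored = zlib.compress(b"blob %d\0" % len(buf) + bytes(buf))
    return name, stored

def scribble(b):
    b[0:1] = b"j"  # same length, different content

name, stored = racy_store(bytearray(b"hello\n"), scribble)
# The object is filed under a name that no longer matches its contents.
print(hashlib.sha1(zlib.decompress(stored)).hexdigest() == name)  # False
```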

Note the last sentence in the following excerpt from mmap(2):

  MAP_PRIVATE
           Create a private copy-on-write mapping.  Updates to the
           mapping are not visible to other processes mapping the same
           file, and are not carried through to the underlying file.
           It is unspecified whether changes made to the file after the
           mmap() call are visible in the mapped region.

This is racy despite the use of MAP_PRIVATE: the mapped contents can
change at any time.
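The sketch below shows both halves of that man-page text, assuming a POSIX system. Python's `mmap.ACCESS_COPY` is used to request a MAP_PRIVATE mapping, and `os.pwrite` stands in for a second process editing the file. The function name is hypothetical. The guaranteed half is that writes to the private mapping never reach the file. Whether the external edit shows up in the mapping is the unspecified half, so the code reports it but does not depend on it.

```python
import mmap
import os
import tempfile

def map_private_demo():
    """Map a file privately, change it from "outside", then write to the map."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello\n")
        # On POSIX, ACCESS_COPY maps the file with MAP_PRIVATE.
        with mmap.mmap(fd, 0, access=mmap.ACCESS_COPY) as m:
            before = m[:]
            os.pwrite(fd, b"j", 0)           # someone edits the file
            # Unspecified by POSIX: may be True or False depending on the system.
            external_change_visible = m[:] != before
            m[1:2] = b"E"                    # private write, never reaches the file
        with open(path, "rb") as f:
            on_disk = f.read()
    finally:
        os.close(fd)
        os.unlink(path)
    return external_change_visible, on_disk

visible, on_disk = map_private_demo()
print("external write visible in MAP_PRIVATE mapping:", visible)
print("file contents:", on_disk)  # b'jello\n': the mapping's write was not carried through
```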

AFAICS there are only two possible solutions:

* Copy the file (possibly block-by-block) as we go, to make sure that
  the data we SHA1 is the same data we compress.

* Unpack and re-hash the compressed data to verify that the SHA1 is
  correct.  On failure, either retry (but you might have to do this
  infinitely often if the user just hates you!) or abort.
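Both options can be modelled in a few lines of Python. This is a sketch under stated assumptions, not git's C implementation. The file is again modelled as a mutable `bytearray`, and `between` is a hypothetical hook that simulates a concurrent write between hashing and compressing. All function and parameter names (`blob_bytes`, `store_from_copy`, `store_with_verify`, `max_tries`) are made up for illustration. For brevity, option 1 copies the whole file at once instead of block-by-block.

```python
import hashlib
import zlib

def blob_bytes(content):
    # A git blob is hashed and compressed as "blob <size>\0" + content.
    return b"blob %d\0" % len(content) + content

def store_from_copy(buf, between=lambda b: None):
    """Option 1: copy once, then hash and compress that same private copy."""
    raw = blob_bytes(bytes(buf))         # snapshot of the (possibly changing) buffer
    name = hashlib.sha1(raw).hexdigest()
    between(buf)                         # later writes no longer affect what we store
    return name, zlib.compress(raw)

def store_with_verify(buf, between=lambda b: None, max_tries=3):
    """Option 2: hash and compress the live buffer, then unpack and re-hash.

    On a mismatch, retry a bounded number of times and then abort, because a
    writer that keeps modifying the file could defeat any number of retries.
    """
    for _ in range(max_tries):
        name = hashlib.sha1(blob_bytes(bytes(buf))).hexdigest()
        between(buf)                     # a concurrent write may land here
        stored = zlib.compress(blob_bytes(bytes(buf)))
        if hashlib.sha1(zlib.decompress(stored)).hexdigest() == name:
            return name, stored
    raise RuntimeError("file kept changing during add; aborting")
```

The trade-off shows up directly in the code. Option 1 pays for an extra copy of the data, while option 2 avoids the copy but pays for a decompress and a second hash on every add, plus the retry-or-abort decision.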

(Of course, in neither case does the user get any guarantee about
what data ended up in the repository, but he never had that; we only
try to ensure repository consistency.)


[*] The "do we have this object?" check actually happens before the
compression, so that arm is race-free.
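A hypothetical Python sketch of why that arm is safe follows. A plain dict stands in for the object database, and the function name is made up. The existence check consumes only the already-computed name. If the object is present, nothing is re-read or written. Only the "new object" branch re-reads the buffer for compression, and that second read is where the race lives.

```python
import hashlib
import zlib

def add_blob(store, buf):
    """Add `buf` to a dict used as a stand-in for the object database."""
    name = hashlib.sha1(b"blob %d\0" % len(buf) + bytes(buf)).hexdigest()
    # "Do we have this?" depends only on the name: if the object already
    # exists, nothing is re-read, compressed, or written.
    if name in store:
        return name
    # Only this branch re-reads the buffer, so only this branch is racy.
    store[name] = zlib.compress(b"blob %d\0" % len(buf) + bytes(buf))
    return name
```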

-- 
Thomas Rast
trast@{inf,student}.ethz.ch