Re: [PATCH] Fix random fast-import errors when compiled with NO_MMAP

Previous thread: How well is blue pill working for you? by lag on Thursday, January 17, 2008 - 8:06 pm. (1 message)

Next thread: Re: git on MacOSX and files with decomposed utf-8 file names by Brian Dessent on Thursday, January 17, 2008 - 9:08 pm. (1 message)
From: Linus Torvalds
Date: Thursday, January 17, 2008 - 9:27 pm

In fact, even with mmap(), it's not guaranteed. There are really crappy 
mmap implementations out there, partly due to bad CPU design (virtual CPU 
caches without coherency), but more often due to total crap OS.

(Yeah, Linux did count in that area at some point. Long ago. Early 90's. 
Maybe)

I think HP-UX used to have non-coherent mmap for the longest time, due to 
carrying around some totally crap memory management based on some ancient 
BSD version that everybody else (including the BSD's) had long since 
jettisoned.

That said, I suspect any unix you can run today (without calling it a 
retro setup) probably has coherent-enough mmap. The possible virtual cache 
coherency issue is unlikely to be able to trigger this (and not relevant 
on any sane hardware anyway).

				Linus
-

From: Charles Bailey
Date: Friday, January 18, 2008 - 1:42 am

I've just checked the Mac OS X build and it looks like there is a mmap
and git is indeed using it, so this is obviously an example of a
"really crappy" mmap implementation.

This adds more ammunition to the fight against the whole Mac OS X is
powered/built/based on UNIX myth.

Charles.
-

From: Linus Torvalds
Date: Friday, January 18, 2008 - 10:08 am

Looking closer, this is not necessarily the case here.

Git uses MAP_PRIVATE, because that whole pack-file mapping was really 
*meant* to map an existing read-only pack-file, and fast-import seems to 
really be screwing with it.

It so happens that Linux has a particularly clean and streamlined VM, and 
if you do only reads to a MAP_PRIVATE mapping on a normal filesystem, 
you'll always be as coherent as with MAP_SHARED because Linux will simply 
map in the page cache pages directly.

But this is definitely not portable, and the git fast-import mmap window 
usage before Shawn's patch it was simply wrong.

So in this case, it really was git that was crap.

It just happened to work because the Linux mmap handling is just generally 
pretty sane. It probably also worked fine on pretty much any other modern 
UNIX (ie Solaris).

I'm not quite sure what OS X does to MAP_PRIVATE mappings, but if OS X is 
still based on Mach (with FreeBSD just as a single-server on top), I 
suspect that may be why it broke on OS X. The Mach VM is insanely complex 
and does really odd things.

But the fact is, without MAP_SHARED, you shouldn't expect things to be 
coherent, even if they often will be (especially for PROT_READ).

Btw, even with Shawn's patch, I wonder if the index_data usage is correct.

			Linus
-

From: Junio C Hamano
Date: Friday, January 18, 2008 - 8:25 pm

Hmph.

gfi uses data in a "pack" in two quite different ways.

A new object is written to an unfinalized pack.  Such a pack
already has "struct packed_git" allocated for it and a pointer
to it is held in pack_data.  As far as the core part of git
(that is, sha1_file.c) is concerned, however, this pack does not
even exist.  It is still not part of packed_git list in
sha1_file.c, and read_sha1_file() will not see objects in it, as
no idx into the packfile exists yet.  gfi has a table of objects
in this pack and uses gfi_unpack_entry() function to retrieve
data from it.

A packfile is finalized in end_packfile().  The pack header and
footer is recomputed, an idx file is written, and the pack is
finally registered.  Before that time p->index_data is not even
used for that pack (it is initialized with NULL).

So I do not think "index_data usage" can be incorrect, as there
won't be any index_data usage with unfinalized pack, and the
core part of git would not even have any mmap(2) (nor open fd)
into its idx file before it is finalized.

By the way, I was quite puzzled how the gfi_unpack_entry()
function manages to work correctly when it has to read an object
it deltified based on another object it wrote into the same
unfinalized pack earlier.  It knows where in the unfinalized
pack it wrote the object, so it can find from its own "struct
object_entry" the offset for the object, and calls
unpack_entry() defined in the core to do the rest.

However, most of the core does not really know about the other
objects in this half-built pack.  If the object is a delta,
unpack_delta_entry() needs to find the delta base.  And it needs
to do that without having the idx.

The trick (the code really needs a bit more documentation) is
that gfi never writes anything but OFS_DELTA.  So the core, even
though it does not have the corresponding idx file, does not
have to look up the object (in fact it does not even know what
object to look up for the base, it only knows the offset).


-

From: Linus Torvalds
Date: Friday, January 18, 2008 - 8:55 pm

Oh, ok. I did try to grep for index_data, and didn't find anything that 
looked bad, but the incestuous things that fast-import.c does just made me 
worry - but I was too lazy to really follow it all. It's one of those 

In that case, I think Shawn fixed it all, and we're all good, and it's not 
just hidden well enough that it "just happens" to work.

				Linus
-

From: Shawn O. Pearce
Date: Sunday, January 20, 2008 - 8:57 pm

Yup.

Older fast-imports (pre OFS_DELTA) had to replicate a good chunk of
the unpack_entry() logic directly inside of fast-import.  But it
later occurred to me that OFS_DELTA simplifies the code and lets
me reuse more of the existing sha1_file.c implementation.

So yea, fast-import only emits OFS_DELTA, and does so only so that it
can pull this delta-base-unpack trick on the core, without actually
giving the core the corresponding idx file first.

-- 
Shawn.
-

Previous thread: How well is blue pill working for you? by lag on Thursday, January 17, 2008 - 8:06 pm. (1 message)

Next thread: Re: git on MacOSX and files with decomposed utf-8 file names by Brian Dessent on Thursday, January 17, 2008 - 9:08 pm. (1 message)