In fact, even with mmap(), it's not guaranteed. There are really crappy
mmap implementations out there, partly due to bad CPU design (virtual CPU
caches without coherency), but more often due to total crap OS.

(Yeah, Linux did count in that area at some point. Long ago. Early 90's.
Maybe)

I think HP-UX used to have non-coherent mmap for the longest time, due to
carrying around some totally crap memory management based on some ancient
BSD version that everybody else (including the BSD's) had long since
jettisoned.

That said, I suspect any unix you can run today (without calling it a
retro setup) probably has coherent-enough mmap. The possible virtual cache
coherency issue is unlikely to be able to trigger this (and not relevant
on any sane hardware anyway).

Linus
-
I've just checked the Mac OS X build and it looks like there is a mmap
and git is indeed using it, so this is obviously an example of a
"really crappy" mmap implementation.

This adds more ammunition to the fight against the whole Mac OS X is
powered/built/based on UNIX myth.

Charles.
-
Looking closer, this is not necessarily the case here.
Git uses MAP_PRIVATE, because that whole pack-file mapping was really
*meant* to map an existing read-only pack-file, and fast-import seems to
really be screwing with it.

It so happens that Linux has a particularly clean and streamlined VM, and
if you do only reads to a MAP_PRIVATE mapping on a normal filesystem,
you'll always be as coherent as with MAP_SHARED because Linux will simply
map in the page cache pages directly.

But this is definitely not portable, and the git fast-import mmap window
usage before Shawn's patch was simply wrong.

So in this case, it really was git that was crap.
It just happened to work because the Linux mmap handling is just generally
pretty sane. It probably also worked fine on pretty much any other modern
UNIX (ie Solaris).

I'm not quite sure what OS X does to MAP_PRIVATE mappings, but if OS X is
still based on Mach (with FreeBSD just as a single-server on top), I
suspect that may be why it broke on OS X. The Mach VM is insanely complex
and does really odd things.

But the fact is, without MAP_SHARED, you shouldn't expect things to be
coherent, even if they often will be (especially for PROT_READ).

Btw, even with Shawn's patch, I wonder if the index_data usage is correct.

Linus
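
To make the MAP_SHARED point concrete, here is a minimal sketch in C. It
is not taken from git: the helper name mapping_sees_later_write() is made
up for illustration, and the code assumes a POSIX system with mkstemp(),
pwrite() and a writable /tmp. It maps a scratch file read-only, changes
the file through the descriptor afterwards, and reports whether the
mapping shows the new byte.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of git).
 * Returns 1 if a byte written with pwrite() after the mapping was set up
 * is visible through the mapping, 0 if it is not, and -1 on any error.
 * map_flags is MAP_SHARED or MAP_PRIVATE.
 */
int mapping_sees_later_write(int map_flags)
{
    char path[] = "/tmp/mmap-coherency-XXXXXX";
    const size_t len = 4096;
    const unsigned char byte = 0x5a;
    int seen = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                       /* file lives on only through fd */

    if (ftruncate(fd, (off_t)len) == 0) {
        void *p = mmap(NULL, len, PROT_READ, map_flags, fd, 0);

        if (p != MAP_FAILED) {
            /* volatile forces real loads from the mapped page */
            volatile const unsigned char *map = p;

            /* Read once first so the page is faulted in (it is zero-filled),
             * then modify the file behind the mapping's back. */
            if (map[0] == 0 && pwrite(fd, &byte, 1, 0) == 1)
                seen = (map[0] == byte);
            munmap(p, len);
        }
    }
    close(fd);
    return seen;
}
```

On a system with a unified page cache, mapping_sees_later_write(MAP_SHARED)
is expected to return 1. With MAP_PRIVATE, POSIX leaves it unspecified
whether later changes to the file show up in the mapping, so whatever the
helper returns there is not something portable code can rely on.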
-
Hmph.
gfi uses data in a "pack" in two quite different ways.
A new object is written to an unfinalized pack. Such a pack
already has "struct packed_git" allocated for it and a pointer
to it is held in pack_data. As far as the core part of git
(that is, sha1_file.c) is concerned, however, this pack does not
even exist. It is still not part of packed_git list in
sha1_file.c, and read_sha1_file() will not see objects in it, as
no idx into the packfile exists yet. gfi has a table of objects
in this pack and uses gfi_unpack_entry() function to retrieve
data from it.

A packfile is finalized in end_packfile(). The pack header and
footer are recomputed, an idx file is written, and the pack is
finally registered. Before that time p->index_data is not even
used for that pack (it is initialized with NULL).

So I do not think "index_data usage" can be incorrect, as there
won't be any index_data usage with unfinalized pack, and the
core part of git would not even have any mmap(2) (nor open fd)
into its idx file before it is finalized.

By the way, I was quite puzzled how the gfi_unpack_entry()
function manages to work correctly when it has to read an object
it deltified based on another object it wrote into the same
unfinalized pack earlier. It knows where in the unfinalized
pack it wrote the object, so it can find from its own "struct
object_entry" the offset for the object, and calls
unpack_entry() defined in the core to do the rest.

However, most of the core does not really know about the other
objects in this half-built pack. If the object is a delta,
unpack_delta_entry() needs to find the delta base. And it needs
to do that without having the idx.

The trick (the code really needs a bit more documentation) is
that gfi never writes anything but OFS_DELTA. So the core, even
though it does not have the corresponding idx file, does not
have to look up the object (in fact it does not even know what
object to look up for the base, it only knows the offset).
-
Yup.
Older fast-imports (pre OFS_DELTA) had to replicate a good chunk of
the unpack_entry() logic directly inside of fast-import. But it
later occurred to me that OFS_DELTA simplifies the code and lets
me reuse more of the existing sha1_file.c implementation.

So yea, fast-import only emits OFS_DELTA, and does so only so that it
can pull this delta-base-unpack trick on the core, without actually
giving the core the corresponding idx file first.

--
Shawn.
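
The offset-only lookup discussed above can be sketched as follows. This
is a standalone illustration of how an OFS_DELTA entry encodes the
distance back to its base, written from the pack format as I understand
it; it is not a copy of the code in sha1_file.c, and the function name
decode_ofs_delta_base() and its signature are invented for this example.
The point is that the base is found purely by arithmetic on the delta's
own position in the pack, so no idx lookup is needed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not git's API).
 * buf/len:      bytes that follow the object header of an OFS_DELTA entry
 * delta_offset: position of that entry within the pack
 * base_out:     receives the position of the delta base
 * Returns the number of bytes consumed, or 0 if the data is malformed.
 */
size_t decode_ofs_delta_base(const unsigned char *buf, size_t len,
                             uint64_t delta_offset, uint64_t *base_out)
{
    size_t used = 0;
    unsigned char c;
    uint64_t distance;

    if (len == 0)
        return 0;
    c = buf[used++];
    distance = c & 127;
    while (c & 128) {                   /* high bit set: more bytes follow */
        if (used == len)
            return 0;                   /* truncated input */
        if (distance + 1 > (UINT64_MAX >> 7))
            return 0;                   /* would overflow on the shift */
        distance += 1;                  /* bias applied per continuation */
        c = buf[used++];
        distance = (distance << 7) + (c & 127);
    }
    if (distance == 0 || distance >= delta_offset)
        return 0;                       /* base must lie earlier in the pack */
    *base_out = delta_offset - distance;
    return used;
}
```

For example, a single byte 0x64 in a delta stored at offset 1000 points at
a base stored at offset 900, and the two bytes 0x80 0x00 encode a distance
of 128.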
-
Oh, ok. I did try to grep for index_data, and didn't find anything that
looked bad, but the incestuous things that fast-import.c does just made me
worry - but I was too lazy to really follow it all. It's one of those

In that case, I think Shawn fixed it all, and we're all good, and it's not
just hidden well enough that it "just happens" to work.

Linus
