Say the upstream project has the file as "README", but, for some
reason, it has ended up checked out as "readme" in your directory. Since
your filesystem is case insensitive, it's supposed to be the same file,
but when git goes through the list of files in the directory, it sees
"readme", and there's no such name among the lowercase entries near
reachable.h and read-cache.c in the list of tracked files. We've got a
sorted list of filenames we're tracking, along with their
most-recently-seen content, and we want to merge the results of readdir
with it. That is obviously more straightforward if the filename that
matches "README" comes back byte-for-byte the same, and therefore sorts
the same.
We want both lists sorted, so that we can step through the pair together
and always reach matches together. This requires that the equivalent names
sort together, as well as comparing equal.
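
To make the stepping-through concrete, here is a rough sketch in Python
(not git's actual code, which is C). The function name merge_walk and the
status labels are made up for illustration; the only assumption is that
both lists are sorted byte-wise.

```python
def merge_walk(tracked, on_disk):
    """Step through two sorted name lists together, yielding (status, name)."""
    i = j = 0
    while i < len(tracked) and j < len(on_disk):
        if tracked[i] == on_disk[j]:
            # Same name on both sides: the entries are reached together.
            yield ("match", tracked[i])
            i += 1
            j += 1
        elif tracked[i] < on_disk[j]:
            # Tracked name has no counterpart in the directory.
            yield ("missing", tracked[i])
            i += 1
        else:
            # Directory name has no counterpart in the tracked list.
            yield ("untracked", on_disk[j])
            j += 1
    for name in tracked[i:]:
        yield ("missing", name)
    for name in on_disk[j:]:
        yield ("untracked", name)


# Hypothetical data: upstream tracks "README", the directory has "readme".
tracked = sorted(["README", "reachable.h", "read-cache.c"])
on_disk = sorted(["reachable.h", "read-cache.c", "readme"])
result = list(merge_walk(tracked, on_disk))
```

With that data, "README" comes out as missing and "readme" as untracked,
because the two names land at opposite ends of the walk and are never
compared with each other.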
Ah, that's helpful. We don't actually care too much about the particular
info in stat; we just want to know quickly if the file has changed, so we
can hash only the ones that have been touched and get the actual content
changes.
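
As a rough Python sketch of that idea (again, not git's code): remember a
cheap stat fingerprint per file, and only hash the files whose fingerprint
differs. The helper names and the choice of size plus mtime as the
fingerprint are assumptions for the example; the real index records more
stat fields than this.

```python
import hashlib
import os
import tempfile


def stat_key(path):
    """Cheap fingerprint of a file; any change here means 'look closer'."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def changed_paths(cache, paths):
    """Return the paths whose content differs from what the cache recorded.

    cache maps path -> (stat_key, content_hash) and is updated in place.
    """
    changed = []
    for path in paths:
        key = stat_key(path)
        cached = cache.get(path)
        if cached is not None and cached[0] == key:
            continue  # stat info unchanged: skip the expensive hash
        with open(path, "rb") as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        if cached is None or cached[1] != digest:
            changed.append(path)
        cache[path] = (key, digest)
    return changed


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "README")
    with open(path, "wb") as f:
        f.write(b"v1\n")
    cache = {}
    first = changed_paths(cache, [path])    # never seen before: hashed
    second = changed_paths(cache, [path])   # stat unchanged: not hashed
    with open(path, "wb") as f:
        f.write(b"version 2\n")
    third = changed_paths(cache, [path])    # size changed: hashed again
```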
No, we get our memory with malloc like normal people. The mmap is because
we want to feed files and parts of files to zlib, and mmap makes that
easy.
Git is built around a database of objects, which includes "blobs" (file
content), "trees" (directory structure), "commits" (history linkage), and
"tags" (additional annotations). Each of these objects gets hashed, and is
referenced by hash. So we need to be able to get the object with a given
hash quickly, and write an object and take its hash (ideally, stream the
write and find out the hash at the end, with the database key set at that
point). Also, this database should be compressed effectively, because it
ought to compress really well, since a lot of the blobs and trees are only
slightly different from other blobs or trees (by whatever changes were
made between that revision and other revisions).
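
Here is a small Python sketch of "stream the write and find out the hash
at the end". The object header ("blob", a space, the size, a NUL byte) and
the use of SHA-1 and zlib match what git does for loose objects; the
function name write_object and the flat directory layout are
simplifications made up for the example.

```python
import hashlib
import os
import tempfile
import zlib


def write_object(objdir, kind, chunks, size):
    """Stream an object into objdir and return its hex hash (its key)."""
    header = b"%s %d\0" % (kind, size)
    sha = hashlib.sha1(header)
    comp = zlib.compressobj()
    # Write under a temporary name; the final name is not known yet.
    fd, tmp = tempfile.mkstemp(dir=objdir)
    with os.fdopen(fd, "wb") as out:
        out.write(comp.compress(header))
        for chunk in chunks:
            sha.update(chunk)
            out.write(comp.compress(chunk))
        out.write(comp.flush())
    name = sha.hexdigest()
    # Only now is the key known, so only now does the object get its name.
    os.replace(tmp, os.path.join(objdir, name))
    return name


with tempfile.TemporaryDirectory() as objdir:
    name = write_object(objdir, b"blob", [b"hel", b"lo\n"], 6)
    with open(os.path.join(objdir, name), "rb") as f:
        stored = zlib.decompress(f.read())
    empty_name = write_object(objdir, b"blob", [], 0)
```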
The current implementation of the persistent storage of this database is a
bit complicated, with the goal being that creating objects is really fast,
and looking up objects doesn't degrade too quickly, and there are
optimization operations available that take some time and speed up future
lookups and reduce the storage overhead (especially so that data can be
transferred efficiently). The tricky thing is that, while the optimization
process is running, other programs may be reading the database, so (1) the
files that are no longer needed, because better-optimized versions are in
place, may be open in another task, and (2) complete and correct new
files have to appear and be such that pre-existing tasks will find them
before old files can be removed. The optimization creates "pack files" and
"pack indices", where the pack file has a lot of objects with delta
compression between them and zlib compression of them, and the index files
tell where everything in the pack file is. So we mmap the index files to
search through, and mmap portions of the pack files to get the data out
of, and we may be using them as they're replaced with more comprehensive
pack files by another task.
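
To give a feel for "mmap the index and search through it", here is a toy
sketch in Python. The record layout (a 20-byte hash followed by an 8-byte
offset, sorted by hash) is invented for the example and is NOT git's real
pack index format; only the approach of binary-searching a mapped file is
the point.

```python
import hashlib
import mmap
import os
import struct
import tempfile

# Toy record: 20-byte hash + 8-byte big-endian offset, no padding.
RECORD = struct.Struct(">20sQ")


def write_toy_index(path, entries):
    """Write (digest, offset) pairs sorted by digest."""
    with open(path, "wb") as f:
        for digest, offset in sorted(entries):
            f.write(RECORD.pack(digest, offset))


def lookup(index_map, digest):
    """Binary-search the mapped index; return the offset or None."""
    lo, hi = 0, len(index_map) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        key, offset = RECORD.unpack_from(index_map, mid * RECORD.size)
        if key == digest:
            return offset
        if key < digest:
            lo = mid + 1
        else:
            hi = mid
    return None


with tempfile.TemporaryDirectory() as d:
    idx = os.path.join(d, "toy.idx")
    digests = [hashlib.sha1(b"object %d" % n).digest() for n in range(100)]
    write_toy_index(idx, [(dg, n * 64) for n, dg in enumerate(digests)])
    with open(idx, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        found = lookup(m, digests[42])
        absent = lookup(m, hashlib.sha1(b"no such object").digest())
```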
Now, it's entirely possible that a completely different database
implementation would be better on Windows, but our current one does a lot
of creating files under temporary names and then renaming them to names
where they'll be seen (since rename is atomic under POSIX, so partial
files are never seen by other tasks). Also, once we have new files in
place, we unlink the files that they replace, so that new tasks will use
the new ones and tasks that already have old ones open can still get the
data out
of them. Also, the files generally get mmaped,
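
The rename-then-unlink pattern looks roughly like this in Python. This is
a sketch that assumes POSIX semantics (on Windows the unlink of an open
file is exactly the sort of thing that may not work); the function name
publish and the file names are made up.

```python
import os
import tempfile


def publish(directory, name, data):
    """Write data under a temporary name, then rename it into place."""
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as out:
        out.write(data)
        out.flush()
        os.fsync(out.fileno())
    final = os.path.join(directory, name)
    # Other tasks see either no file or the complete file, never a partial.
    os.rename(tmp, final)
    return final


with tempfile.TemporaryDirectory() as d:
    old = publish(d, "pack-old", b"old data")
    reader = open(old, "rb")          # a task that already has it open
    publish(d, "pack-new", b"old data plus more")
    os.unlink(old)                    # new tasks can no longer find it
    still_readable = reader.read()    # the old task can still read it
    visible = sorted(os.listdir(d))
    reader.close()
```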
I'm probably the one missing something here; I don't really know anything
about Windows, and I only know what code other people have had problems
porting. Mostly what we use for IPC is pipelines, so, if they work well, I
don't know what the problem is.
-Daniel
*This .sig left intentionally blank*