Re: Switching from CVS to GIT

From: Daniel Barkalow
Date: Monday, October 15, 2007 - 10:56 pm

On Tue, 16 Oct 2007, Eli Zaretskii wrote:


Say the project upstream has a file named "README", but, for some 
reason, it has ended up checked out as "readme" in your directory. Since 
your filesystem is case insensitive, it's supposed to be the same file, 
but when git goes through the list of files in the directory, it sees 
"readme", and there's nothing between reachable.h and read-cache.c in the 
list of tracked files. We've got a sorted list of filenames we're tracking 
along with their most-recently-seen content, and we want to merge the 
results of readdir with them, and this is obviously more straightforward 
if the filename that's the match for "README" is provided byte-for-byte 
the same, and therefore sorts the same.


We want both lists sorted, so that we can step through the pair together 
and always reach matches together. This requires that the equivalent names 
sort together, as well as comparing equal.
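To make the stepping-through concrete, here is a minimal sketch of merging two sorted name lists. It is an illustration only, not git's actual code; the function name merge_sorted and the labels "match", "missing" and "untracked" are made up for the example, and Python's default string ordering (by code point) stands in for a byte-wise sort.

```python
def merge_sorted(tracked, on_disk):
    """Walk two sorted name lists in lockstep and classify every name."""
    tracked = sorted(tracked)
    on_disk = sorted(on_disk)
    i = j = 0
    result = []
    while i < len(tracked) and j < len(on_disk):
        if tracked[i] == on_disk[j]:
            # Same name on both sides: the two cursors advance together.
            result.append(("match", tracked[i]))
            i += 1
            j += 1
        elif tracked[i] < on_disk[j]:
            # Tracked name has no counterpart on disk at this position.
            result.append(("missing", tracked[i]))
            i += 1
        else:
            # Name on disk that the tracked list does not know about.
            result.append(("untracked", on_disk[j]))
            j += 1
    result.extend(("missing", name) for name in tracked[i:])
    result.extend(("untracked", name) for name in on_disk[j:])
    return result


tracked = ["README", "reachable.h", "read-cache.c"]
same_case = merge_sorted(tracked, ["README", "reachable.h", "read-cache.c"])
wrong_case = merge_sorted(tracked, ["readme", "reachable.h", "read-cache.c"])
```

With the names spelled identically, every entry is a match. With "readme" on disk, the walk reports "README" as missing and "readme" as untracked, because the two spellings neither compare equal nor sort to the same position.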


Ah, that's helpful. We don't actually care too much about the particular 
info in stat; we just want to know quickly if the file has changed, so we 
can hash only the ones that have been touched and get the actual content 
changes.
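A rough sketch of that idea: remember a few stat fields per file and rehash only when they differ. The choice of fields (mtime and size), the cache layout, and the names stat_key and refresh are assumptions for this example; git's real index records more than this, and the hash here is a plain SHA-1 of the content.

```python
import hashlib
import os


def stat_key(path):
    """Cheap fingerprint of a file taken from stat alone (no content read)."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def refresh(cache, path):
    """Return (digest, rehashed). Content is read only if stat data changed."""
    key = stat_key(path)
    entry = cache.get(path)
    if entry is not None and entry[0] == key:
        # stat looks the same as last time: trust the remembered digest.
        return entry[1], False
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    cache[path] = (key, digest)
    return digest, True
```

The particular stat fields matter less than having some cheap signal that says "probably unchanged", so that the expensive read-and-hash step is skipped for most files.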


No, we get our memory with malloc like normal people. The mmap is because 
we want to feed files and parts of files to zlib, and mmap makes that 
easy.
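For illustration, a small sketch of handing part of a mapped file to zlib. The function name compress_range is hypothetical, and slicing the map copies the requested bytes, which a C implementation could avoid by passing a pointer into the mapping directly.

```python
import mmap
import zlib


def compress_range(path, offset, length):
    """Map a (non-empty) file read-only and compress one slice of it."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # The mapping behaves like a byte buffer, so picking out a part
            # of the file is just a slice; no explicit seek/read loop.
            return zlib.compress(m[offset:offset + length])
```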


Git is built around a database of objects, which includes "blobs" (file 
content), "trees" (directory structure), "commits" (history linkage), and 
"tags" (additional annotations). Each of these objects gets hashed, and is 
referenced by hash. So we need to be able to get the object with a given 
hash quickly, and write an object and take its hash (ideally, stream the 
write and find out the hash at the end, with the database key set at that 
point). Also, this database should be compressed effectively, because it 
ought to compress really well, since a lot of the blobs and trees are only 
slightly different from other blobs or trees (by whatever changes were 
made between that revision and other revisions).
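As a sketch of the "write an object, learn its key at the end" idea, here is a toy in-memory store. The class ObjectStore and its methods are invented for the example and the dictionary stands in for on-disk storage. The hashed header ("type size" followed by a NUL byte) matches how git names loose objects, which is why the size has to be known before streaming starts.

```python
import hashlib
import zlib


class ObjectStore:
    """Toy content-addressed store: objects are keyed by their own hash."""

    def __init__(self):
        self._objects = {}

    def write(self, kind, size, chunks):
        """Stream chunks through hash and compressor; return the final key."""
        header = f"{kind} {size}".encode() + b"\0"
        hasher = hashlib.sha1(header)
        comp = zlib.compressobj()
        parts = [comp.compress(header)]
        for chunk in chunks:
            hasher.update(chunk)
            parts.append(comp.compress(chunk))
        parts.append(comp.flush())
        # Only now, with all data seen, is the database key known.
        key = hasher.hexdigest()
        self._objects[key] = b"".join(parts)
        return key

    def read(self, key):
        """Return (kind, body) for the object stored under key."""
        raw = zlib.decompress(self._objects[key])
        header, _, body = raw.partition(b"\0")
        kind, _size = header.decode().split(" ")
        return kind, body


store = ObjectStore()
key = store.write("blob", 5, [b"he", b"llo"])
```

Since the key is derived from the content, writing the same content twice yields the same key, so identical blobs and trees are stored once.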

The current implementation of the persistent storage of this database is a 
bit complicated, with the goal being that creating objects is really fast, 
and looking up objects doesn't degrade too quickly, and there are 
optimization operations available that take some time and speed up future 
lookups and reduce the storage overhead (especially so that data can be 
transferred efficiently). The tricky thing is that, while the optimization 
process is running, other programs may be reading the database, so (1) the 
files that are no longer needed, because better-optimized versions are in 
place, may be open in another task, and (2) complete and correct new 
files have to appear and be such that pre-existing tasks will find them 
before old files can be removed. The optimization creates "pack files" and 
"pack indices", where the pack file has a lot of objects with delta 
compression between them and zlib compression of them, and the index files 
tell where everything in the pack file is. So we mmap the index files to 
search through, and mmap portions of the pack files to get the data out 
of, and we may be using them as they're replaced with more comprehensive 
pack files by another task.
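The split between a pack and its index can be sketched with a toy format. This is deliberately not git's real pack or index layout, and there is no delta compression here: the "pack" is a concatenation of zlib streams, and the "index" is a sorted list of (key, offset, length) entries searched by bisection. The names build_pack and lookup are made up for the example.

```python
import bisect
import zlib


def build_pack(objects):
    """objects: dict mapping key -> bytes. Returns (pack_bytes, index)."""
    pack = bytearray()
    index = []
    for key in sorted(objects):
        compressed = zlib.compress(objects[key])
        # The index records where each object's data lives in the pack.
        index.append((key, len(pack), len(compressed)))
        pack += compressed
    return bytes(pack), index


def lookup(pack, index, key):
    """Binary-search the sorted index, then read only that slice of the pack."""
    pos = bisect.bisect_left(index, (key,))
    if pos == len(index) or index[pos][0] != key:
        return None
    _, offset, length = index[pos]
    return zlib.decompress(pack[offset:offset + length])


pack, index = build_pack({"aa": b"one", "bb": b"two"})
```

A lookup touches a few index entries and one region of the pack, which is the access pattern that makes mapping both files convenient.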

Now, it's entirely possible that a completely different database 
implementation would be better on Windows, but our current one does a lot 
of creating files under temporary names, then renaming them to the names 
where they'll be seen (since rename is atomic under POSIX, partial files 
are never seen by other tasks). Also, once we have new files in place, we 
unlink the files that they replace, so that new tasks will use the new 
ones and tasks that already have old ones open can still get the data out 
of them. Also, the files generally get mmaped, 
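A minimal sketch of the write-then-rename pattern, assuming a POSIX-like filesystem. The function name publish and the "tmp_" prefix are hypothetical. os.replace is used for the rename step; the temporary file is created in the same directory so the rename stays on one filesystem.

```python
import os
import tempfile


def publish(directory, final_name, data):
    """Write data under a temporary name, then rename it into place."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="tmp_")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        final_path = os.path.join(directory, final_name)
        # Readers see either no file (or the old one) or the complete new
        # one, never a partially written file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

On POSIX, a reader that already has the old file open keeps access to its data even after the old name is unlinked or replaced.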


I'm probably the one missing something here; I don't really know anything 
about Windows, and I only know what code other people have had problems 
porting. Mostly what we use for IPC is pipelines, so, if they work well, I 
don't know what the problem is.

	-Daniel
*This .sig left intentionally blank*