Re: Handling large files with GIT

To: Ben Clifford <benc@...>
Cc: Martin Langhoff <martin.langhoff@...>, Florian Weimer <fw@...>, <git@...>
Date: Monday, February 13, 2006 - 12:57 am

On Sun, 12 Feb 2006, Linus Torvalds wrote:

Btw, one thing to realize is that git is inherently a lot better at 
handling lots of files in _subdirectories_, especially if those 
subdirectories don't change.

I've never used maildir layout, but if it is a couple of large _flat_ 
subdirectories, git will potentially handle that a lot worse than if you 
have a hierarchy of directories.

I say "potentially", because if the directories are all mutable and 
change, then the flat approach is better. But if they tend to have some 
kind of stability, a lot of git operations (diffing and merging in 
particular) are able to see that two subdirectories are 100% equal, and 
will entirely skip them.

This is a large part of why git performs well on the kernel. Most merges 
don't actually touch all - or even a very big percentage - of the more than 
a thousand subdirectories in the kernel. Git can quickly see and ignore the 
whole subdirectory when that happens - the SHA1 is exactly the same, so 
git knows that every file under that subdirectory (and every recursive 
directory) is the same.
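
As a rough illustration of the subtree skipping described above, here is a 
minimal Python sketch. It is not git's actual code or object format: the 
build() and diff() helpers and the way the ids are hashed are assumptions 
made up for this example. The only point is that a directory's id is derived 
from the ids of its entries, so equal ids let the diff skip the whole subtree.

```python
import hashlib

def build(entry):
    """Return (id, children); children is None for a file.

    Hypothetical stand-in for blob and tree objects, not git's real format.
    """
    if isinstance(entry, bytes):
        return (hashlib.sha1(b"blob\0" + entry).hexdigest(), None)
    children = {name: build(child) for name, child in entry.items()}
    h = hashlib.sha1(b"tree\0")
    for name in sorted(children):
        # A directory id depends only on entry names and entry ids.
        h.update(f"{name}\0{children[name][0]}\n".encode())
    return (h.hexdigest(), children)

def diff(old, new, prefix="", stats=None):
    """Return changed paths, counting entry comparisons in stats."""
    if stats is None:
        stats = {"compared": 0}
    changed = []
    old_kids, new_kids = old[1] or {}, new[1] or {}
    for name in sorted(set(old_kids) | set(new_kids)):
        stats["compared"] += 1
        a, b = old_kids.get(name), new_kids.get(name)
        path = prefix + name
        if a is not None and b is not None and a[0] == b[0]:
            continue  # same id: skip the file, or the entire subtree
        both_dirs = (a is not None and b is not None
                     and a[1] is not None and b[1] is not None)
        if both_dirs:
            changed += diff(a, b, path + "/", stats)
        else:
            changed.append(path)
    return changed

old = build({"README": b"hi",
             "drivers": {"a.c": b"1", "b.c": b"2"},
             "fs": {"x.c": b"3", "y.c": b"4"}})
new = build({"README": b"hi",
             "drivers": {"a.c": b"1", "b.c": b"2"},
             "fs": {"x.c": b"3 changed", "y.c": b"4"}})
stats = {"compared": 0}
print(diff(old, new, stats=stats), stats)
```

In this toy tree the unchanged "drivers" directory costs a single id 
comparison, and nothing underneath it is ever looked at.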

In contrast, if you have a million files in one directory, and 10 of them 
changed, git will still have to check the SHA1s for matches for the other 
999,990 files. Which is going to be slow.
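
To put rough numbers on that contrast, here is a back-of-the-envelope sketch 
in Python. The 1000 x 1000 split of the million files is an assumption picked 
purely for illustration, as are the function names; the counts are entry 
comparisons in a simplified model, not measurements of git.

```python
def flat_comparisons(total_files):
    # One flat directory: every entry's id has to be compared.
    return total_files

def hierarchical_comparisons(num_dirs, files_per_dir, changed_dirs):
    # One comparison per subdirectory id at the top level, plus a full
    # scan of only those subdirectories whose id differs.
    return num_dirs + changed_dirs * files_per_dir

# A million files with 10 changed, as in the example above.
flat = flat_comparisons(1_000_000)

# Assumed layout: 1000 subdirectories of 1000 files each, with the 10
# changed files landing in 10 different subdirectories (the worst case).
tiered = hierarchical_comparisons(1000, 1000, 10)

print(flat, tiered)
```

Under these assumptions the hierarchical layout needs 1000 + 10 * 1000 = 
11,000 comparisons instead of 1,000,000.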

That said, I suspect there's space for optimization. 

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
