On Aug 13, 2008, at 10:35, Nicolas Pitre wrote:

This is fine, as long as we're not trying to create deltas of the large objects, or do other things that require keeping the inflated data in memory. Yes, you potentially lose in terms of disk space, but you avoid the large memory footprint during pack generation. For very large blobs, it is best to degenerate to having each revision of each file on its own (whether we call it a single-file pack, a loose object, or whatever). That way, the large file can stay immutable on disk and will only need to be accessed during checkout. GIT will then scale with good performance until we run out of disk space.

The alternative is that people need to keep large binary data out of their SCMs and handle it on the side. Consider a large web site where I have all scripts, HTML content, as well as a few movies to manage. The movies basically should be copied and stored, only to be accessed when a checkout (or push) is requested. If we mix the very large movies with the 100,000 objects representing the web pages, the resulting pack will become unwieldy and slow even to just copy around during repacks.

Why? The only time we'd need to access their contents is for checkout or when pushing across the network. These should all be streaming operations with a small memory footprint.

Agreed, but still, at least very large objects. If I have a 600MB file in my repository, it should just not get in the way. If it gets copied around during each repack, that just wastes I/O time for no good reason. Even worse, it causes incremental backups or filesystem checkpoints to become much more expensive. Just leaving large files alone as immutable objects on disk avoids all these issues.

-Geert
-- 
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
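The claim that checkout and push "should all be streaming operations with a small memory footprint" can be illustrated with a minimal Python sketch. It assumes the standard loose-object layout: a zlib-deflated stream whose inflated form is a `<type> <size>\0` header followed by the content, with the object id being the SHA-1 of that whole inflated stream. The function names and the 64 KiB chunk size are hypothetical choices for illustration. This is not git's own code. The point is that memory use stays bounded by the chunk size no matter how large the blob is.

```python
import hashlib
import zlib
from typing import BinaryIO, Iterator, Tuple

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size; any bounded value works
MAX_HEADER = 64         # "<type> <size>" is short; refuse runaway headers


def _inflate_chunks(src: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield inflated data in pieces of at most chunk_size bytes (plus a final flush)."""
    inflater = zlib.decompressobj()
    while not inflater.eof:
        buf = src.read(chunk_size)
        if not buf:
            break
        while buf:
            # max_length caps the output, so a highly compressible input
            # cannot balloon into one huge buffer.
            out = inflater.decompress(buf, chunk_size)
            if out:
                yield out
            buf = inflater.unconsumed_tail  # input not yet processed
    tail = inflater.flush()  # any output zlib still holds internally
    if tail:
        yield tail


def stream_loose_object(src: BinaryIO, dst: BinaryIO,
                        chunk_size: int = CHUNK_SIZE) -> Tuple[str, int, str]:
    """Copy a loose object's content from src to dst without holding it in memory.

    Returns (object type, content length, SHA-1 hex of header + content).
    """
    sha = hashlib.sha1()
    header = bytearray()
    in_header = True
    body_len = 0
    for piece in _inflate_chunks(src, chunk_size):
        sha.update(piece)  # object id covers the header as well as the content
        if in_header:
            nul = piece.find(b"\0")
            if nul < 0:
                header += piece
                if len(header) > MAX_HEADER:
                    raise ValueError("object header too long")
                continue
            header += piece[:nul]
            piece = piece[nul + 1:]
            in_header = False
        dst.write(piece)
        body_len += len(piece)
    if in_header:
        raise ValueError("truncated object: no header terminator")
    obj_type, _, size = bytes(header).partition(b" ")
    if int(size) != body_len:
        raise ValueError("size in header does not match content length")
    return obj_type.decode(), body_len, sha.hexdigest()
```

A caller would open the object file in binary mode and pass an output file (or a socket wrapper) as `dst`. Only one chunk of compressed input and one chunk of inflated output are live at any time, which is the property that lets a 600MB blob be checked out without a 600MB allocation.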
