Hi Geert,

I wrote the blob-size-threshold patch last year to which
Jakub Narebski referred.

I think there will eventually be a way to better handle large
objects in Git.  Some possible elements:

* Loose objects have a format which can be streamed directly
  into or out of packs.  This avoids a round-trip through zlib,
  which is a big deal for big objects.  This was the effect of
  the "new" loose object format to which Shawn referred.  This
  was removed apparently because it was ugly and/or difficult to
  maintain, which I didn't understand since I didn't personally
  suffer.

* Loose objects actually _are_ singleton packs, but saved in
  .git/objects/xx.  Workable, but would never happen due to the
  extra pack header at the beginning it would add.  This takes
  advantage of the existing pack-to-pack streaming.

* Large loose objects are never deltified and/or never packed.
  The latter was the focus of my patch.

* Large loose objects are placed in their own packs in
  .git/packs.  Doesn't work for me since I have too many large
  objects, thus slowing down _all_ pack operations.

All this is complicated by the dual nature of packfiles -- they
are used as a "wire format" for serial transmission, as well as
a database format for random access.

The "magic" entropy detection idea is cute, but probably not
needed -- using the blob size should be sufficient.  Trying to
(re)compress an incompressible _smallish_ blob is probably not
worth trying to avoid, and any computation on sufficiently large
blobs should be avoided.

Hopefully I can return to this problem after New Year's.  And
perhaps with the expanding Git userbase, more people will have
"large blob" problems ;-) and there will be more interest in
better addressing this usage pattern.

At the moment, I am thinking about how to better structure git's
handling of very large repositories in a team entirely connected
by high-speed LAN.  It seems a method where each user has a
repository with deep history, but shallow blobs, would be ideal,
but that's also very different from how git does things now.

Have fun,

Dana How

On Wed, Aug 13, 2008 at 9:01 AM, Geert Bosch <bosch@adacore.com> wrote:

-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell
