Hi Geert,

I wrote the blob-size-threshold patch last year to which Jakub Narebski
referred. I think there will eventually be a way to better handle large
objects in Git. Some possible elements:

* Loose objects have a format which can be streamed directly into or
  out of packs. This avoids a round-trip through zlib, which is a big
  deal for big objects. This was the effect of the "new" loose object
  format to which Shawn referred. This was removed apparently because
  it was ugly and/or difficult to maintain, which I didn't understand
  since I didn't personally suffer.

* Loose objects actually _are_ singleton packs, but saved in
  .git/objects/xx. Workable, but would never happen due to the extra
  pack header at the beginning it would add. This takes advantage of
  the existing pack-to-pack streaming.

* Large loose objects are never deltified and/or never packed. The
  latter was the focus of my patch.

* Large loose objects are placed in their own packs in .git/packs.
  Doesn't work for me since I have too many large objects, thus slowing
  down _all_ pack operations.

All this is complicated by the dual nature of packfiles -- they are
used as a "wire format" for serial transmission, as well as a database
format for random access.

The "magic" entropy detection idea is cute, but probably not needed --
using the blob size should be sufficient. Trying to (re)compress an
incompressible _smallish_ blob is probably not worth trying to avoid,
and any computation on sufficiently large blobs should be avoided.

Hopefully I can return to this problem after New Year's. And perhaps
with the expanding Git userbase, more people will have "large blob"
problems ;-) and there will be more interest in better addressing this
usage pattern.

At the moment, I am thinking about how to better structure git's
handling of very large repositories in a team entirely connected by
high-speed LAN.
It seems a method where each user has a repository with deep history,
but shallow blobs, would be ideal, but that's also very different from
how git does things now.

Have fun,

Dana How

On Wed, Aug 13, 2008 at 9:01 AM, Geert Bosch <bosch@adacore.com> wrote:

--
Dana L. How danahow@gmail.com +1 650 804 5991 cell
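
As a rough illustration of the size-only cutoff discussed above (large
blobs are never deltified or packed, and no entropy detection is
attempted), here is a minimal sketch. It is not the actual patch: the
names BIG_BLOB_THRESHOLD, BlobPlan, plan_for_blob and split_by_size are
all hypothetical, and the 32 MiB cutoff is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical cutoff in bytes; not the value used by the actual patch.
BIG_BLOB_THRESHOLD = 32 * 1024 * 1024


@dataclass(frozen=True)
class BlobPlan:
    deltify: bool  # try delta compression against other objects
    pack: bool     # copy the object into a packfile


def plan_for_blob(size: int, threshold: int = BIG_BLOB_THRESHOLD) -> BlobPlan:
    """Choose a treatment for a blob from its size alone (no entropy test)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    is_large = size >= threshold
    # Large blobs stay loose and are never deltified; small ones are
    # handled as usual, even if they turn out to be incompressible.
    return BlobPlan(deltify=not is_large, pack=not is_large)


def split_by_size(sizes: Dict[str, int],
                  threshold: int = BIG_BLOB_THRESHOLD
                  ) -> Tuple[List[str], List[str]]:
    """Partition object names into (to_pack, keep_loose), preserving order."""
    to_pack, keep_loose = [], []
    for name, size in sizes.items():
        (to_pack if plan_for_blob(size, threshold).pack else keep_loose).append(name)
    return to_pack, keep_loose
```

The point of the sketch is that the decision needs nothing but the
object size, which is already known before any zlib or delta work
starts. Something like split_by_size({"small.c": 4096, "video.bin": 2**31})
would put only small.c in the list of objects to pack.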
