Hi Geert,
I wrote the blob-size-threshold patch last year to which
Jakub Narebski referred.

I think there will eventually be a way to better handle large
objects in Git. Some possible elements:
* Loose objects have a format which can be streamed
directly into or out of packs. This avoids a round-trip through zlib,
which is a big deal for big objects. This was the effect of the "new"
loose object format to which Shawn referred. This was
removed apparently because it was ugly and/or difficult
to maintain, which I didn't understand since I didn't personally
suffer.
* Loose objects actually _are_ singleton packs, but saved
in .git/objects/xx. Workable, but would never happen due to
the extra pack header at the beginning it would add. This
takes advantage of the existing pack-to-pack streaming.
* Large loose objects are never deltified and/or never packed.
The latter was the focus of my patch.
* Large loose objects are placed in their own packs in .git/packs .
Doesn't work for me since I have too many large objects,
thus slowing down _all_ pack operations.
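As a rough illustration of the third element above (large objects never deltified and/or never packed), here is a minimal Python sketch of a size-only policy. It is not the code from the patch and not how Git implements anything; the names `plan_blob`, `split_by_size`, `BlobPlan` and the 32 MiB cutoff are all hypothetical, chosen only to show the shape of the decision.

```python
from dataclasses import dataclass

# Hypothetical cutoff for illustration only; not a value taken from the patch.
BIG_BLOB_THRESHOLD = 32 * 1024 * 1024  # bytes


@dataclass(frozen=True)
class BlobPlan:
    deltify: bool  # search for a delta base for this blob?
    pack: bool     # copy this blob into a shared packfile?


def plan_blob(size, threshold=BIG_BLOB_THRESHOLD):
    """Decide how to treat a blob using its size alone."""
    if size >= threshold:
        # Large blob: leave it loose and skip the delta search entirely.
        return BlobPlan(deltify=False, pack=False)
    # Smaller blob: normal treatment, even if it turns out to be incompressible.
    return BlobPlan(deltify=True, pack=True)


def split_by_size(sizes, threshold=BIG_BLOB_THRESHOLD):
    """Partition a {object name: size} mapping into (to_pack, keep_loose) lists."""
    to_pack, keep_loose = [], []
    for name, size in sorted(sizes.items()):
        if plan_blob(size, threshold).pack:
            to_pack.append(name)
        else:
            keep_loose.append(name)
    return to_pack, keep_loose
```

For example, with a cutoff of 100 bytes, `split_by_size({"a": 10, "b": 100, "c": 5}, threshold=100)` puts `a` and `c` in the pack list and leaves `b` loose. The point of the sketch is only that the decision needs nothing but the object size, so no content inspection is required.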
All this is complicated by the dual nature of packfiles --
they are used as a "wire format" for serial transmission,
as well as a database format for random access.

The "magic" entropy detection idea is cute, but probably not
needed -- using the blob size should be sufficient. Trying to
(re)compress an incompressible _smallish_ blob is probably
not worth trying to avoid, and any computation on sufficiently large
blobs should be avoided.

Hopefully I can return to this problem after New Year's. And
perhaps with the expanding Git userbase, more people will have
"large blob" problems ;-) and there will be more interest in
better addressing this usage pattern.

At the moment, I am thinking about how to better structure
git's handling of very large repositories in a team entirely
connected by high-speed LAN. It seems a method where
each user has a repository with deep history, but shallow
blobs, would be ideal, but that's also very different from
how git does things now.

Have fun,
Dana How
On Wed, Aug 13, 2008 at 9:01 AM, Geert Bosch wrote:
--
Dana L. How danahow@gmail.com +1 650 804 5991 cell
