Nicolas Pitre wrote:
> On Wed, 13 Aug 2008, Shawn O. Pearce wrote:
>
>> Nicolas Pitre wrote:
>>> Well, we are talking about 50MB which is not that bad.
>> I think we're closer to 100MB here due to the extra overheads
>> I just alluded to above, and which weren't in your 104 byte
>> per object figure.
>
> Sure. That should still be workable on a machine with 256MB of RAM.
>
>>> However there is a point where we should be realistic and just admit
>>> that you need a sufficiently big machine if you have huge repositories
>>> to deal with. Git should be fine serving pull requests with relatively
>>> little memory usage, but anything else such as the initial repack simply
>>> requires enough RAM to be effective.
>> Yea. But it would also be nice to be able to just concat packs
>> together. Especially if the repository in question is an open source
>> one and everything published is already known to be in the wild,
>> as say it is also available over dumb HTTP. Yea, I know people
>> like the 'security feature' of the packer not including objects
>> which aren't reachable.
>
> It is not only that, even if it is a point I consider important. If you
> end up with 10 packs, it is likely that a base object in each of those
> packs could simply be a delta against a single common base object, and
> therefore the amount of data to transfer might be up to 10 times higher
> than necessary.
>
[cut]
>> This is also true for many internal corporate repositories.
>> Users probably have full read access to the object database anyway,
>> and maybe even have direct write access to it. Doing the object
>> enumeration there is pointless as a security measure.
>
> It is good for network bandwidth efficiency as I mentioned.
>

As a corporate git user, I can say that I'm very rarely worried
about how much data gets sent over our in-office gigabit network.
My primary concern with server-side git is CPU- and IO-heavy
operations, as we run the entire machine in a VMware guest OS,
which just plain sucks at such things.

With that in mind, a config variable in /etc/gitconfig would
work wonderfully for that situation, as our central watering
hole only ever serves locally.

>> I'm too busy to write a pack concat implementation proposal, so
>> I'll just shut up now. But it wouldn't be hard if someone wanted
>> to improve at least the initial clone serving case.
>
> A much better solution would consist of finding just _why_ object
> enumeration is so slow. This is indeed my biggest gripe with git
> performance at the moment.
>
> |nico@xanadu:linux-2.6> time git rev-list --objects --all > /dev/null
> |
> |real 0m21.742s
> |user 0m21.379s
> |sys 0m0.360s
>
> That's way too long for 1030198 objects (roughly 47k objects/sec). And
> it gets even worse with the gcc repository:
>
> |nico@xanadu:gcc> time git rev-list --objects --all > /dev/null
> |
> |real 1m51.591s
> |user 1m50.757s
> |sys 0m0.810s
>
> That's for 1267993 objects, or about 11400 objects/sec.
>
> Clearly something is not scaling here.
>

What are the different packing options for the two repositories?
A longer delta chain and a larger pack window would increase the
enumeration time, wouldn't they?
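
For reference, here is a small sketch that redoes the throughput
arithmetic from the two timings quoted above. The object counts and
"real" times are taken from those runs; the helper name
objects_per_second is only an illustration, not anything from git.

```python
def objects_per_second(num_objects, minutes, seconds):
    """Return enumeration throughput for an object count and a wall-clock time."""
    elapsed = minutes * 60 + seconds  # total wall-clock seconds
    return num_objects / elapsed


# Object counts and "real" times as quoted from the two rev-list runs.
linux_rate = objects_per_second(1030198, 0, 21.742)
gcc_rate = objects_per_second(1267993, 1, 51.591)

print(f"linux-2.6: {linux_rate:,.0f} objects/sec")
print(f"gcc:       {gcc_rate:,.0f} objects/sec")
print(f"slowdown:  {linux_rate / gcc_rate:.1f}x")
```

The two rates come out at a bit over 47k and a bit over 11k objects/sec,
so the gcc repository enumerates roughly four times slower per object
even though it holds only about 23% more objects.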
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231