Optimizing cloning of a high object count repository

From: Resul Cetin
Date: Saturday, December 13, 2008 - 8:24 am

Hi,
there are currently several ideas for moving Gentoo's CVS repository to 
another SCM. Tests so far show that svn would not improve anything (it does 
even worse in most performance and size benchmarks). Another idea is 
to move to git. It looks really promising in size benchmarks, but cloning 
seems nearly impossible. The current test repository is available at 
git://git.overlays.gentoo.org/exp/gentoo-x86.git and is around 900MB in size 
and has 4696137 objects. Counting the objects on the server takes ages, and 
compressing them takes much longer still.
The Linux repository seems to be somewhat smaller, but it is in the same range 
for object count and repository size, yet its clones are much faster. Is there 
any way to optimize server operations such as counting and compressing objects 
to get the same speed as we get from git.kernel.org (which does it in nearly 
no time, so the only limiting factor seems to be my bandwidth)?
The only other information I have is that Robin H. Johnson made a single 
~910MiB pack for the whole repository.

Thx in advance,
	Resul
--

From: Nguyen Thai Ngoc Duy
Date: Saturday, December 13, 2008 - 8:46 am

Make yearly packed repository snapshots and publish them via http.
People can wget the latest snapshot, then pull updates later.
-- 
Duy
--

From: Resul Cetin
Date: Saturday, December 13, 2008 - 9:14 am

On Saturday 13 December 2008 16:46:50 you wrote:
That would be a workaround, but it doesn't explain why git.kernel.org delivers 
Torvalds' repository without any noticeable counting and compressing time. Maybe 
it has something to do with the config I found inside the repository:
http://git.overlays.gentoo.org/gitroot/exp/gentoo-x86.git/config
It says that it isn't a bare repository.
Before I forget: I was wrong that it is a single 910MB file. Somebody seems to 
have repacked it into 7 separate packs.

Regards,
	Resul
--

From: Jean-Luc Herren
Date: Saturday, December 13, 2008 - 9:44 am

If I remember right, git.kernel.org is quite a beefy machine.  But
then again it has a lot more traffic too.  It might be interesting
to know what machine you're on, compared to git.kernel.org.

jlh
--

From: Resul Cetin
Date: Saturday, December 13, 2008 - 11:20 am

I don't know what type of machine git.overlays.g.o is, but my Athlon64 3500+ 
with 4GB RAM has exactly the same problem without any other load. I made a 
clone over http and have made no other changes to the repository so far.

http://git.overlays.gentoo.org/gitroot/exp/gentoo-x86.git/ is the http clone 
url.

I will try a few things to reduce the time spent before anything is sent... If 
anyone has ideas on how to do that...

Regards,
	Resul

--

From: Nicolas Pitre
Date: Saturday, December 13, 2008 - 11:56 am

That's not relevant.

The counting time is a bit unfortunate (although I have plans to speed 
that up, if only I can find the time).

You should be able to skip the compression time entirely though, if you 
do repack the repository first.  And you want it to be as tightly packed 
as possible for public access.  I'm currently cloning it and the 
counting phase is not _that_ bad compared to the compression phase.  Try 
something like 'git repack -a -f -d --window=200' and let it run 
overnight if necessary.  You need to do this only once, and preferably 
on a machine with lots of RAM, and preferably on a 64-bit machine.  Once 
this is done then things should go much more smoothly afterwards.
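
As a rough illustration of what that repack does, here is a sketch on a small scratch repository (the directory and file names are invented for the example). On a repository the size of gentoo-x86 the same command would of course run far longer.

```shell
#!/bin/sh
# Sketch only: demonstrates the repack flags on a throwaway repository.
set -e
repo=$(mktemp -d)
git init -q "$repo"

# Create a few revisions so there is something to pack.
for i in 1 2 3; do
    echo "revision $i" > "$repo/file.txt"
    git -C "$repo" add file.txt
    git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "rev $i"
done

# -a  pack everything reachable into a single pack
# -f  recompute deltas instead of reusing existing ones
# -d  remove packs and loose objects made redundant by the new pack
# --window=200  consider more candidate objects per delta (slower, tighter)
git -C "$repo" repack -a -f -d -q --window=200

# count-objects -v reports loose objects ("count") and the number of packs.
packs=$(git -C "$repo" count-objects -v | awk '/^packs:/ {print $2}')
loose=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')
echo "packs=$packs loose=$loose"
```

Once everything sits in one pack with no loose objects left over, the server has much less work to do when it assembles the data for a clone.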


Nicolas
--

From: Nicolas Pitre
Date: Saturday, December 13, 2008 - 2:50 pm

FYI, I repacked that repository after cloning it, and that operation 
required around 2.5G of resident memory.  Given the address space 
fragmentation, it is possible that a full repack cannot be performed on 
a 32-bit machine.

I did 'git repack -a -f -d --window=500 --depth=100'.  This took less 
than an hour on a quad core machine.  The resulting pack is 695MB in 
size.  That's the amount of data that would be transferred during a 
clone of this repository, and nothing would have to be compressed during 
the clone as everything is already fully compressed.
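
One way to check figures like these on your own copy is `git count-objects -v`. The sketch below wraps it in a small helper, `pack_stats` (a name invented here), and tries it on a scratch repository; the size it prints is in KiB, and it only approximates what a clone would transfer.

```shell
#!/bin/sh
# Sketch only: pack_stats is a hypothetical helper, not a git command.
set -e

# pack_stats <repo>: packed object count, number of packs, pack size in KiB.
pack_stats() {
    git -C "$1" count-objects -v |
        awk '/^in-pack:/   {n = $2}
             /^packs:/     {p = $2}
             /^size-pack:/ {s = $2}
             END {printf "objects=%s packs=%s size_kib=%s\n", n, p, s}'
}

# Try it on a scratch repository with a single commit.
repo=$(mktemp -d)
git init -q "$repo"
echo hello > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "only commit"
git -C "$repo" repack -a -d -q

stats=$(pack_stats "$repo")
echo "$stats"
```

Running `pack_stats` before and after a full repack shows whether several packs have been merged into one and how much the total pack size changed.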


Nicolas
--
