Note: this is on a different repo from the 'git reflog expire --all' I
reported a bit earlier.
I have a git-svn checkout of a subversion repo which I wanted to compress
as much as possible. 'git gc --aggressive' starts to run fairly well, but
eats more and more memory and gets slower and slower. After it gets to
about 45% or 50%, progress slows down noticeably, and so far I haven't had
the patience to let it finish (40 minutes is already way too long).
A regular 'git gc' run completes without any problems.
$ du -sh .git/
612M .git/
Special about this repo is that it contains two huge objects [1], which
could maybe be a factor:
      size   pack  SHA
- packages/po/sublevel4/da.po:
    495661   4654  801cd6451ece536c0ab41f79e09fc52efdf3361f
- packages/arch/powerpc/quik-installer/debian/po/da.po:
    149515   1403  83a787b20817dc4d72db052de4055e7a7c9221d7
Below some output from top and of the progress of the command showing the
problem. Check the change in number of compressed objects against the
timestamps from top.
Cheers,
FJP
[1] Caused by a bug in a script a couple of years back.
$ git gc --aggressive
Counting objects: 843342, done.
Delta compression using up to 2 threads.
Compressing objects: 53% (449663/836424)
top - 22:55:02 up 18 min, 1 user, load average: 1.83, 1.68, 1.07
Tasks: 161 total, 1 running, 160 sleeping, 0 stopped, 0 zombie
Cpu0 : 91.4%us, 0.7%sy, 0.0%ni, 1.3%id, 6.6%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 97.7%us, 0.3%sy, 0.0%ni, 1.3%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2034284k total, 2018288k used, 15996k free, 10188k buffers
Swap: 2097148k total, 22612k used, 2074536k free, 449444k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5861 fjp 20 0 1775m 1.3g 194m S 188 66.7 21:10.89 git
Counting objects: 843342, done.
Delta compression using up to 2 threads.
Compressing objects: 58% (486001/836424)
top - 23:00:12 up 23 min, 1 user, load average: ...
To avoid confusion: these sizes are in kB.
--
There's your problem.
$ git help gc | sed -n /--aggressive$/,+3p
--aggressive
Usually git gc runs very quickly while
providing good disk space utilization
and performance. This option will
cause git gc to more aggressively
optimize the repository at the expense
of taking much more time. The effects
of this optimization are persistent, so
this option only needs to be used
occasionally; every few hundred
changesets or so.
Last time I used this option (on Linus's Linux repo), I let the
algorithm do its thing for a couple of hours. Maybe the efficiency
could be vastly improved, but it does finish if you let it.
Sincerely,
Michael Witten
--
As an aside: I didn't realize I copied that in there; this would
probably be better:
$ git help gc | sed -n /--aggressive$/,/^$/p
--
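For readers unfamiliar with sed address ranges, here is a minimal sketch of what the second form does: it prints from the line ending in "--aggressive" up to and including the next empty line, instead of a fixed number of lines. The help text below is a shortened, made-up stand-in (sample_help is a hypothetical helper, not real git output), so the example runs without git installed.

```shell
#!/bin/sh
# Hypothetical stand-in for the "git help gc" output; the text is
# abbreviated and is NOT the real manual page.
sample_help() {
    cat <<'EOF'
OPTIONS
       --aggressive
           Usually git gc runs very quickly.
           This option takes much more time.

       --auto
           Checks whether housekeeping is required.
EOF
}

# -n suppresses default printing; the range starts at a line ending in
# "--aggressive" and stops at the first empty line after it.
# The expression is quoted so the shell leaves "$" and "^" alone.
sample_help | sed -n '/--aggressive$/,/^$/p'
```

The range form adapts to however long the option's description is, whereas ",+3p" (a GNU sed extension) always stops after three extra lines.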
Yes, I had seen that. But there's a difference between taking much more
time and slowing down to such an extent that it never finishes.
I've tried it today on my linux-2.6 repo as well and the same thing
happened. At first the progress is not fast but reasonable. When it gets
to about 45% it starts slowing down a lot: from ~1500 objects per update
of the counters to ~300 objects per update. And who knows what the
progress is going to be when it reaches 70% or 90%: 10 per update?
With a total of over 2 million objects in the repository, such a low
speed is simply not going to work, ever.
So I maintain that it is effectively unusable.
Cheers,
FJP
--
Well, all I can do is quote myself:
Last time I used this option (on Linus's Linux repo),
I let the algorithm do its thing for a couple of hours.
Maybe the efficiency could be vastly improved, but
it does finish if you let it.
I think I must have run gc with 1.7.0.2.199.g90a2bf9; perhaps you
could use something like oprofile to figure out where gc is spending
most of its time.
--
Are you sure it doesn't subsequently speed up again?
-Miles
--
Idiot, n. A member of a large and powerful tribe whose influence in
human affairs has always been dominant and controlling.
--
I have seen asymptotic slowdown as "git gc --aggressive" progresses on
certain repositories. It is particularly bad with
git://git.infradead.org/gcc.git (on an x86-64 system with 4 GB RAM).
git seemed to be thrashing swap badly as time went on.
I don't know that git gc --aggressive would *never* finish on my gcc-git
repository. I just know that it got to about 80% done in less than an
hour, to 90% after twelve hours, and about 94% after another twelve
hours. (The same operation on linux-2.6.git takes about 40 minutes with
all the default settings.)
I may have been dreaming, but I thought with some 1.6.x version of git,
reducing core.packedGitLimit and pack.windowLimit (now windowMemory?)
mostly made the thrashing go away. When I try again with v1.7.0.2,
though, it doesn't seem to help very much -- there is still a lot of
swapping, and the git process got to about 7 GB virtual size before I
killed it after about 10 hours of operation.
Michael Poole
--
I packed Frans' sample kernel repo with "git gc --aggressive" last
night. It did finish after about 9 hours. I didn't take memory usage
measurements, but here's what time said:
real 535m38.898s
user 216m46.437s
sys 0m24.186s
That's 3.6 hours of CPU time over almost 9 hours (on a dual-core
machine). The non-aggressive pack was about 680M, and the result was
480M.
The machine has 2G of RAM, and not much else running. So I would really
not expect there to be much disk I/O required, but clearly we were
waiting quite a bit.
I'll try tweaking a few of the pack memory limits and try again.
-Peff
--
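The arithmetic behind "3.6 hours of CPU time over almost 9 hours" can be checked with a short script. This is only a sketch: to_seconds is a hypothetical helper name, and the core count of 2 is taken from the "dual-core machine" remark above.

```shell
#!/bin/sh
# Convert a time(1)-style value such as 535m38.898s into seconds.
# Splitting on "m" and "s" leaves minutes in $1 and seconds in $2.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Figures quoted in the message above.
real=$(to_seconds 535m38.898s)
user=$(to_seconds 216m46.437s)
sys=$(to_seconds 0m24.186s)
cores=2   # dual-core machine, per the message

awk -v r="$real" -v u="$user" -v s="$sys" -v c="$cores" 'BEGIN {
    printf "wall clock:  %.1f hours\n", r / 3600
    printf "CPU time:    %.1f hours\n", (u + s) / 3600
    # Share of the available core-time that was actually spent computing.
    printf "utilization: %.0f%% of %d cores\n", 100 * (u + s) / (c * r), c
}'
```

With these numbers the two cores were busy only about a fifth of the time, which is consistent with the process mostly waiting rather than computing.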
Hmm, this may be relevant:
http://thread.gmane.org/gmane.comp.version-control.git/67791/focus=94797
In my experiments, memory usage is increasing but valgrind doesn't
report leaks. So perhaps it is fragmentation in the memory allocator.
-Peff
--
To verify this, simply try with pack.threads = 1. That should help the
memory allocator not to fragment memory allocation across threads
randomly.
Also, going multithreaded _may_ be faster only if you can afford the
increased memory usage. Especially with gc --aggressive, each thread is
adding its own share of memory usage in the delta window.
First thing to try for the biggest possible improvement is
pack.threads=1. On a quad core machine this means repacking 4 times
slower, but this is certainly much faster than 100 times slower when the
system starts swapping. That might even make the resulting pack a tad
tighter due to delta windows not being fragmented across different
threads.
If that is not enough, then try:
    pack.deltaCacheSize = 1
    core.packedGitWindowSize = 16m
    core.packedGitLimit = 128m
This should reduce Git's memory usage while making it slower without
affecting the packing outcome. Again "slower" could mean "much faster"
if reducing memory usage avoids swapping completely.
If that still doesn't help much, then the next tweaks will affect the
packing result:
    pack.windowMemory = 256m
Here 256m is arbitrary and must be guessed from the size of the objects
being packed. The idea is to let smallish objects completely fill the
search window (it has 250 entries by default with --aggressive) while
not letting that many huge objects completely eat up all memory. If
there is still swapping going on then you can try 64m instead. That
means that if you have a large set of 1MB objects then the delta search
window will be scaled down to less than 64 entries in that case. This is
why packing might be less optimal as there are fewer delta combinations
being considered.
If this still doesn't prevent swapping then you should really consider
installing more RAM. There are fundamental object accounting structures
that can hardly be shrunk such as struct object_entry in
builtin/pack-objects.c, and one instance of such ...
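The settings named in the message above can be applied with "git config". What follows is a minimal sketch, not a recommendation from the thread's authors: apply_pack_memory_tweaks and its "level" argument are hypothetical names introduced here to mirror the order in which the tweaks are suggested, and the values are the ones quoted in the message (256m is described there as arbitrary).

```shell
#!/bin/sh
# Hypothetical helper: apply the memory-saving settings step by step.
# Usage: apply_pack_memory_tweaks <repo-dir> [level]
#   level 1: single-threaded delta search only
#   level 2: also shrink the delta cache and pack mmap windows
#            (slower, but the resulting pack is unaffected)
#   level 3: also cap delta window memory (may give a less optimal pack)
apply_pack_memory_tweaks() {
    repo=$1
    level=${2:-1}
    (
        # Run in a subshell so the caller's working directory is untouched.
        cd "$repo" || exit 1
        git config pack.threads 1
        if [ "$level" -ge 2 ]; then
            git config pack.deltaCacheSize 1
            git config core.packedGitWindowSize 16m
            git config core.packedGitLimit 128m
        fi
        if [ "$level" -ge 3 ]; then
            # 256m is a guess; the message suggests 64m if swapping persists.
            git config pack.windowMemory 256m
        fi
    )
}
```

A call such as "apply_pack_memory_tweaks . 2" followed by "git gc --aggressive" would try the second step. The settings are written to the repository's own config, so they stay in effect for later repacks until removed with "git config --unset".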
As a data point, when I do gc, I routinely use --aggressive. It takes a
while here, but not forever. (I'm a tad short of 2 million objects)
Repo is mainline + next + tip + stable >= 2.6.22 + local branches.
git@marge:..git/linux-2.6> time git gc --aggressive
Counting objects: 1909894, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1889774/1889774), done.
Writing objects: 100% (1909894/1909894), done.
Total 1909894 (delta 1674098), reused 0 (delta 0)
real 22m24.943s
user 55m33.756s
sys 0m8.149s
git is 1.7.0.3
-Mike
--
