To verify this, simply try with pack.threads = 1. That should keep the
memory allocator from fragmenting memory randomly across threads.

Also, going multithreaded _may_ be faster, but only if you can afford
the increased memory usage. Especially with gc --aggressive, each thread
adds its own share of memory usage for its delta window.

The first thing to try, for the biggest possible improvement, is
pack.threads = 1. On a quad-core machine this makes repacking roughly 4
times slower, but that is certainly much faster than being 100 times
slower once the system starts swapping. It might even make the resulting
pack a tad tighter, since the delta windows are no longer split across
different threads.

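To make the thread/memory trade-off concrete, here is a rough Python model. It assumes, as a simplification, that every thread keeps a full delta window of equally sized objects in memory, and it ignores delta indexes, caches and allocator overhead. The function name window_memory_estimate is hypothetical and not part of Git.

```python
# Rough model (a simplification, not Git's actual accounting): every
# pack-objects thread keeps its own delta window of objects in memory,
# so peak window memory grows linearly with the thread count.

MiB = 1024 * 1024


def window_memory_estimate(threads: int, window: int,
                           avg_object_size: int) -> int:
    """Approximate bytes held in delta windows across all threads.

    Ignores delta indexes, the delta cache and allocator overhead,
    so treat the result as a lower bound.
    """
    return threads * window * avg_object_size


if __name__ == "__main__":
    # 250-entry window (the --aggressive default) of 1 MiB objects.
    for threads in (1, 4):
        usage = window_memory_estimate(threads, 250, 1 * MiB)
        print(threads, "thread(s):", usage // MiB, "MiB")
```

Under these assumptions, one thread holds about 250 MiB of window data while four threads hold about 1000 MiB, which is the kind of jump that can push a smaller machine into swap.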
If that is not enough, then try:

    pack.deltaCacheSize = 1
    core.packedGitWindowSize = 16m
    core.packedGitLimit = 128m

This should reduce Git's memory usage without affecting the packing
outcome, at the cost of making it slower. Again, "slower" could mean
"much faster" if the reduced memory usage avoids swapping completely.

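The size values in these settings use Git's unit suffixes. As a small sketch (the helper name parse_git_size is hypothetical; Git's own parser is C code), this Python function converts such values to bytes, assuming k/m/g are case-insensitive multipliers of 1024, 1024^2 and 1024^3 as the git-config documentation describes for size-valued integers.

```python
# Sketch of how Git-style size suffixes map to bytes. Assumes k/m/g are
# case-insensitive multipliers of 1024, 1024**2 and 1024**3, as the
# git-config documentation describes for size-valued integers.

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_git_size(value: str) -> int:
    """Convert a value such as '16m' or '1' to a number of bytes."""
    text = value.strip().lower()
    suffix = text[-1] if text and text[-1] in "kmg" else ""
    digits = text[:-1] if suffix else text
    if not digits.isdigit():
        raise ValueError(f"not a size value: {value!r}")
    return int(digits) * _UNITS[suffix]


if __name__ == "__main__":
    for setting in ("1", "16m", "128m", "256m"):
        print(setting, "=", parse_git_size(setting), "bytes")
```

So pack.deltaCacheSize = 1 is a single byte (effectively no delta cache), while core.packedGitWindowSize = 16m is 16 MiB.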
If that still doesn't help much, then the next tweaks will affect the
packing result:

    pack.windowMemory = 256m

Here 256m is arbitrary and must be estimated from the size of the
objects being packed. The idea is to let smallish objects completely
fill the search window (it has 250 entries by default with --aggressive)
while not letting that many huge objects eat up all memory. If there is
still swapping going on, then you can try 64m instead. With that
setting, if you have a large set of 1MB objects, the delta search window
will be scaled down to fewer than 64 entries. This is why packing might
be less optimal: fewer delta combinations are being considered.

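The window scaling described above can be sketched as a simplified calculation. This Python model is not Git's code: it treats all objects as the same size and represents extra per-entry bookkeeping (such as a delta index) as a hypothetical fixed overhead parameter. The function name effective_window is made up for illustration, and keeping at least one entry is an assumption of the model. A limit of 0 is treated as "no limit", matching the documented meaning of an unset or zero pack.windowMemory.

```python
# Simplified model (an assumption, not Git's actual accounting code) of
# how pack.windowMemory caps the number of entries in the delta window.

MiB = 1024 * 1024


def effective_window(window: int, window_memory: int, object_size: int,
                     per_entry_overhead: int = 0) -> int:
    """Return how many equally sized entries fit in the delta window.

    window             -- nominal window length (e.g. 250 with --aggressive)
    window_memory      -- pack.windowMemory in bytes (0 means no limit)
    object_size        -- size of each object in bytes
    per_entry_overhead -- hypothetical extra bytes per entry, standing in
                          for bookkeeping such as a delta index
    """
    if window_memory == 0:
        return window
    cost = object_size + per_entry_overhead
    # Never exceed the nominal window; keep at least one entry (assumption).
    return max(1, min(window, window_memory // cost))


if __name__ == "__main__":
    # Small objects: the full 250-entry window fits within 256m.
    print(effective_window(250, 256 * MiB, 4 * 1024))
    # 1MB objects with 64m: at most 64 entries ...
    print(effective_window(250, 64 * MiB, 1 * MiB))
    # ... and fewer once per-entry overhead is counted.
    print(effective_window(250, 64 * MiB, 1 * MiB, 256 * 1024))
```

With these inputs the model gives 250, 64 and 51 entries respectively, which matches the "fewer than 64 entries" behaviour described for 1MB objects.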
If this still doesn't prevent swapping, then you should really consider
installing more RAM. There are fundamental object accounting structures
that can hardly be shrunk, such as struct object_entry in
builtin/pack-objects.c, and one instance of such a structure is needed
for each object. On a 64-bit machine this structure occupies 120 bytes,
meaning 2M objects require 240MB of RAM just for that. The data set also
has to fit in the file cache to avoid IO thrashing, so if your
repository is larger than the available RAM, some thrashing is almost
unavoidable. Sometimes a badly packed repository may require 2GB of disk
space in the .git directory alone, while the fully packed version is
only a few hundred megabytes. Such repositories may need to be repacked
on a big machine first, before machines with less RAM are able to handle
them.

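The 240MB figure is simple multiplication; this short Python sketch reproduces it. The 120-byte size is the one quoted above for struct object_entry on a 64-bit machine and may differ between Git versions and builds; the function name object_entry_bytes is hypothetical.

```python
# Back-of-the-envelope estimate of the memory pack-objects needs just for
# its per-object bookkeeping. The 120-byte figure is the one quoted for
# struct object_entry on a 64-bit machine; it may vary between versions.

OBJECT_ENTRY_SIZE = 120  # bytes per object (64-bit), as quoted in the text


def object_entry_bytes(num_objects: int,
                       entry_size: int = OBJECT_ENTRY_SIZE) -> int:
    """Bytes needed for one object_entry per object being packed."""
    return num_objects * entry_size


if __name__ == "__main__":
    total = object_entry_bytes(2_000_000)
    print(total)                  # 240000000
    print(total // 10**6, "MB")   # 240 MB
```

This is a floor, not a total: it counts only the per-object entries, before any delta window, cache or file-cache pressure is added.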
Hope this helps.
Nicolas