On Tue, 11 Dec 2007, Jon Smirl wrote:

Well... This is weird.

It seems that memory fragmentation is really, really killing us here. The fact that the Google allocator managed to waste considerably less memory is a good indicator already.

I modified the progress display to show accounted memory that is allocated vs. memory that has been freed but not yet released to the system. At least that gives you an idea of memory allocation and fragmentation with glibc in real time:

diff --git a/progress.c b/progress.c
index d19f80c..46ac9ef 100644
--- a/progress.c
+++ b/progress.c
@@ -8,6 +8,7 @@
  * published by the Free Software Foundation.
  */
 
+#include <malloc.h>
 #include "git-compat-util.h"
 #include "progress.h"
 
@@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, const char *done)
 	if (progress->total) {
 		unsigned percent = n * 100 / progress->total;
 		if (percent != progress->last_percent || progress_update) {
+			struct mallinfo m = mallinfo();
 			progress->last_percent = percent;
-			fprintf(stderr, "%s: %3u%% (%u/%u)%s%s",
-				progress->title, percent, n,
-				progress->total, tp, eol);
+			fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s",
+				progress->title, percent, n, progress->total,
+				m.uordblks >> 18, m.fordblks >> 18,
+				tp, eol);
 			fflush(stderr);
 			progress_update = 0;
 			return 1;

This shows that at some point the repack goes into a big memory surge. I don't have enough RAM to see how fragmented memory gets though, since it starts swapping at around 50% done with 2 threads.

With only 1 thread, memory usage grows significantly at around 11%, with a pretty noticeable slowdown in the progress rate.

So I think the theory goes like this:

There is a block of big objects together in the list somewhere. Initially, all those big objects are assigned to thread #1 out of 4. Because those objects are big, they are really slow to delta compress, and storing them all in a window with 250 slots takes significant memory.
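
To put a rough number on that, here is a back-of-the-envelope sketch. The helper name and both object sizes are made up for illustration and are not measured from the gcc repo; only the 250-slot window comes from the discussion above. It also ignores delta indexes and allocator overhead, so treat it as a lower bound.

```c
#include <stddef.h>

/*
 * Rough estimate of the bytes one thread's delta window keeps alive:
 * every slot holds one object of roughly avg_object_bytes.
 * Hypothetical helper, not a function from git.
 */
static size_t window_bytes(size_t slots, size_t avg_object_bytes)
{
	return slots * avg_object_bytes;
}

/* Hypothetical sizes, chosen only to show the order of magnitude. */
enum {
	WINDOW_SLOTS = 250,              /* window size mentioned above */
	SMALL_OBJECT = 8 * 1024,         /* assumed "easy" object: 8 KiB */
	BIG_OBJECT   = 4 * 1024 * 1024   /* assumed big object: 4 MiB */
};
```

With these assumed sizes, window_bytes(WINDOW_SLOTS, SMALL_OBJECT) is about 2000 KiB, while window_bytes(WINDOW_SLOTS, BIG_OBJECT) is about 1000 MiB for a single thread's window.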
Threads 2, 3, and 4 have "easy" workloads, so they complete fairly quickly compared to thread #1. But since the progress display is global, you won't notice that one thread is actually crawling.

To keep all threads busy until the end, threads that are done with their workload steal some work from another thread, choosing the one with the largest remaining work. That is most likely thread #1. So as threads 2, 3, and 4 complete, they steal from thread #1, populate their own windows with those big objects too, and get slow too.

And because all threads get to work on those big objects towards the end, the progress display then shows a significant slowdown, and memory usage almost quadruples.

Add memory fragmentation to that and you have a clogged system.

Solution:

	pack.deltacachesize=1
	pack.windowmemory=16M

Limiting the window memory to 16MB will automatically shrink the window size when big objects are encountered, therefore keeping far fewer of those objects in memory at the same time, which in turn means they will be processed much more quickly. And somehow that must help with memory fragmentation as well.

Setting pack.deltacachesize to 1 simply disables the caching of delta results entirely, which will only slow down the writing phase, but I wanted to keep it out of the picture for now.

With the above settings, I'm currently repacking the gcc repo with 2 threads, and memory allocation has never exceeded 700m virt and 400m res, while mallinfo shows about 350MB, and progress has reached 90%, which has never happened before on this machine with the 300MB source pack.


Nicolas
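
For illustration, the window shrinking described above can be modeled roughly as follows. This is a simplified sketch and not the code in pack-objects: the struct, the function names, and the assumption that only raw object sizes count against the limit are all mine. The idea is just that after each object is added, the oldest entries are dropped until the window fits the memory limit again, while the newest object is always kept.

```c
#include <stddef.h>

#define MAX_SLOTS 250

/* Simplified model of a delta window: a ring of object sizes plus a byte total. */
struct window {
	size_t sizes[MAX_SLOTS];
	size_t head;       /* index of the oldest entry */
	size_t count;      /* entries currently held */
	size_t bytes;      /* sum of the held sizes */
	size_t max_slots;  /* slot limit, must be 1..MAX_SLOTS */
	size_t max_bytes;  /* memory limit, 0 means unlimited */
};

/* Drop the oldest entry; the caller guarantees count > 0. */
static void window_evict_oldest(struct window *w)
{
	w->bytes -= w->sizes[w->head];
	w->head = (w->head + 1) % MAX_SLOTS;
	w->count--;
}

/*
 * Add one object, then evict oldest entries until the memory limit holds.
 * The newest object always stays, even if it alone exceeds max_bytes.
 */
static void window_push(struct window *w, size_t obj_bytes)
{
	if (w->count == w->max_slots)
		window_evict_oldest(w);      /* make room under the slot limit */
	w->sizes[(w->head + w->count) % MAX_SLOTS] = obj_bytes;
	w->count++;
	w->bytes += obj_bytes;
	while (w->max_bytes && w->bytes > w->max_bytes && w->count > 1)
		window_evict_oldest(w);      /* shrink to fit the memory limit */
}
```

In this model, with max_slots = 250 and max_bytes = 16 MiB, a run of 8 KiB objects keeps the full 250 entries, while a run of 4 MiB objects (a made-up size) leaves only 4 entries in the window.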
