I wanted to to a 'git gc' on my kernel repo, but that seemed to end in a
loop: loads of CPU usage, no output. Using 'ps' I found it's not 'git gc'
itself, but 'git reflog' that's causing the problem.
From the strace below it does seem like it still makes some progress, but
I've never had it take anywhere near this long before. Normally it starts
the count of objects almost immediately.
It's using hardly any memory at all but has one core going flat out.
I'm seeing this with both git 1.6.6.1 and 1.7.0.3 on the same repo.
Environment:
- Debian amd64/Lenny; Core Duo x86_64 2.6.34-rc3 -> 1.6.6.1
- Debian i386/Sid; chroot on the same machine -> 1.7.0.3
I've also tried with 2.6.33 to rule out a kernel issue.
Here's the tail end of an strace I ran. I broke it off after 9+ minutes,
but I had let it go for longer than that earlier. You can clearly see
where it starts to "stall" at 21:40:14.
Cheers,
FJP
$ strace -t git reflog expire --all
[...]
21:40:11 open(".git/logs/HEAD.lock", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
21:40:11 open(".git/objects/db/315842dd99ce7fd59697df36185df4b807000a",
O_RDONLY|O_NOATIME) = 23
21:40:11 fstat(23, {st_mode=S_IFREG|0444, st_size=289, ...}) = 0
21:40:11 mmap(NULL, 289, PROT_READ, MAP_PRIVATE, 23, 0) = 0x7f9b135bb000
21:40:11 close(23) = 0
21:40:11 munmap(0x7f9b135bb000, 289) = 0
21:40:11 open(".git/logs/HEAD", O_RDONLY) = 23
21:40:11 fstat(23, {st_mode=S_IFREG|0644, st_size=171397, ...}) = 0
21:40:11 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|
MAP_ANONYMOUS, -1, 0) = 0x7f9b135bb000
21:40:11 read(23, "bea4c899f2b5fad80099aea979780ef19"..., 4096) = 4096
21:40:11 fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
21:40:11 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|
MAP_ANONYMOUS, -1, 0) = 0x7f9b135ba000
21:40:11 read(23, "aca305e8a0209efda740b522024cfdefe"..., 4096) = 4096
21:40:11 read(23, "es\n134a0f063d0c48f08b8731118c4732"..., 4096) = 4096
21:40:12 read(23, "c82b7849a0fe ...FWIW, I have seen this, too, and managed to get an strace snippet that looked similar to what you saw (mostly memory allocation, and otherwise chewing on the CPU). I'm guessing there is some O(n^2) loop in there somewhere. Unfortunately, mine actually completed after a few minutes and I wasn't able to replicate. Can you reproduce the problem on your repo? If so, can you possibly tar it up and make it available (probably just the .git directory would be enough)? -Peff --
Yes, I can reproduce. I've run it a few times and broke it off each time before it finished as I really had no idea how much longer it might take. $ du -sh .git 1008M .git I can make that available, but it's going to take a while to upload and I don't want to leave it up too long as I'll be abusing a project service for that. So the people who want to look at it would have grab it fairly promptly (within 2 days or so). Is that OK? Cheers, FJP --
That would be fine. If even that is a problem, we can arrange off-list to get it from you privately and I can stick it somewhere for a week or two. -Peff --
The tarball is up at: http://alioth.debian.org/~fjp/.tmp/linux-2.6_reflog-issue.tar Because of the Easter weekend I'll leave it up a bit longer. I plan to remove it sometime on Thursday. Cheers, FJP --
Thanks, I was able to get it and reproduce your problem. The slowness is in the expire-unreachable code. You can work around it with: git config gc.reflogExpireUnreachable never Obviously that's not really a fix, but it should let your "git gc" work. It looks like we do two merge-base calculations for each reflog entry, which is what takes so long. Perhaps if we know we are going to do a large number of reachability checks, we can pre-mark all reachable commits, and then each reflog entry would just need to check the commit mark. I don't have any more time now to look at it, but I am cc'ing Junio (who wrote the original expire-unreachable code) and Shawn (the resident reflog expert), who may have more input. -Peff --
[sorry, resend, I managed to totally screw up Shawn's email in the last one. Stupid mutt aliases.] Thanks, I was able to get it and reproduce your problem. The slowness is in the expire-unreachable code. You can work around it with: git config gc.reflogExpireUnreachable never Obviously that's not really a fix, but it should let your "git gc" work. It looks like we do two merge-base calculations for each reflog entry, which is what takes so long. Perhaps if we know we are going to do a large number of reachability checks, we can pre-mark all reachable commits, and then each reflog entry would just need to check the commit mark. I don't have any more time now to look at it, but I am cc'ing Junio (who wrote the original expire-unreachable code) and Shawn (the resident reflog expert), who may have more input. -Peff --
Thanks for the analysis, but expire_reflog() that is run for each ref already does that, I think. It first runs mark_reachable(), then walks each reflog entry for the ref to call expire_reflog_ent(), which in turn calls unreachable() that first checks if mark_reachable() has marked the commit, and if so we don't run in_merge_bases(). But if the commit in question is not reachable, then we end up running in_merge_bases() to double-check anyway, which is probably the symptom that was observed. So perhaps this is a workable compromise? builtin/reflog.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/builtin/reflog.c b/builtin/reflog.c index 64e45bd..7e278b8 100644 --- a/builtin/reflog.c +++ b/builtin/reflog.c @@ -230,6 +230,13 @@ static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsig /* Reachable from the current ref? Don't prune. */ if (commit->object.flags & REACHABLE) return 0; + /* + * Unless there was a clock skew, younger ones that are + * reachable should have been marked by mark_reachable(). + */ + if (cb->cmd->expire_total < commit->date) + return 1; + if (in_merge_bases(commit, &cb->ref_commit, 1)) return 0; --
Hmm. It looks like mark_reachable() stops traversing when it hits a commit older than expire_total. I imagine that's to avoid going all the way to the roots. But if we hit any unreachable entry, in_merge_bases() is going to have to go all the way to the roots, anyway. If we just marked everything, couldn't we then trust the REACHABLE value and not have to do the in_merge_bases double-check? It has worse best-case cost (since we always go to the root), but better worst-case (since we can potentially go to the roots for each unreachable reflog entry). If we haven't done a "reflog expire" in a while, won't we have a bunch of old commits that will still need double-checked, and produce the same slow behavior? Or is that what your "clock skew" is meant to mean? That we would have removed those old ones already assuming the commit date and the reflog entry for them match up? That's not necessarily true if you move to an older commit, which freshens its reflog entry compared to the commit date. I wonder if, in addition to your patch, we should remove the double-check in_merge_bases and simply report those old ones as reachable. We may be wrong, but we are erring on the side of keeping entries, and they will eventually expire in the regular cycle (i.e., 90 days instead of 30). All of that being said, your patch does drop Frans' case down to about 1s of CPU time, so perhaps it is not worth worrying about beyond that. -Peff --
Yeah, an alternative is to keep the list of commits where the initial mark_reachable() run stopped, and instead of doing in_merge_bases(), lazily restart the traversal all the way down to root, and then rely What I meant was you commit A, a child B, rewind clock to commit its child C at an incorrect and old time, and fix clock to commit its child D which is at the tip of a ref. mark_reachable() will stop at C saying that is I think a reasonable solution would be along the lines you described, but the patch you are responding to does err on the wrong side when a clock skew is there. Does it matter? Probably not. --
Ah, yeah, that is much more clever. It has the same worst case
performance as what I proposed, but is much more optimistic that we
True. With the technique you mentioned above, you would reverse your
test and do:
if (flags & REACHABLE)
return 0;
if (expanded_reachable_to_root)
return 1; /* we know it's not */
expand_reachable_to_root();
return !(flags & REACHABLE);
I don't think I care enough to do a patch, though. I don't have a
problem with you applying what you posted earlier.
-Peff
--
Actually I do; I think it breaks correctness a big way (the second paragraph of the proposed log message of the following). -- >8 -- Subject: [PATCH] reflog --expire-unreachable: avoid merge-base computation The option tells the command to expire older reflog entries that refer to commits that are no longer reachable from the tip of the ref the reflog is associated with. To avoid repeated merge_base() invocations, we used to mark commits that are known to be reachable by walking the history from the tip until we hit commits that are older than expire-total (which is the timestamp before which all the reflog entries are expired). However, it is a different matter if a commit is _not_ known to be reachable and the commit is known to be unreachable. Because you can rewind a ref to an ancient commit and then reset it back to the original tip, a recent reflog entry can point at a commit that older than the expire-total timestamp and we shouldn't expire it. For that reason, we had to run merge-base computation when a commit is _not_ known to be reachable. This introduces a lazy/on-demand traversal of the history to mark reachable commits in steps. As before, we mark commits that are newer than expire-total to optimize the normal case before walking reflog, but we dig deeper from the commits the initial step left off when we encounter a commit that is not known to be reachable. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- * By the way, the following diff is extremely hard to read, as for some reason "diff" failed to notice that the only change is that we have added lines at the beginning of mark_reachable(), modified only the latter half of unreachable(), and moved mark_reachable() from after unreachable() to before. Neither "git diff --patience" nor running GNU diff on to blobs helps. Hmph... builtin-reflog.c | 96 +++++++++++++++++++++++++++++++---------------------- 1 files changed, 56 insertions(+), 40 deletions(-) diff --git ...
Side note.
It may be an improvement to dig the history even more incrementally.
Inside unreachable(), we currently dig immediately down to root, but it
may give us a better performance in a long history with reflog entries
that wildly jump everywhere in that history if we dug down to the
timestamp of the commit we are looking at. A patch to do so on top of the
previous one may look like this.
builtin-reflog.c | 17 ++++++++++-------
1 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/builtin-reflog.c b/builtin-reflog.c
index 9792090..42d225f 100644
--- a/builtin-reflog.c
+++ b/builtin-reflog.c
@@ -274,16 +274,19 @@ static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsig
return 0;
}
- /* Reachable from the current ref? Don't prune. */
- if (commit->object.flags & REACHABLE)
- return 0;
+ while (1) {
+ /* Reachable from the current ref? Don't prune. */
+ if (commit->object.flags & REACHABLE)
+ return 0;
- if (cb->mark_list && cb->mark_limit) {
- cb->mark_limit = 0; /* dig down to the root */
+ /* Did we mark everything? Then we know we cannot reach it. */
+ if (!cb->mark_list || !cb->mark_limit)
+ return 1;
+
+ /* Dig down to the timestamp of this commit, or down to root. */
+ cb->mark_limit = (cb->mark_limit < commit->date) ? 0 : commit->date;
mark_reachable(cb);
}
-
- return !(commit->object.flags & REACHABLE);
}
static int expire_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
--
1.7.1.rc0.212.gbd88f
--
I doubt it matters much in practice. The important features of the
solution are:
- for most refs, which are either all-reachable or which don't have
clock skew, don't go to the roots at all. This is the fast case that
we should do most of the time.
- for others, don't go to the roots over and over for each entry. This
is the slow case, but we just need to make sure it's a not the
horrible slow case that Frans saw.
Your suggestion speeds up the slow case a little bit, but it is already
acceptably fast. Plus this is an optimistic optimization. There are
still cases where you might have to dig to the roots anyway (e.g.,
Why dig to the timestamp? You know you're looking for a particular
commit, so you can dig down to that commit. If it's reachable, stop
there and you have your answer. Any work you do beyond that might be
used for a further entry lookup, but it might not at all. If it's not
reachable, then you're going to end up digging down to the roots anyway.
With an unreachable ref and your scheme, you would just end up not
finding it, extending again, not finding it, extending again, etc. But
you can never be sure it's truly unreachable without going to the roots
or assuming no clock skew, so that is what we'll end up doing.
-Peff
--
Oh, right. Didn't I even mention that case earlier in the thread? I was just being dumb. Or maybe I was pretending to be dumb, so that I could Patch looked fine from my reading. I no longer have Frans' gigantic test repo available, though, so I can't test whether it fixes the problem (but I'm pretty sure it must from my earlier analysis). -Peff --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian | [RFC/PATCH] RE: git calls SSH_ASKPASS even if DISPLAY is not set |
| Junio C Hamano | Re: Rss produced by git is not valid xml? |
| Linux Kernel Mailing List | iSeries: fix section mismatch in iseries_veth |
| Linux Kernel |
