I ran into this trying 2.6.27-rc3 today:
[6.461876] kernel BUG at mm/vmalloc.c:217!
(full boot log attached)
I bisected it to:
git-bisect bad 1c89ac55017f982355c7761e1c912c88c941483d
which is:
commit 1c89ac55017f982355c7761e1c912c88c941483d
Merge: 88fa08f... b1b135c...
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue Aug 12 08:40:19 2008 -0700Merge
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus*
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
fix spinlock recursion in hvc_console
stop_machine: remove unused variable
modules: extend initcall_debug functionality to the module loader
export virtio_rng.h
lguest: use get_user_pages_fast() instead of get_user_pages()
mm: Make generic weak get_user_pages_fast and EXPORT_GPL it
lguest: don't set MAC address for guest unless specifiedBacking out to the previous commit boots just fine. I've attached the
boot.log, config file, lspci.txt and git-bisect-log. If there is
anything else I can do, I'd be happy to oblige, but I'm checking out
right now. I'll check in tomorrow morning (Eastern USA).Alan
On Wed, 13 Aug 2008 19:03:20 -0400
Please confirm that this happened at
static struct vm_struct *
__get_vm_area_node(unsigned long size, unsigned long flags, unsigned long start,
unsigned long end, int node, gfp_t gfp_mask, void *caller)
{
struct vm_struct **p, *tmp, *area;
unsigned long align = 1;
unsigned long addr;Thing is, the code *isn't* in an interrupt. Something got screwed up.
What can happen is that some code has a lock imbalance or a
preempt_disable imbalance and it does preempt_disable() so many times
that the counter overflows eight bits and starts to increment the
softirq counter, then the hardirq counter, then in_interrupt() starts
incorrectly returning true then blam.You could confirm/debug it with something along the lines of this:
--- a/mm/vmalloc.c~a
+++ a/mm/vmalloc.c
@@ -214,7 +214,9 @@ __get_vm_area_node(unsigned long size, u
unsigned long align = 1;
unsigned long addr;- BUG_ON(in_interrupt());
+ if (preempt_count() > 10)
+ printk("%s: preempt_count()=%d\n", __func__, preempt_count());
+ WARN_ON(in_interrupt());
if (flags & VM_IOREMAP) {
int bit = fls(size);_
But this bug could be in practically anywhere in the kernel and
bisection is by far the best way to find it. It's sad and odd that
bisection landed you on a merge commit. I'd suggest that you persist
with the bisection (please).
http://www.kernel.org/doc/local/git-quick.html#example might be useful.--
On Tue, Aug 19, 2008 at 9:46 PM, Andrew Morton
Hm.
Config doesn't even have CONFIG_PREEMPT, so preempt_disable() should be a no-op.
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
--
Sorry, was off on other things for the last couple of days:
I /did/ bisect it down to the aforementioned merge, the question is: How
to crack open that merge into it's composite pieces? (Where do I go to
bisect within that?)I'll try the patch above, and see what that reports.
Alan
--
urgh, it's irritating when git-bisect directs you to a merge commit - it
hasn't done it for me for ages.One (probably wrong) approach is to run
gitk 1c89ac55017f982355c7761e1c912c88c941483d
then peer at the output, work out which real commits were in that
merge.It looks like the merge ended with
b1b135c8d619cb2c7045d6ee4e48375882518bb5 and started with
40c42076ebd362dc69210cccea101ac80b6d4bd4, so perhaps you can dogit bisect bad b1b135c8d619cb2c7045d6ee4e48375882518bb5
git bisect good 40c42076ebd362dc69210cccea101ac80b6d4bd4That has an off-by-one error, in that it assumes that
40c42076ebd362dc69210cccea101ac80b6d4bd4 is good, which we don't know.
Perhaps one can pick the preceding commit in the `git log' output to
fix that.Anyway, I've sent sufficient wrongness to the right list for us to work
out how to do this for real ;)--
Hmm, but doesn't that happen only when it's actually really the merge
commit that introduces the bug? Both parents of the merge commit were...I don't quite get this - according to the bisection log,
# good: [b1b135c8d619cb2c7045d6ee4e48375882518bb5] fix spinlock recursion in hvc_console
and now you want to mark it as bad?
You could try to revert some of the merged commits at the point of the
merge, though.--
Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC. -- Bill Gates
--
On Fri, 22 Aug 2008 19:16:51 +0200
A merge commit doesn't contain any kernel changes? It's the individual
commits (aka "patches") which were in that merge which broke stuff.
Confused.We're trying to dive inside that merge commit to find out which of the
<what bisection log?>
I assume that Alan's bisection search ended up saying that the merge
commit (1c89ac55017f982355c7761e1c912c88c941483d) was the first bad
commit.--
You may find neither parents were buggy, but the result of the merge is.
A trivial example is when one branch changes the semantics of an existing
function and converts all the call sites to the updated semantics, while
the other branch adds a new call site that still relies on the old
behaviour of that function. The merge most likely won't textually
conflict, and neither git merge nor quilt patch would report conflicts,
but the end result is that the new call site added by the latter branch
now gets an unexpected outcome from the function and can misbehave. You
cannot blame the breakage to either branch for such a breakage.
--
On Fri, 22 Aug 2008 14:05:21 -0700
Sure, but
a) whoever added a change like that without also causing the build to
break is slappable andb) there are now two commits (one in each branch) either one of
which, when applied on top of the other branch will introduce the
regression.That's useful infomation, but we don't know how to get
git-bisect to give it to us.It's pretty simple. If git-bisect tells us that the regression was
introduced by a merge commit, we want to perform a bisection within
that merge's individual commits.--
bisect already did that. It asked for the left side and the right side,
both were good. What you can still do is creating a _new_ history where
the commits are not in parallel but linearized, like Jeff described. But
that's (in general) not a trivial task as you need to reapply individual
commit patches which can cause conflicts that were already solved in the
existing merges to show up again. Or, it can even produce new conflicts,
for example when a commit on one side was reverted before the merge
happened. In that case, you get into a state with changes that were
never visible before.Björn
--
I mentioned linearizing the two lines of development in another
message in this thread. But there are, of course, two linearizations,
neither of which might be possible to generate automatically, and one
might give you a better answer than the other.-Peff
--
Alan provided his bisection log as an attachment to the original bug
Yep, and that's totally correct as far as bisect is concerned. The
parents of that merge commit are:
88fa08f67bee1a0c765237bdac106a32872f57d2
b1b135c8d619cb2c7045d6ee4e48375882518bb5And Alan marked both of them as good.
So, unless Alan made a mistake during his bisection, each of the
branches is correct, but the merge did not lead to a correct result. So
while there were no textual conflicts, there were still incompatible
changes regarding the code semantics and compatibility was not restored
during the merge.To get an overview over what got merged together you can can use
something like:
gitk --left-right 1c89ac55017^1...1c89ac55017^2Which shows all commits that were on only one side of the merge, with
nice "arrows" that indicate from which side the commit is coming. The
conflict should be between one commit from the left and one commit from
the right side, obviously.Björn
--
One thing that I have seen proposed (but never tried myself) is that you
can linearize the changes using "rebase -i" (or cherry-picking), and
then bisect that result. That is, given a historyA-B-C-D
\ /
E-Fwhere the merge "D" introduces the bug, you could try creating:
A-B-C-E'-F'
and bisecting that. And you should know that C is good from your
previous bisection, but that F' probably is not, since it should be
textually the same as D (unless, of course, you had textual conflicts
during the rebase that you fixed up differently).So in essence you are testing each of E and F, but based on the other
work. So you should be able to find the one patch that causes the
conflict. And depending on the conflict, you may get more information by
doing it the other way. I.e.,:A-E-F-B'-C'
-Peff
--
Bj
--
You're right - this is where the bisection ended up (in fact, just
removing this merge commit allowed later kernels to boot just fine).I've patched the kernel w/ your suggestion, and now I'm getting
something a bit different - and I'm looking into this...Begin: Loading essential drivers... ...
[ 6.594525] fuse init (API version 7.9)
[ 6.618686] ACPI: SSDT CFFD0D0A, 08C4 (r1 HPQOEM CPU_TM2 1
MSFT 100000E)
[ 6.621481] BUG: unable to handle kernel NULL pointer dereference at
0000000000000858
[ 6.625017] IP: [<ffffffff8025e282>] debug_mutex_add_waiter+0x32/0x80[ 6.625017] PGD 21a456067 PUD 21a45f067 PMD 0
[ 6.625017] Oops: 0002 [1] SMP
[ 6.625017] CPU 1
[ 6.625017] Modules linked in: processor(+) fan thermal_sys fuse
[ 6.625017] Pid: 1259, comm: modprobe Not tainted 2.6.27-rc3 #24
[ 6.625017] RIP: 0010:[<ffffffff8025e282>] [<ffffffff8025e282>]
debug_mutex_add_waiter+0x32/0x80[ 6.625017] RSP: 0018:ffff88021a073998 EFLAGS: 00010002
[ 6.625017] RAX: 0000000000000000 RBX: ffff88021a0739d8 RCX:
0000000000000000[ 6.625017] RDX: 0000000000000001 RSI: ffff88021a0739d8 RDI:
ffffffffa0091a60[ 6.625017] RBP: ffff88021a0739b8 R08: ffffffff811deea0 R09:
ffff8800a6fdb000[ 6.625017] R10: ffffffffa008f524 R11: 0000000000000000 R12:
ffffffffa0091a60[ 6.625017] R13: ffff88021a072000 R14: ffff88021a9840a0 R15:
ffffffffa0091a98[ 6.625017] FS: 00007f40ce2de6e0(0000) GS:ffff88022fc02a00(0000)
knlGS:0000000000000000[ 6.625017] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6.625017] CR2: 0000000000000858 CR3: 000000022d86b000 CR4:
00000000000006e0[ 6.625017] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000[ 6.625017] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400[ 6.625017] Process modprobe (pid: 1259, threadinfo ffff88021a072000,
task ffff88021a9840a0)
[ 6.625017] Stack: 0000000000000000 ffffffffa0091a6...
This is a merge commit that introduced several changes into the
Thanks,
Rafael
--
I came in this morning, and did a 'git bisect reset' and 'git pull'
built, and everything booted fine. I don't know if some fix came in for
something else, or if the problem got "hidden" (some sort of timing
thing, so the problem still lurks). If you feel it's important to track
this down, if you could give me instructions on how to break apart that
one merge, I could then try individual kernels from there...Thanks,
Alan
--
I have created a Bugzilla entry at
http://bugzilla.kernel.org/show_bug.cgi?id=11342
for this problem. I'll close it now, but please reopen if if the problem
reappears.Thanks,
Rafael
--
Sorry, full boot log attached...
