Hi all,
a problem report about something that is giving me a real headache:
[1.] Kernel hangs when initializing ohci-controller
[2.] Version 2.6.22 of the Linux kernel hangs when initializing the
integrated ohci controller of the nvidia MCP51 chipset (pci device ids
vendor:product == 10de:26d). I have traced through various printks that
pci_init calls pci_fixup_device; later on, in quirk_usb_ohci_handoff
(file linux/drivers/usb/host/pci-quirks.c), the kernel freezes in this
section:
...
    if (control & OHCI_CTRL_IR) {
        int wait_time = 500;
        writel(OHCI_INTR_OC, base + OHCI_INTRENABLE);
        writel(OHCI_OCR, base + OHCI_CMDSTATUS); // this never returns
...
After this, the kernel apparently goes into busy waiting (fans gradually
turn louder) and hangs indefinitely. I have also made sure that writel
(in linux/include/asm/io.h) really is entered, but it never returns.

[3.] keywords: pci ohci kernel
[4.] /proc/version cannot be read, as the kernel freezes during startup
[5.] No Oops, no panic
[6.] Reproducible by booting any version 2.6.21+ on that machine
(nvidia MCP51-Chipset, see the lspci output)

[7.1] the ver_linux output under 2.6.20.6, in the directory of 2.6.22,
says:

Gnu C 4.2.1
Gnu make 3.81
binutils 2.17.50.0.17
util-linux 2.12r
mount 2.12r
module-init-tools 3.2.2
e2fsprogs 1.40
jfsutils 1.1.11
reiserfsprogs 3.6.20
xfsprogs 2.8.21
pcmciautils 014
PPP 2.4.4
Linux C Library > libc.2.6
Dynamic linker (ldd) 2.6
Linux C++ Library so.6.0
Procps 3.2.7
Net-tools 1.60
Kbd 1.12
Sh-utils 6.9
udev 113
wireless-tools 29
Modules Loaded rt2500* nvidia* forcedeth*

nvidia and rt2500 are most assuredly not involved in this. They are
not loaded by that kernel.

[7.2] Processor information:
proce...
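
For readers unfamiliar with the handoff in [2.]: the wait_time variable in
the quoted section suggests that a bounded poll of the IR bit follows the
two writes. Below is a minimal user-space sketch of such a poll. It is not
the kernel code. struct fake_ohci, fake_read_control and wait_for_handoff
are invented names standing in for the memory-mapped registers and readl,
and the bit position of OHCI_CTRL_IR is assumed from the HcControl layout
in the OHCI specification.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the OHCI spec's HcControl layout: set while firmware owns the controller. */
#define OHCI_CTRL_IR (1u << 8)

/* Hypothetical stand-in for the controller registers. The simulated firmware
 * clears IR once polls_until_release reaches 0; a negative value means it
 * never lets go. */
struct fake_ohci {
    uint32_t control;
    int polls_until_release;
};

/* Stand-in for readl(base + OHCI_CONTROL). */
static uint32_t fake_read_control(struct fake_ohci *hc)
{
    if (hc->polls_until_release == 0)
        hc->control &= ~OHCI_CTRL_IR;
    else if (hc->polls_until_release > 0)
        hc->polls_until_release--;
    return hc->control;
}

/* Poll until IR clears or the budget is spent.
 * Returns 0 if the firmware released the controller, -1 on timeout. */
static int wait_for_handoff(struct fake_ohci *hc, int wait_time)
{
    assert(wait_time > 0);
    while (wait_time > 0 && (fake_read_control(hc) & OHCI_CTRL_IR))
        wait_time -= 10; /* a real driver would sleep between polls */
    return (fake_read_control(hc) & OHCI_CTRL_IR) ? -1 : 0;
}
```

With polls_until_release = 3, wait_for_handoff(&hc, 500) returns 0; with -1
it returns -1 once the budget runs out. A timeout of this kind only helps
if every register access returns. It cannot recover from the situation
reported above, where the write itself stalls.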
Hi Timo,
Thanks for your report!
2.6.20 works, 2.6.21 doesn't, right? You could try git-bisect on Linus'
tree (if you can use git) to find the offending commit that broke it:

http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
and
http://www.reactivated.net/weblog/archives/2006/01/
using-git-bisect-to-find-buggy-kernel-patches/
(long URL broken in 2 lines)

You don't need to "make clean" between git-bisect builds, but be

Others have reported problems booting with gcc-4.2-compiled

You're saying there are no modules, then how come those three
are loaded? Also try reproducing the problem without proprietary

As mentioned earlier, git-bisect could help us narrow this down.
It's not a silver bullet, but often useful.

[ BTW, just-after a new kernel release is often an unlucky period to
report bugs, it appears ... everybody gets busy with not missing the
merge window to push in their shiny new stuff :-) ]

Thanks,
Satyam
-
Note that hangs in that file almost always mean "your BIOS is goofy".
Hunt for BIOS settings related to USB, and change them. As a rule, if
you tell your BIOS to ignore USB devices (mostly keyboards and disks),

Does the current kernel.org GIT tree do the same thing? A bunch
of USB patches were recently merged, including ISTR one in that

Should be unrelated. That patch related to how vendor-specific
implementation differences get detected and handled ... basically
just switching to a table-driven approach that can even handle
board-specific wiring braindamage, rather than the original scheme
which was just a big if/then/else looking only at chip vendors.

- Dave
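
To illustrate the table-driven idea Dave describes, here is a minimal
sketch. It does not reproduce the patch he refers to. All IDs, flag names,
and the quirk_entry / lookup_quirks names are invented for illustration.
The point is only that a table can match on board (subsystem) IDs as well
as chip vendors, which a vendor-only if/then/else chain cannot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffu            /* hypothetical wildcard value */

/* Hypothetical quirk flags. */
#define QUIRK_NONE     0u
#define QUIRK_VENDOR_A (1u << 0)  /* applies to every chip from one vendor */
#define QUIRK_BOARD_X  (1u << 1)  /* applies to one specific board wiring */

struct quirk_entry {
    uint16_t vendor, device;        /* chip IDs */
    uint16_t subvendor, subdevice;  /* board (subsystem) IDs */
    unsigned flags;
};

/* Most specific entries first: the first match wins. All IDs are made up. */
static const struct quirk_entry quirk_table[] = {
    { 0x1234, 0x0001, 0xabcd, 0x0042, QUIRK_VENDOR_A | QUIRK_BOARD_X },
    { 0x1234, ANY_ID, ANY_ID, ANY_ID, QUIRK_VENDOR_A },
};

static int id_matches(uint16_t pattern, uint16_t id)
{
    return pattern == ANY_ID || pattern == id;
}

/* Return the flags of the first matching table entry, or QUIRK_NONE. */
static unsigned lookup_quirks(uint16_t vendor, uint16_t device,
                              uint16_t subvendor, uint16_t subdevice)
{
    size_t i;

    for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
        const struct quirk_entry *q = &quirk_table[i];

        if (id_matches(q->vendor, vendor) &&
            id_matches(q->device, device) &&
            id_matches(q->subvendor, subvendor) &&
            id_matches(q->subdevice, subdevice))
            return q->flags;
    }
    return QUIRK_NONE;
}
```

Adding a workaround for another board then means adding one table row
instead of another branch in the code.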
-
This laptop's BIOS only offers "legacy support" enabled or disabled,
both of which lead to a frozen kernel. I will investigate whether the GIT

It does the same thing, git5, that is. Sorry I took so long, but I didn't
get to testing this earlier.

It is just odd that up to (not including) the 2.6.21-series every kernel
boots, and after that, they just freeze.

I am kinda stumped here.
Regards
TL
--
-
Hey, just try git-bisect already :-)
In fact, you can first try by just reverting / un-applying that patch that
you initially had a suspicion on. Or, because you've already spent
some time tracking down the issue, you could simply go through the
git history of that file / subsystem in question and play around reverting
individual patches that you find suspicious -- but really, there's no need
to try and be cute with this: you could simply do a git-bisect (say
between 2.6.20 and 2.6.21) and find the offending patch (or at least the
one that un-hides the bug) that makes the boot fail ...

[ BTW you haven't sent your dmesg / boot-time output ... if it isn't
getting saved to disk, you could try serial / netconsole, copy it by
hand, or simply take a photo and post it here. ]

Cheers,
Satyam
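
The reason bisecting beats reverting patches by hunch is that it is a
binary search over the history between a known-good and a known-bad
kernel. The sketch below shows only that idea. It treats history as a
linear, numbered list of commits, which is a simplification (real git
history is a graph), and first_bad_commit, boots_fn and boots_before are
invented names. The "boots" callback stands in for building and
test-booting a kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: nonzero if the kernel built from this commit boots. */
typedef int (*boots_fn)(size_t commit, void *ctx);

/* Commits are numbered oldest to newest. 'good' is known to boot and 'bad'
 * is known to hang. Returns the index of the first commit that fails and
 * stores the number of kernels that had to be tested in *steps. */
static size_t first_bad_commit(size_t good, size_t bad,
                               boots_fn boots, void *ctx, unsigned *steps)
{
    assert(good < bad);
    *steps = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        (*steps)++;
        if (boots(mid, ctx))
            good = mid;   /* culprit is later than mid */
        else
            bad = mid;    /* culprit is mid or earlier */
    }
    return bad;
}

/* Example predicate for testing: every commit before *ctx boots. */
static int boots_before(size_t commit, void *ctx)
{
    return commit < *(size_t *)ctx;
}
```

Over a hypothetical range of 8000 commits this needs at most 13 test
builds, however far into the range the culprit sits.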
-
On *your* system, note -- all my OHCI+PCI systems that have
been upgraded to 2.6.22 are behaving just peachy-keen-swell.

It gets that way sometimes. Thing is, pci-quirks.c runs early
enough in the boot process -- before the OHCI driver can even
run!! -- that you can probably rule out the USB stack as being
the cause of this regression. Disable the USB host controllers

Extremely unlikely to matter, since it wouldn't have been able
to run that early. Plus, you were seeing problems even before

Where the subsystem in question is early PCI/ACPI initialization,
before the drivers start binding to PCI devices... it's always
annoying when changes in that area cause USB to break, since the
only involvement of USB is to display a "rude failure" symptom.
It took a long time to get the IRQ setup glitches fixed!

One thing you might do is enable all the ACPI debug messaging and
disable the usb/host/pci-quirks.c stuff (just comment it all out),
assuming you can boot without USB keyboard/mouse. Then compare
the relevant diagnostics between "good" and "bad" kernels. It's
likely something interesting will appear.

- Dave
-
There could be another thing, of course. The kernel sources (or .config)
needn't be the only variable here -- if you're using the "old" kernel image
for the 2.6.20 kernel that works, it could be the case that perhaps you've
upgraded userspace packages (compiler/toolchain) in the meanwhile
that's causing this breakage ... so to test, try compiling the 2.6.20 on
your system again (with same .config) and see if it works now ...
-
To sum this up:
the userspace in which 2.6.20.6 (the "good" kernel) and 2.6.22 (the "bad"
kernel) were compiled is exactly the same setup. I recompiled "good" to
check for that, earlier, but "good" also works then.

"good" does not exhibit the printks I placed in the section (the same
ones I did for "bad"), making it plausible that the section is not
executed at all.

dmesg is not captured to disk, netconsole and serial console also do not
work (they both did in the "good" kernel). Also, my keyboard does not
work with "bad" during that phase -- Magic SysRq is also not working then.

I can try to hook up the laptop to an external monitor to capture some
more dmesg, and just shoot a photo, but I am right now trying to work
with git, as Satyam suggested.

Thanks very much for reading and helping :-)
Regards,
TL
--
-
Hi Timo,
Any updates on this for us? Or did the kernel start booting magically again
ca. 2.6.23-rc6? ;-)

Anyway, it appears the bug got introduced sometime between 2.6.20 and
2.6.22 so probably bugzilla becomes a better place to track this one. Could
you open up a bug report (similar to your original post) there?

Thanks,
Satyam
-
Should again add that best would still be to simply git-bisect Linus' (mainline)
kernel tree between 2.6.20 (not 2.6.20.6) and 2.6.22 and just find the commit
-
Ok, opened up: http://bugzilla.kernel.org/show_bug.cgi?id=9026
and brought it up to date with the discussion and David's comments on this
thread. Timo, please feel free to revisit this later and update us when you find
the time to do so.

[ BTW I think the "add CC:" thing in bugzilla is broken, I was simply unable to
add David Brownell, linux-acpi@ and Timo to the CC: for that bug, if somebody
knows how to do this, please add them ... ]

Thanks,
Satyam
-
