Re: [Bugme-new] [Bug 15621] New: BUG: unable to handle kernel paging request - comm: pccardd

From: Andrew Morton
Date: Wednesday, March 24, 2010 - 4:12 am

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).


It looks like the iomem_resource tree got wrecked.  Has anyone been

--

From: Bjorn Helgaas
Date: Thursday, March 25, 2010 - 9:51 am

My pci=use_crs patches change the contents of the iomem_resource tree,
and it's possible they broke some assumptions PCMCIA was making, so
you might see if "pci=nocrs" makes any difference.  If it does, please
attach an acpidump and the entire dmesg logs with and without that option.
--

From: Dominik Brodowski
Date: Thursday, March 25, 2010 - 10:01 am

... and /proc/iomem as well as /proc/ioports , please.
--

From: Ozgur Yuksel
Date: Monday, March 29, 2010 - 2:12 am

Using pci=nocrs works around the problem. For data collection: since the boot
does not complete without the workaround, only dmesg is available for that case.

With pci=nocrs, reading /proc/iomem gets killed by the kernel for some reason.

/proc/iomem, /proc/ioports and acpidump are provided for the 2.6.31-20-generic-pae
kernel for convenience / comparison.
--

From: Bjorn Helgaas
Date: Tuesday, March 30, 2010 - 4:10 pm

Rafael, this is a regression from 2.6.33, in case it's not on your
list yet.

Ozgur, thanks for attaching the logs.  There's some interesting stuff
there that I don't understand yet, such as this from the pci=nocrs dmesg:

  [    1.577758] pci 0000:00:1e.0: PCI bridge to [bus 03-04]
  [    1.583031] pci 0000:00:1e.0:   bridge window [io  0x5000-0x5fff]
  [    1.551889] pci 0000:03:01.0: CardBus bridge to [bus 04-07]
  [    1.557507] pci 0000:03:01.0:   bridge window [io  0x5000-0x50ff]
  [    1.603303] PCI: No. 2 try to assign unassigned res
  [    1.688208] pci 0000:03:01.0: CardBus bridge to [bus 04-07]
  [    1.693826] pci 0000:03:01.0:   bridge window [io  0x0000-0x00ff]

Apparently we moved that CardBus I/O window from [0x5000-0x50ff] to
[0x0-0xff].  I'm dubious about that because the upstream bridge at
00:1e.0 only positively decodes [0x5000-0x5fff] (though it *is* in
subtractive decode mode, so it will forward more).  I wish we had
a little more debug output about when & why we moved that window.
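
As a rough illustration of the comparison made above, the sketch below pulls
the "bridge window [io ...]" lines out of a dmesg excerpt and reports any
device whose I/O window changed between reports.  The function names
(io_window_history, moved_windows) are hypothetical helpers written for this
note, not part of any kernel tool, and the regular expression only assumes the
line format visible in the excerpt quoted in this message.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [    1.557507] pci 0000:03:01.0:   bridge window [io  0x5000-0x50ff]
# (format assumed from the dmesg excerpt quoted in this thread)
WINDOW_RE = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]):\s+"
    r"bridge window \[io\s+0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)\]"
)


def io_window_history(dmesg_text):
    """Return {device: [(start, end), ...]} in the order the lines appear."""
    history = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = WINDOW_RE.search(line)
        if m:
            window = (int(m.group("start"), 16), int(m.group("end"), 16))
            history[m.group("dev")].append(window)
    return dict(history)


def moved_windows(history):
    """Return only the devices that reported more than one distinct window."""
    return {dev: wins for dev, wins in history.items() if len(set(wins)) > 1}


# Excerpt copied from the pci=nocrs dmesg quoted in this message.
SAMPLE = """\
[    1.577758] pci 0000:00:1e.0: PCI bridge to [bus 03-04]
[    1.583031] pci 0000:00:1e.0:   bridge window [io  0x5000-0x5fff]
[    1.551889] pci 0000:03:01.0: CardBus bridge to [bus 04-07]
[    1.557507] pci 0000:03:01.0:   bridge window [io  0x5000-0x50ff]
[    1.603303] PCI: No. 2 try to assign unassigned res
[    1.688208] pci 0000:03:01.0: CardBus bridge to [bus 04-07]
[    1.693826] pci 0000:03:01.0:   bridge window [io  0x0000-0x00ff]
"""

history = io_window_history(SAMPLE)
moved = moved_windows(history)

for dev, wins in moved.items():
    print(dev, [f"0x{s:04x}-0x{e:04x}" for s, e in wins])
```

This only shows that a window moved; it says nothing about when or why, which
is the debug output that is actually missing.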

I'm especially dubious because your /proc/ioports with pci=nocrs
from comment 8 (which is the case that's supposed to be working)
contains this:

  5000-5fff : PCI Bus 0000:03
    0000-00ff : PCI CardBus 0000:04
    0000-00ff : PCI CardBus 0000:04

That looks completely broken in terms of the hierarchy.  It looks
like you have a USB device in the CardBus slot (ohci_hcd 0000:04:00.0).
Maybe the broken hierarchy doesn't cause problems with this device
because it doesn't use I/O ports.
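
To make "broken in terms of the hierarchy" concrete: in a resource tree every
child range is expected to lie inside its parent's range.  The sketch below
checks that property on /proc/ioports-style text, using indentation to infer
parent/child relationships.  It is a minimal illustration, not kernel code;
parse_resources and find_uncontained are hypothetical names, and the sample
is the three lines quoted above with the common leading indent removed.

```python
def parse_resources(text):
    """Parse /proc/ioports-style text into (indent, start, end, name) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        rng, _, name = line.strip().partition(" : ")
        start, end = (int(part, 16) for part in rng.split("-"))
        entries.append((indent, start, end, name))
    return entries


def find_uncontained(entries):
    """Return (child, parent) pairs where the child is not inside the parent."""
    problems = []
    stack = []  # ancestors of the current line, outermost first
    for entry in entries:
        indent, start, end, _ = entry
        # Drop siblings and deeper entries; what remains on top is the parent.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            _, pstart, pend, _ = stack[-1]
            if start < pstart or end > pend:
                problems.append((entry, stack[-1]))
        stack.append(entry)
    return problems


# The /proc/ioports fragment quoted above (pci=nocrs case).
SAMPLE = """\
5000-5fff : PCI Bus 0000:03
  0000-00ff : PCI CardBus 0000:04
  0000-00ff : PCI CardBus 0000:04
"""

problems = find_uncontained(parse_resources(SAMPLE))
for child, parent in problems:
    print(f"{child[3]} [{child[1]:04x}-{child[2]:04x}] is outside "
          f"{parent[3]} [{parent[1]:04x}-{parent[2]:04x}]")
```

Both CardBus entries are flagged here, since 0000-00ff does not fall within
5000-5fff.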

Anyway, I'd like to see the entire dmesg log when booted *without*
pci=nocrs, because that's the case that fails.  Since the system doesn't
boot, you'll have to use a serial console or netconsole to collect the
whole thing.  The serial console log in comment 7 is corrupted; it looks
like all the lines got truncated to 80 columns or something.  And please
boot with "ignore_loglevel" so we see all the debug messages on the console.
Also, no need to tar up and compress your attachments -- I ...
--

From: Ozgur Yuksel
Date: Thursday, April 1, 2010 - 2:18 am

Interestingly, when ignore_loglevel is used, the problem does not reproduce. I'll
now proceed with the actions in comment #11.
--

From: Bjorn Helgaas
Date: Thursday, April 1, 2010 - 10:34 am

Using ignore_loglevel shouldn't affect the problem, so I'm confused.
Can you reproduce the original problem and attach the entire serial
console log?
--

From: Ozgur Yuksel
Date: Friday, April 2, 2010 - 9:59 am

It seems that the problem does not reproduce at all now. Unfortunately I do not
have the images I built on 2010-03-29 08:46, and building from a fresh
ae6be51ed01d6c4aaf249a207b4434bc7785853b does not reproduce the problem. It is
most likely the specific .config I used at the time (which I do not have
anymore). Also, I have been doing other builds on the same system, so maybe it
was just a stale module or something.

FWIW, the problem does not reproduce with 2.6.34-rc3 either (on the very same
hardware).
--
