Re: All 2.6.26-rcX hang immediately after loading ohci_hcd

To: Andrey Borzenkov <arvidjaar@...>
Cc: Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <dbrownell@...>, Linux Kernel Mailing List <linux-kernel@...>, <linux-usb@...>, Alan Stern <stern@...>, Greg Kroah-Hartman <gregkh@...>
Date: Saturday, July 5, 2008 - 4:51 pm

The problem seems to be:

On Sat, 5 Jul 2008, Andrey Borzenkov wrote:
...

i.e., it looks like modprobe is stuck in some endless loop thanks to some 
OHCI probe thing.

There are then a lot of other processes in 'D' state. Some are waiting for 
some IO to complete, others look like they are waiting for some semaphore. 
But those issues look like they may be a secondary result of the primary 
issue (for example, softirqs aren't completing because the lockup looks 
like it may be in an irq/softirq handler, and the semaphore they are 
waiting for seems to be the device layer semaphore that is already held by 
the probing routine).

There are in fact several runnable tasks, but only the one above seems to 
be actually hogging the CPU constantly:


It's also a bit sad that the core device infrastructure uses the old-style 
semaphores rather than mutexes, because if it used mutexes the "locks 
held" debugging would show those locks too. As it is, it is silent about 
it, and only points out some relatively uninteresting stuff. But that 
event lock is perhaps relevant:


Anyway, the "show registers" one is the smoking gun, since it shows the 
same modprobe one still running, and still in that same area:


It really looks like it's some endless loop - possibly due to endless 
interrupts happening while in ohci_hub_status_data().

And I don't think this is due to the recently fixed IRQF_DISABLED bug. 
Admittedly, that bug would likely never show up on UP unless you have 
spinlock debugging enabled, which you obviously do have. That might 
explain why the Mandriva cooker kernel binary works for you. But if it's 
the IRQF_DISABLED thing, any lockup would probably show up as spinning 
recursively on a spinlock, which is not the case for you.

If it _is_ the IRQF_DISABLED bug, then it's fixed in commit 
de85422b94ddb23c021126815ea49414047c13dc, which isn't in any released -rc 
yet (I'm doing -rc9 today which will have it), but has been in the last 
few daily snapshots and obviously is in the current -git tree.

That said, it really looks like it's stuck in some endless loop in 
__do_softirq(). Not that that should be possible (there's an explicit 
loop limit there). So the only "endless" softirq thing would be if 
there are endless hardirqs re-raising it.

So it smells a bit like an interrupt flood to me.

Alan - if it _is_ an interrupt flood, it looks like it basically starts 
just as ohci_hub_status_data() does that spin_unlock_irqrestore() to 
re-enable interrupts. And there have been some changes to 
ohci_root_hub_state_changes() recently, and to that OHCI_INTR_RHSC enable 
logic in particular.

That root hub status interrupt changing thing is commit e872154921a6b5 
("USB: don't explicitly reenable root-hub status interrupts"), and it 
_did_ happen after 2.6.25.

Hmm? Alan? Greg?

		Linus
--

Messages in current thread:
All 2.6.26-rcX hang immediately after loading ohci_hcd, Andrey Borzenkov, (Sat Jul 5, 3:08 am)
Re: All 2.6.26-rcX hang immediately after loading ohci_hcd, Linus Torvalds, (Sat Jul 5, 4:51 pm)
Re: All 2.6.26-rcX hang immediately after loading ohci_hcd, Andrey Borzenkov, (Sun Jul 6, 12:59 am)
Re: All 2.6.26-rcX hang immediately after loading ohci_hcd, Andrey Borzenkov, (Fri Jul 11, 1:03 am)
Re: All 2.6.26-rcX hang immediately after loading ohci_hcd, Andrey Borzenkov, (Sun Jul 6, 1:35 am)