hcd_urb_list_lock is used for synchronization between IRQ and non-IRQ
contexts, so the non-IRQ context has to disable IRQs in order to prevent
deadlocking with IRQ context.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
drivers/usb/core/hcd.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index bf10e9c..19279ed 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1000,8 +1000,9 @@ EXPORT_SYMBOL_GPL(usb_calc_bus_time);
int usb_hcd_link_urb_to_ep(struct usb_hcd *hcd, struct urb *urb)
{
int rc = 0;
+ unsigned long flags;
- spin_lock(&hcd_urb_list_lock);
+ spin_lock_irqsave(&hcd_urb_list_lock, flags);
/* Check that the URB isn't being killed */
if (unlikely(urb->reject)) {
@@ -1034,7 +1035,7 @@ int usb_hcd_link_urb_to_ep(struct usb_hcd *hcd, struct urb *urb)
goto done;
}
done:
- spin_unlock(&hcd_urb_list_lock);
+ spin_unlock_irqrestore(&hcd_urb_list_lock, flags);
return rc;
}
EXPORT_SYMBOL_GPL(usb_hcd_link_urb_to_ep);
@@ -1106,10 +1107,11 @@ EXPORT_SYMBOL_GPL(usb_hcd_check_unlink_urb);
*/
void usb_hcd_unlink_urb_from_ep(struct usb_hcd *hcd, struct urb *urb)
{
+ unsigned long flags;
/* clear all state linking urb to this dev (and hcd) */
- spin_lock(&hcd_urb_list_lock);
+ spin_lock_irqsave(&hcd_urb_list_lock, flags);
list_del_init(&urb->urb_list);
- spin_unlock(&hcd_urb_list_lock);
+ spin_unlock_irqrestore(&hcd_urb_list_lock, flags);
}
EXPORT_SYMBOL_GPL(usb_hcd_unlink_urb_from_ep);
--
Which non-irq context is that? Regards Oliver --
One example -- assume usb_submit_urb() called from non-IRQ context. Then usb_hcd_submit_urb() -> rh_urb_enqueue() -> rh_queue_status() -> usb_hcd_link_urb_to_ep(). -- Jiri Kosina SUSE Labs --
This turns out not to be the case. Interrupts are disabled.
static int rh_queue_status (struct usb_hcd *hcd, struct urb *urb)
{
int retval;
unsigned long flags;
int len = 1 + (urb->dev->maxchild / 8);
spin_lock_irqsave (&hcd_root_hub_lock, flags);
if (hcd->status_urb || urb->transfer_buffer_length < len) {
dev_dbg (hcd->self.controller, "not queuing rh status urb\n");
retval = -EINVAL;
goto done;
}
retval = usb_hcd_link_urb_to_ep(hcd, urb);
I'll investigate.
Regards
Oliver
--
You're right, this callchain can't cause the deadlock indeed. I'll go through the other possibilities. To give some background to others reading this thread -- Leonardo (in CC) reported [1] that this patch fixes hard kernel hangs he has been seeing when using USB CDMA modem. Still, we don't currently seem to see the place where the interrupts get enabled that makes the deadlock to trigger. [1] https://bugzilla.novell.com/show_bug.cgi?id=378509 -- Jiri Kosina SUSE Labs --
usb_hcd_flush_endpoint() is an obvious case. Used in the suspend code. Your patch is indeed correct, but I fear there might be a second bug caused by wrong calling conditions. Regards Oliver --
The functions you are worried about (usb_hcd_link_urb_to_ep and usb_hcd_unlink_urb_from_ep) are documented as requiring that interrupts be disabled by their callers. This patch isn't needed. (Of course, other patches may be needed to insure that the callers do It's not obvious to me. Where does usb_hcd_flush_endpoint() acquire The problem in the Novell bugzilla entry was caused by the fact that the OHCI irq routine was invoked with interrupts enabled, owing to a missing IRQF_DISABLED flag. That bug has already been fixed in 2.6.25. Alan Stern --
Sorry, yes looking again, it does complicated stuff with dropping the lock but leaving interrupts disabled. Sorry Oliver --
This requirement is the only reason the whci-hcd driver disables interrupts. Removing this requirement would reduce the time that interrupts are disabled in the whci-hcd driver. David -- David Vrabel, Senior Software Engineer, Drivers CSR, Churchill House, Cambridge Business Park, Tel: +44 (0)1223 692562 Cowley Road, Cambridge, CB4 0WZ http://www.csr.com/ --
That doesn't sound like a valid approach. If you don't disable interrupts then you aren't protected against an interrupt handler submitting an URB and accessing your data structures while whci-hcd is in the middle of updating them. Alan Stern --
I can't see how urbs can be submitted to whci-hcd from an interrupt handler (urb callbacks are always called from a workqueue thread) but they could be submitted from a timer, so you are correct. David -- David Vrabel, Senior Software Engineer, Drivers CSR, Churchill House, Cambridge Business Park, Tel: +44 (0)1223 692562 Cowley Road, Cambridge, CB4 0WZ http://www.csr.com/ --
_Your_ callbacks are always called from a workqueue thread. Another driver's callbacks might not be, and that other driver might submit an Alan Stern --
That indeed is 2.6.25 kernel. I guess you are talking about commit 442258e2ff69 here. If so, the reporter is definitely using the kernel containing this commit, and the lockups still trigger. Seems that my patch is papering over the real bug (someone enabling interrupts somewhere) indeed, but I can't seem to be able to find such codepath. Thanks, -- Jiri Kosina SUSE Labs --
You could try testing the interrupt-enable flag at various places in ohci-hcd (start with finish_urb) and printing an error message if interrupts are enabled. One possibility is that in an earlier call to finish_urb, usb_hcd_giveback_urb was called with interrupts disabled and returned with interrupts enabled. In other words, some driver's callback routine may have enabled interrupts incorrectly. Alan Stern --
Alan, I am trying your suggestion right now. Lets see if it finds something. Thanks, Leonardo --
Are we sure the interrupt line is not shared with another driver that calls spin_unlock_irq() in its interrupt handler? Regards Oliver --
Sorry, yes other code paths take it with enabled interrupts. The patch is good and necessary for 2.6.26. Regards Oliver --
Just wondering,... doesn't lockdep say anything about the situation? --
