From: Benedikt Spranger <bene@linutronix.de>
An USB error interrupt (e.g. disconnect) nukes the pending requests for
an ethernet gadget device asynchronously. This can race against
eth_start_xmit(), where we end up dereferencing the list head itself.
The nuke code is serialized against eth_start_xmit via dev->req_lock,
but we need to check the list for empty first instead of unconditionally
accessing dev->tx_reqs.next.
This is a long standing bug, which should be fixed in stable as well.
Signed-off-by: Benedikt Spranger <bene@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff --git a/drivers/usb/gadget/ether.c b/drivers/usb/gadget/ether.c
index 593e235..f1d7c82 100644
--- a/drivers/usb/gadget/ether.c
+++ b/drivers/usb/gadget/ether.c
@@ -1989,8 +1989,20 @@ static int eth_start_xmit (struct sk_buff *skb, struct net_device *net)
}
spin_lock_irqsave(&dev->req_lock, flags);
+ /*
+ * dev->tx_reqs may be empty due to an error interrupt which
+ * nuked all requests.
+ */
+ if (list_empty(&dev->tx_reqs)) {
+ netif_stop_queue(net);
+ spin_unlock_irqrestore(&dev->req_lock, flags);
+ return 1;
+ }
+
req = container_of (dev->tx_reqs.next, struct usb_request, list);
list_del (&req->list);
+
+ /* last request in list: stop queue */
if (list_empty (&dev->tx_reqs))
netif_stop_queue (net);
spin_unlock_irqrestore(&dev->req_lock, flags);
-
Looks OK, although the comments are confusingly incorrect which made it hard to see what this was doing: - There is no "nuke()" method in this driver, even comments don't use that term. - When "nuke" is used in the gadget stack, it means the way that controller drivers scrub out a queue of live requests queued to an endpoint ... *NOT* recycling memory that sits on a freelist, which is what's involved here. - Disconnect is *NOT* an error, it's a routine occurrence which can happen at essentially any time. Please reword: "may be empty because disconnecting an -
From: Benedikt Spranger <bene@linutronix.de>
eth_start_xmit() can race against a disconnect interrupt in the gadget
device driver, which nukes all pending request. Right now we access the
pending request list unconditionally and dereference the request list
head itself in such a case, which results in an Oops.
Check whether the list is empty before actually dereferencing
dev->tx_reqs.next. Also add a comment for the second list_empty check
further down to avoid confusion.
Long standing bug. Patch should be applied to stable as well.
Signed-off-by: Benedikt Spranger <bene@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff --git a/drivers/usb/gadget/ether.c b/drivers/usb/gadget/ether.c
index 593e235..f2a7bd5 100644
--- a/drivers/usb/gadget/ether.c
+++ b/drivers/usb/gadget/ether.c
@@ -1989,8 +1989,21 @@ static int eth_start_xmit (struct sk_buff *skb, struct net_device *net)
}
spin_lock_irqsave(&dev->req_lock, flags);
+ /*
+ * dev->tx_reqs may be empty. We raced against a disconnect
+ * interrupt in the gadget device driver, which nuked all
+ * pending requests.
+ */
+ if (list_empty(&dev->tx_reqs)) {
+ netif_stop_queue(net);
+ spin_unlock_irqrestore(&dev->req_lock, flags);
+ return 1;
+ }
+
req = container_of (dev->tx_reqs.next, struct usb_request, list);
list_del (&req->list);
+
+ /* last request in list: stop queue */
if (list_empty (&dev->tx_reqs))
netif_stop_queue (net);
spin_unlock_irqrestore(&dev->req_lock, flags);
-
I think you misread my comment. Those requests are **NOT** pending!! So this update has a *MORE* incorrect description of the issue. That's just the freelist ... it's a fairly conventional model whereby there's a pool of "free" request slots which can be issued. When the pool empties, the TX queue shuts down until one of the requests which is pending in the hardware completes, and makes a slot free. The problem you're addressing is that there's a small window where a disconnect IRQ can shut down the TX queue (and empty that freelist) after upper layers in the network stack started a transmission on an active (pre-disconnect) TX queue. That problem is *NOT* related to any pending requests at all!! -
Sorry, I misunderstood your comment. Can you please add the correct comment yourself before we play some more rounds of ping pong ? tglx -
How's this? Note that the queue should already have been stopped,
so I removed what should be an extra call (as well as fixing the
comments).
- Dave
========
From: Thomas Gleixner <tglx@linutronix.de>
This patch fixes a longstanding race in the Ethernet gadget driver,
which can cause an oops on device disconnect. The fix is just to
make the TX path check whether its freelist is empty. That check
is otherwise not necessary, since the queue is always stopped when
that list empties (and restarted when request completion puts an
entry back on that freelist).
The race window starts when the network code decides to transmit a
packet, and ends when hard_start_xmit() grabs the freelist lock.
If disconnect() is called inside that window, it shuts down the
TX queue and breaks the otherwise-solid assumption that packets are
never sent when the TX queue is stopped.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
--- a/drivers/usb/gadget/ether.c
+++ b/drivers/usb/gadget/ether.c
@@ -1989,8 +1989,20 @@ static int eth_start_xmit (struct sk_buff *skb, struct net_device *net)
}
spin_lock_irqsave(&dev->req_lock, flags);
+ /*
+ * the freelist can be empty if an interrupt triggered disconnect()
+ * and reconfigured the gadget (shutting down this queue) after the
+ * network stack decided to xmit but before we got the spinlock.
+ */
+ if (list_empty(&dev->tx_reqs)) {
+ spin_unlock_irqrestore(&dev->req_lock, flags);
+ return 1;
+ }
+
req = container_of (dev->tx_reqs.next, struct usb_request, list);
list_del (&req->list);
+
+ /* temporarily stop TX queue when the freelist empties */
if (list_empty (&dev->tx_reqs))
netif_stop_queue (net);
spin_unlock_irqrestore(&dev->req_lock, flags);
-
Please change to: From: Benedikt Spranger <bene@linutronix.de> He did all the grump work of figuring out what's going wrong. I was just Sigh. I need a real deep look inside that code to understand, why Please add our signed offs as well Signed-off-by: Benedikt Spranger <bene@linutronix.de> Thanks, -
It *is* a list of requests: free ones -- the only kind this level of
driver is allowed to remember! ;)
Yeah, I had to go back and read the driver again before I understood
just what problem this patch was trying to fix. Which is why I wanted
to make sure the mismatch between comments and contents was resolved.
- Dave
========== CUT HERE
From: Benedikt Spranger <bene@linutronix.de>
This patch fixes a longstanding race in the Ethernet gadget driver,
which can cause an oops on device disconnect. The fix is just to
make the TX path check whether its freelist is empty. That check
is otherwise not necessary, since the queue is always stopped when
that list empties (and restarted when request completion puts an
entry back on that freelist).
The race window starts when the network code decides to transmit a
packet, and ends when hard_start_xmit() grabs the freelist lock.
When disconnect() is called inside that window, it shuts down the
TX queue and breaks the otherwise-solid assumption that packets are
never sent through a TX queue that's stopped.
Signed-off-by: Benedikt Spranger <bene@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
--- a/drivers/usb/gadget/ether.c
+++ b/drivers/usb/gadget/ether.c
@@ -1989,8 +1989,20 @@ static int eth_start_xmit (struct sk_buff *skb, struct net_device *net)
}
spin_lock_irqsave(&dev->req_lock, flags);
+ /*
+ * this freelist can be empty if an interrupt triggered disconnect()
+ * and reconfigured the gadget (shutting down this queue) after the
+ * network stack decided to xmit but before we got the spinlock.
+ */
+ if (list_empty(&dev->tx_reqs)) {
+ spin_unlock_irqrestore(&dev->req_lock, flags);
+ return 1;
+ }
+
req = container_of (dev->tx_reqs.next, struct usb_request, list);
list_del (&req->list);
+
+ /* temporarily stop TX queue when the freelist empties */
if (list_empty (&dev->tx_reqs))
netif_stop_queue (net);
...Fair enough. Thanks for sanitizing the comments. tglx -
