I tried searching the list before posting, but searching the mailing list archives with keywords such as b43, suspend, resume... brings up such a ludicrous number of threads that it's not realistic to check them all, so just tell me what to look for if it's been asked already.

Whenever I do a suspend to disk after using b43, the computer freezes hard as soon as it attempts to access b43 again after resume.

Minimal steps to reproduce the freeze:

1) modprobe b43
2) hibernate (using any suspend to disk, which one is irrelevant)
3) resume
4) ifconfig wlan0 up

This has been happening (at least) since b43 was included in the mainline kernel, on my Asus A6K laptop running an x86_64 kernel (now the latest 2.6.25 stable release or compiled from the latest released debian sid sources).

The nvidia module is not responsible: I explicitly booted my laptop in single user mode without any unnecessary modules, same result. It does not happen using the windows driver with ndiswrapper (which I would prefer to avoid for other reasons), so it definitely depends on b43 or something it depends on.

Unloading and reloading the b43 module and all the other modules it depends on does not change anything. Just loading the module once, hibernating and resuming means a freeze as soon as the module is actually initialised next time, regardless of it having been unloaded and reloaded any number of times before or after the suspend-resume cycle. No oopses, nothing in the system logs, just an instant freeze.

Is there some testing a user can do to help nail this down? I am not a kernel developer, even if I am a decent C programmer.

Please CC me on replies, I am not on the list.

Thanks in advance,
Giacomo Mulas
--
_________________________________________________________________
Giacomo Mulas <gmulas@ca.astro.it>
_________________________________________________________________
OSSERVATORIO ASTRONOMICO DI CAGLIARI
Str. 54, Loc. Poggio dei Pini * 09012 Capoterra (CA)
Tel. (OAC): +39 ...
I think you need the appended patch, but it only applies to linux-next.
Thanks,
Rafael
---
When a driver rejects a frame in its ->tx() callback, it must also
stop queues, otherwise mac80211 can go into a loop here. Detect this
situation and abort the loop after five retries, warning about the
driver bug.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
---
net/mac80211/tx.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
Index: linux-next/net/mac80211/tx.c
===================================================================
--- linux-next.orig/net/mac80211/tx.c
+++ linux-next/net/mac80211/tx.c
@@ -1144,7 +1144,7 @@ static int ieee80211_tx(struct net_devic
struct ieee80211_tx_data tx;
ieee80211_tx_result res_prepare;
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
- int ret, i;
+ int ret, i, retries = 0;
u16 queue;
queue = skb_get_queue_mapping(skb);
@@ -1206,6 +1206,13 @@ retry:
*/
if (!__netif_subqueue_stopped(local->mdev, queue)) {
clear_bit(queue, local->queues_pending);
+ retries++;
+ /*
+ * Driver bug, it's rejecting packets but
+ * not stopping queues.
+ */
+ if (WARN_ON_ONCE(retries > 5))
+ goto drop;
goto retry;
}
store->skb = skb;
--
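
For readers who want to see the idea of the patch outside the kernel, here is a minimal userspace sketch of a bounded retry loop. It is not the mac80211 code: struct fake_driver, send_frame(), MAX_TX_RETRIES and the three tx hooks are hypothetical names invented for illustration, and the queue state is reduced to a single flag. Only the bound of five retries and the "rejected but queue not stopped" condition are taken from the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_TX_RETRIES 5 /* same bound as the patch; the name is hypothetical */

/* Hypothetical stand-in for a driver: a tx hook plus a "queue stopped" flag. */
struct fake_driver {
	bool (*tx)(struct fake_driver *drv); /* true if the frame was accepted */
	bool queue_stopped;
	int tx_calls;
};

/* Returns 0 if sent, 1 if left pending on a stopped queue, -1 if dropped. */
static int send_frame(struct fake_driver *drv)
{
	int retries = 0;

	assert(drv != NULL && drv->tx != NULL);
retry:
	drv->tx_calls++;
	if (drv->tx(drv))
		return 0;
	if (drv->queue_stopped)
		return 1; /* well-behaved rejection: wait for the queue to restart */

	/* Rejected with the queue still running: treat it as a driver bug. */
	if (++retries > MAX_TX_RETRIES) {
		fprintf(stderr, "driver rejects frames but does not stop its queue\n");
		return -1;
	}
	goto retry;
}

/* Three toy drivers: one accepts, one rejects politely, one misbehaves. */
static bool ok_tx(struct fake_driver *drv)     { (void)drv; return true; }
static bool polite_tx(struct fake_driver *drv) { drv->queue_stopped = true; return false; }
static bool buggy_tx(struct fake_driver *drv)  { (void)drv; return false; }
```

Without the retry counter, calling send_frame() on a driver that behaves like buggy_tx would never return, which is the loop the commit message describes.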
A different version has been merged into what will become 2.6.26. I'll see what we can do about stable. johannes
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ef3a62d272f033989e83eb1f26505f93f93e3e69;hp=6d1a3fb567a728d31474636e167c324702a0c38b

Anybody have a stable tree around to see if that applies? I think it should.

johannes
It didn't, but this version will. It has been compile tested only.
Larry
===========================
Index: linux-2.6/net/mac80211/tx.c
===================================================================
--- linux-2.6.orig/net/mac80211/tx.c
+++ linux-2.6/net/mac80211/tx.c
@@ -1090,7 +1090,7 @@ static int ieee80211_tx(struct net_devic
ieee80211_tx_handler *handler;
struct ieee80211_txrx_data tx;
ieee80211_txrx_result res = TXRX_DROP, res_prepare;
- int ret, i;
+ int ret, i, retries = 0;
WARN_ON(__ieee80211_queue_pending(local, control->queue));
@@ -1181,6 +1181,13 @@ retry:
if (!__ieee80211_queue_stopped(local, control->queue)) {
clear_bit(IEEE80211_LINK_STATE_PENDING,
&local->state[control->queue]);
+ retries++;
+ /*
+ * Driver bug, it's rejecting packets but
+ * not stopping queues.
+ */
+ if (WARN_ON_ONCE(retries > 5))
+ goto drop;
goto retry;
}
memcpy(&store->control, control,
Ah, the TXRX result thing, thanks a bunch. Adding stable to CC, can you
pick this up?
Subject: mac80211: detect driver tx bugs
When a driver rejects a frame in its ->tx() callback, it must also
stop queues, otherwise mac80211 can go into a loop here. Detect this
situation and abort the loop after five retries, warning about the
driver bug.
Thanks to Larry Finger <Larry.Finger@lwfinger.net> for doing the -stable
port.
--- linux-2.6.orig/net/mac80211/tx.c
+++ linux-2.6/net/mac80211/tx.c
@@ -1090,7 +1090,7 @@ static int ieee80211_tx(struct net_devic
ieee80211_tx_handler *handler;
struct ieee80211_txrx_data tx;
ieee80211_txrx_result res = TXRX_DROP, res_prepare;
- int ret, i;
+ int ret, i, retries = 0;
WARN_ON(__ieee80211_queue_pending(local, control->queue));
@@ -1181,6 +1181,13 @@ retry:
if (!__ieee80211_queue_stopped(local, control->queue)) {
clear_bit(IEEE80211_LINK_STATE_PENDING,
&local->state[control->queue]);
+ retries++;
+ /*
+ * Driver bug, it's rejecting packets but
+ * not stopping queues.
+ */
+ if (WARN_ON_ONCE(retries > 5))
+ goto drop;
goto retry;
}
memcpy(&store->control, control,
--
Rafael, you misled me :) This is a completely different thing. johannes
Ah, sorry then. I was too quick with my response. Rafael --
No trouble, it reminded me that I wanted to ask stable to pick up that patch anyway although I don't think we ever ran into the issue there. This seems very odd though, Giacomo, are you sure it also happens if you unload the module? johannes
I'm confused. Should the "mac80211: detect driver tx bugs" patch be sent to stable? Larry --
Yeah, I think it still should, even if that's not the bug here. johannes
yes, absolutely (unfortunately). I can unload the module before suspending,
reload it after resuming, same result; I can actually do any number of
suspend/resumes, unload and reload the modules any number of times,
everything still works until I try to ifconfig up the interface, then it
hangs solid. I also tried old-style suspend to disk, suspend2, user-space
suspend... exactly the same.
I am now compiling a kernel with the patch you sent, to see whether this
improves things. I will let you know. By the way, is there a module
debugging option I could use to cause the b43 and/or mac80211 modules to use
lots of printk's, so that I could at least give you a hint as to where the
code hangs?
Thanks, bye
Giacomo
P.S. please CC replies to me, I'm not on the list(s)
--
_________________________________________________________________
Giacomo Mulas <gmulas@ca.astro.it>
_________________________________________________________________
OSSERVATORIO ASTRONOMICO DI CAGLIARI
Str. 54, Loc. Poggio dei Pini * 09012 Capoterra (CA)
Tel. (OAC): +39 070 71180 248 Fax : +39 070 71180 222
Tel. (UNICA): +39 070 675 4916
_________________________________________________________________
"When the storms are raging around you, stay right where you are"
(Freddy Mercury)
_________________________________________________________________
--
That starts to sound like some core problem -- a bug in b43 does not explain the symptoms you see. What other drivers does b43 share an interrupt with? Adding printks to b43's resume should be easy... -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
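
One way to act on the printk suggestion is to put a trace line after each stage of a resume or init path, so that the last message seen before the freeze narrows down where the code stops. The sketch below is a userspace stand-in under stated assumptions: fake_resume(), the stage_*() functions and the TRACE_STEP macro are hypothetical and do not correspond to actual b43 functions. In a kernel build, the fprintf() call would be replaced by printk(KERN_DEBUG ...) with the same format arguments.

```c
#include <assert.h>
#include <stdio.h>

/* Counts emitted trace lines so the sketch can be checked in userspace. */
static int trace_count;

/*
 * Userspace stand-in for a trace helper. In a kernel build the fprintf()
 * call would become printk(KERN_DEBUG "...") with the same arguments.
 */
#define TRACE_STEP(msg)                                                  \
	do {                                                             \
		trace_count++;                                           \
		fprintf(stderr, "trace: %s:%d %s\n",                     \
			__func__, __LINE__, (msg));                      \
	} while (0)

/* Hypothetical resume stages; the real driver's steps will differ. */
static int stage_power_on(void)      { return 0; }
static int stage_load_firmware(void) { return 0; }
static int stage_init_hw(void)       { return 0; }

/* Runs the stages in order, tracing after each one. */
static int fake_resume(void)
{
	int err;

	TRACE_STEP("enter");
	err = stage_power_on();
	TRACE_STEP("after power on");
	if (err)
		return err;

	err = stage_load_firmware();
	TRACE_STEP("after firmware load");
	if (err)
		return err;

	err = stage_init_hw();
	TRACE_STEP("after hw init");
	return err;
}
```

With a hard freeze the last messages may never reach the on-disk logs, so such traces are probably most useful when watched on a text console.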
