b43 locks the machine when resuming after suspend to disk

From: Giacomo Mulas
Date: Wednesday, July 2, 2008 - 1:51 pm

I tried searching on the list for this, before posting, but searching the
mailing list archives with keywords such as b43, suspend, resume... brings
up such a ludicrous amount of threads that it's not realistic to check them
all, so just tell me what to look for if it's been asked already.

Whenever I do a suspend to disk after using b43, the computer freezes hard
as soon as it tries to access b43 again after resume.

Minimal steps to reproduce the freeze:
1) modprobe b43
2) hibernate (using any suspend to disk, which one is irrelevant)
3) resume
4) ifconfig wlan0 up

This has been happening (at least) since b43 was included in the mainline
kernel, on my Asus A6K laptop running an x86_64 kernel (now the latest
2.6.25 stable release or compiled from the latest released debian sid
sources). The nvidia module is not responsible: I explicitly booted my
laptop in single user mode without any unnecessary modules, same result. It
does not happen using the windows driver with ndiswrapper (which I would
prefer to avoid for other reasons), so it definitely depends on b43 or
something it depends on. Unloading and reloading the b43 module and all the
other modules it depends on does not change anything. Just loading the
module once, hibernating and resuming means freeze-up as soon as the module
is actually initialised next time, regardless of it having been unloaded and
reloaded any number of times before or after the suspend-resume cycle. No
oopses, nothing on system logs, just instant freeze-up. Is there some
testing a user can do to help nail this down? I am not a kernel developer,
although I am a decent C programmer.

Please CC me on replies, I am not on the list.

Thanks in advance,
Giacomo Mulas

-- 
_________________________________________________________________

Giacomo Mulas <gmulas@ca.astro.it>
_________________________________________________________________

OSSERVATORIO ASTRONOMICO DI CAGLIARI
Str. 54, Loc. Poggio dei Pini * 09012 Capoterra (CA)

Tel. (OAC): +39 ...
From: Rafael J. Wysocki
Date: Wednesday, July 2, 2008 - 2:40 pm

I think you need the appended patch, but it only applies to linux-next.

Thanks,
Rafael

---
When a driver rejects a frame in its ->tx() callback, it must also
stop queues, otherwise mac80211 can go into a loop here. Detect this
situation and abort the loop after five retries, warning about the
driver bug.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
---
 net/mac80211/tx.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Index: linux-next/net/mac80211/tx.c
===================================================================
--- linux-next.orig/net/mac80211/tx.c
+++ linux-next/net/mac80211/tx.c
@@ -1144,7 +1144,7 @@ static int ieee80211_tx(struct net_devic
 	struct ieee80211_tx_data tx;
 	ieee80211_tx_result res_prepare;
 	struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
-	int ret, i;
+	int ret, i, retries = 0;
 	u16 queue;
 
 	queue = skb_get_queue_mapping(skb);
@@ -1206,6 +1206,13 @@ retry:
 		 */
 		if (!__netif_subqueue_stopped(local->mdev, queue)) {
 			clear_bit(queue, local->queues_pending);
+			retries++;
+			/*
+			 * Driver bug, it's rejecting packets but
+			 * not stopping queues.
+			 */
+			if (WARN_ON_ONCE(retries > 5))
+				goto drop;
 			goto retry;
 		}
 		store->skb = skb;
--
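
The retry guard in the patch above can be illustrated outside the kernel.
What follows is a minimal user-space sketch, not the mac80211 code: the
names fake_driver, fake_driver_tx and send_frame are hypothetical, and
only the bound (give up once retries > 5) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_TX_RETRIES 5 /* same bound as the patch: drop once retries > 5 */

enum tx_outcome { TX_SENT, TX_QUEUED, TX_DROPPED };

/* Hypothetical stand-in for a wireless driver. */
struct fake_driver {
	bool accepts;       /* does tx() accept the frame? */
	bool queue_stopped; /* did the driver stop its queue? */
	int tx_calls;       /* how often tx() was invoked */
};

/* Pretend ->tx() callback: only counts calls and reports accept/reject. */
static bool fake_driver_tx(struct fake_driver *drv)
{
	drv->tx_calls++;
	return drv->accepts;
}

/*
 * Retry loop with a guard. Without the retries counter, a driver that
 * rejects frames but leaves its queue running would keep this loop
 * spinning forever.
 */
static enum tx_outcome send_frame(struct fake_driver *drv)
{
	int retries = 0;

	assert(drv != NULL);
retry:
	if (fake_driver_tx(drv))
		return TX_SENT;

	if (!drv->queue_stopped) {
		retries++;
		if (retries > MAX_TX_RETRIES) {
			fprintf(stderr, "driver bug: frame rejected, queue not stopped\n");
			return TX_DROPPED;
		}
		goto retry;
	}

	/* Queue is stopped: the frame would be kept for later. */
	return TX_QUEUED;
}
```

With a fake driver that always rejects and never stops its queue,
send_frame() calls the tx function six times and then returns TX_DROPPED
instead of looping forever. A driver that rejects but does stop its
queue yields TX_QUEUED after a single call.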

From: Johannes Berg
Date: Wednesday, July 2, 2008 - 2:46 pm

A different version has been merged into what will become 2.6.26. I'll
see what we can do about stable.

johannes
From: Johannes Berg
Date: Wednesday, July 2, 2008 - 2:56 pm

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ef3a62d272f033989e83eb1f26505f93f93e3e69;hp=6d1a3fb567a728d31474636e167c324702a0c38b

Anybody have a stable tree around to see if that applies? I think it
should.

johannes
From: Larry Finger
Date: Wednesday, July 2, 2008 - 3:32 pm

It didn't, but this version will. It has been compile tested only.

Larry

===========================


Index: linux-2.6/net/mac80211/tx.c
===================================================================
--- linux-2.6.orig/net/mac80211/tx.c
+++ linux-2.6/net/mac80211/tx.c
@@ -1090,7 +1090,7 @@ static int ieee80211_tx(struct net_devic
  	ieee80211_tx_handler *handler;
  	struct ieee80211_txrx_data tx;
  	ieee80211_txrx_result res = TXRX_DROP, res_prepare;
-	int ret, i;
+	int ret, i, retries = 0;

  	WARN_ON(__ieee80211_queue_pending(local, control->queue));

@@ -1181,6 +1181,13 @@ retry:
  		if (!__ieee80211_queue_stopped(local, control->queue)) {
  			clear_bit(IEEE80211_LINK_STATE_PENDING,
  				  &local->state[control->queue]);
+			retries++;
+			/*
+			 * Driver bug, it's rejecting packets but
+			 * not stopping queues.
+			 */
+			if (WARN_ON_ONCE(retries > 5))
+				goto drop;
  			goto retry;
  		}
  		memcpy(&store->control, control,

From: Johannes Berg
Date: Wednesday, July 2, 2008 - 3:37 pm

Ah, the TXRX result thing, thanks a bunch. Adding stable to CC, can you
pick this up?


Subject: mac80211: detect driver tx bugs

When a driver rejects a frame in its ->tx() callback, it must also
stop queues, otherwise mac80211 can go into a loop here. Detect this
situation and abort the loop after five retries, warning about the
driver bug.

Thanks to Larry Finger <Larry.Finger@lwfinger.net> for doing the -stable
port.


--- linux-2.6.orig/net/mac80211/tx.c
+++ linux-2.6/net/mac80211/tx.c
@@ -1090,7 +1090,7 @@ static int ieee80211_tx(struct net_devic
 	ieee80211_tx_handler *handler;
 	struct ieee80211_txrx_data tx;
 	ieee80211_txrx_result res = TXRX_DROP, res_prepare;
-	int ret, i;
+	int ret, i, retries = 0;
 
 	WARN_ON(__ieee80211_queue_pending(local, control->queue));
 
@@ -1181,6 +1181,13 @@ retry:
 		if (!__ieee80211_queue_stopped(local, control->queue)) {
 			clear_bit(IEEE80211_LINK_STATE_PENDING,
 				  &local->state[control->queue]);
+			retries++;
+			/*
+			 * Driver bug, it's rejecting packets but
+			 * not stopping queues.
+			 */
+			if (WARN_ON_ONCE(retries > 5))
+				goto drop;
 			goto retry;
 		}
 		memcpy(&store->control, control,



--

From: Johannes Berg
Date: Wednesday, July 2, 2008 - 3:56 pm

Rafael, you misled me :) This is a completely different thing.

johannes
From: Rafael J. Wysocki
Date: Wednesday, July 2, 2008 - 4:08 pm

Ah, sorry then.  I was too quick with my response.

Rafael
--

From: Johannes Berg
Date: Wednesday, July 2, 2008 - 4:07 pm

No trouble, it reminded me that I wanted to ask stable to pick up that
patch anyway although I don't think we ever ran into the issue there.

This seems very odd though, Giacomo, are you sure it also happens if you
unload the module?

johannes
From: Larry Finger
Date: Wednesday, July 2, 2008 - 4:27 pm

I'm confused. Should the "mac80211: detect driver tx bugs" patch be sent to stable?

Larry


--

From: Johannes Berg
Date: Wednesday, July 2, 2008 - 4:41 pm

to stable?

Yeah I think it still should even if that's not the bug here.

johannes
From: Giacomo Mulas
Date: Thursday, July 3, 2008 - 2:19 am

yes, absolutely (unfortunately). I can unload the module before suspending,
reload it after resuming, same result; I can actually do any number of
suspend/resumes, unload and reload the modules any number of times,
everything still works until I try to ifconfig up the interface, then it
hangs solid. I also tried old-style suspend to disk, suspend2, user-space
suspend... exactly the same.

I am now compiling a kernel with the patch you sent, to see whether this
improves things. I will let you know. By the way, is there a module
debugging option I could use to cause the b43 and/or mac80211 modules to use
lots of printk's, so that I could at least give you a hint as to where the
code hangs?

Thanks, bye
Giacomo

P.S. please CC replies to me, I'm not on the list(s)

-- 
_________________________________________________________________

Giacomo Mulas <gmulas@ca.astro.it>
_________________________________________________________________

OSSERVATORIO ASTRONOMICO DI CAGLIARI
Str. 54, Loc. Poggio dei Pini * 09012 Capoterra (CA)

Tel. (OAC): +39 070 71180 248     Fax : +39 070 71180 222
Tel. (UNICA): +39 070 675 4916
_________________________________________________________________

"When the storms are raging around you, stay right where you are"
                          (Freddy Mercury)
_________________________________________________________________
--

From: Pavel Machek
Date: Thursday, July 10, 2008 - 11:10 am

That starts to sound like some core problem -- a bug in b43 does not
explain the symptoms you see. What other drivers does b43 share interrupt with?

Adding printks to b43's resume should be easy...

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
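
The printk suggestion above amounts to checkpoint tracing: print a
marker before each step, and the last marker seen shows where execution
stopped. Below is a rough user-space sketch of the idea, assuming
nothing about the real b43 resume code. The function demo_resume and
its step names are made up, and fprintf() stands in for what would be
a printk() call in a kernel module.

```c
#include <assert.h>
#include <stdio.h>

/* Number of the most recent checkpoint that was reached. */
static int last_checkpoint;

/*
 * User-space stand-in for a printk() checkpoint. The output is flushed
 * at once so the marker is not lost if the next step never returns.
 */
#define CHECKPOINT(id, msg)                                         \
	do {                                                        \
		last_checkpoint = (id);                             \
		fprintf(stderr, "%s:%d: checkpoint %d: %s\n",       \
			__func__, __LINE__, (id), (msg));           \
		fflush(stderr);                                     \
	} while (0)

/*
 * Hypothetical resume path with three invented steps. stall_at (1..3)
 * simulates a hang in that step by returning early; 0 means no hang.
 */
static int demo_resume(int stall_at)
{
	assert(stall_at >= 0 && stall_at <= 3);

	CHECKPOINT(1, "powering up device");
	if (stall_at == 1)
		return -1;

	CHECKPOINT(2, "reloading firmware");
	if (stall_at == 2)
		return -1;

	CHECKPOINT(3, "re-enabling interrupts");
	if (stall_at == 3)
		return -1;

	CHECKPOINT(4, "resume complete");
	return 0;
}
```

If the simulated hang is placed in step 2, the last marker printed is
checkpoint 2, which narrows the search to the code between checkpoints
2 and 3. On a real hard freeze the final messages may never reach the
log files on disk, so they would have to be read from the console.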
