Re: new: regression iwl3945/mac80211 endless after suspend associate/deassociate loop

Previous thread: [RFC][PATCH] MMC: Use write timeout value as read from CSR by Matt Fleming on Monday, September 1, 2008 - 8:12 am. (12 messages)

Next thread: [PATCH] compat: generic compat get/settimeofday by Christoph Hellwig on Monday, September 1, 2008 - 9:22 am. (2 messages)
From: Michael S. Tsirkin
Date: Monday, September 1, 2008 - 9:07 am

Starting with 2.6.27-rc5, my T60p sometimes fails to associate
with an access point after suspend to ram/resume.

This does not seem to ever happen on 2.6.27-rc4.

When this happens, I observe the following messages in dmesg:
(AP mac masked out with XX):


[16433.458084] iwl3945 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[16433.502490] PM: Adding info for No Bus:iwl-phy4:radio
[16433.502666] Registered led device: iwl-phy4:radio
[16433.502764] PM: Adding info for No Bus:iwl-phy4:assoc
[16433.502822] Registered led device: iwl-phy4:assoc
[16433.502870] PM: Adding info for No Bus:iwl-phy4:RX
[16433.503006] Registered led device: iwl-phy4:RX
[16433.503055] PM: Adding info for No Bus:iwl-phy4:TX
[16433.503111] Registered led device: iwl-phy4:TX
[16433.505268] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16433.522344] ADDRCONF(NETDEV_UP): eth1: link is not ready
[16433.704125] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16433.705986] eth1: authenticated
[16433.705997] eth1: associate with AP XX:XX:XX:XX:XX:XX
[16433.708406] eth1: RX AssocResp from XX:XX:XX:XX:XX:XX (capab=0x411 status=0 aid=1)
[16433.708412] eth1: associated
[16433.709659] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[16443.712280] eth1: disassociating by local choice (reason=3)
[16443.712902] eth1: disassociating by local choice (reason=3)
[16444.692059] eth1: no IPv6 routers present
[16445.997384] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16445.999304] eth1: authenticated
[16445.999312] eth1: associate with AP XX:XX:XX:XX:XX:XX
[16446.001784] eth1: RX ReassocResp from XX:XX:XX:XX:XX:XX (capab=0x411 status=0 aid=1)
[16446.001790] eth1: associated
[16446.008222] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16446.009539] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16446.012433] eth1: authenticated
[16446.012440] eth1: associate with AP XX:XX:XX:XX:XX:XX
[16446.015149] eth1: RX ReassocResp from XX:XX:XX:XX:XX:XX (capab=0x411 status=0 aid=1)
[16446.015155] eth1: ...
From: Zhu Yi
Date: Monday, September 1, 2008 - 6:38 pm

It is exactly 10 seconds the local STA sends the deauth_leaving frame
before associated everytime. I wonder if it is the timeout for some
wireless config tools. NM?

What did you do to make it reassociate after resume? If you are using
some automatic config tools, can you try iwconfig manually instead?

Thanks,
-yi

--

From: Johannes Berg
Date: Monday, September 1, 2008 - 11:10 pm

"by local choice" means wext requested this, not a kernel bug.

johannes

--

From: Michael S. Tsirkin
Date: Tuesday, September 2, 2008 - 6:04 am

Sure, it's wpa_supplicant in ubuntu gutsy doing it.

$wpa_supplicant -version
 wpa_supplicant v0.5.8
 Copyright (c) 2003-2007, Jouni Malinen <j@w1.fi> and contributors

But it happens to work fine without 8ab65b03b7893da4a49009e7e356e36e27b0c407.

So yes, this could be some assumption that wpa_supplicant makes and that
fails now. I guess I will try the latest version from source and report.
--

From: Dan Williams
Date: Tuesday, September 2, 2008 - 6:30 am

That's really odd; that patch (the don't-send-empty-extended-rates-IE
patch) shouldn't have anything to do with this behavior, if it did you
wouldn't get associated to the AP in the first place I think.  You're
100% sure that this patch is causing the problem?  Can you post the
output of running wpa_supplicant with the "-dddt" flags and also post
the 'iwlist scan' output for your AP?

Dan

--

From: Michael S. Tsirkin
Date: Tuesday, September 2, 2008 - 7:58 am

No, not sure. But reverting it seems to have helped.
And there does not seem to be anything else relevant between rc4 and

I put up a (sanitized) -dd output here

OK, I will look into collecting this data.

--

From: Dan Williams
Date: Tuesday, September 2, 2008 - 8:10 am

Logs seem to indicate a normal failure of the 4 way handshake; are you
sure your PSK is correct?  It actually looks like the AP doesn't ever
get message #2 from your machine, and just retransmits message #1 of the
handshake.  I'm pretty sure that if
8ab65b03b7893da4a49009e7e356e36e27b0c407 had anything to do with it, you
wouldn't be getting this far and the AP would have denied association in
the first place.

To me this looks like the 4-way handshake is going wrong and either the
supplicant is killing the connection or the AP is terminating the
connection due to that.  This doesn't rule out driver problems but we'd
need better analysis of the wpa_supplicant logs (be smarter people than
I) to figure out whether I'm on the right track or not.

Dan

--

From: Jouni Malinen
Date: Tuesday, September 2, 2008 - 10:24 am

Well.. The disassociation part not, but what is trickering that

Would it be possible to get unsanitized version and with -t (i.e.,
timestamps)? If you don't want to post it publicly, feel free to send
one directly to me. The one attached to that bug removed quite a bit of

There are more than one different types of issues shown in the log. I'm
not sure which one would be the key one here. Anyway, I would agree that
the commit that removed an empty ext. supp rates IE has nothing to do
with this.


As far as suspend is concerned, which distro is used here? Does it
unload the driver (i.e., remove the netdev) on suspend? If yes, you will
need to update wpa_supplicant 0.6.4 which handles that disappearing and
reappearing interface. If this is something else, I would need to get
more complete debug log from wpa_supplicant..

-- 
Jouni Malinen                                            PGP id EFC895FA
--

From: Michael S. Tsirkin
Date: Tuesday, September 2, 2008 - 1:50 pm

Could you tell me which bits are needed, and don't include


Hmm, since it manages to reassociate with -rc4 I would guess not.
In any case killing wpa_supplicant after reboot and restarting it does not
--

From: Jouni Malinen
Date: Wednesday, September 3, 2008 - 5:07 am

As long as you don't include -K on the command line, wpa_supplicant
should not include PSK or password or any other keys, etc. in the debug
log. It will include MAC addresses, though (and they are useful for

OK. If wpa_supplicant restart does not resolve the issue, this points
towards something in the kernel getting into an unexpected state. And
yes, that would indeed mean that it has nothing to do with the fix I
mentioned.

-- 
Jouni Malinen                                            PGP id EFC895FA
--

From: John W. Linville
Date: Tuesday, September 2, 2008 - 8:06 am

So you determined that commit by eye rather than by git bisect?
Any chance I could talk you into confirming that using git bisect?

Thanks,

John
-- 
John W. Linville
linville@tuxdriver.com
--

From: Michael S. Tsirkin
Date: Tuesday, September 2, 2008 - 1:53 pm

It's a bit time consuming because the bug triggers often, but not in
100% of suspends. So you need to fiddle with it a bit after each
--

From: Michael S. Tsirkin
Date: Tuesday, September 2, 2008 - 6:18 am

I think that resume scripts try to do is up the interface, and
network scripts run wpa supplicant (or rather, wpa_cli which talks to
wpa_supplicant).

I could try, but I have wpa ...
-- 
MST
--

From: Michael S. Tsirkin
Date: Tuesday, September 2, 2008 - 6:20 am

FYI I created a bugzilla entry to track this:
http://bugzilla.kernel.org/show_bug.cgi?id=11476

-- 
MST



--

Previous thread: [RFC][PATCH] MMC: Use write timeout value as read from CSR by Matt Fleming on Monday, September 1, 2008 - 8:12 am. (12 messages)

Next thread: [PATCH] compat: generic compat get/settimeofday by Christoph Hellwig on Monday, September 1, 2008 - 9:22 am. (2 messages)