[Bug #12805] QinQ vlan trunking regression

Previous thread: IGMP Join dropping multicast packets by Dave Boutcher on Saturday, March 14, 2009 - 1:16 pm. (11 messages)

Next thread: 2.6.29-rc8: Reported regressions 2.6.27 -> 2.6.28 by Rafael J. Wysocki on Saturday, March 14, 2009 - 12:11 pm. (2 messages)
From: Rafael J. Wysocki
Date: Saturday, March 14, 2009 - 12:01 pm

This message contains a list of some regressions from 2.6.28, for which there
are no fixes in the mainline I know of.  If any of them have been fixed already,
please let me know.

If you know of any other unresolved regressions from 2.6.28, please let me know
as well and I'll add them to the list.  Also, please let me know if any of the
entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2009-03-14      124       36          32
  2009-03-03      108       33          28
  2009-02-24       95       32          24
  2009-02-14       85       33          27
  2009-02-08       82       45          36
  2009-02-04       66       51          39
  2009-01-20       38       35          27
  2009-01-11       13       13          10


Unresolved regressions
----------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12872
Subject		: pwc mmap always fails with EAGAIN
Submitter	: Markus <M4rkusXXL@web.de>
Date		: 2009-03-14 16:42 (1 days old)
References	: http://marc.info/?l=linux-kernel&m=123704902201378&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12871
Subject		: usb bluetooth crashes system
Submitter	: Pavel Machek <pavel@ucw.cz>
Date		: 2009-03-10 11:23 (5 days old)
References	: http://marc.info/?l=linux-kernel&m=123668450400940&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12870
Subject		: 2.6.29-rc "TKIP: replay detected" regression
Submitter	: Hugh Dickins <hugh@veritas.com>
Date		: 2009-03-11 12:07 (4 days old)
References	: http://marc.info/?l=linux-kernel&m=123677337219148&w=4
Handled-By	: John W. Linville <linville@tuxdriver.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12867
Subject		: 2.6.29-rc7 broke r8169 MAC on Thecus n2100 ARM ...
From: Rafael J. Wysocki
Date: Saturday, March 14, 2009 - 12:05 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.28.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12805
Subject		: QinQ vlan trunking regression
Submitter	: Bart Trojanowski <bart@jukie.net>
Date		: 2009-02-28 18:05 (15 days old)
References	: http://marc.info/?l=linux-kernel&m=123584439115868&w=4


--

From: David Miller
Date: Saturday, March 14, 2009 - 3:04 pm

From: "Rafael J. Wysocki" <rjw@sisk.pl>

Fixed by:

commit 9d40bbda599def1e1d155d7f7dca14fe8744bd2b
Author: David S. Miller <davem@davemloft.net>
Date:   Wed Mar 4 23:46:25 2009 -0800

    vlan: Fix vlan-in-vlan crashes.
    
    As analyzed by Patrick McHardy, vlan needs to reset it's
    netdev_ops pointer in it's ->init() function but this
    leaves the compat method pointers stale.
    
    Add a netdev_resync_ops() and call it from the vlan code.
    
    Any other driver which changes ->netdev_ops after register_netdevice()
    will need to call this new function after doing so too.
    
    With help from Patrick McHardy.
    
    Tested-by: Patrick McHardy <kaber@trash.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

--

From: Rafael J. Wysocki
Date: Saturday, March 14, 2009 - 3:26 pm

Thanks, closed.

Rafael
--

From: Jeff Chua
Date: Saturday, March 14, 2009 - 7:58 pm

Still not working in linux-2.6.29-rc8. Broken after the commit below.
There were many changes to wireless after this commit, and simply
reverting this commit will break compiling.

Jeff.



commit 41bb73eeac5ff5fb217257ba33b654747b3abf11
Author: Johannes Berg <johannes@sipsolutions.net>
Date:   Wed Oct 29 01:09:37 2008 +0100

    mac80211: remove SSID driver code
--

From: Jeff Chua
Date: Saturday, March 14, 2009 - 8:06 pm

The commit below is causing problems associating with the hidden AP as well.

Thanks,
Jeff.


71c11fb57b924c160297ccd9e1761db598d00ac2 is first bad commit
commit 71c11fb57b924c160297ccd9e1761db598d00ac2
Author: Johannes Berg <johannes@sipsolutions.net>
Date:   Tue Oct 28 18:29:48 2008 +0100

    b43/legacy: remove SSID code

    The SSID programmed into the device is used by the ucode only
    to reply to probe requests, a functionality we disable anyway
    because it doesn't fit with the mac80211/hostapd programming
    model. Therefore, it isn't useful to program the SSID into
    device.

    Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
fc2ca13f0efcd9e51403b5b55ed730e2457a29c0 M      drivers
--

From: Rafael J. Wysocki
Date: Sunday, March 15, 2009 - 3:41 am

Thanks for the update.

Rafael
--

From: Johannes Berg
Date: Sunday, March 15, 2009 - 11:11 am

That's not believable, sorry. I know exactly what the microcode uses the
SSID here for, and it never uses it when we're in station mode.

johannes
From: Linus Torvalds
Date: Sunday, March 15, 2009 - 11:44 am

Jeff - can you test the kernels before-and-after this commit (with _no_ 
other changes) and describe the differences?

Johannes - "not believable" is simply not an argument. If Jeff can show a 
difference, then your disbelief is totally irrelevant, and clearly shows 
that you are basing your beliefs on incorrect assumptions (like some 
specific version of firmware that isn't the whole story).

		Linus
--

From: Johannes Berg
Date: Sunday, March 15, 2009 - 12:01 pm

Linus, Jeff is totally unbelievable here -- I just realised that the
commit he quotes doesn't even change the driver he's working with.

johannes
From: Ingo Molnar
Date: Sunday, March 15, 2009 - 1:26 pm

Even if so (bisection is very hard and error-prone) why do you 
shape your reaction to it as a personal attack? Why do you say 
"Jeff is totally unbelievable" - why don't you say something more 
amicable like:

 " Hm, that's weird - that commit does not even seem to affect 
   the driver you are working with. Could you please re-check 
   the final bits of the bisection to make sure you got the 
   right commit ID? "

Instead of this irritated-sounding attack tone you are using. 
It's not helpful. Testers are there to help you, not to annoy 
you. If they annoy you then you are in the wrong business.

	Ingo
--

From: Jeff Chua
Date: Monday, March 16, 2009 - 6:24 am

Here's what I did, and it's repeatable.

Take the attached bisect log and replay it, and the last offending
commit is this ...
# git log
commit 71c11fb57b924c160297ccd9e1761db598d00ac2
Author: Johannes Berg <johannes@sipsolutions.net>
Date:   Tue Oct 28 18:29:48 2008 +0100

    b43/legacy: remove SSID code

Yes, this is not the real problem, but it's the last commit that causes
the problem, and I couldn't bisect further. Typing the next "git
bisect bad" reports this commit:

# git bisect bad
71c11fb57b924c160297ccd9e1761db598d00ac2 is first bad commit
commit 71c11fb57b924c160297ccd9e1761db598d00ac2
Author: Johannes Berg <johannes@sipsolutions.net>
Date:   Tue Oct 28 18:29:48 2008 +0100

    b43/legacy: remove SSID code


Johannes, OK, this is not the commit causing the problem, but anything
after this commit doesn't associate with my hidden APs. I may not have
gone over every single commit since, but 2.6.29-rc8 is definitely not
associating at all automatically -- only manually by specifying the
AP.

This bug is quite hard to trigger, and it doesn't show easily in
2.6.28-rc3. Maybe once in every 10 tries.



# git reset --hard 71c11fb57b924c160297ccd9e1761db598d00ac2

I've tried on two different Linksys APs. Association with the WAG354G is
better than with the WAG200G ... meaning it's harder to get an association
failure on 2.6.28-rc3. 1/10 fail on the WAG354G, and 9/10 fail on the
WAG200G.

Here's how I associate with the AP.

# modprobe -r iwlagn
# modprobe iwlagn
# iwconfig wlan0 mode Managed
# ifconfig wlan0 up
# iwconfig wlan0 essid "myessid"
# iwconfig wlan0 key restricted "hex key"
# iwconfig wlan0 ap auto channel auto
# ... wait for association
# ping -c 3 <IP of the AP>
# shutdown the interface
# ifconfig wlan0 down
# modprobe -r iwlagn


Repeat these and it'll fail to associate. On WAG200G, can't associate
after 2nd attempt.


Next revert these two commits

commit 4607816f608b42a5379aca97ceed08378804c99f
Author: Johannes Berg ...
From: Linus Torvalds
Date: Monday, March 16, 2009 - 12:57 pm

Taking a bisect log is repeatable, but pointless.

If you made any mistakes in bisecting (marking a kernel that was good as 
being bad, or the other way around), the log will always replay to the 
same thing, but it will still be wrong.

In other words, "git bisect" is only as reliable as the data you feed it, 
and if the behavior isn't 100% repeatable and unambiguous (or if you 
simply made a mistake), you need to double-check things.

So after bisecting a commit, if there is any question what-so-ever whether 
the commit makes sense as a result, you need to double-check it. The best 
way to double-check it is to go back to a known-bad state (preferably the 
tip of the branch) and revert the presumed-bad commit, and verify that it 
really fixes the behavior.

But if that is impossible (for example, because the commit no longer 
reverts cleanly), at least make 100% sure that the state at the commit is 
bad, and then go to (all) parents of that commit and make 100% sure that 
the state at those points is _good_. 

IOW, if you've pinpointed 71c11fb57b924c160297ccd9e1761db598d00ac2 as 
being bad, then you should go back and double-check that its parent 
(in this case 4607816f608b42a5379aca97ceed08378804c99f) is good.

Because if its parent is also bad, then that just means that you made 
some mistake in "git bisect".

The thing about bisecting is that it is _extremely_ efficient. It takes 
essentially the minimal number of answers to get to the end result. But 
that very efficiency also means that getting even just _one_ of those 
answers wrong will take you _way_ off base. There's no room for error, 
because bisect will take each bit and use it to maximally split the error 
space.

In this case, it really sounds like maybe you marked the parent good, even 
though you should have marked it bad.

			Linus
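
The point about a single wrong answer can be illustrated with a toy model. The following bash sketch is not git itself: `bisect_sim`, its arguments, and the numbering of commits as 1..N on a linear history are all hypothetical, chosen only to show how one mislabelled commit shifts the result and why checking the parent exposes the mistake.

```shell
#!/usr/bin/env bash
# Toy model of "git bisect" on a linear history of commits 1..N.
# Commit 0 is the known-good base, commit N the known-bad tip, and
# every commit >= first_bad is really bad. If lie_at is given, the
# tester reports the wrong verdict for that one commit.
bisect_sim() {
    local n=$1 first_bad=$2 lie_at=${3:-0}
    local good=0 bad=$n mid is_bad

    while (( bad - good > 1 )); do
        mid=$(( (good + bad) / 2 ))
        is_bad=$(( mid >= first_bad ))                     # true state of this commit
        (( mid == lie_at )) && is_bad=$(( 1 - is_bad ))    # the one wrong answer
        if (( is_bad )); then bad=$mid; else good=$mid; fi
    done
    echo "$bad"    # what bisect would report as the "first bad commit"
}

bisect_sim 1000 300        # all answers correct: prints 300
bisect_sim 1000 300 500    # commit 500 is bad but was marked good: prints 501
```

In the second run the reported commit is 501, whose parent (500) is itself bad. That is the situation described above: testing the parent of the reported commit is what reveals that an earlier answer was wrong.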
--

From: Jeff Chua
Date: Monday, March 16, 2009 - 4:55 pm

On Tue, Mar 17, 2009 at 3:57 AM, Linus Torvalds

I should have been more careful, just got thrown off during the last
few steps of the bisect. But with the bad association to the AP after
a57a59f247b651e8ed6d3eeb7e2f9d83b83134c9 (iwlwifi: remove implicit
direct scan), can someone suggest where to go from here?

Meanwhile, I'll try bisecting again.

Thanks,
Jeff.
--

From: Johannes Berg
Date: Tuesday, March 17, 2009 - 12:50 am

That actually makes some sense. Though I'm convinced the code I removed
there is actually wrong, that doesn't mean it couldn't have had positive
side effects too. I'll take a look at it; in the meantime, your time
would be better spent trying to capture what's going on on the air
instead of bisecting again.

If you don't have a second device to monitor, you can also create a
monitor interface as such:
	iw dev wlan0 interface add moni0 type monitor flags none

and run tcpdump on the resulting 'moni0' interface while you try to
associate etc. Write the packets to files and send them to me.

Due to this implicit scan modification in the driver that I removed,
however, I won't see scans in that file, so it won't be all that useful;
a capture made on a second device would be much better.

johannes
From: Jeff Chua
Date: Tuesday, March 17, 2009 - 10:21 am

On Tue, Mar 17, 2009 at 3:50 PM, Johannes Berg

Ok, I'll try this out later. Got good news to share. I've applied
John's patches and just needed to modify net/mac80211/wext.c slightly,
since len and ssid are not defined and the function arguments are
different. I'm checking my tree again ... mine seems different from his.

Thanks,
Jeff.
--

From: John W. Linville
Date: Tuesday, March 17, 2009 - 7:48 am

The obvious question for me is did you try this?

	git revert a57a59f247b651e8ed6d3eeb7e2f9d83b83134c9

Does that restore operation for you?

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.
--

From: John W. Linville
Date: Tuesday, March 17, 2009 - 8:28 am

Hmmm...more like this:

git revert 41bb73eeac5ff5fb217257ba33b654747b3abf11
git revert b23f99bcfa12c7b452f7ad201ea5921534d4e9ff
git revert 71c11fb57b924c160297ccd9e1761db598d00ac2
git revert 4607816f608b42a5379aca97ceed08378804c99f
git revert a57a59f247b651e8ed6d3eeb7e2f9d83b83134c9
 

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.
--

From: Ingo Molnar
Date: Tuesday, March 17, 2009 - 8:39 am

Since you apparently have done this sequence and have
resolved the conflict (which is hard to do for testers
even in trivial cases) - would you mind posting the
resulting combo patch for Jeff to test?

	Ingo
--

From: John W. Linville
Date: Tuesday, March 17, 2009 - 9:05 am

Since Jeff has been using git for bisect, I presumed he could handle
the reverts.  But if you think it is helpful:

	http://bugzilla.kernel.org/attachment.cgi?id=20567&action=view

Hth!

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.
--

From: Jeff Chua
Date: Tuesday, March 17, 2009 - 9:24 am

I tried applying the above patch against the latest linux git
(18439c39e826191c0ef08c3a3271ce7ece46a860), but got a lot of rejects.
Is this expected?


patch -p1 </tar/v2.6/revert-remove-ssid-knowledge-from-driver-series.patch
1 out of 1 hunk FAILED -- saving rejects to file
drivers/net/wireless/adm8211.h.rej
1 out of 2 hunks FAILED -- saving rejects to file
drivers/net/wireless/b43/main.c.rej
1 out of 2 hunks FAILED -- saving rejects to file
drivers/net/wireless/b43legacy/main.c.rej
1 out of 1 hunk FAILED -- saving rejects to file
drivers/net/wireless/iwlwifi/iwl-3945.h.rej
3 out of 4 hunks FAILED -- saving rejects to file
drivers/net/wireless/iwlwifi/iwl-agn.c.rej
1 out of 1 hunk FAILED -- saving rejects to file
drivers/net/wireless/iwlwifi/iwl-dev.h.rej
3 out of 5 hunks FAILED -- saving rejects to file
drivers/net/wireless/iwlwifi/iwl3945-base.c.rej
1 out of 3 hunks FAILED -- saving rejects to file include/net/mac80211.h.rej

Thanks,
Jeff.
--

From: John W. Linville
Date: Tuesday, March 17, 2009 - 10:10 am

Works for me...I even re-downloaded the patch from bugzilla.

/home/linville/git/linux-2.6
[linville-t400.local]:> git show
commit 18439c39e826191c0ef08c3a3271ce7ece46a860
Merge: 9e8912e... b35f8ca...
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Mar 17 08:59:33 2009 -0700

    Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm
    
    * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
      dm crypt: wait for endio to complete before destruction
      dm crypt: fix kcryptd_async_done parameter
      dm io: respect BIO_MAX_PAGES limit
      dm table: rework reference counting fix
      dm ioctl: validate name length when renaming


/home/linville/git/linux-2.6
[linville-t400.local]:> patch -p1 < revert-remove-ssid-knowledge-from-driver-series.patch 
patching file drivers/net/wireless/adm8211.c
patching file drivers/net/wireless/adm8211.h
patching file drivers/net/wireless/b43/main.c
patching file drivers/net/wireless/b43legacy/main.c
patching file drivers/net/wireless/iwlwifi/iwl-3945.h
patching file drivers/net/wireless/iwlwifi/iwl-agn.c
patching file drivers/net/wireless/iwlwifi/iwl-dev.h
patching file drivers/net/wireless/iwlwifi/iwl-scan.c
patching file drivers/net/wireless/iwlwifi/iwl3945-base.c
patching file include/net/mac80211.h
patching file net/mac80211/ieee80211_i.h
patching file net/mac80211/main.c
patching file net/mac80211/mlme.c
patching file net/mac80211/wext.c

Perhaps you have a dirty tree?

	git checkout -f

Does that help?

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.
--

From: Jeff Chua
Date: Tuesday, March 17, 2009 - 10:27 am

Definitely. Even with the failing hunks, once I copied back the old
wext.c from 2.6.28-rc3, my wireless now associates with the
hidden AP on 2.6.29-rc8. I tried just a few times, and it's ok so far.
I'll have to pit it against the WAG200G, which seems to behave worse,
tomorrow. ... anyway, this is all good. Without the patch, it's
_impossible_ to "auto" associate to the AP.  Definitely on the right
track!

Thanks,
Jeff.
--

From: Jeff Chua
Date: Tuesday, March 17, 2009 - 10:31 am

Oh, now I know why. I was "mucking" with the SSID stuff, trying to revert
some of the wireless code on 2.6.29-rc8, and just applied your
patches on top.

My downloaded tree is clean. I'm trying your patches now.

Jeff.
--

From: Jeff Chua
Date: Tuesday, March 17, 2009 - 11:26 am

John,

Your patches applied cleanly this time. And it's associating with the hidden
AP on 2.6.29-rc8. I'll run it against the WAG200G tomorrow. But from
what I can see, it's working well with the WAG354G.

I'll try out this as well "iw dev wlan0 interface add moni0 type
monitor flags none"

You guys are great!

Thank you.

Jeff.
--

From: Johannes Berg
Date: Tuesday, March 17, 2009 - 12:22 pm

I'm on 0191b62 now and cannot reproduce the problem with iwlwifi

Compile iwlwifi with debugging please, and instead of plain modprobe
iwlagn, do this:
	modprobe iwlagn debug=0x800 debug50=0x800

Then send me the relevant dmesg output from a working and a failed
attempt. You should see something like this

[  318.420537] ieee80211 phy4: U iwl_bg_request_scan Start direct scan for 'myssid'

in the log. I can't see any reason why it would be missing. For me, the
association is instantaneous after saying "ap any". This is expected
too, because
	iwconfig wlan0 essid "myessid"
will have triggered a directed scan for the AP.

There are two possible failure scenarios that I can imagine:

1) You see no line like the one above in your log, but rather

[  736.047879] ieee80211 phy5: U iwl_bg_request_scan Start indirect scan.

This would indicate a bug in the driver.


2) You do see the line with the SSID, but you don't get any reply. In
this case, please try doing it manually:
	iwlist wlan0 scan essid 'myssid'
Wait about 15 seconds between each attempt of doing so, and report
whether your AP is listed in the results or not. If it isn't most of the
time, then your AP is broken.

johannes
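
To tell the two scenarios apart in a long debug log, the scan requests can be counted mechanically. The following bash sketch is only an illustration: `classify_scans` is a hypothetical helper, and it matches just the two `iwl_bg_request_scan` messages quoted in this message, which other driver versions may word differently.

```shell
#!/usr/bin/env bash
# Count direct and indirect scan requests in an iwlwifi debug log read
# from stdin. Only the two message texts quoted in this thread are matched.
classify_scans() {
    local direct=0 indirect=0 line
    while IFS= read -r line; do
        case $line in
            *"iwl_bg_request_scan Start direct scan for"*) direct=$((direct + 1)) ;;
            *"iwl_bg_request_scan Start indirect scan"*)   indirect=$((indirect + 1)) ;;
        esac
    done
    echo "direct=$direct indirect=$indirect"
}

# Typical use (needs a kernel built with iwlwifi debugging):
#   dmesg | classify_scans
```

A failed attempt that shows only indirect scans after the ESSID was set would correspond to scenario 1; direct scans with no association would point towards scenario 2.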
From: Jeff Chua
Date: Wednesday, March 18, 2009 - 7:58 pm

On Wed, Mar 18, 2009 at 3:22 AM, Johannes Berg

I think I know why it works on yours but not mine.

I've tracked down to the sequence of iwconfig that causes it to fail.

I can now get vanilla 2.6.28-rc8 to work (7/10 times) by changing the
sequence of iwconfig.


This loop does not work at all without John's patch, but will work
100% when patched.
        iwconfig wlan0 mode Managed essid xxx key restricted xxx
        for((i = 0; i < 5; i++))
        do
                iwconfig wlan0 ap auto channel auto  # auto inside loop
                iwconfig wlan0 | grep -q "Access Point: Not-Associated"
                [ $? -ne 0 ] && break
                echo ".\c"
                sleep 1
        done


This loop only works 8 of 10 times with/without the patch.
        iwconfig wlan0 mode Managed essid xxx key restricted xxx
        iwconfig wlan0 ap auto channel auto  # auto outside loop
        for((i = 0; i < 5; i++))
        do
                iwconfig wlan0 | grep -q "Access Point: Not-Associated"
                [ $? -ne 0 ] && break
                echo ".\c"
                sleep 1
        done


The only difference is having "iwconfig wlan0 ap auto channel auto"
inside or outside the loop.

Can the AP be broken if it works with the patch? Also, it works with WinXP,
Nokia phone, and everything else.

Attached are 4 logs (all runs with the "auto" outside loop).
  nopatch.fail
  nopatch.pass
  patched.fail
  patched.pass


Thanks,
Jeff.
From: Jeff Chua
Date: Wednesday, March 18, 2009 - 8:25 pm

I've modified it a little, and now it works 100% without patch, by
using "iwlist scan" instead of "sleep 1" ...
        iwconfig wlan0 mode Managed essid xxx key restricted xxx
        iwconfig wlan0 ap auto channel auto  # auto outside loop
        for((i = 0; i < 5; i++))
        do
                iwlist wlan0 scan >/dev/null  #use scan instead of sleep
                iwconfig wlan0 | grep -q "Access Point: Not-Associated"
                [ $? -ne 0 ] && break
                echo ".\c"
        done


So, this will work for older kernels as well as 2.6.29-rc8.

Rafael, can we close the case? It's the iwconfig sequence that used to
work on 2.6.28-rc3 but now needs to be updated for 2.6.29-rc8.

Thanks,
Jeff.
--

From: Jeff Chua
Date: Wednesday, March 18, 2009 - 9:23 pm

Not yet. I reran a few more times, and without patch, it'll fail sometimes.

Here's the 3 files. With patch, pass every time (so far).

nopatch.fail
nopatch.pass

*** this is in nopatch.fail ***

2009-03-19T12:07:22.437578+08:00 boston kernel: ieee80211 phy0: U iwl_bg_request_scan Start direct scan for 'sdg2088a88'

2009-03-19T12:07:24.337644+08:00 boston kernel: ieee80211 phy0: U iwl_bg_request_scan Start indirect scan.


For the passing run, it's always "direct scan".


Thanks,
Jeff.
From: Johannes Berg
Date: Thursday, March 19, 2009 - 9:59 am

That's because of your 'iwlist wlan0 scan' command, it triggers an
indirect scan, if you did 'iwlist wlan0 scan essid sdg2088a88' you would
get a direct scan there. It's always 'direct' for when it passes because
then you don't get to the point where you scan again, I think.

All in all, I don't see much in the logs, it seems to be behaving
properly and asking for a direct scan when you set the essid. The kernel
will not try to connect to an encrypted network before you give it a
key, but "iwconfig wlan0 ap any" should trigger the association
process...

Can you do something else for me?

Get iw (some distros ship it, or see wireless.kernel.org) and enter this
command:
	iw dev wlan0 interface add moni0 type monitor flags none

Then,
	ip link set moni0 up

Start a capture:
	iwevent > /tmp/event.txt

Now do, in a separate shell:
	iwconfig wlan0 essid ...
	iwconfig wlan0 key ...

_Now_ start tcpdump:
	tcpdump -i moni0 -s 10000 -w /tmp/dump.pkt

and in a separate shell do:
	iwlist wlan0 scan last > /tmp/scan.txt ; iwconfig wlan0 ap any

(it is important to do this in one command so the two are timed close together)

Send me the contents of all the files from a failed run please.

johannes
From: Jeff Chua
Date: Friday, March 20, 2009 - 10:19 am

On Fri, Mar 20, 2009 at 12:59 AM, Johannes Berg

Attached. 2 runs.

Thanks,
Jeff.
From: Jeff Chua
Date: Wednesday, March 18, 2009 - 9:49 pm

Ignore the above loop thing. The cause seems to be this one instead.


# this needs patch to work ...
iwconfig wlan0 mode Managed
ifconfig wlan0 up
iwconfig wlan0 essid xxx
iwconfig wlan0 key restricted xxx
iwconfig wlan0 ap auto channel auto


# this works with patch ...
iwconfig wlan0 mode Managed essid xxx key restricted xxx
ifconfig wlan0 up
iwconfig wlan0 ap auto channel auto

It looks like the placement of "ifconfig" matters. But it works on 2.6.28-rc3.


Thanks,
Jeff.
--

From: Johannes Berg
Date: Thursday, March 19, 2009 - 2:38 am

If you swap the key and essid lines, it will probably always work. But
I've yet to analyse your data to see why it doesn't in the other case.

johannes
From: John W. Linville
Date: Thursday, March 19, 2009 - 7:13 am

That is what I was going to suggest.  I go so far as to say that you
should set everything else before doing the "iwconfig wlan0 essid
xxx" bit.

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.
--

From: Frans Pop
Date: Thursday, March 19, 2009 - 8:02 am

Mostly just curious, but is that actually required by some wireless 
standard? If not, is it really reasonable to ask userland to do things in 
that particular order?

Reason I ask is that for example when writing wireless support for e.g. a 
distro installation system, it seems most logical to *first* ask the user 
what network (ESSID) he wants to connect to. Next to check if we can 
connect to that network without additional authentication and only then, 
if needed, ask for keys etc.
If it's not possible to set that info in that logical order that seems 
rather restrictive to me and would probably mean that you'd have to reset 
AP, ESSID and possibly other settings before each incremental attempt.

Cheers,
FJP
--

From: John W. Linville
Date: Thursday, March 19, 2009 - 8:24 am

You can ask the user for the data in whatever order you like, but
when you are done collecting it you should issue the "iwconfig wlan0
essid xxx" command (or execute the SIOCSIWESSID ioctl) last.  IMHO,
it is silly to even bother setting the SSID before you have set any
required key or (if you so choose) selecting an AP or channel.

This is a limitation of the wireless extensions API -- nothing in
the API really defines when an association should be triggered.
The mac80211 component uses the setting of the SSID as the trigger
for association.  AFAIK, this ordering should work with all other
drivers as well.

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.
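
The ordering recommended here can be written down as a dry run. The following bash sketch only prints the commands rather than running them; `wext_sequence` and its arguments are hypothetical, and the placement of the mode and "ifconfig up" steps follows remarks made elsewhere in this thread (the mode cannot be changed while the interface is up). It illustrates the suggested order, not a confirmed fix for the problem under discussion.

```shell
#!/usr/bin/env bash
# Print (do not execute) a wireless-extensions setup sequence in which
# the ESSID is set last, since mac80211 uses it as the association trigger.
wext_sequence() {
    local dev=$1 essid=$2 key=$3
    echo "iwconfig $dev mode Managed"          # set mode before bringing the interface up
    echo "ifconfig $dev up"
    echo "iwconfig $dev key restricted $key"   # key (and any AP/channel choice) first
    echo "iwconfig $dev essid \"$essid\""      # ESSID last
}

wext_sequence wlan0 myessid "hex key"
```

Printing the sequence makes it easy to review the order before piping it to a shell on a test machine.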
--

From: Jeff Chua
Date: Thursday, March 19, 2009 - 9:45 am

On Thu, Mar 19, 2009 at 11:24 PM, John W. Linville

I just discovered that  "iwconfig wlan0 ap auto channel auto" is the
one causing the problem on 2.6.29-rc8. Remove the line and it'll
associate to the AP. Just tried on two different APs a few times, and
it seems to behave that way. If I execute the "auto" command, it'll
fail to associate, but works fine with the patch.

Jeff.
--

From: Johannes Berg
Date: Thursday, March 19, 2009 - 9:53 am

Wext is a mess, and we've known that for a long time... But no, the
sequence should _not_ be required, it's just _easier_ for the kernel,
and as such has a better probability of succeeding if there are
problems, it should still work though.

However, one thing that will _not_ work is this:

iwconfig wlan0 essid xyz
iwconfig wlan0 key s:xyz

you still need

iwconfig wlan0 ap any

or anything similar after setting the key to trigger the kernel to do

That's a pretty wrong argument, nothing says your software cannot
collect all the information and then give it to the kernel at once
later, I think... In fact, this is required anyway when you use RSN or
WPA (wpa_supplicant needs all information at once), for example.

johannes
From: Frans Pop
Date: Thursday, March 19, 2009 - 12:24 pm

Well, the thing is that we'll already have tried just setting essid to 
check if it's an open network. However, I can see the point of needing to 
set the essid _again_ after asking the key info and setting that.

I can also see how you might have to unset some settings in some cases, 
for example if the NIC has already associated with the wrong network 
(e.g. because there's a totally open network in range).

Our current logic (in Debian Installer) definitely needs improving and 
these pointers will help. Thanks.
--

From: Johannes Berg
Date: Thursday, March 19, 2009 - 12:27 pm

No, there should be no need for that really, an
	iwconfig wlan0 ap any
should always make it associate with the current settings.

Now, this thread is about why it doesn't for Jeff :)

johannes
From: Jeff Chua
Date: Thursday, March 19, 2009 - 9:55 pm

On Thu, Mar 19, 2009 at 5:38 PM, Johannes Berg

Doesn't. Taking away "iwconfig wlan0 ap auto channel auto" makes it work.

Thanks,
Jeff.
--

From: Jeff Chua
Date: Thursday, March 19, 2009 - 10:20 pm

More discoveries...

It seems position of "ifconfig wlan0 up" matters

1) It can't be before iwconfig which will result in "SET failed on
device wlan0 ; Device or resource busy".

2) _Before_ "essid" and "key" settings. "ap auto channel auto" MUST NOT BE SET.

3) _After_ "essid" and "key". Ensure all iwconfig settings come
before "ifconfig".


So, it seems "ifconfig" must be done as the last stage.

Thanks,
Jeff.
--

From: Johannes Berg
Date: Friday, March 20, 2009 - 1:32 am


That's a little weird, but not entirely, you probably manage to cut it

This, however, is completely strange. You should always set the
interface up before doing anything with it. wext allows you to do it the
other way around, but that's not quite natural since without it being up
you cannot even scan.

However, -EBUSY isn't returned anywhere in mac80211, and I don't see the
driver passing it out either. So your point 1) confuses me. Can you
explain that a little more?

As for 2), that is very very strange since ap auto channel auto is the
default, so saying that before you do anything else shouldn't do anything
at all.

I suspect something is going on in the driver because the ifconfig order
matters and for mac80211, it shouldn't make a difference when the state
machine is really started. I'll probably need to try to reproduce this,
but to be honest between the varying failure modes, undefined wireless
extensions semantics, etc. I'm not very confident I can.

johannes
From: Jeff Chua
Date: Friday, March 20, 2009 - 3:04 am

On Fri, Mar 20, 2009 at 4:32 PM, Johannes Berg

This is what happened ...

# modprobe -r iwlagn
# modprobe iwlagn
# ifconfig wlan0 up
# iwconfig wlan0 mode Managed
Error for wireless request "Set Mode" (8B06) :

I'll try all the different combinations again for 2.6.28, and see if
it's the same, and on the other AP that seems harder to associate with (but
works well in 2.6.28, and with other OSes including Nokia phones ... so I
don't think it's an AP problem ... because it's been around a while
and gone through many 2.6.xx).

It worked so well before that I didn't even bother to think twice, and
I may have made silly mistakes along the way, so pardon me if I
confused you.

Thanks for your help.

Jeff.
--

From: Johannes Berg
Date: Friday, March 20, 2009 - 3:13 am

Oh, ok, yes, you cannot change the mode while the interface is up.
Though I guess setting it to the same mode should be accepted. Not that

Yeah, I have to admit that an AP problem doesn't make much sense -- but

No worries. It really should still work well. Can I convince you to try
getting the packet dump I asked for in another mail?

johannes
From: Jeff Chua
Date: Friday, March 20, 2009 - 9:14 am

On Fri, Mar 20, 2009 at 6:13 PM, Johannes Berg

Was digging around for the "iw" package and got distracted. I'll test it soon.

I went back to 2.6.28 and found out that I could associate as follows
(but this would not work in 2.6.29-rc8 ... should work with the
patch).

modprobe -r iwlagn
modprobe iwlagn
iwconfig wlan0 mode Managed
ifconfig wlan0 up
iwconfig wlan0 essid "xxx"
iwconfig wlan0 key restricted xxx
iwconfig wlan0 ap auto channel auto
iwlist wlan0 scan >/dev/null


Thanks,
Jeff.
--

From: Johannes Berg
Date: Saturday, March 21, 2009 - 5:09 am

Thing is that this works perfectly fine for me, on very similar
hardware.

OTOH, maybe it got _fixed_ again. Would you check wireless-testing
(http://wireless.kernel.org/en/developers/Documentation#wireless-testing.git)
for me? Or compat-wireless (http://wireless.kernel.org/en/users/Download)

johannes
From: Jeff Chua
Date: Saturday, March 21, 2009 - 8:08 am

On Sat, Mar 21, 2009 at 8:09 PM, Johannes Berg

I've set hidden SSID + mac address filtering (only allow my notebook
MAC address) as well as 128 WEP. And no problem to associate with

I'll try that tomorrow. Seems a lot to pull from git.

Thanks,
Jeff.
--

From: Johannes Berg
Date: Saturday, March 21, 2009 - 8:11 am

Ok, I haven't tried WEP or MAC address filtering, but I don't see how
those would make a significant difference. I can try again, but not

Thanks, I appreciate it. You should be able to cut it down by using
--reference /path/to/local/linux-git-tree or just compat (though I would
recommend using the git tree). I've also tried to go back to the
versions you were testing before and had no problem.

johannes
From: Zhang Rui
Date: Sunday, March 15, 2009 - 6:02 pm
