Re: Regression in 2.6.27-rcX caused by commit bc19d6e0b74ef03a3baf035412c95192b54dfc6f

Previous thread: Re: 2.6.26.5-rt9 by Alexander Pazdnikov on Tuesday, September 16, 2008 - 7:16 am. (1 message)

Next thread: Re: motherboard recommendations? by Jacek Poplawski on Tuesday, September 16, 2008 - 7:39 am. (2 messages)
From: Larry Finger
Date: Tuesday, September 16, 2008 - 7:18 am

Since commit bc19d6e0b74ef03a3baf035412c95192b54dfc6f entitled
"b43/b43legacy: add RFKILL_STATE_HARD_BLOCKED support", the radio LED
no longer responds to rfkill switch events.

ATM, I do not have a fix, other than reversion of this commit.

Larry

--

From: Michael Buesch
Date: Tuesday, September 16, 2008 - 8:42 am

It's kind of bogus that it pokes with the software rfkill state
from within the hardware rfkill handler.

Please revert this patch. The commit log says nothing about
why this is needed and I don't see a point in overriding the state
in the rfkill-private data structure from within the driver.

-- 
Greetings Michael.
--

From: Larry Finger
Date: Tuesday, September 16, 2008 - 10:08 am

I agree with Michael. From what I know, the only possible reason for
having this state would be if user space could somehow affect the
state of the hardware switch. As the user's finger is the only such
thing, then there is no use for the RFKILL_STATE_HARD_BLOCKED state,
particularly when it breaks LED operations.

Please revert the patch.

Larry
--

From: Carlos Corbacho
Date: Tuesday, September 16, 2008 - 12:18 pm

When I disable the b43 device in my laptop via acer-wmi (which in turn, calls 
into the laptops firmware), b43 physically cannot re-enable it (a not 
uncommon case on a lot of laptops). In which case, as far as b43 is 
concerned, the wireless radio is then in RFKILL_STATE_HARD_BLOCKED, since b43 
is unable to re-enable the radio on the hardware.

So yes, it is quite possible for b43 to be in RFKILL_STATE_HARD_BLOCKED.

-Carlos
-- 
E-Mail: carlos@strangeworlds.co.uk
Web: strangeworlds.co.uk
GPG Key ID: 0x23EE722D
--

From: Michael Buesch
Date: Tuesday, September 16, 2008 - 12:25 pm

Sure it is. It's the common case, if there's a button.

But I don't know how to tell the rfkill subsystem about the states and
I do not _want_ to. See, that crap code _always_ blew up for every single
patch we did. I'm tired of this and I am not interested in that anymore.
Please revert that patch and if that makes it work I'm fine with it.

I'm still waiting for the sane rfkill API where we have three functions
rfkill-allocate
rfkill-hw-state-changed
rfkill-free

That's all I need. All that input stuff, which was _the_ way to go some
months ago, but is now deprecated it seems (but I don't know what it it
replaced by) is just so really confusing.

-- 
Greetings Michael.
--

From: Henrique de Moraes Holschuh
Date: Tuesday, September 16, 2008 - 3:37 pm

rfkill_force_state().  Must NOT be called from within atomic contextes,
something I haven't got around to find a proper way of fixing, and nobody

I won't go on the rfkill-allocate/-free stuff, messing with that API means
you need to fix a lot of other people's drivers.  But you have the
rfkill-hw-state-changed now, it is called rfkill_force_state().  The only

Read Documentation/rfkill.txt.  And the kernel-doc comments on rfkill.c.
The rfkill documentation was updated.

If you still have doubts after reading the stuff above, ask on
linux-wireless and CC me.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: Michael Buesch
Date: Wednesday, September 17, 2008 - 7:26 am

That's a showstopper for us, as we have to change the state from

Yeah well. I didn't know. I lost interest in rfkill pretty much when it
stopped blowing up (until it started again when somebody merged a patch
recently).

-- 
Greetings Michael.
--

From: John W. Linville
Date: Wednesday, September 17, 2008 - 7:29 am

Workqueue?

-- 
John W. Linville
linville@tuxdriver.com
--

From: Michael Buesch
Date: Wednesday, September 17, 2008 - 7:33 am

Yeah, that's possible to implement yet another workaround using a workqueue.
However it would be nice to have rfkill either support atomic context
or implement the workaround in the rfkill core, as I'm sure there are more
devices that report rfkill changes via interrupt.

-- 
Greetings Michael.
--

From: Larry Finger
Date: Tuesday, September 16, 2008 - 12:30 pm

I didn't say it was not possible. What I said is that _ONLY_ the
operator's finger could change the state, just like in your laptop.
Thus it makes absolutely no difference what state RFKILL thinks it is
in. The driver knows what state the software wants, and can read but
not write the hardware state. There has to be a bug in RFKILL that got
exposed when this commit was included. This regression needs to be
fixed ASAP, otherwise 2.6.27 will be released with it. If the RFKILL
guys want to fix their bug, I'll be happy to test code, but not now.

Larry
--

From: Matthew Garrett
Date: Tuesday, September 16, 2008 - 4:32 pm

Of course it makes a difference. The reason why two states are provided 
is to allow userspace to distinguish whether it can unblock the device 
or not. It's clear that b43's rfkill code is astonishingly broken (and 
that's not a criticism of anyone involved - the documentation's 
confusing and there weren't any good examples of how it should be 
implemented).

The real question is how the LED state is supposed to be being toggled, 
and what that's got to do with rfkill. I /think/ that the current state 
of events is:

1) User toggles state
2) State toggle is used by b43 to generate KEY_WLAN (this is broken)
3) rfkill-input toggles the rfkill state, changing the LED state in the 
process

What *should* be happening is:

1) User toggles state
2) b43 changes rfkill state (by using rfkill_force_state). The LED state 
should also be changed in the process.

It looks like the commit in question gets part of the way to 
implementing the second of these situations, but will now change the 
rfkill state at the same time that rfkill-input tries to. I agree that 
it should be reverted for now, but because it's (a) broken (changing the 
state member directly is wrong) and (b) potentially open to strange 
states being generated by rfkill-input trying to change the state after 
it's already been changed. The general intent is correct.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Henrique de Moraes Holschuh
Date: Tuesday, September 16, 2008 - 7:33 pm

I sure hope you mean "the documentation WAS confusing"... because if it is
still confusing, I have not got any comments or ideas about how it could be


Correct.

However, rfkill_force_state() is NOT updating LED states (I just checked).
I will sleep on the issue, and send in a patch tomorrow.

This probably means a small patch to rfkill + Matthew's fixed patch to use
rfkill_force_state() in b43 will fix the regression in the right way.

I don't care either way which kind of fix goes to 2.6.27, though.

The proper fix for rfkill will be in two stages. A small fix now, and a
complete change on the LED handling to use the blocking notifier chain
instead later on (which will clean up rfkill code somewhat).

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: Larry Finger
Date: Tuesday, September 16, 2008 - 7:52 pm

I do not dispute that rfkill-handling in b43 is broken; however, prior
to the commit in question, it worked.

I also think we can agree that we need to get it working before 2.6.27
is released. If the small fix now is the reversion of bc19d6e, then I
think this is the correct path. We will then have a couple of weeks to
get the code working correctly before the 2.6.28 merge starts.

I admit that I never tested any of the RFKILL patches as they went in.
One of the reasons is that the development process seemed rather
untidy to an outsider, and I wasn't sure that any of the code would
ever be in the kernel. As such, it snuck up on me. I'll not let that
happen again. After the reversion, I will again test any suggested
code changes, but do not expect me to work out any of the changes. I
have enough to do.

Larry

--

From: John W. Linville
Date: Wednesday, September 17, 2008 - 6:23 am

FWIW, Henrique has been very persistant with driving rfkill.  He has
posted and reposted his patch series many times.  I'm sorry that it
snuck-up on you but it was on the list for quite a while.

John
-- 
John W. Linville
linville@tuxdriver.com
--

From: Henrique de Moraes Holschuh
Date: Wednesday, September 17, 2008 - 1:07 pm

The LED state was not being updated by rfkill_force_state(), which will
cause regressions in wireless drivers that had old-style rfkill support and
are updated to use rfkill_force_state().

The LED state was not being updated when a change was detected through the
rfkill->get_state() hook, either.

Move the LED trigger update calls into notify_rfkill_state_change(), where
it should have been in the first place.  This takes care of both issues.

Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Ivo van Doorn <IvDoorn@gmail.com>
---
 net/rfkill/rfkill.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

 John, this one is quite likely something that should be sent for
 merge in mainline BEFORE 2.6.27 is released.

 I am NOT sure it fixes regressions, that depends on whether the drivers
 using rfkill that are in 2.6.27 had working LED support before rfkill
 support was added to them.  Unfortunately, it cannot fix the b43
 regression by itself.

diff --git a/net/rfkill/rfkill.c b/net/rfkill/rfkill.c
index ea0dc04..f949a48 100644
--- a/net/rfkill/rfkill.c
+++ b/net/rfkill/rfkill.c
@@ -125,6 +125,7 @@ static void rfkill_led_trigger_activate(struct led_classdev *led)
 
 static void notify_rfkill_state_change(struct rfkill *rfkill)
 {
+	rfkill_led_trigger(rfkill, rfkill->state);
 	blocking_notifier_call_chain(&rfkill_notifier_list,
 			RFKILL_STATE_CHANGED,
 			rfkill);
@@ -217,10 +218,8 @@ static int rfkill_toggle_radio(struct rfkill *rfkill,
 			rfkill->state = state;
 	}
 
-	if (force || rfkill->state != oldstate) {
-		rfkill_led_trigger(rfkill, rfkill->state);
+	if (force || rfkill->state != oldstate)
 		notify_rfkill_state_change(rfkill);
-	}
 
 	return retval;
 }
-- 
1.5.6.5

--

From: Larry Finger
Date: Wednesday, September 17, 2008 - 1:55 pm

The b43 regression is not fixed with this patch and the one from
Matthew that starts out with "Oh, hey, I suck. This one might stand a
better chance of not falling over."

Larry


--

From: Henrique de Moraes Holschuh
Date: Thursday, September 18, 2008 - 5:43 am

Curious.  My patch to rfkill WAS tested, and it DOES fix the same issue you
reported (hardware state changes to HARD_BLOCKED do not update the LEDs) in
thinkpad-acpi.  It is also an "obviously correct" patch.

What this probably means is that b43 would need a little more rfkill surgery
than what Matthew's patch already did.  I will look over Matthew's patch,
but my guess is that Michael's comments about the need to add some extra
code to b43 to actually synthesize the rfkill state from the separate HW and
SW rfkill input lines are a strong hint of where the problem might be.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: Larry Finger
Date: Thursday, September 18, 2008 - 6:09 am

You must have missed my mail that shows the corrected patch for b43.
See http://marc.info/?l=linux-wireless&m=122171448920267&w=2.

Larry


--

From: Henrique de Moraes Holschuh
Date: Thursday, September 18, 2008 - 6:18 am

Indeed I have.  I will read it now and comment on it.

Thanks for the heads' up :-)

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: Ivo van Doorn
Date: Thursday, September 18, 2008 - 5:49 am

From: Michael Buesch
Date: Wednesday, September 17, 2008 - 7:22 am

Read the rfkill code. It toggles a LED trigger if the state changes from

No it shouldn't. LEDs are entirely handled by triggers. We must _never_ toggle
the LED state from within b43 directly via hardcoded stuff.
rfkill is responsible for handling the radio LEDs in the machine.

-- 
Greetings Michael.
--

From: Henrique de Moraes Holschuh
Date: Wednesday, September 17, 2008 - 7:50 am

He means rfkill_force_state() should change the LED state in the process,
which it isn't doing.

It is an one-line patch, I am testing it now.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: Larry Finger
Date: Wednesday, September 17, 2008 - 8:28 am

If your one-line patch fixes my regression, then I'll ACK it for 2.6.27.

Larry
--

From: Henrique de Moraes Holschuh
Date: Wednesday, September 17, 2008 - 8:36 am

Sorry, it can't fix that regression by itself, since b43 is not using
rfkill_force_state() in mainline.

It will fix the same issue you are experiencing on b43 in anything that uses
rfkill_force_state(), which means (for mainline): iwlwifi, rt2x00,
thinkpad-acpi and hp-wmi.

It is a regression only on b43 and thinkpad-acpi, as the other drivers above
didn't use rfkill_force_state() in 2.6.26.

To fix the regression in b43, you'd be faced with the need of both my simple
patch AND Matthew's more complex one.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: Larry Finger
Date: Wednesday, September 17, 2008 - 8:47 am

I don't expect that the more complex one will be accepted into 2.6.27.

Larry
--

From: Matthew Garrett
Date: Tuesday, September 16, 2008 - 12:51 pm

RFKILL_STATE_HARD_BLOCKED indicates that the hardware is disabled in a 
way that userspace can't influence, so sounds like exactly the right 
state to have here. I still have absolutely no idea what b43 rfkill is 
supposed to be doing - why on earth is it requesting rfkill-input, and 
why does it generate keypress events? I /think/ it should e something 
like the following (untested, I'm not near any of my b43 hardware at the 
moment). This basically does the following:

1) Split the update function in two, so it can be called by either 
polling or an interrupt driven event on newer hardware (not implemented 
yet)
2) Remove all the input handling
3) Change the state updates to use rfkill_force_state, which will 
generate an event that gets sent up to userland
4) Retains the RFKILL_STATE_HARD_BLOCKED code

When the user flicks a switch or presses a button that physically 
disables the radio, the state will now automatically change to 
RFKILL_STATE_HARD_BLOCKED. If they have a key that generates KEY_WLAN 
but doesn't change the radio state, rfkill-input will trigger a change 
to RFKILL_STATE_SOFT_BLOCKED.

Like I said, this is only build-tested - I won't be back at a b43 until 
next week. If someone could give this a go, that would be great.

diff --git a/drivers/net/wireless/b43/rfkill.c b/drivers/net/wireless/b43/rfkill.c
index fec5645..e8b2acb 100644
--- a/drivers/net/wireless/b43/rfkill.c
+++ b/drivers/net/wireless/b43/rfkill.c
@@ -49,21 +49,18 @@ static void b43_rfkill_update_state(struct b43_wldev *dev)
 	struct b43_rfkill *rfk = &(dev->wl->rfkill);
 
 	if (!dev->radio_hw_enable) {
-		rfk->rfkill->state = RFKILL_STATE_HARD_BLOCKED;
+		rfkill_force_state(rfk->rfkill, RFKILL_STATE_HARD_BLOCKED);
 		return;
 	}
 
 	if (!dev->phy.radio_on)
-		rfk->rfkill->state = RFKILL_STATE_SOFT_BLOCKED;
+		rfkill_force_state(rfk->rfkill, RFKILL_STATE_SOFT_BLOCKED);
 	else
-		rfk->rfkill->state = RFKILL_STATE_UNBLOCKED;
-
+		rfkill_force_state(rfk->rfkill, ...
From: Larry Finger
Date: Tuesday, September 16, 2008 - 1:34 pm

It locks up tight with a kernel panic when booting. I have a screen
picture that I will send separately to Matthew.

Larry
--

From: Matthew Garrett
Date: Tuesday, September 16, 2008 - 2:09 pm

Oh, hey, I suck. This one might stand a better chance of not falling 
over.

diff --git a/drivers/net/wireless/b43/rfkill.c b/drivers/net/wireless/b43/rfkill.c
index fec5645..991e318 100644
--- a/drivers/net/wireless/b43/rfkill.c
+++ b/drivers/net/wireless/b43/rfkill.c
@@ -49,21 +49,18 @@ static void b43_rfkill_update_state(struct b43_wldev *dev)
 	struct b43_rfkill *rfk = &(dev->wl->rfkill);
 
 	if (!dev->radio_hw_enable) {
-		rfk->rfkill->state = RFKILL_STATE_HARD_BLOCKED;
+		rfkill_force_state(rfk->rfkill, RFKILL_STATE_HARD_BLOCKED);
 		return;
 	}
 
 	if (!dev->phy.radio_on)
-		rfk->rfkill->state = RFKILL_STATE_SOFT_BLOCKED;
+		rfkill_force_state(rfk->rfkill, RFKILL_STATE_SOFT_BLOCKED);
 	else
-		rfk->rfkill->state = RFKILL_STATE_UNBLOCKED;
-
+		rfkill_force_state(rfk->rfkill, RFKILL_STATE_UNBLOCKED);
 }
 
-/* The poll callback for the hardware button. */
-static void b43_rfkill_poll(struct input_polled_dev *poll_dev)
+static void b43_rfkill_update(struct b43_wldev *dev)
 {
-	struct b43_wldev *dev = poll_dev->private;
 	struct b43_wl *wl = dev->wl;
 	bool enabled;
 	bool report_change = 0;
@@ -82,13 +79,25 @@ static void b43_rfkill_poll(struct input_polled_dev *poll_dev)
 			enabled ? "ENABLED" : "DISABLED");
 	}
 	mutex_unlock(&wl->mutex);
+}
 
-	/* send the radio switch event to the system - note both a key press
-	 * and a release are required */
-	if (unlikely(report_change)) {
-		input_report_key(poll_dev->input, KEY_WLAN, 1);
-		input_report_key(poll_dev->input, KEY_WLAN, 0);
-	}
+static void b43_rfkill_poll(unsigned long data)
+{
+	struct b43_rfkill *rfkill = (struct b43_rfkill *)data;
+	schedule_work(&rfkill->poll_queue);
+}
+
+static void b43_rfkill_work(struct work_struct *work)
+{
+	struct b43_rfkill *rfk = container_of(work, struct b43_rfkill, 
+					      poll_queue);
+	struct b43_wl *wl = container_of(rfk, struct b43_wl, rfkill);
+	struct b43_wldev *dev = wl->current_dev;
+
+	b43_rfkill_update(dev);
+	rfk->poll_timer.function = ...
From: Michael Buesch
Date: Wednesday, September 17, 2008 - 7:19 am

I still thing that it is really wrong to check and change the software
rfkill state from within the _hardware_ rfkill state handler.
So let's say this handler sets the state to SOFT_BLOCKED. Who is going
to set it back to unblocked, if the user unblocks the radio?

We can turn the radio on/off from the mac80211 config callback. Possibly
we must tell rfkill about it (I'm not sure. I don't understand the API).
I think this is pretty hard to get right, actually. HW-block and SW-block
are two completely independent states in b43. You can HW-block and SW-block
the device at the same time. So one must make sure that at any time the rfkill
is in a sane state if _either_ HW-block or SW-block changes. Currently I think
we only change rfkill state if HW-block state changes. Which is wrong.


In the end, I do not care about this crap anymore.
Feel free to remove my copyright notice from that file and put yours in
there. So feel free to send any patch to the rfkill code to john, but please
handle any regressions resulting from it. If not, I will handle them by reverting
patches until it starts to work again. ;)

-- 
Greetings Michael.
--

From: Henrique de Moraes Holschuh
Date: Wednesday, September 17, 2008 - 8:18 am

I would suggest reading the updated docs, but I know you'd rather just stay
away from rfkill.

For reference:

Basically, your wireless device driver has to monitor any hardware rfkill
lines and call rfkill_force_state() accordingly (you don't even need to
track if the state changed, rfkill core will do that).

The call to rfkill_force_state() should set state to HARD_BLOCKED (any of
the hardware rfkill lines are active), SOFT_BLOCKED (no hardware rfkill
lines are active, but one of the software rfkill lines are active), or
UNBLOCKED (no rfkill lines of any type are active).

Whether this is to be done through interrupts or pooling is something that
depends on your driver and the hardware.

User and userspace interaction is taken care of by the rfkill core.  You do
nothing of the sort in the wireless device driver.

There are *very few* exceptions for the above as far as wireless device
drivers go.  They are explained in the docs, and I know of no wireless

Again, for reference:

You need to synthesize a single state (unblock/soft-block/hard-block) for a
transmitter from every rfkill line that affects that transmitter.  This
happens because the interface is a SINGLE ONE rfkill class per independent

That's quite understandable...  b43 is one of the drivers (if not THE
driver) that suffered most from rfkill bugs, as it was an early adopter and
the old rfkill API was just plain broken for the kind of stuff b43 needed it
for.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: Michael Buesch
Date: Wednesday, September 17, 2008 - 8:59 am

Right. But that's not what I was talking about. The hw and sw rfkill is something
_completely_ different in b43 and it's handled by _completely_ different codepaths.
I just said that currently the sw-rfkill path does _not_ announce the state
correctly to the rfkill subsystem. Additionally we must _not_ check the sw rfkill
state from within the hw rfkill handlers, as it will get out of sync this way.
(that's what we're currently doing. If you revert the commit from the subject
this will go away, afaics).
Telling rfkill about the state isn't the hard part, but the
"You need to synthesize a single state" part, which currently is not done correctly.
We currently have _two_ states. (Don't get me wrong, the fix is _not_ to remove
either of these. The fix is to introduce a common rfkill notifier that constructs
a common rfkill state from these two states. This notifier is called from hw-rfkill
and sw-rfkill.)

-- 
Greetings Michael.
--

From: Carlos Corbacho
Date: Tuesday, September 16, 2008 - 1:44 pm

Just this should be enough to fix the regression for 2.6.27 (and should be
trivial to port to b43legacy) in the immediate term (I was planning to
send something similar).

The rest of the patch is out-of-scope for 2.6.27, but is definitely along
he right lines, IMHO (and it would be good if we can get a working version
of it in to 2.6.28). I may try and look at it later, as I do have my b43
hardware handy.

-Carlos
-- 
E-Mail: carlos@strangeworlds.co.uk
Web: strangeworlds.co.uk
GPG Key ID: 0x23EE722D
--

From: Larry Finger
Date: Tuesday, September 16, 2008 - 2:07 pm

This one doesn't crash, but it doesn't turn the light off.

I'm done testing for now!

John,

Revert the patch until the rfkill guys get their house in order.

Larry


--

From: Henrique de Moraes Holschuh
Date: Tuesday, September 16, 2008 - 3:40 pm

Hmm... if this lack of LED control when HARD_BLOCKED is a problem in rfkill,
I will track it down and fix it.

The rest of the stuff (b43's usage of rfkill), is a problem for the people
who know the b43 driver to fix, I don't dare touch it and break it further.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

Previous thread: Re: 2.6.26.5-rt9 by Alexander Pazdnikov on Tuesday, September 16, 2008 - 7:16 am. (1 message)

Next thread: Re: motherboard recommendations? by Jacek Poplawski on Tuesday, September 16, 2008 - 7:39 am. (2 messages)