Re: use of preempt_count instead of in_atomic() at leds-gpio.c

Previous thread: [git patches] parisc bug fixes for 2.6.25 by Kyle McMartin on Sunday, March 16, 2008 - 10:35 am. (1 message)

Next thread: Re: Linux 2.6.25-rc4 by Andrey Borzenkov on Sunday, March 16, 2008 - 11:59 am. (1 message)
From: Henrique de Moraes Holschuh
Date: Sunday, March 16, 2008 - 11:43 am

David, Richard,

Is the use of "if (preempt_count())" to know when to defer led gpio work to
a workqueue needed?  Shouldn't "if (in_atomic())" be enough?

I have found no other such uses of preempt_count() anywhere in kernel code,
while in_atomic() is used for that sort of heuristic in various places.

Relevant git commit id is: 00852279af5ad26956bc7f4d0e86fdb40192e542
"leds: Teach leds-gpio to handle timer-unsafe GPIOs".   It made mainline in
2.6.23-rc1.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: David Brownell
Date: Sunday, March 16, 2008 - 12:46 pm

At this point, I don't know of any such reason.

I remember hunting for the right heuristic, and settling on
that one for reasons that I can't recall now.  They may even
be no longer applicable.
--

From: Andrew Morton
Date: Tuesday, March 18, 2008 - 12:14 am

Both are incorrect.  When CONFIG_PREEMPT=n we have no support for
determining whether schedule() may be called.  The calling code has to sort
out its stuff on its own.

<greps for preempt_count>

The LEDs code seems to be the sole offender.  print_vma_addr() might be
wrong too, but Ingo did it, and perhaps he knows that all code paths which
call print_vma_addr() from deadlockable contexts have already called
inc_preempt_count().  But is that true for all architectures?

<greps for in_atomic>

omigawd, what have we done, and how can we fix it? :(
--

From: David Brownell
Date: Tuesday, March 18, 2008 - 12:06 pm

==============
It appears that we can't just check to see if we're in a task
context ... so instead of trying that, just make the relevant
leds always schedule a little worklet.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
---
 drivers/leds/leds-gpio.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- g26.orig/drivers/leds/leds-gpio.c	2008-03-18 01:32:08.000000000 -0700
+++ g26/drivers/leds/leds-gpio.c	2008-03-18 02:01:23.000000000 -0700
@@ -49,13 +49,13 @@ static void gpio_led_set(struct led_clas
 	if (led_dat->active_low)
 		level = !level;
 
-	/* setting GPIOs with I2C/etc requires a preemptible task context */
+	/* Setting GPIOs with I2C/etc requires a task context, and we don't
+	 * seem to have a reliable way to know if we're already in one; so
+	 * let's just assume the worst.
+	 */
 	if (led_dat->can_sleep) {
-		if (preempt_count()) {
-			led_dat->new_level = level;
-			schedule_work(&led_dat->work);
-		} else
-			gpio_set_value_cansleep(led_dat->gpio, level);
+		led_dat->new_level = level;
+		schedule_work(&led_dat->work);
 	} else
 		gpio_set_value(led_dat->gpio, level);
 }


--

From: Andrew Morton
Date: Tuesday, March 18, 2008 - 1:07 pm

On Tue, 18 Mar 2008 11:06:13 -0800

Better, I guess.

There's a design problem in the LED interface, though.  If callers really
do want to be able to call led_classdev.brightness_set() from atomic
contexts then we should either

a) make that function atomic (as you've done).  But that's inefficient.

b) pass in a mode flag to tell the callee whether it is allowed to
   sleep.  Ugly, but there's lots of precedent: GFP_ATOMIC-vs-GFP_KERNEL.

c) create a separate led_classdev.brightness_set_atomic() which callers
   should use when they're in atomic contexts.


Option c) would be best from a cleanness and efficiency POV.
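
For illustration, option c) might look roughly like the following. This is a minimal userspace sketch, not the LED class API: struct led_model and the led_* functions are hypothetical names, and plain field writes stand in for the hardware access and for schedule_work().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of an LED device; this is NOT the
 * kernel's struct led_classdev. */
struct led_model {
	int hw_level;		/* level the "hardware" currently shows */
	int pending_level;	/* level waiting for deferred work */
	bool work_pending;	/* true while deferred work is queued */
};

/* May sleep: the caller promises it runs in task context, so the
 * (possibly slow) bus write can happen directly. */
static void led_brightness_set(struct led_model *led, int level)
{
	assert(led != NULL);
	led->hw_level = level;
}

/* Never sleeps: the caller may hold a spinlock or be in interrupt
 * context, so only record the request and mark work as queued. */
static void led_brightness_set_atomic(struct led_model *led, int level)
{
	assert(led != NULL);
	led->pending_level = level;
	led->work_pending = true;
}

/* Stand-in for the work handler that later runs in task context. */
static void led_run_deferred_work(struct led_model *led)
{
	assert(led != NULL);
	if (led->work_pending) {
		led->work_pending = false;
		led_brightness_set(led, led->pending_level);
	}
}
```

The point of the split is that the caller, which knows its own context, picks the entry point; the callee never has to guess.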
--

From: Henrique de Moraes Holschuh
Date: Thursday, March 20, 2008 - 3:56 pm

Well, it is obvious that an "are we in a sleep-ok state?" query that works in
any case is desired by a lot of code.

I certainly don't want to punt every thinkpad LED write to a workqueue,
because that would mean less time with the CPU in C3 in the bottom line,
even if there are some benefits to always doing it the workqueue way (the
workqueue helper collapses attempts to change the LED state too often).
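
The collapsing effect mentioned above can be sketched in userspace as follows. This is only a model under stated assumptions: struct led_work_model, led_request() and led_work_run() are hypothetical names, and a boolean flag stands in for a work item that cannot be queued twice while it is still pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a driver that always defers LED writes. */
struct led_work_model {
	int new_level;	/* most recently requested level */
	int hw_level;	/* level last written to the "hardware" */
	int hw_writes;	/* number of slow hardware writes performed */
	bool pending;	/* true while the work item is queued */
};

/* Safe from any context: only stores the level and queues work.
 * Returns true if the work item was newly queued, false if a queued
 * item simply picked up the newer level. */
static bool led_request(struct led_work_model *w, int level)
{
	bool newly_queued;

	assert(w != NULL);
	newly_queued = !w->pending;
	w->new_level = level;
	w->pending = true;
	return newly_queued;
}

/* Stand-in for the work handler: one slow write with the latest level. */
static void led_work_run(struct led_work_model *w)
{
	assert(w != NULL);
	if (!w->pending)
		return;
	w->pending = false;
	w->hw_level = w->new_level;
	w->hw_writes++;
}
```

Three requests issued before the handler runs cost a single hardware write in this model, which is the benefit; the cost is that every write, even one from task context, pays the deferral.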

Can we add "in_scheduleable()", or maybe "can_schedule()", that returns
in_atomic() if CONFIG_PREEMPT, or 0 if there is no way to know?   To my
limited knowledge of how that part of the kernel works, it would do the
right thing.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: Andrew Morton
Date: Thursday, March 20, 2008 - 4:47 pm

If we did that, then people would use it.  And that would be bad.  It'll
lead to code which behaves differently on non-preemptible kernels, to code
which works less well on non-preemptible kernels and it will lead to less
well-thought-out code in general.

Really, this all points at an ill-designed part of the leds interface.  The
consistent pattern we use in the kernel is that callers keep track of
whether they are running in a schedulable context and, if necessary, they
will inform callees about that.  Callees don't work it out for themselves.

--

From: Henrique de Moraes Holschuh
Date: Thursday, March 20, 2008 - 5:36 pm

ACK.  Richard?  I have changed the thinkpad-acpi LED support to always defer
to a workqueue right now, but this *really* wants a LED class API fixup.
I'm for adding a specific hook for atomic access, but a flag would be good
enough too.

Well, so far so good for LEDs, but what about the other users of in_atomic
that apparently should not be doing it either?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: Andrew Morton
Date: Thursday, March 20, 2008 - 6:08 pm

Ho hum.  Lots of cc's added.



./arch/x86/mm/pageattr.c

  Looks wrong.

./arch/m68k/atari/time.c

  Possibly buggy: deadlockable

./sound/core/seq/seq_virmidi.c

  Possibly buggy

./net/iucv/iucv.c
./kernel/power/process.c

  Just a debug check.

./drivers/s390/char/sclp_tty.c

  Possibly buggy: deadlockable
  
./drivers/s390/char/sclp_vt220.c

  Possibly buggy: deadlockable

./drivers/s390/net/netiucv.c

  Possibly buggy: deadlockable

./drivers/char/isicom.c

  Possibly buggy: deadlockable

./drivers/usb/misc/sisusbvga/sisusb_con.c

  Possibly buggy: deadlockable

./drivers/net/usb/pegasus.c

  Possibly buggy: deadlockable (I assume)

./drivers/net/wireless/airo.c

  Possibly buggy: deadlockable

./drivers/net/wireless/rt2x00/rt73usb.c

  Possibly buggy: deadlockable (I assume)

./drivers/net/wireless/rt2x00/rt2500usb.c

  Possibly buggy: deadlockable (I assume)

./drivers/net/wireless/hostap/hostap_ioctl.c

  Possibly buggy: deadlockable (I assume)

./drivers/net/wireless/zd1211rw/zd_usb.c

  Possibly buggy: deadlockable (I assume)

./drivers/net/irda/sir_dev.c

  Possibly buggy: deadlockable

./drivers/net/netxen/netxen_nic_niu.c

  Possibly buggy: deadlockable

./drivers/net/netxen/netxen_nic_init.c

  Possibly buggy: deadlockable

./drivers/ieee1394/ieee1394_transactions.c

  Possibly buggy: deadlockable

./drivers/video/amba-clcd.c

  Possibly buggy: deadlockable

./drivers/i2c/i2c-core.c

  Possibly buggy: deadlockable


The usual pattern for most of the above is

	if (!in_atomic())
		do_something_which_might_sleep();

The problem is, in_atomic() returns false inside spinlock on non-preemptible
kernels.  So if anyone calls those functions inside spinlock they will
incorrectly schedule and another task can then come in and try to take the
already-held lock.
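
The failure mode can be shown with a toy userspace model. Everything here is an assumption made for illustration: struct cpu_model and the model_* functions are hypothetical, and the real preempt_count handling is more involved. The model only captures the two facts stated in this message: spinlocks bump the count only on preemptible kernels, while interrupt entry bumps it always.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one CPU; not kernel code. */
struct cpu_model {
	bool config_preempt;	/* models CONFIG_PREEMPT=y */
	int preempt_count;	/* what in_atomic() gets to look at */
	int locks_held;		/* ground truth, invisible to in_atomic() */
};

static void model_spin_lock(struct cpu_model *cpu)
{
	assert(cpu != NULL);
	cpu->locks_held++;
	if (cpu->config_preempt)	/* skipped when non-preemptible */
		cpu->preempt_count++;
}

static void model_spin_unlock(struct cpu_model *cpu)
{
	assert(cpu != NULL && cpu->locks_held > 0);
	cpu->locks_held--;
	if (cpu->config_preempt)
		cpu->preempt_count--;
}

/* Interrupt entry and exit are counted on every configuration. */
static void model_irq_enter(struct cpu_model *cpu)
{
	assert(cpu != NULL);
	cpu->preempt_count++;
}

static void model_irq_exit(struct cpu_model *cpu)
{
	assert(cpu != NULL && cpu->preempt_count > 0);
	cpu->preempt_count--;
}

static bool model_in_atomic(const struct cpu_model *cpu)
{
	assert(cpu != NULL);
	return cpu->preempt_count != 0;
}
```

With config_preempt false, a held lock leaves model_in_atomic() returning false even though locks_held is 1, so an `if (!in_atomic())` guard would go ahead and sleep.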

Now, it happens that in_atomic() returns true on non-preemptible kernels
when running in interrupt or softirq context.  But if the above code really
is ...
From: Alan Stern
Date: Thursday, March 20, 2008 - 6:31 pm

Presumably most of these places are actually trying to detect 
am-i-allowed-to-sleep.  Isn't that what in_atomic() is supposed to do?  
Why doesn't it do that in non-preemptible kernels?

For that matter, isn't it also the sort of thing that might_sleep() is 
supposed to check?  But looking at the definitions in 
include/linux/kernel.h, it appears that might_sleep() does nothing at 
all when neither CONFIG_PREEMPT_VOLUNTARY nor 
CONFIG_DEBUG_SPINLOCK_SLEEP is set.

Alan Stern

--

From: Michael Buesch
Date: Thursday, March 20, 2008 - 6:36 pm

No, I think there is no such check in the kernel. Most likely for performance
reasons, as it would require a global flag that is set on each spinlock.
You simply must always _know_, if you are allowed to sleep or not. This is
done by defining an API. The call-context is part of any kernel API.

-- 
Greetings Michael.
--

From: Andrew Morton
Date: Thursday, March 20, 2008 - 7:27 pm

Yup.  non-preemptible kernels avoid the inc/dec of

Yup.  99.99% of kernel code manages to do this...
--

From: Alan Stern
Date: Thursday, March 20, 2008 - 8:07 pm

So then what's the point of having in_atomic() at all?  Is it nothing 
more than a shorthand form of (in_irq() | in_softirq() | 
in_interrupt())?

In short, you are saying that there is _no_ reliable way to determine
am-i-called-from-inside-spinlock.  Well, why isn't there?  Would it be 
so terrible if non-preemptible kernels did adjust preempt_count on 
spin_lock/unlock?

Alan Stern

--

From: Andrew Morton
Date: Thursday, March 20, 2008 - 8:17 pm

in_atomic() is for core kernel use only.  Because in special circumstances
(ie: kmap_atomic()) we run inc_preempt_count() even on non-preemptible
kernels to tell the per-arch fault handler that it was invoked by


The reasons I identified: it adds additional overhead and it encourages
poorly-thought-out design.

Now we _could_ change kernel design principles from
caller-knows-whats-going-on over to callee-works-out-whats-going-on.  But

The vast, vast majority of kernel code has managed to get through life
without needing this hidden-argument-passing.  The handful of errant
callsites should be able to do so as well...

--

From: Jean Delvare
Date: Friday, March 21, 2008 - 2:53 am

Then why is it made available to drivers through <linux/hardirq.h>? If
it's such a dangerous macro to call from drivers, it shouldn't be made
available, or at the very least there should be a big fat warning in
<linux/hardirq.h> that drivers aren't supposed to use it. This would
have avoided the 23 use cases in drivers we have right now.

-- 
Jean Delvare
--

From: Andrew Morton
Date: Friday, March 21, 2008 - 10:37 am

True.
--

From: Alan Stern
Date: Friday, March 21, 2008 - 11:05 am

There's also a section about in_atomic() in the Linux Device Drivers 
(3rd ed.) book which may have contributed to the confusion.  On p. 198:

	A function related to in_interrupt() is in_atomic().  Its 
	return value is nonzero whenever scheduling is not allowed;
	this includes hardware and software interrupt contexts as well
	as any time when a spinlock is held.  In the latter case, 
	current may be valid, but access to user space is forbidden, 
	since it can cause scheduling to happen.  Whenever you are
	using in_interrupt(), you should really consider whether 
	in_atomic() is what you actually mean.  Both functions are
	declared in <asm/hardirq.h>.

Alan Stern

--

From: Jonathan Corbet
Date: Monday, March 24, 2008 - 12:34 pm

My fault (again).  Obviously it *looked* like something people could use
to me...  

How about the following patch as a short-term penance to keep others
from making the same mistake?

jon

--

Discourage people from using in_atomic()

Signed-off-by: Jonathan Corbet <corbet@lwn.net>

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 4982998..3d196cb 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -72,6 +72,11 @@
 #define in_softirq()		(softirq_count())
 #define in_interrupt()		(irq_count())
 
+/*
+ * Are we running in atomic context?  WARNING: this macro cannot
+ * always detect atomic context and should not be used to determine
+ * whether sleeping is possible.  Do not use it in driver code.
+ */
 #define in_atomic()		((preempt_count() & ~PREEMPT_ACTIVE) != 0)
 
 #ifdef CONFIG_PREEMPT

--

From: Andrew Morton
Date: Monday, March 24, 2008 - 12:42 pm

On Mon, 24 Mar 2008 13:34:49 -0600

It'd be better if the comment were to describe _why_ in_atomic() is
unreliable.  ie: "does not account for held spinlocks on non-preemptible
kernels".

--

From: Jonathan Corbet
Date: Monday, March 24, 2008 - 12:53 pm

But then...why would anybody have a reason to read the upcoming LWN
article on the subject?  

OK, how's this?

jon

--

Discourage people from inappropriately using in_atomic()

Signed-off-by: Jonathan Corbet <corbet@lwn.net>

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 4982998..63a7782 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -72,6 +72,13 @@
 #define in_softirq()		(softirq_count())
 #define in_interrupt()		(irq_count())
 
+/*
+ * Are we running in atomic context?  WARNING: this macro cannot
+ * always detect atomic context; in particular, it cannot know about
+ * held spinlocks in non-preemptible kernels.  Thus it should not be
+ * used in the general case to determine whether sleeping is possible.
+ * Do not use in_atomic() in driver code.
+ */
 #define in_atomic()		((preempt_count() & ~PREEMPT_ACTIVE) != 0)
 
 #ifdef CONFIG_PREEMPT
--

From: Junio C Hamano
Date: Tuesday, March 25, 2008 - 1:52 am

Is it just me who feels this comment that says "in_atomic() is not a way
to tell if we are in atomic reliably and cannot be used for such and such"
very reader-unfriendly?  Ok, maybe the macro is not reliable and is not
meant to be used for the purpose its name seems to suggest (at least to a
non-kernel person).  An inevitable question is, then what is it good for?
What's the right situation to use this macro?

I guess an additional comment "even if this says no, you could still be in
atomic, but if this says yes, then you definitely are in atomic and cannot
sleep" may help unconfuse a clueless reader like myself.
--

From: Jean Delvare
Date: Tuesday, March 25, 2008 - 3:39 am

Andrew explained that in_atomic() could deadlock if called in a
condition where it is unreliable (although I did not understand the
details). Documenting that a "yes" from in_atomic() can always be
trusted, would invite driver authors to still use it, when my
understanding is that they still shouldn't.

If drivers shouldn't use in_atomic() at all then I think that the
long-term solution is to move its definition out of <linux/hardirq.h>.
But of course this means fixing all the drivers that still use it first.

-- 
Jean Delvare
--

From: Jonathan Corbet
Date: Tuesday, March 25, 2008 - 6:44 am

The "right situation" would appear to be "you're deep in the mm code and
really know what you're doing."  It is not a useful way for code to
determine whether it's running in atomic context - as was discussed
elsewhere in the thread, that information really needs to be passed in
by the caller.
 

The point being that "you just *might* be in atomic context, where
sleeping would be a bad idea, but I can't tell you" really isn't all
that useful.  It's a trap which can only lead to incorrect code.

What really needs to happen, IMHO, is that this macro should be ripped
out of hardirq.h entirely and cleverly hidden somewhere.  That can't be
done, though, until the drivers which use it are fixed.  But while that
is happening, we can at least put up a skull-and-crossbones sign to
discourage others from making the same mistake.

jon
--

From: David Brownell
Date: Tuesday, March 25, 2008 - 4:20 pm

I _almost_ hate bringing this lovely flamage back onto $SUBJECT ... but
what's the resolution for the leds-gpio.c issue?  I've not seen a merge
notice for the patch I submitted a week ago now:

	http://marc.info/?l=linux-kernel&m=120597839009399&w=2

Just a "leaning..." comment:

	http://marc.info/?l=linux-kernel&m=120606104619198&w=2

Seems to me that by now there ought to be resolution on at least
one of the issues brought up on this thread.  :)

- Dave

--

From: Alan Stern
Date: Wednesday, March 26, 2008 - 7:28 am

Is it reasonable to have two versions of that subroutine: one meant to 
be called in a sleepable context and the other to be called when 
sleeping isn't allowed?

Alan Stern

--

From: Henrique de Moraes Holschuh
Date: Wednesday, March 26, 2008 - 9:17 am

I have changed the thinkpad-acpi leds code to always assume an atomic
context, but I too would appreciate a proper flag (or secondary hook)
from the LED class to know when I am in an atomic context or not.

LED Triggers also need to be modified; they are mostly called from an
atomic context so we have to assume that by default, but we'd do well to
add a way to call them from non-atomic contexts.

Richard?  AFAIK, the ball *is* in your court as the LED maintainer.  You
have to decide which way to go and tell us.  I suppose either I or Alan
could write up patches to implement it, but I have better things to do
than to write patches that would be rejected anyway... OTOH, I don't
mind writing ones that I know are at least attempting to implement an
approved idea and would be rejected only if they need some fixing.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: Richard Purdie
Date: Wednesday, March 26, 2008 - 9:46 am

I've been meaning to merge David's patch and will get that done shortly,
sorry for the delay.

As I've said, I don't really think we need context flags for the LED
class at the moment. As you say, most of the triggers are atomic context
and those which can sleep are not time critical so I don't really see
the point in the added complexity.

Cheers,

Richard


--

From: David Brownell
Date: Thursday, March 27, 2008 - 11:51 am

Presumably, both near-term and long-term solutions are needed.

I'd suggest merging the leds-gpio and thinkpad-acpi fixes
before 2.6.25 ships, and then *maybe* adopting something
more invasive.

- Dave
--

From: Tetsuo Handa
Date: Friday, March 21, 2008 - 8:11 am

So, it is impossible to know whether I am inside a spinlock or not.
OK. That's not what I want to do.

I want to make sure that my code (not a device driver) is called only from a context
where use of down()/mutex_lock()/kmalloc(GFP_KERNEL)/get_user_pages()/kmap() etc. are permitted.
Is "if (in_atomic()) return;" check a correct method for avoiding deadlocks
when my code was accidentally called from a context
where use of down()/mutex_lock()/kmalloc(GFP_KERNEL)/get_user_pages()/kmap() etc. are not permitted?
I'm assuming that in_atomic() returns nonzero whenever scheduling is not permitted.

Regards.
--

From: Stefan Richter
Date: Friday, March 21, 2008 - 9:54 am

You shouldn't sleep while holding a spinlock.  As soon as another thread 
attempts to take the spinlock, it will be stuck in a busy-wait loop.

So, it's better if you specify that your code either can be called in 
atomic context or must not be called in atomic context, and all callers 
observe this restriction.  Or callers pass a flag to your code which 
says whether your code is allowed to sleep or not.
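
The flag-passing variant could be sketched like this, in the spirit of GFP_ATOMIC versus GFP_KERNEL. It is a userspace illustration only: enum call_ctx, struct label_pool and the pool_* functions are hypothetical, and the "sleep" is simulated by a helper that pretends another thread returned a label.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* The caller states its own context; the callee never guesses. */
enum call_ctx { CTX_ATOMIC, CTX_MAY_SLEEP };

/* Hypothetical pool of a limited resource. */
struct label_pool {
	int free_labels;	/* labels currently available */
	int sleeps;		/* how many times a caller had to wait */
};

/* Stand-in for blocking until a label is returned to the pool. */
static void pool_wait_for_label(struct label_pool *pool)
{
	pool->sleeps++;
	pool->free_labels++;	/* pretend a recycler ran while we slept */
}

/* Returns 0 on success, or -EAGAIN if the pool is empty and the
 * caller said it must not sleep. */
static int pool_get_label(struct label_pool *pool, enum call_ctx ctx)
{
	assert(pool != NULL);
	if (pool->free_labels == 0) {
		if (ctx == CTX_ATOMIC)
			return -EAGAIN;
		pool_wait_for_label(pool);
	}
	pool->free_labels--;
	return 0;
}
```

An atomic caller has to be prepared for -EAGAIN; a sleeping caller never sees it in this model.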
-- 
Stefan Richter
-=====-==--- --== =-=-=
http://arcgraph.de/sr/
--

From: Stefan Richter
Date: Friday, March 21, 2008 - 10:02 am

PS,

No.  Quoting Andrew:  "in_atomic() returns false inside spinlock on 

-- 
Stefan Richter
-=====-==--- --== =-=-=
http://arcgraph.de/sr/
--

From: Tetsuo Handa
Date: Saturday, March 22, 2008 - 10:53 pm

So, just "if (in_atomic()) return;" check is insufficient for detecting
all cases when it is not permitted to sleep. I see.

Is "in_atomic() returns false inside spinlock on non-preemptible kernels."
the only case that the in_atomic() can't tell whether it is permitted to sleep or not?
If this is the only case, can't we somehow know it by remembering
"how many spinlocks this CPU is holding now" since
"it is not permitted to sleep inside the spinlock" means
"the CPU the current process is running will not change".
Something like

#ifdef CONFIG_COUNT_SPINLOCKS_HELD
atomic_t spinlock_held_counter[NR_CPUS];
#endif

void spin_lock(x)
{
	/* obtain this spinlock. */
#ifdef CONFIG_COUNT_SPINLOCKS_HELD
	/* increment spinlock_held_counter[this_CPU]. */
#endif
}

void spin_unlock(x)
{
#ifdef CONFIG_COUNT_SPINLOCKS_HELD
	/* decrement spinlock_held_counter[this_CPU]. */
#endif
	/* release this spinlock. */
}

bool in_spinlock()
{
#ifdef CONFIG_COUNT_SPINLOCKS_HELD
	/* return spinlock_held_counter[this_CPU] != 0. */
#else
	return false;
#endif
}

and use "if (in_atomic() || in_spinlock()) return;" instead of "if (in_atomic()) return;" ?
--

From: Heiko Carstens
Date: Friday, March 21, 2008 - 6:47 am

This is difficult for console drivers. They get called and are supposed to
print something and don't have the slightest clue which context they are
running in and if they are allowed to schedule.
This is the problem with e.g. s390's sclp driver. If there are no write
buffers available anymore it tries to allocate memory if schedule is allowed
or otherwise has to wait until finally a request finished and memory is
available again.
And now we have to always busy wait if we are out of buffers, since we
cannot tell which context we are in?
--

From: Greg KH
Date: Friday, March 21, 2008 - 9:54 am

This is the reason why the drivers/usb/misc/sisusbvga driver is trying
to test for in_atomic:
        /* We can't handle console calls in non-schedulable
         * context due to our locks and the USB transport.
         * So we simply ignore them. This should only affect
         * some calls to printk.
         */
        if (in_atomic())
                return NULL;


So how should this be "fixed" if in_atomic() is not a valid test?

thanks,

greg k-h
--

From: Andrew Morton
Date: Friday, March 21, 2008 - 12:59 pm

Well.  The kernel has traditionally assumed that console writes are atomic.

But we now have complex sleepy drivers acting as consoles.  Presumably this
means that large amounts of device driver code, page allocator code, etc
cannot have printks in them without going recursive.  Except printk itself
internally handles that, due to its need to be able to handle
printk-from-interrupt-when-this-cpu-is-already-running-printk.

The typical fix is for these console drivers to just assume that they
cannot sleep: pass GFP_ATOMIC down into the device driver code.  But I bet
the device driver code was designed assuming that it could sleep,
oops-bad-we-lose.

And it's not just sleep-in-spinlock.  If any of that device driver code
uses alloc_pages(GFP_KERNEL) then it can deadlock if we do a printk from
within the page allocator (and hence a lot of the block and storage layer).
Because in those code paths we must use GFP_NOFS or GFP_NOIO to allocate
memory.

So I think the right fix here is to switch those drivers to being
unconditionally atomic: don't schedule, don't take mutexes, don't use
__GFP_WAIT allocations.

They could of course be switched to using
kmalloc(GFP_ATOMIC)+memcpy()+schedule_task().  That's rather slow, but this
is not a performance-sensitive area.  But more seriously, this could lead
to messages getting lost from a dying machine.
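
A copy-and-defer console write might be sketched as below. Note the substitution: instead of kmalloc(GFP_ATOMIC) this userspace model uses a small preallocated ring, so the atomic path never allocates at all. struct console_queue, console_write_atomic() and console_dequeue() are hypothetical names, and a real implementation would also need locking against concurrent writers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MSG_SLOTS 4
#define MSG_LEN   64

/* Hypothetical fixed-size message ring; single-threaded model. */
struct console_queue {
	char slots[MSG_SLOTS][MSG_LEN];
	size_t head;	/* next slot to drain */
	size_t count;	/* messages currently queued */
	size_t dropped;	/* messages lost because the ring was full */
};

/* Atomic-safe path: bounded copy, no allocation, no sleeping.
 * Returns false if the message had to be dropped. */
static bool console_write_atomic(struct console_queue *q, const char *msg)
{
	size_t tail;

	assert(q != NULL && msg != NULL);
	if (q->count == MSG_SLOTS) {
		q->dropped++;
		return false;
	}
	tail = (q->head + q->count) % MSG_SLOTS;
	snprintf(q->slots[tail], MSG_LEN, "%s", msg);	/* truncates if long */
	q->count++;
	return true;
}

/* Task-context path: hand one message to code that is allowed to
 * sleep.  Returns false when the ring is empty. */
static bool console_dequeue(struct console_queue *q, char *out, size_t out_len)
{
	assert(q != NULL && out != NULL && out_len > 0);
	if (q->count == 0)
		return false;
	snprintf(out, out_len, "%s", q->slots[q->head]);
	q->head = (q->head + 1) % MSG_SLOTS;
	q->count--;
	return true;
}
```

The dropped counter makes the concern about a dying machine concrete: anything still in the ring when the machine stops is never printed.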

One possibility would be to do current->called_for_console_output=1 and
then test that in various places.  But a) ugh and b) that's only useful for
memory allocations - it doesn't help if sleeping locks need to be taken.

Another possibility might be:

	if (current->called_for_console_output == false) {
		mutex_lock(lock);
	} else {
		if (!mutex_trylock(lock))
			return -EAGAIN;
	}

and then teach the console-calling code to requeue the message for later. 
But that's hard, because the straightforward implementation would result in
the output being queued for _all_ the currently active consoles, but some
of them might already have ...
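
The trylock-or-requeue idea can be modelled in userspace roughly as follows. This is a sketch under loud assumptions: a C11 atomic_flag stands in for the mutex, the spin loop stands in for mutex_lock() (which would sleep rather than spin), and the for_console parameter replaces the proposed current->called_for_console_output field. struct toy_lock and device_send() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock built on a C11 atomic flag; stand-in for a mutex. */
struct toy_lock {
	atomic_flag held;
};

/* Returns true if the lock was acquired, false if already held. */
static bool toy_trylock(struct toy_lock *lock)
{
	return !atomic_flag_test_and_set(&lock->held);
}

static void toy_unlock(struct toy_lock *lock)
{
	atomic_flag_clear(&lock->held);
}

/* Returns 0 after "sending", or -EAGAIN if called for console output
 * while the lock is busy; the caller must then requeue the message. */
static int device_send(struct toy_lock *lock, bool for_console, int *sent)
{
	assert(lock != NULL && sent != NULL);
	if (for_console) {
		if (!toy_trylock(lock))
			return -EAGAIN;
	} else {
		while (!toy_trylock(lock))
			;	/* a real mutex_lock() would sleep here */
	}
	(*sent)++;
	toy_unlock(lock);
	return 0;
}
```

The sketch shows only the locking half; the requeueing side, which this message identifies as the difficult part, is left out.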
From: Michael Buesch
Date: Friday, March 21, 2008 - 1:16 pm

Well, IMO drivers that need to sleep to transmit some data (to whatever,
the screen or something) are not useful for debugging a crashing kernel anyway.
Or how high is the possibility that it'd survive the actual sleep in the
memory allocation? I'd say almost zero.
So that schedule_task() is not that bad.

-- 
Greetings Michael.
--

From: Michael Buesch
Date: Friday, March 21, 2008 - 1:20 pm

and

transmit_data_func()
{
	if (!oops_in_progress) {
		schedule_transmission_for_later();
	} else {
		/* We crash anyway, so we don't care about
		 * possible deadlocks from memory alloc sleeps
		 * or whatever. */
		close_eyes_and_transmit_it_now();
	}
}


-- 
Greetings Michael.
--

From: Stefan Richter
Date: Friday, March 21, 2008 - 2:21 am

That's in hpsb_get_tlabel(), an exported symbol of the ieee1394 core.

The in_atomic() there didn't cause problems yet and is unlikely to do so 
in the future, because there are no plans for substantial changes to the 
whole drivers/ieee1394/ anymore (because of drivers/firewire/).

Nevertheless I shall look into replacing the in_atomic() by in_softirq() 
or something like that.  Touching this legacy code is dangerous though.


Some background:

This in_atomic() is just one symptom of one of the fundamental design 
flaws of the ieee1394 stack:  The "tlabels" (transaction labels, a 
limited resource) are acquired not only in process context but also in 
soft IRQ context --- but they are released only in process context. 
Unsurprisingly (in hindsight), the stack used to run out of tlabels 
simply because the tlabel consumers were scheduled more frequently than 
the tlabel recycler.  This resulted in IO failures in sbp2 and eth1394.

This is one of the design problems which inspired the submission of a 
new alternative driver stack.  (Though this particular one of the 
ieee1394 stack's problems could of course also be solved by a rework of 
the stack --- with a respective need of resources for testing and some 
danger of regressions.)

In the meantime (Linux 2.6.19 and 2.6.22) I added workarounds in sbp2 
and eth1394 to deal with temporary lack of tlabels.  Alas I just 
recently received a report that eth1394's workaround is unsuccessful on 
non-preemptible uniprocessor kernels.  I suspect the same issue exists 
with sbp2's workaround, it just isn't as likely to happen there.

The new drivers/firewire/ recycle tlabels in bottom halves context and 
in timer context, which is the appropriate approach.  Alas 
drivers/firewire/ don't have an eth1394 equivalent yet...
-- 
Stefan Richter
-=====-==--- --== =-=-=
http://arcgraph.de/sr/
--

From: Stefan Richter
Date: Friday, March 21, 2008 - 2:27 am

Or extend the API to have separate calls for callers which can sleep and 
-- 
Stefan Richter
-=====-==--- --== =-=-=
http://arcgraph.de/sr/
--

From: Henrique de Moraes Holschuh
Date: Friday, March 21, 2008 - 5:37 am

Which, I think, is exactly the config where in_atomic() can't be used to
mean "in_scheduleable_context()" ?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: Stefan Richter
Date: Friday, March 21, 2008 - 6:16 am

That's coincidence.

The mentioned workaround fails this way:
   - tlabel consumer eth1394 (IPv4 over FireWire) grabs lots of tlabels
     in soft IRQ context.
   - tlabel recycler khpsbpkt (a kthread of ieee1394) sleeps even though
     it could start putting tlabels back into the pool.
   - eth1394 can't get tlabels anymore, stops the transmit queue,
     schedules a workqueue job.
   - eth1394's workqueue job (run by the events kthread) tries to acquire
     a tlabel.  It does so in non-atomic context and hence sleeps in
     hpsb_get_tlabel() until the tlabel pool is nonempty again.  It would
     then wake up the eth1394 transmit queue again.
   - Normally, khpsbpkt would have been woken up by now and would have
     released a lot of now unused tlabels back into the pool again.
     However, on UP preempt_none kernels, khpsbpkt continues to sleep.
     (The 1394 stack's lower level running in IRQ context or perhaps
     tasklet context wakes up khpsbpkt.)
   - Since it doesn't get a tlabel, eth1394's workqueue job sleeps
     forever as well.

Result is that all other tasks of the shared workqueue can't be 
serviced, notably the keyboard is stuck, and that the eth1394 connection 
breaks down.  (I haven't started working on a fix, or opened a bugzilla 
ticket for it yet.  The reporter currently switched his kernel to 
PREEMPT which is not affected.)

IOW:
The failure in the workaround is *not* about the in_atomic() being the 
wrong question asked in hpsb_get_tlabel() --- no, ieee1394's in_atomic() 
abuse works just fine even on UP PREEMPT_NONE.  Instead, the failure is 
about kthreads not being scheduled in the way that I thought they would.
-- 
Stefan Richter
-=====-==--- --== =-=-=
http://arcgraph.de/sr/
--

From: Stefan Richter
Date: Saturday, March 22, 2008 - 4:29 am

-- 
Stefan Richter
-=====-==--- --== =-==-
http://arcgraph.de/sr/
--

From: David Brownell
Date: Friday, March 21, 2008 - 10:04 am

Looks just unnecessary to me ... ethtool MII ops get called from
a task context, as I recall, and other drivers just rely on that.

- Dave

========= CUT HERE
Remove superfluous in_atomic() check; ethtool MII ops are called
from task context.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
---
 drivers/net/usb/pegasus.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

--- g26.orig/drivers/net/usb/pegasus.c	2008-03-21 08:53:28.000000000 -0700
+++ g26/drivers/net/usb/pegasus.c	2008-03-21 08:54:07.000000000 -0700
@@ -1128,12 +1128,8 @@ pegasus_get_settings(struct net_device *
 {
 	pegasus_t *pegasus;
 
-	if (in_atomic())
-		return 0;
-
 	pegasus = netdev_priv(dev);
 	mii_ethtool_gset(&pegasus->mii, ecmd);
-
 	return 0;
 }
 

--

From: Richard Purdie
Date: Thursday, March 20, 2008 - 5:56 pm

The LED interface said that the brightness_set implementation should not
sleep since it was intended to be a 'cheap' function and to allow LED
triggers changing the LED brightness to be simple. A lot of embedded LED
hardware doesn't need to sleep to toggle gpios.

Some drivers do have a problem with that however and it's usually been
suggested they offload the brightness changes into a workqueue. The gpio
driver tries to be clever and only uses the workqueue if the gpio
backend can sleep *and* the calling context requires it, the latter part
being the problem.

So the options are:

* fix the gpio driver not to be so clever and clearly document
* move the workqueue into the LED class, use it for everyone and remove
the limitation of the function (punishes the hardware which doesn't need
to sleep)
* move the workqueue into the LED class and have LED drivers state
whether they can sleep or not
* start passing around GFP_* flags

Passing flags around and maintaining a track of schedulable state for
the LED class sounds like overkill. I also don't like the idea of
needlessly always using a workqueue. The reason the workqueue was never
implemented in the core was basically a question of timing. If you know
the LED is on a serial bus running at 9600 baud you might not schedule
work quite as often as something on a faster bus. Yes you could start
passing this info around but to me it makes sense to leave this kind of
policy to the drivers.

So I'm leaning towards 'fixing' the gpio driver as I think David has
already offered. I will also improve the documentation on this function
and its requirements as I agree the current documentation isn't as clear as
it should be.

Cheers,

Richard

--

From: Henrique de Moraes Holschuh
Date: Thursday, March 20, 2008 - 7:10 pm

Also good.  But the fact is, the LED core *does* know when it is calling
from a scheduleable context (e.g. from sysfs handlers), and that's not an
uncommon path either.

The trigger code is more complicated, I don't know if most of its calls to
brightness_set are in safe or unsafe contexts for sleep.  But the people

It is the preferred way to do these things.  If you don't do it like that,
both gpio and *all* ACPI-based LED devices will have to always defer to

And we will have to always defer to workqueues on drivers that can't operate
from an atomic/interrupt context?  Even when there would be no need for it
because brightness_set is not being called from a non-scheduleable context
at all?

I hope I can live with that for LEDs (I have to think about LED
brightness_get first before I am sure about that), but I don't like it at
all for the long term.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--
