Re: [PATCH RFC] pm_qos_requirement might sleep

Previous thread: [patch 00/33] 2.6.25-stable review cycle by Greg KH on Monday, August 4, 2008 - 1:13 pm. (34 messages)

Next thread: [PATCH] tracehook: kerneldoc fix by Roland McGrath on Monday, August 4, 2008 - 1:56 pm. (2 messages)
From: John Kacur
Date: Monday, August 4, 2008 - 1:52 pm

Even after applying some fixes posted by Chirag and Peter Z, I'm still
getting some messages in my log like this
BUG: sleeping function called from invalid context swapper(0) at
kernel/rtmutex.c:743
in_atomic():1 [00000001], irqs_disabled():1
Pid: 0, comm: swapper Tainted: G        W 2.6.26.1-rt1.jk #2

Call Trace:
 [<ffffffff802305d3>] __might_sleep+0x12d/0x132
 [<ffffffff8046cdbe>] __rt_spin_lock+0x34/0x7d
 [<ffffffff8046ce15>] rt_spin_lock+0xe/0x10
 [<ffffffff802532e5>] pm_qos_requirement+0x1f/0x3c
 [<ffffffff803e1b7f>] menu_select+0x7b/0x9c
 [<ffffffff8020b1be>] ? default_idle+0x0/0x5a
 [<ffffffff8020b1be>] ? default_idle+0x0/0x5a
 [<ffffffff803e0b4b>] cpuidle_idle_call+0x68/0xd8
 [<ffffffff803e0ae3>] ? cpuidle_idle_call+0x0/0xd8
 [<ffffffff8020b1be>] ? default_idle+0x0/0x5a
 [<ffffffff8020b333>] cpu_idle+0xb2/0x12d
 [<ffffffff80466af0>] start_secondary+0x186/0x18b

---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<ffffffff8020b39c>] .... cpu_idle+0x11b/0x12d
.....[<ffffffff80466af0>] ..   ( <= start_secondary+0x186/0x18b)

The following simple patch makes the messages disappear - however,
there may be a better more fine grained solution, but the problem is
also that all the functions are designed to use the same lock.
From: Peter Zijlstra
Date: Tuesday, August 5, 2008 - 12:25 am

Hmm, I think you're right - its called from the idle routine so we can't
go about sleeping there.

The only trouble I have is with kernel/pm_qos_params.c:update_target()'s
use of this lock - that is decidedly not O(1).

Mark, would it be possible to split that lock in two, one lock
protecting pm_qos_array[], and one lock protecting the
requirements.list ?


--

From: mark gross
Date: Tuesday, August 5, 2008 - 1:49 pm

very likely, but I'm not sure how it will help.

the fine grain locking I had initially worked out on pm_qos was to have
a lock per pm_qos_object, that would be used for accessing the
requirement_list and the target_value.  But that isn't what you are
asking about is it?

Is what you want is a pm_qos_requirements_list_lock and a
pm_qos_target_value_lock, for each pm_qos_object instance?

I guess it wold work but besides giving the code spinlock diarrhea would
it really help solve the issue you are seeing? 

--

From: Peter Zijlstra
Date: Tuesday, August 5, 2008 - 2:09 pm

The problem is that on -rt spinlocks turn into mutexes. And the above
BUG tells us that the idle loop might end up scheduling due to trying to
take this lock.

Now, the way I read the code, pm_qos_lock protects multiple things:

 - pm_qos_array[target]->target_value

 - &pm_qos_array[pm_qos_class]->requirements.list

Now, the thing is, we could turn the lock back into a real spinlock
(raw_spinlock_t), but the loops in eg update_target() are not O(1) and
could thus cause serious preempt-off latencies.

My question was, and now having had a second look at the code I think it
is, would it be possible to guard the list using a sleeping lock,
protect the target_value using a (raw) spinlock.

OTOH, just reading a (word aligned, word sized) value doesn't normally
require serialization, esp if the update site is already serialized by
other means.

So could we perhaps remove the lock usage from pm_qos_requirement()? -
that too would solve the issue.


 - Peter

--

From: John Kacur
Date: Tuesday, August 5, 2008 - 3:18 pm

How about this patch? Like Peter suggests, It adds a raw spinlock only
for the target value. I'm currently running with it, but still
testing, comments are appreciated.

Thanks
From: John Kacur
Date: Monday, August 11, 2008 - 6:25 am

I have been running with this patch for quite a while now without any
problems, so please apply. If Mark is able to remove the lock
altogether at some point in the future then we can remove this patch.
--

From: mark gross
Date: Tuesday, August 12, 2008 - 3:49 pm

As long as RAW_SPINLOCK compiles to a normal spinlock for non-RT premept
kernels I'm don't see a problem, as the change is almost a no-op for
non-RT kernels.

Signed-off-by: mark gross <mgross@linux.intel.com>

Should I send an updated patch that includes a change to the comment
block regarding the locking design after this patch or instead of it?


--gmross


--

From: John Kacur
Date: Wednesday, August 13, 2008 - 1:24 am

Correct, kernels with the rt patch that are configured to be non-rt
change the raw_spinlock to a normal spinlock. This patch still applies

I've updated my patch to include changes to the comment block about
the locking design. I've also added your Signed-off-by line . (Maybe
Acked-by: would be more appropriate?)

Now that I've separated locking of the target value from the other
locks, Peter's question still remains. Could the lock protecting
target be dropped from mainline which would allow us to drop this
patch altogether from rt? For now the safe thing to do is keep it
protected, but could you explain why it is needed? (it may very well
be)

Thank You.
(updated patch attached)
From: mark gross
Date: Thursday, August 14, 2008 - 8:52 am

I was confused about this point, as I thought I saw raw_spinlock defined

thanks, 


This code is doing list deletions, insertions and walks / element
updates in a multi threaded environment where many processes on many
CPU's can be adding removing and updating PM_QOS request, a lot (tm).

So I think I still need to locking for the list walking and list element
updating code on face value.  I'm not supper good with lists, perhaps
there is a trick to protecting the lists better than the way I'm doing
it. 

Keeping a lock around the different "target_value"s may not be so
important.  Its just a 32bit scaler value, and perhaps we can make it an
atomic type?  That way we loose the raw_spinlock.


Actually, the target is only getting read by CPUIDLE from idle.  It
shouldn't get changed from the idle context.


--

From: Peter Zijlstra
Date: Thursday, August 14, 2008 - 10:48 am

My suggestion was to keep the locking for the write side - so as to
avoid stuff stomping on one another, but drop the read side as:

 spin_lock
 foo = var;
 spin_unlock
 return foo;

is kinda useless, it doesn't actually serialize against the usage of
foo, that is, once it gets used, var might already have acquired a new
value.

The only thing it would protect is reading var, but since that is a
machine sized read, its atomic anyway (assuming its naturally aligned).

So no need for atomic_t (its read-side is just a read too), just drop
the whole lock usage from pq_qos_requirement().



--

From: John Kacur
Date: Thursday, August 14, 2008 - 3:51 pm

Thanks Peter.

Mark, is the following patch ok with you? This should be applied to
mainline, and then after that no special patches are necessary for
real-time.

Thanks

John Kacur
From: mark gross
Date: Wednesday, August 20, 2008 - 12:14 pm

It looks ok to me, do I need to add a compiler declaration to the
structure to make sure the target_value is word aligned?

thanks,


--

From: mark gross
Date: Monday, August 25, 2008 - 9:34 am

I've been thinking about this patch and I worry that the readability
from making the use of this lock asymmetric WRT reads and writes to the
storage address is bothersome.

I would rather make the variable an atomic.  What do you think about
that?


--

From: Peter Zijlstra
Date: Monday, August 25, 2008 - 9:35 am

It would make the write side more expensive, as we already have the two
atomic operations for the lock and unlock, this would add a third.

Then again, I doubt that this is really a fast path.

OTOH, a simple comment could clarify the situation for the reader.

Up to you I guess ;-)

--

From: John Kacur
Date: Tuesday, August 26, 2008 - 1:48 am

Personally I agree with Peter, a simple comment would clarify the
situation, it seems quite silly to me to add complexity in the name of
symmetry. This is not my definition of readability. Never-the-less I
offer up solution number 3 here if that would please everyone more.
Attached is a patch that changes the target value to an atomic
variable as suggested by Arjan. To summarize.

3 Sol'ns - all of which solve the problem.
1. Add a raw spinlock around target value only. This makes the raw
spinlock area very small, and is converted to a normal spinlock for
non-preempt-rt.
2. Remove the spinlock altogether in pm_qos_requirement since the
simple read is already atomic. Advantage - smallest patch and realtime
doesn't require a special patch once this is included in mainline. I
like this one the best.
3. make target_value atomic_t. Advantage - symmetry, some people find
this more readable. The patch is larger than the above solution but as
above, no special patch is required for realtime once this is included
in mainline. Solution three is in the attached patch. Comments are
appreciated as always.
From: mark gross
Date: Tuesday, August 26, 2008 - 9:18 am

do we want to move the unlock before the setting of the target_value?
This feels wrong to me, the option 2 patch didn't do this.

couldn't we have a race from 2 cpu's hitting update_target at the same
time with different values if we drop the lock before the target_value
is set?


--

From: John Kacur
Date: Tuesday, August 26, 2008 - 10:45 am

I think you are right since atomicity doesn't have anything to do with
ordering, good catch, putting the the unlock back where it was before,
new patch attached. (also shortened-up pm_qos_requirement)

---SNIP----

John
From: mark gross
Date: Thursday, August 28, 2008 - 12:38 pm

looks good, its running without error on the non-rt kernel.  

Signed-off-by: mark gross <mgross@linux.intel.com>

FWIW I'll send this as patch to Andrew in a moment.


--

From: mark gross
Date: Thursday, August 28, 2008 - 12:44 pm

From: John Kacur <jkacur at gmail dot com>

Patch to make PM_QOS and CPU_IDLE play nicer when ran with the
RT-Preempt kernel.  CPU_IDLE polls the target_value's of some of the
pm_qos parameters from the idle loop causing sleeping locking
warnings.  Changing the target_value to an atomic avoids this issue.

Signed-off-by: mark gross <mgross@linux.intel.com>

Thanks,

--mgross


Remove the spinlock in pm_qos_requirement by making target_value an atomic type.
This is necessary for real-time since pm_qos_requirement is called by idle and
cannot be allowed to sleep.
Signed-off-by: John Kacur <jkacur at gmail dot com>

Index: linux-2.6/kernel/pm_qos_params.c
===================================================================
--- linux-2.6.orig/kernel/pm_qos_params.c	2008-08-08 07:52:01.000000000 -0700
+++ linux-2.6/kernel/pm_qos_params.c	2008-08-28 12:14:55.000000000 -0700
@@ -43,7 +43,7 @@
 #include <linux/uaccess.h>
 
 /*
- * locking rule: all changes to target_value or requirements or notifiers lists
+ * locking rule: all changes to requirements or notifiers lists
  * or pm_qos_object list and pm_qos_objects need to happen with pm_qos_lock
  * held, taken with _irqsave.  One lock to rule them all
  */
@@ -66,7 +66,7 @@
 	struct miscdevice pm_qos_power_miscdev;
 	char *name;
 	s32 default_value;
-	s32 target_value;
+	atomic_t target_value;
 	s32 (*comparitor)(s32, s32);
 };
 
@@ -77,7 +77,7 @@
 	.notifiers = &cpu_dma_lat_notifier,
 	.name = "cpu_dma_latency",
 	.default_value = 2000 * USEC_PER_SEC,
-	.target_value = 2000 * USEC_PER_SEC,
+	.target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
 	.comparitor = min_compare
 };
 
@@ -87,7 +87,7 @@
 	.notifiers = &network_lat_notifier,
 	.name = "network_latency",
 	.default_value = 2000 * USEC_PER_SEC,
-	.target_value = 2000 * USEC_PER_SEC,
+	.target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
 	.comparitor = min_compare
 };
 
@@ -99,7 +99,7 @@
 	.notifiers = &network_throughput_notifier,
 	.name = ...
From: Andrew Morton
Date: Thursday, August 28, 2008 - 5:32 pm

On Thu, 28 Aug 2008 12:44:20 -0700

I tend to deobfuscate email addresses when preparing the changelogs.

Not sure why, really, and I feel a bit guilty about doing it (hey, you
can forward the resulting spam to me if you like).  One reason I guess
is to avoid breaking all the emails which my scripts will automatically
send out as the patch goes through its little lifecycle.  Some of which

As usual when I get two different changelogs, I create a masterpiece

OK, so I've stared and stared.  This patch doesn't do anything.

I'm suspecting that the rt patch might be doing something silly, like
thinking that it needs to take pm_qos_lock to read a single word from
memory, or something like that.

But this patch is simply not understandable given the amount of
information which you guys have provided.

Help?

--

From: John Kacur
Date: Thursday, August 28, 2008 - 11:31 pm

On Fri, Aug 29, 2008 at 2:32 AM, Andrew Morton

Hi Andrew

The purpose of the patch is to remove the spin_lock around the read in
the function pm_qos_requirement - since spinlocks can sleep in -rt and
this function is called from idle. I did propose a patch, that simply
removed the spinlock, since Peter Zilstra and others (yourself) point
out, the lock isn't needed to read a single word from memory. Although
the patch worked fine, Arjan objected to it, because it was not
symmetric. A reader of the code might wonder why it was locked in
other functions but not here. He suggested changing it to an atomic
type might be less confusing. The alternative which I like better
actually is to simply remove the lock in pm_qos_requirement and add a
comment for readers of the code.

John
--

From: Steven Rostedt
Date: Friday, August 29, 2008 - 7:29 am

This is a much better changelog, and is what should have been written in 
the first place ;-)

-- Steve

--

Previous thread: [patch 00/33] 2.6.25-stable review cycle by Greg KH on Monday, August 4, 2008 - 1:13 pm. (34 messages)

Next thread: [PATCH] tracehook: kerneldoc fix by Roland McGrath on Monday, August 4, 2008 - 1:56 pm. (2 messages)