Even after applying some fixes posted by Chirag and Peter Z, I'm still getting some messages in my log like this BUG: sleeping function called from invalid context swapper(0) at kernel/rtmutex.c:743 in_atomic():1 [00000001], irqs_disabled():1 Pid: 0, comm: swapper Tainted: G W 2.6.26.1-rt1.jk #2 Call Trace: [<ffffffff802305d3>] __might_sleep+0x12d/0x132 [<ffffffff8046cdbe>] __rt_spin_lock+0x34/0x7d [<ffffffff8046ce15>] rt_spin_lock+0xe/0x10 [<ffffffff802532e5>] pm_qos_requirement+0x1f/0x3c [<ffffffff803e1b7f>] menu_select+0x7b/0x9c [<ffffffff8020b1be>] ? default_idle+0x0/0x5a [<ffffffff8020b1be>] ? default_idle+0x0/0x5a [<ffffffff803e0b4b>] cpuidle_idle_call+0x68/0xd8 [<ffffffff803e0ae3>] ? cpuidle_idle_call+0x0/0xd8 [<ffffffff8020b1be>] ? default_idle+0x0/0x5a [<ffffffff8020b333>] cpu_idle+0xb2/0x12d [<ffffffff80466af0>] start_secondary+0x186/0x18b --------------------------- | preempt count: 00000001 ] | 1-level deep critical section nesting: ---------------------------------------- .. [<ffffffff8020b39c>] .... cpu_idle+0x11b/0x12d .....[<ffffffff80466af0>] .. ( <= start_secondary+0x186/0x18b) The following simple patch makes the messages disappear - however, there may be a better more fine grained solution, but the problem is also that all the functions are designed to use the same lock.
Hmm, I think you're right - its called from the idle routine so we can't go about sleeping there. The only trouble I have is with kernel/pm_qos_params.c:update_target()'s use of this lock - that is decidedly not O(1). Mark, would it be possible to split that lock in two, one lock protecting pm_qos_array[], and one lock protecting the requirements.list ? --
very likely, but I'm not sure how it will help. the fine grain locking I had initially worked out on pm_qos was to have a lock per pm_qos_object, that would be used for accessing the requirement_list and the target_value. But that isn't what you are asking about is it? Is what you want is a pm_qos_requirements_list_lock and a pm_qos_target_value_lock, for each pm_qos_object instance? I guess it wold work but besides giving the code spinlock diarrhea would it really help solve the issue you are seeing? --
The problem is that on -rt spinlocks turn into mutexes. And the above BUG tells us that the idle loop might end up scheduling due to trying to take this lock. Now, the way I read the code, pm_qos_lock protects multiple things: - pm_qos_array[target]->target_value - &pm_qos_array[pm_qos_class]->requirements.list Now, the thing is, we could turn the lock back into a real spinlock (raw_spinlock_t), but the loops in eg update_target() are not O(1) and could thus cause serious preempt-off latencies. My question was, and now having had a second look at the code I think it is, would it be possible to guard the list using a sleeping lock, protect the target_value using a (raw) spinlock. OTOH, just reading a (word aligned, word sized) value doesn't normally require serialization, esp if the update site is already serialized by other means. So could we perhaps remove the lock usage from pm_qos_requirement()? - that too would solve the issue. - Peter --
How about this patch? Like Peter suggests, It adds a raw spinlock only for the target value. I'm currently running with it, but still testing, comments are appreciated. Thanks
I have been running with this patch for quite a while now without any problems, so please apply. If Mark is able to remove the lock altogether at some point in the future then we can remove this patch. --
As long as RAW_SPINLOCK compiles to a normal spinlock for non-RT premept kernels I'm don't see a problem, as the change is almost a no-op for non-RT kernels. Signed-off-by: mark gross <mgross@linux.intel.com> Should I send an updated patch that includes a change to the comment block regarding the locking design after this patch or instead of it? --gmross --
Correct, kernels with the rt patch that are configured to be non-rt change the raw_spinlock to a normal spinlock. This patch still applies I've updated my patch to include changes to the comment block about the locking design. I've also added your Signed-off-by line . (Maybe Acked-by: would be more appropriate?) Now that I've separated locking of the target value from the other locks, Peter's question still remains. Could the lock protecting target be dropped from mainline which would allow us to drop this patch altogether from rt? For now the safe thing to do is keep it protected, but could you explain why it is needed? (it may very well be) Thank You. (updated patch attached)
I was confused about this point, as I thought I saw raw_spinlock defined thanks, This code is doing list deletions, insertions and walks / element updates in a multi threaded environment where many processes on many CPU's can be adding removing and updating PM_QOS request, a lot (tm). So I think I still need to locking for the list walking and list element updating code on face value. I'm not supper good with lists, perhaps there is a trick to protecting the lists better than the way I'm doing it. Keeping a lock around the different "target_value"s may not be so important. Its just a 32bit scaler value, and perhaps we can make it an atomic type? That way we loose the raw_spinlock. Actually, the target is only getting read by CPUIDLE from idle. It shouldn't get changed from the idle context. --
My suggestion was to keep the locking for the write side - so as to avoid stuff stomping on one another, but drop the read side as: spin_lock foo = var; spin_unlock return foo; is kinda useless, it doesn't actually serialize against the usage of foo, that is, once it gets used, var might already have acquired a new value. The only thing it would protect is reading var, but since that is a machine sized read, its atomic anyway (assuming its naturally aligned). So no need for atomic_t (its read-side is just a read too), just drop the whole lock usage from pq_qos_requirement(). --
Thanks Peter. Mark, is the following patch ok with you? This should be applied to mainline, and then after that no special patches are necessary for real-time. Thanks John Kacur
It looks ok to me, do I need to add a compiler declaration to the structure to make sure the target_value is word aligned? thanks, --
I've been thinking about this patch and I worry that the readability from making the use of this lock asymmetric WRT reads and writes to the storage address is bothersome. I would rather make the variable an atomic. What do you think about that? --
It would make the write side more expensive, as we already have the two atomic operations for the lock and unlock, this would add a third. Then again, I doubt that this is really a fast path. OTOH, a simple comment could clarify the situation for the reader. Up to you I guess ;-) --
Personally I agree with Peter, a simple comment would clarify the situation, it seems quite silly to me to add complexity in the name of symmetry. This is not my definition of readability. Never-the-less I offer up solution number 3 here if that would please everyone more. Attached is a patch that changes the target value to an atomic variable as suggested by Arjan. To summarize. 3 Sol'ns - all of which solve the problem. 1. Add a raw spinlock around target value only. This makes the raw spinlock area very small, and is converted to a normal spinlock for non-preempt-rt. 2. Remove the spinlock altogether in pm_qos_requirement since the simple read is already atomic. Advantage - smallest patch and realtime doesn't require a special patch once this is included in mainline. I like this one the best. 3. make target_value atomic_t. Advantage - symmetry, some people find this more readable. The patch is larger than the above solution but as above, no special patch is required for realtime once this is included in mainline. Solution three is in the attached patch. Comments are appreciated as always.
do we want to move the unlock before the setting of the target_value? This feels wrong to me, the option 2 patch didn't do this. couldn't we have a race from 2 cpu's hitting update_target at the same time with different values if we drop the lock before the target_value is set? --
I think you are right since atomicity doesn't have anything to do with ordering, good catch, putting the the unlock back where it was before, new patch attached. (also shortened-up pm_qos_requirement) ---SNIP---- John
looks good, its running without error on the non-rt kernel. Signed-off-by: mark gross <mgross@linux.intel.com> FWIW I'll send this as patch to Andrew in a moment. --
From: John Kacur <jkacur at gmail dot com> Patch to make PM_QOS and CPU_IDLE play nicer when ran with the RT-Preempt kernel. CPU_IDLE polls the target_value's of some of the pm_qos parameters from the idle loop causing sleeping locking warnings. Changing the target_value to an atomic avoids this issue. Signed-off-by: mark gross <mgross@linux.intel.com> Thanks, --mgross Remove the spinlock in pm_qos_requirement by making target_value an atomic type. This is necessary for real-time since pm_qos_requirement is called by idle and cannot be allowed to sleep. Signed-off-by: John Kacur <jkacur at gmail dot com> Index: linux-2.6/kernel/pm_qos_params.c =================================================================== --- linux-2.6.orig/kernel/pm_qos_params.c 2008-08-08 07:52:01.000000000 -0700 +++ linux-2.6/kernel/pm_qos_params.c 2008-08-28 12:14:55.000000000 -0700 @@ -43,7 +43,7 @@ #include <linux/uaccess.h> /* - * locking rule: all changes to target_value or requirements or notifiers lists + * locking rule: all changes to requirements or notifiers lists * or pm_qos_object list and pm_qos_objects need to happen with pm_qos_lock * held, taken with _irqsave. One lock to rule them all */ @@ -66,7 +66,7 @@ struct miscdevice pm_qos_power_miscdev; char *name; s32 default_value; - s32 target_value; + atomic_t target_value; s32 (*comparitor)(s32, s32); }; @@ -77,7 +77,7 @@ .notifiers = &cpu_dma_lat_notifier, .name = "cpu_dma_latency", .default_value = 2000 * USEC_PER_SEC, - .target_value = 2000 * USEC_PER_SEC, + .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC), .comparitor = min_compare }; @@ -87,7 +87,7 @@ .notifiers = &network_lat_notifier, .name = "network_latency", .default_value = 2000 * USEC_PER_SEC, - .target_value = 2000 * USEC_PER_SEC, + .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC), .comparitor = min_compare }; @@ -99,7 +99,7 @@ .notifiers = &network_throughput_notifier, .name = ...
On Thu, 28 Aug 2008 12:44:20 -0700 I tend to deobfuscate email addresses when preparing the changelogs. Not sure why, really, and I feel a bit guilty about doing it (hey, you can forward the resulting spam to me if you like). One reason I guess is to avoid breaking all the emails which my scripts will automatically send out as the patch goes through its little lifecycle. Some of which As usual when I get two different changelogs, I create a masterpiece OK, so I've stared and stared. This patch doesn't do anything. I'm suspecting that the rt patch might be doing something silly, like thinking that it needs to take pm_qos_lock to read a single word from memory, or something like that. But this patch is simply not understandable given the amount of information which you guys have provided. Help? --
On Fri, Aug 29, 2008 at 2:32 AM, Andrew Morton Hi Andrew The purpose of the patch is to remove the spin_lock around the read in the function pm_qos_requirement - since spinlocks can sleep in -rt and this function is called from idle. I did propose a patch, that simply removed the spinlock, since Peter Zilstra and others (yourself) point out, the lock isn't needed to read a single word from memory. Although the patch worked fine, Arjan objected to it, because it was not symmetric. A reader of the code might wonder why it was locked in other functions but not here. He suggested changing it to an atomic type might be less confusing. The alternative which I like better actually is to simply remove the lock in pm_qos_requirement and add a comment for readers of the code. John --
This is a much better changelog, and is what should have been written in the first place ;-) -- Steve --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian | [RFC/PATCH] RE: git calls SSH_ASKPASS even if DISPLAY is not set |
| Junio C Hamano |
