Re: [RFC PATCH] kick sleeping idle CPUS on cpu_idle_wait

Previous thread: [PATCH 00/10] percpu: Per cpu code simplification V3 by travis on Monday, January 7, 2008 - 10:11 pm. (8 messages)

Next thread: !Congratulations! You have won. by final_notification11 on Monday, January 7, 2008 - 10:22 pm. (1 message)
To: LKML <linux-kernel@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Andi Kleen <ak@...>, Brown, Len <len.brown@...>, Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>, Peter Zijlstra <a.p.zijlstra@...>
Date: Monday, January 7, 2008 - 10:27 pm

Sometimes cpu_idle_wait gets stuck because it might miss CPUS that are
already in idle, have no tasks waiting to run and have no interrupts
going to them. This is common on bootup when switching cpu idle
governors.

This patch accounts for CPUs already in idle.

Background:
-----------
I notice this while developing the mcount patches, that every once in a
while the system would hang. Looking deeper, the hang was always at boot
up when registering init_menu of the cpu_idle menu governor. Talking
with Thomas Gliexner, we discovered that one of the CPUS had no timer
events scheduled for it and it was in idle (running with NO_HZ). So the
CPU would not set the cpu_idle_state bit.

Hitting sysrq-t a few times would eventually route the interrupt to the
stuck CPU and the system would continue.

Note, I would have used the PDA isidle but that is set after the
cpu_idle_state bit is cleared, and would leave a window open where we
may miss being kicked.

hmm, looking closer at this, we still have a small race window between
clearing the cpu_idle_state and disabling interrupts (hence the RFC).

CPU0: CPU 1:
--------- ---------
cpu_idle_wait(): cpu_idle():
| __cpu_cpu_var(is_idle) = 1;
| if (__get_cpu_var(cpu_idle_state)) /* == 0 */
per_cpu(cpu_idle_state, 1) = 1; |
if (per_cpu(is_idle, 1)) /* == 1 */ |
smp_call_function(1) |
| receives ipi and runs do_nothing.
wait on map == empty idle();
/* waits forever */

So really we need interrupts off for most of this then. One might think
that we could simply clear the cpu_idle_state from do_nothing, but I'm
assuming that cpu_idle governors can be removed, and this might cause a
race that a governor might be used after the module was removed.

One solution I think is that we could keep calling do_nothing to all the
bit...

To: Steven Rostedt <rostedt@...>
Cc: LKML <linux-kernel@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Andi Kleen <ak@...>, Brown, Len <len.brown@...>, Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>, Peter Zijlstra <a.p.zijlstra@...>
Date: Monday, January 7, 2008 - 11:33 pm

I must admit I never liked that cpu idle wait code anyways. Why again
can't normal RCU be used for this? Waiting for two RCU quiescent cycles should
be enough, shouldn't it?

-Andi
--

To: LKML <linux-kernel@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Brown, Len <len.brown@...>, Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>, Peter Zijlstra <a.p.zijlstra@...>, Andi Kleen <ak@...>
Date: Wednesday, January 9, 2008 - 4:42 pm

This patch is different than the first patch I sent out.
This one just sends an IPI to all CPUS that don't check in after 1 sec.

Sometimes cpu_idle_wait gets stuck because it might miss CPUS that are
already in idle, have no tasks waiting to run and have no interrupts
going to them. This is common on bootup when switching cpu idle
governors.

This patch gives those CPUS that don't check in an IPI kick.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
arch/x86/kernel/process_32.c | 11 +++++++++++
arch/x86/kernel/process_64.c | 11 +++++++++++
2 files changed, 22 insertions(+)

Index: linux-compile-i386.git/arch/x86/kernel/process_32.c
===================================================================
--- linux-compile-i386.git.orig/arch/x86/kernel/process_32.c 2008-01-09 14:09:36.000000000 -0500
+++ linux-compile-i386.git/arch/x86/kernel/process_32.c 2008-01-09 14:09:45.000000000 -0500
@@ -204,6 +204,10 @@ void cpu_idle(void)
}
}

+static void do_nothing(void *unused)
+{
+}
+
void cpu_idle_wait(void)
{
unsigned int cpu, this_cpu = get_cpu();
@@ -228,6 +232,13 @@ void cpu_idle_wait(void)
cpu_clear(cpu, map);
}
cpus_and(map, map, cpu_online_map);
+ /*
+ * We waited 1 sec, if a CPU still did not call idle
+ * it may be because it is in idle and not waking up
+ * because it has nothing to do.
+ * Give all the remaining CPUS a kick.
+ */
+ smp_call_function_mask(map, do_nothing, 0, 0);
} while (!cpus_empty(map));

set_cpus_allowed(current, tmp);
Index: linux-compile-i386.git/arch/x86/kernel/process_64.c
===================================================================
--- linux-compile-i386.git.orig/arch/x86/kernel/process_64.c 2008-01-09 14:09:36.000000000 -0500
+++ linux-compile-i386.git/arch/x86/kernel/process_64.c 2008-01-09 15:17:20.000000000 -0500
@@ -135,6 +135,10 @@ static void poll_idle (void)
cpu_relax();
}

+static void do_nothing(void *unused)
+{
+}
+
void cpu_idle_wait(void)
{
u...

To: Steven Rostedt <rostedt@...>, LKML <linux-kernel@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Brown, Len <len.brown@...>, Adam Belay <abelay@...>, Peter Zijlstra <a.p.zijlstra@...>, Andi Kleen <ak@...>
Date: Wednesday, January 9, 2008 - 8:12 pm

I think your RFC patch is the right solution here. As I see it, there is
no race with your RFC patch.
As long as you call a dummy smp_call_function on all CPUs, we should be
OK. We can get rid of cpu_idle_state
and the current wait forever logic altogether with dummy
smp_call_function. And so there wont be any wait forever scenario.

The whole point of cpu_idle_wait() is to make all CPUs come out of idle
loop atleast once.
The caller will use cpu_idle_wait something like this.

// Want to change idle handler
- Switch global idle handler to always present default_idle
- call cpu_idle_wait so that all cpus come out of idle for an instant
and stop using old idle pointer and start using default idle
- Change the idle handler to a new handler
- optional cpu_idle_wait if you want all cpus to start using the new
handler immediately.

May be the below 1s patch is safe bet for .24. But for .25, I would say
we just replace all complicated logic by simple dummy smp_call_function
and remove cpu_idle_state altogether.

Thanks,
--

To: Steven Rostedt <rostedt@...>
Cc: <linux-kernel@...>, <torvalds@...>, <mingo@...>, <tglx@...>, <len.brown@...>, <venkatesh.pallipadi@...>, <abelay@...>, <a.p.zijlstra@...>, <ak@...>
Date: Wednesday, January 9, 2008 - 7:42 pm

s/cpus_/cpu_/

On Wed, 09 Jan 2008 15:42:10 -0500

This seems rather hacky. Although it may turn out to be the most efficient
fix, dunno.

I'd have thought that the right fix would be to plug the race which you
described at the top-of-thread. That might require some redesign, but it
sounds like the design is wrong anyway.

Maybe your proposed fix is suitable for a 2.6.24 bandaid..

<looks at cpu_idle_wait()>

<pokes his tongue out at the person who put in a global,
exported-to-modules interface and didn't bother documenting it>

OK, it's called infrequently, so a few extra IPIs there won't hurt.

btw, it's pretty damn sad that cpu_idle_wait() will always stall for at
least one second. That's a huge amount of time and I bet it's thousands of
times longer than is actually needed..
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <torvalds@...>, <mingo@...>, <tglx@...>, <len.brown@...>, <venkatesh.pallipadi@...>, <abelay@...>, <a.p.zijlstra@...>, <ak@...>
Date: Wednesday, January 9, 2008 - 8:05 pm

I didn't like that either. But I was focusing on something else, and I was
getting sick and tired of my box hanging on bootup every once in a while
(usually when I reboot and walk away, just to come back to find the box
hung).

So this was my band-aid, and since it was only happening on the box
running with my patches, I thought it may have been something I did. But
then it finally hung on a reboot to a vanilla kernel, so I decided to at
least send my band-aid out.

If anything, this should get some notice and we can have a proper fix for
.25.

-- Steve

--

To: LKML <linux-kernel@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Brown, Len <len.brown@...>, Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>, Peter Zijlstra <a.p.zijlstra@...>, Andi Kleen <ak@...>
Date: Wednesday, January 9, 2008 - 6:12 pm

FYI, I just hit this hang on 2.6.24-rc6 without any extra patches. So,
unless 2.6.24-rc7 did anything to fix this issue, this is a high
priority bug (IMHO).

-- Steve

--

To: Steven Rostedt <rostedt@...>
Cc: LKML <linux-kernel@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Thomas Gleixner <tglx@...>, Brown, Len <len.brown@...>, Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>, Peter Zijlstra <a.p.zijlstra@...>, Andi Kleen <ak@...>
Date: Thursday, January 10, 2008 - 9:45 am

i'm wondering why this only triggered now. Is this something new in
2.6.24?

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Thomas Gleixner <tglx@...>, Brown, Len <len.brown@...>, Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>, Peter Zijlstra <a.p.zijlstra@...>, Andi Kleen <ak@...>
Date: Thursday, January 10, 2008 - 10:43 am

It only triggeres with the switching of the idle governors. And not just
one, you need to switch twice. The first loading of a governor does not
call cpu_idle_wait, but the second one does. NO_HZ must also be enabled,
plus this needs to happen when no events or threads are scheduled to run
on a CPU, which limits this to boot up.

Also, this only seems to happen on my 2x2 (4way) and only once in a while.

I'm surprised that I'm the only one so far to report it. I can boot up the
2.6.23 kernel on this box to see if it also hangs sometimes. But, as I
said, it may take several hundreds of tries to see it.

-- Steve

--

To: Steven Rostedt <rostedt@...>, Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Thomas Gleixner <tglx@...>, Brown, Len <len.brown@...>, Adam Belay <abelay@...>, Peter Zijlstra <a.p.zijlstra@...>, Andi Kleen <ak@...>
Date: Thursday, January 10, 2008 - 1:31 pm

With 2.6.23, you can try compiling acpi processor.ko as a module and
doing insmod and rmmod in a loop. That should call cpu_idle_wait very
frequently.

Thanks,
Venki
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Ingo Molnar <mingo@...>, LKML <linux-kernel@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Thomas Gleixner <tglx@...>, Brown, Len <len.brown@...>, Adam Belay <abelay@...>, Peter Zijlstra <a.p.zijlstra@...>, Andi Kleen <ak@...>
Date: Thursday, January 10, 2008 - 2:03 pm

The thing is, after the full boot, there's too many things going on to
keep a system fully idle. Little services can wake a processor up. The
problem I had was that this happens on early bootup where there's not much
around to wake the sleeping processor.

I'm not sure if it is too much of a problem after boot up.

-- Steve

--

Previous thread: [PATCH 00/10] percpu: Per cpu code simplification V3 by travis on Monday, January 7, 2008 - 10:11 pm. (8 messages)

Next thread: !Congratulations! You have won. by final_notification11 on Monday, January 7, 2008 - 10:22 pm. (1 message)