Re: preempt rcu bug on s390

Previous thread: [GIT PULL] kbuild updates by Sam Ravnborg on Saturday, February 9, 2008 - 6:02 am. (1 message)

Next thread: [PATCH] proc: extend /proc/<pid>/fdinfo/<fd> by Eugene Teo on Saturday, February 9, 2008 - 8:01 am. (5 messages)
To: Paul E. McKenney <paulmck@...>
Cc: Gautham R Shenoy <ego@...>, Dipankar Sarma <dipankar@...>, Steven Rostedt <srostedt@...>, Ingo Molnar <mingo@...>, Martin Schwidefsky <schwidefsky@...>, <linux-kernel@...>
Date: Saturday, February 9, 2008 - 7:34 am

Using CONFIG_PREEMPT_RCU and CONFIG_NO_IDLE_HZ on s390, my system always
gets stuck when running with more than one cpu.
When booting with four cpus, all four cpus get caught within cpu_idle
and stop advancing. Meanwhile the init process is
waiting for synchronize_rcu() to complete (lcrash output):

STACK TRACE FOR TASK: 0xf84d968 (swapper)

STACK:
0 schedule+842 [0x36c956]
1 schedule_timeout+172 [0x36d0e4]
2 wait_for_common+204 [0x36c398]
3 synchronize_rcu+76 [0x567bc]
4 netlink_change_ngroups+150 [0x2b4302]
5 genl_register_mc_group+256 [0x2b6174]
6 genl_init+188 [0x534e44]
7 kernel_init+444 [0x518334]
8 kernel_thread_starter+6 [0x192a6]

If I change the code so that timer ticks are never disabled, everything
runs fine. So my guess is that rcu_needs_cpu() doesn't do the right
thing for the preemptible RCU case.
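To make the rcu_needs_cpu() hypothesis concrete, here is a toy single-CPU model in C (all names are hypothetical, not kernel symbols) of how a wrong answer can hang synchronize_rcu(). The idle loop stops the tick only when the needs-cpu check reports nothing outstanding, and in this model only the tick advances RCU's per-CPU work. It deliberately ignores interrupts and other CPUs, so it illustrates the failure mode rather than the real rcupreempt code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state for one idle CPU (not a kernel structure). */
struct model_cpu {
	int rcu_work;      /* tick-driven RCU steps still outstanding */
	bool tick_stopped; /* periodic tick disabled (nohz idle) */
};

typedef bool (*needs_cpu_fn)(const struct model_cpu *);

/* Correct stand-in for rcu_needs_cpu(): keep the tick while work remains. */
static bool needs_cpu_ok(const struct model_cpu *c)
{
	return c->rcu_work > 0;
}

/* Buggy stand-in: ignores outstanding work, like the suspected bug. */
static bool needs_cpu_buggy(const struct model_cpu *c)
{
	(void)c;
	return false;
}

/*
 * Run the idle loop for at most max_iters iterations. While the tick runs,
 * each iteration lets RCU advance one step; once the tick is stopped,
 * nothing advances it. Returns true if all RCU work completed.
 */
static bool model_idle_loop(needs_cpu_fn needs_cpu, int work, int max_iters)
{
	struct model_cpu c = { .rcu_work = work, .tick_stopped = false };

	for (int i = 0; i < max_iters; i++) {
		if (!c.tick_stopped && !needs_cpu(&c))
			c.tick_stopped = true; /* enter nohz idle */
		if (!c.tick_stopped && c.rcu_work > 0)
			c.rcu_work--;          /* tick drives RCU forward */
	}
	return c.rcu_work == 0;
}
```

With the correct check, `model_idle_loop(needs_cpu_ok, 3, 10)` finishes the work before stopping the tick; with the buggy check the tick stops at once and the work never completes, which mirrors the stuck synchronize_rcu() in the stack trace.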

Kernel version is git head of today.

Any ideas?
--

To: Heiko Carstens <heiko.carstens@...>
Cc: Gautham R Shenoy <ego@...>, Dipankar Sarma <dipankar@...>, Steven Rostedt <srostedt@...>, Ingo Molnar <mingo@...>, Martin Schwidefsky <schwidefsky@...>, <linux-kernel@...>
Date: Saturday, February 9, 2008 - 10:07 am

Does this tree have http://lkml.org/lkml/2008/1/29/208 applied?

If not, could you please check it out?

Thanx, Paul
--

To: Paul E. McKenney <paulmck@...>
Cc: Gautham R Shenoy <ego@...>, Dipankar Sarma <dipankar@...>, Steven Rostedt <srostedt@...>, Ingo Molnar <mingo@...>, Martin Schwidefsky <schwidefsky@...>, <linux-kernel@...>
Date: Saturday, February 9, 2008 - 1:14 pm

It's not applied; however, applying it doesn't change anything. Also, the patch
is tied to the dynticks implementation, which differs from
s390's nohz implementation.
I had to add the patch below so that it would make at least some sense.
But it doesn't fix the problem.

---
arch/s390/kernel/time.c | 2 ++
include/linux/hardirq.h | 2 +-
kernel/rcupreempt.c | 2 +-
3 files changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/rcupreempt.c
===================================================================
--- linux-2.6.orig/kernel/rcupreempt.c
+++ linux-2.6/kernel/rcupreempt.c
@@ -413,7 +413,7 @@ static void __rcu_advance_callbacks(stru
}
}

-#ifdef CONFIG_NO_HZ
+#if defined(CONFIG_NO_HZ) || defined(CONFIG_NO_IDLE_HZ)

DEFINE_PER_CPU(long, dynticks_progress_counter) = 1;
static DEFINE_PER_CPU(long, rcu_dyntick_snapshot);
Index: linux-2.6/arch/s390/kernel/time.c
===================================================================
--- linux-2.6.orig/arch/s390/kernel/time.c
+++ linux-2.6/arch/s390/kernel/time.c
@@ -200,6 +200,7 @@ static void stop_hz_timer(void)
if (timer >= jiffies_timer_cc)
todval = timer;
}
+ rcu_enter_nohz();
set_clock_comparator(todval);
}

@@ -213,6 +214,7 @@ static void start_hz_timer(void)

if (!cpu_isset(smp_processor_id(), nohz_cpu_mask))
return;
+ rcu_exit_nohz();
account_ticks(get_clock());
set_clock_comparator(S390_lowcore.jiffy_timer + CPU_DEVIATION);
cpu_clear(smp_processor_id(), nohz_cpu_mask);
Index: linux-2.6/include/linux/hardirq.h
===================================================================
--- linux-2.6.orig/include/linux/hardirq.h
+++ linux-2.6/include/linux/hardirq.h
@@ -109,7 +109,7 @@ static inline void account_system_vtime(
}
#endif

-#if defined(CONFIG_PREEMPT_RCU) && defined(CONFIG_NO_HZ)
+#if defined(CONFIG_PREEMPT_RCU) && (defined(CONFIG_NO_HZ) || defined(CONFIG_NO_IDLE_HZ))
extern void rcu_irq_enter(void);
extern void rcu_irq_exit(void...
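The patch hooks rcu_enter_nohz() into stop_hz_timer() and rcu_exit_nohz() into start_hz_timer(). As a rough sketch of the bookkeeping those calls maintain, assuming the convention of the dynticks patch (dynticks_progress_counter is even while a CPU is in nohz idle and odd otherwise, starting at 1), here is a single-CPU model in C. The model_* names are hypothetical, and the memory barriers the kernel needs around each increment are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-CPU model (not kernel code).
 * Assumed convention: odd = CPU active, even = CPU in nohz idle.
 * Starts at 1, matching the "= 1" initializer in the patch context.
 */
static long model_dynticks_counter = 1;

static bool model_in_nohz(void)
{
	return (model_dynticks_counter & 0x1) == 0;
}

/* Model of rcu_enter_nohz(): called as the tick is stopped. */
static void model_rcu_enter_nohz(void)
{
	model_dynticks_counter++;
	assert(model_in_nohz()); /* counter must now be even */
}

/* Model of rcu_exit_nohz(): called as the tick is restarted. */
static void model_rcu_exit_nohz(void)
{
	model_dynticks_counter++;
	assert(!model_in_nohz()); /* counter must now be odd */
}
```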

To: Heiko Carstens <heiko.carstens@...>
Cc: Gautham R Shenoy <ego@...>, Dipankar Sarma <dipankar@...>, Steven Rostedt <srostedt@...>, Ingo Molnar <mingo@...>, Martin Schwidefsky <schwidefsky@...>, <linux-kernel@...>
Date: Saturday, February 9, 2008 - 6:02 pm

OK, I was afraid of that. ;-)

Does s390 start out in nohz mode? The reason I ask is that this feels like
an off-by-one error in dynticks_progress_counter.
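The off-by-one suspicion can be sketched under the same assumed parity convention. In this hypothetical model, the grace-period side skips a CPU only if its counter is even (idle) and unchanged since the snapshot. If a CPU stops its tick without ever moving the counter from its initial odd value, RCU keeps waiting for an acknowledgement that the idle CPU never sends. The names and the simplified skip rule are illustrative assumptions, not the rcupreempt.c logic verbatim.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU model; names are illustrative, not kernel symbols. */
struct model_cpu {
	long dynticks;     /* assumed: odd = active, even = nohz idle */
	bool tick_stopped; /* what the architecture's nohz code actually did */
};

/*
 * Grace-period side: may RCU skip waiting for this CPU's acknowledgement?
 * Only if the counter was even (idle) at the snapshot and has not moved
 * since, i.e. the CPU stayed idle and ran no RCU readers.
 */
static bool model_can_skip(long snap, long curr)
{
	return curr == snap && (curr & 0x1) == 0;
}

/*
 * True if the grace period would stall: RCU insists on an ack (cannot
 * skip the CPU), but the CPU has its tick stopped and so never runs the
 * tick-driven code that would send one.
 */
static bool model_would_stall(const struct model_cpu *c)
{
	long snap = c->dynticks; /* snapshot at start of grace period */
	long curr = c->dynticks; /* CPU stays idle, so counter is unchanged */

	return !model_can_skip(snap, curr) && c->tick_stopped;
}
```

A CPU that entered nohz through the counter (value 2) is skipped, while one that stopped its tick with the counter still at 1 stalls the grace period.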

--

To: Paul E. McKenney <paulmck@...>
Cc: Gautham R Shenoy <ego@...>, Dipankar Sarma <dipankar@...>, Steven Rostedt <srostedt@...>, Ingo Molnar <mingo@...>, Martin Schwidefsky <schwidefsky@...>, <linux-kernel@...>
Date: Sunday, February 10, 2008 - 9:01 am

Actually, I forgot to add a few ifdefs to make the code do something :)
That just reveals that we have a conflict between the dynticks implementation
and s390's nohz, which shows up in what rcu_irq_enter/exit assume.
I didn't patch s390 and the common code far enough to make it work, but I think
the patch you mentioned will fix the problem I reported.
So I guess we should either convert s390 to use the generic dynticks
implementation or disable preemptible rcu on s390 until we have converted
our code.
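A rough model of the kind of bookkeeping rcu_irq_enter/exit do under the same assumed parity convention: an interrupt taken from nohz idle moves the counter from even to odd so RCU treats the CPU as active while the handler runs, and the outermost exit moves it back to even. The sketch below uses hypothetical model_* names, omits barriers, and uses a nesting counter in place of the kernel's in_interrupt() checks. It shows why every interrupt taken from idle must be bracketed by these hooks; a nohz path that doesn't follow that pairing leaves the counter in the wrong state.

```c
#include <assert.h>

/* Hypothetical single-CPU model; names are not kernel symbols. */
static long model_dynticks = 1; /* assumed: odd = active, even = nohz idle */
static int model_irq_nesting;   /* > 0 while an idle-interrupting irq runs */

/* Model of rcu_irq_enter(): an irq taken from nohz idle marks the CPU active. */
static void model_rcu_irq_enter(void)
{
	if (model_irq_nesting > 0) {
		model_irq_nesting++;     /* nested irq: just count it */
		return;
	}
	if ((model_dynticks & 0x1) == 0) {
		model_dynticks++;        /* even -> odd: RCU must watch this CPU */
		model_irq_nesting = 1;
	}
}

/* Model of rcu_irq_exit(): the outermost exit returns the CPU to nohz idle. */
static void model_rcu_irq_exit(void)
{
	if (model_irq_nesting == 0)
		return;                  /* irq did not interrupt nohz idle */
	if (--model_irq_nesting > 0)
		return;                  /* still inside an outer irq */
	model_dynticks++;            /* odd -> even: back to idle */
	assert((model_dynticks & 0x1) == 0);
}
```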

Thanks for helping debug this!
--

To: Heiko Carstens <heiko.carstens@...>
Cc: Paul E. McKenney <paulmck@...>, Gautham R Shenoy <ego@...>, Dipankar Sarma <dipankar@...>, Ingo Molnar <mingo@...>, Martin Schwidefsky <schwidefsky@...>, <linux-kernel@...>
Date: Monday, February 11, 2008 - 11:37 am

Heiko, thanks for reporting this.

This patch still didn't make it into -rc1, and it really should, because
without it, PREEMPT_RCU and NO_HZ together are broken on all boxes.

The patch is in Ingo's sched-devel git tree, as
9460545f81ea48b07dbb20456a8ede776d8ebc1b (last I checked) and titled:

rcu: add support for dynamic ticks and preempt rcu

-- Steve
--

To: Heiko Carstens <heiko.carstens@...>
Cc: Gautham R Shenoy <ego@...>, Dipankar Sarma <dipankar@...>, Steven Rostedt <srostedt@...>, Ingo Molnar <mingo@...>, Martin Schwidefsky <schwidefsky@...>, <linux-kernel@...>
Date: Sunday, February 10, 2008 - 1:43 pm

Sounds good to me!!! (Especially converting s390 to the generic algorithm.)

I believe that the generic implementation will do what you need, but

Thank you for tracking it down!

Thanx, Paul
--
