Re: [patch 1/3] kernel: local_irq_{save,restore}_nmi()

Previous thread: [PATCH] perf kmem: Fix breakage introduced by 5a0e3ad slab.h script by Arnaldo Carvalho de Melo on Tuesday, April 6, 2010 - 6:37 am. (2 messages)

Next thread: Re: [PATCH]: REPOST cleanup debug message in init_kstat_irqs() by Prarit Bhargava on Tuesday, April 6, 2010 - 6:48 am. (1 message)
From: Peter Zijlstra
Date: Tuesday, April 6, 2010 - 6:28 am

This patch set tries to solve the local_irq_disable() vs NMI problem
that SPARC has by providing new arch hooks and instrumenting the
existing interface to WARN on conflicting usage.

--

From: Peter Zijlstra
Date: Tuesday, April 6, 2010 - 6:28 am

Provide local_irq_{save,restore}_nmi(), which will help architectures
that, like SPARC64, implement NMIs using IRQ priorities.

SPARC64 uses IRQ priority level 15 for NMIs and implements
local_irq_disable() as masking levels <= 14. However, if you do that while
inside an NMI, you re-enable the NMI priority, causing all kinds of fun.

A more solid implementation would first check the current disable level
and never lower it; however, that is more costly and would slow down the
rest of the kernel for no particular reason.
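To make the trade-off concrete, here is a minimal user-space sketch of the two schemes. It models the SPARC64 interrupt level as a plain global (`pil`) and assumes an interrupt at level L is delivered only when L > pil. The names `naive_irq_save()`, `careful_irq_save()`, `irq_restore()`, `nmi_masked()` and the `LEVEL_*` constants are hypothetical, chosen for illustration, and are not taken from the kernel.

```c
#include <assert.h>

/*
 * Hypothetical user-space model: a global stands in for the SPARC64
 * interrupt level register. An interrupt at level L is delivered only
 * if L > pil, so every level <= pil is masked.
 */
#define LEVEL_NORMAL_MAX 14UL /* highest ordinary IRQ level */
#define LEVEL_NMI        15UL /* level used for NMIs */

static unsigned long pil; /* simulated level, 0 = everything enabled */

static int nmi_masked(void)
{
	return pil >= LEVEL_NMI;
}

/* Cheap scheme: always write 14. Called inside an NMI (pil == 15),
 * this lowers the level and lets a nested NMI in. */
static unsigned long naive_irq_save(void)
{
	unsigned long flags = pil;

	pil = LEVEL_NORMAL_MAX;
	return flags;
}

/* Careful scheme: read the current level first and never lower it. */
static unsigned long careful_irq_save(void)
{
	unsigned long flags = pil;

	if (flags < LEVEL_NORMAL_MAX)
		pil = LEVEL_NORMAL_MAX;
	return flags;
}

static void irq_restore(unsigned long flags)
{
	assert(flags <= LEVEL_NMI); /* sanity check on the saved level */
	pil = flags;
}
```

With pil at 15, naive_irq_save() drops it to 14 and unmasks NMIs, while careful_irq_save() leaves it at 15. The extra read-and-compare is the cost the patch wants to keep off the common path.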

Therefore, introduce local_irq_save_nmi(), which can implement this
slower but more solid scheme, and disallow local_irq_save() from NMI
context.
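A hypothetical user-space sketch of the resulting rule: the plain save reports a warning when called from NMI context, while the _nmi variant is accepted there. `nmi_depth`, `warn_count`, `irqs_disabled` and `SKETCH_WARN_ON()` are stand-ins for the kernel's NMI state and WARN_ON(), not real kernel symbols, and unlike the real macros these functions take `flags` by pointer rather than by name.

```c
#include <assert.h>
#include <stdio.h>

static int nmi_depth;     /* > 0 while "inside" an NMI handler */
static int warn_count;    /* conflicting uses reported so far */
static int irqs_disabled; /* simulated IRQ-disable state */

/* Stand-in for WARN_ON(): count and report, but keep going. */
#define SKETCH_WARN_ON(cond)                                    \
	do {                                                    \
		if (cond) {                                     \
			warn_count++;                           \
			fprintf(stderr, "WARNING: %s:%d: %s\n", \
				__FILE__, __LINE__, #cond);     \
		}                                               \
	} while (0)

static int in_nmi(void)
{
	return nmi_depth > 0;
}

/* Fast variant: only legal outside NMI context. */
static void sketch_irq_save(unsigned long *flags)
{
	SKETCH_WARN_ON(in_nmi());
	*flags = (unsigned long)irqs_disabled;
	irqs_disabled = 1;
}

/* NMI-safe variant: no warning. On SPARC64 this is where the slower
 * "never lower the level" logic would live. */
static void sketch_irq_save_nmi(unsigned long *flags)
{
	*flags = (unsigned long)irqs_disabled;
	irqs_disabled = 1;
}

static void sketch_irq_restore(unsigned long flags)
{
	assert(flags <= 1); /* only 0 or 1 are valid in this model */
	irqs_disabled = (int)flags;
}
```

The warning only fires for the fast variant in NMI context; the same call made through the _nmi variant is silent.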

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/irqflags.h     |   51 ++++++++++++++++++++++++++++++++++++++++---
 kernel/lockdep.c             |    7 +++++
 kernel/trace/trace_irqsoff.c |    8 ++++++
 3 files changed, 63 insertions(+), 3 deletions(-)

Index: linux-2.6/include/linux/irqflags.h
===================================================================
--- linux-2.6.orig/include/linux/irqflags.h
+++ linux-2.6/include/linux/irqflags.h
@@ -18,6 +18,7 @@
   extern void trace_softirqs_off(unsigned long ip);
   extern void trace_hardirqs_on(void);
   extern void trace_hardirqs_off(void);
+  extern void trace_hardirqs_off_no_nmi(void);
 # define trace_hardirq_context(p)	((p)->hardirq_context)
 # define trace_softirq_context(p)	((p)->softirq_context)
 # define trace_hardirqs_enabled(p)	((p)->hardirqs_enabled)
@@ -30,6 +31,7 @@
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
+# define trace_hardirqs_off_no_nmi()	do { } while (0)
 # define trace_softirqs_on(ip)		do { } while (0)
 # define trace_softirqs_off(ip)		do { } while (0)
 # define trace_hardirq_context(p)	0
@@ -59,15 +61,15 @@
 #define local_irq_enable() \
 	do { trace_hardirqs_on(); raw_local_irq_enable(); } while (0)
 #define ...

From: Steven Rostedt
Date: Tuesday, April 6, 2010 - 6:13 pm

Should we do this for all archs? I can imagine a lot of warning reports
coming in the near future, and they will all be sent my way.

--

From: David Miller
Date: Tuesday, April 6, 2010 - 6:19 pm

From: Steven Rostedt <rostedt@goodmis.org>

That's the whole point, so that the problem is more easily noticed and
gets fixed long before I end up accidentally testing the code on my
machines :-)

To be honest, the fix is trivial: you just need to add '_nmi' to
the local_irq_{save,restore}() calls that warn like this.

I'm even willing to have you forward all of those reports to me and
I'll be responsible for fixing them.

How's that? :-)
--

From: Steven Rostedt
Date: Tuesday, April 6, 2010 - 6:23 pm

Sure.

/me sets up his procmailrc to search for the WARN_ON line in
trace_irqsoff.c and have it forward to davem.

-- Steve


--

From: Peter Zijlstra
Date: Tuesday, April 6, 2010 - 6:28 am

Since cpu_clock() can be called from NMI context, fix up its IRQ
disabling to conform to the new rules.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched_clock.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/sched_clock.c
===================================================================
--- linux-2.6.orig/kernel/sched_clock.c
+++ linux-2.6/kernel/sched_clock.c
@@ -241,9 +241,9 @@ unsigned long long cpu_clock(int cpu)
 	unsigned long long clock;
 	unsigned long flags;
 
-	local_irq_save(flags);
+	local_irq_save_nmi(flags);
 	clock = sched_clock_cpu(cpu);
-	local_irq_restore(flags);
+	local_irq_restore_nmi(flags);
 
 	return clock;
 }


--

From: Frederic Weisbecker
Date: Wednesday, April 7, 2010 - 4:27 am

That seems to add a small overhead in various places.
Do we want to make local_irq_save_nmi == local_irq_save
for archs that have native NMIs?

Or cpu_clock_nmi()?

--

From: Peter Zijlstra
Date: Wednesday, April 7, 2010 - 4:31 am

That is already so, see 1/3.
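The diff for 1/3 is truncated above, so its exact definitions are not shown here. As an illustration only, under the assumption that it follows a familiar kernel pattern, an `#ifndef` fallback gives this behaviour: an architecture that needs special NMI handling defines its own variant, and every other architecture gets the plain save/restore. All names below (`ARCH_HAS_IRQ_SAVE_NMI`, `sketch_local_irq_save_nmi()`, the `stub_*` helpers) are hypothetical.

```c
#include <assert.h>

/* User-space stubs standing in for the arch's raw IRQ primitives. */
static unsigned long fake_irq_state; /* 1 = IRQs disabled */
static int save_calls;               /* counts calls to the plain save */

static unsigned long stub_irq_save(void)
{
	unsigned long flags = fake_irq_state;

	save_calls++;
	fake_irq_state = 1;
	return flags;
}

static void stub_irq_restore(unsigned long flags)
{
	assert(flags <= 1); /* only 0 or 1 are valid in this model */
	fake_irq_state = flags;
}

/*
 * An arch with priority-based NMIs (e.g. SPARC64) would define
 * ARCH_HAS_IRQ_SAVE_NMI and supply its own, more careful versions.
 * Archs whose NMIs are truly non-maskable fall through to the plain
 * primitives, so the _nmi variants add no overhead there.
 */
#ifndef ARCH_HAS_IRQ_SAVE_NMI
# define sketch_local_irq_save_nmi(flags) \
	do { (flags) = stub_irq_save(); } while (0)
# define sketch_local_irq_restore_nmi(flags) stub_irq_restore(flags)
#endif
```

Because the fallback is resolved by the preprocessor, a call site converted to the _nmi variant compiles to exactly the plain save/restore on such architectures.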


--

From: Frederic Weisbecker
Date: Wednesday, April 7, 2010 - 4:44 am

Ah you're right.

--

From: Peter Zijlstra
Date: Tuesday, April 6, 2010 - 6:28 am

Patch 8bb39f9 (perf: Fix 'perf sched record' deadlock) introduced a
local_irq_save() in NMI context. Convert that to local_irq_save_nmi()
and move the IRQ disable into perf_output_lock/unlock().

The former is needed because we now disallow local_irq_disable() from
NMI context due to some arch limitations.

The latter is needed because it's really about IRQ lock inversion with
that funny output lock, and perf_event_task_output() is only one site
that could trigger it.
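A minimal user-space sketch of the second change: the output lock itself saves the IRQ state into a field of the handle, and the unlock path restores it, so every user of the lock is covered rather than only perf_event_task_output(). The struct and function names are hypothetical simplifications of perf_output_handle and perf_output_lock()/unlock(), and a simulated IRQ flag replaces local_irq_save_nmi().

```c
#include <assert.h>

static unsigned long sim_irqs_off; /* simulated IRQ state, 1 = disabled */

struct sketch_output_handle {
	unsigned long flags; /* saved IRQ state, like the new field in the patch */
	int locked;
};

static void sketch_irq_save(unsigned long *flags)
{
	*flags = sim_irqs_off;
	sim_irqs_off = 1;
}

static void sketch_irq_restore(unsigned long flags)
{
	sim_irqs_off = flags;
}

static void sketch_output_lock(struct sketch_output_handle *handle)
{
	/* Disable IRQs first: the lock must be IRQ-safe for every caller. */
	sketch_irq_save(&handle->flags);
	handle->locked = 1;
}

static void sketch_output_unlock(struct sketch_output_handle *handle)
{
	assert(handle->locked);
	handle->locked = 0;
	/* Restore whatever IRQ state the caller of the lock had. */
	sketch_irq_restore(handle->flags);
}
```

Keeping the saved flags in the handle lets lock and unlock live in different functions without threading a separate `flags` variable through every caller.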

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
---
 include/linux/perf_event.h |    1 +
 kernel/perf_event.c        |   17 ++++++-----------
 2 files changed, 7 insertions(+), 11 deletions(-)

Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -758,6 +758,7 @@ struct perf_output_handle {
 	struct perf_mmap_data		*data;
 	unsigned long			head;
 	unsigned long			offset;
+	unsigned long			flags;
 	int				nmi;
 	int				sample;
 	int				locked;
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -2848,6 +2848,10 @@ static void perf_output_lock(struct perf
 	struct perf_mmap_data *data = handle->data;
 	int cur, cpu = get_cpu();
 
+	/*
+	 * Since this is a lock we need to be IRQ-safe
+	 */
+	local_irq_save_nmi(handle->flags);
 	handle->locked = 0;
 
 	for (;;) {
@@ -2906,6 +2910,7 @@ again:
 	if (atomic_xchg(&data->wakeup, 0))
 		perf_output_wakeup(handle);
 out:
+	local_irq_restore_nmi(handle->flags);
 	put_cpu();
 }
 
@@ -3385,19 +3390,10 @@ static void perf_event_task_output(struc
 	unsigned long flags;
 	int size, ret;
 
-	/*
-	 * If this CPU attempts to acquire an rq lock held by a CPU spinning
-	 * in perf_output_lock() from interrupt context, it's game over.
-	 ...

From: David Miller
Date: Tuesday, April 6, 2010 - 10:54 am

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Thanks Peter, looks good at first sight.

I'll toss together the sparc bits and give this a go.
--

From: David Miller
Date: Tuesday, April 6, 2010 - 4:39 pm

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Ok, two patches coming.

One which adds the sparc64 irqflags.h methods.

And one which annotates the ftrace functions, as needed.

With this I have the function tracer working.

Thanks!
--
