Re: [patch 1/2] genirq: Run irq handlers with interrupts disabled

Previous thread: [patch] acpi: map pxms to low node ids by David Rientjes on Thursday, March 25, 2010 - 4:33 pm. (2 messages)

Next thread: linux-next: manual merge of the pci tree with the pci-current tree by Stephen Rothwell on Thursday, March 25, 2010 - 5:57 pm. (5 messages)
From: Thomas Gleixner
Date: Thursday, March 25, 2010 - 5:06 pm

The following patch series removes the IRQF_DISABLED functionality
from the core interrupt code and runs all interrupt handlers with
interrupts disabled.

IRQF_DISABLED is kept as a define and scheduled for feature removal.

I booted and stressed that patches w/o any obvious fallout on more
than 20 different systems in my arsenal of test machines which
includes various embedded non x86 systems.

To debug eventual latency issues we (admittedly I talked acme into
looking at that) want to extend perf with a top like tool to monitor
the maximum runtime of interrupt handlers with the already existing
tracepoints.

Thanks,

	tglx

--

From: Thomas Gleixner
Date: Thursday, March 25, 2010 - 5:06 pm

Running interrupt handlers with interrupts enabled can cause stack
overflows. That has been observed with multiqueue NICs delivering all
their interrupts to a single core. We might band aid that somehow by
checking the interrupt stacks, but the real safe fix is to run the irq
handlers with interrupts disabled.

Drivers for whacky hardware still can reenable them in the handler
itself, if the need arises. (They do already due to lockdep)

The risk of doing this is rather low:

 - lockdep already enforces this
 - CONFIG_NOHZ has shaken out the drivers which relied on jiffies updates
 - time keeping is not longer sensitive to the timer interrupt being delayed

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/irq/handle.c |    3 ---
 1 file changed, 3 deletions(-)

Index: linux-2.6/kernel/irq/handle.c
===================================================================
--- linux-2.6.orig/kernel/irq/handle.c
+++ linux-2.6/kernel/irq/handle.c
@@ -370,9 +370,6 @@ irqreturn_t handle_IRQ_event(unsigned in
 	irqreturn_t ret, retval = IRQ_NONE;
 	unsigned int status = 0;
 
-	if (!(action->flags & IRQF_DISABLED))
-		local_irq_enable_in_hardirq();
-
 	do {
 		trace_irq_handler_entry(irq, action);
 		ret = action->handler(irq, action->dev_id);


--

From: Andi Kleen
Date: Thursday, March 25, 2010 - 11:13 pm

Can you please explain that lockdep reference?
I don't think lockdep really forces on interrupts, does it?

BTW the one problem I have with this patchkit is that it's clearly
no stable candidate and I was hoping for a stable fix too.
Any chance to at least approve my original patch for .32/.33 only?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
--

From: Thomas Gleixner
Date: Friday, March 26, 2010 - 6:05 am

Lockdep forces interrupts off. It yells at anyone enabling irqs in the
handler. The ones which do that have been annotated with

Why not simply force IRQF_DISABLED for all MSI interrupts. That still
allows nesting for non MSI ones, but it limits the chance of throwing
up reasonably well. That's a two liner.

Can you please test whether it resolves the issue at hand ?

Thanks,

	tglx
---
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index eb6078c..1d55e92 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -735,6 +735,9 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 		if (new->flags & IRQF_ONESHOT)
 			desc->status |= IRQ_ONESHOT;
 
+		if (desc->msi_desc)
+			new->flags |= IRQF_DISABLED;
+
 		if (!(desc->status & IRQ_NOAUTOEN)) {
 			desc->depth = 0;
 			desc->status &= ~IRQ_DISABLED;






--

From: Andi Kleen
Date: Monday, March 29, 2010 - 10:33 pm

Sorry for the late answer. Got confirmation that this patch
fixes the test case. Thanks.

-Andi
--

From: Thomas Gleixner
Date: Wednesday, March 31, 2010 - 4:16 am

Ok, I'll push it linus wards and cc stable. I think thats the least
intrusive safe bet we can have right now.

Thanks,

	tglx
--

From: Pavel Machek
Date: Friday, April 2, 2010 - 2:31 am

stable? I'd say thats way too intrusive for -stable...

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Thomas Gleixner
Date: Friday, April 2, 2010 - 1:42 pm

So we better let the possible stack overruns unaddressed ?

Thanks,

	tglx
--

From: Pavel Machek
Date: Friday, April 2, 2010 - 2:09 pm

-stable should have no regressions, first and foremost. And this is
pretty certain to introduce some, at least on low-powered system with
serial ports.

So yes, it is probably better to let the possible stack overruns
unaddressed. We have lived with them for 15 years or so...

(Alternatively, just make the irq stacks bigger? Or just take Andi's
patch, which solves the overruns, and only introduces latency
regressions when it would otherwise crash?)

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Peter Zijlstra
Date: Friday, April 2, 2010 - 2:31 pm

You've got serial ports with MSI interrupts?
--

From: Thomas Gleixner
Date: Friday, April 2, 2010 - 3:51 pm

Pavel,


I think you misunderstood what I'm going to push. The patch merily
forces IRQF_DISABLED for MSI(X) based interrupts. So that does not
affect low powered systems in any way.

It only affects high end systems where Dave Miller already said he did
the IRQF_DISABLED magic already in some NIC drivers just to prevent
that.

So I think your fear of regressions for low-powered systems is
completely unsubstantiated.

Thanks,

	tglx

 
--

From: tip-bot for Ingo Molnar
Date: Tuesday, April 13, 2010 - 12:33 pm

Commit-ID:  e58aa3d2d0cc01ad8d6f7f640a0670433f794922
Gitweb:     http://git.kernel.org/tip/e58aa3d2d0cc01ad8d6f7f640a0670433f794922
Author:     Ingo Molnar <mingo@elte.hu>
AuthorDate: Fri, 26 Mar 2010 00:06:51 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 13 Apr 2010 16:36:40 +0200

genirq: Run irq handlers with interrupts disabled

Running interrupt handlers with interrupts enabled can cause stack
overflows. That has been observed with multiqueue NICs delivering all
their interrupts to a single core. We might band aid that somehow by
checking the interrupt stacks, but the real safe fix is to run the irq
handlers with interrupts disabled.

Drivers for whacky hardware still can reenable them in the handler
itself, if the need arises. (They do already due to lockdep)

The risk of doing this is rather low:

 - lockdep already enforces this
 - CONFIG_NOHZ has shaken out the drivers which relied on jiffies updates
 - time keeping is not longer sensitive to the timer interrupt being delayed

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@osdl.org>
LKML-Reference: <20100326000405.758579387@linutronix.de>

---
 kernel/irq/handle.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 76d5a67..27e5c69 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -370,9 +370,6 @@ irqreturn_t handle_IRQ_event(unsigned int irq, struct irqaction *action)
 	irqreturn_t ret, retval = IRQ_NONE;
 	unsigned int status = 0;
 
-	if (!(action->flags & IRQF_DISABLED))
-		local_irq_enable_in_hardirq();
-
 	do {
 		trace_irq_handler_entry(irq, action);
 		ret = ...
From: Peter Zijlstra
Date: Thursday, April 15, 2010 - 12:35 am

Clearly I don't mind this, but shouldn't we at least first convert the
known problematic drivers to an alternative strategy?




--

From: Venkatesh Pallipadi
Date: Tuesday, May 25, 2010 - 1:32 pm

Just noticed a minor side effect of this patch on /proc/stat.

As int handlers run with ints disabled, timer interrupt based
samples will no longer account for any irq time. As a result, irq
time in /proc/stat, top, etc will be mostly zero (unless some handler
is enabling ints).

This is not a problem on s390, powerpc, ia64 when VIRT_CPU_ACCOUNTING is
enabled.

May be we should gracefully call it out as zero, with patch like below?

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
---
 Documentation/filesystems/proc.txt |    2 +-
 fs/proc/stat.c                     |    4 ++++
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 6507d2a..8f5254a 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1123,7 +1123,7 @@ second).  The meanings of the columns are as follows, from left to right:
 - system: processes executing in kernel mode
 - idle: twiddling thumbs
 - iowait: waiting for I/O to complete
-- irq: servicing interrupts
+- irq: servicing interrupts (valid only with VIRT_CPU_ACCOUNTING, 0 otherwise)
 - softirq: servicing softirqs
 - steal: involuntary wait
 - guest: running a normal guest
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index bf31b03..d92bea8 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -46,7 +46,9 @@ static int show_stat(struct seq_file *p, void *v)
 		idle = cputime64_add(idle, kstat_cpu(i).cpustat.idle);
 		idle = cputime64_add(idle, arch_idle_time(i));
 		iowait = cputime64_add(iowait, kstat_cpu(i).cpustat.iowait);
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING
 		irq = cputime64_add(irq, kstat_cpu(i).cpustat.irq);
+#endif
 		softirq = cputime64_add(softirq, kstat_cpu(i).cpustat.softirq);
 		steal = cputime64_add(steal, kstat_cpu(i).cpustat.steal);
 		guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
@@ -87,7 +89,9 @@ static int show_stat(struct seq_file *p, void *v)
 		idle = kstat_cpu(i).cpustat.idle;
 ...
From: Thomas Gleixner
Date: Thursday, March 25, 2010 - 5:06 pm

Remove all code which is related to IRQF_DISABLED from the core kernel
code. IRQF_DISABLED still exists as a flag, but becomes a NOOP and
will be removed after a grace period. That way we can easily revert to
the previous behaviour by just restoring the core code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/feature-removal-schedule.txt |    7 +++++++
 include/linux/interrupt.h                  |    3 ++-
 kernel/irq/manage.c                        |   20 --------------------
 3 files changed, 9 insertions(+), 21 deletions(-)

Index: linux-2.6/Documentation/feature-removal-schedule.txt
===================================================================
--- linux-2.6.orig/Documentation/feature-removal-schedule.txt
+++ linux-2.6/Documentation/feature-removal-schedule.txt
@@ -589,3 +589,10 @@ Why:	Useful in 2003, implementation is a
 	Generally invoked by accident today.
 	Seen as doing more harm than good.
 Who:	Len Brown <len.brown@intel.com>
+
+----------------------------
+
+What:	IRQF_DISABLED
+When:	2.6.36
+Why:	The flag is a NOOP as we run interrupt handlers with interrupts disabled
+Who:	Thomas Gleixner <tglx@linutronix.de>
Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -39,7 +39,8 @@
  * These flags used only by the kernel as part of the
  * irq handling routines.
  *
- * IRQF_DISABLED - keep irqs disabled when calling the action handler
+ * IRQF_DISABLED - keep irqs disabled when calling the action handler.
+ *                 DEPRECATED. This flag is a NOOP and scheduled to be removed
  * IRQF_SAMPLE_RANDOM - irq is used to feed the random generator
  * IRQF_SHARED - allow sharing the irq among several devices
  * IRQF_PROBE_SHARED - set by callers when they expect sharing mismatches to occur
Index: ...
From: Andi Kleen
Date: Thursday, March 25, 2010 - 11:20 pm

Perhaps I'm dense but it's not fully clear to me why is suddenly safe to use 
the behaviour of this flags on shared interrupts when it wasn't before?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
--

From: Thomas Gleixner
Date: Friday, March 26, 2010 - 4:19 am

The shared handlers cannot guarantee to run one with irqs enabled and
the other with irqs disabled. That's all. There is absolutely no
reason why we would need interrupts enabled to process the shared
handler chain.

Thanks,

	tglx
--

From: tip-bot for Thomas Gleixner
Date: Tuesday, April 13, 2010 - 12:34 pm

Commit-ID:  6932bf37bed45ce8ed531928b1b0f98162fe6df6
Gitweb:     http://git.kernel.org/tip/6932bf37bed45ce8ed531928b1b0f98162fe6df6
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 26 Mar 2010 00:06:55 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 13 Apr 2010 16:36:40 +0200

genirq: Remove IRQF_DISABLED from core code

Remove all code which is related to IRQF_DISABLED from the core kernel
code. IRQF_DISABLED still exists as a flag, but becomes a NOOP and
will be removed after a grace period. That way we can easily revert to
the previous behaviour by just restoring the core code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@osdl.org>
LKML-Reference: <20100326000405.991244690@linutronix.de>
---
 Documentation/feature-removal-schedule.txt |    7 ++++++
 include/linux/interrupt.h                  |    3 +-
 kernel/irq/manage.c                        |   30 ----------------------------
 3 files changed, 9 insertions(+), 31 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index ed511af..9c31c2e 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -589,3 +589,10 @@ Why:	Useful in 2003, implementation is a hack.
 	Generally invoked by accident today.
 	Seen as doing more harm than good.
 Who:	Len Brown <len.brown@intel.com>
+
+----------------------------
+
+What:	IRQF_DISABLED
+When:	2.6.36
+Why:	The flag is a NOOP as we run interrupt handlers with interrupts disabled
+Who:	Thomas Gleixner <tglx@linutronix.de>
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index d7e7a76..e6d2f44 100644
--- ...
From: David Miller
Date: Thursday, March 25, 2010 - 8:34 pm

From: Thomas Gleixner <tglx@linutronix.de>

Acked-by: David S. Miller <davem@davemloft.net>
--

From: Russell King
Date: Friday, March 26, 2010 - 1:14 am

As was covered in previous discussions, what about drivers such as
SMC91x which take a long time to retrieve packets from the hardware?
Always running handlers with IRQs disabled will kill things such as
serial on these platforms.

So based on your description, I have no option but to NAK this.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
--

From: Ingo Molnar
Date: Friday, March 26, 2010 - 2:20 am

As long as it's rare (which it is) i dont see a problem: you can enable 
interrupts in the handler by using local_irq_enable(), like the IDE PIO 
drivers do. That way it's documented a bit better as well, because it shows 
the precise source of the latency, with a big comment explaining it, etc.

Thanks,

	Ingo
--

From: Peter Zijlstra
Date: Friday, March 26, 2010 - 2:28 am

Or alternatively, use threaded interrupts for such slow hardware.

--

From: Jamie Lokier
Date: Friday, March 26, 2010 - 5:02 am

What is the latency of threaded interrupts these days, compared with
non-threaded interrupts?

Slow hardware is quite sensitive to increases in latency.  Obviously
not a problem for the sources of latency: it's a problem for the irq
which is _sensitive_ to latency caused by the other one.  That is
typically a serial port or something.

But the benefit of kernel-settable interrupt priorities (i.e. due to
the threads) may be worth it even for serial ports.  I would love to
see some measurements comparing with and without.

-- Jamie
--

From: Alan Cox
Date: Friday, March 26, 2010 - 2:59 am

I don't think it's as rare as you think particularly in embedded, and the
moment you start explicitly using local_irq_enable() you've simply moved
the underlying problem back and made it far harder to grep for.

Alan
--

From: Peter Zijlstra
Date: Friday, March 26, 2010 - 3:08 am

We've got local_irq_enable_in_hardirq() which should be used and can
easily be grep'ed for.

But yes, I would much prefer to simply convert these known slow handlers
to threaded interrupts.

--

From: Andi Kleen
Date: Friday, March 26, 2010 - 3:12 am

That could be potentially hundreds in old obscure drivers all over the tree.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
--

From: Ingo Molnar
Date: Friday, March 26, 2010 - 3:53 am

Yeah, agreed. So there's multiple solutions:

 - On old hw with a driver that nobody is willing to convert to threaded IRQs: 
   use the existing local_irq_enable_in_hardirq() API. This preserves the 
   status quo.

 - On new hw with new drivers where there's such a level of IRQ parallelism 
   that enabling IRQs in hardirqs is not an option, use threaded IRQ handlers.

Thanks,

	Ingo
--

From: Nicolas Pitre
Date: Friday, March 26, 2010 - 5:00 am

Can't do that.  The smc91x has a very small internal buffer which has to 
be emptied using PIO.  Threaded interrupts simply have too high 
latencies for overruns not to occur.  That's why the RX path is entirely 
done in hardirq context while the TX path is done in softirq context.


Nicolas
--

From: Jamie Lokier
Date: Friday, March 26, 2010 - 5:06 am

Although I wouldn't be surprised to find threaded interrupts are too
slow on certain hardware, is that _fundamental_ to threaded
interrupts, or is it just that our implementation doesn't have the
funky hot path straight direct from hardirq -> running high priority
RT irq thread when it exceeds previously running priority?

In other words, can we swizzle threaded irqs into something more
resembling software-implemented hard irq priorities, while cunningly
updating the kernel state just enough to look like it's a thread?

-- Jamie
--

Previous thread: [patch] acpi: map pxms to low node ids by David Rientjes on Thursday, March 25, 2010 - 4:33 pm. (2 messages)

Next thread: linux-next: manual merge of the pci tree with the pci-current tree by Stephen Rothwell on Thursday, March 25, 2010 - 5:57 pm. (5 messages)