This patch series implements support for threaded irq handlers for the generic IRQ layer. Threaded interrupt handlers are not only interesting in the preempt-rt context. Threaded interrupt handlers can help to address common problems in the interrupt handling code: - move long running handlers out of the hard interrupt context - avoid complex hardirq -> tasklet/softirq interaction and locking problems by integration of this functionality into the threaded handler code - improved debugability of the kernel: faulty handlers do not take down the system. - allows prioritizing of the handlers which share an interrupt line The implementation provides an opt-in mechanism to convert drivers to the threaded interrupt handler model contrary to the preempt-rt patch where the threaded handlers are enabled by a brute force switch. The brute force switch is suboptimal as it does not change the interrupt handler -> tasklet/softirq interaction problems, but was the only way which was possible for the limited man power of the preempt-rt developers. Converting an interrupt to threaded makes only sense when the handler code takes advantage of it by integrating tasklet/softirq functionality and simplifying the locking. When a driver wants to use threaded interrupt handlers it needs to provide a quick check handler function, which checks whether the interrupt was originated from the device or not. In case it was originated from the device the quick check handler must disable the interrupt at the device level and return IRQ_WAKE_THREAD. The generic interrupt handling core then sets the IRQF_RUNTHREAD in the irqaction of the handler and wakes the associated thread. The irqaction is referenced in the threads task_struct so handlers can check for newly arrived interrupts in action flags. Aside of that the reference is used to prevent further referencing of the thread in the interrupt code in the case of segfault to make sure that the system (minus the now dead ...
On Wed, 01 Oct 2008 23:02:08 -0000 That would be nice ;) I'm a bit surprised to see that there is no facility for per-cpu interrupt threads? --
On Wed, 1 Oct 2008 16:23:33 -0700 per handler is the right approach (that way, if one dies, all other interrupts will likely keep working) now.. normally an interrupt only goes to one cpu, so effectively it is per cpu already anyway we should however make the irq threads follow the affinity masks of the irq... that'd be an easy add-on and probably worthwhile. -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
On Wed, 1 Oct 2008 16:29:50 -0700 Yes, if a) the thread was asleep when it was woken up and b) if the scheduler does the right thing and wakes the thread on the CPU which called wake_up(). The ongoing sagas of tbench/mysql/volanomark regressions make me think that any behaviour which we "expect" of the scheduler should be --
Yup. I missed that detail when I dusted off the moldy patches. Of course we need to pin the thread to the affinity mask of the hardware interrupt. /me goes back to do home work :) Thanks, tglx --
Sven and I started poking at the various USB host drivers on the flight back from Plumbers. I'll see if I can convert over a few here on the systems that I have and send over some patches. Jon. --
There's only 3, how hard could it be? :) /me runs away quickly --
Hi there, [CC'ed some distribution specific experts] Heh! That feature is the only thing, that keeps my crappy combination of hardware and Ubuntu Hardy together :-) With the non-RT Ubuntu kernel flavors, it just hangs in that situation. It looks like this at the moment with a Ubuntu RT-Kernel, in case someone likes to debug that: [ 0.000000] Initializing cgroup subsys cpuset [ 0.000000] Linux version 2.6.24-21-rt (buildd@palmer) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #1 SMP PREEMPT RT Mon Aug 25 19:24:40 UTC 2008 (Ubuntu 2.6.24-4.6-generic) [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f800 (usable) [ 0.000000] BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved) [ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) [ 0.000000] BIOS-e820: 0000000000100000 - 00000000cfff0000 (usable) [ 0.000000] BIOS-e820: 00000000cfff0000 - 00000000cfff3000 (ACPI NVS) [ 0.000000] BIOS-e820: 00000000cfff3000 - 00000000d0000000 (ACPI data) [ 0.000000] BIOS-e820: 00000000d0000000 - 00000000e0000000 (reserved) [ 0.000000] BIOS-e820: 00000000e8000000 - 00000000f0000000 (reserved) [ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved) [ 0.000000] BIOS-e820: 0000000100000000 - 0000000130000000 (usable) [ 0.000000] Warning only 4GB will be used. [ 0.000000] Use a HIGHMEM64G enabled kernel. [ 0.000000] 3200MB HIGHMEM available. [ 0.000000] 896MB LOWMEM available. [ 0.000000] found SMP MP-table at 000f4b40 [ 0.000000] Entering add_active_range(0, 0, 1048576) 0 entries of 256 used [ 0.000000] Zone PFN ranges: [ 0.000000] DMA 0 -> 4096 [ 0.000000] Normal 4096 -> 229376 [ 0.000000] HighMem 229376 -> 1048576 [ 0.000000] Movable zone start PFN for each node [ 0.000000] early_node_map[1] active PFN ranges [ 0.000000] 0: 0 -> 1048576 [ 0.000000] On node 0 totalpages: ...
I'm not clear on your direction here.. I don't have a problem with a mass driver audit, which I think is what your suggesting with this patch set .. However, a mass audit like that would push a fully real time system out for quite some time.. I also don't see a clear connection between these changes and ultimately removing spinlock level latency in the kernel. I realize you don't address that in your comments, but this is part of the initiative to remove spinlock level latency.. So with this set of changes and in terms of real time, I'm wonder your going with this ? Daniel --
This helps with latencies and locking. With the current scheme of hardirq, softirq/tasklets, there are a lot of craziness with spin_locks and spin_lock_irqs and mutexes. By creating an interrupt thread, we can skip the softirq/tasklet altogether, and this simplifies locking. There are other cases where threaded interrupt handlers also improve performance. But we will get to those in due time. -- Steve --
Clearly threading irq handlers does have something to do with real time, It's all connected to the removal of latency .. One part depending on How does this simplify locking ? Daniel --
Clearly you have neither clue about real time nor about operating systems in general. Solaris, some BSDs and MacOSX use interrupt threads. Where exactly is the relation to realtime? The concept of interrupt threads is nothing which is in any way related to real time. It is a well known and pretty old concept in operating system design. The fact that real time operating systems benefit from interrupt threads is a totally different topic. tglx --
Here we go again Thomas.. You think you can have a conversation without The very fact that you mention it in your release notes .. You mention The fact that a direct relationship exists means that any threaded interrupt system needs to take into account the inevitable connection to real time since it will be used in that system as a core component.. If you can't effectively achieve real time with your system , than that's a problem that needs to be addressed. Daniel --
Daniel, what kind of logic is this? I was already accused of being on crack today (but was just too much coffee). Perhaps you might be the one that's on crack. I build a pipe. There exists a relationship between a pipe and crap running through it from my toilet. Does this mean that every time I need a pipe, that I need to take into account the inevitable connection to crap to run through it? God, I can see the problems with my gas lines. -- Steve --
Well, that's clearly wrong: threaded IRQ handlers are not tied to real-time in any way. Yes, they can be used for RT too but as far as the upstream kernel is involved that's at most an afterthought. and the "unless this patch isn't actually threading anything" bit does not parse at all. The patches execute hard-IRQ handlers from special what Thomas said was a strong but fair reaction to your obviously incorrect statement. What reaction did you expect? Ingo --
You contradict yourself .. I said "Clearly threading irq handlers does have something to do with real time" then you say "they can be used for RT too" .. So my comments are clearly correct , they have "something" to do with real time. There exists a relationship of some kind or type. We need less insults on this list not more.. Maybe I could understand his reaction had I insulted him, but I didn't.. "Fair reaction" doesn't fit in this case .. Daniel --
What Ingo is telling you is: - RT needs threaded interrupts. - Threaded interrupts do not need RT My dog is an Italian Greyhound. Italian Greyhound is a dog, but a dog is not an Italian Greyhound. -- Steve --
My comments are basically bidirectional , so what your saying doesn't make any sense .. I said basically, that dogs and "Italian Greyhounds" have _some_ connection .. Why are we even debating this. Daniel --
Why are you bringing up real time in this thread?? The thread has absolutely nothing to do with real time. This thread is about a better Again, this thread has nothing to do with removing spinlock level latency. You brought in this relationship with real time, just because real time uses threaded interrupts. This thread has nothing to do with real time. That is what Ingo, Thomas and myself are trying to ge through to you. The strong reaction from Thomas is that you just brought up something that is completely off topic. Basically, drop the real time topic from this thread. It's not related. Yes real time addresses threaded interrupts, but just because we are talking about threaded interrupts does not mean we are talking about real time. Actually my analogy with dogs and IG's is wrong. The pipe and crap analogy is better. Crap uses pipes to get out of the house. But if I'm talking about pipes, I don't want to hear about crap. We are debating this because YOU brought it up! -- Steve --
I'm concerned about the connection between the two, which is what I'm If they are connected (which I think we established) , then it's not out of line for me to discuss the direction of these changes as related to You know Steven, often times you start a conversation and you have no idea where it will end up.. You can't always control which direction it We already debated this fact Steven. real time and this type of threading are connected. It's not off topic to discuss connected components. If the intent here is to totally disconnect these threading patches from any type of real time in the future, then that's a good answer to my original question .. That these changes have no future what so ever in regards to real time. If they will be used in the future for real time then we should discuss I don't see why you are so concerned with this.. Real time is taboo now? Daniel --
Well, please take that up separately. Do you see these patches going into the -rt tree? No, they are going in mainline. We will deal with You are bringing up concerns about mainline changes with something that is maintained outside the mainline tree. Changes to mainline have never Yes Daniel, I know. But this is not a conversation. This is a email thread that is talking about changes to mainline. The mainline kernel developers really don't care about any issues that these changes will do to the real time project. The real time project is a niche, and is currently outside the mainline tree. Hence, lets stop bothering mainline No Daniel, it is off topic. The thread is not about real time issues. This thread is about mainline. If you have an issue that these changes will make to the current mainline tree, then please, by all means, bring No the intent here is to handle mainline issues. The real time issues you consistantly bring up are not important to most kernel developers. If you have real time issues with this change, bring that up on a real time forum. Not in this thread. The changes in this thread are dealing with mainline interrupt handlers. There have been several kernel device driver writers who asked us to get interrupt threads in mainline. This was not about real time, this was about helping out mainline kernel Not at all, Daniel, but this thread is not the appropriate place to discuss your real time concerns. You are asking about what this patch has to do with the future real time direction. Who on this thread cares? (besides you) The topic for mainline patch threads needs to stay focused on mainline. Not on out of tree branches. When the rest of real time is in mainline, then sure, you can discuss those concerns then. But until that happens, keep to the topic of the thread. Which for now is not real time. -- Steve --
Your speaking for a lot of developers.. It's an RFC, it's coming from real time developers, it's real time connected, and this is the real time development list .. The issues I've brought up are specifically design comments/concerns related to future directions.. I was not at all speaking to your real Real time forum ? That's what this list is .. If you want this thread to People that don't care about real time related comments, they can stop reading the thread.. That's the nature of this list .. Daniel --
This RFC patch is from a mainline developer/ maintainer who happens to be a real time developer as well. Welcome to my real time incoherent nonsense filter tglx --
You flagged this as an RFC, typically that means it's a work in You ask for comments, than get upset when you get some .. If anyone needs to be filtered it's you. Daniel --
I'm not sure I'm really looking forward to this brave new world of very long running interrupt handlers. e.g. what do you I had an old patch to handle this without threaded interrupts. What normally happens is when a interrupt oopses it tries to kill the idle process which panics. My fix was to just restart another idle process instead of panicing. But back then it was rejected by Linus with the argument that a crashing interrupt handler will typically hold some lock and the next time the interrupt happens it will deadlock on that lock. Has that changed with your threaded interrupts? If it has changed I suspect the restart idle change could To be honest my opinion is that it will encourage badly written interrupt code longer term. -Andi -- ak@linux.intel.com --
We have this issue today with some irqs (USB is known for issue here...) So I don't think this is a big issue, and in the end, a better idea as it might force us to confront some of the big abusers and fix them. thanks, greg k-h --
On Thu, 2 Oct 2008 14:31:46 -0700 one of the things irq threads gives you is that 'top' will show you which ones are eating cpu ;-) -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
oprofile does that job fine already and is imho any time preferable for detailed analysis. -andi -- ak@linux.intel.com --
On Fri, 3 Oct 2008 05:25:04 +0200 while I don't disagree that oprofile will give you more detailed results, I think there's a HUGE difference between asking a bugreporter "can you paste a screen of 'top'" and "can you configure and run oprofile". CHances are good that the user already thought of top him/herself and just reports "interrupt X is eating CPU" rather than "something seems to be eating CPU". I'm not going argue that this alone is enough justification for irqthreads, but you can't deny it's an advantage. -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
Perhaps the better fix would be to make oprofile easier to configure. I never quite understood for example why it doesn't get the symbol table simply from the kallsyms. Or simply unpacks gzip'ed vmlinux Do we have that many cases of runaway irqs? The only common one I can think of is ACPI, but that is a separate thread already. Or for networking high performance goes into polling mode and most of the work is outside hardirq so it wouldn't be visible in the thread statistics either. But even there livelocks are not very common. Also that would assume that the proposed opt in irq threads are used for all interrupts. -Andi --
Not sure I follow? You're saying you want to first allow very long running IRQ handlers and then after the fact somehow fix them? Sounds like a weird proposal to be honest. The problem of course is that if you have such a very slow hardirq (let's say one that runs for tens of ms) that what happens when another interrupt comes in? How deep is your hardware queue? Or will you just lose events in that case? I suspect in the end you'll need another "fast interrupt" anyways if the hardware queue is not deep enough to buffer at the software side. Sounds a bit like the existing "interrupts" vs "softirq" doesn't it? So I'm just not sure what the "slow interrupt handler" will give you longer term. It just sounds like a redundant concept to me. The other issue is that if you allow sleeping locks in the interrupt handler what guarantee is that it won't block for even longer than tens of milliseconds? And lose even more events. -Andi -- ak@linux.intel.com --
Subject: genirq threading for ehci, ohci and uhci USB hosts.
This is a functional POC for partitioning of the USB interrupt handlers into
a quickcheck function and a threaded main-handler function.
The quickcheck performs the following actions only:
1. read status and stores.
2. check if the device is asserting, disable device assert
3. wake irq thread or return IRQ_NONE (shared IRQ not this USB)
The handler thread performs all other operations, and is conditionally
invoked via return code from quickcheck.
A function pointer *quickcheck() has been added, as well as a status
cache, which remembers the status reported by the chip. The status must
be passed to the threaded handler, since it is cleared by the quickcheck
function.
Currently, non-reentrancy is assumed, i.e. the chip will not assert another
IRQ until the thread has completed running.
In contrast with that, I have tightened up the locking between the
quickcheck handler and the main thread, this may be overly conservative.
The WARN_ONCE notes when status reported by the chip does not match
cached status, although this is expected. Its to be removed.
Note, that locking is not explicitly used for ohci as it is
in uhci and ehci.
The ieee1394/ohci1394.c is a bit more problematic, since it does not use
struct usb_hcd, and therefore another place for the status cache must be found,
in struct ti_ohci presumably.
I will send a separate patch for ieee1394/ohci1394.c after looking at it more.
The patch increases the code size by 100 lines, but there are some cleanups
that account for maybe a dozen lines, which can be broken-out.
Infrastructure:
This hack in kernel/irq/handle.c needs to be looked at, set_bit failed to work
properly on my AMD x86_64 machine, not further investigated and the code path
is currently not triggered by this patch.
+ //set_bit(IRQF_WARNED_THREADED, &action->flags);
+ action->flags |= IRQF_WARNED_THREADED;
Testing:
I have copied ...Thanks for sending this through; I will endeavour to test it over the coming days. I'll also see if I can test it for the case where USB and ieee1394 share IRQs. Even though ieee1394 isn't done yet these changes should prevent the USB-related problems we've seen in FFADO when USB shares the firewire IRQ. Having said that, the forthcoming ohci1394.c patch will also be interesting since that is the other area which will greatly benefit FFADO. Regards jonathan --
Hi, I have re-released the patch set with support for ohci1394. I have not tested, for lack of hardware :: it compiles. The location of status_cache is now within the ti_ohci struct. That organization not entirely consistent with the USB side, predicated by other variation between the drivers. The updated patch queue is available here. http://www.thebigcorporation.com/Sven/genirq-usb/genirq-usb-v2.tar.bz2 Bug fix: The declaration of status_cache has been made consistent u32. The former declaration of unsigned short (copied from uhci-hcd.c) it was probably too ambiguous for some architectures. Thanks, --
