Re: Interrupt routine usage not shown by top in 8.0

From: Robert Watson
Date: Friday, March 13, 2009 - 8:41 am

On Fri, 13 Mar 2009, Barney Cordoba wrote:


Strikes me that this thread is getting a bit contentious, and I don't mean 
in a locking sense :-).

FreeBSD provides two interrupt execution environments: fast interrupts, and 
ithreads.  Historically (5.x ... 7.x) device drivers have had to select one of 
the two models, but in 8.x a hybrid mode, called interrupt filters, allows 
drivers to do both for an interrupt source.  The problem you're running into 
is that "fast" interrupts borrow the complete context of the thread they 
preempt, including its stack and accounting characteristics.  For pure ithread 
drivers, this is generally not a problem, as the sole purpose of that 
interrupt handler is to kick the scheduler to launch the full ithread context, 
which will typically immediately preempt (at that same point in the stack) in 
order to give the interrupt handler a full context that can sleep on locks, be 
accounted for, etc.

if_em lives in a world a bit between these models, in which it wants both a 
fast context, to do a bit of low level interaction with the device, and a 
"slow" or full context in which to execute the network stack, perform memory 
allocation, and so on.  Because interrupt filters weren't yet around (and are 
presumably too experimental to use in 8.x right now), it does this by creating 
its own ithread-like execution context using a task queue.  The result is 
mis-billing of what is effectively an ithread as system time instead of 
interrupt time.  You'll notice that if_em (and other drivers employing the 
same trick) do elevate the priority of the task queue thread so that the 
scheduler treats it (almost) the same way it treats an ithread (it will 
immediately preempt most stuff).
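
As a rough illustration of the split described above, here is a small userland 
model in C.  It is not kernel code and is not taken from if_em; every name in 
it (struct taskq, struct nic, nic_fast_intr, nic_rx_task) is hypothetical.  The 
"fast" half only masks the source and queues a task; the deferred half does the 
expensive work later, in the worker's own context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TASKQ_DEPTH 8

typedef void (*task_fn)(void *arg);

struct task {
    task_fn fn;
    void   *arg;
};

/* Fixed-size ring of pending tasks (hypothetical stand-in for a task queue). */
struct taskq {
    struct task ring[TASKQ_DEPTH];
    size_t head;    /* next slot to run */
    size_t tail;    /* next free slot */
    size_t count;   /* tasks currently queued */
};

/* Queue a task without blocking, so it is usable from the "fast" half. */
static bool
taskq_enqueue(struct taskq *tq, task_fn fn, void *arg)
{
    assert(fn != NULL);
    if (tq->count == TASKQ_DEPTH)
        return false;               /* queue full; caller must cope */
    tq->ring[tq->tail].fn = fn;
    tq->ring[tq->tail].arg = arg;
    tq->tail = (tq->tail + 1) % TASKQ_DEPTH;
    tq->count++;
    return true;
}

/* Worker context: run everything queued so far; returns the number run. */
static size_t
taskq_drain(struct taskq *tq)
{
    size_t ran = 0;

    while (tq->count > 0) {
        struct task t = tq->ring[tq->head];

        tq->head = (tq->head + 1) % TASKQ_DEPTH;
        tq->count--;
        t.fn(t.arg);
        ran++;
    }
    return ran;
}

/* Hypothetical device state. */
struct nic {
    struct taskq tq;
    bool intr_masked;   /* true while deferred work is outstanding */
    int  rx_pending;    /* frames waiting in the model's RX ring */
    int  rx_delivered;  /* frames handed to the "stack" */
};

/* Deferred half: the expensive work, done in the worker's context. */
static void
nic_rx_task(void *arg)
{
    struct nic *sc = arg;

    sc->rx_delivered += sc->rx_pending;
    sc->rx_pending = 0;
    sc->intr_masked = false;        /* re-enable the interrupt source */
}

/* Fast half: mask the source, schedule the task, return immediately. */
static void
nic_fast_intr(struct nic *sc)
{
    if (sc->intr_masked)
        return;                     /* work is already scheduled */
    sc->intr_masked = true;
    (void)taskq_enqueue(&sc->tq, nic_rx_task, sc);
}
```

In this model nothing is delivered until taskq_drain() runs, which is the point: 
the time spent there belongs to the worker, not to the interrupt.  In the 
kernel, taskqueue(9) plays roughly the role that struct taskq plays here.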


The overhead of the scheduler is billed to a combination of the thread being 
switched out of, and the thread being switched to.  Fast interrupt execution 
is billed to the thread it preempts.  In the scenario you describe, you will 
only get mis-billing to idle if those fast interrupts preempt only the idle 
thread.  Otherwise they will get billed to whatever is preempted.  On a system 
where you have a network interface effectively keeping the CPU busy, it will 
get billed to the task queue thread (I expect) because the task queue is what 
will get preempted.  Now, this might not strictly be true because the 
scheduler tries hard to keep ithreads running close to where the interrupt is 
delivered, but if it doesn't know the task queue thread is an ithread, it may 
do this less well.  Presumably this is a temporary state of affairs while 
interrupt filters are being adopted, though the interrupt filter work seems 
stalled (?).
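
The billing rule above can be sketched as a toy accounting model in C.  This is 
an illustration only, not the kernel's accounting code; the thread ids and the 
struct cpu_acct, run_thread and fast_intr names are all made up for the 
example.

```c
#include <assert.h>

/* Hypothetical thread ids for the model. */
enum { TD_IDLE, TD_TASKQ, TD_USER, TD_COUNT };

struct cpu_acct {
    unsigned long billed[TD_COUNT]; /* cycles charged to each thread */
    int curthread;                  /* thread currently on the CPU */
};

/* A thread runs in its own context and is charged for its own cycles. */
static void
run_thread(struct cpu_acct *a, int td, unsigned long cycles)
{
    assert(td >= 0 && td < TD_COUNT);
    a->curthread = td;
    a->billed[td] += cycles;
}

/*
 * A fast interrupt borrows the context of whatever it preempts, so its
 * cycles land on that thread's account.
 */
static void
fast_intr(struct cpu_acct *a, unsigned long cycles)
{
    a->billed[a->curthread] += cycles;
}
```

If the CPU is idle when the interrupt arrives, the handler's cycles show up as 
idle time; if the task queue thread is running, they show up on the task queue 
thread instead.  For example, after run_thread(&a, TD_TASKQ, 100) followed by 
fast_intr(&a, 10), billed[TD_TASKQ] is 110 and billed[TD_IDLE] is untouched.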


The purpose of a context switch from a fast interrupt context is to give 
interrupt code the ability to acquire general kernel locks, as opposed to just 
spin locks.  If you run in a borrowed context (i.e., you have synchronously 
preempted a thread to run a fast interrupt), you may (will) generate deadlocks 
due to violating lock orders, since you don't want to (can't) release the 
locks already acquired by the thread, and may then acquire locks in the wrong 
order.  If you want to acquire full sleep locks, you need a full context, 
which requires a context switch out of the preempted thread and into an 
ithread (or task queue thread or whatever).  Passage into the normal input 
paths of the network stack will encounter normal locks, so must be done from a 
full context.


It's fine to enter the network stack from any full and dedicated thread 
context, which means it's OK from an ithread or a task queue thread kicked by 
a fast interrupt, but it's not OK from a fast interrupt.  There's no 
difference between MSI/MSIX as far as I know from this perspective, only in 
how the drivers use them.


Only a few drivers use the fast interrupt approach; those that don't 
presumably avoid it because their authors haven't applied the approach of 
mixing "fast" and "slow" contexts.  If the interrupt filter model is going 
to become mainstream, I think we'd like to see them adopt that rather than 
hand-crafting fast interrupts and taskqueues, to the same effect, in every 
driver.  On the other hand, one benefit to the task queue model is that you 
can deliver events to it that *aren't* interrupts, such as requests for state 
transitions from the software side of the stack.


All direct interrupt deliveries bill a small amount of CPU time to the context 
that they execute in.  Drivers that do less work in the fast interrupt 
delivery context will bill less time outside of their own worker ithreads.
There are two ways to measure CPU use, btw: one is a sampled approach 
involving timers, which works badly in fast interrupt contexts with interrupts 
disabled because the ticks are deferred until after interrupts are re-enabled, 
and the other is explicit time measurement, which is quite expensive. 
Currently the kernel uses the TSC, where available, and an estimator to map 
between CPU cycles and real time, but that also has its limitations.
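
To make the difference between the two measurement styles concrete, here is a 
toy model in C.  It is a sketch under simplifying assumptions, not how the 
kernel implements either scheme: the tick period is arbitrary, the cycle 
counter is just a variable standing in for the TSC, and a tick that falls due 
while the fast handler runs is charged to the next ordinary thread slice.  All 
names are hypothetical.

```c
#include <assert.h>

#define TICK_CYCLES 1000UL          /* cycles between timer ticks (arbitrary) */

enum ctx { CTX_THREAD, CTX_FAST_INTR, CTX_COUNT };

struct clock_model {
    unsigned long now;                  /* cycle counter, stands in for the TSC */
    unsigned long next_tick;            /* cycle at which the next tick is due */
    unsigned long deferred;             /* ticks held back while interrupts are off */
    unsigned long sampled[CTX_COUNT];   /* sampled accounting, in ticks */
    unsigned long measured[CTX_COUNT];  /* explicit accounting, in cycles */
};

static void
clock_init(struct clock_model *m)
{
    *m = (struct clock_model){ .next_tick = TICK_CYCLES };
}

/* Run context c for the given number of cycles and update both accounts. */
static void
run_for(struct clock_model *m, enum ctx c, unsigned long cycles)
{
    unsigned long start = m->now;

    assert(c == CTX_THREAD || c == CTX_FAST_INTR);

    /* Interrupts are enabled again: deferred ticks hit this context. */
    if (c == CTX_THREAD) {
        m->sampled[c] += m->deferred;
        m->deferred = 0;
    }

    m->now += cycles;

    /* Explicit measurement: read the counter at entry and exit. */
    m->measured[c] += m->now - start;

    /* Sampled measurement: charge whoever is running when a tick lands. */
    while (m->next_tick <= m->now) {
        m->next_tick += TICK_CYCLES;
        if (c == CTX_FAST_INTR)
            m->deferred++;          /* interrupts disabled: tick is held back */
        else
            m->sampled[c]++;
    }
}
```

Running a thread for 900 cycles, a fast handler for 200, and the thread for 
another 900 leaves the handler with zero sampled ticks even though the explicit 
account shows it used 200 cycles; the tick that fell due during the handler is 
charged to the thread.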


When the em driver creates task queue threads, it assigns them an ithread 
priority.  You can manually adjust that priority in the code, but I'm not sure 
we have an explicit management API from userspace to adjust those priorities 
without source code changes (but I may be wrong).


You can use cpuset to force a specific thread onto a specific CPU, and to 
force other threads not to run on that CPU.  You can also use cpuset, I 
believe, to direct the low-level interrupt delivery for sources to specific 
CPUs, but I've not done this myself.


This is, effectively, what fast interrupt handlers + task queues do.  There's 
another potential dispatch point, between the link layer and the network 
layer, which is controlled by the net.isr.direct flag; right now we dispatch 
the whole stack to completion in the ithread, but you can turn that off by 
setting the flag to zero.  In practice, the context switch avoidance 
associated with doing that appears to be a significant win for many, but not 
all, workloads.

Robert N M Watson
Computer Laboratory
University of Cambridge
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"