Re: Soft IRQ statistics under /proc/stat

Previous thread: [PATCH] x86: check dsdt before find oem table for es7000 by Yinghai Lu on Thursday, September 11, 2008 - 6:25 pm. (1 message)

Next thread: [stable] regression in iptables: recent filter by Grant Coady on Thursday, September 11, 2008 - 6:53 pm. (7 messages)
From: Elad Lahav
Date: Thursday, September 11, 2008 - 6:38 pm

I've been observing some oddities in the statistics produced by mpstat 
with respect to soft IRQs (for example, considerable soft IRQ time on 
processors sending UDP packets on dummy NICs). While looking at the 
kernel code, I noticed that ticks are attributed to soft IRQs whenever 
softirq_count() is greater than 0. That count is raised in 
__local_bh_disable(), which is called from __do_softirq(), but also from 
local_bh_disable(). Thus, the number of ticks reported in /proc/stat 
covers any execution path that runs with soft IRQs disabled, not just 
code called from __do_softirq().
I hacked the kernel to differentiate between the two cases, and indeed 
the anomalies I saw can be explained as code executing under 
local_bh_disable().
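
To make the accounting rule above concrete, here is a minimal userspace 
model in Python. It is only a sketch, not kernel code: the class name, the 
tick buckets and the bit position of the soft IRQ count (bits 8-15 of 
preempt_count) are assumptions made for illustration.

```python
# Simplified model of per-tick accounting; NOT kernel code.
# Assumption: the soft IRQ count occupies bits 8-15 of preempt_count.
SOFTIRQ_SHIFT = 8
SOFTIRQ_OFFSET = 1 << SOFTIRQ_SHIFT
SOFTIRQ_MASK = 0xFF << SOFTIRQ_SHIFT


class CpuModel:
    """Hypothetical stand-in for one CPU's accounting state."""

    def __init__(self):
        self.preempt_count = 0
        self.ticks = {"system": 0, "softirq": 0}

    def softirq_count(self):
        return self.preempt_count & SOFTIRQ_MASK

    def local_bh_disable(self):
        # Raises the same counter that soft IRQ processing raises.
        self.preempt_count += SOFTIRQ_OFFSET

    def local_bh_enable(self):
        self.preempt_count -= SOFTIRQ_OFFSET

    def account_system_tick(self):
        # Any non-zero soft IRQ count is billed as soft IRQ time,
        # regardless of who raised it.
        if self.softirq_count() > 0:
            self.ticks["softirq"] += 1
        else:
            self.ticks["system"] += 1


def run_demo():
    cpu = CpuModel()
    cpu.account_system_tick()   # ordinary kernel code
    cpu.local_bh_disable()      # e.g. a send path protecting per-CPU data
    cpu.account_system_tick()   # tick arrives while bottom halves are off
    cpu.local_bh_enable()
    cpu.account_system_tick()
    return cpu.ticks
```

In this model the middle tick is counted as soft IRQ time even though no 
soft IRQ handler ever ran, which is the effect described above.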

Is this behaviour by design? Documentation of /proc/stat on the web 
describes this number simply as "soft IRQ time" (e.g., 
http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/en-US/Reference_Guide/s2-p...). 
I would have expected it to include only execution paths starting from 
__do_softirq().

Elad
--

From: Elad Lahav
Date: Monday, September 15, 2008 - 7:16 am

Here's some data to support my claims.
The first experiment consists of sending UDP packets on a dummy network interface. No 
interrupts are generated, so there should be no soft IRQs. Nevertheless, /proc/stat shows 
that a considerable share of CPU time is taken by soft IRQs:

CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
   0    4.52    0.00   67.84    0.00    0.00   27.64    0.00    0.00    0.00
   0    4.00    0.00   70.00    0.00    0.00   26.00    0.00    0.00    0.00
   0    4.98    0.00   68.16    0.00    0.00   26.87    0.00    0.00    0.00
   0    4.02    0.00   69.85    0.00    0.00   26.13    0.00    0.00    0.00
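
For reference, a percentage such as %soft above can be approximated from two 
samples of a per-CPU line in /proc/stat. The following Python sketch shows 
the arithmetic; the function names are hypothetical, and it assumes the 
usual field order (user, nice, system, idle, iowait, irq, softirq, steal) 
and ignores the guest column. mpstat itself may differ in details.

```python
# Rough sketch of deriving per-CPU percentages from /proc/stat samples.
# Assumption: the first eight counters after the CPU label are in this order.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Split a 'cpuN ...' line into its label and a dict of tick counters."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels export fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return parts[0], dict(zip(FIELDS, values))


def percentages(before, after):
    """Share of each counter in the ticks elapsed between two samples."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * d / total for k, d in deltas.items()}
```

With two made-up samples, `cpu0 100 0 1000 500 0 0 300 0 0` followed by 
`cpu0 104 0 1070 500 0 0 326 0 0`, the softirq counter grows by 26 out of 
100 elapsed ticks, so the sketch reports 26% soft IRQ time.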

In a second experiment, UDP packets are sent over a real NIC by a process pinned to CPU 0, 
while the respective network interrupts are pinned to CPU 2. Here, CPU 0 is still reported 
as spending time in soft IRQs, despite the interrupt affinity:

CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
   0    4.02    0.00   63.82    0.00    0.00   32.16    0.00    0.00    0.00
   2    0.00    0.00    0.00    0.00    6.47   40.30    0.00    0.00   53.23
   0    2.48    0.00   67.33    0.00    0.00   30.20    0.00    0.00    0.00
   2    0.00    0.00    0.00    0.00    6.47   41.79    0.00    0.00   51.74

I have verified, from /proc/interrupts, that in both cases the number of interrupts per 
second on CPU 0 is negligible.
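
A check of this kind can be scripted by summing the per-CPU columns of 
/proc/interrupts and comparing two snapshots taken a known interval apart. 
The Python sketch below is one possible way to do it, not the procedure 
actually used here; the function name is hypothetical and the parser assumes 
the usual layout (a header row of CPU names, then one row per interrupt 
source).

```python
# Sketch: total interrupt counts per CPU from /proc/interrupts-style text.
def per_cpu_interrupts(text):
    lines = text.strip("\n").splitlines()
    cpus = lines[0].split()                  # e.g. ["CPU0", "CPU1"]
    totals = dict.fromkeys(cpus, 0)
    for row in lines[1:]:
        fields = row.split()
        if not fields:
            continue
        counts = fields[1:1 + len(cpus)]     # skip the "NN:" label
        # Rows with a single global count (e.g. ERR:) have no per-CPU data.
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        for cpu, count in zip(cpus, counts):
            totals[cpu] += int(count)
    return totals
```

Subtracting the totals of an earlier snapshot from those of a later one and 
dividing by the interval gives interrupts per second for each CPU.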

Next, I modified the kernel code to raise a per-CPU flag at the beginning of 
__do_softirq(), and clear it at the end. Using this flag, account_system_time() can 
differentiate between a "true" soft IRQ, and code running under local_bh_disable(). The 
results of the first experiment (dummy NIC) are as follows:

CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal  %guest %bh_dis   %idle
   0    4.50    0.00   71.00    0.00    0.00    0.00    0.00    0.00   24.50    0.00
   0    4.00    0.00   67.00    0.00    0.00    0.00    0.00    0.00   29.00    0.00
   0    3.98    0.00   69.15    ...
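
The flag-based split described above can be modelled in a few lines of 
Python. This is an illustrative userspace sketch, not the actual kernel 
patch: the class, the method names and the "bh_dis" bucket label are 
assumptions, as is the bit position of the soft IRQ count.

```python
# Model of splitting "true" soft IRQ time from bottom-half-disabled time.
# NOT the kernel patch; names and constants are assumptions.
SOFTIRQ_OFFSET = 1 << 8
SOFTIRQ_MASK = 0xFF << 8


class Cpu:
    def __init__(self):
        self.preempt_count = 0
        self.in_do_softirq = False   # stands in for the per-CPU flag
        self.ticks = {"sys": 0, "soft": 0, "bh_dis": 0}

    def tick(self):
        if self.preempt_count & SOFTIRQ_MASK:
            # Same condition as before, now refined by the flag.
            bucket = "soft" if self.in_do_softirq else "bh_dis"
        else:
            bucket = "sys"
        self.ticks[bucket] += 1

    def do_softirq(self, handler):
        self.preempt_count += SOFTIRQ_OFFSET
        self.in_do_softirq = True        # raised on entry
        try:
            handler()
        finally:
            self.in_do_softirq = False   # cleared on exit
            self.preempt_count -= SOFTIRQ_OFFSET

    def with_bh_disabled(self, work):
        self.preempt_count += SOFTIRQ_OFFSET
        try:
            work()
        finally:
            self.preempt_count -= SOFTIRQ_OFFSET


def run_demo():
    cpu = Cpu()
    cpu.do_softirq(cpu.tick)         # tick inside soft IRQ processing
    cpu.with_bh_disabled(cpu.tick)   # tick with bottom halves merely disabled
    cpu.tick()                       # plain system time
    return cpu.ticks
```

Each of the three ticks in `run_demo()` lands in a different bucket, whereas 
a check on the soft IRQ count alone would have put the first two together.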