Hi,
While using the preemptirqsoff ftrace tracer I noticed that
everytime handle_edge_irq is called it needs to mask and unmask MSI, and
that leads to a series of very expensive calls to pci_find_capability,
as can be seen here, with preemption disabled:
<idle>-0 [03] 422.558652: unmask_msi_irq <-handle_edge_irq
<idle>-0 [03] 422.558653: msi_set_mask_bits <-unmask_msi_irq
<idle>-0 [03] 422.558653: msi_set_enable <-msi_set_mask_bits
<idle>-0 [03] 422.558654: pci_find_capability <-msi_set_enable
<idle>-0 [03] 422.558655: __pci_bus_find_cap_start <-pci_find_capability
<idle>-0 [03] 422.558655: pci_bus_read_config_word <-__pci_bus_find_cap_start
<idle>-0 [03] 422.558656: _spin_lock_irqsave <-pci_bus_read_config_word
<idle>-0 [03] 422.558656: add_preempt_count <-_spin_lock_irqsave
<idle>-0 [03] 422.558657: pci_read <-pci_bus_read_config_word
<idle>-0 [03] 422.558657: raw_pci_read <-pci_read
<idle>-0 [03] 422.558658: pci_conf1_read <-raw_pci_read
<idle>-0 [03] 422.558658: _spin_lock_irqsave <-pci_conf1_read
<idle>-0 [03] 422.558659: add_preempt_count <-_spin_lock_irqsave
BZZT! 37us
<idle>-0 [03] 422.558696: _spin_unlock_irqrestore <-pci_conf1_read
<idle>-0 [03] 422.558697: sub_preempt_count <-_spin_unlock_irqrestore
<idle>-0 [03] 422.558697: preempt_schedule <-_spin_unlock_irqrestore
<idle>-0 [03] 422.558698: _spin_unlock_irqrestore <-pci_bus_read_config_word
<idle>-0 [03] 422.558698: sub_preempt_count <-_spin_unlock_irqrestore
<idle>-0 [03] 422.558699: preempt_schedule <-_spin_unlock_irqrestore
<idle>-0 [03] 422.558699: __pci_find_next_cap <-pci_find_capability
<idle>-0 [03] 422.558700: __pci_find_next_cap_ttl <-__pci_find_next_cap
...Ouch, left this in, do you want another patch or can you just remove this bit? Thanks, - Arnaldo --
As I told you on IRC, this is just the MSI code being complete crap. It should be caching the offset itself. We shouldn't have this extra array in the struct pci_dev just because MSI is broken. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
Well, we can certainly do that, its just that I did this first and thought that perhaps there could be some other users, but I see that 44 extra bytes per pci_dev can be a pain if the only one to reap benefits is MSI, can't you think of any other users? I couldn't detect any so far in my admitedly limited testing. - Arnaldo --
There are a few other common cap checks, but I don't think they compare to MSI in terms of latency sensitivity (though I didn't audit all the CAP_ID_EXP checks, there are quite a few of those). Since we know MSI is a problem, let's just go with fixing that for now. If we find that other caps are also causing problems we can revisit caching all of them; the patch is simple enough. Thanks, Jesse --
Do you want me to submit another patch or can you cook up one? - Arnaldo --
I can hack it up, probably not until Monday though (I'll be out of town this weekend). Jesse --
OK, for now this patch should be in the next rt patchset, we'll drop it when you get The Right Thing merged 8) - Arnaldo --
Humm...
The position of MSI capability is already cached in the msi_desc when
we enter the msi_set_mask_bits(). Use it instead.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
drivers/pci/msi.c | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 8c61304..ccb1974 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -70,12 +70,10 @@ arch_teardown_msi_irqs(struct pci_dev *dev)
}
}
-static void msi_set_enable(struct pci_dev *dev, int enable)
+static void __msi_set_enable(struct pci_dev *dev, int pos, int enable)
{
- int pos;
u16 control;
- pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
if (pos) {
pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &control);
control &= ~PCI_MSI_FLAGS_ENABLE;
@@ -85,6 +83,11 @@ static void msi_set_enable(struct pci_dev *dev, int enable)
}
}
+static void msi_set_enable(struct pci_dev *dev, int enable)
+{
+ __msi_set_enable(dev, pci_find_capability(dev, PCI_CAP_ID_MSI), enable);
+}
+
static void msix_set_enable(struct pci_dev *dev, int enable)
{
int pos;
@@ -141,7 +144,8 @@ static void msi_set_mask_bits(unsigned int irq, u32 mask, u32 flag)
mask_bits |= flag & mask;
pci_write_config_dword(entry->dev, pos, mask_bits);
} else {
- msi_set_enable(entry->dev, !flag);
+ __msi_set_enable(entry->dev, entry->msi_attrib.pos,
+ !flag);
}
break;
case PCI_CAP_ID_MSIX:
--
1.5.4.3
--
Yeah, this looks really nice. It should also fix Arnaldo's latency problem, and really looks like a bug fix for the MSI code more than anything. Arnaldo, can you take a look & test and ack/nack? Thanks, Jesse --
Tested, works as expected, thanks. Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> - Arnaldo --
Great, thanks a lot for testing (and getting us on track to fix this in the first place!). Jesse --
