Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access

Previous thread: [PATCH 00/22 take 3] UBI: Unsorted Block Images by Artem Bityutskiy on Wednesday, March 14, 2007 - 8:19 am. (88 messages)

Next thread: [IDE] au1xxx: Fix use of mixed declarations and code by Ralf Baechle on Wednesday, March 14, 2007 - 8:26 am. (1 message)
From: Tejun Heo
Date: Wednesday, March 14, 2007 - 8:23 am

Hello, all.

This patch started off from the following thread.

  http://thread.gmane.org/gmane.linux.ide/16899

The problem is that a PCI device can be in any arbitrary when it gets
enabled and the device has to be enabled for its driver to
initialize/reset it.  The most common case this causes headache is as
follows.

Let's assume there's a device which shares its INTX IRQ line with
another device and the other one is already initialized.  During boot,
due to BIOS's fault, bad hardware design or sheer bad luck, the device
has got a pending IRQ.  When its driver enables the device, the
pending IRQ hits INTX.  The IRQ line has been enabled by the other
driver sharing the IRQ but IRQ handler for this device hasn't been
registered yet.  So, screaming interrupts.  IRQ subsystem shuts up the
IRQ line in an attempt to save the machine from complete lockup and
both devices end up dead.

There seem to be other scenarios that this can be triggered as,
reportedly, similar behavior is also observed when IRQ line is not
shared.  I suspect similar thing can and is more likely to happen
while resuming as the device can really be in any random state.

The dilemma here is that 1. the device needs to be enabled to be
initialized/reset 2. enabling the device may cause all hell to break
loose.  Note that bus mastering has similar risks and we are currently
dealing with that by doing pci_set_master() from each driver after
certain initialization steps are taken.

This patch expands the pci_set_master() approach.  Instead of enabling
the device in one go, it's done in two steps - prepare and activate.
'prepare' enables access to PCI configuration, IO, MMIO but prohibits
the device from bus mastering or raising IRQ by adjusting respective
PCI control bits prior to enabling the device.  The second step
'activate' allows the device to bus master and raise IRQs.  Typical
initialization would look like the following.

  static int __devinit my_init_one(blah blah)
  {
	...

	rc = ...
From: Alan Stern
Date: Wednesday, March 14, 2007 - 10:04 am

In several cases this problem has indeed come up and been fixed by adding
a PCI quirk routine to turn off the device's pending IRQ.

Alan Stern

-

From: Stephen Hemminger
Date: Wednesday, March 14, 2007 - 10:07 am

On Thu, 15 Mar 2007 00:23:02 +0900

The problem is the BIOS is busted on these machines. How much effort
do we want to put into dealing with systems with broken BIOS?
I would rather have the root cause fixed than creating a bandaid that
has to be maintained for all the other architectures and platforms.

What if the device with the IRQ problem is never loaded? Sometimes
devices aren't loaded until after boot.

If you use MSI interrupts, they aren't shared so there isn't a problem.
Maybe the root cause of this is bad MSI emulation handling in BIOS.

Any change like this has to be done without changing device drivers.
Changing the skge/sky2 drivers as special case is not acceptable.


-- 
Stephen Hemminger <shemminger@linux-foundation.org>
-

From: Tejun Heo
Date: Wednesday, March 14, 2007 - 7:37 pm

For sky2/skge, it might be caused by broken BIOS.  For some ATA devices, 
it's just the hardware which is designed that way.  Also, under non-x86 
machines and during resume, there's no BIOS to nudge chips into sane 
state.  This is an existing problem which has to be solved.  How much 
effort we are gonna put into it is certainly debatable.

Also, the current implementation doesn't have any arch independent part. 
  It's wholly contained in arch independent PCI layer, but it might be 
beneficial to have arch dependent hooks (IRQ line enable/disable?) in 

What do you mean by loading a device?  Do you mean loading driver for 
the device?  The patch as posted is probably not a complete solution. 
We probably need to make sure during early boot and resume that all IRQ 
/ bus master are turned off where possible and let low level drivers 
enable them as needed and after certain amount of initialization is 


I dunno about that.  What I'm proposing is alternative two-step PCI 
initialization step - the first step enables the device just enough for 
initialization/reset and the second one enables full access.  We're 
doing part of it already for bus master.  I'm proposing to expand that 
approach and make them handled by generic PCI layer.  As you can see, it 
doesn't add noticeable complexity to drivers.  I think it's even clearer 
than doing pci_set_master() explicitly.

If this way of solving the problem is chosen, eventually most drivers 
should be converted to new initialization steps.  And there is no way to 
do this without modifying low level driver.  Only low level driver knows 
when full blown access can be enabled and such thing must happen before 
registering the device to upper layer (e.g. ATA/SCSI, netif).

sky2/skge aren't exceptions.  If this way of solving the problem is 
chosen, eventually most if not all drivers should be converted to new 
model.  It may take two years, maybe five, but as a start just 
converting ATA and network drivers shouldn't take too long ...
From: Grant Grundler
Date: Wednesday, March 14, 2007 - 11:45 pm

On Thu, Mar 15, 2007 at 11:37:20AM +0900, Tejun Heo wrote:



I don't like the idead of changing the driver API for PCI device setup.


Agreed. ISTR this has been discussed before but don't recall
the exact context. I'll try to find the previous thread.

When I started the parisc port on 2.4 kernels, the policy was to
leave all interrupts enabled even if no interrupt handler was registered.
This is useful for debugging misconfigured IRQ routing.
Did the policy already change or is this a proposal to change the policy?

thanks,
-

From: Stephen Hemminger
Date: Friday, March 16, 2007 - 2:21 pm

All this discussion is well and good, but I suspect there is a driver setup
problem where the interrupt isn't being handled properly. Please retest with
the latest version of skge driver (I just pushed patches to netdev about 2min ago).
One patch changes to disable IRQ's from device for packets until device
is brought up.

-

From: Andi Kleen
Date: Wednesday, March 14, 2007 - 2:46 pm

This seems to be also common after kexec during kexec crashdumps

I hope there aren't any new erratas triggered by this. Perhaps it would
make sense to add some paranoia sleeps at least before touching other
state? 

-Andi
-

From: Tejun Heo
Date: Wednesday, March 14, 2007 - 7:39 pm

Do you mean between disabling IRQ mechanisms and enabling PCI device in 
pcim_prepare_device()?

Thanks.

-- 
tejun
-

From: Vivek Goyal
Date: Thursday, March 15, 2007 - 4:41 am

That's true. It happens very frequently in kdump case where underlying device
can very well have pending interrupts while second kernel is booting.
Currently we allow the kernel to disable that irq line and we boot the kernel
with "irqpoll" so that device still operates in polling mode.

But getting this fixed will help. If device interrupts are enabled only after
driver has had a chance to reset the device, things will be better, at least
from kdump perspective.

Thanks
Vivek
-

From: Russell King
Date: Wednesday, March 14, 2007 - 2:56 pm

pci_enable_device() doesn't deal with this; in most PCI setups I've
seen, there is no control at PCI level over whether a device generates
an interrupt on the bus.  Certainly the memory and io command enables
have no effect on the ability of the device to cause an interrupt.

In most cases, only the driver of the device knows how to disable
interrupts on any particular device.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
-

From: Jeff Garzik
Date: Wednesday, March 14, 2007 - 3:34 pm

PCI grew an interrupt enable while you weren't looking: 
PCI_COMMAND_INTX_DISABLE

No idea about ARM, but almost all PCI devices made in the past few years 
support that bit.

Unless you are using a PCI Express device (maybe PCI-X too?), though, 
you cannot count on the bit's presence.  It was added in PCI 2.3 I 
think.  Older PCI devices certainly do not have this standardized bit.

	Jeff


-

From: Russell King
Date: Wednesday, March 14, 2007 - 3:58 pm

That's fine for devices which conform to the later PCI specs, but not


No PCI device that I have has that bit - including the raid card I
bought last year...

In any case, relying on such a new control bit to implement this kind
of functionality would result in a very hit and miss result; Linux
tends to get used on things other than the bleeding edge of hardware
technology.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
-

From: Jeff Garzik
Date: Wednesday, March 14, 2007 - 4:16 pm

How does Tejun's patch or thesis rely on this new control bit?  He 
explicitly mentions DISABLE_INTX variability...

	Jeff


-

From: Tejun Heo
Date: Wednesday, March 14, 2007 - 10:47 pm

[cc'ing Andi, Hi!]

Hello,


Many recent ATA and network controllers do and most new ones will

I don't think INTX_DISABLE is on the bleeding edge of hardware
technology and many common cases will benefit from using it (just think
about the number of newish notebook users).  The problem with
INTX_DISABLE is that there doesn't seem to be any way to tell whether
writing to that bit is safe or not.

You are right in that turning off IRQ mechanisms in pci_enable_device()
doesn't fix all the problems as PCI-wise it only enables IO and memory
address space access, but to some extent it does because in the arch
code, it enables the IRQ line and the physical IRQ line might not be
shared even if the final IRQ number is shared (Andi, am I correct)?

Anyways, I think the proper solution is to make sure all generic IRQ
controls including INTX turned off early in the boot during PCI
subsystem initialization (ie. do the disable part of
pcim_prepare_device() early in the boot before any IRQ line is
requested) and let each driver enable after initialization as necessary
and do similar things during resume.  Note that drivers still need to be
modified to signify when the device is initialized enough to enable IRQ,
and bus mastering.

We can also arch-dep IRQ enabling to the activation time.  That will
give us more protection even when INTX_DISABLE is not available.

Thanks.

-- 
tejun
-

Previous thread: [PATCH 00/22 take 3] UBI: Unsorted Block Images by Artem Bityutskiy on Wednesday, March 14, 2007 - 8:19 am. (88 messages)

Next thread: [IDE] au1xxx: Fix use of mixed declarations and code by Ralf Baechle on Wednesday, March 14, 2007 - 8:26 am. (1 message)