Hello, all. This patch started off from the following thread. http://thread.gmane.org/gmane.linux.ide/16899 The problem is that a PCI device can be in any arbitrary when it gets enabled and the device has to be enabled for its driver to initialize/reset it. The most common case this causes headache is as follows. Let's assume there's a device which shares its INTX IRQ line with another device and the other one is already initialized. During boot, due to BIOS's fault, bad hardware design or sheer bad luck, the device has got a pending IRQ. When its driver enables the device, the pending IRQ hits INTX. The IRQ line has been enabled by the other driver sharing the IRQ but IRQ handler for this device hasn't been registered yet. So, screaming interrupts. IRQ subsystem shuts up the IRQ line in an attempt to save the machine from complete lockup and both devices end up dead. There seem to be other scenarios that this can be triggered as, reportedly, similar behavior is also observed when IRQ line is not shared. I suspect similar thing can and is more likely to happen while resuming as the device can really be in any random state. The dilemma here is that 1. the device needs to be enabled to be initialized/reset 2. enabling the device may cause all hell to break loose. Note that bus mastering has similar risks and we are currently dealing with that by doing pci_set_master() from each driver after certain initialization steps are taken. This patch expands the pci_set_master() approach. Instead of enabling the device in one go, it's done in two steps - prepare and activate. 'prepare' enables access to PCI configuration, IO, MMIO but prohibits the device from bus mastering or raising IRQ by adjusting respective PCI control bits prior to enabling the device. The second step 'activate' allows the device to bus master and raise IRQs. Typical initialization would look like the following. static int __devinit my_init_one(blah blah) { ... rc = ...
In several cases this problem has indeed come up and been fixed by adding a PCI quirk routine to turn off the device's pending IRQ. Alan Stern -
On Thu, 15 Mar 2007 00:23:02 +0900 The problem is the BIOS is busted on these machines. How much effort do we want to put into dealing with systems with broken BIOS? I would rather have the root cause fixed than creating a bandaid that has to be maintained for all the other architectures and platforms. What if the device with the IRQ problem is never loaded? Sometimes devices aren't loaded until after boot. If you use MSI interrupts, they aren't shared so there isn't a problem. Maybe the root cause of this is bad MSI emulation handling in BIOS. Any change like this has to be done without changing device drivers. Changing the skge/sky2 drivers as special case is not acceptable. -- Stephen Hemminger <shemminger@linux-foundation.org> -
For sky2/skge, it might be caused by broken BIOS. For some ATA devices, it's just the hardware which is designed that way. Also, under non-x86 machines and during resume, there's no BIOS to nudge chips into sane state. This is an existing problem which has to be solved. How much effort we are gonna put into it is certainly debatable. Also, the current implementation doesn't have any arch independent part. It's wholly contained in arch independent PCI layer, but it might be beneficial to have arch dependent hooks (IRQ line enable/disable?) in What do you mean by loading a device? Do you mean loading driver for the device? The patch as posted is probably not a complete solution. We probably need to make sure during early boot and resume that all IRQ / bus master are turned off where possible and let low level drivers enable them as needed and after certain amount of initialization is I dunno about that. What I'm proposing is alternative two-step PCI initialization step - the first step enables the device just enough for initialization/reset and the second one enables full access. We're doing part of it already for bus master. I'm proposing to expand that approach and make them handled by generic PCI layer. As you can see, it doesn't add noticeable complexity to drivers. I think it's even clearer than doing pci_set_master() explicitly. If this way of solving the problem is chosen, eventually most drivers should be converted to new initialization steps. And there is no way to do this without modifying low level driver. Only low level driver knows when full blown access can be enabled and such thing must happen before registering the device to upper layer (e.g. ATA/SCSI, netif). sky2/skge aren't exceptions. If this way of solving the problem is chosen, eventually most if not all drivers should be converted to new model. It may take two years, maybe five, but as a start just converting ATA and network drivers shouldn't take too long ...
On Thu, Mar 15, 2007 at 11:37:20AM +0900, Tejun Heo wrote: I don't like the idead of changing the driver API for PCI device setup. Agreed. ISTR this has been discussed before but don't recall the exact context. I'll try to find the previous thread. When I started the parisc port on 2.4 kernels, the policy was to leave all interrupts enabled even if no interrupt handler was registered. This is useful for debugging misconfigured IRQ routing. Did the policy already change or is this a proposal to change the policy? thanks, -
All this discussion is well and good, but I suspect there is a driver setup problem where the interrupt isn't being handled properly. Please retest with the latest version of skge driver (I just pushed patches to netdev about 2min ago). One patch changes to disable IRQ's from device for packets until device is brought up. -
This seems to be also common after kexec during kexec crashdumps I hope there aren't any new erratas triggered by this. Perhaps it would make sense to add some paranoia sleeps at least before touching other state? -Andi -
Do you mean between disabling IRQ mechanisms and enabling PCI device in pcim_prepare_device()? Thanks. -- tejun -
Yes. -Andi -
That's true. It happens very frequently in kdump case where underlying device can very well have pending interrupts while second kernel is booting. Currently we allow the kernel to disable that irq line and we boot the kernel with "irqpoll" so that device still operates in polling mode. But getting this fixed will help. If device interrupts are enabled only after driver has had a chance to reset the device, things will be better, at least from kdump perspective. Thanks Vivek -
pci_enable_device() doesn't deal with this; in most PCI setups I've seen, there is no control at PCI level over whether a device generates an interrupt on the bus. Certainly the memory and io command enables have no effect on the ability of the device to cause an interrupt. In most cases, only the driver of the device knows how to disable interrupts on any particular device. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: -
PCI grew an interrupt enable while you weren't looking: PCI_COMMAND_INTX_DISABLE No idea about ARM, but almost all PCI devices made in the past few years support that bit. Unless you are using a PCI Express device (maybe PCI-X too?), though, you cannot count on the bit's presence. It was added in PCI 2.3 I think. Older PCI devices certainly do not have this standardized bit. Jeff -
That's fine for devices which conform to the later PCI specs, but not No PCI device that I have has that bit - including the raid card I bought last year... In any case, relying on such a new control bit to implement this kind of functionality would result in a very hit and miss result; Linux tends to get used on things other than the bleeding edge of hardware technology. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: -
How does Tejun's patch or thesis rely on this new control bit? He explicitly mentions DISABLE_INTX variability... Jeff -
[cc'ing Andi, Hi!] Hello, Many recent ATA and network controllers do and most new ones will I don't think INTX_DISABLE is on the bleeding edge of hardware technology and many common cases will benefit from using it (just think about the number of newish notebook users). The problem with INTX_DISABLE is that there doesn't seem to be any way to tell whether writing to that bit is safe or not. You are right in that turning off IRQ mechanisms in pci_enable_device() doesn't fix all the problems as PCI-wise it only enables IO and memory address space access, but to some extent it does because in the arch code, it enables the IRQ line and the physical IRQ line might not be shared even if the final IRQ number is shared (Andi, am I correct)? Anyways, I think the proper solution is to make sure all generic IRQ controls including INTX turned off early in the boot during PCI subsystem initialization (ie. do the disable part of pcim_prepare_device() early in the boot before any IRQ line is requested) and let each driver enable after initialization as necessary and do similar things during resume. Note that drivers still need to be modified to signify when the device is initialized enough to enable IRQ, and bus mastering. We can also arch-dep IRQ enabling to the activation time. That will give us more protection even when INTX_DISABLE is not available. Thanks. -- tejun -
