It's been more than a week since -rc5, but I blame everybody (including
me) being away for Linux.conf.au and then me waiting for a few days
afterwards to let everybody sync up.
So there it is, -rc6, hopefully the last -rc of the series.
I'd like everybody to take a really good look at any regressions that
Adrian has been pointing out, and that very much includes the people who
reported them too, so that we can confirm whether they are still active
and relevant.
As to -rc6 itself: the bulk of it are the MTD updates (including a few new
drivers), and the POWER update (and the bulk of _that_ in terms of patch
size being defconfig updates ;)
But there's various random fixes in infiniband, DVB, network drivers,
scsi, usb, some filesystems (cifs, jffs2, nfs, ntfs, ocfs2) as well as
core networking too.
Oh, and KVM, of course.
And stuff I probably have already forgotten.
ShortLog appended.
Linus
---
Adrian Bunk (7):
[MTD] SSFDC must depend on BLOCK
[MTD] [NAND] rtc_from4.c: use lib/bitrev.c
[MTD] make drivers/mtd/cmdlinepart.c:mtdpart_setup() static
[SCSI] qla2xxx: make qla2x00_reg_remote_port() static
more ftape removal
[IRDA] vlsi_ir.{h,c}: remove kernel 2.4 code
[NET]: Process include/linux/if_{addr,link}.h with unifdef
Adrian Friedli (1):
HID: GEYSER4_ISO needs quirk
Adrian Hunter (2):
[MTD] OneNAND: Implement read-while-load
[MTD] OneNAND: Handle DDP chip boundary during read-while-load
Akinobu Mita (2):
[JFFS2] Use rb_first() and rb_last() cleanup
[SCSI] iscsi: fix crypto_alloc_hash() error check
Al Viro (5):
funsoft: ktermios fix
horizon.c: missing __devinit
s2io bogus memset
fix prototype of csum_ipv6_magic() (ia64)
s2io bogus memset
Alan Cox (1):
[MTD] MAPS: esb2rom: use hotplug safe interfaces
Alexey Dobriyan (2):
[MTD] JEDEC probe: fix comment typo (devic)
[MIPS] There is no __GNUC_MAJOR__
Amit Choudha...
ata_piix survives exactly one suspend/resume cycle. After resuming the second time the disk is no longer usable. After the first resume a simple "emacs -nw bla.txt" already takes ~45 sec to launch, but there are no kernel messages. During the second resume the ATA interrupt gets disabled due to an unhandled interrupt. This is 100% reproducible, so I can provide as much info as needed.
tglx

Boot:
SCSI subsystem initialized
libata version 2.00 loaded.
ata_piix 0000:00:1f.2: version 2.00ac7
ata_piix 0000:00:1f.2: MAP [ P0 P2 XX XX ]
ata_piix 0000:00:1f.2: invalid MAP value 0
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 22 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0x18D0 ctl 0x18C6 bmdma 0x18B0 irq 21
ata2: SATA max UDMA/133 cmd 0x18C8 ctl 0x18C2 bmdma 0x18B8 irq 21
scsi0 : ata_piix
PM: Adding info for No Bus:host0
ata1.00: ATA-7, max UDMA/133, 195371568 sectors: LBA48 NCQ (depth 0/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : ata_piix
PM: Adding info for No Bus:host1
scsi 0:0:0:0: Direct-Access ATA ST9100824AS 3.14 PQ: 0 ANSI: 5
PM: Adding info for scsi:0:0:0:0
SCSI device sda: 195371568 512-byte hdwr sectors (100030 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
SCSI device sda: 195371568 512-byte hdwr sectors (100030 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 < sda5 > sda3
sd 0:0:0:0: Attached scsi disk sda

1st Suspend:
ata_piix 0000:00:1f.2: suspend
ACPI: PCI interrupt for device 0000:00:1f.2 disabled
PIIX_IDE 0000:00:1f.1: suspend
....
PIIX_IDE 0000:00:1f.1: LATE suspend

1st Resume:
ata1.00: configured for UDMA/133
SCSI device sda: 195371568 512-byte hdwr sectors (100030 MB)
sda: Write Protect is off
sda: Mod...
Is this a regression, or behavior that's always been present? If it's a regression, what changeset caused the problem? Jeff -
Hey. I just discovered that crap. I'm going to bisect tomorrow. Bed time here in good old Europe. :) tglx -
It seems to be there in 2.6.18 already, although it takes more suspend/resume cycles to show up. So it's just the surfacing of some longer standing problem. Just went unnoticed. tglx -
Reverting commit 44ade178249fe53d055fd92113eaa271e06acddd (sky2: power management/MSI workaround) makes the problem go away. With the commit it breaks sky2 resume on my laptop:

1. request_irq in early resume is triggering:
BUG: sleeping function called from invalid context at /home/tglx/work/kernel/vanilla/linux-2.6/mm/slab.c:3034
This is easily resolvable by moving the request_irq into the normal resume path. There is no need to have this in early resume.

2. The network device is unusable after resume. The only way to resurrect it is rmmod/insmod. The reason is that the driver grabs the normal PCI irq on resume, but the pci express resume routes it away. All we get is an unhandled spurious interrupt on the irq line which was used by the net device before suspend:
irq 219, desc: c045bb80, depth: 0, count: 9607, unhandled: 0
->handle_irq(): c0155c20, handle_bad_irq+0x0/0x1f0
->chip(): c0418920, no_irq_chip+0x0/0x40
->action(): 00000000
IRQ_DISABLED set
unexpected IRQ trap at vector db

tglx -
Does this fix it?
---
drivers/net/sky2.c | 43 ++++++++++++++++++-------------------------
1 file changed, 18 insertions(+), 25 deletions(-)
--- sky2-2.6.orig/drivers/net/sky2.c 2007-01-29 10:05:12.000000000 -0800
+++ sky2-2.6/drivers/net/sky2.c 2007-01-29 10:29:56.000000000 -0800
@@ -3675,6 +3675,12 @@
sky2_write32(hw, B0_IMSK, 0);
sky2_power_aux(hw);
+ /* Turn off IRQ to avoid power management bug (see resume) */
+ if (hw->msi) {
+ free_irq(pdev->irq, hw);
+ pci_disable_msi(pdev);
+ }
+
pci_save_state(pdev);
pci_enable_wake(pdev, pci_choose_state(pdev, state), wol);
pci_set_power_state(pdev, pci_choose_state(pdev, state));
@@ -3700,6 +3706,18 @@
sky2_write32(hw, B0_IMSK, Y2_IS_BASE);
+ /* Can't re-enable MSI because kernel resume ordering is broken
+ * and calls device resume before ACPI (BIOS) is called.
+ * BIOS then resets device to INTx!
+ */
+ if (hw->msi) {
+ err = request_irq(pdev->irq, sky2_intr, IRQF_SHARED,
+ hw->dev[0]->name, hw);
+ if (err)
+ goto out;
+ hw->msi = 0;
+ }
+
for (i = 0; i < hw->ports; i++) {
struct net_device *dev = hw->dev[i];
if (netif_running(dev)) {
@@ -3721,29 +3739,6 @@
pci_disable_device(pdev);
return err;
}
-
-/* BIOS resume runs after device (it's a bug in PM)
- * as a temporary workaround on suspend/resume leave MSI disabled
- */
-static int sky2_suspend_late(struct pci_dev *pdev, pm_message_t state)
-{
- struct sky2_hw *hw = pci_get_drvdata(pdev);
-
- free_irq(pdev->irq, hw);
- if (hw->msi) {
- pci_disable_msi(pdev);
- hw->msi = 0;
- }
- return 0;
-}
-
-static int sky2_resume_early(struct pci_dev *pdev)
-{
- struct sky2_hw *hw = pci_get_drvdata(pdev);
- struct net_device *dev = hw->dev[0];
-
- return request_irq(pdev->irq, sky2_intr, IRQF_SHARED, dev->name, hw);
-}
#endif
static void sky2_shutdown(struct pci_dev *pdev)
@@ -3783,8 +3778,6 @@
#ifdef CONFIG_PM
.suspend = sky2_suspend,
.resume = sky2...
patching file drivers/net/sky2.c
Hunk #1 FAILED at 3675.
Hunk #2 succeeded at 3625 (offset -81 lines).
Hunk #3 succeeded at 3738 with fuzz 1 (offset -1 lines).
Hunk #4 succeeded at 3668 with fuzz 2 (offset -110 lines).
1 out of 4 hunks FAILED -- saving rejects to file drivers/net/sky2.c.rej
# grep -c sky2_power_aux drivers/net/sky2.c
0
Shrug. tglx -
On Mon, 29 Jan 2007 21:10:30 +0100
Sorry it was against the last patch I sent to Jeff for netdev.
Here is against 2.6.20-rc6
---
drivers/net/sky2.c | 43 ++++++++++++++++++-------------------------
1 files changed, 18 insertions(+), 25 deletions(-)
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index a2e804d..d85de63 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -3598,6 +3598,12 @@ static int sky2_suspend(struct pci_dev *
}
}
+ /* Turn off IRQ to avoid power management bug (see resume) */
+ if (hw->msi) {
+ free_irq(pdev->irq, hw);
+ pci_disable_msi(pdev);
+ }
+
sky2_write32(hw, B0_IMSK, 0);
pci_save_state(pdev);
sky2_set_power_state(hw, pstate);
@@ -3619,6 +3625,18 @@ static int sky2_resume(struct pci_dev *p
sky2_write32(hw, B0_IMSK, Y2_IS_BASE);
+ /* Can't re-enable MSI because kernel resume ordering is broken
+ * and calls device resume before ACPI (BIOS) is called.
+ * BIOS then resets device to INTx!
+ */
+ if (hw->msi) {
+ err = request_irq(pdev->irq, sky2_intr, IRQF_SHARED,
+ hw->dev[0]->name, hw);
+ if (err)
+ goto out;
+ hw->msi = 0;
+ }
+
for (i = 0; i < hw->ports; i++) {
struct net_device *dev = hw->dev[i];
if (netif_running(dev)) {
@@ -3639,29 +3657,6 @@ static int sky2_resume(struct pci_dev *p
out:
return err;
}
-
-/* BIOS resume runs after device (it's a bug in PM)
- * as a temporary workaround on suspend/resume leave MSI disabled
- */
-static int sky2_suspend_late(struct pci_dev *pdev, pm_message_t state)
-{
- struct sky2_hw *hw = pci_get_drvdata(pdev);
-
- free_irq(pdev->irq, hw);
- if (hw->msi) {
- pci_disable_msi(pdev);
- hw->msi = 0;
- }
- return 0;
-}
-
-static int sky2_resume_early(struct pci_dev *pdev)
-{
- struct sky2_hw *hw = pci_get_drvdata(pdev);
- struct net_device *dev = hw->dev[0];
-
- return request_irq(pdev->irq, sky2_intr, IRQF_SHARED, dev->name, hw);
-}
#endif
static struct pci_driver...
Still the same problem. The only difference between this patch and the previous version is that the unhandled interrupt message is gone. As I said before: Reverting commit 44ade178249fe53d055fd92113eaa271e06acddd, which added this hackery in the first place, makes the device survive suspend/resume. tglx -
I see the same symptoms on my Intel Mac Mini, and reverting the commit also allows the driver to seemingly resume correctly. However after coming out of sleep I need to reconfigure the network interface. No need to rmmod/insmod, just ifdown/ifup is sufficient (but of course shouldn't be necessary, should it?). If I don't reconfigure it, ping from/to the box will work, but nothing more complicated like ssh will go through. Fred. -
That's probably a userspace problem. Are you using DHCP ? tglx -
Yep DHCP. Is that a known issue? I never had to reconfigure with older kernels. Fred. -
Is dhclient running after resume ? What's the output of ifconfig (before you do ifdown/up) ? Have you checked the syslog ? tglx -
The process is of course in the process list, if that's what you mean by
The output is always the same modulo the transmitted packet numbers:
eth0 Link encap:Ethernet HWaddr 00:16:CB:A2:E4:43
inet addr:192.168.0.101 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::216:cbff:fea2:e443/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:269 errors:0 dropped:0 overruns:0 frame:0
TX packets:57 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:72528 (70.8 KiB) TX bytes:7900 (7.7 KiB)
Interrupt:17
Yes of course. Nothing interesting.
Fred.
-Just got the same issue on one of my test boxen. Different network card though. The interface comes up fine, but DNS is not working. ifdown/up resolves it. /me keeps an eye on that. tglx -
The Sony VAIO BIOS resets to INTx on resume. This happens
after device resume, so device irq's get misrouted.
This hack turns off MSI on this laptop, until power management
initialization order is fixed.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
---
drivers/pci/quirks.c | 32 ++++++++++++++++++++++++++++++++
1 files changed, 32 insertions(+), 0 deletions(-)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index ef882a8..9a64179 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -21,6 +21,7 @@ #include <linux/pci.h>
#include <linux/init.h>
#include <linux/delay.h>
#include <linux/acpi.h>
+#include <linux/dmi.h>
#include "pci.h"
/* The Mellanox Tavor device gives false positive parity errors
@@ -1779,6 +1780,37 @@ static void __devinit quirk_nvidia_ck804
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_DEVICE_ID_NVIDIA_CK804_PCIE,
quirk_nvidia_ck804_msi_ht_cap);
+
+/* On Sony VAIO laptop, BIOS resets MSI during resume. */
+static __initdata struct dmi_system_id sony_dmi_table[] = {
+ {
+ .ident = "Sony Vaio",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "PCG-"),
+ },
+ },
+ {
+ .ident = "Sony Vaio",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "VGN-"),
+ },
+ },
+ { }
+};
+
+static void __init quirk_sony_msi(struct pci_dev *dev)
+{
+ if (!dmi_check_system(sony_dmi_table))
+ return;
+
+ pci_msi_quirk = 1;
+ printk(KERN_WARNING "PCI: MSI sony quirk detected. pci_msi_quirk set.\n");
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801BA_6,
+ quirk_sony_msi);
+
#endif /* CONFIG_PCI_MSI */
EXPORT_SYMBOL(pcie_mch_quirk);
--
1.4.1
-Err? My Sony VAIO does _NOT_ do that. It works fine without that. It's just the sky2 hackery which fucked up things. tglx -
Still it stands: Your sky2 patch #44ade178249fe53d055fd92113eaa271e06acddd is broken. Just get it. tglx -
On Tue, 30 Jan 2007 01:22:54 +0100 What machine and BIOS version? -- Stephen Hemminger <shemminger@linux-foundation.org> -
VGN-SZ2XP_C BIOS: R0081N0 tglx -
On Tue, 30 Jan 2007 01:31:33 +0100 Mine is: VGN-N170G BIOS: R0020J4 It might be BIOS bug that has been fixed, but updating the BIOS requires Windows. It checks for some ID so even Wine won't work. -- Stephen Hemminger <shemminger@linux-foundation.org> -
I suspect some BIOSes do *not* screw up the MSI thing on resume, and others do. I would suggest that the real fix is to not do that kind of hackery at suspend/resume time (because we can't know what the heck the BIOS does), and instead just do one of two cases: - since MSI is known to be broken for the sky2 driver due to firmware bugs, just disable it by default if CONFIG_PM is enabled. The advantages of MSI just aren't all that compelling. Possibly add a command line option to force MSI to be enabled regardless. Simple, direct, and should work for everybody. - Just add a command line to disable MSI for people that it breaks for. I don't actually like this one. It defaults to the unsafe behaviour, and while that makes sense in a "well, your machine is broken anyway" kind of way, the thing is, the advantages of MSI just aren't big enough to warrant defaulting to a known-unsafe thing, even if only a small percentage of machines are affected. With _eventually_ maybe having a third possible situation: - some way of figuring it out dynamically. The third case doesn't seem to be very likely in the short term, though, which is why I'd suggest one of the first two (the first one being probably the best one). Comments? Linus -
commit 44ade178249fe53d055fd92113eaa271e06acddd breaks sane
MSI/ACPI/BIOS combinations. It's impossible to keep broken and sane
MSI/ACPI/BIOSes happy at the same time.
Revert the patch and disable MSI for sky2 when CONFIG_PM is enabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index a2e804d..420fef7 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -91,7 +91,11 @@ static int copybreak __read_mostly = 128;
module_param(copybreak, int, 0);
MODULE_PARM_DESC(copybreak, "Receive copy threshold");
+#ifdef CONFIG_PM
+static int disable_msi = 1;
+#else
static int disable_msi = 0;
+#endif
module_param(disable_msi, int, 0);
MODULE_PARM_DESC(disable_msi, "Disable Message Signaled Interrupt (MSI)");
@@ -3601,6 +3605,7 @@ static int sky2_suspend(struct pci_dev *pdev, pm_message_t state)
sky2_write32(hw, B0_IMSK, 0);
pci_save_state(pdev);
sky2_set_power_state(hw, pstate);
+
return 0;
}
@@ -3640,28 +3645,6 @@ out:
return err;
}
-/* BIOS resume runs after device (it's a bug in PM)
- * as a temporary workaround on suspend/resume leave MSI disabled
- */
-static int sky2_suspend_late(struct pci_dev *pdev, pm_message_t state)
-{
- struct sky2_hw *hw = pci_get_drvdata(pdev);
-
- free_irq(pdev->irq, hw);
- if (hw->msi) {
- pci_disable_msi(pdev);
- hw->msi = 0;
- }
- return 0;
-}
-
-static int sky2_resume_early(struct pci_dev *pdev)
-{
- struct sky2_hw *hw = pci_get_drvdata(pdev);
- struct net_device *dev = hw->dev[0];
-
- return request_irq(pdev->irq, sky2_intr, IRQF_SHARED, dev->name, hw);
-}
#endif
static struct pci_driver sky2_driver = {
@@ -3672,8 +3655,6 @@ static struct pci_driver sky2_driver = {
#ifdef CONFIG_PM
.suspend = sky2_suspend,
.resume = sky2_resume,
- .suspend_late = sky2_suspend_late,
- .resume_early = sky2_resume_early,
#endif
};
-On Mon, 29 Jan 2007 14:37:23 -0800 (PST) MSI works fine for almost all systems (except AMD systems where Module option out already exists. -
Why do you ignore reality? MSI does *not* work fine, exactly because the firmware screws it up. The fact that on a "hardware level" it may work is totally irrelevant. The *only* thing that matters is what people actually see. "Positivism" may not be a hot philosophy these days any more, but dang, it certainly is better than what you seem to espouse: "in theory things work fine". And if you don't like positivism, how about just simple scientific method: a theory is *proven*wrong* by a single observation to the opposite. And we have several people standing up saying that your theory is wrong. Linus -
On Mon, 29 Jan 2007 15:04:06 -0800 (PST) Why do you insist on maintaining the wrong initialization order on resume? When I raised the issue, Len brought up that the resume order did not match spec, but then there has been slow progress in fixing it (it's buried in -mm tree). -- Stephen Hemminger <shemminger@linux-foundation.org> -
It's not getting merged, SINCE IT DOESN'T WORK. It causes all sorts of problems, because ACPI requires all kinds of things to be up and running in order to actually work, and that in turn breaks all the devices that have different ordering constraints. ACPI is a piece of sh*t. It asks the OS to do impossible things, like running it early in the config sequence when it then at the same time wants to depend on stuff that are there *late* in the sequence. It's not the first time this insane situation has happened, either. But we'll try to merge the patch that totally switches around the whole initialization order hopefully early after 2.6.20. But no way in hell do we do it now, and I personally suspect we'll end reverting it when we do try it just because it will probably break other things. But we'll see. In the meantime, sky2 doesn't work with MSI. Linus -
And it will not be the last:-)
There are really two cases, one is easy, one hard:
1. The ACPI spec, our knowledge of the HW, and talking to our own BIOS
folks tell us quite a bit about how things are supposed to work.
2. "Windows Bug Compatibility" (tm)
When OEMs build systems and test them only with Windows, then
the implementation quirks of Windows get ingrained in the platforms.
Linux then tries to run on the same platform and wonders why
the BIOS does "unusual" things. The answer is because it has been
only tested on Windows and BIOS quirks slip through Windows testing.
To be fair, the exact same thing would happen in reverse to Windows
if vendors only tested with Linux.
http://www.linuxfirmwarekit.org/ is intended to help mitigate some of this
problem. So at least vendors that care about Linux can make sure that
they minimize the curve balls they throw us.
An example of a recent curve ball is when the BIOS supplies two APIC (MADT)
tables. Well, the spec says there should be only one... We have proof
that Windows doesn't use the 1st for enumerating processors because
Windows works on a box with a garbled 1st table.
If we prove that Windows doesn't use the second either then it means
they enumerate processors via the DSDT -- which means bringing up
the ACPI interpreter before bringing up SMP -- and that would require
I agree with this plan, and I concur with your outlook.
I think Rafael is holding the ball here as we wait for an SMP-safe freezer:
http://lists.osdl.org/pipermail/linux-pm/2006-December/004233.html
cheers,
-Len
-Well, as we can do cpu hotplug these days... we could do this. Just boot up with single cpu, then bring up additional cpus at runtime... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Hi, Well, no longer. :-) The freezer in 2.6.20-rc6 should be SMP-safe and the patches to change the suspend-resume code ordering are in -mm: pm-change-code-ordering-in-mainc.patch swsusp-change-code-ordering-in-diskc.patch swsusp-change-code-order-in-diskc-fix.patch swsusp-change-code-ordering-in-userc.patch swsusp-change-code-ordering-in-userc-sanity.patch swsusp-change-pm_ops-handling-by-userland-interface.patch I have no problems whatsoever with these patches on SMP boxes and if anyone has, please let me know. Greetings, Rafael -- If you don't have the time to read, you don't have the time or the tools to write. - Stephen King -
Hi. I've been running an SMP box here with the matching changes for Suspend2, with no problems. I believe the algorithm looks good. Regards, Nigel -
On Mon, 29 Jan 2007 16:12:27 -0800 (PST) On one and only one platform. It works fine on others. Don't blame the driver, stop it in PCI. -- Stephen Hemminger <shemminger@linux-foundation.org> -
How sure are you that it's only those Sony laptops? Linus -
i'm wondering, could we go with Thomas' temporary patch that disables sky2 MSI if CONFIG_PM is enabled - we could revert that after 2.6.20. It's not like MSI is a life and death feature. On IO-APIC systems vectors are abundant and in any case we share irqs just fine. The true advantage of MSI is minimal. (MSI-X has the potential to be better by being message based, but in reality it still goes through the full IRQ layer.) MSI might be useful on really, really large systems - but i really hope those really large systems dont rely on CONFIG_PM. Meanwhile Thomas' patch maximizes the amount of working hardware (it has the chance to produce working systems in 100% of the cases) - which is a few orders of magnitude more important than IRQ management micro-costs. Am i missing anything? Ingo -
Sharing irqs /sucks/. I routinely have to fight a USB device dying, because the ATA device is causing an interrupt storm, or vice versa. /Very/ common headache. Other than that, they use a tiny bit fewer CPU cycles, and allow simplification of the interrupt handler (saving another few CPU cycles). The biggest benefit is (a) for hardware designers, where MSI means a cleaner h/w design, and (b) preparation of drivers and the kernel systems for MSI-only hardware. At present only high end hardware is MSI-only (like infiniband), but that's the future direction. Jeff -
Hi,
TEST_UNIT_READY in get_capabilities (drivers/scsi/sr.c line 743, or
see below) always returns error.
---------------- code begin -----------------------------
retries = 0;
do {
memset((void *)cmd, 0, MAX_COMMAND_SIZE);
cmd[0] = TEST_UNIT_READY;
the_result = scsi_execute_req (cd->device, cmd, DMA_NONE, NULL,
0, &sshdr, SR_TIMEOUT,
MAX_RETRIES);
retries++;
} while (retries < 5 &&
(!scsi_status_is_good(the_result) ||
(scsi_sense_valid(&sshdr) &&
sshdr.sense_key == UNIT_ATTENTION)));
---------------- code end -----------------------------
I debugged all kernel versions from 2.6.17 to 2.6.20 on several AMD
and other vendor's PATA/IDE controllers, and I get the_result==0x8000002
and retries==5; on silicon image 3132, i get the_result=0x2eb.
Does 0x8000002 mean ((DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION)?
what's wrong?
Conke
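The arithmetic in the question above can be checked outside the kernel. The sketch below assumes the usual 2.6-era layout of the SCSI result word (status in bits 0-7, message byte in bits 8-15, host byte in bits 16-23, driver byte in bits 24-31) and the `DRIVER_SENSE` and `SAM_STAT_CHECK_CONDITION` values from `include/scsi/scsi.h`. The struct and the function name `split_scsi_result` are hypothetical helpers for illustration, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as found in 2.6-era include/scsi/scsi.h (copied here so the
 * sketch builds in user space). */
#define SAM_STAT_CHECK_CONDITION 0x02
#define DRIVER_SENSE             0x08

/* Hypothetical helper type: the four bytes packed into a SCSI result. */
struct scsi_result_fields {
    uint8_t status;  /* bits 0-7:   SAM status byte       */
    uint8_t msg;     /* bits 8-15:  message byte          */
    uint8_t host;    /* bits 16-23: host (LLD) byte       */
    uint8_t driver;  /* bits 24-31: driver (mid-layer) byte */
};

/* Split a 32-bit result word into its four component bytes. */
static void split_scsi_result(uint32_t result, struct scsi_result_fields *out)
{
    assert(out != NULL);
    out->status = (uint8_t)(result & 0xff);
    out->msg    = (uint8_t)((result >> 8) & 0xff);
    out->host   = (uint8_t)((result >> 16) & 0xff);
    out->driver = (uint8_t)((result >> 24) & 0xff);
}
```

Under those assumptions, 0x8000002 splits into driver byte 0x08 and status byte 0x02 with the host and message bytes both zero, which matches the `(DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION` reading: the command completed with CHECK CONDITION and valid sense data is attached.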
-What does the sense data returned in the sense buffer say is wrong? Jeff -
I dump scsi_sense_hdr as follows:
sshdr.response_code = 0x70
sshdr.sense_key = 0x2
sshdr.asc = 0x3a
sshdr.ascq = 0x1
sshdr.additional_length = 0x0
The sense_key is 0x2 (NOT_READY), not the expected UNIT_ATTENTION :(
BTW, I am sorry for a mistake: Sil3132 also returns 0x8000002, not 0x2eb as I said in the first mail. In short, all cases return "the_result" as 0x8000002. Conke -
the bytes 0 ~ 13 in sense buffer are: 70 00 02 00 00 00 00 0a 00 00 00 00 3a other bytes are all 0x00; in fact this issue can be reproduced in any libata driver, either sata or pata. Conke -
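For reference, a minimal user-space sketch of how a fixed-format sense buffer like the one above is read: response code in byte 0, sense key in the low nibble of byte 2, ASC in byte 12, ASCQ in byte 13. The type and function names are hypothetical, and byte 13 is taken to be zero only because the message says the unlisted bytes are all 0x00.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sense key values as in 2.6-era include/scsi/scsi.h. */
#define NOT_READY      0x02
#define UNIT_ATTENTION 0x06

/* Hypothetical summary of the fields of interest. */
struct sense_summary {
    uint8_t response_code;
    uint8_t sense_key;
    uint8_t asc;
    uint8_t ascq;
};

/* Parse fixed-format sense data (response code 0x70 or 0x71).
 * Returns 0 on success, -1 if the buffer is too short or is not
 * in fixed format. */
static int parse_fixed_sense(const uint8_t *buf, size_t len,
                             struct sense_summary *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 14)
        return -1;
    out->response_code = buf[0] & 0x7f;
    if (out->response_code != 0x70 && out->response_code != 0x71)
        return -1;
    out->sense_key = buf[2] & 0x0f;
    out->asc = buf[12];
    out->ascq = buf[13];
    return 0;
}
```

Fed the bytes quoted above, this reports sense key NOT_READY with ASC 0x3a, which the SCSI additional sense code tables list as "medium not present". If that reading is right, the drive is reporting an empty tray rather than a transport error, and the loop never sees UNIT_ATTENTION because none is pending.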
[resend] any suggestion ? -
btw., MSI is not really needed to avoid the sharing of irqs: x86 has 224 IRQ vectors which is abundant for all but the largest boxes. Even the smallest laptop tends to have an IO-APIC with at least 24 pins - which is enough to never have to share irqs. How system designers can still end up with mapping so many devices to the same pin is really their fault. so MSI's only true accomplishment AFAICS is that it now says on the hardware level that "you must not share IRQs". Well, doh... Ingo -
Yeah. Admittedly, ATA is very special because it is still edge-triggered most of the time (for legacy reasons):

14: 389907 0 IO-APIC-edge ide0

so if it shares an irq with a device that has level-triggered assumptions, those two dont intermix very well. That's why i have the delayed-disable patches (see the two patches below), which will unify the two methods, and the irq flow handling method will be mostly a 'performance hint' not a correctness issue. This has been in -rt for quite a few weeks now and it works well.

btw., it would be great if you could help us here: could you perhaps, from a past example, outline a specific case of such an ATA/USB IRQ storm and how it occurred (precisely) - and what the fix was? I'd like to analyze a specific case to make sure the genirq layer recovers from such cases more gracefully.

In general, i think the IRQ subsystem needs to become more failure-resilient and needs to become more auto-learning (and these two dont stand in the way of good performance). This problem of shared IRQs will be with us for at least another 10 years, if not more. (for example ISA is /still/ not dead everywhere and it was already legacy technology 15 years ago when Linux was started.)

Ingo

------------------->
Subject: irq: do not mask interrupts by default
From: Ingo Molnar <mingo@elte.hu>

never mask interrupts immediately upon request. Disabling interrupts in high-performance codepaths is rare, and on the other hand this change could recover lost edges (or even other types of lost interrupts) by conservatively only masking interrupts after they happen. (NOTE: with this change the highlevel irq-disable code still soft-disables this IRQ line - and if such an interrupt happens then the IRQ flow handler keeps the IRQ masked.)

mark i8259A controllers as 'never loses an edge'.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/i386/kernel/i8259.c | 1 +
arch/x86_64/kernel/i8259.c | 1 +
...
Easy to name an example, as they are pretty generic. When sharing irqs -- usually ATA is configured to PCI native (IO-APIC-fasteoi) -- any interrupt storm causes the other devices sharing that irq to crap themselves (kernel turns off irq, suggests irqpoll, etc.) ATA is unfortunately easier to cause interrupt storms than most because the standard PCI IDE definition has __no__ possible way to indicate certain interrupt conditions are pending. You have to /know/ that you are expecting an interrupt, which causes problems if the hardware decides to send the interrupt early or late, rather than when its expected. Most modern hardware has a read/write/clear interrupt status register that gives you an immediate summary of the pending interrupt conditions, and an easy way to ack the pending events. ATA does not have any such capability. That said, stuff like AHCI or sata_sil or sata_sil24 do have modern designs with the expected interrupt status register(s), so they do not suffer from the problems suffered by the more legacy-like hardware (ata_piix, sata_via, pata_*) Jeff -
ok. Can you suggest any way for me to reproduce such a bug artificially on a test system? [i have both old and new systems, so if you can think of a way for me to trigger this i'd be happy to try] I /think/ my two patches should automatically avoid the 'cap themselves' effect you outlined: the absolutely worst case should be that we'll have twice the IRQ rate of the optimal one - but no irq storm nor lost interrupts should happen due to irq trigger type mismatches, ever - as long as the basic mapping of device to IRQ is correct. [ I tried to push to include this in v2.6.20 but i lost that argument ;-) ] Ingo -
Should be pretty easy. With either the old-IDE driver or libata, complete a command without acknowledging an interrupt. For libata, that means poking around in ata_host_intr() and avoiding well-built hardware like AHCI. Anything that uses ata_piix driver, basically all Intel machines, should be applicable in the "not well built" category... :) Jeff -
ok, here's one victi^H^H^H^H testbox that seems to match your description: 18: 3 0 IO-APIC-fasteoi uhci_hcd:usb3, ohci1394 19: 2413090 0 IO-APIC-fasteoi uhci_hcd:usb2, libata 22: 168 0 IO-APIC-fasteoi HDA Intel 23: 0 0 IO-APIC-fasteoi uhci_hcd:usb1, ehci_hcd:usb5 so i should try to generate some missing ACK [this meaning a missing driver-level ack, right?] on IRQ#19's libata handler - and i should expect a screaming interrupt? Or non-working USB? Or both? [ i can hunt for other hardware if this doesnt look broken enough to you :-) ] Ingo -
Yep, that's a good candidate for such experiments :) Jeff -
Happens to be the same thing, which causes a stale interrupt on the second suspend/resume cycle. tglx -
-
On Mon, 29 Jan 2007 16:25:48 -0800 (PST) I do not underestimate the ability of BIOS writers to screw things up. -- Stephen Hemminger <shemminger@linux-foundation.org> -
On Mon, 29 Jan 2007 23:23:21 +0100 But the fix is necessary on laptops where ACPI messes with MSI/INTx on resume. -- Stephen Hemminger <shemminger@linux-foundation.org> -
And the fix is unnecessary and counter productive on laptops, where ACPI does the right thing. tglx -
2.6.20-rc6-git (today) on a dual core laptop:
PM: Preparing system for mem sleep
Disabling non-boot CPUs ...
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.20-rc6 #3
-------------------------------------------------------
pm-suspend/3601 is trying to acquire lock:
(cpu_bitmask_lock){--..}, at: [<c032cd2b>] mutex_lock+0x1c/0x1f
but task is already holding lock:
(workqueue_mutex){--..}, at: [<c032cd2b>] mutex_lock+0x1c/0x1f
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (workqueue_mutex){--..}:
[<c0140880>] __lock_acquire+0x8dd/0xa04
[<c0140c90>] lock_acquire+0x56/0x6f
[<c032cb80>] __mutex_lock_slowpath+0xe5/0x274
[<c032cd2b>] mutex_lock+0x1c/0x1f
[<c0136d14>] __create_workqueue+0x61/0x136
[<f8bfe62e>] cpufreq_governor_dbs+0xa1/0x30e [cpufreq_ondemand]
[<c02b2c3c>] __cpufreq_governor+0x9e/0xd2
[<c02b2df7>] __cpufreq_set_policy+0x187/0x209
[<c02b3056>] store_scaling_governor+0x164/0x1b1
[<c02b24f9>] store+0x37/0x48
[<c01aeb8d>] sysfs_write_file+0xb3/0xdb
[<c0175e0f>] vfs_write+0xaf/0x163
[<c017645d>] sys_write+0x3d/0x61
[<c0103f8c>] sysenter_past_esp+0x5d/0x99
[<ffffffff>] 0xffffffff
-> #2 (dbs_mutex){--..}:
[<c0140880>] __lock_acquire+0x8dd/0xa04
[<c0140c90>] lock_acquire+0x56/0x6f
[<c032cb80>] __mutex_lock_slowpath+0xe5/0x274
[<c032cd2b>] mutex_lock+0x1c/0x1f
[<f8bfe612>] cpufreq_governor_dbs+0x85/0x30e [cpufreq_ondemand]
[<c02b2c3c>] __cpufreq_governor+0x9e/0xd2
[<c02b2df7>] __cpufreq_set_policy+0x187/0x209
[<c02b3056>] store_scaling_governor+0x164/0x1b1
[<c02b24f9>] store+0x37/0x48
[<c01aeb8d>] sy...
This email lists some known regressions in 2.6.20-rc6 compared to 2.6.19 with patches available.

If you find your name in the Cc header, you are either submitter of one of the bugs, maintainer of an affected subsystem or driver, a patch of yours caused a breakage or I'm considering you in any other way possibly involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.

Subject    : MIPS Malta: CONFIG_MTD=n compile error
References : http://lkml.org/lkml/2007/1/25/122
Submitter  : Jan Altenberg <jan@linutronix.de>
Caused-By  : Ralf Baechle <ralf@linux-mips.org>
             commit b228f4c54df37b53c6f364aa7f3efa4280bcc4f0
Handled-By : Jan Altenberg <jan@linutronix.de>
Patch      : http://lkml.org/lkml/2007/1/25/122
Status     : patch available

Subject    : NFS triggers WARN_ON() in invalidate_inode_pages2_range()
References : http://bugzilla.kernel.org/show_bug.cgi?id=7826
Submitter  : Andrew Clayton <andrew@digital-domain.net>
Caused-By  : Andrew Morton <akpm@osdl.org>
             commit 8258d4a574d3a8c01f0ef68aa26b969398a0e140
Handled-By : Trond Myklebust <trond.myklebust@fys.uio.no>
Patch      : http://lkml.org/lkml/2007/1/24/323
Status     : patch available
-
This email lists some known regressions in 2.6.20-rc6 compared to 2.6.19
that are not yet fixed in Linus' tree.
If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch of
yours caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.
Due to the large number of recipients, please trim the Cc when answering.
Subject    : problems with CD burning
References : http://www.spinics.net/lists/linux-ide/msg06545.html
Submitter  : Uwe Bugla <uwe.bugla@gmx.de>
Status     : unknown
Subject    : pktcdvd fails with pata_amd
References : http://bugzilla.kernel.org/show_bug.cgi?id=7810
             http://lkml.org/lkml/2007/1/25/128
Submitter  : Gerhard Dirschl <gd@spherenet.de>
Caused-By  : Christoph Hellwig <hch@lst.de>
             commit 3b00315799d78f76531b71435fbc2643cd71ae4c
             commit 406c9b605cbc45151c03ac9a3f95e9acf050808c
Status     : problem is being debugged
Subject    : powerpc64: performance monitor exception
References : http://ozlabs.org/pipermail/linuxppc-dev/2007-January/030045.html
Submitter  : Livio Soares <livio@eecg.toronto.edu>
Caused-By  : Paul Mackerras <paulus@samba.org>
             commit d04c56f73c30a5e593202ecfcf25ed43d42363a2
Status     : problem is being discussed
Subject    : BUG: at fs/inotify.c:172 set_dentry_child_flags()
References : http://bugzilla.kernel.org/show_bug.cgi?id=7785
Submitter  : Cijoml Cijomlovic Cijomlov <cijoml@volny.cz>
Handled-By : Nick Piggin <nickpiggin@yahoo.com.au>
Status     : problem is being debugged
-
-------- Original Message --------
Date: Sat, 27 Jan 2007 18:42:30 +0100
From: Adrian Bunk <bunk@stusta.de>
To: Linus Torvalds <torvalds@linux-foundation.org>, Andrew Morton <akpm@osdl.org>
Hi everybody,
the problem I already reported for earlier release candidates of kernel
2.6.20 (rc1 to rc5) unfortunately still persists.
The regression has become more severe: in earlier release candidates
nerolinux recognized my burning devices at least on the first start, and
never again on any later start. In rc6 the CD and DVD burning devices
aren't recognized even once, and the drive seek errors I already reported
are still there.
nerolinux runs excellently with kernel 2.6.19.2, but only shows an
"image recorder" (i.e. no burning device at all) in kernel 2.6.20-rc6.
I still hope that this terrible bug will not be part of the final version
of 2.6.20!
Regards
Uwe
P.S.: I already reported that 2.6.20-rc4-mm1 is not bootable at all.
-
FWIW, I just tried it with 2.6.20-rc6, and can confirm. Once nero is
run, the kernel never gives up retrying whatever command failed, so I
get...
[ 4362.972995] hdd: status error: status=0x58 { DriveReady SeekComplete
DataRequest }
[ 4362.981475] ide: failed opcode was: unknown
[ 4362.986183] hdd: drive not ready for command
endlessly.
-Mike
-
On Mon, 29 Jan 2007 07:26:03 +0100
Do you have time to bisect it?
-
Unfortunately, I'm git impaired. I am rummaging as we speak though.
-Mike
-
Ok, I'm personally heading to bed, but it really should be as simple as
- get the git tree in the first place
- do
    git bisect start
    git bisect good v2.6.19
    git bisect bad v2.6.20-rc2
    .. it will pick a point for you to try ..
    .. compile, boot, test ..
    "git bisect {good|bad}" depending on results
- until (found)
(Of course, you should check that -rc2 really is bad to make sure. I think
that's what Uwe reported, though. And I don't think we've done anything
after -rc2 that could impact this, so I don't doubt it).
Linus
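The loop described above is a binary search over the commits between a known-good and a known-bad point. A minimal sketch of that search, written in Python rather than with git itself, might look like the following. The flat `history` list and the `is_bad` callback are hypothetical stand-ins for the real commit history and for the "compile, boot, test" step; real `git bisect` works on a commit graph, not a simple list.

```python
from typing import Callable, Sequence, Tuple


def find_first_bad(commits: Sequence[str],
                   is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad commit, number of tests run).

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1   # indices of the known-good / known-bad ends
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2       # "it will pick a point for you to try"
        tests += 1
        if is_bad(commits[mid]):      # "compile, boot, test"
            bad = mid                 # corresponds to "git bisect bad"
        else:
            good = mid                # corresponds to "git bisect good"
    return commits[bad], tests


# Hypothetical linear history; the regression is introduced at "c6".
history = ["v2.6.19"] + [f"c{i}" for i in range(1, 10)] + ["v2.6.20-rc2"]
broken = set(history[history.index("c6"):])
first_bad, tests = find_first_bad(history, lambda c: c in broken)
print(first_bad, tests)
```

Each test halves the remaining range, which is why even a few thousand commits between two releases need only around a dozen build-and-boot cycles.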
-
This email lists some known regressions in 2.6.20-rc6 compared to 2.6.19
that are not yet fixed in Linus' tree.
If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch of
yours caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.
Due to the large number of recipients, please trim the Cc when answering.
Subject    : NULL pointer dereference at as_move_to_dispatch()
References : http://lkml.org/lkml/2007/1/22/141
Submitter  : Andrew Vasquez <andrew.vasquez@qlogic.com>
Status     : unknown
Subject    : reboot instead of powerdown (CONFIG_USB_SUSPEND)
References : http://lkml.org/lkml/2006/12/25/40
             http://bugzilla.kernel.org/show_bug.cgi?id=7828
Submitter  : Berthold Cogel <cogel@rrz.uni-koeln.de>
             François Valenduc <francois.valenduc@skynet.be>
Handled-By : Alan Stern <stern@rowland.harvard.edu>
Status     : problem is being debugged
Subject    : usb somehow broken (CONFIG_USB_SUSPEND)
References : http://lkml.org/lkml/2007/1/11/146
Submitter  : Prakash Punnoor <prakash@punnoor.de>
Handled-By : Oliver Neukum <oliver@neukum.org>
             Alan Stern <stern@rowland.harvard.edu>
Status     : problem is being debugged
Subject    : fix geode_configure()
References : http://lkml.org/lkml/2007/1/9/216
Submitter  : Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Caused-By  : takada <takada@mbf.nifty.com>
             commit e4f0ae0ea63caceff37a13f281a72652b7ea71ba
Handled-By : takada <takada@mbf.nifty.com>
             Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Status     : patches are being discussed
-
It booted nicely, and I have really enjoyed this.
I have one question which is still open (it seems to have been ignored or
missed by you guys):
migration_cost=33 for 2.6.20-rc5
migration_cost=159 for 2.6.20-rc6
~Akula2
-
Hi,
It doesn't build for me.
make O=/dir
[..]
security/built-in.o: In function `security_set_bools':
(.text+0x12471): undefined reference to `flow_cache_genid'
security/built-in.o: In function `security_load_policy':
(.text+0x128b3): undefined reference to `flow_cache_genid'
make[1]: *** [.tmp_vmlinux1] Error 1
make: *** [_all] Error 2
334c85569b8adeaa820c0f2fab3c8f0a9dc8b92e is first bad commit
commit 334c85569b8adeaa820c0f2fab3c8f0a9dc8b92e
Author: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Date: Mon Jan 15 16:38:45 2007 -0800
[SELINUX]: increment flow cache genid
Currently, old flow cache entries remain valid even after
a reload of SELinux policy.
This patch increments the flow cache generation id
on policy (re)loads so that flow cache entries are
revalidated as needed.
Thanks to Herbert Xu for pointing this out. See:
http://marc.theaimsgroup.com/?l=linux-netdev&m=116841378704536&w=2
There's also a general issue as well as a solution proposed
by David Miller for when flow_cache_genid wraps. I might be
submitting a separate patch for that later.
I request that this be applied to 2.6.20 since it's
a security relevant fix.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Regards,
Michal
--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/)
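The idea in the commit message quoted above, bumping a generation id so that cached entries are revalidated lazily, can be sketched as follows. This is an illustrative Python model and not the kernel's flow cache: the `GenCache` class, its methods, and the `resolve` callback are hypothetical names, and the wrap-around problem the commit message mentions is ignored here.

```python
from typing import Callable, Dict, Tuple


class GenCache:
    """Cache whose entries are valid only for the generation they were stored in."""

    def __init__(self, resolve: Callable[[str], str]) -> None:
        self.genid = 0                       # plays the role of flow_cache_genid
        self._resolve = resolve              # slow lookup, e.g. a policy decision
        self._entries: Dict[str, Tuple[int, str]] = {}
        self.misses = 0

    def lookup(self, key: str) -> str:
        entry = self._entries.get(key)
        if entry is not None and entry[0] == self.genid:
            return entry[1]                  # entry belongs to the current generation
        self.misses += 1                     # missing or stale: resolve again
        value = self._resolve(key)
        self._entries[key] = (self.genid, value)
        return value

    def policy_reload(self) -> None:
        self.genid += 1                      # marks every cached entry stale at once


policy = {"flow-a": "allow"}
cache = GenCache(lambda key: policy[key])
print(cache.lookup("flow-a"))   # resolved and cached
policy["flow-a"] = "deny"
print(cache.lookup("flow-a"))   # still the cached value: no reload yet
cache.policy_reload()
print(cache.lookup("flow-a"))   # revalidated against the new policy
```

Without the increment in `policy_reload`, the second behaviour would persist indefinitely, which is the stale-entry problem the commit message describes.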
-