Hi, This change causes attached drives to no longer be detected and function on the PowerPC Buffalo Linkstation machines: <7>pata_sil680 0000:00:0c.0: version 0.4.8 <6>sil680: 133MHz clock. <6>scsi0 : pata_sil680 <6>scsi1 : pata_sil680 <6>ata1: PATA max UDMA/133 irq 18 <6>ata2: PATA max UDMA/133 irq 18 If I roll back this driver to before this change with 2.6.24.2 it works: <7>pata_sil680 0000:00:0c.0: version 0.4.7 <6>sil680: 133MHz clock. <6>scsi0 : pata_sil680 <6>scsi1 : pata_sil680 <6>ata1: PATA max UDMA/133 cmd 0xbffed0 ctl 0xbffed8 bmdma 0xbffef0 irq 18 <6>ata2: PATA max UDMA/133 cmd 0xbffee0 ctl 0xbffee8 bmdma 0xbffef8 irq 18 <6>ata1.00: ATA-6: WDC WD3200JB-00KFA0, 08.05J08, max UDMA/100 <6>ata1.00: 625142448 sectors, multi 0: LBA48 <6>ata1.00: configured for UDMA/100 <5>scsi 0:0:0:0: Direct-Access ATA WDC WD3200JB-00K 08.0 PQ: 0 ANSI: 5 <5>sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB) <5>sd 0:0:0:0: [sda] Write Protect is off <7>sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 <5>sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA <5>sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB) <5>sd 0:0:0:0: [sda] Write Protect is off <7>sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 <5>sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA <6> sda: sda1 sda2 sda3 sda4 <5>sd 0:0:0:0: [sda] Attached SCSI disk <5>sd 0:0:0:0: Attached scsi generic sg0 type 0 I have ensured the other sil680 driver is not enabled! --
Hrm... and we need those patches for blades that have a sil680 controller with no working PIO on the PCI host... That's strange though. Somebody with knowledge of that HW (or specs) who can spot something ? Could it be an issue with timing ? I don't have HW access to this machine. If somebody could send one to me I could do more investigation. --
Ben, would an ssh access to such a machine and to a terminal server suffice? Thanks Guennadi --- Guennadi Liakhovetski --
If I can remote-reboot it, yes. Cheers, Ben. --
It says clearly in the code where to start. See the FIXME notes in both libata-sff and libata-core about MMIO. Neither the DMA transfer start or the probe SRST sequence are correct with MMIO posting and this hasn't been fixed as I pointed out was needed when I originally NAKked the change. Without those being fixed (especially SRST) on any device with heavy PCI posting of mmio your controller *wont work*. Alan --
The dbdma start is mostly harmless (things don't get posted for -that- long), though I suppose it's worth fixing. Would reading back dmactl do in that case or do you foresee any kind of side effect ? (Maybe only doing it for MMIO ?) As for SRST, I'm not totally confident how safe it is to read back there while doing the reset sequence, so I'm tempted to really only do it for MMIO and use altstat rather than ctl/stat (the later tends to have side effects which we don't want here). What do you think ? The main problem from here is that I don't know whether we are using MMIO or PIO from libata-core. Maybe I can add a host flag indicate that such flushing is needed ? In the meantime, Guennadi, can you check if that patch helps for you (to see if that is indeed the problem): diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 004dae4..1451a52 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -3461,10 +3461,13 @@ static int ata_bus_softreset(struct ata_port *ap, unsigned int devmask, /* software reset. causes dev0 to be selected */ iowrite8(ap->ctl, ioaddr->ctl_addr); + ioread16(ioaddr->nsect_addr); udelay(20); /* FIXME: flush */ iowrite8(ap->ctl | ATA_SRST, ioaddr->ctl_addr); + ioread16(ioaddr->nsect_addr); udelay(20); /* FIXME: flush */ iowrite8(ap->ctl, ioaddr->ctl_addr); + ioread16(ioaddr->nsect_addr); /* wait a while before checking status */ ata_wait_after_reset(ap, deadline); diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index 60cd4b1..81d5828 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c @@ -273,6 +273,7 @@ void ata_bmdma_start(struct ata_queued_cmd *qc) * FIXME: The posting of this write means I/O starts are * unneccessarily delayed for MMIO */ + ioread8(ap->ioaddr.bmdma_addr + ATA_DMA_CMD); } /** Cheers, Ben. --
Unfortunately this patch appears to give same result as in the original post. Guennadi and I are looking into arranging access to a device. Thanks! <7>pata_sil680 0000:00:0c.0: version 0.4.8 <6>sil680: 133MHz clock. <6>scsi0 : pata_sil680 <6>scsi1 : pata_sil680 <6>ata1: PATA max UDMA/133 irq 18 <6>ata2: PATA max UDMA/133 irq 18 Tim --
Yes. Alan loves to complain about lack of MMIO flush, but in practice this is rarely the source of problems such as the one you describe. But if its broken its broken, and we need to revert. Any luck getting benh access to the device? Jeff --
Especially since reverting it will break a whole bunch of cell blades, which wouldn't be nice (those cannot do PIO and have that controller). In the meantime, we might "workaround" with a hack to only enable MMIO on those cell blades, something like: #ifdef CONFIG_PPC64 if (machine_is(cell)) mmio = 1; #endif That might get us out of the regression until we find the proper solution ? Cheers, Ben. --
pata_sil680 is broken on Linkstation amd Kurobox HG machines since 2.6.24.
Work around the breakage until a real fix is found.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
---
How about the one below? I'd really like to get it in for 2.6.25 to avoid
a second broken stable kernel on these machines.
Ben, please verify that your cell machines still work.
Thanks
Guennadi
diff --git a/drivers/ata/pata_sil680.c b/drivers/ata/pata_sil680.c
index 503245a..75179ff 100644
--- a/drivers/ata/pata_sil680.c
+++ b/drivers/ata/pata_sil680.c
@@ -335,9 +335,12 @@ static int __devinit sil680_init_one(struct pci_dev *pdev,
};
const struct ata_port_info *ppi[] = { &info, NULL };
static int printed_version;
+#ifdef CONFIG_PPC64
struct ata_host *host;
void __iomem *mmio_base;
- int rc, try_mmio;
+ int rc;
+#endif
+ int try_mmio;
if (!printed_version++)
dev_printk(KERN_DEBUG, &pdev->dev, "version " DRV_VERSION "\n");
@@ -350,7 +353,8 @@ static int __devinit sil680_init_one(struct pci_dev *pdev,
return -ENODEV;
}
- if (!try_mmio)
+#ifdef CONFIG_PPC64
+ if (!try_mmio || !machine_is(cell))
goto use_ioports;
/* Try to acquire MMIO resources and fallback to PIO if
@@ -396,6 +400,7 @@ static int __devinit sil680_init_one(struct pci_dev *pdev,
&sil680_sht);
use_ioports:
+#endif
return ata_pci_init_one(pdev, ppi);
}
--On Wed, 26 Mar 2008 00:31:19 +0100 (CET) Just disable the mmio patch on all systems - we know it doesn't work, we know what work needs to be done, it should remain off for everyone until that work is done. It should never have been merged in the first place and I think my statement to that effect has been proven nicely. Alan --
Agreed. We did some patches to work on the possible write posting issues, and they didn't fix the problem, though they still should go in ultimately (after more testing, I need to dig them back out of my pile). So I think the MMIO is on standby now until we really corner what's going on. I hope to be able to work a bit on Guennadi's setup one of these days though I can't promise, I may instead get that SiL controller out of a cell blade and stick it in every HW I have around see if I can reproduce the problem. I think I have a couple different revisions even. We need to keep it enabled for the QS20 cell blades as it's the only way those will boot (and they happen to apparently not suffer from any adverse effect, we've tortured them pretty deeply, could be luck due to how the bridge works on them, or due to other workarounds we do for horrid breakage in the PCI host bridge that end up making everything else work too) but everybody else can stick with PIO. Jeff, can you ask Linus to pull my patch (the one Alan just acked) that does that in .25 ? Cheers, Ben. --
We're working on it... We've got devices, but they have to be recovered with jtag first, which requires some soldering... I was told this should happen end of this / beginning of the next week. Thanks Guennadi --- Guennadi Liakhovetski --
The dmactl read back should be just fine, or any other DMA register (eg Agreed - we know some controllers crap themselves spectacularly on anything which causes a SATA data transfer to be needed during a reset so the status is probably safest. The fact its not fixed is because nobody Easier to add that to the ioxxxx ops I suspect (ioispio/ioismmio say) ? Alan --
Maybe, though that will involve fixing all the arch versions which do different things. In fact, I'm not even sure I can tell you 100% after the fact on ppc64, I have to double check. I'd rather stick a flag in there to be safe, also since altstatus isn't always there (which is why I used nsect in the test patch I sent to Guennadi). I'm pretty sure I can rely on all MMIO controllers having an altstatus but I'd rather still make that explicit with a host flag to avoid unintended consequences to others. Ben. --
Did anyone fix all the mmio posting bugs in libata-core that were pointed out when I originally NAKked using MMIO, or did they just add the driver. If the latter then you need to grep the various fix this notes in libata-core around reset/probe of an SFF controller in particular. --
| Sam Ravnborg | Are Section mismatches out of control? |
| Karl Meyer | PROBLEM: 2.6.23-rc "NETDEV WATCHDOG: eth0: transmit timed out" |
| Bart Van Assche | Re: Is gcc thread-unsafe? |
| Adrian Bunk | Re: [Bug #10493] mips BCM47XX compile error |
git: | |
| Junio C Hamano | Re: [RFC/PATCH] git-branch: default to --track |
| Linus Torvalds | cleaner/better zlib sources? |
| Peter Stahlir | Git as a filesystem |
| Yossi Leybovich | corrupt object on git-gc |
| Manuel Wildauer | Re: Editing C with... |
| Mark Thomas | [i386/Thinkpad T41]USB mouse + Xorg obsd 4.1 |
| Stijn | Re: libiconv problem |
| Daniel Ouellet | Re: Router performance on OpenBSD and OpenBGPD |
| Felix Radensky | RE: e1000e "Detected Tx Unit Hang" |
| Johann Baudy | Packet mmap: TX RING and zero copy |
| David Miller | Re: 2.6.26/tg3 ping roundtrip times > 2000 ms on local network |
| Dushan Tcholich | Re: ksoftirqd high cpu load on kernels 2.6.24 to 2.6.27-rc1-mm1 |
