Hi, I'm seeing timeout errors when transfering "big" (over 1MB or so) files to a SD card (small files are ok): [ 251.956666] mmcblk0: error -110 transferring data [ 251.979810] end_request: I/O error, dev mmcblk0, sector 1572758 [ 251.981477] Buffer I/O error on device mmcblk0p1, logical block 1572742 [ 251.983198] lost page write due to I/O error on mmcblk0p1 [ 251.983788] end_request: I/O error, dev mmcblk0, sector 1572759 [ 251.983788] Buffer I/O error on device mmcblk0p1, logical block 1572743 [ 251.983788] lost page write due to I/O error on mmcblk0p1 [ 251.983788] end_request: I/O error, dev mmcblk0, sector 1572760 [ 251.983788] Buffer I/O error on device mmcblk0p1, logical block 1572744 [ 251.983788] lost page write due to I/O error on mmcblk0p1 [ 251.983788] end_request: I/O error, dev mmcblk0, sector 1572761 [ 251.983788] Buffer I/O error on device mmcblk0p1, logical block 1572745 [ 251.983788] lost page write due to I/O error on mmcblk0p1 [ 253.513336] mmcblk0: error -110 transferring data (trashed FS here) kernel in use is 2.6.27-rc6, and this is the controller: 09:01.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22) 09:01.2 System peripheral: Ricoh Co Ltd R5C843 MMC Host Controller (rev 12) 09:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12) 09:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 12) Googling around I found a patch[1] that introduced a quirk to address an off-by-one issue for the timeout value in certain controllers. With this patch: diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index e3a8133..46899ee 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -575,7 +575,7 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_data *data) host->data = data; host->data_early = 0; - count = sdhci_calc_timeout(host, data); + count = sdhci_calc_timeout(host, data) + 1; writeb(count, ...
On Tue, 23 Sep 2008 23:24:59 +0200
Looks like the card died on you there. Did you build the kernel
yourself? If so, it would be helpful if you could enable
CONFIG_MMC_DEBUG and give me a complete dmesg dump from when this
The patch doubles the timeout, so something is clearly broken if that
solves the issue. I'd like to identify if it's the driver, controller
or card first though.
Are you experiencing this with just this card, or with all?
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org
WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.
--
I was just shooting in the dark ;-) I found the patch floating around Actually I have only that card... I'll try to borrow another one for testing. Btw, that card works fine under windows (on the same machine) and my N800 is happy with it too. Luca --
Hum, cannot reproduce (but it was consistently failing when I tested the patch... the only difference is a mkfs in between). I just got a few retries: [ 187.158063] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000040 [ 187.656190] mmc0: clock 0Hz busmode 1 powermode 1 cs 0 Vdd 21 width 0 timing 0 [ 187.658366] mmc0: clock 128906Hz busmode 1 powermode 2 cs 0 Vdd 21 width 0 timing 0 [ 187.660500] mmc0: clock 128906Hz busmode 1 powermode 2 cs 1 Vdd 21 width 0 timing 0 [ 187.661634] mmc0: starting CMD0 arg 00000000 flags 000000c0 [ 187.662265] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001 [ 187.662283] mmc0: req done (CMD0): 0: 00000000 00000000 00000000 00000000 [ 187.663461] mmc0: clock 128906Hz busmode 1 powermode 2 cs 0 Vdd 21 width 0 timing 0 [ 187.664473] mmc0: starting CMD8 arg 000001aa flags 000002f5 [ 187.665423] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001 [ 187.665439] mmc0: req done (CMD8): 0: 000001aa 00000000 00000000 00000000 [ 187.665474] mmc0: starting CMD5 arg 00000000 flags 000002e1 [ 187.666423] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00018000 [ 187.668435] mmc0: req failed (CMD5): -110, retrying... [ 187.669423] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00018000 [ 187.671424] mmc0: req failed (CMD5): -110, retrying... [ 187.672422] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00018000 [ 187.674423] mmc0: req failed (CMD5): -110, retrying... [ 187.675425] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00018000 [ 187.677436] mmc0: req done (CMD5): -110: 00000000 00000000 00000000 00000000 [ 187.677683] mmc0: starting CMD55 arg 00000000 flags 000000f5 [ 187.678754] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001 [ 187.678770] mmc0: req done (CMD55): 0: 00000120 00000000 00000000 00000000 [ 187.678808] mmc0: starting CMD41 arg 00000000 flags 000000e1 [ 187.679877] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001 [ 187.679893] mmc0: req done (CMD41): 0: 00ff8000 00000000 00000000 ...
On Fri, 3 Oct 2008 13:27:52 +0200
A silly question, but did you also try disabling the debugging? Since
this is a timing issue, the debugging output could be just enough to
Maybe. I've never gotten a bad block so I have no experience in how it
behaves.
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org
WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.
--
Ok, I managed to reproduce without the debugging, but this time it took over a couple of GB before it started failing. Block numbers differ from the previous failures (so maybe it's not a cluster of broken blocks). Will go back and test with debugging enabled. Luca --
I finally managed to capture a failure with debug enabled, I'm attaching the log. To recap: - heavy write loads sometimes result in a timeout - reads seem unaffected (I read the card multiple times without errors) - doubling the timeout (as per my first patch) makes the timeout disappear - windows doesn't suffer from the issue I've tested two other cards that do not show the problem, but they are not "high speed": mmc0: new high speed SD card at address 0002 (troubles) vs: mmc0: new SD card at address e624 (OK) Luca --
On Tue, 14 Oct 2008 23:13:39 +0200
It seems to be a transfer error at least as it is in a completely
Looking at the dump, the controller seems to be doing the correct
thing. The timeout mandated by the SD specification is 250 ms, and that
time has passed if you look at the timestamps. If you look at some
other writes further down, the card takes around 220 ms for the writes.
IOW, it would seem that this card is cutting it a bit close and
Since it is the card that's the problem here, not the controller, we'd
need to fix this in the core. Unfortunately the sdhci hardware is a bit
inflexible so any change I make will result in a doubling of the
timeout.
Could you try modifying line 283 of drivers/mmc/core/core.c where it
Windows doesn't do proper handling of timeouts. It just sets the
What's the vendor and model of the failing card?
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org
WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.
--
The card is from S3+, and it's branded "SD Excel" (el-cheapo 4GB card) Luca --
On Mon, 20 Oct 2008 14:20:13 +0200
It might be. Your timeout will be in the neigbourhood of 500ms because
of the crappy resolution of the sdhci hardware.
Still, I'll queue up a patch for this...
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org
WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.
--
