Re: SDHCI: timeout during data transfer

Previous thread: [PATCH] Create PNP device attributes via dev_attrs field of struct device by Drew Moseley on Tuesday, September 23, 2008 - 2:24 pm. (8 messages)

Next thread: [ANNOUNCE] kmemtrace-user repo update by Eduard - Gabriel Munteanu on Tuesday, September 23, 2008 - 2:29 pm. (1 message)
From: Luca Tettamanti
Date: Tuesday, September 23, 2008 - 2:24 pm

Hi,
I'm seeing timeout errors when transfering "big" (over 1MB or so) files to a SD
card (small files are ok):

[  251.956666] mmcblk0: error -110 transferring data
[  251.979810] end_request: I/O error, dev mmcblk0, sector 1572758
[  251.981477] Buffer I/O error on device mmcblk0p1, logical block 1572742
[  251.983198] lost page write due to I/O error on mmcblk0p1
[  251.983788] end_request: I/O error, dev mmcblk0, sector 1572759
[  251.983788] Buffer I/O error on device mmcblk0p1, logical block 1572743
[  251.983788] lost page write due to I/O error on mmcblk0p1
[  251.983788] end_request: I/O error, dev mmcblk0, sector 1572760
[  251.983788] Buffer I/O error on device mmcblk0p1, logical block 1572744
[  251.983788] lost page write due to I/O error on mmcblk0p1
[  251.983788] end_request: I/O error, dev mmcblk0, sector 1572761
[  251.983788] Buffer I/O error on device mmcblk0p1, logical block 1572745
[  251.983788] lost page write due to I/O error on mmcblk0p1
[  253.513336] mmcblk0: error -110 transferring data
(trashed FS here)

kernel in use is 2.6.27-rc6, and this is the controller:

09:01.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
09:01.2 System peripheral: Ricoh Co Ltd R5C843 MMC Host Controller (rev 12)
09:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)
09:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 12)

Googling around I found a patch[1] that introduced a quirk to address an
off-by-one issue for the timeout value in certain controllers.

With this patch:

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index e3a8133..46899ee 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -575,7 +575,7 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_data *data)
 	host->data = data;
 	host->data_early = 0;
 
-	count = sdhci_calc_timeout(host, data);
+	count = sdhci_calc_timeout(host, data) + 1;
 	writeb(count, ...
From: Pierre Ossman
Date: Thursday, October 2, 2008 - 1:17 am

On Tue, 23 Sep 2008 23:24:59 +0200

Looks like the card died on you there. Did you build the kernel
yourself? If so, it would be helpful if you could enable
CONFIG_MMC_DEBUG and give me a complete dmesg dump from when this

The patch doubles the timeout, so something is clearly broken if that
solves the issue. I'd like to identify if it's the driver, controller
or card first though.

Are you experiencing this with just this card, or with all?

Rgds
-- 
     -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  rdesktop, core developer          http://www.rdesktop.org

  WARNING: This correspondence is being monitored by the
  Swedish government. Make sure your server uses encryption
  for SMTP traffic and consider using PGP for end-to-end
  encryption.
--

From: Luca Tettamanti
Date: Thursday, October 2, 2008 - 2:11 am

I was just shooting in the dark ;-) I found the patch floating around

Actually I have only that card... I'll try to borrow another one for
testing. Btw, that card works fine under windows (on the same machine)
and my N800 is happy with it too.

Luca
--

From: Luca Tettamanti
Date: Friday, October 3, 2008 - 4:27 am

Hum, cannot reproduce (but it was consistently failing when I tested
the patch... the only difference is a mkfs in between). I just got a
few retries:

[  187.158063] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000040
[  187.656190] mmc0: clock 0Hz busmode 1 powermode 1 cs 0 Vdd 21 width
0 timing 0
[  187.658366] mmc0: clock 128906Hz busmode 1 powermode 2 cs 0 Vdd 21
width 0 timing 0
[  187.660500] mmc0: clock 128906Hz busmode 1 powermode 2 cs 1 Vdd 21
width 0 timing 0
[  187.661634] mmc0: starting CMD0 arg 00000000 flags 000000c0
[  187.662265] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001
[  187.662283] mmc0: req done (CMD0): 0: 00000000 00000000 00000000 00000000
[  187.663461] mmc0: clock 128906Hz busmode 1 powermode 2 cs 0 Vdd 21
width 0 timing 0
[  187.664473] mmc0: starting CMD8 arg 000001aa flags 000002f5
[  187.665423] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001
[  187.665439] mmc0: req done (CMD8): 0: 000001aa 00000000 00000000 00000000
[  187.665474] mmc0: starting CMD5 arg 00000000 flags 000002e1
[  187.666423] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00018000
[  187.668435] mmc0: req failed (CMD5): -110, retrying...
[  187.669423] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00018000
[  187.671424] mmc0: req failed (CMD5): -110, retrying...
[  187.672422] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00018000
[  187.674423] mmc0: req failed (CMD5): -110, retrying...
[  187.675425] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00018000
[  187.677436] mmc0: req done (CMD5): -110: 00000000 00000000 00000000 00000000
[  187.677683] mmc0: starting CMD55 arg 00000000 flags 000000f5
[  187.678754] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001
[  187.678770] mmc0: req done (CMD55): 0: 00000120 00000000 00000000 00000000
[  187.678808] mmc0: starting CMD41 arg 00000000 flags 000000e1
[  187.679877] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001
[  187.679893] mmc0: req done (CMD41): 0: 00ff8000 00000000 00000000 ...
From: Pierre Ossman
Date: Sunday, October 12, 2008 - 1:52 am

On Fri, 3 Oct 2008 13:27:52 +0200

A silly question, but did you also try disabling the debugging? Since
this is a timing issue, the debugging output could be just enough to

Maybe. I've never gotten a bad block so I have no experience in how it
behaves.

Rgds
-- 
     -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  rdesktop, core developer          http://www.rdesktop.org

  WARNING: This correspondence is being monitored by the
  Swedish government. Make sure your server uses encryption
  for SMTP traffic and consider using PGP for end-to-end
  encryption.
--

From: Luca Tettamanti
Date: Tuesday, October 14, 2008 - 2:34 am

Ok, I managed to reproduce without the debugging, but this time it
took over a couple of GB before it started failing. Block numbers
differ from the previous failures (so maybe it's not a cluster of
broken blocks).
Will go back and test with debugging enabled.

Luca
--

From: Luca Tettamanti
Date: Tuesday, October 14, 2008 - 2:13 pm

I finally managed to capture a failure with debug enabled, I'm
attaching the log.
To recap:
- heavy write loads sometimes result in a timeout
- reads seem unaffected (I read the card multiple times without errors)
- doubling the timeout (as per my first patch) makes the timeout disappear
- windows doesn't suffer from the issue

I've tested two other cards that do not show the problem, but they are
not "high speed":

mmc0: new high speed SD card at address 0002 (troubles)
vs:
mmc0: new SD card at address e624 (OK)

Luca
--

From: Luca Tettamanti
Date: Tuesday, October 14, 2008 - 2:21 pm

...for real this time :-)

Luca
From: Pierre Ossman
Date: Saturday, October 18, 2008 - 1:46 pm

On Tue, 14 Oct 2008 23:13:39 +0200

It seems to be a transfer error at least as it is in a completely

Looking at the dump, the controller seems to be doing the correct
thing. The timeout mandated by the SD specification is 250 ms, and that
time has passed if you look at the timestamps. If you look at some
other writes further down, the card takes around 220 ms for the writes.
IOW, it would seem that this card is cutting it a bit close and

Since it is the card that's the problem here, not the controller, we'd
need to fix this in the core. Unfortunately the sdhci hardware is a bit
inflexible so any change I make will result in a doubling of the
timeout.

Could you try modifying line 283 of drivers/mmc/core/core.c where it

Windows doesn't do proper handling of timeouts. It just sets the

What's the vendor and model of the failing card?

-- 
     -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  rdesktop, core developer          http://www.rdesktop.org

  WARNING: This correspondence is being monitored by the
  Swedish government. Make sure your server uses encryption
  for SMTP traffic and consider using PGP for end-to-end
  encryption.
--

From: Luca Tettamanti
Date: Monday, October 20, 2008 - 5:20 am

The card is from S3+, and it's branded "SD Excel" (el-cheapo 4GB card)

Luca
--

From: Pierre Ossman
Date: Sunday, October 26, 2008 - 4:35 am

On Mon, 20 Oct 2008 14:20:13 +0200

It might be. Your timeout will be in the neigbourhood of 500ms because
of the crappy resolution of the sdhci hardware.

Still, I'll queue up a patch for this...

Rgds
-- 
     -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  rdesktop, core developer          http://www.rdesktop.org

  WARNING: This correspondence is being monitored by the
  Swedish government. Make sure your server uses encryption
  for SMTP traffic and consider using PGP for end-to-end
  encryption.
--

Previous thread: [PATCH] Create PNP device attributes via dev_attrs field of struct device by Drew Moseley on Tuesday, September 23, 2008 - 2:24 pm. (8 messages)

Next thread: [ANNOUNCE] kmemtrace-user repo update by Eduard - Gabriel Munteanu on Tuesday, September 23, 2008 - 2:29 pm. (1 message)