Re: strange problem with Intenso 4GB SDHC card

From: David Meder-Marouelli
Date: Wednesday, July 23, 2008 - 1:08 pm

Hi,

I observed an interesting phenomenon with my newly bought Intenso 4GB
SDHC card.

While the 512MB card shipped with my Freerunner runs reliably and stably,
the new card shows the following behaviour:

1) Errors during boot process:
============================
root@om-gta02:~# dmesg |grep -E "glamo|mmc"
glamo3362 glamo3362.0: Detected Glamo core 3650 Revision 0002 (49119232Hz CPU / 81887232Hz Memory)
glamo3362 glamo3362.0: Glamo core now 49119232Hz CPU / 81887232Hz Memory)
glamo-spi-gpio glamo-spi-gpio.0: registering c0373838: jbt6k74
glamo-mci glamo-mci.0: glamo_mci driver (C)2007 Openmoko, Inc
glamo-mci glamo-mci.0: probe: mapped mci_base:c8864400 irq:0.
glamo-mci glamo-mci.0: glamo_mci_set_ios: power down.
glamo-mci glamo-mci.0: initialisation done.
mmc_set_power(power_mode=1, vdd=20
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 0kHz div=255 (req: 0kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: Error after cmd: 0x120
glamo-mci glamo-mci.0: Error after cmd: 0x8120
glamo-mci glamo-mci.0: Error after cmd: 0x120
glamo-mci glamo-mci.0: Error after cmd: 0x8120
glamo-mci glamo-mci.0: Error after cmd: 0x120
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 16666kHz div=2 (req: 16666kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 16666kHz div=2 ...
From: Andy Green
Date: Wednesday, July 23, 2008 - 1:20 pm

Somebody in the thread at some point said:

| glamo-mci glamo-mci.0: Error after cmd: 0x120
| glamo-mci glamo-mci.0: Error after cmd: 0x8120
| glamo-mci glamo-mci.0: Error after cmd: 0x120
| glamo-mci glamo-mci.0: Error after cmd: 0x8120
| glamo-mci glamo-mci.0: Error after cmd: 0x120

This bit is normal: it's the Linux MMC / SD stack checking whether it is an
SDIO card.

| glamo-mci glamo-mci.0: powered (vdd = 20) clk: 16666kHz div=2 (req:
| 16666kHz). Bus width=2

This bit is a good sign, it was able to complete getting recognized by
the stack and put into 4-bit mode at 16MHz SD_CLK.

| mmc0: new high speed SDHC card at address b368
| mmcblk0: mmc0:b368 SD    3931136KiB
|  mmcblk0:<6>glamo-mci glamo-mci.0: Error after cmd: 0x8310
| mmcblk0: error -110 sending read/write command

This first real error is ETIMEDOUT.  It can be a genuine timeout because
your card is a bit slow, but it could also mean communication problems.
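
For readers decoding such messages: the negative numbers the kernel prints here are errno values. A minimal sketch in Python, assuming the standard Linux numbering; the table and the helper name are made up for illustration and cover only a few codes:

```python
# A few Linux errno values that can show up in MMC/SD kernel messages.
# The numbers follow the standard Linux numbering; the helper is illustrative.
LINUX_ERRNO_NAMES = {
    5: "EIO",          # generic I/O error
    84: "EILSEQ",      # commonly returned for CRC / illegal-sequence failures
    110: "ETIMEDOUT",  # the card did not answer in time
}


def describe_kernel_error(code: int) -> str:
    """Turn a kernel-style negative error code such as -110 into its name."""
    number = abs(code)
    return LINUX_ERRNO_NAMES.get(number, f"unknown errno {number}")


print(describe_kernel_error(-110))
```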

| 2) I can trick it to work doing the following steps:
|    - check sd_drive parameter (not required)
|      >root@om-gta02:~# cat /sys/module/glamo_mci/parameters/sd_drive
|      >0
|    - re-set it to this (or possibly any other) value
|      >root@om-gta02:~# echo 0 > /sys/module/glamo_mci/parameters/sd_drive
|    - now the device works fine:

I don't see how that can impact it: none of our code runs when you
change that, the value simply gets written behind our back to the int
that holds sd_drive.  And when you catted it, you saw the real value of
that int; it doesn't keep a copy somewhere.  Maybe something else in the
thrashing around helped.
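
The read-then-write experiment quoted above can be sketched as follows. This is only an illustration, assuming the parameter file holds a plain decimal integer; the function names are invented here, and the path is the one from the quoted commands:

```python
from pathlib import Path

# Path taken from the commands quoted above; pass another path when testing.
SD_DRIVE_PARAM = Path("/sys/module/glamo_mci/parameters/sd_drive")


def read_param(path: Path = SD_DRIVE_PARAM) -> int:
    """Return the current integer value of a module parameter file."""
    return int(path.read_text().strip())


def write_param(value: int, path: Path = SD_DRIVE_PARAM) -> None:
    """Store a new value; per the explanation above, no driver code runs here."""
    path.write_text(f"{value}\n")
```

Writing back the value that was just read, as in the quoted steps, should therefore change nothing on its own.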

|      >root@om-gta02:~# fdisk -l /dev/mmcblk0
|
| Disk /dev/mmcblk0: 4025 MB, 4025483264 bytes
| 126 heads, 61 sectors/track, 1022 cylinders
| Units = cylinders of 7686 * 512 = 3935232 bytes
|
|         Device Boot      Start         End      Blocks  Id System
| /dev/mmcblk0p1               1        1022     3927515+ 83 Linux
| ==========================
|
| A ...
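
As a sanity check, the sizes in the dmesg and fdisk output quoted above agree with each other. The arithmetic below uses only figures from that output, plus one assumption that the thread does not state: that the partition begins after the first track, as in the traditional DOS layout.

```python
SECTOR_BYTES = 512

# Figures copied from the dmesg and fdisk output quoted above.
card_kib = 3931136
heads, sectors_per_track, cylinders = 126, 61, 1022

disk_bytes = card_kib * 1024
sectors_per_cylinder = heads * sectors_per_track
cylinder_bytes = sectors_per_cylinder * SECTOR_BYTES

# Assumption: the partition starts after the first track (not stated above).
partition_sectors = cylinders * sectors_per_cylinder - sectors_per_track
partition_kib, odd_sector = divmod(partition_sectors, 2)

print(disk_bytes)                 # compare with the "bytes" figure from fdisk
print(cylinder_bytes)             # compare with the fdisk "Units" line
print(partition_kib, odd_sector)  # compare with the "Blocks" column
```

A remainder of one sector is what fdisk marks with the trailing "+" in the Blocks column.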
From: David Meder-Marouelli
Date: Wednesday, July 23, 2008 - 1:57 pm

Hi Andy,

thanks for the quick reply.

Not really good timing would probably be a good guess since "Intenso" is [...]

Thanks for your guidance. It's _not_ the sd_drive parameter.

It actually just takes a wait of about 1-2 min. After that the device is
readily accessible. (I need to force a re-read of the partition table to
mount it, though.)

Anyways I made sure that the data I wrote to the card yesterday is still
OK (md5sum) - although it's only around 1MB. I can do more tests tomorrow...

Cheers,

    David



_______________________________________________
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community
From: Stefan Fröbe
Date: Wednesday, July 23, 2008 - 2:15 pm

Hi,

just a quick observation from my side that could possibly be related:
Until yesterday, my 4GB SanDisk card worked fine with a recent (e.g.
2008-07-22 or 21) kernel - after an opkg upgrade this morning that got me a
new kernel I was surprised to see the card not being recognized anymore -
furthermore, its MBR was zeroed out, and no tool could read or reformat it
except a SD-Card recovery tool by Panasonic ( sdfv2003.exe running only
under Windows, of course ) !

I have now backed up the partition table and MBRs in the hope of being able to
dd them back, should this happen again. Sorry, but I haven't got any logs as I
was busy recovering what was left, but I'll surely save them next time ...
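
A backup of this kind can be sketched in a few lines of Python. This is an illustration only, assuming a classic MBR layout where the boot code and partition table live in the first 512-byte sector; the function name and the dump file name in the usage note are hypothetical:

```python
SECTOR_BYTES = 512


def backup_first_sector(device_path: str, dump_path: str) -> bytes:
    """Copy the first 512-byte sector (MBR and partition table) to a file."""
    with open(device_path, "rb") as device:
        sector = device.read(SECTOR_BYTES)
    if len(sector) != SECTOR_BYTES:
        raise ValueError("short read: expected a full 512-byte sector")
    with open(dump_path, "wb") as dump:
        dump.write(sector)
    return sector
```

Run as root it would be called as, for example, backup_first_sector("/dev/mmcblk0", "mmcblk0_mbr.dump"), with the dump stored somewhere other than the card itself.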

Stefan

uname -a
Linux om-gta02 2.6.24 #1 PREEMPT Wed Jul 23 06:34:19 CEST 2008 armv4tl
unknown


From: Andy Green
Date: Wednesday, July 23, 2008 - 2:58 pm

Somebody in the thread at some point said:
| Hi,
|
| just a quick observation from my side that could possibly be related:
| Until yesterday, my 4GB SanDisk card worked fine with a recent (e.g.
| 2008-07-22 or 21) kernel - after an opkg upgrade this morning that got
| me a new kernel I was surprised to see the card not being recognized
| anymore - furthermore, its MBR was zeroed out, and no tool could read or
| reformat it except a SD-Card recovery tool by Panasonic ( sdfv2003.exe
| running only under Windows, of course ) !
|
| I have now backed up the partition table and MBRs in the hope of being able
| to dd them back, should this happen again. Sorry, but I haven't got any logs
| as I was busy recovering what was left, but I'll surely save them next time ...

There's a race of some kind in suspend / resume that can do this; the
signature effect of it is that on resume your device comes back as mmcblk1
and the logical filesystem in memory is corrupted.  We haven't seen this
for a long time, though.  Maybe keep an eye out for such shenanigans on
resume.

-Andy
From: Matt Luzum
Date: Wednesday, July 23, 2008 - 3:20 pm

I just thought I'd mention that I had a similar thing happen.  Three 
times in the past couple days, my 8 GB A-Data microSDHC card somehow 
seemed to have its partition table deleted.  No partitions would show up 
on my card anyway, although I could make new partitions and read and 
write from them with a card reader.  I haven't had time to investigate 
more thoroughly, so I don't know whether it happens on resume or if it 
only happens when I'm doing something in particular.  I've had the 
original half gig card in there since yesterday and it hasn't had any 
problems, even though the SDHC card previously had problems several 
times in short order, so there might be some difference there.

Sorry I can't be of more help with specifics, but I can confirm that 
there's someone else having this problem.

Matt


From: Steven **
Date: Thursday, July 24, 2008 - 9:48 am

I experienced the same issue with my card (also the A-Data 8GB) after
flashing the kernel and rootfs builds from the 22nd.  My partition
table seems to have been deleted.  I'm pretty sure it happened after a
suspend/resume cycle (what happens when you have power management set
to "dim first, then lock").

I had a vfat and an ext3 partition on there that I was using to dual-boot.

Is there a bug report/ticket for this issue I should be adding to?

At the very least, doesn't this belong on the support mailing list
instead of community?

-Steven



From: Doug Jones
Date: Wednesday, July 23, 2008 - 3:17 pm

Not sure if this is connected with what you are seeing, but...

Something similar has been happening with SD cards on the OLPC laptop 
(another example of hardware specifically designed for the FOSS world) 
for at least the last six months.  Last time I checked, there was still 
no real fix.  Has been a major pain for people who want to multiboot  -- 
  forces them to use storage devices that don't fit inside the case.

http://dev.laptop.org/ticket/6532

I am getting the impression that interfacing to SD cards is hard.



From: Stefan Fröbe
Date: Wednesday, July 23, 2008 - 3:36 pm

Thanks for the link, seems to be quite valuable to me as it explains the
background quite well!



Well, from recent comments it looks like a 400ms delay (yuck!) in
drivers/mmc/core/sd.c is a temporary workaround, but the root cause (as Andy
already suggested) seems to be related to the resume cycle.



At least it doesn't look like a HW issue with the card, then.

Stefan


From: Andy Green
Date: Wednesday, July 23, 2008 - 3:49 pm

Somebody in the thread at some point said:
| Thanks for the link, seems to be quite valuable to me as it explains the
| background quite well!
|
|     Something similar has been happening with SD cards on the OLPC laptop
|     (another example of hardware specifically designed for the FOSS world)
|     for at least the last six months.  Last time I checked, there was
|     still no real fix.
|
|
| Well, from recent comments it looks like a 400ms delay (yuck!) in
| drivers/mmc/core/sd.c is a temporary workaround, but the root cause (as
| Andy already suggested) seems to be related to the resume cycle.
|
|     Has been a major pain for people who want to multiboot  --
|      forces them to use storage devices that don't fit inside the case.
|
|     http://dev.laptop.org/ticket/6532
|
|
| At least it doesn't look like a HW issue with the card, then.

Yes, when we originally had this problem I found that OLPC had it, and
indeed the Eee PC too at that time.  What "cured" it for us was removing
the low level debug config option in the kernel, but that really is all
about changing timing too.

There's another complicated problem that may be related, concerning the
relationship between the PMU and the Glamo.  The PMU device is only
created really late in boot because it is on the I2C bus.  That means it
is suspended very early in suspend, yanking a lot of power rails
(including the CPU core power, although that keeps going long enough
from the caps) before the MMC stack has a chance to talk to the card and
close it down gently.
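
The ordering problem described here can be pictured with a toy model. It is a sketch only, assuming the simple rule that devices are suspended in the reverse of the order in which they were registered; the device list is hypothetical apart from the PMU coming last:

```python
def suspend_order(registration_order):
    """Toy model: devices are suspended in the reverse of the order in
    which they were registered, so a late-registered device goes first."""
    return list(reversed(registration_order))


# Hypothetical boot order for illustration; only the late position of the
# PMU (it sits on the I2C bus) is taken from the text above.
boot_order = ["glamo-mci", "mmc-card", "i2c-bus", "pmu"]
print(suspend_order(boot_order))
```

In this model the PMU is suspended, and its rails drop, before the MMC side has had its turn.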

Although suspend / resume has been acting well these last weeks it is
fragile and we'll be doing a lot more work on it.

-Andy
From: Doug Jones
Date: Wednesday, July 23, 2008 - 4:21 pm

Just spent some time looking at that bug ticket again.  I've been trying 
to follow this story ever since my OLPC trashed the partition table on 
my 16GB SDHC card back in January.

A consensus seems to be building that suspend/resume is involved.  But I 
don't think anybody really understands what's going on, and people have 
been bashing their heads against this again and again for six months.

If something like this is happening in the Neo, then this could turn 
into a world of hurt.


Recommendation:


Everybody, get a Micro-SD card and stick it in your Neo.  Put some 
random files on it, you don't have to do anything serious with it, just 
let it sit in there for a while.  Periodically check that you can still 
see those files.  If you lose those files, post about it.  Maybe we 
should start a new thread to keep track of this data.
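
One way to run such a check is sketched below. It is a rough illustration rather than an agreed test procedure: the file names, the manifest format and the use of SHA-256 are all choices made for this sketch.

```python
import hashlib
import json
import os


def write_canaries(directory: str, count: int = 4, size: int = 4096) -> str:
    """Create random test files plus a manifest of their checksums."""
    manifest = {}
    for index in range(count):
        name = f"canary_{index}.bin"
        data = os.urandom(size)
        with open(os.path.join(directory, name), "wb") as handle:
            handle.write(data)
        manifest[name] = hashlib.sha256(data).hexdigest()
    manifest_path = os.path.join(directory, "canary_manifest.json")
    with open(manifest_path, "w") as handle:
        json.dump(manifest, handle)
    return manifest_path


def check_canaries(directory: str) -> list:
    """Return the names of canary files that are missing or altered."""
    with open(os.path.join(directory, "canary_manifest.json")) as handle:
        manifest = json.load(handle)
    damaged = []
    for name, expected in manifest.items():
        try:
            with open(os.path.join(directory, name), "rb") as handle:
                actual = hashlib.sha256(handle.read()).hexdigest()
        except OSError:
            damaged.append(name)
            continue
        if actual != expected:
            damaged.append(name)
    return damaged
```

If the partition table itself is gone, the directory will not be mountable at all and check_canaries will fail when opening the manifest, which is a report worth posting too.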

There have been some indications that partition type may have some 
effect on this problem on the OLPC.  So, shrink the default vfat 
partition that came on the card and put an ext3 on there too.  If you 
want to be adventurous, try some other types.

If more people start seeing problems like this, we ought to start 
comparing notes with the OLPC people who are working on this.  And 
Pierre Ossman too, he did a lot of work on SD support in the kernel.


From: Mikael Berthe
Date: Wednesday, July 23, 2008 - 10:34 pm

Happens to me with ext3 partitions as well (or mixed vfat/ext3
partitions).

However if I restore the partition table the data are not corrupted,
at least so far it's been all right...
-- 
MiKael

From: Doug Jones
Date: Wednesday, July 23, 2008 - 11:39 pm

Most people who use SD cards on OLPC are leaving them formatted as vfat, 
because Sugar can't see any other type.  I don't recall seeing any 
reports of partition table mangling from these people, who are the vast 
majority of OLPC users.

It's when they try something other than vfat that the corruption occurs. 
  When it happened to me, I had one vfat and one ext3 on there.

So on the OLPC at least, the corruption does seem to be correlated with 
partition type.


Because the number of people trying different file systems on the SD 
card was relatively small, and because the corruption was happening sort 
of randomly, it's been difficult to figure out what's happening.  Sample 
size too small.

People were making all kinds of assumptions based on the limited 
reports, some of them wrong.  If you look through that bug ticket and 
also various reports on forums and lists, you find contradictory 
information regarding which builds were affected, which file systems 
were affected, and so on.  This kind of noise makes it a lot harder to 
debug.  It's still not clear that this problem is understood, even six 
months later.

This is why I suggested that lots of people try various file systems in 
their Neos, even if they don't need to have a card in there right now. 
If there is indeed an SD problem in the Neo, we need lots of reports, 
otherwise the sampling size problem might drag out the debugging for 
months as happened with OLPC.

If there is no problem with SD, then there's no harm done.  You'll just 
have a lot of people walking around with SD cards they aren't using yet 
sitting inside their phones.



And of course it's entirely possible that the OLPC / SD problem has 
nothing to do with the Neo.

From: AVee
Date: Thursday, July 24, 2008 - 5:06 pm

I haven't been able to find a publicly available version of any SD Card or 
SDHC spec. But both of the documents below seem to suggest that fat32 is part 
of the specification:
http://www.sdcard.org/about/sdhc/
http://www.kingston.com/flash/pdf_files/MKF_1127_SDHC_Topic_Paper.pdf

If that is the case both the controller and logic on the card may assume it
contains a single full-size fat32 partition. They surely will not be tested
with anything other than fat32.

Does anyone here have access to official specs from the SDCard Association? It
would be interesting to have at least a hint about what the spec says about
partitioning...

AVee

-- 
I always finish what I...

From: arne anka
Date: Thursday, July 24, 2008 - 12:53 am

> Everybody, get a Micro-SD card and stick it in your Neo.  Put some

should that apply to multiboot or to _every_ use of the sd card?
i use suspend/resume more or less successfully for a week or 10 days now  
and the files on my sd card (4gb, how do i determine the exact name from a  
running system?) still are unharmed.
gta02, 2007.2, upgrade every or every second day.

From: Stefan Fröbe
Date: Thursday, July 24, 2008 - 4:38 am

Hi all,

I can now reliably reproduce the issue, as dd'ing the mbr back to the card
so far restores sane behaviour :

If sd_drive is set to "0", then after a resume from "sync && apm -s" the MBR
of my 4GB SanDisk is wiped - so far I haven't noticed any other errors, but
have not looked very closely.

To recover, I use the following commands:
---------------
# re-write MBR
dd if=mmcblk0_512_1.dump of=/dev/mmcblk0
# recognize partitions again
echo "1">/sys/module/glamo_mci/parameters/sd_drive
apm -s
----------------
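
Before writing a dump back it may help to check what is actually in the first sector. The sketch below is an illustration under the assumption of a classic MBR, which ends in the bytes 0x55 0xAA; the function names are invented here:

```python
SECTOR_BYTES = 512
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a valid MBR sector


def mbr_state(sector: bytes) -> str:
    """Classify a first sector as 'zeroed', 'no-signature' or 'ok'."""
    if len(sector) != SECTOR_BYTES:
        raise ValueError("expected exactly 512 bytes")
    if sector == bytes(SECTOR_BYTES):
        return "zeroed"
    if sector[510:512] != BOOT_SIGNATURE:
        return "no-signature"
    return "ok"


def restore_first_sector(dump_path: str, device_path: str) -> None:
    """Write a saved 512-byte dump back to the start of the device."""
    with open(dump_path, "rb") as dump:
        sector = dump.read()
    if mbr_state(sector) != "ok":
        raise ValueError("refusing to write a dump that is not a valid MBR")
    with open(device_path, "r+b") as device:
        device.seek(0)
        device.write(sector)
```

Recording the result of mbr_state on the damaged sector before restoring it would also show whether the sector was zeroed or overwritten with something else.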

So it looks as if the sd_drive parameter does have a role in this - any
suggestions on what else I should try, or what logs you guys need ?

Unmounting the card before suspend should help, and I'll also gladly try
another kernel and other partition setups if I find the time.

Btw, this all happens on a 4GB SanDisk with 4 primary partitions:
20M vfat + (196M +196M +3.2G ) ext2 and kernel om-gta02 2.6.24 Wed Jul 23
06:34:19

Stefan

PS: Can somebody please tell me how to re-initialize the card without going
through another suspend/resume cycle ?





From: Andy Green
Date: Thursday, July 24, 2008 - 4:59 am

Somebody in the thread at some point said:
| Hi all,
|
| I can now reliably reproduce the issue, as dd'ing the mbr back to the
| card so far restores sane behaviour :
|
| If sd_drive is set to "0", then after a resume from "sync && apm -s" the
| MBR of my 4GB SanDisk is wiped - so far I haven't noticed any other
| errors, but have not looked very closely.
...

| PS: Can somebody please tell me how to re-initialize the card without
| going through another suspend/resume cycle ?

sd_drive setting isn't actually used until next time we access the card,
so provoking an access will do it, eg, touch /something ; sync.
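
A rough Python equivalent of that touch-and-sync, for anyone scripting the test. The function and default file name are made up for this sketch, and the mount point is whatever directory the card is mounted on:

```python
import os


def provoke_card_access(mount_point: str, name: str = "poke") -> str:
    """Rough equivalent of `touch <file> ; sync`: create or update a file
    and flush it so the write actually reaches the storage device."""
    path = os.path.join(mount_point, name)
    with open(path, "a") as handle:
        handle.flush()
        os.fsync(handle.fileno())
    os.utime(path, None)   # update the timestamps, as touch would
    os.sync()              # flush everything else that is still buffered
    return path
```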

But the two explanations for what goes on still seem mixed up here: we
change sd_drive and we do a suspend.  My guess / hope is that this
problem is coming from the suspend action alone and the change of
sd_drive is bogus here.  Maybe you can bang on it a little more trying
to disprove that hypothesis?

-Andy
From: Stefan Fröbe
Date: Thursday, July 24, 2008 - 5:53 am

Good point, but now it is getting really strange (all with sd_drive=0
prior to suspend):
Adding a "touch /media/mmcblk0p4/suspending" and it works,
also adding the sync and it doesn't, and finally also adding a "sleep
1" to the line and it gives me mixed results.
Maybe it is a timing issue, and the previously mentioned 400-500ms

Will do if I find the time, but for now completely unmounting the card
seems like the only viable solution apart from dd'ing mbrs back and
forth until the root cause is found...
Hopefully the data storage on card is not impeded, but personally I
will do backups more often now ;-) ...

Stefan

From: Joerg Reisenweber
Date: Friday, July 25, 2008 - 7:52 pm

As I think this seems to be quite a good clue to what's really happening
here, quote from the OLPC ticket #6532:

cc dilinger added

I've spend some time digging deep into the bowels of the VFS and block
layer and gathering some debug output and have an explanation for the
partition table corruption:

Upon coming out of resume, the SD code, with CONFIG_MMC_UNSAFE_SUSPEND
enabled, checks to see if there is a card plugged into the system and
whether that card is the same as the one that was plugged into the system
at suspend time. This is accomplished by reading the card ID of the device
and for some reason, very possibly #1339, we fail this detection. In this
case, the kernel removes the old device from the system and in this
execution path, the partition information for this device is zeroed.

Even though the device is removed, the device is still mounted and upon
unmount, ext2 syncs the superblock, even if the file system is sync'd
beforehand. The superblock is block 0 of the partition and the block layer
adds to this the partition start offset before submitting the write to the
lower layers. As the partition information has already been zeroed out, we
end up writing to block 0 of the disk itself, overwriting the partition
table and the geometry information. I've verified this by both gathering
debug output and 'dd' + 'hexdump' of corrupted and uncorrupted media.

Some interesting points:
We are able to delete a block device even though it is still mounted.
Even though the device has been deleted, the write submitted to it does
not fail.

Note that this is still not 100% reproducible and in certain cases the
superblock write during unmount does fail with block I/O errors, meaning
that the queue is properly deleted. As ...
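
The offset arithmetic described in the quoted ticket can be shown with a toy model. It follows the ticket's description only and is not kernel code; the start sector used below is a hypothetical value:

```python
def absolute_sector(partition_start: int, relative_sector: int) -> int:
    """The block layer adds the partition's start offset to a
    partition-relative sector before the write goes to the device."""
    return partition_start + relative_sector


# Hypothetical start sector for the partition, for illustration only.
start_before_resume = 40131
start_after_failed_detect = 0   # partition information zeroed, per the ticket

superblock_relative = 0         # "block 0 of the partition", per the ticket

print(absolute_sector(start_before_resume, superblock_relative))
print(absolute_sector(start_after_failed_detect, superblock_relative))
```

With the start offset zeroed, the same write lands on the first sector of the whole disk, which is where the partition table lives.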
From: Doug Jones
Date: Friday, July 25, 2008 - 11:02 pm

Joerg Reisenweber wrote:


<snip>


Yes, anybody working on this issue really ought to read that ticket in 
its entirety:

http://dev.laptop.org/ticket/6532

(keeping in mind that some of the earlier entries, from months ago, may 
contain erroneous data)

Note that some people think that this problem may affect things other 
than the SD card, and that external storage connected through USB might 
have a problem too.

OLPC really really wants to do aggressive power management.  They want 
to do things like halt the processor between keystrokes.  If they can't 
do suspend and resume in < 100 msec, they may not ever be able to 
deliver the holy grail:  a laptop that can run for ten minutes on the 
power provided by one minute of muscle power from a four-year-old child.

If they can achieve this, then OpenMoko ought to be able to achieve such 
aggressive power management too.  That's how you get really long battery 
life.

It seems that reconciling the need for data integrity on flash drives 
with the desire to achieve excellent power management is a hard problem.

From: Andy Green
Date: Friday, July 25, 2008 - 11:57 pm

Somebody in the thread at some point said:

| As I think this seems to be quite a good clue to what's really
| happening here, quote from the OLPC ticket #6532:
| (HTH)
| cc dilinger added
|  I've spend some time digging deep into the bowels of the VFS and
| block layer and gathering some debug output and have an explanation
| for the partition table corruption:

That's a great post you found Joerg and it is to the point.  But what is
killing me is this was "working" seemingly until we followed Sean
McNeil's lead about removing printks of all things from PMU driver while
trying to find the GSM crash in resume problem.  That's not to blame his
insight; clearly if we only work because async printks let it work,
we're not really working at all.  Previous to that, enabling synchronous
low level debug in the kernel forced the bad behaviour and disabling it
gave the good behaviour.

This is ultimately a resume race of some kind; the VFS layer corruption
and taking a whiz on block 0 (noticeable as it is) is downstream of
whatever is truly responsible.

I have now made half the devices children of the PMU device, reflecting
the relationship more clearly, and this is honoured by the suspend /
resume ordering.  It means that we still have power at the SD Card while
the glamo-mci driver is suspending, but we still fail to get a good
result from the last command sent on suspend, cmd7 to deselect the card.
I have to fix more problems today before I can see how MMC resumes from it.
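
The effect of the new parent / child relationship on ordering can be pictured with a small model. It is a sketch assuming the rule that every child is suspended before its parent; the tree is hypothetical apart from glamo-mci sitting somewhere below the PMU:

```python
def suspend_sequence(children_of: dict, root: str) -> list:
    """Post-order walk: every child is suspended before its parent."""
    order = []

    def visit(device):
        for child in children_of.get(device, []):
            visit(child)
        order.append(device)

    visit(root)
    return order


# Hypothetical tree; only "glamo-mci below the PMU" reflects the text above.
tree = {"pmu": ["glamo"], "glamo": ["glamo-mci"]}
print(suspend_sequence(tree, "pmu"))
```

In this arrangement the PMU goes last, so the card still has power while glamo-mci is suspending.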

I'm also doing this on 2.6.26 in case MCI layer changes make a different
result.

-Andy
From: Doug Jones
Date: Saturday, July 26, 2008 - 10:57 am

Andrew Burgess said (on the OLPC bug tracker):

"...For me the SD card corruption is 100% fixed now. I run a swap to the 
first sd card partition and I could guarantee partition wipe by turning 
off power or shutting down with swap on. I could work around it 100% by 
running swapoff before power down. I never enabled suspend. Now 
everything works. It suspends and resumes at will with swap running. 
Shutdown or mash the power button, partition table is fine..."

http://dev.laptop.org/ticket/6532#comment:63
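
To see whether an image runs swap on the card at all, the contents of /proc/swaps can be inspected. A small sketch, assuming the usual layout of that file (one header line, then one line per swap area starting with its name); the function names and the device prefix default are choices made here:

```python
def swap_devices(proc_swaps_text: str) -> list:
    """Return the device or file names listed in /proc/swaps content."""
    lines = proc_swaps_text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


def swap_on_sd_card(proc_swaps_text: str, prefix: str = "/dev/mmcblk") -> list:
    """Filter the active swap areas down to those on an SD/MMC device."""
    return [name for name in swap_devices(proc_swaps_text)
            if name.startswith(prefix)]
```

On the device it would be fed with the text read from /proc/swaps; an empty result means no swap area on an mmcblk device is active.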





From: vale
Date: Saturday, July 26, 2008 - 4:52 pm

i had the same problem here, with sandisk 8gb sdhc class 4 using ext2
partition. partition table was completely deleted :(

hope this gets fixed soon.
From: Andy Green
Date: Sunday, July 27, 2008 - 12:31 am

Somebody in the thread at some point said:
| Andrew Burgess said (on the OLPC bug tracker):
|
| "...For me the SD card corruption is 100% fixed now. I run a swap to the
| first sd card partition and I could guarantee partition wipe by turning
| off power or shutting down with swap on. I could work around it 100% by
| running swapoff before power down. I never enabled suspend. Now
| everything works. It suspends and resumes at will with swap running.
| Shutdown or mash the power button, partition table is fine..."
|
| http://dev.laptop.org/ticket/6532#comment:63

What is the swap situation on the image you are running?  I noticed
Debian was doing something about it in its initscripts, but I hadn't
noticed before that we run swap on ASU?

If, as I believe, this is very sensitive to a race, then not syncing swap
can change the behaviour miles away from the swap itself and change the
symptom, as could a bunch of other stuff.

-Andy

From: David Meder-Marouelli
Date: Thursday, July 24, 2008 - 5:25 am

Hi Stefan,

maybe it turns out to be the same problem that I also have.

After the card settles (see above in this thread), I can do the
following:

1) fdisk -l /dev/mmcblk0
    Result: error
2) fdisk -l /dev/mmcblk0
    Result: empty partition table!!!
3) fdisk -l /dev/mmcblk0
    Result: correct full partition table!!!

[Note: nothing else changed in between]

To re-read the partition table, just start "fdisk /dev/mmcblk0", verify
that the partition table is OK with "p", and re-write the unaltered
partition table with "w". After that it's re-read by the kernel and the
Freerunner recognises it.
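
A minimal sketch for checking whether repeated reads of the first sector
agree with each other, in the spirit of the three fdisk runs above. It
assumes dd and md5sum exist on the image in use; the function name
sector0_sums is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: checksum the first 512-byte sector COUNT times.
# If the card is read reliably, every line of output is identical.
sector0_sums() {
    dev="$1"
    count="${2:-3}"
    i=0
    while [ "$i" -lt "$count" ]; do
        dd if="$dev" bs=512 count=1 2>/dev/null | md5sum | cut -d' ' -f1
        i=$((i + 1))
    done
}

# Usage on the phone (read-only, needs root):
#   sector0_sums /dev/mmcblk0 5 | sort -u | wc -l    # 1 means all reads agreed
```

The kernel may answer repeat reads from its cache, so identical sums prove
less than differing ones do.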

Cheers,

    David



From: Stefan Fröbe
Date: Thursday, July 24, 2008 - 5:41 am

...
No matter how often I call this, it never changes as the MBR is

 That works great after I dd the mbr back, thanks for the pointer!

Stefan

From: Steven **
Date: Thursday, July 24, 2008 - 4:37 pm

Does anyone know an equivalent tool for Linux?
fdisk just says "unable to open".  So, I can't even re-write the data
to the card.  It's just dead!

-Steven


From: Steven **
Date: Thursday, July 24, 2008 - 4:56 pm

Never mind.  Gparted could read it.

-Steven


From: Andy Green
Date: Wednesday, July 23, 2008 - 2:56 pm

Somebody in the thread at some point said:
| Hi Andy,
|
| thanks for the quick reply.
|
| Andy Green wrote:
|> | mmc0: new high speed SDHC card at address b368
|> | mmcblk0: mmc0:b368 SD    3931136KiB
|> |  mmcblk0:<6>glamo-mci glamo-mci.0: Error after cmd: 0x8310
|> | mmcblk0: error -110 sending read/write command
|>
|> This first real error is ETIMEDOUT.  It can be a genuine timeout because
|> your card is a bit slow, but it could also mean communication problems.
| Not really good timing would probably be a good guess since "Intenso" is
| a low cost brand around here (germany).

I found that the timeout code doesn't cope if the MMC stack asks it for
a bigger timeout than the Glamo hardware can handle.  There is also
another, less likely issue to do with giving enough clocks after
powerup.  I will make patches for both tonight.

|> | 2) I can trick it to work doing the following steps:
|> |    - re-set it to this (or possibly any other) value
|> |      >root@om-gta02:~# echo 0 > /sys/module/glamo_mci/parameters/sd_drive
| Thanks for your guidance. It's _not_ the sd_drive parameter.
|
| It's actually just waiting for about 1-2min. After that the device is
| readily accessible. (Need to force a re-read of the partition table to
| mount though.)
|
| Anyways I made sure that the data I wrote to the card yesterday is still
| OK (md5sum) - although it's only around 1MB. I can do more tests
| tomorrow...

Thanks, these kinds of issues are obviously pretty interesting right now.

-Andy

From: arne anka
Date: Friday, July 25, 2008 - 1:54 am

i got a 4 gig card too (can't say if from intenso, have to check the  
wrapping).
my card's boot sector is not erased but -- after a resume the card is  
mounted wrongly!

fstab says as mountpoint /media/card and after booting that's where the  
card is.
after suspend/resume the card (often) is mounted to /media/mmcblk1p1  
instead -- thus every attempt to read from or write to the sd card goes to  
the built-in memory instead.
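
a small sketch of a guard against that, assuming the mount table is
readable at /proc/mounts. the helper name is_mounted_at is hypothetical;
the idea is simply to check that something is really mounted at
/media/card before writing there.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if DIR appears as a mount point in the
# mount table (second field of /proc/mounts). The optional second argument
# lets the table be swapped out for a test file.
is_mounted_at() {
    dir="$1"
    table="${2:-/proc/mounts}"
    awk -v d="$dir" '$2 == d { found = 1 } END { exit found ? 0 : 1 }' "$table"
}

# Usage: refuse to write maps into the bare directory on internal flash.
#   is_mounted_at /media/card || echo "SD card is not mounted at /media/card" >&2
```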

From: Andy Green
Date: Friday, July 25, 2008 - 2:41 am

Somebody in the thread at some point said:
| i got a 4 gig card too (can't say if from intenso, have to check the
| wrapping).
| my card's boot sector is not erased but -- after a resume the card is
| mounted wrongly!
|
| fstab says as mountpoint /media/card and after booting that's where the
| card is.
| after suspend/resume the card (often) is mounted to /media/mmcblk1p1
| instead -- thus every attempt to read from or write to the sd card goes to
| the built-in memory instead.

This is definitely the signature of the old MMC resume problems, not
anything else.

I am deep down that mine at the minute, this is getting intensely looked at.

-Andy

From: arne anka
Date: Friday, July 25, 2008 - 2:51 am

oh, ok. thought it to be related to the wiped out boot sector and thus
maybe helpful.

is there a workaround for the time being?

From: Andy Green
Date: Friday, July 25, 2008 - 3:36 am

Somebody in the thread at some point said:
|> This is definitely the signature of the old MMC resume problems, not
|> anything else.
|
| oh, ok. thought it to be related to the wiped out boot sector and thus
| maybe helpful.

It can be related, I expect the trashed block 0 thing is also
suspend-related only.

|> I am deep down that mine at the minute, this is getting intensely looked
|> at.
|
| is there a workaround for the time being?

Not that I know of.

-Andy

From: arne anka
Date: Friday, July 25, 2008 - 9:12 am

btw: just checked and the default _device_ is mmcblk0p1, after resume it
disappears and instead there is mmcblk1p1.
dunno if it is news to you ...

From: arne anka
Date: Friday, July 25, 2008 - 11:43 am

well, the card is a kingston 4gb class 4.
yesterday i downloaded a map with tangogps and meanwhile the fr went to  
suspend. afterwards a few directories were missing.
trying
ls /media/card/Maps/om/adirectorynotthere/
gives errors (something w/ read access i think)
so i decided to delete the entire /media/card/Maps folder from within the
fr -- but that fails with "read-only filesystem". and indeed magically the
card is mounted read only.
mount /media/card -o remount,rw
changes that, but next thing i know is "read only filesystem".
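
a small sketch for confirming the read-only state from the mount table
instead of waiting for the next failed write. it assumes /proc/mounts is
readable; the helper name mount_mode is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "ro" or "rw" for a mount point, based on the
# options column (fourth field) of /proc/mounts. Fails if DIR is not listed.
mount_mode() {
    dir="$1"
    table="${2:-/proc/mounts}"
    awk -v d="$dir" '
        $2 == d {
            n = split($4, opt, ",")
            for (i = 1; i <= n; i++)
                if (opt[i] == "ro" || opt[i] == "rw") mode = opt[i]
        }
        END { if (mode) print mode; else exit 1 }
    ' "$table"
}

# Usage: look at the kernel log as soon as the card has flipped to read-only.
#   [ "$(mount_mode /media/card)" = ro ] && dmesg | tail -n 20
```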

not sure if that fits into the original problem with the fried boot sector.
is there a trac ticket where i may add this information?

From: Andrew Burgess
Date: Monday, August 11, 2008 - 8:24 am

Perhaps this will help until Andy gets the new kernel ready.

I had vanishing partition table problems on my OLPC XO (fixed now in
kernel 2.6.25). I found that I could avoid data loss by doing two
things:
1) create a swap partition as the first partition.
2) when the table vanishes, recreate it EXACTLY as before (IOW a
same-size swap partition followed by a same-size (or remainder of card)
filesystem partition).

For me, the filesystem data reappeared undamaged.

This worked, I think, because the bug mashes several sectors at the
beginning of the SD card, and by letting the swap partition take the
hit your data survives.

HTH


ext2/3 will remount RO if there are filesystem errors, which there

probably, IME several of the first sectors are involved

From: AVee
Date: Friday, July 25, 2008 - 2:44 pm

I can confirm seeing the same behaviour with a Sandisk 8GB card...

AVee

-- 
I haven't lost my mind -- it's backed up on tape somewhere.

From: Thomas B.
Date: Friday, July 25, 2008 - 4:16 pm

Me too (also Sandisk 8GB). It bit me while running Qtopia from SD card:
Upon resuming the system crashed, probably because the rootfs was
gone...

I could reproduce this after having booted from flash. dmesg log of the
suspend/resume cycle is attached.

Regards,
Thomas

From: Gianluigi
Date: Thursday, August 7, 2008 - 11:51 pm

Any fix for this problem?

I've run many tests with a 4GB SD card.
If I use it to boot Qtopia from vfat and ext2 partitions, it works fine. Slowly,
but fine.

If I start the system from 2007.2 in flash, the mounted SD partitions give me
many errors.

Sometimes only the 1st partition was mounted, sometimes all of them.
I've tried reading from and writing to it, and I frequently get I/O errors
in dmesg.

Once the partition table was lost (during a suspend/resume session).

I've tried many partition schemes (mixed FAT and ext2), but I/O errors always
occur.

I've reformatted it as a single FAT partition and copied about
1GB of maps for tangogps onto it from my PC.

The partition was mounted at /media/card, and when tangogps started to read
from the SD card, some files were read correctly and many others were not.
The screen fills with noise while the SD card is being read and I/O errors occur.
When the read finishes, the screen displays normally again.

The 4GB SD card works fine with my PC, no I/O errors.
The 512MB SD card from the FR package always works very well.

The 4GB SD card is an Apacer Micro SDHC 4GB Class 6.

-- 
The sooner our happiness together begins, the longer it will last.
		-- Miramanee, "The Paradise Syndrome", stardate 4842.6

From: vale
Date: Friday, August 8, 2008 - 12:59 am

yes i am losing my partition table every couple of days. very annoying! i
can't trust my data being secure on the openmoko anymore.

i have a sandisk 8gb class4 sdhc card with one ext3 partition.

i hope there will be a fix soon, but i'm losing hope :(



From: Andy Green
Date: Friday, August 8, 2008 - 3:10 am

Somebody in the thread at some point said:
| yes i am loosing my partition table every couple of day. very annoying! i
| cant trust anymore my data beeing secure on the openmoko.
|
| i have a sandisk 8gb class4 sdhc card with one ext3 partition.
|
| i hope there will be a fix soon, but i'm loosing hope :(

Hey don't lose hope.  There are two issues.  The first is just that some
big cards are too slow to respond at the default 16MHz clock, given
Glamo's 16-bit clock-count timeout counter.  See this:

https://docs.openmoko.org/trac/ticket/1743
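
To put a rough number on that limit, here is a back-of-envelope sketch.
It assumes the counter ticks once per card clock and tops out at
2^16 - 1 = 65535 ticks; both are assumptions for illustration, not
details taken from the Glamo documentation, and the function name
max_timeout_ms is made up.

```shell
#!/bin/sh
# Longest timeout, in milliseconds, that a 16-bit tick counter can express
# at a given clock rate in Hz. Assumes one tick per clock cycle and a
# maximum count of 65535 (assumptions, see the text above).
max_timeout_ms() {
    awk -v hz="$1" 'BEGIN { printf "%.3f\n", 65535 / hz * 1000 }'
}

max_timeout_ms 16000000   # prints 4.096
max_timeout_ms 8000000    # prints 8.192
```

Under those assumptions, halving the clock doubles the longest wait the
hardware can count, which would be one reason a lower clock could help a
slow card.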

The second is suspend / resume (partition overwrite is only a suspend /
resume issue).  It has been fundamentally broken on GTA02 since before I
got here last December; it didn't work at all until a series of
deathmatches with it.  The biggest deathmatch of all, to clean and fix
it, is going on at the minute on the 2.6.26 branch here, and it exposed
the biggest underlying problem for us, which is Glamo behaviours.
Assuming I kill it before it kills me, we will have a far less racy and
more complete suspend and resume ordering situation then.

Other projects using Linux also have that problem of partition overwrite
on resume, but I suspect resume ordering and racing is behind their
problems too.  When we clear that in the 2.6.26 branch we stand a chance
to synthesize random or moving delays in resume action and try to flush
out where it comes from.

-Andy

From: Joachim Steiger
Date: Monday, August 11, 2008 - 8:36 am

Andy Green wrote:


i played with it a bit and came to the conclusion that it eats exactly
1024 bytes from the beginning of the 'physical' blockdevice. at least when
i back these up to nand, write them back via dd after losing them, and do an
ioctl via fdisk /dev/mmcblk0 -> press w to trigger the block layer
rereading the device, i am fine.

sounds weird.. is a buffer getting nulled on suspend, and gets written
back to disk even if it shouldn't?
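
the backup and restore of the first 1024 bytes described above could look
roughly like this. it is a sketch, not a tested recovery tool: the helper
names and the backup path are placeholders, and it assumes a dd that
understands conv=notrunc.

```shell
#!/bin/sh
# Hypothetical helpers: save and restore the first 1024 bytes (two 512-byte
# sectors) of a block device. Names and paths are placeholders.
save_head() {
    dd if="$1" of="$2" bs=512 count=2 2>/dev/null
}

restore_head() {
    # conv=notrunc matters when the target is an image file rather than a
    # device node: without it dd would truncate the file to 1024 bytes.
    dd if="$1" of="$2" bs=512 count=2 conv=notrunc 2>/dev/null
}

# Usage on the phone (needs root; double-check the device name first):
#   save_head /dev/mmcblk0 /home/root/mmcblk0.head     # while the card is good
#   restore_head /home/root/mmcblk0.head /dev/mmcblk0  # after the table is lost
#   fdisk /dev/mmcblk0                                 # then "w" so the kernel re-reads it
```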


-- 

Joachim Steiger
Openmoko Central Services

From: Andy Green
Date: Monday, August 11, 2008 - 1:31 pm

Somebody in the thread at some point said:

| i played with it a bit and came to the conclusion that it eats exactly
| 1024byte from the beginning of the 'physical' blockdevice. atleast when
| i backup these to nand, write them back via dd after loosing it and do a
| ioctl via fdisk /dev/mmcblk0 -> press w to trigger the block layer
| rereading the device i am fine.
|
| sounds weird.. is a buffer getting nulled on suspend, and gets written
| back to disk even if it shouldnt?

Huh.  Well SD is predicated around 512 byte blocks, so it is a 2-block
transaction.  When I dump what the driver sees for requests, they are
usually 2 or 8 512 byte blocks (1KBytes or 4KBytes).  So that part isn't
very foreign to the kind of transactions that are seen.

In 2.6.24 the PMU is taken down real early in suspend, it yanks SD Card
power (and CPU core power LOL) long before the MCI / MMC / SD driver and
stack try to deselect the card nicely.  One of the changes in
andy-2.6.26 is to make Glamo (and other things) a child of the PMU, so
the ordering is all changed around and SD Card can complete suspend
sanely with the card still powered.  The PMU goes down towards the end
of all the suspend actions too which is much better considering CPU core
power.

The main striking thing about the SD Card overwrite issue for me is that
we got rid of it for a long time just by removing the synchronous low
level debug config... we could literally make this overwrite issue come
and go by basically changing the timing of suspend and resume actions alone.
So I believe that we are dealing with races at the heart of all this.

-Andy
