Hi,

I observed an interesting phenomenon with my newly bought Intenso 4GB SDHC
card. While the 512MB card shipped with my Freerunner runs reliably and
stably, the new card shows the following behaviour:

1) Errors during boot process:
============================
root@om-gta02:~# dmesg |grep -E "glamo|mmc"
glamo3362 glamo3362.0: Detected Glamo core 3650 Revision 0002 (49119232Hz CPU / 81887232Hz Memory)
glamo3362 glamo3362.0: Glamo core now 49119232Hz CPU / 81887232Hz Memory)
glamo-spi-gpio glamo-spi-gpio.0: registering c0373838: jbt6k74
glamo-mci glamo-mci.0: glamo_mci driver (C)2007 Openmoko, Inc
glamo-mci glamo-mci.0: probe: mapped mci_base:c8864400 irq:0.
glamo-mci glamo-mci.0: glamo_mci_set_ios: power down.
glamo-mci glamo-mci.0: initialisation done.
mmc_set_power(power_mode=1, vdd=20
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 0kHz div=255 (req: 0kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: Error after cmd: 0x120
glamo-mci glamo-mci.0: Error after cmd: 0x8120
glamo-mci glamo-mci.0: Error after cmd: 0x120
glamo-mci glamo-mci.0: Error after cmd: 0x8120
glamo-mci glamo-mci.0: Error after cmd: 0x120
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 16666kHz div=2 (req: 16666kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 16666kHz div=2 ...
Somebody in the thread at some point said:

| glamo-mci glamo-mci.0: Error after cmd: 0x120
| glamo-mci glamo-mci.0: Error after cmd: 0x8120
| glamo-mci glamo-mci.0: Error after cmd: 0x120
| glamo-mci glamo-mci.0: Error after cmd: 0x8120
| glamo-mci glamo-mci.0: Error after cmd: 0x120

This bit is normal, it's the Linux MMC / SD stack seeing if it is an SDIO
card.

| glamo-mci glamo-mci.0: powered (vdd = 20) clk: 16666kHz div=2 (req: 16666kHz). Bus width=2

This bit is a good sign, it was able to complete getting recognized by the
stack and put into 4-bit mode at 16MHz SD_CLK.

| mmc0: new high speed SDHC card at address b368
| mmcblk0: mmc0:b368 SD 3931136KiB
| mmcblk0:<6>glamo-mci glamo-mci.0: Error after cmd: 0x8310
| mmcblk0: error -110 sending read/write command

This first real error is ETIMEDOUT. It can be a genuine timeout because
your card is a bit slow, but it could also mean communication problems.

| 2) I can trick it to work doing the following steps:
| - check sd_drive parameter (not required)
| >root@om-gta02:~# cat /sys/module/glamo_mci/parameters/sd_drive
| >0
| - re-set it to this (or possibly any other) value
| >root@om-gta02:~# echo 0 > /sys/module/glamo_mci/parameters/sd_drive
| - now the device works fine:

I don't see how that can impact it, none of our code runs when you change
that, it simply gets written to the int that holds sd_drive behind our
back. And when you catted it, that is the real value of that int, it
doesn't keep a copy somewhere. Maybe something else in the threshing
around helped.

| >root@om-gta02:~# fdisk -l /dev/mmcblk0
|
| Disk /dev/mmcblk0: 4025 MB, 4025483264 bytes
| 126 heads, 61 sectors/track, 1022 cylinders
| Units = cylinders of 7686 * 512 = 3935232 bytes
|
| Device Boot Start End Blocks Id System
| /dev/mmcblk0p1 1 1022 3927515+ 83 Linux
| ==========================
|
| A ...
Hi Andy,
thanks for the quick reply.
Not-so-good timing would probably be a good guess, since "Intenso" is
a low-cost brand around here (Germany).
Thanks for your guidance. It's _not_ the sd_drive parameter.
It's actually just waiting for about 1-2min. After that the device is
readily accessible. (Need to force a re-read of the partition table to
mount though.)
Anyway, I made sure that the data I wrote to the card yesterday is still
OK (md5sum) - although it's only around 1MB. I can do more tests tomorrow...
Cheers,
David
_______________________________________________
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community
Hi,

just a quick observation from my side that could possibly be related:
Until yesterday, my 4GB SanDisk card worked fine with a recent (e.g.
2008-07-22 or 21) kernel. After an opkg upgrade this morning that got me a
new kernel, I was surprised to see the card not being recognized anymore.
Furthermore, its MBR was zeroed out, and no tool could read or reformat it
except an SD card recovery tool by Panasonic (sdfv2003.exe, running only
under Windows, of course)!

I have now backed up the partition table and MBRs in the hope of being
able to dd them back, should this happen again. Sorry, but I haven't got
any logs as I was busy recovering what was left, but I'll surely save them
next time ...

Stefan

uname -a
Linux om-gta02 2.6.24 #1 PREEMPT Wed Jul 23 06:34:19 CEST 2008 armv4tl unknown
Somebody in the thread at some point said:

| Hi,
|
| just a quick observation from my side that could possibly be related:
| Until yesterday, my 4GB SanDisk card worked fine with a recent (e.g.
| 2008-07-22 or 21) kernel - after an opkg upgrade this morning that got
| me a new kernel I was surprised to see the card not being recognized
| anymore - furthermore, its MBR was zeroed out, and no tool could read or
| reformat it except a SD-Card recovery tool by Panasonic ( sdfv2003.exe
| running only under Windows, of course ) !
|
| I now backed up the partition table and mbrs in hope to be able to dd it
| back, should this happen again. Sorry, but I haven't got any logs as I
| was busy recovering what was left, but I'll surely save them next time ...

There's a race of some kind in suspend / resume that can do this; its
signature effect is that on resume your device comes back as mmcblk1 and
the logical filesystem in memory is corrupted. We didn't see this for a
long time though. Maybe keep an eye out for such shenanigans on resume.

-Andy
I just thought I'd mention that I had a similar thing happen. Three times
in the past couple days, my 8 GB A-Data microSDHC card somehow seemed to
have its partition table deleted. No partitions would show up on my card
anyway, although I could make new partitions and read and write from them
with a card reader.

I haven't had time to investigate more thoroughly, so I don't know whether
it happens on resume or if it only happens when I'm doing something in
particular. I've had the original half gig card in there since yesterday
and it hasn't had any problems, even though the SDHC card previously had
problems several times in short order, so there might be some difference
there.

Sorry I can't be of more help with specifics, but I can confirm that
there's someone else having this problem.

Matt
I experienced the same issue with my card (also the A-Data 8GB) after
flashing the kernel and rootfs builds from the 22nd. My partition table
seems to have been deleted. I'm pretty sure it happened after a
suspend/resume cycle (what happens when you have power management set to
"dim first, then lock"). I had a vfat and an ext3 partition on there that
I was using to dual-boot.

Is there a bug report/ticket for this issue I should be adding to? At the
very least, doesn't this belong on the support mailing list instead of
community?

-Steven
Not sure if this is connected with what you are seeing, but...

Something similar has been happening with SD cards on the OLPC laptop
(another example of hardware specifically designed for the FOSS world)
for at least the last six months. Last time I checked, there was still
no real fix. Has been a major pain for people who want to multiboot --
forces them to use storage devices that don't fit inside the case.

http://dev.laptop.org/ticket/6532

I am getting the impression that interfacing to SD cards is hard.
Thanks for the link, seems to be quite valuable to me as it explains the
background quite well!

Well, from recent comments it looks like a 400ms delay (yuck!) in
drivers/mmc/core/sd.c is a temporary workaround, but the root cause (as
Andy already suggested) seems to be related to the resume cycle.

At least it doesn't look like a HW issue with the card, then.

Stefan
Somebody in the thread at some point said:

| Thanks for the link, seems to be quite valuable to me as it explains the
| background quite well!
|
| Something similar has been happening with SD cards on the OLPC laptop
| (another example of hardware specifically designed for the FOSS world)
| for at least the last six months. Last time I checked, there was still
| no real fix.
|
| Well, from recent comments it looks like a 400ms delay (yuck!) in
| drivers/mmc/core/sd.c is a temporary workaround, but the root cause (as
| Andy already suggested) seems to be related to the resume cycle.
|
| Has been a major pain for people who want to multiboot --
| forces them to use storage devices that don't fit inside the case.
|
| http://dev.laptop.org/ticket/6532
|
| At least it doesn't look like a HW issue with the card, then.

Yes, when we originally had this problem I found that OLPC had it, and
indeed the Eee PC too at that time. What "cured" it for us was removing
the low level debug config option in the kernel, but that really is all
about changing timing too.

There's another complicated problem that may be related, concerning the
relationship between the PMU and the Glamo. The PMU device is only created
really late in boot because it is on the I2C bus. That means it is
suspended very early in suspend, yanking a lot of power rails (including
the CPU core power! But it goes on long enough from caps) before the MMC
stack has a chance to talk to the card and close it down gently.

Although suspend / resume has been acting well these last weeks, it is
fragile and we'll be doing a lot more work on it.

-Andy
Just spent some time looking at that bug ticket again. I've been trying
to follow this story ever since my OLPC trashed the partition table on my
16GB SDHC card back in January.

A consensus seems to be building that suspend/resume is involved. But I
don't think anybody really understands what's going on, and people have
been bashing their heads against this again and again for six months. If
something like this is happening in the Neo, then this could turn into a
world of hurt.

Recommendation:

Everybody, get a Micro-SD card and stick it in your Neo. Put some random
files on it, you don't have to do anything serious with it, just let it
sit in there for a while. Periodically check that you can still see those
files. If you lose those files, post about it. Maybe we should start a
new thread to keep track of this data.

There have been some indications that partition type may have some effect
on this problem on the OLPC. So, shrink the default vfat partition that
came on the card and put an ext3 on there too. If you want to be
adventurous, try some other types.

If more people start seeing problems like this, we ought to start
comparing notes with the OLPC people who are working on this. And Pierre
Ossman too, he did a lot of work on SD support in the kernel.
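The "put some random files on it and check them periodically" test could be scripted roughly like this. It is only a sketch: the function names canary_create and canary_check, the file names, and the mount point in the usage note are made up for illustration, and it assumes md5sum, dd and /dev/urandom exist on the image you are running.

```shell
#!/bin/sh
# Sketch only: canary_create / canary_check are illustrative names.

# canary_create DIR [COUNT]
# Write COUNT small files of random data into DIR and record their checksums.
canary_create() {
    dir=$1
    count=${2:-5}
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        dd if=/dev/urandom of="$dir/canary_$i.bin" bs=1024 count=64 2>/dev/null || return 1
        i=$((i + 1))
    done
    # Store the checksums next to the files, then flush everything to the card.
    ( cd "$dir" && md5sum canary_*.bin > MANIFEST.md5 ) && sync
}

# canary_check DIR
# Succeeds (exit status 0) only if every file still matches its checksum.
canary_check() {
    ( cd "$1" && md5sum -c MANIFEST.md5 >/dev/null 2>&1 )
}
```

Usage might look like "canary_create /media/card/canary" once, and "canary_check /media/card/canary || echo damaged" after each resume (the path is hypothetical). Re-mounting the card before checking avoids reading stale copies from the page cache instead of from the card.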
Happens to me with ext3 partitions as well (or mixed vfat/ext3
partitions). However if I restore the partition table the data are not
corrupted, at least so far it's been all right...

--
MiKael
Most people who use SD cards on OLPC are leaving them formatted as vfat,
because Sugar can't see any other type. I don't recall seeing any reports
of partition table mangling from these people, who are the vast majority
of OLPC users. It's when they try something other than vfat that the
corruption occurs. When it happened to me, I had one vfat and one ext3 on
there. So on the OLPC at least, the corruption does seem to be correlated
with partition type.

Because the number of people trying different file systems on the SD card
was relatively small, and because the corruption was happening sort of
randomly, it's been difficult to figure out what's happening. Sample size
too small. People were making all kinds of assumptions based on the
limited reports, some of them wrong. If you look through that bug ticket
and also various reports on forums and lists, you find contradictory
information regarding which builds were affected, which file systems were
affected, and so on. This kind of noise makes it a lot harder to debug.
It's still not clear that this problem is understood, even six months
later.

This is why I suggested that lots of people try various file systems in
their Neos, even if they don't need to have a card in there right now. If
there is indeed an SD problem in the Neo, we need lots of reports,
otherwise the sampling size problem might drag out the debugging for
months as happened with OLPC. If there is no problem with SD, then
there's no harm done. You'll just have a lot of people walking around
with SD cards they aren't using yet sitting inside their phones.

And of course it's entirely possible that the OLPC / SD problem has
nothing to do with the Neo.
I haven't been able to find a publicly available version of any SD Card
or SDHC spec. But both of the documents below seem to suggest that fat32
is part of the specification:

http://www.sdcard.org/about/sdhc/
http://www.kingston.com/flash/pdf_files/MKF_1127_SDHC_Topic_Paper.pdf

If that is the case, both the controller and logic on the card may assume
it contains a single full-size fat32 partition. They surely will not be
tested with anything other than fat32.

Does anyone here have access to official specs from the SDCard
Association? It would be interesting to have at least a hint about what
the spec says about partitioning...

AVee
--
I always finish what I...
> Everybody, get a Micro-SD card and stick it in your Neo. Put some

should that apply to multiboot or to _every_ use of the sd card?

i use suspend/resume more or less successfully for a week or 10 days now
and the files on my sd card (4gb, how do i determine the exact name from
a running system?) still are unharmed. gta02, 2007.2, upgrade every or
every second day.
Hi all,

I can now reliably reproduce the issue, as dd'ing the mbr back to the
card so far restores sane behaviour :

If sd_drive is set to "0", then after a resume from "sync && apm -s" the
MBR of my 4GB SanDisk is wiped - so far I haven't noticed any other
errors, but have not looked very closely.

To recover, I use the following commands:
---------------
# re-write MBR
dd if=mmcblk0_512_1.dump of=/dev/mmcblk0
# recognize partitions again
echo "1">/sys/module/glamo_mci/parameters/sd_drive
apm -s
----------------

So it looks as if the sd_drive parameter does have a role in this - any
suggestions on what else I should try, or what logs you guys need ?
Unmounting the card before suspend should help, and I'll also gladly try
another kernel and other partitions setup if I find the time.

Btw, this all happens on a 4GB SanDisk with 4 primary partitions:
20M vfat + (196M +196M +3.2G ) ext2 and kernel om-gta02 2.6.24 Wed Jul 23
06:34:19

Stefan

PS: Can somebody please tell me how to re-initialize the card without
going through another suspend/resume cycle ?
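For anyone wanting to keep a similar safety copy, the backup and restore steps could be wrapped up roughly as below. This is a sketch under assumptions: mbr_backup and mbr_restore are made-up helper names, and the 512-byte, one-sector size is inferred from the dump file name used above rather than stated anywhere in the thread.

```shell
#!/bin/sh
# Sketch only: mbr_backup / mbr_restore are illustrative names, not existing tools.

# mbr_backup DEV FILE
# Save the first 512-byte sector (boot code + partition table) of DEV to FILE.
mbr_backup() {
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# mbr_restore FILE DEV
# Write a saved sector back to the start of DEV.
# conv=notrunc only matters when DEV is a regular image file:
# it stops dd from truncating the rest of the image.
mbr_restore() {
    dd if="$1" of="$2" bs=512 count=1 conv=notrunc 2>/dev/null
}
```

On the phone this would be used as "mbr_backup /dev/mmcblk0 mmcblk0_mbr.dump" while the card is healthy, and "mbr_restore mmcblk0_mbr.dump /dev/mmcblk0" after a wipe. Trying it on an image file first is cheap insurance against swapping the two arguments.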
Somebody in the thread at some point said:

| Hi all,
|
| I can now reliably reproduce the issue, as dd'ing the mbr back to the
| card so far restores sane behaviour :
|
| If sd_drive is set to "0", then after a resume from "sync && apm -s" the
| MBR of my 4GB SanDisk is wiped - so far I haven't noticed any other
| errors, but have not looked very closely.

...

| PS: Can somebody please tell me how to re-initialize the card without
| going through another suspend/resume cycle ?

sd_drive setting isn't actually used until next time we access the card,
so provoking an access will do it, eg, touch /something ; sync.

But the two explanations for what goes on seem mixed still here, we
affect sd_drive and we do a suspend. My guess / hope is that this problem
is coming from the suspend action alone and the change of sd_drive is
bogus here. Maybe you can bang on it a little more trying to disprove
that hypothesis?

-Andy
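The "change sd_drive, then provoke an access" step can be put into one small helper, sketched below. The function name apply_sd_drive, the marker file name and the default mount point /media/card are assumptions for illustration; only the sysfs parameter path is taken from the thread.

```shell
#!/bin/sh
# Sketch only: apply_sd_drive is an illustrative name.

# apply_sd_drive VALUE [PARAM_FILE] [MOUNTPOINT]
# Write VALUE to the sd_drive parameter, then touch a file on the card and
# sync, so that the driver actually accesses the card with the new setting.
apply_sd_drive() {
    value=$1
    param=${2:-/sys/module/glamo_mci/parameters/sd_drive}
    mnt=${3:-/media/card}            # assumed mount point of the SD card
    echo "$value" > "$param" || return 1
    touch "$mnt/.sd_drive_poke" && sync
}
```

Keeping the parameter file and mount point as arguments makes it possible to dry-run the helper against ordinary files before pointing it at the real sysfs entry.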
Good point, but now it is getting really strange (all with sd_drive=0
prior to suspend):

Adding a "touch /media/mmcblk0p4/suspending" works; also adding the sync
does not; and finally also adding a "sleep 1" to the line gives me mixed
results. Maybe it is a timing issue, and the previously mentioned
400-500ms

Will do if I find the time, but for now completely unmounting the card
seems like the only viable solution apart from dd'ing mbrs back and forth
until the root cause is found... Hopefully the data storage on card is
not impeded, but personally I will do backups more often now ;-) ...

Stefan
As I think this seems to be quite a good clue to what's really happening
here, quote from the OLPC ticket #6532:

cc dilinger added

I've spend some time digging deep into the bowels of the VFS and block
layer and gathering some debug output and have an explanation for the
partition table corruption:

Upon coming out of resume, the SD code, with CONFIG_MMC_UNSAFE_SUSPEND
enabled, checks to see if there is a card plugged into the system and
whether that card is the same as the one that was plugged into the system
at suspend time. This is accomplished by reading the card ID of the
device and for some reason, very possibly #1339, we fail this detection.
In this case, the kernel removes the old device from the system and in
this execution path, the partition information for this device is zeroed.

Even though the device is removed, the device is still mounted and upon
unmount, ext2 syncs the superblock, even if the file system is sync'd
beforehand. The superblock is block 0 of the partition and the block
layer adds to this the partition start offset before submitting the write
to the lower layers. As the partition information has already been zeroed
out, we end up writing to block 0 of the disk itself, overwriting the
partition table and the geometry information. I've verified this by both
gathering debug output and 'dd' + 'hexdump' of corrupted and uncorrupted
media.

Some interesting points:

We are able to delete a block device even though it is still mounted.

Even though the device has been deleted, the write submitted to it does
not fail.

Note that this is still not 100% reproducible and in certain cases the
superblock write during unmount does fail with block I/O errors, meaning
that the queue is properly deleted. As ...
Joerg Reisenweber wrote:
<snip>

Yes, anybody working on this issue really ought to read that ticket in
its entirety:

http://dev.laptop.org/ticket/6532

(keeping in mind that some of the earlier entries, from months ago, may
contain erroneous data)

Note that some people think that this problem may affect things other
than the SD card, and that external storage connected through USB might
have a problem too.

OLPC really really wants to do aggressive power management. They want to
do things like halt the processor between keystrokes. If they can't do
suspend and resume in < 100 msec, they may not ever be able to deliver
the holy grail: a laptop that can run for ten minutes on the power
provided by one minute of muscle power from a four-year-old child.

If they can achieve this, then OpenMoko ought to be able to achieve such
aggressive power management too. That's how you get really long battery
life.

It seems that reconciling the need for data integrity on flash drives
with the desire to achieve excellent power management is a hard problem.
Somebody in the thread at some point said:

| As I think this seems to be quite a good clue to what's really happening here,
| quote from the OLPC ticket #6532:
| (HTH)
| cc dilinger added
| I've spend some time digging deep into the bowels of the VFS and block layer
| and gathering some debug output and have an explanation for the partition
| table corruption:

That's a great post you found Joerg and it is to the point. But what is
killing me is this was "working" seemingly until we followed Sean
McNeil's lead about removing printks of all things from PMU driver while
trying to find the GSM crash in resume problem. That's not to blame his
insight; clearly if we only work because async printks let it work, we're
not really working at all.

Previous to that, enabling synchronous low level debug in the kernel
forced the bad behaviour and disabling it gave the good behaviour.

This is ultimately a resume race of some kind, the VFS layer corruption
and taking a whiz on block 0 (noticeable as it is) is downstream of
whatever is truly responsible.

I have made half the device a child of the PMU device now reflecting the
relationship more clearly and this is honoured by the suspend / resume
ordering now. It means that we still have power at the SD Card while
glamo-mci driver is suspending, but we still fail to get a good result
from the last command sent on suspend, cmd7 to deselect the card.

I have to fix more problems today before I can see how MMC resumes from
it. I'm also doing this on 2.6.26 in case MCI layer changes make a
different result.

-Andy
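The mechanism described in the quoted OLPC ticket comes down to one addition, which the sketch below spells out. The helper name absolute_sector and the partition start of 63 are assumptions for illustration (63 is not taken from this thread), and the superblock is placed at relative sector 0 only because that is how the ticket describes it.

```shell
#!/bin/sh
# Sketch only: models the offset arithmetic the ticket describes.

# absolute_sector PART_START REL_SECTOR
# The block layer adds the partition start to a partition-relative sector
# before the write is submitted to the lower layers.
absolute_sector() {
    echo $(( $1 + $2 ))
}

# Partition info intact: a write to relative sector 0 stays inside the partition.
before=$(absolute_sector 63 0)
# Partition info zeroed after the failed card re-detection: start is now 0,
# so the same write lands on sector 0 of the disk, i.e. the MBR.
after=$(absolute_sector 0 0)
echo "superblock write lands on disk sector $before (intact) vs $after (zeroed)"
```

This is consistent with the reports that only the partition table disappears while the data inside the partitions survives once the MBR is written back.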
Andrew Burgess said (on the OLPC bug tracker):

"...For me the SD card corruption is 100% fixed now. I run a swap to the
first sd card partition and I could guarantee partition wipe by turning
off power or shutting down with swap on. I could work around it 100% by
running swapoff before power down. I never enabled suspend. Now
everything works. It suspends and resumes at will with swap running.
Shutdown or mash the power button, partition table is fine..."

http://dev.laptop.org/ticket/6532#comment:63
i had the same problem here, with sandisk 8gb sdhc class 4 using ext2
partition. partition table was completely deleted :(

hope this gets fixed soon.

--
View this message in context: http://n2.nabble.com/strange-problem-with-Intenso-4GB-SDHC-card-tp579169p584908.html
Sent from the Openmoko Community mailing list archive at Nabble.com.
Somebody in the thread at some point said:

| Andrew Burgess said (on the OLPC bug tracker):
|
| "...For me the SD card corruption is 100% fixed now. I run a swap to the
| first sd card partition and I could guarantee partition wipe by turning
| off power or shutting down with swap on. I could work around it 100% by
| running swapoff before power down. I never enabled suspend. Now
| everything works. It suspends and resumes at will with swap running.
| Shutdown or mash the power button, partition table is fine..."
|
| http://dev.laptop.org/ticket/6532#comment:63

What is the swap situation on the image you are running? I noticed
Debian was doing something about it on initscripts, but I didn't notice
before that we run swap on ASU?

If as I believe this is very sensitive to a race, then not syncing swap
can change the behaviour miles away from the swap itself and change the
symptom, as could a bunch of other stuff.

-Andy
Hi Stefan,
maybe it turns out to be the same problem that I also have.
After the card settles (see above in this thread), I can do the
following:
1) fdisk -l /dev/mmcblk0
Result: error
2) fdisk -l /dev/mmcblk0
Result: empty partition table!!!
3) fdisk -l /dev/mmcblk0
Result: correct full partition table!!!
[Note: nothing else changed in between]
To re-read the partition table just start "fdisk /dev/mmcblk0", verify
that the partition table is OK with "p", and re-write the unaltered
partition table with "w". After that it's re-read by the kernel and the
Freerunner recognises it.
Cheers,
David
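When fdisk gives different answers on repeated calls, it can help to look at the raw first sector directly. The sketch below checks for the 0x55 0xAA boot signature that a valid MBR carries at byte offset 510; a zeroed MBR will not have it. The function name mbr_has_signature is made up, and it assumes the od on your image supports -An and -tx1.

```shell
#!/bin/sh
# Sketch only: mbr_has_signature is an illustrative name.

# mbr_has_signature DEV_OR_IMAGE
# Succeeds if bytes 510-511 of the first sector are 0x55 0xAA.
mbr_has_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

For example: "mbr_has_signature /dev/mmcblk0 && echo present || echo missing". A missing signature right after resume would point at the MBR on the card really being overwritten; a present signature together with an empty fdisk listing would rather suggest that the read itself went wrong.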
...

No matter how often I call this, it never changes as the MBR is

That works great after I dd the mbr back, thanks for the pointer!

Stefan
Does anyone know an equivalent tool for Linux? fdisk just says "unable to open". So, I can't even re-write the data to the card. It's just dead!

-Steven
_______________________________________________
Never mind. GParted could read it.

-Steven
_______________________________________________
Somebody in the thread at some point said:

| Hi Andy,
|
| thanks for the quick reply.
|
| Andy Green wrote:
|> | mmc0: new high speed SDHC card at address b368
|> | mmcblk0: mmc0:b368 SD 3931136KiB
|> | mmcblk0:<6>glamo-mci glamo-mci.0: Error after cmd: 0x8310
|> | mmcblk0: error -110 sending read/write command
|>
|> This first real error is ETIMEDOUT. It can be a genuine timeout because
|> your card is a bit slow, but it could also mean communication problems.

| Not really good timing would probably be a good guess since "Intenso" is
| a low cost brand around here (germany).

I found that the timeout code doesn't cope when the MMC stack asks it for a bigger timeout than the Glamo hardware can handle. There is also another, unlikely, issue to do with giving enough clocks after powerup. I will make patches for tonight.

|> | 2) I can trick it to work doing the following steps:
|> | - re-set it to this (or possibly any other) value
|> | >root@om-gta02:~# echo 0 > /sys/module/glamo_mci/parameters/sd_drive

| Thanks for your guidance. It's _not_ the sd_drive parameter.
|
| It's actually just waiting for about 1-2min. After that the device is
| readily accessible. (Need to force a re-read of the partition table to
| mount though.)
|
| Anyways I made sure that the data I wrote to the card yesterday is still
| OK (md5sum) - although it's only around 1MB. I can do more tests tomorrow...

Thanks, these kinds of issues are obviously pretty interesting right now.

-Andy
_______________________________________________
i got a 4 gig card too (can't say if from intenso, have to check the wrapping).
my card's boot sector is not erased but -- after a resume the card is mounted wrongly!

fstab says as mountpoint /media/card and after booting that's where the card is.
after suspend/resume the card (often) is mounted to /media/mmcblk1p1 instead -- thus every attempt to read from or write to the sd card goes to the built-in memory instead.
_______________________________________________
Somebody in the thread at some point said:

| i got a 4 gig card too (can't say if from intenso, have to check the
| wrapping).
| my card's boot sector is not erased but -- after a resume the card is
| mounted wrongly!
|
| fstab says as mountpoint /media/card and after booting that's where the
| card is.
| after suspend/resume the card (often) is mounted to /media/mmcblk1p1
| instead -- thus every attempt to read from or write to the sd card goes to
| the built-in memory instead.

This is definitely the signature of the old MMC resume problems, not anything else.

I am deep down that mine at the minute, this is getting intensely looked at.

-Andy
_______________________________________________
oh, ok. thought it to be related to the wiped out boot sector and thus maybe helpful.

is there a workaround for the time being?
_______________________________________________
Somebody in the thread at some point said:

|> This is definitely the signature of the old MMC resume problems, not
|> anything else.
|
| oh, ok. thought it to be related to the wiped out boot sector and thus
| maybe helpful.

It can be related, I expect the trashed block 0 thing is also suspend-related only.

|> I am deep down that mine at the minute, this is getting intensely looked
|> at.
|
| is there a workaround for the time being?

Not that I know of.

-Andy
_______________________________________________
btw: just checked and the default _device_ is mmcblk0p1, after resume it disappears and instead there is mmcblk1p1. dunno if it is news to you ...
_______________________________________________
well, the card is a kingston 4gb class 4.

yesterday i downloaded a map with tangogps and meanwhile the fr went to suspend. afterwards a few directories were missing. trying ls /media/card/Maps/om/adirectorynotthere/ gives errors (something w/ read access i think), so i decided to delete the entire /media/card/Maps folder from within the fr -- but that fails with "read-only filesystem". and indeed, magically, the card is mounted read only. mount /media/card -o remount,rw changes that, but the next thing i know it is "read only filesystem" again.

not sure if that fits into the original problem with the fried boot sector. is there a trac ticket where i may add this information?
_______________________________________________
Perhaps this will help until Andy gets the new kernel ready.

I had vanishing partition table problems on my OLPC XO (fixed now in kernel 2.6.25). I found that I could avoid data loss by doing two things:

1) create a swap partition as the first partition.
2) when the table vanishes, recreate it EXACTLY as before (IOW same size swap partition followed by same size (or remainder of card) filesystem partition).

For me, the filesystem data reappeared undamaged. This worked, I think, because the bug mashes several sectors at the beginning of the SD card, and by letting the swap partition take the hit your data survives.

HTH

ext2/3 will remount RO if there are filesystem errors, which there probably are.

IME several of the first sectors are involved.
_______________________________________________
I can confirm seeing the same behaviour with a Sandisk 8GB card...

AVee
_______________________________________________
Me too (also Sandisk 8GB). It bit me while running Qtopia from SD card: upon resuming, the system crashed, probably because the rootfs was gone... I could reproduce this after having booted from flash. A dmesg log of the suspend/resume cycle is attached.

Regards,
Thomas
Any fix for this problem?

I've made many tests with a 4GB SD card. If I use it to boot Qtopia from vfat and ext2 partitions it works fine. Slowly, but fine.

If I start the system from 2007.2 in flash, the mounted SD partitions give me many errors. Sometimes the 1st partition was mounted, sometimes all of them. I've tried to access it for reading and writing, and I frequently get I/O errors in dmesg. One time the partition table was lost (in a suspend/resume session).

I've tried many partition schemes (mixed FAT and ext2) but I/O errors always come.

I've reformatted the card as a single FAT partition and filled it with about 1GB of maps for tangogps from my PC. The partition was mounted at /media/card, and when tangogps starts to read from the SD card some files are read fine and many others are not. The screen becomes full of noise while the SD card is being read and I/O errors occur. After the read ends, the screen goes back to displaying fine.

The 4GB SD card works fine with my PC, no I/O errors. The 512MB SD card from the FR package always works very well.

The 4GB SD card is an Apacer Micro SDHC 4GB Class 6.
_______________________________________________
yes i am losing my partition table every couple of days. very annoying! i can't trust anymore that my data is secure on the openmoko.

i have a sandisk 8gb class4 sdhc card with one ext3 partition.

i hope there will be a fix soon, but i'm losing hope :(
_______________________________________________
Somebody in the thread at some point said:

| yes i am loosing my partition table every couple of day. very annoying! i
| cant trust anymore my data beeing secure on the openmoko.
|
| i have a sandisk 8gb class4 sdhc card with one ext3 partition.
|
| i hope there will be a fix soon, but i'm loosing hope :(

Hey don't lose hope. There are two issues.

The first is just that some big cards are too slow to respond at the default 16MHz clock, given the Glamo's 16-bit clock count timeout counter. See this:

https://docs.openmoko.org/trac/ticket/1743

The second is suspend / resume (partition overwrite is only a suspend / resume issue). It has been fundamentally broken on GTA02 since before I got here last December; it didn't work at all until a series of deathmatches with it.

The biggest deathmatch of all to clean and fix it is going on at the minute on the 2.6.26 branch here, and it exposed the biggest underlying problem for us, which is Glamo behaviours. Assuming I kill it before it kills me, we will have a far less racy and more complete suspend and resume ordering situation then.

Other projects using Linux also have that problem of partition overwrite on resume, but I suspect resume ordering and racing is behind their problems too. When we clear that in the 2.6.26 branch we stand a chance to synthesize random or moving delays in resume action and try to flush out where it comes from.

-Andy
_______________________________________________
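
To put rough numbers on the first issue, here is a small worked calculation. It assumes the timeout counter is 16 bits wide and counts SD clock cycles, which is one reading of "16-bit clock count timeout counter"; the exact Glamo behaviour is not confirmed here, and the function names are illustrative rather than taken from the glamo-mci driver.

```python
MAX_COUNT = 0xFFFF  # largest value a 16-bit counter can hold


def max_timeout_ms(clock_hz, max_count=MAX_COUNT):
    """Longest timeout, in milliseconds, expressible as clock cycles."""
    return max_count * 1000.0 / clock_hz


def clamp_timeout_cycles(requested_ns, clock_hz, max_count=MAX_COUNT):
    """Convert a requested timeout to cycles, saturating at the counter limit.

    Returns (cycles, clamped) so the caller can tell when the request was
    larger than the counter can represent.
    """
    cycles = requested_ns * clock_hz // 1_000_000_000
    if cycles > max_count:
        return max_count, True
    return cycles, False


if __name__ == "__main__":
    for khz in (195, 16666):  # the two clock rates seen in the dmesg output
        print(f"{khz} kHz -> {max_timeout_ms(khz * 1000):.2f} ms max")
```

Under that assumption the ceiling is about 3.9 ms at 16666 kHz and about 336 ms at 195 kHz, so a request from the MMC stack for tens or hundreds of milliseconds cannot be represented at full clock speed. A driver that silently truncates such a request, instead of clamping it or slowing the clock, would report ETIMEDOUT on a card that is merely slow.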
Andy Green wrote:

I played with it a bit and came to the conclusion that it eats exactly 1024 bytes from the beginning of the 'physical' block device. At least, when I back these up to NAND, write them back via dd after losing them, and then trigger an ioctl via fdisk /dev/mmcblk0 -> press w to make the block layer re-read the device, I am fine.

Sounds weird... is a buffer getting nulled on suspend and then written back to disk even if it shouldn't be?

--
Joachim Steiger
Openmoko Central Services
_______________________________________________
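
The backup-and-restore routine described in this message can be sketched roughly as follows. This is not the procedure's original script: the function names and paths are hypothetical, and it assumes the damage really is confined to the first 1024 bytes as observed. `BLKRRPART` is the standard Linux ioctl for asking the kernel to re-read a partition table, which is what writing the table from fdisk ends up triggering; it needs root and a real block device.

```python
import fcntl
import os

HEAD_SIZE = 1024      # bytes reported as overwritten at the start of the card
BLKRRPART = 0x125F    # Linux ioctl: ask the kernel to re-read the partition table


def backup_head(device, backup_path, size=HEAD_SIZE):
    """Save the first `size` bytes of `device` to `backup_path`."""
    with open(device, "rb") as dev:
        head = dev.read(size)
    with open(backup_path, "wb") as out:
        out.write(head)
    return len(head)


def restore_head(device, backup_path):
    """Write the saved bytes back to the start of `device` in place."""
    with open(backup_path, "rb") as src:
        head = src.read()
    # "r+b" overwrites the beginning without truncating the rest.
    with open(device, "r+b") as dev:
        dev.write(head)
        dev.flush()
        os.fsync(dev.fileno())
    return len(head)


def reread_partition_table(device):
    """Trigger a partition table re-read (root and a block device required)."""
    fd = os.open(device, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKRRPART)
    finally:
        os.close(fd)
```

A typical sequence would be `backup_head("/dev/mmcblk0", "/home/root/mmc-head.bin")` while the card is healthy, then `restore_head(...)` followed by `reread_partition_table(...)` after a resume has wiped it. The backup should live on NAND, not on the card itself.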
Somebody in the thread at some point said:

| i played with it a bit and came to the conclusion that it eats exactly
| 1024byte from the beginning of the 'physical' blockdevice. atleast when
| i backup these to nand, write them back via dd after loosing it and do a
| ioctl via fdisk /dev/mmcblk0 -> press w to trigger the block layer
| rereading the device i am fine.
|
| sounds weird.. is a buffer getting nulled on suspend, and gets written
| back to disk even if it shouldnt?

Huh. Well SD is predicated around 512 byte blocks, so it is a 2-block transaction. When I dump what the driver sees for requests, they are usually 2 or 8 512 byte blocks (1KBytes or 4KBytes). So that part isn't very foreign to the kind of transactions that are seen.

In 2.6.24 the PMU is taken down real early in suspend, it yanks SD Card power (and CPU core power LOL) long before the MCI / MMC / SD driver and stack try to deselect the card nicely. One of the changes in andy-2.6.26 is to make Glamo (and other things) a child of the PMU, so the ordering is all changed around and SD Card can complete suspend sanely with the card still powered. The PMU goes down towards the end of all the suspend actions too which is much better considering CPU core power.

The main striking thing about the SD Card overwrite issue for me is that we got rid of it for a long time just by removing the synchronous low level debug config... we could literally make this overwrite issue come and go by basically changing timing of suspend and resume actions alone.

So I believe that we deal with races at the heart of all this.

-Andy
