Hi, my office mate has a 1GB USB stick with a currupted vfat fs. (I don't really know when and how it got currupted. I just mounted it, copied a file onto it, unmounted, unplugged. After that corruption showed up when accessing the newly copied file. I'm running 2.6.33.1.) Mounting still worked but when accessing the new file the kernel log was filled up with Apr 22 16:30:18 zzz kernel: FAT: Filesystem error (dev sdb1) Apr 22 16:30:18 zzz kernel: fat_get_cluster: invalid cluster chain (i_pos 34568) Apr 22 16:30:18 zzz kernel: File system has been set read-only Apr 22 16:30:18 zzz kernel: FAT: Filesystem error (dev sdb1) Apr 22 16:30:18 zzz kernel: fat_get_cluster: invalid cluster chain (i_pos 34568) Apr 22 16:30:18 zzz kernel: FAT: Filesystem error (dev sdb1) Apr 22 16:30:18 zzz kernel: fat_get_cluster: invalid cluster chain (i_pos 34568) ... Apr 22 16:30:18 zzz kernel: FAT: Filesystem error lidAT: Filesysalid cluster chain (i_pos 34568) Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos 3AT: Fiesysalid cluster chain (i_pos 34568) Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos 3AT: Fesystalid cluster chain (i_pos 34568) Apr 22 16:30:18 zzz kernel: FAT: Filesystem alid cluster chain (i_pos 3AT: Fiesyalid cluster chain (i_pos 34568) Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos AT: Fiesystalid cluster chain (i_pos 34568) Apr 22 16:30:18 zzz kernel: FAT: Filesystealid cluster chain (i_pos 3AT: Fiesystalid cluster chain (i_pos 34568) ... (~10000 lines) It seems that fat_fs_error() generates corrupted output (on an Athlon 4850e dual core), and the excessive amounts of output are IMHO not useful. BTW: dosfsck refused to fix it. BTW2: when my office mate plugged it into her MacBook it caused MacOS to crash ;-) Johannes --
It seems, userland or readahead or read directory entires didn't stop with EIO (dir is intended though, to salvage as many files as possible). I'll think about using the ratelimit for fs corruption report. I have no idea about message corruption, vfat just call vprintf() for it. I'll see current vprintf() locking stuff. Thanks. -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> --
I think multiple printk per line is prone to corruption on SMP, see the comment about KERN_CONT in kernel.h. But maybe it only happens for /var/log/kern.log and my xconsole when generating too much output too quickly, "dmesg -s 10000000 | less" did not show the corruption (but only shows ~2700 lines out of the ~10000). But I think fat should use vprintf() to a buffer and then one printk() instead of multiple printk + vprintk. In xconsole it looks like this: Apr 27 12:06:45 zzz kernel: <systerrouster chain (i_pos 34568) Apr 27 12:06:45 zzz kernel: <systeerroruster chain (i_pos 34568) Apr 27 12:06:45 zzz kernel: <systeerroruster chain (i_pos 34568) Apr 27 12:06:45 zzz kernel: <3systerroruster chain (i_pos 34568) Apr 27 12:06:45 zzz kernel: uster chain (i_pos 34568) Thanks Johannes --
I think KERN_CONT issue doesn't explain this corruption (i.e. preempted at middle of vprintk()), so even if this is one vprintk(), this will not be fixed. One of possibility is buffer overflow of printk, so truncated the message, but of course, I'm not sure at least for now, and I'm not checking current printk stuff yet. -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> --
Um, please check syslog at same time with dmesg. If there is corruption in syslog, but there is not in dmesg. It sounds like userland problem. -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> --
I guess klogd cannot handle that much output. If you'd add some rate limiting that might fix the log corruption, too. Thanks Johannes --
I guess it meant the corruption is in only syslog. Ok, it explains this problem. Thanks. -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> --
