Hi all Since I moved my internal HD into a USB dock externally and mount the ext4 filesystem on it, I regularly get the following errors after it has been mounted for a while (see timecode). It doesn't seem to matter which recent kernel I use. [1048401.773270] EXT4-fs (sde8): mounted filesystem with writeback data mode. Opts: (null) [1048702.736011] EXT4-fs (sde8): error count: 3 [1048702.736016] EXT4-fs (sde8): initial error at 1289053677: ext4_journal_start_sb:251 [1048702.736018] EXT4-fs (sde8): last error at 1289080948: ext4_put_super:719 I've forced an fs check numerous times but nothing comes up. There was only one occasion where the filesystem mounted RO but that was months ago and has since been fine. This does not appear to cause any actual problems and happens it seems at _precisely_ the moment the hard drive decides to go into sleep mode as I hear it spin down. hdparm is unable to extract details from the hard drive when it is plugged into the usb dock. The options used on this are as follows: tune2fs 1.41.12 (17-May-2010) Filesystem volume name: backup Last mounted on: /media/backup Filesystem UUID: 4eff5eb7-89fb-4db1-8cfd-f09db598a00b Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize Filesystem flags: signed_directory_hash Default mount options: journal_data_writeback Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 22593536 Block count: 90355576 Reserved block count: 0 Free blocks: 30330219 Free inodes: 21828725 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 1002 Blocks per group: 32768 Fragments per group: ...
That's actually not an error. It's a report which is generated every 24 hours, indicating that there has been 3 errors since the last time the error count has been cleared, with the first error taking place at Sat Nov 6 10:27:57 2010 (US/Eastern) in the function ext4_journal_start_sb(), at line 251, and the most recent error taking place at Sat Nov 6 18:02:28 2010 (US/Eastern), in the function ext4_put_super() at line 719. This is a new feature which was added in 2.6.36. This information (which is stored in the ext4 superblock) gets cleared by e2fsck version 1.41.13 or newer (1.41.14 is the most recently released version). If you are using an older version of e2fsck, this information won't get cleared, which is not a disaster, although it could potentially get annoying. The solution is to simply to upgrade to the newest version of e2fsprogs if you are using the 2.6.36 or newer kernel. The idea behind this feature is to make sure that if there was a file system corruption detected, it wouldn't be lost due to log files getting overflowed. It's also helpful when users report problems to distribution support lines, since it makes it more likely support personnel will be able to determine that there has been some file system corrutpion issues, perhaps caused by hardware problems, or some other cause, including kernel bugs. Hope this helps explain what you're seeing! - Ted --
Nice. But the issue you're not mentioning is: What errors could have happened on the 6th of november? Should Con worry about those errors? OK, the chances are that he has rebooted since november, and that an older fsck fixed the errors, but not cleared the "fs errror log". Would these errors have triggered a "remount-readonly" if the fs was mounted like that? I don't reboot that often: obelix:~> uptime 09:16:19 up 175 days, 19:04, 19 users, load average: 110.10, 110.61, 111.22 (and yes, the load is quite high on that machine. I won't polute this thread about that....) Roger. -- ** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 ** ** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 ** *-- BitWizard writes Linux device drivers for any device you may have! --* Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. Does it sit on the couch all day? Is it unemployed? Please be specific! Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ --
Indeed the FS was remounted as RO at the time. I had to power down the USB device and power it up again after umounting the FS and then doing an FSCK before it was usable again, and the machine had about a month of uptime when it happened. It has not caused a problem since. Thanks, Con -- -ck --
Ted, I would like to use this opportunity to remind you about my record_journal_errstr() implementation, see: https://github.com/amir73il/ext4-snapshots/blob/next3-stable/fs/next3/super.c#L157 It records the initial errors messages (which I found to be the most interesting), in a message buffer on the unused space after the journal super block (3K on a 4K block fs). fsck prints out those messages and clears the buffer. In under a year of Next3 fs in beta and production, it has helped me many times to analyse bugs post-mortem and find the problem. If there is demand, I can post the patch for Ext4. --
I do think that this sounds like a useful addition - should be very useful in doing post mortem analysis... Thanks! Ric --
It looks like those errors were probably caused by I/O errors while
writing to the journal. Maybe due to a flakey USB connection,
perhaps? It would be useful to look at the /var/log/messagfes around
Yes, errors that get recorded into the superblock are also errors that
would cause a read-only remount if the file system is set that way.
- Ted
--
This is going to be a faq... I suppose the datetime is encoded (what format is that?) in that long number after "at". May I suggest the datetime gets decoded in the printing? Also may I suggest that the error happens immediately after mount and not after 300 seconds from mount? I just subscribed to this list exactly to report the same kind of error. Last week I was doing reliability tests for open-iscsi and this error drove me crazy. I spent days in tests where I thought I could reproduce an error in open-iscsi by disconnecting and reconnecting the network; I even reported this to the open-iscsi mailing list, but in fact it was an old error of the filesystem and it was not getting cleared by my older e2fsck 1.41.11 . If that happened immediately after mount or if it spitted the datetime in human readable format, I would have immediately guessed it was due to an existing "filesystem problem" (even though I was running fsck.ext4 -f prior to each mount) but instead I thought it was due to my 5 minutes of "networks disconnection tests" I was doing after each mount. DOH! Anyway thanks for your work: excellent filesystem. PS: I have a question for you regarding ext4 behaviour with SCSI commands resubmissions... ok I am opening another thread for that. --
Number of seconds since January 1st, 1970 midnight GMT. (Standard
Unix-time format). It's hard for the kernel to decode it since it
doesn't know what time-zone you are in, or what your local legislature
has done vis-a-vis arbitrary and cabritious adjustments to the
start/stop date of daylight savings time, etc.
It's easy enough to decode it using perl:
Hmmm.... maybe. The idea behind the 300 second delay was for the root
file system, to let e2fsck have a chance to fix the file system before
reporting the errors at mount time.
- Ted
--
The "date" program can decode these values too:
date -d @1289053677
--
Ben Pfaff
http://benpfaff.org
--
I suggest to decode it in GMT then. Like this it's not possible to guess it's a "time" value, and hence that it's an *old* error. I thought it was a pointer to something, and so I thought the error was happening now, or more likely (since it was triggered after 300 seconds) that it was some inconsistency in the filesystem being detected now, in some kind of lazy check. --
