Hi Ted,

I saw your patch to store fs error information in the superblock. I think it is a very useful feature, and I have implemented something similar in next3_snapshot_journal_error.patch and e2fs_next3_message_buffer.patch (attached).

There is one big problem I encountered with this feature: if the file system error behavior is set to "abort" or "remount-ro", the journal recovery on the next mount will most likely overwrite the superblock that holds the error information.

To solve this problem, I stored the error message buffer in the journal superblock and copied it to the filesystem superblock on journal recovery (both on mount and in fsck). fsck also displays the error buffer and clears it.

This feature helped me hunt down some rare bugs that happened at beta sites, which I had to analyse post-mortem. fsck simply gives me the first few error messages logged since the last time fsck was run.

Amir.
--
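For readers following the thread, the scheme described in the message above can be sketched roughly as follows. This is only an illustration, not code from the attached patches: the structures, the buffer size and the function names (record_error, recover_super, fsck_report_and_clear) are all hypothetical, and the on-disk layout is reduced to two small in-memory structs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MSG_BUF_SIZE 256  /* hypothetical size, not taken from the patches */

/* Hypothetical, heavily simplified stand-ins for the two superblocks. */
struct fs_super {
    unsigned int mount_count;
    char err_buf[MSG_BUF_SIZE];
};

struct journal_super {
    char err_buf[MSG_BUF_SIZE];
};

/*
 * Append a message to the journal superblock's buffer. Once the buffer
 * is full, later messages are truncated or dropped, so the buffer keeps
 * the first few errors. The buffer must start out NUL-terminated
 * (for example, zero-initialized).
 */
static void record_error(struct journal_super *jsb, const char *msg)
{
    size_t used, room;

    assert(jsb != NULL && msg != NULL);
    used = strlen(jsb->err_buf);
    room = MSG_BUF_SIZE - 1 - used;
    strncat(jsb->err_buf, msg, room);
}

/*
 * Model of journal recovery: replay overwrites the filesystem superblock
 * with the journaled copy, then the error buffer is copied over from the
 * journal superblock so that it survives the replay.
 */
static void recover_super(struct fs_super *sb,
                          const struct fs_super *journaled,
                          const struct journal_super *jsb)
{
    *sb = *journaled;
    memcpy(sb->err_buf, jsb->err_buf, MSG_BUF_SIZE);
}

/* Model of the fsck step: display the buffer, then clear both copies. */
static void fsck_report_and_clear(struct fs_super *sb,
                                  struct journal_super *jsb)
{
    fputs(sb->err_buf, stdout);
    memset(sb->err_buf, 0, MSG_BUF_SIZE);
    memset(jsb->err_buf, 0, MSG_BUF_SIZE);
}
```

The point of the sketch is the ordering in recover_super: the copy from the journal superblock happens after the replay has overwritten the filesystem superblock, so the messages are not lost.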
True, thanks for pointing that out; the simplest way to solve this for my purposes is to snapshot those superblock fields and restore them after journal recovery.

That's an interesting approach, although as you point out it only works on file systems with a 4k block size.

Your design seems to be focused on recording only the most recent logs, which makes sense in a debugging environment. My assumption was that the most recent problems would probably be recorded in /var/log/messages, although if the problem occurred on a single-disk system, that assumption probably wouldn't hold true.

I wonder if a better solution for this particular use case is a much larger ring buffer, plus a hook into the printk system, which is guaranteed to record *everything*, even after a panic or after the journal has been aborted and the file system has been remounted read-only.

For the patch I wrote, my intention was to supplement /var/log/messages, for cases where s_first_error_time might date from long before the point at which /var/log/messages rolled over. So I was trying to solve a somewhat different problem. (Hmm, actually, it would probably be good to save details about both the first and the most recent error.)

- Ted
--
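A minimal sketch of the two ideas in the message above: snapshotting the error fields around a journal replay, and keeping details of both the first and the most recent error. It assumes the restore happens immediately after the replay. Apart from the first-error time, which the message names as s_first_error_time, every field and function name here is hypothetical, and the superblock is reduced to a small in-memory struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical group of error-tracking fields kept in the superblock. */
struct err_fields {
    uint32_t error_count;
    uint32_t first_error_time;  /* set once, on the first error */
    uint32_t last_error_time;   /* updated on every error */
};

/* Hypothetical, heavily simplified superblock. */
struct super {
    uint32_t mount_count;
    struct err_fields err;
};

/* Record an error at time `now`, tracking both first and most recent. */
static void note_error(struct super *sb, uint32_t now)
{
    assert(sb != NULL);
    if (sb->err.error_count == 0)
        sb->err.first_error_time = now;
    sb->err.last_error_time = now;
    sb->err.error_count++;
}

/*
 * Model of a journal replay that would otherwise clobber the error
 * fields: snapshot them, let the replay overwrite the superblock with
 * the journaled copy, then put the snapshot back.
 */
static void replay_preserving_errors(struct super *sb,
                                     const struct super *journaled)
{
    struct err_fields saved;

    assert(sb != NULL && journaled != NULL);
    saved = sb->err;   /* snapshot before replay */
    *sb = *journaled;  /* replay overwrites the whole superblock */
    sb->err = saved;   /* restore after replay */
}
```

Grouping the fields in one struct keeps the snapshot and restore to a single assignment each; a real superblock would not necessarily lay them out contiguously.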
I guess that should work. I wonder why the ERROR_FS flag is not snapshotted on mount.

A much larger ring buffer with a printk hook sounds like a good feature, though one that would be hard to implement...

BTW, I think that if the file system error behavior is set to "remount-ro", a file system with ERROR_FS set should be mounted read-only at mount time. This is the only way to prevent a file system from becoming further corrupted, and I don't see why there is no way to enforce this with the existing error behavior options.

One thing that is missing from the error info is its severity level. If I had to save just one error record, it would be the first error after fsck (i.e. the transition from a healthy to a sick file system), but I would override it if a message of higher severity occurred.

Amir.
--
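The single-slot policy suggested at the end of the message above (keep the first error after fsck, but let a more severe one replace it) could look roughly like this. The severity levels, the struct and the function name save_error are hypothetical; nothing here is taken from an existing patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical severity levels, ordered from least to most severe. */
enum severity { SEV_NONE = 0, SEV_WARNING, SEV_ERROR, SEV_FATAL };

/* The single saved error record; SEV_NONE means "clean since last fsck". */
struct saved_error {
    enum severity sev;
    char msg[64];
};

/*
 * Store the error only if it is strictly more severe than the one
 * already saved. With an empty slot (SEV_NONE) this stores the first
 * error; afterwards, equal or lower severities leave the slot alone.
 * Returns 1 if the slot was updated, 0 otherwise.
 */
static int save_error(struct saved_error *slot, enum severity sev,
                      const char *msg)
{
    assert(slot != NULL && msg != NULL);
    if (sev <= slot->sev)
        return 0;
    slot->sev = sev;
    strncpy(slot->msg, msg, sizeof(slot->msg) - 1);
    slot->msg[sizeof(slot->msg) - 1] = '\0';
    return 1;
}
```

Under this sketch, fsck would reset the slot to SEV_NONE after reporting it, so that the next error is again treated as the first one.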
