Andrew Morton [interview [0]] posted on the lkml [1], "In 2.4.20-pre5 an optimisation was made to the ext3 fsync function which can very easily cause file data corruption at unmount time". This bug only affects people using ext3 in the uncommon "data=journal" mode, or files operating under "chattr -j", and does not affect the 2.5 series of kernels.
Andrew went on to say that "The symptoms are that any file data which was written within the thirty seconds prior to the unmount may not make it to disk. A workaround is to run `sync' before unmounting". He also posted a patch to fix the problem. However, soon thereafter, he posted saying that "that 'fix' didn't fix it. Sorry about that". Until a proper fix can be developed, he recommends that people "please avoid ext3/data=journal". Since "data=journal" is not the default ext3 mode, it is unlikely most people running ext3 will be affected by this. However, it is a data corruption bug so you should double-check that you use either "data=ordered" or "data=writeback" as your ext3 mode of operation.
From: Andrew Morton
To: linux-kernel Mailing List
Subject: data corrupting bug in 2.4.20 ext3, data=journal
Date: Sun Dec 01 2002 - 03:11:41 EST
In 2.4.20-pre5 an optimisation was made to the ext3 fsync function
which can very easily cause file data corruption at unmount time. This
was first reported by Nick Piggin on November 29th (one day after 2.4.20 was
released, and three months after the bug was merged. Unfortunate timing)
This only affects filesystems which were mounted with the `data=journal'
option. Or files which are operating under `chattr -j'. So most people
are unaffected. The problem is not present in 2.5 kernels.
The symptoms are that any file data which was written within the thirty
seconds prior to the unmount may not make it to disk. A workaround
is to run `sync' before nmounting.
The optimisation was intended to avoid writing out and waiting on the
inode's buffers when the subsequent commit would do that anyway. This
optimisation was applied to both data=journal and data=ordered modes.
But it is only valid for data=ordered mode.
In data=journal mode the data is left dirty in memory and the unmount
will silently discard it.
The fix is to only apply the optimisation to inodes
which are operating under data=ordered.
--- linux-akpm/fs/ext3/fsync.c~ext3-fsync-fix Sat Nov 30 23:37:33 2002
+++ linux-akpm-akpm/fs/ext3/fsync.c Sat Nov 30 23:39:30 2002
@@ -63,10 +63,12 @@ int ext3_sync_file(struct file * file, s
*/
ret = fsync_inode_buffers(inode);
- /* In writeback mode, we need to force out data buffers too. In
- * the other modes, ext3_force_commit takes care of forcing out
- * just the right data blocks. */
- if (test_opt(inode->i_sb, DATA_FLAGS) == EXT3_MOUNT_WRITEBACK_DATA)
+ /*
+ * If the inode is under ordered-data writeback it is not necessary to
+ * sync its data buffers here - commit will do that, with potentially
+ * better IO merging
+ */
+ if (!ext3_should_order_data(inode))
ret |= fsync_inode_data_buffers(inode);
ext3_force_commit(inode->i_sb);
_From: Andrew Morton
To: linux-kernel Mailing List
Subject: Re: data corrupting bug in 2.4.20 ext3, data=journal
Date: Sun Dec 01 2002 - 03:52:23 EST
Andrew Morton wrote:> > ... > The fix is to only apply the optimisation to inodes which are operating > under data=ordered. > That "fix" didn't fix it. Sorry about that. Please avoid ext3/data=journal until it is sorted out.
- Google Archive Of Above Thread [2]
- Interview With Andrew Morton [2]