Re: do_remount_sb(RDONLY) race? (was: XFS oops under 2.6.23.9)

Previous thread: Re: W1: w1_slave units, standardize 1C or .001C? Break API by David Fries on Wednesday, January 23, 2008 - 12:00 am. (1 message)

Next thread: [PATCH 0/3] percpu: Optimize percpu accesses by travis on Wednesday, January 23, 2008 - 12:49 am. (9 messages)
To: <linux-kernel@...>
Cc: Jonathan Woithe <jwoithe@...>
Date: Wednesday, January 23, 2008 - 12:30 am

Last night my laptop suffered an oops during closedown. The full oops
reports can be downloaded from

http://www.atrad.com.au/~jwoithe/xfs_oops/

as photos of the screen. Since the laptop was unusable at this point I
wasn't able to cut and paste the details, and they weren't in the logs when
the machine was rebooted.

The initial complaint claims to be an "invalid opcode". Is this possibly a
memory fault developing or does it ring any bells for anyone? memtest86
finds no fault with the memory.

Kernel version was kernel.org 2.6.23.9 compiled as a low latency desktop.
The RT patches were not applied.

Regards
jonathan
--

To: Jonathan Woithe <jwoithe@...>
Cc: <linux-kernel@...>
Date: Wednesday, January 23, 2008 - 1:34 am

Assertion failed: atomic_read(&mp->m_active_trans) == 0, file:
fs/xfs/xfs_vfsops.c, line 689.

The remount read-only of the root drive supposedly completed
while there was still active modification of the filesystem

The patch in 2.6.23 that introduced this check was:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commi...

Basically, the remount-readonly path was not flushing things
properly, so we changed it to flushing things properly and ensure we
got bug reports if it wasn't. Yours is the second report of not
shutting down correctly since this change went in (we've seen it
once in ~8 months in a QA environment).

I've had suspicions of a race in the remount-ro code in
do_remount_sb() w.r.t to the fs_may_remount_ro() check. That is, we
do an unlocked check to see if we can remount readonly and then fail
to check again once we've locked the superblock out and start the
remount.

The read only flag only gets set *after* we've made the filesystem
readonly, which means before we are truly read only, we can race
with other threads opening files read/write or filesystem
modifcations can take place.

The result of that race (if it is really unsafe) will be assert you
see. The patch I wrote a couple of months ago to fix the problem
is attached below....

Cheers,

Dave.

---

Set the MS_RDONLY before we check to see if we can remount
read only so that we close a race between checking remount
is ok and setting the superblock flag that allows other
processes to start modifying the filesystem while it is
being remounted.

Signed-off-by: Dave Chinner <dgc@sgi.com>
---
fs/xfs/linux-2.6/xfs_super.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_super.c 2008-01-22 14:57:07.753782292 +1100
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs...

To: David Chinner <dgc@...>
Cc: Jonathan Woithe <jwoithe@...>, <linux-kernel@...>
Date: Wednesday, January 23, 2008 - 8:55 am

This code is inherently racy, and will be replaced with a proper
per-vfsmount writer count in 2.6.25 as part of the per-mount r/o patches.

--

To: David Chinner <dgc@...>
Cc: Jonathan Woithe <jwoithe@...>, <linux-kernel@...>
Date: Wednesday, January 23, 2008 - 1:54 am

Thanks for the patch. I will apply it and see what happens.

Will this be in 2.6.24?

Regards
jonathan
--

To: Jonathan Woithe <jwoithe@...>
Cc: David Chinner <dgc@...>, <linux-kernel@...>
Date: Wednesday, January 23, 2008 - 2:11 am

No - because hitting the problem is so rare that I'm not even
sure it's a problem. One of the VFS gurus will need to comment
on whether this really is a problem, and if so the correct fix
is to do_remount_sb() so that it closes the hole for everyone.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
--

Previous thread: Re: W1: w1_slave units, standardize 1C or .001C? Break API by David Fries on Wednesday, January 23, 2008 - 12:00 am. (1 message)

Next thread: [PATCH 0/3] percpu: Optimize percpu accesses by travis on Wednesday, January 23, 2008 - 12:49 am. (9 messages)