Re: [PATCH] ext4_freeze: don't return to userspace with a mutex held

Previous thread: Re: [PATCH (RESEND)] don't scan/accumulate more pages than mballoc will allocate by tytso on Monday, April 5, 2010 - 6:11 am. (5 messages)

Next thread: Re: [PATCH] e2fsprogs: Fix the overflow in e4defrag with 2GB over file by Akira Fujita on Thursday, April 1, 2010 - 1:27 am. (5 messages)
From: Eric Sandeen
Date: Monday, March 29, 2010 - 3:34 pm

This is for RH bug 568503 - 
snapshot bug: lock held when returning to user space

ext4_freeze() does jbd2_journal_lock_updates() which takes
the j_barrier mutex, and then we return to userspace.  The
kernel does not like this:

================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
lvcreate/1075 is leaving the kernel with locks still held!
1 lock held by lvcreate/1075:
 #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>]
jbd2_journal_lock_updates+0xe1/0xf0

I don't -think- we need to do this; by now we should have s_frozen
set, and nobody else should be coming down the pipe to get to
the journal.  However, just to be on the safe side, I added
a couple of vfs_check_frozen() calls in ext4 functions which will
arrive at start_this_handle(), which should ensure that we never
get any journal traffic generated while frozen.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

(ext3 will need similar changes if this patch passes
muster on review

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e14d22c..00d09f5 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -241,6 +241,7 @@ handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
 	if (sb->s_flags & MS_RDONLY)
 		return ERR_PTR(-EROFS);
 
+	vfs_check_frozen(sb, SB_FREEZE_WRITE);
 	/* Special case here: if the journal has aborted behind our
 	 * backs (eg. EIO in the commit thread), then we still need to
 	 * take the FS itself readonly cleanly. */
@@ -3485,8 +3486,10 @@ int ext4_force_commit(struct super_block *sb)
 		return 0;
 
 	journal = EXT4_SB(sb)->s_journal;
-	if (journal)
+	if (journal) {
+		vfs_check_frozen(sb, SB_FREEZE_WRITE);
 		ret = ext4_journal_force_commit(journal);
+	}
 
 	return ret;
 }
@@ -3535,18 +3538,16 @@ static int ext4_freeze(struct super_block *sb)
 	 * the journal.
 	 */
 	error = jbd2_journal_flush(journal);
-	if (error < 0) ...
From: tytso
Date: Sunday, April 4, 2010 - 1:52 pm

Um, I think the addition of vfs_check_frozen(), esp. to
ext4_journal_start_sb() is absolutely necessary.  What else do we have
to prevent filesystem modifications from going to the file systme
layer?  I didn't see anything in the VFS layer that checks s_frozen;
am I missing something?

					- Ted
--

From: tytso
Date: Sunday, April 4, 2010 - 2:03 pm

Added to the ext4 patch queue, with the following patch comment:

ext4: don't return to userspace after freezing the fs with a mutex held

ext4_freeze() used jbd2_journal_lock_updates() which takes
the j_barrier mutex, and then returns to userspace.  The
kernel does not like this:

================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
lvcreate/1075 is leaving the kernel with locks still held!
1 lock held by lvcreate/1075:
 #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>]
jbd2_journal_lock_updates+0xe1/0xf0

Use vfs_check_frozen() added to ext4_journal_start_sb() and
ext4_force_commit() instead.

Addresses-Red-Hat-Bugzilla: #568503

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


					- Ted
--

From: Eric Sandeen
Date: Sunday, April 4, 2010 - 2:23 pm

Well, there is __generic_file_aio_write doing vfs_check_frozen, but
I thought there was more at the vfs level to stop things from getting
to the filesystem... *shrug* I see you put the patch in as sent,
it sounds right to me.

Thanks,
-Eric
--

From: Theodore Tso
Date: Sunday, April 4, 2010 - 2:56 pm

Yeah, it looks safe but it would be nice to do some testing to make sure there's nothing we missed.

I'm also vaguely worried whether the right thing will happen in no-journal mode, but it looks like it's better than what we had before (which clearly would have never worked in no journal mode), so I decided to let it pass.

Testing the freeze functionality definitely needs to be done before we trust what happens in no journal mode, for sure.

-- Ted

--

From: Eric Sandeen
Date: Sunday, April 4, 2010 - 5:43 pm

I did go back up the callchain from where the old lock should have
caught it, so I think I've plugged the same holes that were protected

Oh good point, hadn't even thought about nojournal mode & freeze.
Always nice to "accidentally" fix things ;)


--

Previous thread: Re: [PATCH (RESEND)] don't scan/accumulate more pages than mballoc will allocate by tytso on Monday, April 5, 2010 - 6:11 am. (5 messages)

Next thread: Re: [PATCH] e2fsprogs: Fix the overflow in e4defrag with 2GB over file by Akira Fujita on Thursday, April 1, 2010 - 1:27 am. (5 messages)