just a simple test I did...
xfs_freeze -f /mnt/newtest
cp /etc/fstab /mnt/newtest
xfs_freeze -u /mnt/newtest2007-01-04 01:44:30.341979500 <4>BUG: warning at kernel/mutex-debug.c:80/debug_mutex_unlock()
2007-01-04 01:44:30.385771500 <4> [<c0103cfb>] dump_trace+0x215/0x21a
2007-01-04 01:44:30.385774500 <4> [<c0103da3>] show_trace_log_lvl+0x1a/0x30
2007-01-04 01:44:30.385775500 <4> [<c0103dcb>] show_trace+0x12/0x14
2007-01-04 01:44:30.385777500 <4> [<c0103ec8>] dump_stack+0x19/0x1b
2007-01-04 01:44:30.385778500 <4> [<c013a3af>] debug_mutex_unlock+0x69/0x120
2007-01-04 01:44:30.385779500 <4> [<c04b4aac>] __mutex_unlock_slowpath+0x44/0xf0
2007-01-04 01:44:30.385780500 <4> [<c04b4887>] mutex_unlock+0x8/0xa
2007-01-04 01:44:30.385782500 <4> [<c018d0ba>] thaw_bdev+0x57/0x6e
2007-01-04 01:44:30.385791500 <4> [<c026a6cf>] xfs_ioctl+0x7ce/0x7d3
2007-01-04 01:44:30.385793500 <4> [<c0269158>] xfs_file_ioctl+0x33/0x54
2007-01-04 01:44:30.385794500 <4> [<c01793f2>] do_ioctl+0x76/0x85
2007-01-04 01:44:30.385795500 <4> [<c0179570>] vfs_ioctl+0x59/0x1aa
2007-01-04 01:44:30.385796500 <4> [<c0179728>] sys_ioctl+0x67/0x77
2007-01-04 01:44:30.385797500 <4> [<c0102e73>] syscall_call+0x7/0xb
2007-01-04 01:44:30.385799500 <4> [<001be410>] 0x1be410
2007-01-04 01:44:30.385800500 <4> =======================fstab was there just fine after -u.
Linux 2.6.19.1 SMP on Pentium D.
--
-
Oh, that still hasn't been fixed? Generic bug, not XFS - the global
semaphore->mutex cleanup converted the bd_mount_sem to a mutex, and
mutexes complain loudly when a the process unlocking the mutex is
not the process that locked it.Basically, the generic code is broken - the bd_mount_mutex needs to
be reverted back to a semaphore because it is locked and unlocked
by different processes. The following patch does this....BTW, Sami, can you cc xfs@oss.sgi.com on XFS bug reports in future;
you'll get more XFS savvy eyes there.....Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group---
Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.Signed-off-by: Dave Chinner <dgc@sgi.com>
---
fs/block_dev.c | 2 +-
fs/buffer.c | 6 +++---
fs/gfs2/ops_fstype.c | 4 ++--
fs/super.c | 4 ++--
include/linux/fs.h | 2 +-
5 files changed, 9 insertions(+), 9 deletions(-)Index: 2.6.x-xfs-new/fs/block_dev.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/block_dev.c 2006-12-22 10:53:20.000000000 +1100
+++ 2.6.x-xfs-new/fs/block_dev.c 2007-01-08 08:26:15.843378600 +1100
@@ -263,7 +263,7 @@ static void init_once(void * foo, kmem_c
{
memset(bdev, 0, sizeof(*bdev));
mutex_init(&bdev->bd_mutex);
- mutex_init(&bdev->bd_mount_mutex);
+ sema_init(&bdev->bd_mount_sem, 1);
INIT_LIST_HEAD(&bdev->bd_inodes);
INIT_LIST_HEAD(&bdev->bd_list);
#ifdef CONFIG_SYSFS
Index: 2.6.x-xfs-new/fs/buffer.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/buffer.c 2006-12-12 12:04:51.000000000 +1100
+++ 2.6.x-xfs-new/fs/buffer.c 2007-01-08 08:28:40.832542651 +1100
@@ -179,7 +179,7 @@ int fsync_bdev(struct block_device *bdev
* freeze_bdev -- lock a filesystem and force it...
On Mon, 8 Jan 2007 08:37:34 +1100
Sad. The alternative would be to implement
mutex_unlock_dont_warn_if_a_different_task_did_it(). Ingo? Possible?-
i'd like to avoid it as much as i'd like to avoid having to add
spin_unlock_dont_warn_if_a_different_task_did_it(). Unlocking by a
different task is usually a sign of messy locking and bugs lurking. Is
it really true that XFS's use of bd_mount_mutex is safe and justified?Ingo
-
On Mon, Jan 08, 2007 at 08:37:34 +1100, David Chinner wrote:
Forgot to.
Thanks for patch. It fixed the issue, no more warnings.
BTW. the fix is not in 2.6.git, either.
--
Do what you love because life is too short for anything else.-
Hm, it was proposed upstream a while ago:
http://lkml.org/lkml/2006/9/27/137
I guess it got lost?
-
Seems like it. Andrew, did this ever get queued for merge?
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
On Tue, 9 Jan 2007 10:47:28 +1100
Seems not. I think people were hoping that various nasties in there
would go away. We return to userspace with a kernel lock held??
-
Well, there might be nicer solutions, but for now we should revert the
broken commit to change the lock type.-
Is a semaphore any worse than the current mutex in this respect? At
least unlocking from another thread doesn't violate semaphore rules. :)-Eric
-
On Mon, 08 Jan 2007 21:12:40 -0600
I assume that if we weren't returning to userspace with a lock held, this
mutex problem would simply go away.
-
Well nobody's asserting that the filesystem must always be locked &
unlocked by the same thread, are they? That'd be a strange rule to
enforce upon the userspace doing the filesystem management wouldn't it?
Or am I thinking about this wrong...-Eric
-
On Mon, 08 Jan 2007 21:38:05 -0600
I don't even know what code we're talking about here...
I'm under the impression that XFS will return to userspace with a
filesystem lock held, under the expectation (ie: requirement) that
userspace will later come in and release that lock.If that's not true, then what _is_ happening in there?
If that _is_ true then, well, that sucks a bit.
-
It's not a filesystem lock. It's a per-blockdevice lock that prevents
multiple people from freezing the filesystem at the same time, aswell
as providing exclusion between a frozen filesystem an mount-related
activity. It's a traditional text-box example for a semaphore.
-
This can be done without needing to hold a semaphore across the
freeze/thaw.In the XFS case, we never try to lock the semaphore a second
time - the freeze code checks if the filesystem is not already
(being) frozen before calling freeze_bdev(). On thaw it also
checks that the filesystem is frozen before calling thaw_bdev().IOWs, you can safely do:
# xfs_freeze -f /dev/sda1; xfs_freeze -f /dev/sda1; xfs_freeze -f /dev/sda1;
# xfs_freeze -u /dev/sda1; xfs_freeze -u /dev/sda1; xfs_freeze -u /dev/sda1;And the filesystem will only be frozen once and thawed once. The second
and subsequent incantations of the freeze/thaw are effectively ignored
and don't block.IMO, if we need to prevent certain operations from occurring when the
filesystem is frozen, those operations need to explicitly check the
frozen state and block i.e. do something like:wait_event(sb->s_wait_unfrozen, (sb->s_frozen < SB_FREEZE_WRITE));
If you need to prevent unmounts from occurring while snapshotting a
frozen filesystem, then the snapshot code needs to take the s_umount
semaphore while the snapshot is in progress. We should not be making
frozen filesystems unmountable....Thoughts?
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
Its not really XFS, its more the generic device freezing code
(freeze_bdev) which is called by both XFS and the device mapper
suspend interface (both of which are exposed to userspace via
ioctls). These interfaces are used when doing block levelThis particular case was a device mapper stack trace, hence the
confusion, I think. Both XFS and DM are making the same genericIndeed, its a fairly ordinary interface, but thats too late to go
fix now I guess (since its exposed to userspace already). A remount
flag along the lines of readonly may have been a better way to go...
perhaps. *shrug*... not clear - I guess the problem the original
authors had there (whoever they were, I dunno), was that the block
layer wants to call up to the filesystem to quiesce itself, and at
some later user-defined point to unquiesce itself... which is a bit
is passing -EAGAIN back without wrapping it up in ERR_PTR(), which
it needs to since freeze_bdev returns a struct super_block pointer.cheers.
--
Nathan-
Yup. it's the freeze_bdev/thaw_bdev use of the bd_mount_mutex()
that's the problem. I fail to see _why_ we need to hold a lock
across the freeze/thaw - the only reason i can think of is to
hold out new calls to sget() (via get_sb_bdev()) while the
filesystem is frozen though I'm not sure why you'd need to
do that. Can someone explain why we are holding the lock from
freeze to thaw?FWIW, the comment in get_sb_bdev() seems to imply s_umount is supposed
to ensure that we don't get unmounted while frozen.Indeed, in the comment above freeze_bdev:
* If a superblock is found on this device, we take the s_umount semaphore
* on it to make sure nobody unmounts until the snapshot creation is done.implies this as well, but freeze_bdev does not take the s_umount
semaphore, nor does any filesystem that implements ->write_super_lockfs()So the code is clearly at odds with the comments here.
IMO, you should be able to unmount a frozen filesystem - behaviour
should be the same as crashing while frozen, so i think the commentsUserspace knows nothing about that lock, so we can change that without
changing the the userspace API.Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
...that would be true, AFAICS.
cheers.
--
Nathan-
| Andy Whitcroft | Re: 2.6.23-rc6-mm1 |
| Greg KH | [GIT PATCH] driver core patches against 2.6.24 |
| James Bottomley | Re: Integration of SCST in the mainstream Linux kernel |
| Alan | Re: [RFC] Heads up on sys_fallocate() |
git: | |
| Natalie Protasevich | [BUG] New Kernel Bugs |
| Gerrit Renker | [PATCH 0/37] dccp: Feature negotiation - last call for comments |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Winkler, Tomas | RE: iwlwifi: fix build bug in "iwlwifi: fix LED stall" |
