This series implements the ability to check a superblock for writers without consulting the sb->s_files list. This is on Al and Christoph's wishlist as something that can happen now that we have r/o bind mounts merged. Trond, I'm ccing you because there is some minor NFS touching going on here. -- Dave --
In order to lock out new writers during remount operations,
we need to hold the cpu_writer->lock's. But, remounts can
sleep, so we can not use spinlocks. Make them mutexes.
We do the fiddling in mnt_drop_write() because we used to
rely on the spin_lock() disabling preemption. Since the
mutex does not do that, we have to use another method
to avoid underflow.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---
linux-2.6.git-dave/fs/namespace.c | 52 +++++++++++++++++++++-----------------
1 file changed, 29 insertions(+), 23 deletions(-)
diff -puN fs/namespace.c~make-cpu_writer-lock-a-mutex fs/namespace.c
--- linux-2.6.git/fs/namespace.c~make-cpu_writer-lock-a-mutex 2008-04-29 11:56:39.000000000 -0700
+++ linux-2.6.git-dave/fs/namespace.c 2008-04-29 11:56:39.000000000 -0700
@@ -173,7 +173,7 @@ struct mnt_writer {
* If holding multiple instances of this lock, they
* must be ordered by cpu number.
*/
- spinlock_t lock;
+ struct mutex lock;
struct lock_class_key lock_class; /* compiles out with !lockdep */
unsigned long count;
struct vfsmount *mnt;
@@ -185,7 +185,7 @@ static int __init init_mnt_writers(void)
int cpu;
for_each_possible_cpu(cpu) {
struct mnt_writer *writer = &per_cpu(mnt_writers, cpu);
- spin_lock_init(&writer->lock);
+ mutex_init(&writer->lock);
lockdep_set_class(&writer->lock, &writer->lock_class);
writer->count = 0;
}
@@ -200,7 +200,7 @@ static void unlock_mnt_writers(void)
for_each_possible_cpu(cpu) {
cpu_writer = &per_cpu(mnt_writers, cpu);
- spin_unlock(&cpu_writer->lock);
+ mutex_unlock(&cpu_writer->lock);
}
}
@@ -252,8 +252,8 @@ int mnt_want_write(struct vfsmount *mnt)
int ret = 0;
struct mnt_writer *cpu_writer;
- cpu_writer = &get_cpu_var(mnt_writers);
- spin_lock(&cpu_writer->lock);
+ cpu_writer = &__get_cpu_var(mnt_writers);
+ mutex_lock(&cpu_writer->lock);
if (__mnt_is_readonly(mnt)) {
ret = -EROFS;
goto out;
@@ -261,8 +261,7 @@ int mnt_want_write(struct ...We need lock_mnt_writers() during a remount in order
to keep mnt->__mnt_writers from changing so that we
get a consistent look at if a sb currently has anyone
writing to it.
But, we need to lock writers out for an extended
period, even during the ->remount_fs() operation.
That's because we do conclusively make the fs
r/o until *after* the ->remount_fs().
lock_mnt_writers() is an awfully blunt instrument
for this since it will lock out *ALL* new writers
on the entire system. That's bad because that
remount might take a bit.
So, we introduce a new semantic. Before a new
cpu_writer is established to a mount, it must
lock_super(). This is already a slow path because
the old cpu_writer->count must be coalesced into
mnt->__mnt_writers. The fast path where (mnt ==
cpu_writer->mnt) is unaffected.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---
linux-2.6.git-dave/fs/namespace.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff -puN fs/namespace.c~must-hold-lock_super-to-set-initial-mnt_writer fs/namespace.c
--- linux-2.6.git/fs/namespace.c~must-hold-lock_super-to-set-initial-mnt_writer 2008-04-29 11:56:41.000000000 -0700
+++ linux-2.6.git-dave/fs/namespace.c 2008-04-29 11:56:41.000000000 -0700
@@ -225,8 +225,20 @@ static inline void use_cpu_writer_for_mo
{
if (cpu_writer->mnt == mnt)
return;
+ /*
+ * This is a slow path taken the first time that
+ * a particular cpu_writer is used for a given
+ * mount. It lets someone (like remount) clear
+ * all the cpu_writer->mnts by lock_mnt_writers(),
+ * take lock_super(), and guarantee that no one
+ * else can get through here and acquire a new
+ * mnt_want_write() until after they
+ * unlock_super().
+ */
+ lock_super(mnt->mnt_sb);
__clear_mnt_count(cpu_writer);
cpu_writer->mnt = mnt;
+ unlock_super(mnt->mnt_sb);
}
/*
_
--
So? Why don't we mark the fs r/o _before_ calling ->remount_fs() and if that fails, just mark it r/w again. OK, we'll deny writes in that interval, but I don't see that as a big problem. And it would simplify the implementation considerably. Miklos --
Personally, I think that's a bit messy. People might start getting -EROFS when they never, ever *HAD* a r/o FS. They may have made a request for one, but they never actually hard one. I also understand what you're saying. If we were able to loosen up some of the requirements it would certainly make the patches simpler. How does everyone else feel about this? -- Dave --
We're moving a big chunk of the burden of keeping people from writing to r/o filesystems from the superblock into the vfsmount. This requires that we keep track somehow of the mounts that might allow writes to a superblock. You could track this directly by keeping a count of such mounts in the superblock, but I think this is much simpler in practice. On remount rw->ro operations: 1. Use lock_mnt_writers() to exclude any new mnt_writers for any mount on the entire system, which also clears each cpu_writer->mnt. 2. Lock superblock to stop anyone from setting any new cpu_writer->mnt if the mount is of this sb. 3. unlock_mnt_writers() to let other fs's on the system to go about their business. 4. Consult each mount of the sb to examine its mnt->__mnt_writers. Remember, this is stable because (1) cleared out all the per-cpu writers, and in (2) the sb lock is held, keeping out any new writers. 5. release sb lock at end of remount operation (after ->remount_fs()) I'm not completely happy with the sb_locked thing in do_remount_sb(). Expanding the use of lock_super() over a larger area may make the code look simpler, but will not be as obviously correct. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> --- linux-2.6.git-dave/fs/file_table.c | 24 ----------- linux-2.6.git-dave/fs/namespace.c | 4 - linux-2.6.git-dave/fs/super.c | 65 ++++++++++++++++++++++++++----- linux-2.6.git-dave/include/linux/fs.h | 2 linux-2.6.git-dave/include/linux/mount.h | 10 +++- 5 files changed, 65 insertions(+), 40 deletions(-) diff -puN fs/file_table.c~check_mount_writers_at_superblock_remount fs/file_table.c --- linux-2.6.git/fs/file_table.c~check_mount_writers_at_superblock_remount 2008-04-29 11:56:42.000000000 -0700 +++ linux-2.6.git-dave/fs/file_table.c 2008-04-29 11:56:42.000000000 -0700 @@ -365,30 +365,6 @@ void file_kill(struct file *file) } } -int fs_may_remount_ro(struct super_block ...
The next patch will introduce a list of all mounts for a given
superblock. In order to keep the list, we need to make sure
all filesystems attaching a mount to a superblock get added
to this list.
NFS currently bypasses the simple_set_mnt() function, and sets
mnt_sb directly. This patch makes it use a helper function,
instead.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---
linux-2.6.git-dave/fs/namespace.c | 11 +++++++++--
linux-2.6.git-dave/fs/nfs/super.c | 10 +++++-----
linux-2.6.git-dave/include/linux/fs.h | 1 +
3 files changed, 15 insertions(+), 7 deletions(-)
diff -puN fs/namespace.c~introduce_simple_set_mnt_no_get_helper_for_NFS fs/namespace.c
--- linux-2.6.git/fs/namespace.c~introduce_simple_set_mnt_no_get_helper_for_NFS 2008-04-29 11:56:39.000000000 -0700
+++ linux-2.6.git-dave/fs/namespace.c 2008-04-29 11:56:39.000000000 -0700
@@ -402,12 +402,19 @@ static void __mnt_unmake_readonly(struct
spin_unlock(&vfsmount_lock);
}
-int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
+int simple_set_mnt_no_get(struct vfsmount *mnt, struct super_block *sb)
{
mnt->mnt_sb = sb;
- mnt->mnt_root = dget(sb->s_root);
+ add_mount_to_sb_list(mnt, sb);
return 0;
}
+EXPORT_SYMBOL(simple_set_mnt_no_get);
+
+int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
+{
+ mnt->mnt_root = dget(sb->s_root);
+ return simple_set_mnt_no_get(mnt, sb);
+}
EXPORT_SYMBOL(simple_set_mnt);
diff -puN fs/nfs/super.c~introduce_simple_set_mnt_no_get_helper_for_NFS fs/nfs/super.c
--- linux-2.6.git/fs/nfs/super.c~introduce_simple_set_mnt_no_get_helper_for_NFS 2008-04-29 11:56:39.000000000 -0700
+++ linux-2.6.git-dave/fs/nfs/super.c 2008-04-29 11:56:39.000000000 -0700
@@ -1635,7 +1635,7 @@ static int nfs_get_sb(struct file_system
goto error_splat_root;
s->s_flags |= MS_ACTIVE;
- mnt->mnt_sb = s;
+ simple_set_mnt_no_get(mnt, s);
mnt->mnt_root = mntroot;
error = 0;
@@ -1727,7 +1727,7 @@ static int ...My only concern is the proliferation of 'simple_*' operations: in some cases in libfs.c we explicitly label those as being for in-memory/ramfs filesystems, whereas in other cases (such as this one) the name appears to be used for more generic functions. Cheers Trond --
I'm all to happy to give it a different name. Any suggestions? -- Dave --
would the vfs_* prefix make sense here? -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
We're moving a big chunk of the burden of keeping people from
writing to r/o filesystems from the superblock into the
vfsmount. This requires that we consult the superblock's
vfsmounts when things like remounts occur.
So, introduce a list of vfsmounts hanging off the superblock.
We'll use this in a bit.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---
linux-2.6.git-dave/fs/namespace.c | 12 ++++++++++++
linux-2.6.git-dave/fs/super.c | 1 +
linux-2.6.git-dave/include/linux/fs.h | 1 +
linux-2.6.git-dave/include/linux/mount.h | 1 +
4 files changed, 15 insertions(+)
diff -puN fs/namespace.c~reintroduce_list_of_vfsmounts_over_superblock fs/namespace.c
--- linux-2.6.git/fs/namespace.c~reintroduce_list_of_vfsmounts_over_superblock 2008-04-29 11:56:40.000000000 -0700
+++ linux-2.6.git-dave/fs/namespace.c 2008-04-29 11:56:40.000000000 -0700
@@ -402,6 +402,13 @@ static void __mnt_unmake_readonly(struct
spin_unlock(&vfsmount_lock);
}
+void add_mount_to_sb_list(struct vfsmount *mnt, struct super_block *sb)
+{
+ spin_lock(&vfsmount_lock);
+ list_add(&mnt->mnt_sb_list, &sb->s_vfsmounts);
+ spin_unlock(&vfsmount_lock);
+}
+
int simple_set_mnt_no_get(struct vfsmount *mnt, struct super_block *sb)
{
mnt->mnt_sb = sb;
@@ -420,6 +427,10 @@ EXPORT_SYMBOL(simple_set_mnt);
void free_vfsmnt(struct vfsmount *mnt)
{
+ spin_lock(&vfsmount_lock);
+ if (mnt->mnt_sb)
+ list_del(&mnt->mnt_sb_list);
+ spin_unlock(&vfsmount_lock);
kfree(mnt->mnt_devname);
mnt_free_id(mnt);
kmem_cache_free(mnt_cache, mnt);
@@ -585,6 +596,7 @@ static struct vfsmount *clone_mnt(struct
mnt->mnt_root = dget(root);
mnt->mnt_mountpoint = mnt->mnt_root;
mnt->mnt_parent = mnt;
+ add_mount_to_sb_list(mnt, sb);
if (flag & CL_SLAVE) {
list_add(&mnt->mnt_slave, &old->mnt_slave_list);
diff -puN fs/super.c~reintroduce_list_of_vfsmounts_over_superblock fs/super.c
--- ...