Hi,
This email contains an RFC patch that introduces init and exit routines to
the file_system_type structure. These routines were mentioned in
an email I saw about XFS starting threads that aren't needed when no
XFS filesystems are mounted.
So I decided to try and implement the infrastructure to do this.
Please let me know what you think, I'm pretty sure I'll be missing
something I won't know about (like a lock, or a refcount), but feedback
would be appreciated.
--
This patch adds tracking to filesystem types, whereby the number of mounts
of a particular filesystem type can be determined. This has the added
benefit of introducing init and exit routines for filesystem types, which
are called on the first mount and last unmount of the filesystem type,
respectively.
This is useful for filesystems which share global resources between all
mounts, but only need these resources when at least one filesystem is
mounted. For example, XFS creates a number of kernel threads which aren't
required when there are no XFS filesystems mounted. This patch will allow
XFS to start those threads just before the first filesystem is mounted, and
to shut them down when the last filesystem has been unmounted.
Signed-off-by: Tom Spink <tspink@gmail.com>
---
fs/namespace.c | 9 +++++++++
fs/super.c | 25 +++++++++++++++++++++++++
include/linux/fs.h | 3 +++
3 files changed, 37 insertions(+), 0 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index 4fc302c..bfa2f39 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1025,6 +1025,7 @@ static void shrink_submounts(struct vfsmount *mnt, struct list_head *umounts);
static int do_umount(struct vfsmount *mnt, int flags)
{
struct super_block *sb = mnt->mnt_sb;
+ struct file_system_type *type = sb->s_type;
int retval;
LIST_HEAD(umount_list);
@@ -1108,6 +1109,14 @@ static int do_umount(struct vfsmount *mnt, int flags)
security_sb_umount_busy(mnt);
up_write(&namespace_sem);
...Hi, I'm just adding people to CC here, but also I had a couple of thoughts after reviewing my own code. I see that do_kern_mount is encapsulated with the BKL, but would it be wise to introduce a lock (e.g. a mutex) now for reading and updating nr_mounts (and hence calling ->init), rather than wait for the BKL removal to come round here? Also, have I got all the cases where a filesystem is unmounted, because I now see umount_tree, and am wondering if decrementing the nr_mounts field should be done in here, in the loop of vfsmounts... or is it sufficient to leave it at the end of do_umount? -- Regards, Tom Spink --
On Tue, May 20, 2008 at 02:06:42PM +0100, Tom Spink wrote: No, you have not and no, doing that anywhere near that layer is hopeless. a) Instances of filesystem can easily outlive all vfsmounts, let alone their attachment to namespaces. b) What should happen if init is done in the middle of exit? c) Why do we need to bother, anyway? --
Well, just for the reason I mentioned, I saw the posting about XFS starting threads (when compiled into the kernel), but there's no use of an XFS filesystem at all - there was a suggestion that something like this be tried, so I thought I'd give it a go. Thanks for replying! -- Regards, Tom Spink --
We had a discussion about filesystems starting threads without an active instance. I suggested tracking instances and add ->init / ->exit methods to struct file_system_type for these kinds of instances. But we should track superblock instances, not vfsmount instances of course. Tom, you probably don't even need a counter, emptyness of file_system_type.fs_supers should be indication enough. And yes we'd need locking to prevent init racing with exit. --
Hi Guys,
Thanks for looking! So I've had another go - this time taking the
superblock approach, and hopefully I've got the locking right too.
Let me know what you think (or if I'm still barking up the wrong
tree)!
---
From: Tom Spink <tspink@gmail.com>
Date: Tue, 20 May 2008 16:04:51 +0100
Subject: [PATCH] Introduce on-demand filesystem initialisation
This patch adds on-demand filesystem initialisation capabilities to the VFS,
whereby an init routine will be executed on first use of a particular
filesystem type. Also, an exit routine will be executed when the last
superblock of a filesystem type is deactivated.
This is useful for filesystems that share global resources between all
instances of the filesystem, but only need those resources when there are
any users of the filesystem. This lets the filesystem initialise those
resources (kernel threads or caches, say) when the first superblock is
created. It also lets the filesystem clean up those resources when the
last superblock is deactivated.
Signed-off-by: Tom Spink <tspink@gmail.com>
---
fs/filesystems.c | 2 ++
fs/super.c | 22 +++++++++++++++++++++-
include/linux/fs.h | 3 +++
3 files changed, 26 insertions(+), 1 deletions(-)
diff --git a/fs/filesystems.c b/fs/filesystems.c
index f37f872..59b2eaa 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -79,6 +79,7 @@ int register_filesystem(struct file_system_type * fs)
res = -EBUSY;
else
*p = fs;
+ mutex_init(&fs->fs_supers_lock);
write_unlock(&file_systems_lock);
return res;
}
@@ -105,6 +106,7 @@ int unregister_filesystem(struct file_system_type * fs)
tmp = &file_systems;
while (*tmp) {
if (fs == *tmp) {
+ mutex_destroy(&fs->fs_supers_lock);
*tmp = fs->next;
fs->next = NULL;
write_unlock(&file_systems_lock);
diff --git a/fs/super.c b/fs/super.c
index 453877c..e3a3186 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -184,6 +184,11 @@ void deactivate_super(struct super_block *s)
...You can't take a mutex while holding a spinlock -- what if you had to sleep to acquire the mutex? I imagine you also don't want to hold a spinlock while calling the ->init or ->exit -- what if the fs wants to sleep in there (eg allocate memory with GFP_KERNEL). -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
Oh no! This is bad. I really need to devise some script to stress test my code - and also make myself pay attention to what I'm doing. Sorry for the noise, guys. -- Tom Spink --
Hi Guys, I've taken some more time to go over the locking semantics. I wrote a quick toy filesystem to simulate delays, blocking, memory allocation, etc in the init and exit routines - and with an appropriately large amount of printk's everywhere, I saw a quite a few interleavings. I *think* I may have got it right, but please, let me know what you think! The only thing that I think may be wrong with this patch is the spin_lock/unlock at the end of sget, where the superblock is list_add_tailed into the super_blocks list. I believe this opens the possibility for the same superblock being list_add_tailed twice... can anyone else see this code-path, and is it a problem? --- From: Tom Spink <tspink@gmail.com> Date: Tue, 20 May 2008 16:04:51 +0100 Subject: [PATCH] Introduce on-demand filesystem initialisation This patch adds on-demand filesystem initialisation capabilities to the VFS, whereby an init routine will be executed on first use of a particular filesystem type. Also, an exit routine will be executed when the last superblock of a filesystem type is deactivated. This is useful for filesystems that share global resources between all instances of the filesystem, but only need those resources when there are any users of the filesystem. This lets the filesystem initialise those resources (kernel threads or caches, say) when the first superblock is created. It also lets the filesystem clean up those resources when the last superblock is deactivated. Signed-off-by: Tom Spink <tspink@gmail.com> --- fs/filesystems.c | 2 ++ fs/super.c | 31 +++++++++++++++++++++++++++++-- include/linux/fs.h | 3 +++ 3 files changed, 34 insertions(+), 2 deletions(-) diff --git a/fs/filesystems.c b/fs/filesystems.c index f37f872..59b2eaa 100644 --- a/fs/filesystems.c +++ b/fs/filesystems.c @@ -79,6 +79,7 @@ int register_filesystem(struct file_system_type * fs) res = -EBUSY; else *p = fs; + mutex_init(&fs->fs_supers_lock); ...
Hi Tom,
I spotted one definite bug; on failure, you leave the superblock on
the super_blocks list.
Your locking may well be correct, but it has the hallmarks of being "a bit
tricky" and a bit tricky means potentially buggy. How about doing the
nesting the other way round, ie take the mutex first, then the spinlock?
The code needs a bit of tweaking because you don't want to put the
superblock on any list where it can be found until it's fully
sget is a little more complex ... the fs_supers_lock would need to be
dropped in a lot more places than I've shown here:
@@ -365,11 +372,31 @@ retry:
retry:
+ mutex_lock(&type->fs_supers_lock);
spin_lock(&sb_lock);
destroy_super(s);
return ERR_PTR(err);
}
s->s_type = type;
strlcpy(s->s_id, type->name, sizeof(s->s_id));
+ if (list_empty(&type->fs_supers) && type->init) {
+ spin_unlock(&sb_lock);
+ err = type->init();
+ if (err) {
+ mutex_unlock(&type->fs_supers_lock);
+ destroy_super(s);
+ return ERR_PTR(err);
+ }
+ spin_lock(&sb_lock);
+ }
list_add_tail(&s->s_list, &super_blocks);
list_add(&s->s_instances, &type->fs_supers);
spin_unlock(&sb_lock);
+ mutex_unlock(&type->fs_supers_lock);
get_filesystem(type);
return s;
}
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
--
I spotted this while I was coding, and I was careful not to let it get added to the list... If the ->init routine fails, the superblock hasn't even been added to the list yet. The patch moves this line: list_add_tail(&s->s_list, &super_blocks); I had something similar earlier, but I thought it started to look slightly messy when I discovered that dropping the spinlock would lead to a racey ->init... but I hadn't thought of putting the mutex outside the spinlock; the mutex protecting ->init and ->exit (I was getting caught up in trying not to go to sleep inside a spinlock) Thanks! -- Tom Spink --
Ready for another? <g>
Here's another try, with Matthews suggestion of moving the mutex
outside the spinlock. Again, I've used a wee stress test that tries
to mount a toy filesystem many times, with random pauses in the init
routines. It seems to pass this (and again I've seen quite a few
interleavings of the calls), and a mental scan of the code paths leads
me to believe the locking is correct.
Thanks for putting up with me, guys!
-- Tom
--
From: Tom Spink <tspink@gmail.com>
Date: Wed, 21 May 2008 13:29:07 +0100
Subject: [PATCH] Introduce on-demand filesystem initialisation
This patch adds on-demand filesystem initialisation capabilities to the VFS,
whereby an init routine will be executed on first use of a particular
filesystem type. Also, an exit routine will be executed when the last
superblock of a filesystem type is deactivated.
This is useful for filesystems that share global resources between all
instances of the filesystem, but only need those resources when there are
any users of the filesystem. This lets the filesystem initialise those
resources (kernel threads or caches, say) when the first superblock is
created. It also lets the filesystem clean up those resources when the
last superblock is deactivated.
Signed-off-by: Tom Spink <tspink@gmail.com>
---
fs/filesystems.c | 2 ++
fs/super.c | 29 ++++++++++++++++++++++++++++-
include/linux/fs.h | 3 +++
3 files changed, 33 insertions(+), 1 deletions(-)
diff --git a/fs/filesystems.c b/fs/filesystems.c
index f37f872..59b2eaa 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -79,6 +79,7 @@ int register_filesystem(struct file_system_type * fs)
res = -EBUSY;
else
*p = fs;
+ mutex_init(&fs->fs_supers_lock);
write_unlock(&file_systems_lock);
return res;
}
@@ -105,6 +106,7 @@ int unregister_filesystem(struct file_system_type * fs)
tmp = &file_systems;
while (*tmp) {
if (fs == *tmp) {
+ mutex_destroy(&fs->fs_supers_lock);
*tmp = ...The filesystem may want to have the superblock passed. Well, will see once a filesystem has the need for it. --
