Re: [RFC PATCH] Introduce filesystem type tracking

Previous thread: [PATCH] add support for ST M41T94 SPI RTC (patch rev. 3) by Kim B. Heino on Monday, May 19, 2008 - 4:09 am. (3 messages)

Next thread: [ANNOUNCE] util-linux-ng 2.14-rc3 by Karel Zak on Monday, May 19, 2008 - 4:24 am. (4 messages)
From: Tom Spink
Date: Monday, May 19, 2008 - 4:22 am

Hi,

This email contains an RFC patch that introduces init and exit routines to
the file_system_type structure.  These routines were mentioned in
an email I saw about XFS starting threads that aren't needed when no
XFS filesystems are mounted.

So I decided to try and implement the infrastructure to do this.

Please let me know what you think, I'm pretty sure I'll be missing
something I won't know about (like a lock, or a refcount), but feedback
would be appreciated.

--

This patch adds tracking to filesystem types, whereby the number of mounts
of a particular filesystem type can be determined.  This has the added
benefit of introducing init and exit routines for filesystem types, which
are called on the first mount and last unmount of the filesystem type,
respectively.

This is useful for filesystems which share global resources between all
mounts, but only need these resources when at least one filesystem is
mounted.  For example, XFS creates a number of kernel threads which aren't
required when there are no XFS filesystems mounted.  This patch will allow
XFS to start those threads just before the first filesystem is mounted, and
to shut them down when the last filesystem has been unmounted.

Signed-off-by: Tom Spink <tspink@gmail.com>
---
 fs/namespace.c     |    9 +++++++++
 fs/super.c         |   25 +++++++++++++++++++++++++
 include/linux/fs.h |    3 +++
 3 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 4fc302c..bfa2f39 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1025,6 +1025,7 @@ static void shrink_submounts(struct vfsmount *mnt, struct list_head *umounts);
 static int do_umount(struct vfsmount *mnt, int flags)
 {
 	struct super_block *sb = mnt->mnt_sb;
+	struct file_system_type *type = sb->s_type;
 	int retval;
 	LIST_HEAD(umount_list);
 
@@ -1108,6 +1109,14 @@ static int do_umount(struct vfsmount *mnt, int flags)
 		security_sb_umount_busy(mnt);
 	up_write(&namespace_sem);
 ...
From: Tom Spink
Date: Tuesday, May 20, 2008 - 6:06 am

Hi,

I'm just adding people to CC here, but also I had a couple of thoughts
after reviewing my own code.

I see that do_kern_mount is encapsulated with the BKL, but would it be
wise to introduce a lock (e.g. a mutex) now for reading and updating
nr_mounts (and hence calling ->init), rather than wait for the BKL
removal to come round here?

Also, have I got all the cases where a filesystem is unmounted,
because I now see umount_tree, and am wondering if decrementing the
nr_mounts field should be done in here, in the loop of vfsmounts... or
is it sufficient to leave it at the end of do_umount?

-- 
Regards,
Tom Spink
--

From: Al Viro
Date: Tuesday, May 20, 2008 - 6:43 am

On Tue, May 20, 2008 at 02:06:42PM +0100, Tom Spink wrote:

No, you have not and no, doing that anywhere near that layer is hopeless.

	a) Instances of filesystem can easily outlive all vfsmounts,
let alone their attachment to namespaces.
	b) What should happen if init is done in the middle of exit?
	c) Why do we need to bother, anyway?  
--

From: Tom Spink
Date: Tuesday, May 20, 2008 - 6:50 am

Well, just for the reason I mentioned, I saw the posting about XFS
starting threads (when compiled into the kernel), but there's no use
of an XFS filesystem at all - there was a suggestion that something
like this be tried, so I thought I'd give it a go.

Thanks for replying!

-- 
Regards,
Tom Spink
--

From: Christoph Hellwig
Date: Tuesday, May 20, 2008 - 6:57 am

We had a discussion about filesystems starting threads without an
active instance.  I suggested tracking instances and add ->init / ->exit
methods to struct file_system_type for these kinds of instances.

But we should track superblock instances, not vfsmount instances of
course.  Tom, you probably don't even need a counter, emptyness
of file_system_type.fs_supers should be indication enough.  And yes
we'd need locking to prevent init racing with exit.

--

From: Tom Spink
Date: Tuesday, May 20, 2008 - 8:18 am

Hi Guys,

Thanks for looking!  So I've had another go - this time taking the
superblock approach, and hopefully I've got the locking right too.
Let me know what you think (or if I'm still barking up the wrong
tree)!

---
From: Tom Spink <tspink@gmail.com>
Date: Tue, 20 May 2008 16:04:51 +0100
Subject: [PATCH] Introduce on-demand filesystem initialisation

This patch adds on-demand filesystem initialisation capabilities to the VFS,
whereby an init routine will be executed on first use of a particular
filesystem type.  Also, an exit routine will be executed when the last
superblock of a filesystem type is deactivated.

This is useful for filesystems that share global resources between all
instances of the filesystem, but only need those resources when there are
any users of the filesystem.  This lets the filesystem initialise those
resources (kernel threads or caches, say) when the first superblock is
created.  It also lets the filesystem clean up those resources when the
last superblock is deactivated.

Signed-off-by: Tom Spink <tspink@gmail.com>
---
 fs/filesystems.c   |    2 ++
 fs/super.c         |   22 +++++++++++++++++++++-
 include/linux/fs.h |    3 +++
 3 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/fs/filesystems.c b/fs/filesystems.c
index f37f872..59b2eaa 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -79,6 +79,7 @@ int register_filesystem(struct file_system_type * fs)
 		res = -EBUSY;
 	else
 		*p = fs;
+	mutex_init(&fs->fs_supers_lock);
 	write_unlock(&file_systems_lock);
 	return res;
 }
@@ -105,6 +106,7 @@ int unregister_filesystem(struct file_system_type * fs)
 	tmp = &file_systems;
 	while (*tmp) {
 		if (fs == *tmp) {
+			mutex_destroy(&fs->fs_supers_lock);
 			*tmp = fs->next;
 			fs->next = NULL;
 			write_unlock(&file_systems_lock);
diff --git a/fs/super.c b/fs/super.c
index 453877c..e3a3186 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -184,6 +184,11 @@ void deactivate_super(struct super_block *s)
 ...
From: Matthew Wilcox
Date: Tuesday, May 20, 2008 - 8:34 am

You can't take a mutex while holding a spinlock -- what if you had to
sleep to acquire the mutex?

I imagine you also don't want to hold a spinlock while calling the
->init or ->exit -- what if the fs wants to sleep in there (eg allocate
memory with GFP_KERNEL).

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Tom Spink
Date: Tuesday, May 20, 2008 - 8:36 am

Oh no!  This is bad.  I really need to devise some script to stress
test my code - and also make myself pay attention to what I'm doing.
Sorry for the noise, guys.

-- 
Tom Spink
--

From: Tom Spink
Date: Tuesday, May 20, 2008 - 2:08 pm

Hi Guys,

I've taken some more time to go over the locking semantics.  I wrote a
quick toy filesystem to simulate delays, blocking, memory allocation,
etc in the init and exit routines - and with an appropriately large
amount of printk's everywhere, I saw a quite a few interleavings.

I *think* I may have got it right, but please, let me know what you
think!  The only thing that I think may be wrong with this patch is
the
spin_lock/unlock at the end of sget, where the superblock is
list_add_tailed into the super_blocks list.  I believe this opens the
possibility for the same superblock being list_add_tailed twice... can
anyone else see this code-path, and is it a problem?

---

From: Tom Spink <tspink@gmail.com>
Date: Tue, 20 May 2008 16:04:51 +0100
Subject: [PATCH] Introduce on-demand filesystem initialisation

This patch adds on-demand filesystem initialisation capabilities to the VFS,
whereby an init routine will be executed on first use of a particular
filesystem type.  Also, an exit routine will be executed when the last
superblock of a filesystem type is deactivated.

This is useful for filesystems that share global resources between all
instances of the filesystem, but only need those resources when there are
any users of the filesystem.  This lets the filesystem initialise those
resources (kernel threads or caches, say) when the first superblock is
created.  It also lets the filesystem clean up those resources when the
last superblock is deactivated.

Signed-off-by: Tom Spink <tspink@gmail.com>
---
 fs/filesystems.c   |    2 ++
 fs/super.c         |   31 +++++++++++++++++++++++++++++--
 include/linux/fs.h |    3 +++
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/fs/filesystems.c b/fs/filesystems.c
index f37f872..59b2eaa 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -79,6 +79,7 @@ int register_filesystem(struct file_system_type * fs)
 		res = -EBUSY;
 	else
 		*p = fs;
+	mutex_init(&fs->fs_supers_lock);
 ...
From: Matthew Wilcox
Date: Tuesday, May 20, 2008 - 3:00 pm

Hi Tom,

I spotted one definite bug; on failure, you leave the superblock on
the super_blocks list.

Your locking may well be correct, but it has the hallmarks of being "a bit
tricky" and a bit tricky means potentially buggy.  How about doing the
nesting the other way round, ie take the mutex first, then the spinlock?

The code needs a bit of tweaking because you don't want to put the
superblock on any list where it can be found until it's fully

sget is a little more complex ... the fs_supers_lock would need to be
dropped in a lot more places than I've shown here:

@@ -365,11 +372,31 @@ retry:
 retry:
+	mutex_lock(&type->fs_supers_lock);
 	spin_lock(&sb_lock);
	
 		destroy_super(s);
 		return ERR_PTR(err);
 	}
 	s->s_type = type;
 	strlcpy(s->s_id, type->name, sizeof(s->s_id));
+	if (list_empty(&type->fs_supers) && type->init) {
+		spin_unlock(&sb_lock);
+		err = type->init();
+		if (err) {
+			mutex_unlock(&type->fs_supers_lock);
+			destroy_super(s);
+			return ERR_PTR(err);
+		}
+		spin_lock(&sb_lock);
+	}
 	list_add_tail(&s->s_list, &super_blocks);
 	list_add(&s->s_instances, &type->fs_supers);
 	spin_unlock(&sb_lock);
+	mutex_unlock(&type->fs_supers_lock);
 	get_filesystem(type);
 	return s;
}

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Tom Spink
Date: Tuesday, May 20, 2008 - 3:22 pm

I spotted this while I was coding, and I was careful not to let it get
added to the list...  If the ->init routine fails, the superblock
hasn't even been added to the list yet.  The patch moves this line:

list_add_tail(&s->s_list, &super_blocks);




I had something similar earlier, but I thought it started to look
slightly messy when I discovered that dropping the spinlock would lead
to a racey ->init... but I hadn't thought of putting the mutex outside
the spinlock; the mutex protecting ->init and ->exit (I was getting
caught up in trying not to go to sleep inside a spinlock)

Thanks!
-- 
Tom Spink
--

From: Tom Spink
Date: Wednesday, May 21, 2008 - 7:49 am

Ready for another?  <g>

Here's another try, with Matthews suggestion of moving the mutex
outside the spinlock.  Again, I've used a wee stress test that tries
to mount a toy filesystem many times, with random pauses in the init
routines.  It seems to pass this (and again I've seen quite a few
interleavings of the calls), and a mental scan of the code paths leads
me to believe the locking is correct.

Thanks for putting up with me, guys!

-- Tom

--

From: Tom Spink <tspink@gmail.com>
Date: Wed, 21 May 2008 13:29:07 +0100
Subject: [PATCH] Introduce on-demand filesystem initialisation

This patch adds on-demand filesystem initialisation capabilities to the VFS,
whereby an init routine will be executed on first use of a particular
filesystem type.  Also, an exit routine will be executed when the last
superblock of a filesystem type is deactivated.

This is useful for filesystems that share global resources between all
instances of the filesystem, but only need those resources when there are
any users of the filesystem.  This lets the filesystem initialise those
resources (kernel threads or caches, say) when the first superblock is
created.  It also lets the filesystem clean up those resources when the
last superblock is deactivated.

Signed-off-by: Tom Spink <tspink@gmail.com>
---
 fs/filesystems.c   |    2 ++
 fs/super.c         |   29 ++++++++++++++++++++++++++++-
 include/linux/fs.h |    3 +++
 3 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/fs/filesystems.c b/fs/filesystems.c
index f37f872..59b2eaa 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -79,6 +79,7 @@ int register_filesystem(struct file_system_type * fs)
 		res = -EBUSY;
 	else
 		*p = fs;
+	mutex_init(&fs->fs_supers_lock);
 	write_unlock(&file_systems_lock);
 	return res;
 }
@@ -105,6 +106,7 @@ int unregister_filesystem(struct file_system_type * fs)
 	tmp = &file_systems;
 	while (*tmp) {
 		if (fs == *tmp) {
+			mutex_destroy(&fs->fs_supers_lock);
 			*tmp = ...
From: Jan Engelhardt
Date: Wednesday, May 21, 2008 - 2:42 am

The filesystem may want to have the superblock passed.
Well, will see once a filesystem has the need for it.

--

Previous thread: [PATCH] add support for ST M41T94 SPI RTC (patch rev. 3) by Kim B. Heino on Monday, May 19, 2008 - 4:09 am. (3 messages)

Next thread: [ANNOUNCE] util-linux-ng 2.14-rc3 by Karel Zak on Monday, May 19, 2008 - 4:24 am. (4 messages)