Re: [rfc patch] how to show propagation state for mounts

Previous thread: Re: NFS/LSM: allow NFS to control all of its own mount options by Christoph Hellwig on Tuesday, February 19, 2008 - 6:24 pm. (8 messages)

Next thread: [PATCH 00/37] Permit filesystem local caching by David Howells on Wednesday, February 20, 2008 - 12:05 pm. (56 messages)
To: <linux-fsdevel@...>
Cc: <viro@...>, <linuxram@...>
Date: Wednesday, February 20, 2008 - 11:39 am

Here's my take on the matter.

The propagation tree can be either be represented

1) "from root to leaf" listing members of peer groups and their
slaves explicitly,

2) or "from leaf to root" by identifying each peer group and then for
each mount showing the id of its own group and the id of the group's
master.

2) can have two variants:

2a) id of peer group is constant in time

2b) id of peer group may change

The current patch does 2b). Having a fixed id for each peer group
would mean introducing a new object to anchor the peer group into,
which would add complexity to the whole thing.

All of these are implementable, just need to decide which one we want.

Miklos
-

To: Miklos Szeredi <miklos@...>
Cc: <linux-fsdevel@...>, <linuxram@...>
Date: Wednesday, February 20, 2008 - 12:04 pm

Eh... Much more interesting question: since the propagation tree spans
multiple namespaces in a lot of normal uses, how do we deal with
reconstructing propagation through the parts that are not present in
our namespace? Moreover, what should and what should not be kept private
to namespace? Full exposure of mount trees is definitely over the top
(it shows potentially sensitive information), so we probably want less
than that.

FWIW, my gut feeling is that for each peer group that intersects with our
namespace we ought to expose in some form
* all vfsmounts belonging to that intesection
* the nearest dominating peer group (== master (of master ...) of)
that also has a non-empty intersection with our namespace

It's less about the form of representation (after all, we generate poll
events when contents of that sucker changes, so one *can* get a consistent
snapshot of the entire thing) and more about having it self-contained
when we have namespaces in the play.

IOW, the data in there should give answers to questions that make sense.
"Do events get propagated from this vfsmount I have to that vfsmount I have?"
is a meaningful one; ditto for "are events here propagated to somewhere I
don't see?" or "are events getting propagated here from somewhere I don't
see?".

Dumping pieces of raw graph, with IDs of nodes we can't see and without
any way to connect those pieces, OTOH, doesn't make much sense.
-

To: Al Viro <viro@...>
Cc: Miklos Szeredi <miklos@...>, <linux-fsdevel@...>, <linuxram@...>
Date: Wednesday, February 20, 2008 - 12:31 pm

Why do those last two questions deserve an answer? How will a person's
or application's behaviour be affected by whether a change will
propagate to something they don't know about and can't see?

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
-

To: Matthew Wilcox <matthew@...>
Cc: Al Viro <viro@...>, Miklos Szeredi <miklos@...>, <linux-fsdevel@...>
Date: Wednesday, February 20, 2008 - 3:42 pm

Well, I do not want to be surprised to see a mount suddenly show up in
my namespace because of some action by some other user in some other
namespace. Its going to happen anyway if the namespace is forked of
a namespace that had shared mounts in them. However I would rather
prefer to know in advance the spots (mounts) where such surprises can
happen. Also I would prefer to know how my actions will effect mounts in
other namespaces.

-

To: <viro@...>
Cc: <miklos@...>, <linux-fsdevel@...>, <linuxram@...>
Date: Wednesday, February 20, 2008 - 12:27 pm

Well, assuming you see only one namespace. When I'm experimenting
with namespaces and propagations, I see both (each in a separate
xterm) and I do want to know how propagation between them happens.

Your suggestion doesn't deal with that problem.

Otherwise, yes it makes sense to have a consistent view of the tree
shown for each namespace. Perhaps the solution is to restrict viewing
the whole tree to privileged processes.

Miklos
-

To: Miklos Szeredi <miklos@...>
Cc: <viro@...>, <linux-fsdevel@...>
Date: Wednesday, February 20, 2008 - 3:29 pm

I wonder, what is wrong in reporting mounts in other namespaces that
either receive and send propagation to mounts in our namespace?

If we take that approach, we will report **only** the mounts in other
namespace which have a counter part in our namespace. After all the
filesystems backing the mounts here and there are the same(other wise
they would'nt have propagated).

And any mounts contained outside our namespace, having no propagation
relation to any mounts in our namespace, will remain hidden.

-

To: Ram Pai <linuxram@...>
Cc: Miklos Szeredi <miklos@...>, <linux-fsdevel@...>
Date: Wednesday, February 20, 2008 - 5:14 pm

A plenty. E.g. if foo trusts control over /var/blah to bar, it's not
obvious that foo has any business knowing if bar gets it from somebody
else in turn. And I'm not sure that bar has any business knowing that
foo has the damn thing attached in five places instead of just one,
let alone _where_ it has been attached.

If you get down to it, the thing is about delegating control over part
of namespace to somebody, without letting them control, see, etc. the
rest of it. So I'd rather be very conservative about extra information
we allow to piggyback on that. I don't know... perhaps with stable peer
group IDs it would be OK to show peer group ID by (our) vfsmount + peer
group ID of master + peer group ID of nearest dominating group that has
intersection with our namespace. Then we don't leak information (AFAICS),
get full propagation information between our vfsmounts and cooperating
tasks in different namespaces can figure the things out as much as possible
without leaking 3rd-party information to either.
-

To: <viro@...>
Cc: <linuxram@...>, <miklos@...>, <linux-fsdevel@...>
Date: Wednesday, February 20, 2008 - 5:35 pm

This sounds fine.

I'll have a look at implementing a stable peer group ID (it doesn't
need a separate object, I realized that now).

Miklos
-

To: <viro@...>
Cc: <linuxram@...>, <miklos@...>, <linux-fsdevel@...>
Date: Friday, February 22, 2008 - 10:46 am

Here's a patch against current -mm implementing this (with some
cleanups thrown in). Done some testing on it as well, it wasn't
entirey trivial to figure out a setup, where propagation goes out of
the namespace first, then comes back in:

mount --bind /mnt1 /mnt1
mount --make-shared /mnt1
mount --bind /mnt2 /mnt2
mount --make-shared /mnt2
newns
mount --make-slave /mnt1

old ns:
mount --make-slave /mnt2
mount --bind /mnt1/tmp /mnt1/tmp

new ns:
mount --make-shared /mnt1/tmp
mount --bind /mnt1/tmp /mnt2/tmp

Voila.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/fs/pnode.c
===================================================================
--- linux.orig/fs/pnode.c 2008-02-22 15:27:23.000000000 +0100
+++ linux/fs/pnode.c 2008-02-22 15:27:26.000000000 +0100
@@ -9,8 +9,12 @@
#include <linux/mnt_namespace.h>
#include <linux/mount.h>
#include <linux/fs.h>
+#include <linux/idr.h>
#include "pnode.h"

+static DEFINE_SPINLOCK(mnt_pgid_lock);
+static DEFINE_IDA(mnt_pgid_ida);
+
/* return the next shared peer mount of @p */
static inline struct vfsmount *next_peer(struct vfsmount *p)
{
@@ -27,36 +31,90 @@ static inline struct vfsmount *next_slav
return list_entry(p->mnt_slave.next, struct vfsmount, mnt_slave);
}

-static int __peer_group_id(struct vfsmount *mnt)
+static void __set_mnt_shared(struct vfsmount *mnt)
{
- struct vfsmount *m;
- int id = mnt->mnt_id;
+ mnt->mnt_flags &= ~MNT_PNODE_MASK;
+ mnt->mnt_flags |= MNT_SHARED;
+}
+
+void set_mnt_shared(struct vfsmount *mnt)
+{
+ int res;

- for (m = next_peer(mnt); m != mnt; m = next_peer(m))
- id = min(id, m->mnt_id);
+ retry:
+ spin_lock(&mnt_pgid_lock);
+ if (IS_MNT_SHARED(mnt)) {
+ spin_unlock(&mnt_pgid_lock);
+ return;
+ }

- return id;
+ res = ida_get_new(&mnt_pgid_ida, &mnt->mnt_pgid);
+ spin_unlock(&mnt_pgid_lock);
+ if (res == -EAGAIN) {
+ if (ida_pre_get(&...

To: Miklos Szeredi <miklos@...>
Cc: <viro@...>, <linux-fsdevel@...>
Date: Monday, March 10, 2008 - 2:53 am

1) reports deleted inode in dentry_path() consistent with that in __d_path()
2) modified __d_path() to use prepend(), reducing the size of __d_path()
3) moved all the functionality that reports mount information in /proc under
CONFIG_PROC_FS.

Code compile tested only with and without CONFIG_PROC_FS.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
fs/dcache.c | 66 +++++++++++++++++++----------------------------
fs/namespace.c | 15 +++++++++-
fs/pnode.c | 24 +++++++++++------
fs/pnode.h | 6 +++-
fs/seq_file.c | 2 +
include/linux/dcache.h | 3 ++
include/linux/mount.h | 2 +
include/linux/seq_file.h | 3 ++
8 files changed, 73 insertions(+), 48 deletions(-)

Index: linux-2.6.24/fs/dcache.c
===================================================================
--- linux-2.6.24.orig/fs/dcache.c
+++ linux-2.6.24/fs/dcache.c
@@ -1747,6 +1747,17 @@ shouldnt_be_hashed:
goto shouldnt_be_hashed;
}

+static int prepend(char **buffer, int *buflen, const char *str,
+ int namelen)
+{
+ *buflen -= namelen;
+ if (*buflen < 0)
+ return -ENAMETOOLONG;
+ *buffer -= namelen;
+ memcpy(*buffer, str, namelen);
+ return 0;
+}
+
/**
* d_path - return the path of a dentry
* @dentry: dentry to report
@@ -1768,17 +1779,11 @@ static char *__d_path(struct dentry *den
{
char * end = buffer+buflen;
char * retval;
- int namelen;

- *--end = '\0';
- buflen--;
- if (!IS_ROOT(dentry) && d_unhashed(dentry)) {
- buflen -= 10;
- end -= 10;
- if (buflen < 0)
+ prepend(&end, &buflen, "\0", 1);
+ if (!IS_ROOT(dentry) && d_unhashed(dentry) &&
+ (prepend(&end, &buflen, " (deleted)", 10) != 0))
goto Elong;
- memcpy(end, " (deleted)", 10);
- }

if (buflen < 1)
goto Elong;
@@ -1805,13 +1810,10 @@ static char *__d_path(struct dentry *den
}
parent = dentry->d_parent;
prefetch(parent);
- namelen = de...

To: Ram Pai <linuxram@...>
Cc: Miklos Szeredi <miklos@...>, <viro@...>, <linux-fsdevel@...>
Date: Monday, March 10, 2008 - 3:10 am

I think that latter one goes to far, mnt_id and mn_pdid are generally
useful information that should be generated unconditionally. That will
also get rid of most of the ifdef clutter introduced in this patch.

--

To: <hch@...>
Cc: <linuxram@...>, <miklos@...>, <viro@...>, <linux-fsdevel@...>
Date: Monday, March 10, 2008 - 7:39 am

I'll fix these up and send to Andrew.

Miklos
--

To: Miklos Szeredi <miklos@...>
Cc: <viro@...>, <linuxram@...>, <linux-fsdevel@...>
Date: Wednesday, March 5, 2008 - 3:25 pm

Looks nice, and a bit of testing/playing around showed no problem on my
end.

This definately would be a nice feature to have, and heck, it might
greatly simplify an LTP testcase for mounts propagation which is long
overdue.

thanks,
--

To: <serue@...>
Cc: <viro@...>, <linuxram@...>, <linux-fsdevel@...>
Date: Wednesday, March 5, 2008 - 3:34 pm

Thanks for testing!

Ram, how is your patch progressing? Could you send me the final
version sometime, so that I can put a new version of this work
together and sumbit to -mm for more eyeballs?

Thanks,
Miklos
--

To: Miklos Szeredi <miklos@...>
Cc: <serue@...>, <viro@...>, <linux-fsdevel@...>
Date: Wednesday, March 5, 2008 - 4:23 pm

Miklos,

sorry. will have it your way tonight. Or the latest by the end
of the week.

--

Previous thread: Re: NFS/LSM: allow NFS to control all of its own mount options by Christoph Hellwig on Tuesday, February 19, 2008 - 6:24 pm. (8 messages)

Next thread: [PATCH 00/37] Permit filesystem local caching by David Howells on Wednesday, February 20, 2008 - 12:05 pm. (56 messages)