Re: unprivileged mounts git tree

Previous thread: [PATCH 8/9] Call idr_find() without locking in ipc_lock() by Nadia.Derbey on Wednesday, May 7, 2008 - 4:36 am. (3 messages)

Next thread: [PATCH 0/3] [2.6.26] ehea: Add DLPAR memory remove support by Hannes Hering on Wednesday, May 7, 2008 - 5:39 am. (1 message)
From: Miklos Szeredi
Date: Wednesday, May 7, 2008 - 5:05 am

Here's a git tree of the unprivileged mounts patchset:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git unprivileged-mounts

Could this be added to -mm (and dropped if it's in the way of
something) for some testing and added visibility until it's reviewed
by Christoph/Al?

I'm not reposting the whole patchset, since it's essentially the same
as the last submission, only updated to the latest git.  But if
somebody wants it I can post them.

Thanks,
Miklos


 Documentation/filesystems/fuse.txt |   88 ++++++++-
 Documentation/filesystems/proc.txt |   40 ++++
 fs/filesystems.c                   |   60 ++++++
 fs/fuse/inode.c                    |   21 ++
 fs/internal.h                      |    3 +-
 fs/namespace.c                     |  366 +++++++++++++++++++++++++++---------
 fs/pnode.c                         |   22 ++-
 fs/pnode.h                         |    2 +
 fs/super.c                         |   26 ---
 include/linux/fs.h                 |    7 +
 include/linux/mount.h              |    4 +
 kernel/sysctl.c                    |   16 ++
 12 files changed, 527 insertions(+), 128 deletions(-)

Miklos Szeredi (10):
      unprivileged mounts: add user mounts to the kernel
      unprivileged mounts: allow unprivileged umount
      unprivileged mounts: propagate error values from clone_mnt
      unprivileged mounts: account user mounts
      unprivileged mounts: allow unprivileged bind mounts
      unprivileged mounts: allow unprivileged mounts
      unprivileged mounts: add sysctl tunable for "safe" property
      unprivileged mounts: make fuse safe
      unprivileged mounts: propagation: inherit owner from parent
      unprivileged mounts: add "no submounts" flag

--

From: Serge E. Hallyn
Date: Thursday, August 7, 2008 - 3:27 pm

Hi Miklos,

so on the bright side I pulled this tree today and it compiled and
passed ltp with no problems.

But then I played around a bit and found I could do the following:

(hmm, i'm trying to remember the exact order :)

as root:
	mmount --bind -o user=500 /home/hallyn/etc/ /home/hallyn/etc/
	mount --bind /mnt /mnt
	mount --make-rshared /mnt
	mount --bind /dev /mnt/dev

as hallyn:
	mmount --bind /mnt /home/hallyn/etc/mnt
	/usr/src/mmount-0.3/mmount --bind mnt/dev mnt/src

Now /mnt/src contained /dev.

Is this what we want?  Do we want to tell the admin it's his fault for
not somehow forcing a slave relationship between /mnt and
/home/hallyn/etc/mnt?  Except I don't think he can do that preemptively,
it has to be done after hallyn does the mmount.

So does that mean that if non-root user X does:

	mount a b

where b is user=X but a is not, then if a is shared we should force it
to be mounted as slave at b?

-serge
--

From: Eric W. Biederman
Date: Thursday, August 7, 2008 - 5:07 pm

You are using relative directory names here which makes it confusing.

I don't think so.

I think the simplest answer is to not allow mounting of shared
subtrees controlled by a different user.

Serge I think you are right downgrading the mount from shared to slave
--

From: Serge E. Hallyn
Date: Thursday, August 7, 2008 - 5:25 pm

I assume you mean "if the mount owners don't match"?

Miklos, what do you think?

The next question then becomes, how can we prove to ourselves that that
closes the last security hole with unprivileged mounts?  So long as
we treat each mount event as a piece of information and look at it as an
information flow problem, maybe we can actually come up with a good
description of the logic that is implemented and show that there is no
way a user can "leak" info...  (where a leak is a mount event, a
violation of intended DAC on open(file) or mkdir, etc)

-serge
--

From: Miklos Szeredi
Date: Monday, August 25, 2008 - 4:01 am

Sorry about the late reply: I was on a long summer vacation...

Serge, thanks for spotting this: it looks indeed a nasty hole!  I also

"Information flow problem" doesn't mean much to me (I'm actually an
electric engineer, who ended up doing programming for living ;)

But yeah, we should think this over very carefully.  Especially
interaction with mount propagation, which has very complicated and
sometimes rather counter-intuitive semantics.

Thanks,
Miklos
--

From: Serge E. Hallyn
Date: Wednesday, August 27, 2008 - 8:36 am

I know we discussed before about whether a propagated mount from a
non-user mount to a user mount should end up being owned by the user
or not.  I don't recall (and am not checking the code at the moment
as your tree is sitting elsewhere) whether we mark the propagated
tree with the right nosuid and nodev flags, or whether we call it
a user mount or not.

-serge
--

From: Miklos Szeredi
Date: Wednesday, August 27, 2008 - 8:55 am

If the destination is a user mount, then

 - the propagated mount(s) will be owned by the same user as the destination
 - the propagated mount(s) will inherit 'nosuid' from the destination

I remember also thinking about 'nodev' and why it doesn't need similar
treatment to 'nosuid'.  The reasoning was that 'nodev' is safe as long
as permissions are enforced, namespace shuffling cannot make it
insecure.  Does that sound correct?

Thanks,
Miklos
--

From: Serge E. Hallyn
Date: Wednesday, August 27, 2008 - 11:46 am

Ok, thanks.  I look forward to playing around with it when you publish

Yes that sounds correct, thanks for the refresher.

-serge
--

From: Miklos Szeredi
Date: Wednesday, September 3, 2008 - 11:45 am

A couple of centuries later...

...here's the updated git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git unprivileged-mounts

Changes since the previous version:

 - update to apply against latest git
 - downgrade shared mounts to slave for unprivileged binds (if owners differ)
 - don't allow unprivileged recursive binds

Serge, thanks again for testing and reviewing these patches!

Miklos
--

From: Serge E. Hallyn
Date: Wednesday, September 3, 2008 - 2:54 pm

Well I see where a shared mount *should* be turned into a slave mount
when bind-mounted as a user mount, but it doesn't seem to be happening.
In particular, after doing a user mount of /mnt onto
/home/hallyn/etc/mnt, /proc/self/mountinfo ends in:

22 13 3:1 /mnt /mnt rw shared:1 - ext3 /dev/root
rw,errors=continue,user_xattr,acl,data=ordered
23 13 3:1 /mnt /root/mnt rw shared:1 - ext3 /dev/root
rw,errors=continue,user_xattr,acl,data=ordered
24 13 3:1 /mnt /home/hallyn/etc/mnt rw,user=500 shared:1 - ext3
/dev/root rw,errors=continue,user_xattr,acl,data=ordered

I assume this means that /mnt and /home/hallyn/etc/mnt are peers
in peergroup 1?

And indeed if hallyn does mount --bind /usr /home/hallyn/etc/mnt/usr,
then /mnt/usr shows the contents of /usr.

I see that in do_loopback() you are adding CL_SLAVE|CL_MAKE_SHARED to
flags so I don't get what is going on.  Still looking through the code.

-serge
--

From: Serge E. Hallyn
Date: Wednesday, September 3, 2008 - 3:02 pm

Ooh.

You predicate the turning of shared mount to a slave mount on
!capable(CAP_SYS_ADMIN).  But in fact it's the mount by a privileged
user, turning the mount into a user mount, which you want to convert.
So my series of steps was:

	as root:
		(1) mount --bind /mnt /mnt
		(2) mount --make-rshared /mnt
		(3) /usr/src/mmount-0.3/mmount --bind -o user=hallyn /mnt \
			/home/hallyn/etc/mnt
	as hallyn:
		(4) mount --bind /usr /home/hallyn/etc/mnt/usr

You are turning mounts from shared->slave at step 4, but in fact we need
to do it at step 3, where we do have CAP_SYS_ADMIN.

-serge
--

From: Miklos Szeredi
Date: Wednesday, September 3, 2008 - 3:25 pm

Well, that's arguable: I think root should be able to shoot itself in
the foot by doing step 3.  Generally we don't restrict what root can
do.  OTOH I agree that current behavior is ugly in that it provides
different semantics for privileged/non-privileged callers.

Perhaps it would be cleaner to simply not allow step 4, instead of
playing tricks with changing the propagation type.

Miklos
--

From: Serge E. Hallyn
Date: Wednesday, September 3, 2008 - 3:43 pm

Maybe I'm not thinking right, but long-term is there any reason why we
should require privilege in order to do step 3, so long as the user has
read access to the source and write access to the destination?

I don't think there is.  Other than this glitch.  That's a powerful
reason to fix the glitch.

The other argument is that, frankly, I think most people are still
either unaware of, or confused by, mounts propagation.  Letting root

If the user or admin can simply (I haven't tested)

	mmount --bind --make-rslave -o user=hallyn /mnt \
		/home/hallyn/etc/mnt

then returning -EPERM if --make-rslave was not provided is reasonable
IMO.

-serge
--

From: Miklos Szeredi
Date: Wednesday, September 3, 2008 - 11:42 pm

Hmm, I think there are infinite ways in which root can mess up mount
propagation, and this is not even the worst.  I'm not trying to
belittle this bug: done unprivileged it's unacceptable.  But with
privileges, I really don't know if we should change the propagation

Right, that sounds perfect.  the only problem is, bind mount currently
ignores the propagation flags, for no good reason I can see.

That's a separate patch though.  I'll look into it.

Thanks,
Miklos
--

From: Serge E. Hallyn
Date: Thursday, September 4, 2008 - 6:28 am

Cool, thanks, Miklos :)

Are you going to revert the change forcing CL_SLAVE for
!capable(CAP_SYS_ADMIN)?  I don't think we want that - I think that
*within* a set of user mounts, propagation should be safe, right?

Will you be able to do this soon?  If not, should we just do the part
returning -EPERM when turning a shared mount into a user mount? 
Because I think that would then be ready for testing in -mm, and would
love to see it tested.

Were you going to push a patch to mount to do the user mounts, or
put sample code in Documentation, git log, or under samples/?

thanks,
-serge
--

From: Miklos Szeredi
Date: Thursday, September 4, 2008 - 7:06 am

OK, let's do that first and the tricky part (propagation vs. user

I've got a patch against util-linux-ng, but Karel doesn't want to take
it (understandibly) until the kernel patches have made it into
mainline.

Here it is, if you want to play with it.

Thanks,
Miklos
----

Subject: util-linux-ng: unprivileged mounts support

From: Miklos Szeredi <mszeredi@suse.cz>

This is an experimental patch for supporing unprivileged mounts and
umounts.  The following features are added:

1) If mount/umount are suid, first try without privileges.

This is done by forking, dropping privileges in child, and redirecting
stderr to /dev/null.  If this succeeds, then parent exits with zero
exit code.  Otherwise parent continues normally (with privileges).
This isn't perfect, because the wrong error message will be printed if
mount/umount failed not because of insufficient privileges, but some
other error (e.g. mountpoint busy).

2) Support user mounts in kernel.

If /etc/mtab is a symlink (to /proc/mounts if it's been set up
correctly), then change fsuid for the duration of the mount syscall
and use the MS_SETOWNER flag.  Old kernels will simply ignore this,
and everything will work as before.  Kernels with the unprivileged
mounts patches will set the owner of the mount, and the relevant line
in /proc/PID/mounts will contain a "user=XYZ" option, making this
backward compatible with /etc/mtab.

3) Root can force a specific user for a mount with "-ouser=XYZ".  This
has worked previously as well, but that was probably just accidental
(the option was copied verbatim to /etc/mtab).

4) Add support for "submnt" and "nosubmnt" options.  These can be used
to allow or prohibit unprivileged submounting of a user mount.

Example:

root# mount --bind -ouser=xyz /home/xyz /home/xyz
xyz> mount --bind ~/foo ~/bar
xyz> umount ~/bar

Changes since last version:

 - rename options:  'nomnt' -> 'nosubmnt', 'mnt' -> 'submnt'
 - change error message for EMFILE
 - fix bug in handling 'user' ...
From: Miklos Szeredi
Date: Thursday, September 4, 2008 - 8:40 am

Here it is:

git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git unprivileged-mounts

I don't know what's next, this patchset has been in and out of -mm for
as long as I can remember, but it hasn't generated much interest
outside the two of us :)

I do think this is an important feature though, even if not as sexy as
some other things.

Al?  Is there any chance of this making it to 2.6.28?

Thanks,
Miklos
--

From: Serge E. Hallyn
Date: Thursday, September 4, 2008 - 9:17 am

but you're still doing

	if (IS_MNT_SHARED(old_nd.path.mnt) && !capable(CAP_SYS_ADMIN))
		goto out;

shouldn't it be something like

	if (IS_MNT_SHARED(old_nd.path.mnt) && (old_nd.path.mnt & MNT_USER))
		goto out;

--

From: Miklos Szeredi
Date: Thursday, September 4, 2008 - 10:42 am

Why would that be an error?  There's no real security gain to be had
from restricting a privileged user, but could cause a lot of
annoyance.  If we think this is dangerous, then protection should be
built into mount(8) with an option to override.  But not into the
kernel, IMO.

Miklos
--

From: Serge E. Hallyn
Date: Thursday, September 4, 2008 - 10:48 am

We disagree on that.  But can we agree that the check you added is wrong?
There is no reason why a user mount should not be able to do shared
mounts, is there?  So should the check above just go away then?

-serge
--

From: Miklos Szeredi
Date: Thursday, September 4, 2008 - 11:03 am

I don't know.  It's something to think about in the future, but not
essential.  We know that without the above check the user can do bad
things: propagate mounts back into the source, and we don't want that.

We could allow binding a shared mount if

  a) the owners of the source and destination match
  b) the destination is made a slave of the source

But the current patchset doesn't allow _any_ changes to propagation
without CAP_SYS_ADMIN, so why should bind be an exception?

And yes, this is something to think about, but I think it's a rather
uncommon corner case, and so the patchset very much makes sense

No, we'd be back with the original problem.

Thanks,
Miklos
--

From: Serge E. Hallyn
Date: Thursday, September 4, 2008 - 11:49 am

When is propagating mounts back into the source bad?  Because you are
not preventing it.

You are preventing future propagation back into the user's own mounts,
but not into mounts not owned by the user.



Because it's not a change in propagation among existing mounts, instead
it's defining propagation for the new user mounts.  And since user
mounts don't currently exist, we're in no position to talk about

I'm willing to accept that if we simply leave the patchset as it was
before, but your new check just adds inconsistencies for absolutely zero

We still have the original problem.

When root does

	mount -bind /mnt /mnt
	mount --make-rshared /mnt
	mount --bind -o user=hallyn /mnt /home/hallyn/mnt

and hallyn does

	mount --bind /usr /home/hallyn/mnt/usr

then the kernel happily propagates the mount to /mnt/usr.
Now if hallyn does

	mount --bind /home/hallyn/mnt/usr /home/hallyn/mnt/usr2

THAT gives him a -EPERM.

To what end?

-serge
--

From: Miklos Szeredi
Date: Thursday, September 4, 2008 - 3:26 pm

Obviously, and that's exactly what root _instructed_ in the last step.
If it's a security problem, root shouldn't do that.

Your original bug report correctly pointed out the real security
problem:

|  as root:
|  	mmount --bind -o user=500 /home/hallyn/etc/ /home/hallyn/etc/
|  	mount --bind /mnt /mnt
|  	mount --make-rshared /mnt
|  	mount --bind /dev /mnt/dev
| 
|  as hallyn:
|  	mmount --bind /mnt /home/hallyn/etc/mnt
|  	/usr/src/mmount-0.3/mmount --bind mnt/dev mnt/src

Here root does nothing "unsafe", yet the user can get propagation back
into /mnt, due to the fact that a bind mount makes the new mount part
of the old peer group.  This is the security hole that is fixed, and
AFAICS the only security hole related to propagation vs. user mounts.

(I'm going to be offline tomorrow and the weekend, but will hopefully
have email access next week).

Thanks,
Miklos
--

From: Serge E. Hallyn
Date: Thursday, September 4, 2008 - 4:32 pm

(&(*$&%, you're right, of course.

Ok, will play with it a bit more, but I think it'd be *great* to see
this show up in -mm again.

-serge
--

From: Serge E. Hallyn
Date: Friday, September 5, 2008 - 8:31 am

Ok I should take the time to properly add these to ltp, but for now here
is the result of 15-minutes of playing around with shell scripts to do
some basic testing.

Run usermounts_root.sh as root first, then usermounts_user as a user.
Cleanup for the usermounts_root.sh side-effects is not done.

Miklos, do you have better-thought-out or more complete testcases?

-serge

=====================================================================
FILE usermounts_root.sh
=====================================================================
#!/bin/sh
MMOUNTDIR=/usr/src/mmount-0.3
MOUNT=${MMOUNTDIR}/mmount
UMOUNT=${MMOUNTIDR}/ummount

mkdir -p /mnt/shared /mnt/slave /mnt/private
mkdir /mnt/shared/d /mnt/slave/d /mnt/private/d

touch /mnt/shared/a
touch /mnt/slave/b
touch /mnt/private/c

mount --bind /mnt/shared /mnt/shared
mount --make-rshared /mnt/shared
mount --bind /mnt/shared /mnt/slave
mount --make-rslave /mnt/slave
=====================================================================


=====================================================================
FILE usermounts_user.sh
=====================================================================
#!/bin/sh
MMOUNTDIR=/usr/src/mmount-0.3
MOUNT=${MMOUNTDIR}/mmount
UMOUNT=${MMOUNTDIR}/ummount

mkdir t1

# user bind a root shared mount.  Should fail -EPERM.

$MOUNT --bind /mnt/shared t1
rc=$?
if [ $rc -eq 0 ]; then
	echo "FAIL: succeeded in user-binding a root-shared dir"
	exit 1
fi
echo "PASS: first test passed (refused to user-bind a root-shared dir"

# user bind a root shared mount, then bind into there.  Make sure
# that the two binds work, and the second is not propagated to the
# first

$MOUNT --bind /mnt/slave t1
$MOUNT --bind /mnt/private t1/d
if [ ! -f t1/d/c ]; then
	echo "failed mounting private under slave/d"
	exit 1
fi
if [ -f /mnt/slave/d/c ]; then
	echo "user mount of private under slave/d was propagated to /mnt/slave!"
	exit 1
fi
if [ -f /mnt/shared/d/c ]; then
	echo "user mount of ...
From: Miklos Szeredi
Date: Tuesday, September 9, 2008 - 6:34 am

Thanks for the scripts, I don't have any better ones, I have only been
testing by hand.

As for -mm, sure it would be nice.  I'll do a resubmission of the
whole series with proper changelog, etc.  It would be really nice if
Al and/or Christoph could review if there are any major conceptual
problems left.

Thanks,
Miklos
--

From: Eric W. Biederman
Date: Thursday, September 11, 2008 - 3:37 am

There is a weird corner case I'm trying to wrap my head around.
unlink and rmdir do not work on dentries that are mount points
in another mount namespace.

Which is at least needed for the moment so we don't leak mounts.

Once we have unprivileged mounts does that introduce a DOS attack?

Eric
--

From: Miklos Szeredi
Date: Thursday, September 11, 2008 - 7:43 am

Hmm, yes.  That's a tough one...

I think if the dentry has only user mounts, unlink should go ahead and
on success dissolve any mounts on the dentry.  Does that sound
workable?

Thanks,
Miklos
--

From: Serge E. Hallyn
Date: Thursday, September 11, 2008 - 8:20 am

Is it really a problem?  The admin can always go ahead and kill the
user, which already takes care of any mounts in private namespaces,
which I think is Eric's primary concern.  IT also takes care of that
user's processes pinning files under the mounts.  So now the admin can
umount all the user's mounts in the init namespace (using a script
parsing /proc/self/mountinfo if need be), and delete the files.

Doesn't really seem like a problem.

Or am I missing Eric's real concern?

-serge
--

From: Miklos Szeredi
Date: Thursday, September 11, 2008 - 8:44 am

It's not necessarily against the admin, it could deny service to
another user or a script running as root.

The nasty thing is: an unprivileged user can cause unlink/rmdir to
fail, even when otherwise it would have succeed.  It's not a huge
issue: unprivileged fuse mounts have the same effect, and nobody
complained yet.  But I think we have to deal with this in some way,
not just leaving it to the admin.

Thanks,
Miklos
--

From: Eric W. Biederman
Date: Thursday, September 11, 2008 - 11:54 am

Assume /user is the base unprivileged mount point.

echo dummy > /tmp/1234
mount --bind /etc /user/etc
mount --bind /tmp/1234 /user/etc/passwd

Now you can't create /etc/passwd.new and rename it to /etc/passwd.
Stopping adduser from working.

As Miklos said this can apply to any file or any directory, so it can
be a DOS against any other user on the system.

It is also contrary to classic unix and linux semantics as open files
don't otherwise prevent unlink, rename or, rmdir from happening. So
applications are not going to be ready for it.

At a practical level I recently replace chroot with mount namespaces
to simplify handling of mounts and ouch!  When a process goes crazy
and doesn't exit when you expect and then you try and delete the
directory it is a pain.

Eric

--

From: Serge E. Hallyn
Date: Friday, September 12, 2008 - 3:08 pm

Ok, but this is all done as root.  Kind of a silly thing for root to
do :)

So in order for me as an unprivileged user to pin a dentry by mounting
over it, I have to have write permission to the dentry to begin with

Except I need to own the mount as well as the dentry.  So after
root does

	mmount --bind -o user=hallyn /home/hallyn /home/hallyn
	mmount --bind -o user=hallyn /home/serge /home/serge

if user serge (uid 501) tries to

	mmount --bind /etc /home/hallyn/etc
	mmount --bind /etc /home/serge/etc

permission for the first will be denied because serge does not
have write perms to /home/hallyn/etc, and permission for the second
will be denied because only hallyn may mount under /home/serge.

If root properly did

	mmount --bind -o user=hallyn /home/hallyn /home/hallyn
	mmount --bind -o user=serge /home/serge /home/serge

and then hallyn does

	mmount --bind /etc /home/hallyn/etc

and serge does

	mmount --bind /home/hallyn/etc /home/serge/etc

then hallyn can still ummount /home/hallyn/etc.

And we've decided that users cannot (for now) do shared mounts.
--

From: Eric W. Biederman
Date: Friday, September 12, 2008 - 8:12 pm

There are less silly examples like setting up a chroot type
environment contained in a mount namespace and having a kernel oops

That second condition is interesting requiring write permission of the
dentry.  I thought we had obviated the need for that when we added
ownership to the mounts themselves.  In this case at least it shouldn't

Ok.  Let's pick on something a little more interesting.

root does:
	mmount --bind -o user=hallyn /home/hallyn /home/hallyn
hallyn does:
	mount --bind /tmp /home/hallyn/tmp
        touch dummy
	mount --bind dummy /home/hallyn/tmp/some_shared_file_I_have_write_access_to.

Which allows me to transform write permissions into the ability to
deny someone else the ability to delete a file.

This seems to mess up things like revoke.

At a practical level it is a real annoyance, regardless of the security
implications.

As a point of comparison plan9 does not have that restriction.

Eric

--

From: Serge E. Hallyn
Date: Saturday, September 13, 2008 - 6:56 pm

I wasn't saying that I believe I can win an argument by knocking down
one example :)

In fact I'm not trying to win an argument.  Because I'm quite sure you


Yup, that's an interesting example.

Still an admin *can* work around that, if he can sufficiently parse
/proc/self/mountinfo to know to umount
/home/hallyn/tmp/some_shared_file_I_have_write_access_to.



Why doesn't it have that restriction?  Does it always allow you to rm a
mounted-over file?

-serge
--

From: Eric W. Biederman
Date: Saturday, September 13, 2008 - 8:06 pm

Ok.  I couldn't find Mikolos's code when I looked quickly.



Yes.

The implementation of mounts in plan9 is a bit different.  In
plan9 mounts are looked up by mount namespace and qid (effectively
the inode number).  So the semantics are slightly different.
Requiring a new mount namespace if you want to see what a filesystem
looks like without a mount on top of a given file.

It also looks like plan9 is likely to have a unmountable and unusable mount
if you actually delete an file, from another mount namespace.

Linux actually has a very similar case where we can loose a mount if you rename
the directory that contains it to be somewhere higher in the mount hierarchy
than the location the mount was under.

All of that said there is a very good practical justification for it all.
In a filesystem where not all changes come through our local VFS the
VFS cannot prevent changes to a filesystem it can only cache and respond
to changes that do occur.

So things like sysfs_rename_dir and sysfs_move_dir routinely bypass
the VFS restriction against changes happening under mount points.  It
isn't a problem in practice because people don't mount on sysfs but the
problem is very much there today.

It looks to me like the right solution is very much to do the lazy
unmount we do for expired mounts, and purge everything that happens
and then we just don't have to worry about mounts causing a denial
of service attack and life gets simpler.

Rename is a bit trickier than unlink and rmdir in that we should allow
it on the underlying filesystem and only remove the directory if the
rename goes out of scope for the overlying mount.

Eric
--

From: Serge E. Hallyn
Date: Tuesday, September 30, 2008 - 12:39 pm

Rules according to permit_mount:

	if CAP_SYS_ADMIN, allowed
	if fs type is not marked safe for user mounts (and not bind), -EPERM
	if target is a link, -EPERM
	if target is labeled with new nosubmnt flag, -EPERM
	if target is not a user mount owned by current->uid, -EPERM


You mean if we rm something, do the same thing we do for a lazy umount?

That sounds reasonble, if the implementation is doable.  And should

The rename issue here being what you mentioned above about losing a
--

From: Miklos Szeredi
Date: Monday, October 6, 2008 - 4:05 am

The rationale for the last one is that the user might not have access
to the submounts.  Recursive bind mounts can be implemented in

# mkdir -p /tmp/a/b/c
# mkdir /tmp/x
# mount --bind /tmp/a /tmp/x
# mount --bind /bin /tmp/x/b/c
# mv /tmp/a/b/c /tmp
mv: cannot move `/tmp/a/b/c' to `/tmp/c': Device or resource busy
# mv /tmp/a/b /tmp
# ls -al /tmp/x
total 64
drwxr-xr-x  2 root root  4096 Oct  6 12:55 .
drwxrwxrwt 95 root root 57344 Oct  6 12:55 ..
# umount /tmp/x
umount: /tmp/x: device is busy.

The mount under /tmp/x/b/c is now inaccessible and lost forever.  The
only way to get rid of it is to lazy umount /tmp/x itself.

What this means is that the current EBUSY restriction only prevents
the mountpoint disappearing in a subset of cases.

Miklos
--

From: Eric W. Biederman
Date: Thursday, September 11, 2008 - 12:04 pm

I don't think only user mounts is the right filter.

We have support for lazy unmounts so it is possible to handle
that case.

Technically all we need to do is transform d_mounted from a counter
to a hlist_head and thread yet another list through struct vfs_mount
to track this.

I need to think about the semantics a little more before I have a good
feel of what makes sense.  In particular do we want a full
recursive lazy unmount or do we want to handle submounts in a different
way.

This also intersects in interesting ways with dcache pruning, and
automounting.

Eric
--

From: Eric W. Biederman
Date: Thursday, September 11, 2008 - 12:58 pm

In particular we already have mounts that expire.  So deleting the
files or directories of mount points (when allowed) should just be 
a case of expiring the mount point.

When it is ok to do that is a separate question.

Eric
--

Previous thread: [PATCH 8/9] Call idr_find() without locking in ipc_lock() by Nadia.Derbey on Wednesday, May 7, 2008 - 4:36 am. (3 messages)

Next thread: [PATCH 0/3] [2.6.26] ehea: Add DLPAR memory remove support by Hannes Hering on Wednesday, May 7, 2008 - 5:39 am. (1 message)