Re: [patch 3/6] vfs: mountinfo stable peer group id

To: Miklos Szeredi <miklos@...>
Cc: <akpm@...>, <linuxram@...>, <linux-fsdevel@...>, <linux-kernel@...>, <Trond.Myklebust@...>, <dhowells@...>
Date: Friday, March 21, 2008 - 11:49 pm

On Thu, Mar 20, 2008 at 09:43:19PM +0000, Al Viro wrote:

Argh...  Doing release_mounts() after the collection phase won't work ;-/
It would keep references to parents until the very end, leaving us
with false-busy shrinkable vfsmounts if we had a shrinkable automount
on top of another shrinkable one...

It does work for mark_mounts_for_expiry(), but not here.  We could do
the same kind of loops as now, releasing namespace_sem after each
portion of candidates, doing release_mounts() and regaining namespace_sem,
but that leaves us with indefinitely long stalls if somebody keeps
doing lookups that trigger automounts.  OTOH, we could probably get away
with a separate counter covering only that kind of reference...  That
would be bumped in umount_tree() (at the same point where we decrement
d_mounted) and dropped in release_mounts() when we reset ->mnt_parent
and do mntput() on it.

Then we would simply make do_refcount_check() in pnode.c do
        int mycount = atomic_read(&mnt->mnt_count) - mnt->mnt_ghosts;
        return (mycount > count);
instead of what it does now, and everything would work fine...
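A minimal userspace sketch of the proposed check, assuming a hypothetical `struct toy_vfsmount` that carries only the two fields the check reads (with `mnt_count` as a plain int rather than an atomic_t). `old_refcount_check()` and `new_refcount_check()` are illustrative names, not kernel functions. The example mount's only extra reference comes from an unmounted child whose ->mnt_parent still points at it: the current check calls it busy, the proposed one does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct vfsmount, reduced to what the check uses.
 * mnt_count is a plain int here; in the kernel it is an atomic_t. */
struct toy_vfsmount {
	int mnt_count;   /* all references held on this mount */
	int mnt_ghosts;  /* references from children whose ->mnt_parent still
	                  * points here but which are off our child list */
};

/* Current behaviour: every reference counts towards "busy". */
static bool old_refcount_check(const struct toy_vfsmount *mnt, int count)
{
	return mnt->mnt_count > count;
}

/* Proposed behaviour: discount ghost references before comparing. */
static bool new_refcount_check(const struct toy_vfsmount *mnt, int count)
{
	int mycount = mnt->mnt_count - mnt->mnt_ghosts;

	assert(mnt->mnt_ghosts >= 0);
	return mycount > count;
}
```

For instance, with `{ .mnt_count = 3, .mnt_ghosts = 1 }` and an expected count of 2, `old_refcount_check()` returns true (false busy) while `new_refcount_check()` returns false.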

So, let's define mnt->mnt_ghosts by requiring that, outside of vfsmount_lock,
it be equal to the number of vfsmounts with ->mnt_parent == mnt that are
_not_ on the child list of mnt.
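The invariant can be written as a check over a toy mount table. This is a sketch assuming a hypothetical `struct toy_mount` with an explicit `on_child_list` flag in place of the kernel's list linkage; `count_ghosts()` and `ghosts_consistent()` are illustrative names only. Outside vfsmount_lock, `ghosts_consistent()` would be expected to hold for every mount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a root mount points ->mnt_parent at itself,
 * as in the kernel, and is skipped when counting. */
struct toy_mount {
	struct toy_mount *mnt_parent;
	bool on_child_list;   /* linked on mnt_parent's child list? */
	int mnt_ghosts;       /* the proposed counter */
};

/* Mounts whose ->mnt_parent is @mnt but which are NOT on its child list. */
static int count_ghosts(const struct toy_mount *mnt,
			struct toy_mount *const *all, size_t n)
{
	int ghosts = 0;

	assert(mnt != NULL && all != NULL);
	for (size_t i = 0; i < n; i++)
		if (all[i] != mnt && all[i]->mnt_parent == mnt &&
		    !all[i]->on_child_list)
			ghosts++;
	return ghosts;
}

/* The invariant: the stored counter matches the recomputed one. */
static bool ghosts_consistent(const struct toy_mount *mnt,
			      struct toy_mount *const *all, size_t n)
{
	return mnt->mnt_ghosts == count_ghosts(mnt, all, n);
}
```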

	We'd need to decrement it in release_mounts(), increment in
mnt_set_mountpoint(), decrement again in attach_mnt() (which strongly
suggests that the increment should happen in the _callers_ of mnt_set_mountpoint(),
so that attach_mnt() wouldn't modify it at all), decrement in commit_tree(),
and increment in umount_tree() at the same point where we play with d_mounted.
AFAICS, that's all.
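To see why attach_mnt() ends up leaving the counter alone, here is a sketch of that first-cut placement. It assumes hypothetical toy_* helpers that model only ->mnt_parent, child-list membership and mnt_ghosts (no hashing, dentries or refcounts). With the increment inside mnt_set_mountpoint(), attach_mnt() has to take it straight back, so its net effect is zero.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_mount {
	struct toy_mount *mnt_parent;
	bool on_child_list;
	int mnt_ghosts;
};

/* First-cut placement: setting ->mnt_parent creates a ghost reference,
 * since the child is not on the parent's child list yet. */
static void toy_mnt_set_mountpoint(struct toy_mount *parent,
				   struct toy_mount *child)
{
	child->mnt_parent = parent;
	child->on_child_list = false;
	parent->mnt_ghosts++;
}

/* attach_mnt() = mnt_set_mountpoint() + linking onto the child list,
 * so it must undo the increment immediately. */
static void toy_attach_mnt(struct toy_mount *parent, struct toy_mount *child)
{
	toy_mnt_set_mountpoint(parent, child);
	child->on_child_list = true;
	parent->mnt_ghosts--;
	assert(parent->mnt_ghosts >= 0);
}
```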

	Shifting the increment out of mnt_set_mountpoint() and the decrement
out of commit_tree() into their callers and collapsing where possible, we
get the following:
	* decrement in release_mounts() when resetting ->mnt_parent
	* increment in propagate_mnt() after call of mnt_set_mountpoint()
	* decrement in attach_recursive_mnt() in the loop calling
commit_tree() for clones (on mountpoint of each clone).
	* increment in umount_tree() at the point where we update d_mounted.
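The collapsed scheme can be traced through one propagated clone's lifetime. This is a sketch, not the eventual patch: the toy_* functions are hypothetical and reproduce only the mnt_ghosts update each listed site would make, with the surrounding kernel work (hashing, d_mounted, mntput()) left out. In the kernel all four would run under vfsmount_lock, which is why a plain int is enough.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_mount {
	struct toy_mount *mnt_parent;  /* a root points at itself */
	bool on_child_list;
	int mnt_ghosts;                /* plain int: updated under the lock */
};

/* propagate_mnt(): the clone gets ->mnt_parent but is not visible yet. */
static void toy_propagate_clone(struct toy_mount *dest, struct toy_mount *clone)
{
	clone->mnt_parent = dest;      /* stands in for mnt_set_mountpoint() */
	clone->on_child_list = false;
	dest->mnt_ghosts++;
}

/* attach_recursive_mnt(): committing the clone links it in, so the
 * mount it sits on loses the ghost reference. */
static void toy_commit_clone(struct toy_mount *clone)
{
	clone->on_child_list = true;   /* stands in for commit_tree() */
	clone->mnt_parent->mnt_ghosts--;
}

/* umount_tree(): off the child list, ->mnt_parent kept for now. */
static void toy_umount(struct toy_mount *mnt)
{
	mnt->on_child_list = false;
	if (mnt->mnt_parent != mnt)
		mnt->mnt_parent->mnt_ghosts++;
}

/* release_mounts(): ->mnt_parent reset, parent reference dropped. */
static void toy_release(struct toy_mount *mnt)
{
	struct toy_mount *parent = mnt->mnt_parent;

	if (parent == mnt)
		return;                /* root: nothing to drop */
	mnt->mnt_parent = mnt;
	parent->mnt_ghosts--;
	assert(parent->mnt_ghosts >= 0);
}
```

Between toy_umount() and toy_release() the destination's mnt_count would still include the clone's reference; that window is exactly what mnt_ghosts lets do_refcount_check() discount.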

All these places are under vfsmount_lock, so we are fine with plain int; no
atomics needed.

So...  Attack plan, four patches:
	* introduce mnt_ghosts and use it in propagate_mount_busy() (that gets
rid of the false-busy stuff);
	* switch shrink_submounts() and mark_mounts_for_expiry() to the scheme
from the previous posting;
	* call shrink_submounts() from do_umount() unconditionally, removing it
from ->umount_begin() instances;
	* restore a sane prototype for shrink_submounts().

Comments?  Ram, Miklos, Trond?
--

Messages in current thread:
[patch 3/6] vfs: mountinfo stable peer group id, Miklos Szeredi, (Thu Mar 13, 5:26 pm)
Re: [patch 3/6] vfs: mountinfo stable peer group id, Miklos Szeredi, (Wed Mar 19, 12:41 pm)
Re: [patch 3/6] vfs: mountinfo stable peer group id, Miklos Szeredi, (Wed Mar 19, 2:37 pm)
Re: [patch 3/6] vfs: mountinfo stable peer group id, Miklos Szeredi, (Fri Mar 21, 4:57 am)
Re: [patch 3/6] vfs: mountinfo stable peer group id, Al Viro, (Fri Mar 21, 11:49 pm)
Re: [patch 3/6] vfs: mountinfo stable peer group id, Christoph Hellwig, (Mon Mar 24, 4:54 am)