Re: [UNIONFS] 00/29 Unionfs and related patches pre-merge review (v2)

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Al Viro <viro@...>
Cc: Erez Zadok <ezk@...>, <torvalds@...>, <akpm@...>, <hch@...>, <viro@...>, <linux-kernel@...>, <linux-fsdevel@...>, <mhalcrow@...>
Date: Saturday, February 2, 2008 - 2:45 pm

In message <20080126084532.GG27894@ZenIV.linux.org.uk>, Al Viro writes:

[concerns about lower directories moving around...]


OK, so if I understand you, your concerns center around the fact that lower
directories can be moved around (i.e., topology changes), what happens then
for operations that go through the stackable f/s, and what should users
expect to see.


Since stacking and NFS have some similarities, I first checked w/ the NFS
people to see what are their semantics in a similar scenario: an NFS client
could be validating a directory, then issue, say, a ->create; but in
between, the server could have moved the directory that was validated.  In
NFS, the ->create operation succeeds, and creates the file in the new
location of the directory which was validated.

Unionfs's behavior is similar: the newly created file will be successfully
created in the moved directory.  The only exception is that if a lower
branch is marked readonly by unionfs, a copyup will take place.

This had not been a problem for unionfs users to date.  The main reason is
that when unionfs users modify lower files, they often do so while there's
little to no activity going through the union itself.  And while it doesn't
prevent directories from being moved around, this common usage mode does
reduce the frequency in which topology changes can be an issue for unionfs
users.

I'll submit a patch to document this behavior.


Well, it was UCB, UCLA, and Sun.  I don't think back in the early 90s they
were too concerned about topology changes; f/s stacking was a new idea and
they wanted to explore what can be done with it conceptually, not produce
commercial-grade software (still, remind me to tell you the first-hand story
I've learned about of how full-blown stacking _almost_ made it into Solaris
2.0 :-)

The only known reference to try and address this coherency problem was
Heidemann's SOSP'95 paper titled "Performance of cache coherence in
stackable filing."  The paper advocated changing the whole VFS and the
caches (page cache + dnlc) to create a "unified cache manager" that was
aware of complex layering topologies (including fan-out and fan-in).  It was
even able to handle compression layers, where file data offsets change b/t
the layers (a nasty problem).  Code for this unified cache manager was never
released AFAIK.  I think Heidemann's approach was elegant, but I don't think
it was practical as it required radical VFS/VM surgery.  Ironically, MS
Windows has a single I/O cache manager that all storage and filesystem
modules talk to directly (they're not allowed to pass IRPs directly b/t
layers): so Windows can handle such coherency better than most Unix systems
can today.

I've always thought of a different way to allow users to write to lower
branches -- through the union.  This is similar to what an old AT&T
unioning-like file system named "3DFS" did.  3DFS introduced a new directory
called "..." so if you cd to /mntpt/... then you got to the next level down
the stack (as if you popped the top one and now you see how the union looks
like without the top layer).  And if you "cd /mntpt/.../..." then you see
the view without the top two layers, etc.

So my idea is similar: to introduce virtual directory views that restrict
access to a single lower branch within the union.  So if someone does a "cd
/mnt/unionfs/.1" then they get access to branch 1; "cd /mnt/unionfs/.2" gets
access to branch 2; etc.  While this technique will waste a few names, it's
probably worth the savings in terms of cache-coherency pains (plus, the
actual virtual directory names can be configurable at mount time to allow
users to choose a non-conflicting dir name).  With this idea, users will be
actually accessing a one-branch union, but all ops and locks will have to go
through the union: no one would be able to modify lower files directly.

However, for this "view" idea to work, I'll need a way to lock-out or hide
the lower directories from the namespace, so no one can access them in r-w
mode any longer (if at all).  Moreover, even if such a method was available,
one would have to decide what to do with any open files on the lower branch
(killing such processes or closing the descriptors seems somewhat harsh --
maybe there'd be a way to "promote" them up the stack?)


I agree that WH is really needed and would certainly help.  I've always felt
that WH support wasn't technically very challenging, but logistically it was
going to be a nightmare -- having to coordinate with many
subsystem/filesystem maintainers over a lengthy period of time, no?  You
seem to be more optimistic that it can be achieved more quickly.  If we can
talk in more detail about it in person, I wouldn't mind taking a stab at
this.

BTW, before such WH support can be useful enough for unionfs, it'll have to
be supported in ext*, tmpfs, _and_ nfs* (those are the most popular ones in
use).

Cheers,
Erez.
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Re: [UNIONFS] 00/29 Unionfs and related patches pre-merge re..., Erez Zadok, (Sat Feb 2, 2:45 pm)
Re: [UNIONFS] 00/29 Unionfs and related patches pre-merge re..., Christoph Hellwig, (Thu Jan 10, 11:08 am)
[PATCH 06/29] Unionfs: main header file, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 20/29] Unionfs: super_block operations, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 11/29] Unionfs: lower-level lookup routines, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 21/29] Unionfs: extended attributes operations, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 10/29] Unionfs: dentry revalidation, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 24/29] Unionfs: debugging infrastructure, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 12/29] Unionfs: rename method and helpers, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 05/29] Unionfs: fanout header definitions, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 28/29] VFS: export release_open_intent symbol, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 14/29] Unionfs: readdir helper functions, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 25/29] Unionfs file system magic number, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 16/29] Unionfs: inode operations, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 18/29] Unionfs: address-space operations, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 23/29] Unionfs: miscellaneous helper routines, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 04/29] Unionfs: main Makefile, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 22/29] Unionfs: async I/O queue, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 08/29] Unionfs: basic file operations, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 15/29] Unionfs: readdir state helpers, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 03/29] Makefile: hook to compile unionfs, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 01/29] Unionfs: documentation, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 17/29] Unionfs: unlink/rmdir operations, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 09/29] Unionfs: lower-level copyup routines, Erez Zadok, (Thu Jan 10, 10:59 am)
[PATCH 27/29] VFS path get/put ops used by Unionfs, Erez Zadok, (Thu Jan 10, 10:59 am)