[RFC 1/2] AUFS: merging/stacking several filesystems

Previous thread: [PATCH 09/14] Unionfs: use noinline_for_stack by Erez Zadok on Tuesday, April 1, 2008 - 5:06 pm. (17 messages)

Next thread: [PATCH 0/7] XFS: case-insensitive lookup and Unicode support by Barry Naujok on Wednesday, April 2, 2008 - 2:25 am. (3 messages)
To: <linux-fsdevel@...>
Date: Wednesday, April 2, 2008 - 1:12 am

Hello fs-developers,

I am developing a stackable unification filesystem which unifies several
directories and provides a single merged directory.
I guess most people already know what it is. When users access a file,
the access is passed/redirected/converted (sorry, I am not sure
which English word is correct) to the real file on the member
filesystem. The member filesystem is called 'lower filesystem' or
'branch' and has a mode, either 'readonly' or 'readwrite'. File
deletion is handled as a 'whiteout' on the upper writable branch.
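
The lookup and whiteout behaviour described above can be pictured with a
small user-space sketch. This is only an illustration under stated
assumptions, not the AUFS code: the Branch class, lookup() and unlink()
are hypothetical names, branches are modeled as in-memory sets, and the
top branch is assumed to be the writable one.

```python
from typing import List, Optional


class Branch:
    """One member directory ("branch"), modeled as in-memory sets (hypothetical)."""

    def __init__(self, name, files, whiteouts=(), writable=False):
        self.name = name
        self.files = set(files)          # names that really exist on this branch
        self.whiteouts = set(whiteouts)  # names marked as deleted in the union
        self.writable = writable


def lookup(branches: List[Branch], filename: str) -> Optional[str]:
    """Return the name of the branch serving `filename`, or None.

    `branches` is ordered from the top (highest) branch to the bottom.
    """
    for br in branches:
        if filename in br.files:
            return br.name
        if filename in br.whiteouts:
            return None  # a whiteout hides every copy on the lower branches
    return None


def unlink(branches: List[Branch], filename: str) -> None:
    """Delete `filename` from the merged view without touching lower branches.

    Simplification: the whiteout is always recorded, even when no lower
    copy exists.
    """
    if lookup(branches, filename) is None:
        raise FileNotFoundError(filename)
    top = next(br for br in branches if br.writable)
    top.files.discard(filename)
    top.whiteouts.add(filename)


if __name__ == "__main__":
    branches = [Branch("rw", {"a"}, writable=True), Branch("ro", {"a", "b"})]
    print(lookup(branches, "b"))  # served by the read-only branch
    unlink(branches, "b")
    print(lookup(branches, "b"))  # hidden by the whiteout on "rw"
```

Note that the read-only branch still holds "b" after unlink(); only the
merged view changes.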

On this ML, there have been discussions about UnionMount (Jan Blunck
and Bharata B Rao) and Unionfs (Erez Zadok). They took different
approaches to implementing the merged view.
The former tries putting it into the VFS, and the latter implements it
as a separate filesystem.
(If I misunderstand these implementations, please let me know and
I shall correct it, because it has been a long time since I last read
their source files.)
UnionMount's approach can be small, but it may be hard to share
branches between several UnionMounts since the whiteout in it is
implemented in the inode on the branch filesystem and is always
shared. According to Bharata's recent post, readdir does not seem to
be finished yet.
Unionfs has a longer history. When I got the idea of a stacking
filesystem (Aug 2005), it already existed. It has virtual super_block,
inode, dentry and file objects, and each has an array pointing to the
lower objects of the same kind. After contributing many patches to
Unionfs, I restarted my project AUFS (Jun 2006).

In AUFS, the structure of the filesystem is similar to Unionfs, but I
implemented my own ideas, approaches and enhancements in it.
Here are some of them; the intention of this post is to get some
initial feedback about its design.
You can see the actual details, documents, CVS logs, and how people
are using it at
<http://aufs.sf.net>.

Kindly review and let me know your comments.


o file mapping -- mmap and sharing pages
------...
To: <linux-fsdevel@...>
Date: Thursday, April 3, 2008 - 2:58 am

o readdir -- virtual dir block on memory (VDIR)
----------------------------------------------------------------------
This is an approach I posted a few months ago in reply to UnionMount's
post. It constructs a virtual dir block in memory. For readdir, aufs
calls vfs_readdir() internally for each lower dir, merges their
entries while eliminating the whited-out ones, and gives the result to
the file (dir) object. So the file object holds its entry list until it
is closed. The entry list is rebuilt when the file position is zero
and the list has become old. This decision is made in aufs automatically.
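
A rough user-space model of the VDIR idea may help here. It is a sketch
under assumptions, not the aufs implementation: build_vdir(), Union and
OpenDir are hypothetical names, each branch is a pair of
(entries, whiteouts) lists, and "old" is modeled with a simple version
counter because the post does not say how aufs detects staleness.

```python
def build_vdir(branches):
    """Merge directory entries from all branches, top branch first.

    `branches` is a list of (entries, whiteouts) pairs. A name is kept
    only the first time it is seen, and a whiteout on an upper branch
    hides the same name on every lower branch.
    """
    seen = set()
    hidden = set()
    merged = []
    for entries, whiteouts in branches:
        for name in entries:
            if name in seen or name in hidden:
                continue
            seen.add(name)
            merged.append(name)
        # Whiteouts only affect the branches below this one.
        hidden |= set(whiteouts)
    return merged


class Union:
    """Hypothetical merged directory; `version` is bumped on every change."""

    def __init__(self, branches):
        self.branches = branches
        self.version = 0


class OpenDir:
    """Open directory file object that keeps its entry list until close()."""

    def __init__(self, union):
        self.union = union
        self.pos = 0
        self.vdir = None
        self.version = -1

    def readdir(self, count):
        stale = self.version != self.union.version
        # Rebuild only on first use, or at position zero with an old list.
        if self.vdir is None or (self.pos == 0 and stale):
            self.vdir = build_vdir(self.union.branches)
            self.version = self.union.version
        chunk = self.vdir[self.pos:self.pos + count]
        self.pos += len(chunk)
        return chunk

    def rewind(self):
        self.pos = 0

    def close(self):
        self.vdir = None  # the entry list is released here
```

With two branches (["a", "b"], ["c"]) on top of (["b", "c", "d"], []),
build_vdir() yields ["a", "b", "d"]. A reader in the middle of the
directory keeps seeing the list it started with until it rewinds.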

It may consume a rather large amount of memory and CPU cycles. To reduce
the number of memory allocations, the implementation became rather tricky.

Some people may call this a security hole or a DoS vector, since a dir
(file object) that is opened and has been read once holds its entry list
and puts pressure on system memory. But I'd say it is similar to
files under /proc or /sys. The virtual files on procfs and sysfs also
hold a memory page (generally) while they are open. When an idea to
reduce memory for them is introduced, it will be applied to aufs too.


o policies for selecting one among multiple writable branches,
  parent-dir, round-robin and most-free-space
----------------------------------------------------------------------
When there is more than one writable branch, aufs has to decide
the target branch for file creation or copy-up. By default, the highest
writable branch which has the parent (or ancestor) dir of the target
file is chosen (top-down-parent policy).
At the user's request, aufs can use other policies to select the writable
branch: round-robin and most-free-space policies for file creation, and
top-down-parent, bottom-up-parent and bottom-up policies for copy-up.
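
The three creation policies named above could be sketched as follows.
This is an illustrative model only: BranchInfo, top_down_parent(),
RoundRobin and most_free_space() are hypothetical names, free space is
a plain number, and the top-down-parent function checks only the direct
parent dir (the ancestor case mentioned above is left out).

```python
class BranchInfo:
    """Hypothetical description of one branch for policy selection."""

    def __init__(self, name, writable, dirs=(), free_bytes=0):
        self.name = name
        self.writable = writable
        self.dirs = set(dirs)         # directories present on this branch
        self.free_bytes = free_bytes  # assumed to be known to the caller


def writable_branches(branches):
    return [b for b in branches if b.writable]


def top_down_parent(branches, parent_dir):
    """Highest writable branch that already has the parent dir, else None.

    `branches` is ordered from the top branch to the bottom.
    """
    for b in branches:
        if b.writable and parent_dir in b.dirs:
            return b
    return None


class RoundRobin:
    """Cycle through the writable branches, one per created file."""

    def __init__(self):
        self._next = 0

    def select(self, branches):
        candidates = writable_branches(branches)
        chosen = candidates[self._next % len(candidates)]
        self._next += 1
        return chosen


def most_free_space(branches):
    """Writable branch with the largest amount of free space."""
    return max(writable_branches(branches), key=lambda b: b.free_bytes)


if __name__ == "__main__":
    brs = [
        BranchInfo("rw0", True, {"/", "/a"}, free_bytes=10),
        BranchInfo("rw1", True, {"/", "/b"}, free_bytes=50),
        BranchInfo("ro", False, {"/", "/a", "/b"}),
    ]
    rr = RoundRobin()
    picks = [rr.select(brs).name for _ in range(10)]
    print(picks.count("rw0"), picks.count("rw1"))  # 5 5
```

The round-robin selector keeps its counter in the object, so each mount
would need its own instance in this model.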

As expected, the round-robin policy selects branches in circular order.
When you have two writable branches and create 10 new files, 5 files
will be created on each branch. The mkdir(2) system call is an
exception. When you
create ...
To: <akpm@...>, <linux-fsdevel@...>
Date: Monday, May 12, 2008 - 12:43 am

:::

About a month ago, I posted some of the ideas, designs and approaches
which are implemented in the AUFS stackable filesystem.
While I still plan to implement some more features, the current
AUFS is in good shape and has been used by many people for years.
Since I have received requests to submit AUFS to mainline more
than once, I'd now like to ask you to include AUFS in mainline.
But the source is large (see below).
Should I send all of these files to this ML, or ask you to download them
from CVS?
If AUFS were much smaller, I would send the files here without asking.


Junjiro Okajima

----------------------------------------------------------------------
$ wc -l fs/aufs25/*.[ch]
    56 fs/aufs25/aufs.h
   109 fs/aufs25/br_fuse.c
   391 fs/aufs25/br_nfs.c
    69 fs/aufs25/br_xfs.c
   932 fs/aufs25/branch.c
   345 fs/aufs25/branch.h
  1043 fs/aufs25/cpup.c
    82 fs/aufs25/cpup.h
   246 fs/aufs25/dcsub.c
    54 fs/aufs25/dcsub.h
   485 fs/aufs25/debug.c
   210 fs/aufs25/debug.h
  1020 fs/aufs25/dentry.c
   384 fs/aufs25/dentry.h
   425 fs/aufs25/dinfo.c
   573 fs/aufs25/dir.c
   146 fs/aufs25/dir.h
   113 fs/aufs25/dlgt.c
   597 fs/aufs25/export.c
   661 fs/aufs25/f_op.c
   826 fs/aufs25/file.c
   246 fs/aufs25/file.h
   185 fs/aufs25/finfo.c
   708 fs/aufs25/hin_or_dlgt.c
   188 fs/aufs25/hinode.h
  1114 fs/aufs25/hinotify.c
   844 fs/aufs25/i_op.c
   828 fs/aufs25/i_op_add.c
   582 fs/aufs25/i_op_del.c
   832 fs/aufs25/i_op_ren.c
   290 fs/aufs25/iinfo.c
   425 fs/aufs25/inode.c
   336 fs/aufs25/inode.h
   307 fs/aufs25/misc.c
   201 fs/aufs25/misc.h
   243 fs/aufs25/module.c
    78 fs/aufs25/module.h
  1489 fs/aufs25/opts.c
   245 fs/aufs25/opts.h
   349 fs/aufs25/plink.c
   111 fs/aufs25/robr.c
   268 fs/aufs25/sbinfo.c
   906 fs/aufs25/super.c
   409 fs/aufs25/super.h
   107 fs/aufs25/sysaufs.c
   150 fs/aufs25/sysaufs.h
   498 fs/aufs25/sysfs.c
   112 fs/aufs25/sysrq.c
   963 fs/aufs25/vdir.c
   653 fs/aufs25/vfsub.c
   493 fs/aufs25/vfsub.h
   69...
To: <hooanon05@...>
Cc: <linux-fsdevel@...>
Date: Monday, May 12, 2008 - 12:54 am

We'd be interested in hearing if aufs addresses any of the shortcomings

Nobody is set up to handle cvs, sorry.  It would be worth the time to
migrate it to git.

--
To: Andrew Morton <akpm@...>
Cc: <linux-fsdevel@...>, <linux-kernel@...>
Date: Friday, May 16, 2008 - 10:20 am

Ok, I will send these patches to two MLs.
I'd ask linux-kernel people to read previous posts too.
http://marc.info/?l=linux-fsdevel&m=120716468102834&w=2
http://marc.info/?l=linux-fsdevel&m=120720593114664&w=2

 Documentation/filesystems/aufs/README      |  374 +++++++
 Documentation/filesystems/aufs/aufs.5      | 1608 ++++++++++++++++++++++++++++
 Documentation/filesystems/aufs/aulchown.c  |   29 +
 Documentation/filesystems/aufs/auplink     |  170 +++
 Documentation/filesystems/aufs/mount.aufs  |  205 ++++
 Documentation/filesystems/aufs/umount.aufs |   33 +
 fs/Kconfig                                 |    2 +
 fs/Makefile                                |    1 +
 fs/aufs/Kconfig                            |  203 ++++
 fs/aufs/Makefile                           |   57 +
 fs/aufs/aufs.h                             |   56 +
 fs/aufs/br_fuse.c                          |  109 ++
 fs/aufs/br_nfs.c                           |  391 +++++++
 fs/aufs/br_xfs.c                           |   69 ++
 fs/aufs/branch.c                           |  933 ++++++++++++++++
 fs/aufs/branch.h                           |  345 ++++++
 fs/aufs/cpup.c                             | 1043 ++++++++++++++++++
 fs/aufs/cpup.h                             |   82 ++
 fs/aufs/dcsub.c                            |  246 +++++
 fs/aufs/dcsub.h                            |   54 +
 fs/aufs/debug.c                            |  485 +++++++++
 fs/aufs/debug.h                            |  210 ++++
 fs/aufs/dentry.c                           | 1020 ++++++++++++++++++
 fs/aufs/dentry.h                           |  384 +++++++
 fs/aufs/dinfo.c                            |  425 ++++++++
 fs/aufs/dir.c                              |  573 ++++++++++
 fs/aufs/dir.h                              |  146 +++
 fs/aufs/dlgt.c                             |  113 ++
 fs/aufs/export.c                           |  597 +++++++++++
 fs/aufs/f_op.c                             |  665 ++++++++++++
 fs/aufs/file.c            ...
To: Andrew Morton <akpm@...>, <linux-fsdevel@...>, <linux-kernel@...>
Date: Friday, May 16, 2008 - 10:36 am

These patches are against linux-trees.git v2.6.25-mm1.


Junjiro Okajima
--
To: <linux-fsdevel@...>
Date: Wednesday, April 2, 2008 - 11:11 am

Hi all.

I am happy to see the AuFS author on this list, and I hope there will be some people who review the design and post their own comments and ideas, in order to make AuFS even better. :) I have been an AuFS user for a long time and what I really appreciate (from the user's point of view) is the following:

- AuFS supports writable branch balancing. That means you can set up several partitions for writing and AuFS will split all new/modified files between them, based on free disk space, existence of the parent directory, randomly, or combinations.

- AuFS supports a huge number of branches. I'm currently using hundreds of branches with just a small slowdown (which is to be expected).

- AuFS provides a list of branches through /sys, which doesn't have the limitation that /proc/mounts has. For that reason, it works correctly even with thousands of branches (while so many branches would break /proc/mounts altogether).

- AuFS implements the 'rr' branch mode, which means 'really-readonly'. This is really useful, particularly for ISO images or SquashFS filesystems as a branch, as AuFS doesn't need to re-lookup those filesystems. (You know, a readonly branch 'ro' can be modified from another place, e.g. the network, so a 'direct branch access' can occur even for read-only directories, and AuFS handles it correctly.)

- Last, but not least, AuFS is really stable in real-world situations. I used unionfs in the past, but my second name for it was 'NULL POINTER DEREFERENCE'. I can see those errors still happening in the latest unionfs as well, last one I've found is from 27th of May 2008 ... BUG: unable to handle kernel NULL pointer dereference. ... I have absolutely no idea what that means, but the same errors have kept appearing in unionfs for years. You won't see anything like that in AuFS. Guess why knoppix and other projects switched to it :)

That's all from me :)
Thanks

Tomas M
slax.org

--
To: Tomas M <tomas@...>
Cc: <linux-fsdevel@...>
Date: Thursday, April 3, 2008 - 2:56 am

Thanx Tomas,


Strictly speaking, it is not a limitation of /proc/mounts but of mount(8)
or /etc/mtab. I corrected the aufs document and myself a few weeks ago.


Junjiro Okajima
--