Hello fs-developers,

I am developing a stackable unification filesystem which unifies several directories and provides a single merged directory. I guess most people already know what it is. When users access a file, the access is redirected to the real file on the member filesystem. A member filesystem is called a 'lower filesystem' or 'branch' and has a mode, either 'readonly' or 'readwrite'. File deletion is handled as a 'whiteout' on the upper writable branch.

On this ML, there have been discussions about UnionMount (Jan Blunck and Bharata B Rao) and Unionfs (Erez Zadok). They took different approaches to implementing the merged view. The former tries putting it into VFS, and the latter implements it as a separate filesystem. (If I misunderstand these implementations, please let me know and I shall correct it, because it is a long time since I last read their source files.)

UnionMount's approach can be kept small, but it may be hard to share branches between several UnionMounts, since the whiteout is implemented in the inode on the branch filesystem and is always shared. According to Bharata's recent post, readdir does not seem to be finished yet.

Unionfs has a longer history. When I got the idea of a stacking filesystem (Aug 2005), it already existed. It has virtual super_block, inode, dentry and file objects, and each of them has an array pointing to the lower objects of the same kind. After contributing many patches to Unionfs, I restarted my project AUFS (Jun 2006).

In AUFS, the structure of the filesystem is similar to Unionfs, but I implemented my own ideas, approaches and enhancements in it. Here are some of them; the intention of this post is to get some initial feedback about its design. You can see the actual details, documents, CVS logs, and how people are using it at <http://aufs.sf.net>. Kindly review and let me know your comments.

o file mapping -- mmap and sharing pages ------...
o readdir -- virtual dir block on memory (VDIR)
----------------------------------------------------------------------
This is an approach I posted a few months ago in reply to a UnionMount post. It constructs a virtual dir block in memory. For readdir, aufs calls vfs_readdir() internally for each lower dir, merges their entries while eliminating the whiteout-ed ones, and attaches the result to the file (dir) object. So the file object keeps its entry list until it is closed. The entry list is rebuilt when the file position is zero and the list has become stale. This decision is made by aufs automatically.

It may consume rather large amounts of memory and CPU cycles. To reduce the number of memory allocations, the implementation became rather tricky.

Some people may say this could be a security hole or allow a DoS attack, since an opened dir (file object) that has been readdir-ed once holds its entry list and puts pressure on system memory. But I'd say it is similar to files under /proc or /sys. The virtual files on procfs and sysfs also hold a memory page (generally) while they are open. When an idea to reduce memory for them is introduced, it will be applied to aufs too.

o policies for selecting one among multiple writable branches, parent-dir, round-robin and most-free-space
----------------------------------------------------------------------
When there is more than one writable branch, aufs has to decide the target branch for file creation or copy-up. By default, the highest writable branch which has the parent (or ancestor) dir of the target file is chosen (top-down-parent policy). On the user's request, aufs can use other policies to select the writable branch: round-robin and most-free-space policies for file creation, and top-down-parent, bottom-up-parent and bottom-up policies for copy-up.

As expected, the round-robin policy selects branches in circular order. When you have two writable branches and create 10 new files, 5 files will be created on each branch. The mkdir(2) system call is an exception.
When you create ...
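
The VDIR merge described above (walk each lower dir from the top, drop whiteout-ed names, keep the first occurrence of each name) can be illustrated with a small user-space sketch. This is not the aufs implementation: it is plain Python using os.listdir(), the function name merge_branches is hypothetical, and the ".wh." whiteout prefix is an assumption made for illustration. It also does not model how the file object caches the list or decides that it is stale.

```python
import os

# Assumed whiteout naming convention, for illustration only.
WH_PREFIX = ".wh."


def merge_branches(branch_dirs):
    """Merge directory listings, highest branch first.

    A name is visible if no higher branch already provided it and no
    higher branch contains a whiteout for it.
    """
    seen = set()       # names already emitted by a higher branch
    whiteouts = set()  # names hidden by a whiteout in a higher branch
    merged = []
    for branch in branch_dirs:
        names = sorted(os.listdir(branch))
        # Regular entries are hidden only by what higher branches said.
        for name in names:
            if name.startswith(WH_PREFIX):
                continue
            if name in seen or name in whiteouts:
                continue
            seen.add(name)
            merged.append(name)
        # Whiteouts found in this branch apply to the branches below it.
        for name in names:
            if name.startswith(WH_PREFIX):
                whiteouts.add(name[len(WH_PREFIX):])
    return merged
```

Calling merge_branches() with the writable branch first and the read-only branches after it returns the names a reader of the merged directory would see under these assumptions.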
:::
About a month ago I posted some of the ideas, designs and approaches which
are implemented in the AUFS stackable filesystem.
While I still plan to implement some more features, AUFS is currently in
good shape and has been used by many people for years.
Since I have received requests to submit AUFS into the mainline more
than once, I would now like to ask you to include AUFS in mainline.
But the source is large (see below).
Should I send all of these files to this ML, or ask you to download them
from CVS?
If AUFS were much smaller, I would send the files here without asking.
Junjiro Okajima
----------------------------------------------------------------------
$ wc -l fs/aufs25/*.[ch]
56 fs/aufs25/aufs.h
109 fs/aufs25/br_fuse.c
391 fs/aufs25/br_nfs.c
69 fs/aufs25/br_xfs.c
932 fs/aufs25/branch.c
345 fs/aufs25/branch.h
1043 fs/aufs25/cpup.c
82 fs/aufs25/cpup.h
246 fs/aufs25/dcsub.c
54 fs/aufs25/dcsub.h
485 fs/aufs25/debug.c
210 fs/aufs25/debug.h
1020 fs/aufs25/dentry.c
384 fs/aufs25/dentry.h
425 fs/aufs25/dinfo.c
573 fs/aufs25/dir.c
146 fs/aufs25/dir.h
113 fs/aufs25/dlgt.c
597 fs/aufs25/export.c
661 fs/aufs25/f_op.c
826 fs/aufs25/file.c
246 fs/aufs25/file.h
185 fs/aufs25/finfo.c
708 fs/aufs25/hin_or_dlgt.c
188 fs/aufs25/hinode.h
1114 fs/aufs25/hinotify.c
844 fs/aufs25/i_op.c
828 fs/aufs25/i_op_add.c
582 fs/aufs25/i_op_del.c
832 fs/aufs25/i_op_ren.c
290 fs/aufs25/iinfo.c
425 fs/aufs25/inode.c
336 fs/aufs25/inode.h
307 fs/aufs25/misc.c
201 fs/aufs25/misc.h
243 fs/aufs25/module.c
78 fs/aufs25/module.h
1489 fs/aufs25/opts.c
245 fs/aufs25/opts.h
349 fs/aufs25/plink.c
111 fs/aufs25/robr.c
268 fs/aufs25/sbinfo.c
906 fs/aufs25/super.c
409 fs/aufs25/super.h
107 fs/aufs25/sysaufs.c
150 fs/aufs25/sysaufs.h
498 fs/aufs25/sysfs.c
112 fs/aufs25/sysrq.c
963 fs/aufs25/vdir.c
653 fs/aufs25/vfsub.c
493 fs/aufs25/vfsub.h
69...

We'd be interested in hearing if aufs addresses any of the shortcomings

Nobody is set up to handle cvs, sorry. It would be worth the time to migrate it to git.
--
Ok, I will send these patches to two MLs.
I'd ask linux-kernel people to read previous posts too.
http://marc.info/?l=linux-fsdevel&m=120716468102834&w=2
http://marc.info/?l=linux-fsdevel&m=120720593114664&w=2

 Documentation/filesystems/aufs/README | 374 +++++++
 Documentation/filesystems/aufs/aufs.5 | 1608 ++++++++++++++++++++++++++++
 Documentation/filesystems/aufs/aulchown.c | 29 +
 Documentation/filesystems/aufs/auplink | 170 +++
 Documentation/filesystems/aufs/mount.aufs | 205 ++++
 Documentation/filesystems/aufs/umount.aufs | 33 +
 fs/Kconfig | 2 +
 fs/Makefile | 1 +
 fs/aufs/Kconfig | 203 ++++
 fs/aufs/Makefile | 57 +
 fs/aufs/aufs.h | 56 +
 fs/aufs/br_fuse.c | 109 ++
 fs/aufs/br_nfs.c | 391 +++++++
 fs/aufs/br_xfs.c | 69 ++
 fs/aufs/branch.c | 933 ++++++++++++++++
 fs/aufs/branch.h | 345 ++++++
 fs/aufs/cpup.c | 1043 ++++++++++++++++++
 fs/aufs/cpup.h | 82 ++
 fs/aufs/dcsub.c | 246 +++++
 fs/aufs/dcsub.h | 54 +
 fs/aufs/debug.c | 485 +++++++++
 fs/aufs/debug.h | 210 ++++
 fs/aufs/dentry.c | 1020 ++++++++++++++++++
 fs/aufs/dentry.h | 384 +++++++
 fs/aufs/dinfo.c | 425 ++++++++
 fs/aufs/dir.c | 573 ++++++++++
 fs/aufs/dir.h | 146 +++
 fs/aufs/dlgt.c | 113 ++
 fs/aufs/export.c | 597 +++++++++++
 fs/aufs/f_op.c | 665 ++++++++++++
 fs/aufs/file.c ...
These patches are against linux-trees.git v2.6.25-mm1.

Junjiro Okajima
--
Hi all.

I am happy to see the AuFS author on this list, and I hope there will be some people who review the design and post their own comments and ideas, in order to make AuFS even better. :)

I have been an AuFS user for a long time, and what I really appreciate (from the user's point of view) is the following:

- AuFS supports writable branch balancing. That means you can set up several partitions for writing and AuFS will split all new/modified files between them, based on free disk space, existence of the parent directory, randomly, or combinations of these.

- AuFS supports a huge number of branches. I'm currently using hundreds of branches with only a small slowdown (which is to be expected).

- AuFS provides a list of branches through /sys, which doesn't have the limitation that /proc/mounts has. For that reason, it works correctly even with thousands of branches (while that many branches would break /proc/mounts entirely).

- AuFS implements the 'rr' branch mode, which means 'really-readonly'. This is really useful, particularly for ISO images or SquashFS filesystems as a branch, as AuFS doesn't need to re-lookup those filesystems. (You know, a readonly branch 'ro' can be modified from another place, e.g. the network, so a 'direct branch access' can occur even for read-only directories, and AuFS handles it correctly.)

- Last, but not least, AuFS is really stable in real-world situations. I used unionfs in the past, but my second name for it was 'NULL POINTER DEREFERENCE'. I can see those errors still happening in the latest unionfs as well; the last one I've found is from 27th of May 2008 ... BUG: unable to handle kernel NULL pointer dereference. ... I have absolutely no idea what that means, but the same errors have kept appearing in unionfs for years. You won't see anything like that in AuFS. Guess why knoppix and other projects switched to it :)

That's all from me :)

Thanks
Tomas M
slax.org
--
Thanx Tomas,

Strictly speaking, it is not a limitation of /proc/mounts but of mount(8) or /etc/mtab. I corrected the aufs document and myself a few weeks ago.

Junjiro Okajima
--
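
The writable-branch selection policies discussed in this thread (top-down-parent, round-robin, most-free-space) can be sketched in user space as follows. This is only an illustration under assumptions: the Branch record, its fields, and the function and class names are hypothetical and are not taken from the aufs source. Branches are listed highest first, and the mkdir(2) exception mentioned in the original post is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    # Hypothetical record describing one branch, highest branch listed first.
    name: str
    writable: bool
    free_bytes: int
    has_parent: bool  # does the target's parent dir exist on this branch?


def top_down_parent(branches):
    """Highest writable branch that already has the parent directory."""
    for br in branches:
        if br.writable and br.has_parent:
            return br
    return None


def most_free_space(branches):
    """Writable branch with the most free space, or None if none is writable."""
    writable = [br for br in branches if br.writable]
    return max(writable, key=lambda br: br.free_bytes) if writable else None


class RoundRobin:
    """Cycle through the writable branches, one per new file."""

    def __init__(self):
        self._next = 0

    def pick(self, branches):
        writable = [br for br in branches if br.writable]
        if not writable:
            return None
        br = writable[self._next % len(writable)]
        self._next += 1
        return br
```

With two writable branches, ten calls to RoundRobin.pick() alternate between them, so each branch receives five files, matching the behaviour described in the original post.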
