Re: [RFC PATCH 0/5] Shadow directories

Previous thread: none

Next thread: [0/3] Distributed storage. Mirror algo extension for automatic recovery. by Evgeniy Polyakov on Thursday, October 18, 2007 - 3:17 pm. (3 messages)
To: <linux-kernel@...>
Cc: <linux-fsdevel@...>
Date: Thursday, October 18, 2007 - 11:21 am

Hello,

Let's say we have an archive file "hello.zip" containing the source code of a
hello world program. We want to do this:
cat hello.zip^/hello.c
gcc hello.zip^/hello.c -o hello
etc.

The '^' is an escape character and it tells the computer to treat the file as a directory.
[Note: We can't do "cat hello.zip/hello.c" because of http://lwn.net/Articles/100148/ ]
The kernel patch only redirects the request to another directory (the "shadow
directory"), where a FUSE server must be mounted. The decompression of archives
is handled entirely in user space. More information can be found in the
documentation patch of this series.
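At some point a user-space server has to split such a name into the archive and the path inside it. Below is a minimal sketch of that step. It assumes the escape character only counts when it ends a path component, and that only the first escape matters (nested archives are not handled). The helper name split_escaped_path is hypothetical; it is not taken from the patches, RheaVFS or AVFS.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "hello.zip^/hello.c" at the first escape
 * character that ends a path component. On success, copies the part
 * before the escape ("hello.zip") into `outer` and the part after "^/"
 * ("hello.c") into `inner`, then returns 1. Returns 0 if there is no
 * escaped component or a buffer is too small.
 */
static int split_escaped_path(const char *path, char esc,
                              char *outer, size_t outer_len,
                              char *inner, size_t inner_len)
{
    for (const char *p = path; *p != '\0'; p++) {
        /* The escape only counts at the end of a component: "^/" or "^\0". */
        if (*p == esc && (p[1] == '/' || p[1] == '\0')) {
            size_t n = (size_t)(p - path);
            const char *rest = (p[1] == '/') ? p + 2 : p + 1;
            size_t m = strlen(rest);

            if (n == 0 || n >= outer_len || m >= inner_len)
                return 0;
            memcpy(outer, path, n);
            outer[n] = '\0';
            memcpy(inner, rest, m + 1); /* includes the terminating NUL */
            return 1;
        }
    }
    return 0;
}
```

With esc = '^', "hello.zip^/hello.c" yields "hello.zip" and "hello.c". The plain "hello.zip/hello.c" and a name with '^' in the middle of a component, such as "a^b/c", are left alone.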

The shadow directories are used in the RheaVFS project [ http://rheavfs.sourceforge.net/ ],
and they can also be used with the original AVFS [ http://www.inf.bme.hu/~mszeredi/avfs/ ].

The patches are against vanilla 2.6.23.
This is my first bigger contribution to the kernel so please be gentle ;-)

Jara

--
"Elves and Dragons!" I says to him. "Cabbages and potatoes are better
for you and me." -- J. R. R. Tolkien
-

To: <jaroslav.sykora@...>
Cc: <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, October 18, 2007 - 12:05 pm

Too bad, since ^ is a valid character in a *file*name. Everything is, with
the exception of '\0' and '/'. At the end of the day, there are no control
characters you could use.
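Jan's point is easy to confirm from user space. The kernel accepts a name ending in '^' like any other, so a real file called "hello.zip^" and the escaped form of "hello.zip" cannot be told apart by name alone. Below is a small check that uses only POSIX calls. The function name is made up for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a regular file literally named "hello.zip^" inside a fresh
 * temporary directory, check that it exists as a regular file, then
 * clean up. Returns 1 if the kernel accepted the name, 0 otherwise.
 */
static int caret_is_plain_filename_char(void)
{
    char dir[] = "/tmp/caretXXXXXX";
    char path[64];
    struct stat st;
    int ok = 0;

    if (mkdtemp(dir) == NULL)
        return 0;
    snprintf(path, sizeof path, "%s/hello.zip^", dir);

    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd >= 0) {
        close(fd);
        ok = (stat(path, &st) == 0 && S_ISREG(st.st_mode));
        unlink(path);
    }
    rmdir(dir);
    return ok;
}
```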

But what you could do is: write a FUSE fs that mirrors the lower content
(lofs/fuseloop/however it was named) and expands .zip files as
directories are readdir'ed or the zip files stat'ed. That saves us
from cluttering up the Linux VFS with such stuff.
-

To: Jan Engelhardt <jengelh@...>
Cc: <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, October 18, 2007 - 1:07 pm

Yes, that's exactly what RheaVFS and AVFS do, except that they both use an escape
character because:
1. without it some programs may break [ http://lwn.net/Articles/100148/ ]
2. it's very useful to pass additional parameters after the escape char to the server.

We can start VFS servers (mentioned above) and chroot the whole user session into
the mount directory of the server. It works but it's very slow, practically unusable.
So both servers need some kind of VFS redirector. In the past there were many
different approaches -- LD_PRELOAD hack, CodaFS hack, NFS hack (?), proof-of-concept
kernel hacks (project podfuk) etc.

If anybody can think of any other solution to the "redirector problem", possibly
even a non-kernel-based one, let me know and I'd be glad :-)

--
I find television very educating. Every time somebody turns on the set,
I go into the other room and read a book.
-

To: <jaroslav.sykora@...>
Cc: <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, October 18, 2007 - 1:10 pm

Sounds like a program bug, since NTFS-3G is proof of concept that FUSE
-

To: Jan Engelhardt <jengelh@...>
Cc: <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, October 18, 2007 - 4:10 pm

Good point, I'll look into it.

A minor implementation problem with a chrooted environment is that the FUSE VFS server
must be run with root privileges to allow setuid programs on the mounted filesystems.
But it's certainly doable.
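You can check whether setuid binaries can work on a given mount with statvfs(2). A filesystem mounted with nosuid reports ST_NOSUID in f_flag. fusermount normally adds nosuid (and nodev) to mounts made by unprivileged users, which is why the server needs root. The sketch below uses a hypothetical helper name.

```c
#include <sys/statvfs.h>

/*
 * Returns 1 if the filesystem holding `path` is mounted nosuid
 * (setuid/setgid bits are ignored on exec), 0 if it is not,
 * and -1 if statvfs() fails.
 */
static int mount_is_nosuid(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;
    return (sv.f_flag & ST_NOSUID) ? 1 : 0;
}
```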

--
"Elves and Dragons!" I says to him. "Cabbages and potatoes are better
for you and me." -- J. R. R. Tolkien
-

To: <jaroslav.sykora@...>
Cc: <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, October 18, 2007 - 4:12 pm

You would not want user-supplied filesystems to carry SUID bits...
-

To: <jaroslav.sykora@...>
Cc: <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, October 18, 2007 - 12:30 pm

Wouldn't you do this as a user space filesystem?
-

To: <jaroslav.sykora@...>
Cc: <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, October 18, 2007 - 12:33 pm

Which is what you were saying.

*SMACK* I so stupid.
-

To: <jaroslav.sykora@...>
Cc: <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, October 18, 2007 - 12:53 pm

On third thoughts, what's the reason for this?
-

To: <linux-kernel@...>
Cc: <linux-fsdevel@...>
Date: Thursday, October 18, 2007 - 11:28 am

Documentation of the shadow directories.

Signed-off-by: Jaroslav Sykora <jaroslav.sykora@gmail.com>

Documentation/filesystems/shadow-directories.txt | 177 +++++++++++++
1 file changed, 177 insertions(+)

--- /dev/null 2007-10-18 09:34:42.624413454 +0200
+++ new/Documentation/filesystems/shadow-directories.txt 2007-10-18 17:03:06.000000000 +0200
@@ -0,0 +1,177 @@
+Shadow directories
+==================
+
+The Goal
+--------
+
+Let's say we have an archive file "hello.zip" with a hello world program source
+code. We want to do this:
+ cat hello.zip^/hello.c
+
+The '^' is an escape character and it tells the computer to treat the file
+as a directory.
+[Note: We can't do "cat hello.zip/hello.c" because of http://lwn.net/Articles/100148/ ]
+
+One way to implement the scenario above is to create a FUSE VFS server and chroot
+everything into it. This works, but poorly: performance is low and some things,
+like setuid binaries, fundamentally cannot work unless the server runs with root
+privileges.
+
+
+The Principle
+-------------
+
+For every process we define two VFS trees:
+(1) the standard system-wide tree, managed by mount/umount, implemented by native
+ filesystems like ext3, reiserfs, etc.;
+(2) a per-process shadow tree, usually implemented by FUSE.
+
+The main change is in the VFS lookup code: a file name is first looked up in the
+standard tree, and if it is found, we're done. If not, the name is transparently
+looked up in the shadow tree.
+
+[Picture: A standard and a shadow tree. The shadow tree will in fact be mounted
+ at some point in the standard tree, e.g. "/home/jara/.vfs/mnt". ]
+
+       Standard                           Shadow
+          "/"                               "/"
+    ,------|-------,                   ,-----|------,
+   bin    home    usr                 bin   home   usr
+           |                                 |
+          jara                              jara
+      ,----|-----,                      ,----|-----,-------------,
+ tmp h...
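The lookup rule described in The Principle can be modelled in user space with two stat() calls. Below is a minimal sketch. It assumes both trees are reachable as ordinary directories (the shadow one being the FUSE mount point), and that only ENOENT from the standard tree triggers the fallback. two_stage_lookup and enum tree are hypothetical names, not code from the patch.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Which tree satisfied the lookup (hypothetical). */
enum tree { TREE_NONE, TREE_STD, TREE_SHADOW };

/*
 * User-space model of the two-stage lookup: try `name` under the
 * standard root first; only if that fails with ENOENT, retry under the
 * shadow root. Other errors (e.g. EACCES) are reported in *err and do
 * not fall through to the shadow tree.
 */
static enum tree two_stage_lookup(const char *std_root, const char *shdw_root,
                                  const char *name, struct stat *st, int *err)
{
    char path[4096];

    /* Stage 1: standard tree. */
    snprintf(path, sizeof path, "%s/%s", std_root, name);
    if (stat(path, st) == 0)
        return TREE_STD;
    if (errno != ENOENT) {
        *err = errno;
        return TREE_NONE;
    }

    /* Stage 2: shadow tree, only for names missing from the standard tree. */
    snprintf(path, sizeof path, "%s/%s", shdw_root, name);
    if (stat(path, st) == 0)
        return TREE_SHADOW;
    *err = errno;
    return TREE_NONE;
}
```

Falling back only on ENOENT keeps permission errors in the standard tree from being silently masked by the shadow tree.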

To: <linux-kernel@...>
Cc: <linux-fsdevel@...>
Date: Thursday, October 18, 2007 - 11:26 am

Procfs interface: /proc/<pid>/status, /proc/<pid>/{root-shdw, cwd-shdw}.
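From user space, the new entries would be read like the existing cwd and root symlinks. The sketch below uses a hypothetical helper, read_proc_link. On a kernel without this patch, reading root-shdw or cwd-shdw simply fails. Judging by proc_shdwcwd_link() below, the lookup should also fail with ENOENT when the process has no shadow cwd set.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the target of /proc/<pid>/<entry> into buf (NUL-terminated).
 * With this patch applied, entry would be "root-shdw" or "cwd-shdw".
 * Returns the length of the target, or -1 on error. A result of
 * len - 1 may mean the target was truncated.
 */
static ssize_t read_proc_link(pid_t pid, const char *entry,
                              char *buf, size_t len)
{
    char path[64];
    ssize_t n;

    if (len == 0)
        return -1;
    snprintf(path, sizeof path, "/proc/%ld/%s", (long)pid, entry);
    n = readlink(path, buf, len - 1); /* readlink() does not NUL-terminate */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

For example, read_proc_link(getpid(), "cwd-shdw", buf, sizeof buf) would return the current shadow directory of the calling process.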

Signed-off-by: Jaroslav Sykora <jaroslav.sykora@gmail.com>

fs/proc/array.c | 23 +++++++++++++++++++
fs/proc/base.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 76 insertions(+)

--- orig/fs/proc/base.c 2007-10-07 19:00:20.000000000 +0200
+++ new/fs/proc/base.c 2007-10-07 13:39:08.000000000 +0200
@@ -171,6 +171,32 @@ static int proc_cwd_link(struct inode *i
return result;
}

+static int proc_shdwcwd_link(struct inode *inode, struct dentry **dentry,
+ struct vfsmount **mnt)
+{
+ struct task_struct *task = get_proc_task(inode);
+ struct fs_struct *fs = NULL;
+ int result = -ENOENT;
+
+ if (task) {
+ fs = get_fs_struct(task);
+ put_task_struct(task);
+ }
+ if (fs) {
+ read_lock(&fs->lock);
+ *dentry = dget(fs->shdwpwd);
+ if (fs->shdwpwd)
+ *mnt = mntget(fs->shdwpwdmnt);
+ else
+ *mnt = NULL;
+ read_unlock(&fs->lock);
+ if (*dentry)
+ result = 0;
+ put_fs_struct(fs);
+ }
+ return result;
+}
+
static int proc_root_link(struct inode *inode, struct dentry **dentry, struct vfsmount **mnt)
{
struct task_struct *task = get_proc_task(inode);
@@ -192,6 +218,29 @@ static int proc_root_link(struct inode *
return result;
}

+static int proc_shdwroot_link(struct inode *inode, struct dentry **dentry,
+ struct vfsmount **mnt)
+{
+ struct task_struct *task = get_proc_task(inode);
+ struct fs_struct *fs = NULL;
+ int result = -ENOENT;
+
+ if (task) {
+ fs = get_fs_struct(task);
+ put_task_struct(task);
+ }
+ if (fs) {
+ read_lock(&fs->lock);
+ *mnt = mntget(fs->shdwrootmnt);
+ *dentry = dget(fs->shdwroot);
+ read_unlock(&fs->lock);
+ if (*dentry)
+ result = 0;
+ put_fs_struct(fs);
+ }
+ return result;
+}
+
#define MAY_PTRACE(task) \
(task == current || \
(task->parent == current && \
@@ -2094,6 +2143,8 @@ static const struct pid_...

To: <linux-kernel@...>
Cc: <linux-fsdevel@...>
Date: Thursday, October 18, 2007 - 11:25 am

sys_chdir and sys_fchdir changes.

Signed-off-by: Jaroslav Sykora <jaroslav.sykora@gmail.com>

fs/open.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 73 insertions(+), 6 deletions(-)

--- orig/fs/open.c 2007-10-07 19:00:19.000000000 +0200
+++ new/fs/open.c 2007-10-16 21:04:56.000000000 +0200
@@ -476,13 +476,51 @@ asmlinkage long sys_access(const char __
return sys_faccessat(AT_FDCWD, filename, mode);
}

+static inline int read_fs_flags(void)
+{
+ int res;
+ read_lock(&current->fs->lock);
+ res = current->fs->flags;
+ read_unlock(&current->fs->lock);
+ return res;
+}
+
+void set_fs_shdwpwd(struct fs_struct *fs,
+ struct vfsmount *mnt, struct dentry *dentry)
+{
+ struct dentry *old_dentry;
+ struct vfsmount *old_mnt;
+
+ BUG_ON(dentry != NULL && mnt == NULL);
+ write_lock(&fs->lock);
+ /* set shadow pwd */
+ old_dentry = fs->shdwpwd;
+ old_mnt = fs->shdwpwdmnt;
+ fs->shdwpwd = dget(dentry);
+ if (dentry)
+ fs->shdwpwdmnt = mntget(mnt);
+ else
+ /* PTR_ERR flag */
+ fs->shdwpwdmnt = mnt;
+ write_unlock(&fs->lock);
+
+ if (old_dentry) {
+ mntput(old_mnt);
+ dput(old_dentry);
+ }
+}
+
asmlinkage long sys_chdir(const char __user * filename)
{
struct nameidata nd;
- int error;
+ char *tmp = getname(filename);
+ int error = PTR_ERR(tmp);
+
+ if (IS_ERR(tmp))
+ goto out_badname;

- error = __user_walk(filename,
- LOOKUP_FOLLOW|LOOKUP_DIRECTORY|LOOKUP_CHDIR, &nd);
+ error = path_lookup(tmp, LOOKUP_FOLLOW | LOOKUP_DIRECTORY
+ | LOOKUP_CHDIR, &nd);
if (error)
goto out;

@@ -490,11 +528,23 @@ asmlinkage long sys_chdir(const char __u
if (error)
goto dput_and_out;

- set_fs_pwd(current->fs, nd.mnt, nd.dentry);
+ if (!(read_fs_flags() & SHDW_ENABLED))
+ goto set_std;

+ if (!(nd.flags & LOOKUP_INSHDW))
+ set_fs_shdwpwd(current->fs, NULL, NULL);
+ else
+ /* shadow == std */
+ set_fs_shdwpwd(...

To: <linux-kernel@...>
Cc: <linux-fsdevel@...>
Date: Thursday, October 18, 2007 - 11:23 am

Implements the two-stage lookup with escape character filtering,
and adds the system calls for i386.
Changes the lookup path, namely do_path_lookup(). This function is split
into path_lookup_norm(), which performs the standard name lookup,
and path_lookup_shdw(), which performs the name lookup in the associated shadow directory.
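In the second stage, an absolute name has to start from the shadow root rather than the real one. The fs_root()/fs_rootmnt() helpers below make that choice via use_shadow(). The user-space model below uses made-up flag values (the real SHDW_ENABLED and LOOKUP_INSHDW come from the header patch of this series), and strings stand in for the dentry/vfsmount pairs.

```c
#include <stdbool.h>

/* Hypothetical flag values; the real ones are defined elsewhere in the series. */
#define SHDW_ENABLED  0x1u
#define LOOKUP_INSHDW 0x2u

struct root_pair {
    const char *root;     /* standard root, like fs->root */
    const char *shdwroot; /* shadow root, like fs->shdwroot */
};

/* Mirrors use_shadow(): shadow dirs must be enabled for the process
 * AND the current lookup must be in its shadow stage. */
static bool use_shadow(unsigned fs_flags, unsigned nd_flags)
{
    return (fs_flags & SHDW_ENABLED) && (nd_flags & LOOKUP_INSHDW);
}

/* Mirrors fs_root(): pick the root an absolute path starts from. */
static const char *pick_root(const struct root_pair *rp,
                             unsigned fs_flags, unsigned nd_flags)
{
    return use_shadow(fs_flags, nd_flags) ? rp->shdwroot : rp->root;
}
```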

Signed-off-by: Jaroslav Sykora <jaroslav.sykora@gmail.com>

arch/i386/kernel/syscall_table.S | 6
fs/exec.c | 4
fs/file_table.c | 19
fs/namei.c | 610 ++++++++++++++++++++++++++++-
fs/namespace.c | 13
include/linux/syscalls.h | 6
kernel/exit.c | 8
kernel/fork.c | 20
8 files changed, 672 insertions(+), 14 deletions(-)

--- orig/fs/namei.c 2007-10-07 19:00:19.000000000 +0200
+++ new/fs/namei.c 2007-10-18 15:35:54.000000000 +0200
@@ -31,6 +31,7 @@
#include <linux/file.h>
#include <linux/fcntl.h>
#include <linux/namei.h>
+#include <linux/ptrace.h>
#include <asm/namei.h>
#include <asm/uaccess.h>

@@ -515,6 +516,25 @@ static struct dentry * real_lookup(struc
return result;
}

+static inline int use_shadow(struct fs_struct *fs, struct nameidata *nd)
+{
+ /* assert: fs->lock held */
+ return (fs->flags & SHDW_ENABLED) && (nd->flags & LOOKUP_INSHDW);
+}
+
+static inline struct dentry *fs_root(struct fs_struct *fs, struct nameidata *nd)
+{
+ /* assert: current->fs->lock held */
+ return (use_shadow(fs, nd)) ? fs->shdwroot : fs->root;
+}
+
+static inline struct vfsmount *fs_rootmnt(struct fs_struct *fs,
+ struct nameidata *nd)
+{
+ /* assert: current->fs->lock held */
+ return (use_shadow(fs, nd)) ? fs->shdwrootmnt : fs->rootmnt;
+}
+
static int __emul_lookup_dentry(const char *, struct nameidata *);

/* SMP-safe */
@@ -532,8 +552,8 @@ walk_init_root(const char *name, struct
return 0;
read_lock(&fs->...

To: <linux-kernel@...>
Cc: <linux-fsdevel@...>
Date: Thursday, October 18, 2007 - 11:22 am

Header file changes for shadow directories.
Adds pointers to the shadow directories to struct file and struct fs_struct.
Defines internal lookup flags and syscall flags.
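The comment on the new struct file fields below encodes a small state table. The function below restates it, under two assumptions: "delayed" means the shadow directory has not been resolved yet, and "invalid" means f_shdwmnt carries an error marker like the PTR_ERR flag in set_fs_shdwpwd(). classify_shdw and the enum are hypothetical.

```c
#include <stddef.h>

/* States of an open file's cached shadow directory, per the
 * f_shdw / f_shdwmnt table in the struct file hunk. */
enum shdw_state { SHDW_DELAYED, SHDW_INVALID, SHDW_BUG, SHDW_VALID };

/* Classify a (dentry, vfsmount) pointer pair. Plain void pointers
 * stand in for struct dentry * and struct vfsmount *. */
static enum shdw_state classify_shdw(const void *f_shdw, const void *f_shdwmnt)
{
    if (f_shdw == NULL)
        return f_shdwmnt == NULL ? SHDW_DELAYED : SHDW_INVALID;
    return f_shdwmnt == NULL ? SHDW_BUG : SHDW_VALID;
}
```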

Signed-off-by: Jaroslav Sykora <jaroslav.sykora@gmail.com>

include/linux/file.h | 2 ++
include/linux/fs.h | 18 ++++++++++++++++++
include/linux/fs_struct.h | 25 +++++++++++++++++++++++++
include/linux/namei.h | 16 ++++++++++++++++
4 files changed, 61 insertions(+)

--- orig/include/linux/fs.h 2007-10-07 19:00:24.000000000 +0200
+++ new/include/linux/fs.h 2007-10-07 13:39:08.000000000 +0200
@@ -266,6 +266,14 @@ extern int dir_notify_enable;
#define SYNC_FILE_RANGE_WRITE 2
#define SYNC_FILE_RANGE_WAIT_AFTER 4

+/* sys_setshdwinfo(), sys_getshdwinfo(): */
+#define FSI_SHDW_ENABLE 1 /* enable shadow directories */
+#define FSI_SHDW_ESC_EN 2 /* enable use of escape character */
+#define FSI_SHDW_ESC_CHAR 3 /* specify escape character */
+/* sys_setshdwpath */
+#define SHDW_FD_ROOT -1 /* pseudo FD for root shadow dir */
+#define SHDW_FD_PWD -2 /* pseudo FD for pwd shadow dir */
+
#ifdef __KERNEL__

#include <linux/linkage.h>
@@ -752,6 +760,16 @@ struct file {
spinlock_t f_ep_lock;
#endif /* #ifdef CONFIG_EPOLL */
struct address_space *f_mapping;
+
+ /* the following fields are protected by f_owner.lock */
+ /* | f_shdw | f_shdwmnt | result
+ +----------+-------------+------------
+ | NULL | NULL | delayed
+ | NULL | !NULL | invalid
+ | !NULL | NULL | BUG
+ | !NULL | !NULL | valid */
+ struct dentry *f_shdw;
+ struct vfsmount *f_shdwmnt;
};
extern spinlock_t files_lock;
#define file_list_lock() spin_lock(&files_lock);
--- orig/include/linux/fs_struct.h 2007-07-09 01:32:17.000000000 +0200
+++ new/include/linux/fs_struct.h 2007-10-07 13:39:08.000000000 +0200
@@ -10,8 +10,31 @@ struct fs_struct {
int umask;
struct dentry * root, * pwd, * altroot;
struct vfsmount * rootmnt, * pwdm...
