Jaroslav Sykora posted a series of five patches to handle the kernel portion of what he described as "shadow directories", providing an example which utilized FUSE to access the contents of a compressed file from the command line. His first example was cat hello.zip^/hello.c, about which he explained, "the '^' is an escape character and it tells the computer to treat the file as a directory. The kernel patch implements only a redirection of the request to another directory ('shadow directory') where a FUSE server must be mounted. The decompression of archives is entirely handled in the user space. More info can be found in the documentation patch in the series."
Numerous problems were raised. Jan Engelhardt noted, "too bad, since ^ is a valid character in a *file*name. Everything is, with the exception of '\0' and '/'. At the end of the day, there are no control characters you could use." Later in the thread, an LWN.net article from a couple of years ago was quoted: "another branch, led by Al Viro, worries about the locking considerations of this whole scheme. Linux, like most Unix systems, has never allowed hard links to directories for a number of reasons;" The article had been discussing Reiser4, which treats files as directories. In the current discussion, Al Viro added, "as for the posted patch, AFAICS it's FUBAR in handling of .. in such directories. Moreover, how are you going to keep that shadow tree in sync with the main one if somebody starts doing renames in the latter? Or mount --move, or..."
From: Jaroslav Sykora
Subject: [RFC PATCH 0/5] Shadow directories
Date: Oct 18, 8:21 am 2007
Hello,
Let's say we have an archive file "hello.zip" with a hello world program source
code. We want to do this:
cat hello.zip^/hello.c
gcc hello.zip^/hello.c -o hello
etc..
The '^' is an escape character and it tells the computer to treat the file as a directory.
[Note: We can't do "cat hello.zip/hello.c" because of http://lwn.net/Articles/100148/ ]
The kernel patch implements only a redirection of the request to another directory
("shadow directory") where a FUSE server must be mounted. The decompression of
archives is entirely handled in the user space. More info can be found in the documentation
patch in the series.
The shadow directories are used in RheaVFS project [ http://rheavfs.sourceforge.net/ ],
and it also can be used with the original AVFS [ http://www.inf.bme.hu/~mszeredi/avfs/ ].
The patches are against vanilla 2.6.23.
This is my first bigger contribution to the kernel so please be gentle ;-)
Jara
--
"Elves and Dragons!" I says to him. "Cabbages and potatoes are better
for you and me." -- J. R. R. Tolkien
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Jan Engelhardt
Subject: Re: [RFC PATCH 0/5] Shadow directories
Date: Oct 18, 9:05 am 2007
On Oct 18 2007 17:21, Jaroslav Sykora wrote:
>Hello,
>
>Let's say we have an archive file "hello.zip" with a hello world program source
>code. We want to do this:
> cat hello.zip^/hello.c
> gcc hello.zip^/hello.c -o hello
> etc..
>
>The '^' is an escape character and it tells the computer to treat the file as a directory.
Too bad, since ^ is a valid character in a *file*name. Everything is, with
the exception of '\0' and '/'. At the end of the day, there are no control
characters you could use.
But what you could do is: write a FUSE fs that mirrors the lower content
(lofs/fuseloop/however it was named) and expands .zip files as
directories are readdir'ed or the zip files stat'ed. That saves us
from cluttering up the Linux VFS with such stuff.
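Jan Engelhardt's point about '^' can be checked from user space. The following is a minimal sketch, assuming a POSIX file system; the function names are hypothetical and chosen only for illustration. It creates a regular file whose name ends in '^', then shows which of three characters cannot be part of a single file name.

```python
import os
import tempfile


def caret_is_ordinary(directory):
    """Create a regular file literally named 'hello.zip^' and confirm it exists."""
    path = os.path.join(directory, "hello.zip^")
    with open(path, "w") as handle:
        handle.write("just a file whose name ends in a caret\n")
    return os.path.isfile(path) and "hello.zip^" in os.listdir(directory)


def cannot_be_in_name(directory):
    """Return those of NUL, '/', '^' that cannot appear inside one file name."""
    rejected = []
    for char in ("\0", "/", "^"):
        candidate = os.path.join(directory, "a" + char + "b")
        try:
            with open(candidate, "w"):
                pass
        except ValueError:
            # Python refuses an embedded NUL before the kernel ever sees the path.
            rejected.append(char)
        except OSError:
            # '/' is parsed as a separator, so this looks for a directory "a"
            # that does not exist instead of creating a file named "a/b".
            rejected.append(char)
    return rejected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(caret_is_ordinary(scratch))
        print(cannot_be_in_name(scratch))
```

Because a file named hello.zip^ can already exist, any scheme that reserves the caret has to decide what happens when the escaped name collides with a real one.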
From: Jaroslav Sykora
Subject: Re: [RFC PATCH 0/5] Shadow directories
Date: Oct 18, 10:07 am 2007
On Thursday 18 of October 2007, Jan Engelhardt wrote:
>
> On Oct 18 2007 17:21, Jaroslav Sykora wrote:
> >Hello,
> >
> >Let's say we have an archive file "hello.zip" with a hello world program source
> >code. We want to do this:
> > cat hello.zip^/hello.c
> > gcc hello.zip^/hello.c -o hello
> > etc..
> >
> >The '^' is an escape character and it tells the computer to treat the file as a directory.
>
> Too bad, since ^ is a valid character in a *file*name. Everything is, with
> the exception of '\0' and '/'. At the end of the day, there are no control
> characters you could use.
>
> But what you could do is: write a FUSE fs that mirrors the lower content
> (lofs/fuseloop/however it was named) and expands .zip files as
> directories are readdir'ed or the zip files stat'ed. That saves us
> from cluttering up the Linux VFS with such stuff.
>
Yes, that's exactly what RheaVFS and AVFS do. Except that they both use an escape
character because:
1. without it some programs may break [ http://lwn.net/Articles/100148/ ]
2. it's very useful to pass additional parameters after the escape char to the server.
We can start VFS servers (mentioned above) and chroot the whole user session into
the mount directory of the server. It works but it's very slow, practically unusable.
So both servers need some kind of VFS redirector. In the past there were many
different approaches -- LD_PRELOAD hack, CodaFS hack, NFS hack (?), proof-of-concept
kernel hacks (project podfuk) etc.
If anybody can think of any other solution of the "redirector problem", possibly
even non-kernel based one, let me know and I'd be glad :-)
--
I find television very educating. Every time somebody turns on the set,
I go into the other room and read a book.
From: David Newall
Subject: Re: [RFC PATCH 0/5] Shadow directories
Date: Oct 18, 1:37 pm 2007
Jaroslav Sykora wrote:
> If anybody can think of any other solution of the "redirector problem", possibly
> even non-kernel based one, let me know and I'd be glad :-)
If I understand your problem, you wish to treat an archive file as if it
was a directory. Thus, in the ideal situation, you could do the following:
cat hello.zip/hello.c
gcc hello.zip/hello.c -o hello
etc..
Rather than complicate matters with a second tree, use FUSE with an
explicit directory. For example, ~/expand could be your shadow, thus to
compile hello.c from ~/hello.zip:
gcc ~/expand/hello.zip^/hello.c -o hello
I think no kernel change would be required.
I'm not keen on the caret. One of the early claims made in
http://lwn.net/Articles/100148/ is:
> Another branch, led by Al Viro, worries about the locking
> considerations of this whole scheme. Linux, like most Unix systems,
> has never allowed hard links to directories for a number of reasons;
The claim is wrong. UNIX systems have traditionally allowed the
superuser to create hard links to directories. See link(2) for 2.10BSD
<http://www.freebsd.org/cgi/man.cgi?query=link&sektion=2&manpath=2.10+BSD>.
Having got that wrong throws doubt on the argument; perhaps a path can
simultaneously be a file and a directory.
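The user-space half of this idea, where a path such as hello.zip^/hello.c is resolved without any kernel change, can be sketched in a few lines. This is only an illustration, not the RheaVFS or AVFS implementation: the helper names split_escaped and read_escaped are hypothetical, only zip archives are handled, and a real FUSE server would hook this logic into its lookup and read callbacks.

```python
import zipfile

ESCAPE = "^"  # the escape character used in the proposal


def split_escaped(path):
    """Split 'dir/hello.zip^/hello.c' into ('dir/hello.zip', 'hello.c').

    Returns (path, None) when no path component ends with the escape character.
    """
    parts = path.split("/")
    for index, part in enumerate(parts):
        if len(part) > 1 and part.endswith(ESCAPE):
            archive = "/".join(parts[:index] + [part[:-1]])
            member = "/".join(parts[index + 1:])
            return archive, member
    return path, None


def read_escaped(path):
    """Read a plain file, or a member of a zip archive named via the escape."""
    archive, member = split_escaped(path)
    if member is None:
        with open(archive, "rb") as handle:
            return handle.read()
    with zipfile.ZipFile(archive) as bundle:
        # Raises KeyError if the member does not exist in the archive.
        return bundle.read(member)
```

With a hello.zip in the current directory, read_escaped("hello.zip^/hello.c") returns the bytes of hello.c. Note that this sketch always treats a trailing caret as the escape, so a real file named hello.zip^ would become unreachable, which is the collision raised earlier in the thread.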
From: Al Viro
Subject: Re: [RFC PATCH 0/5] Shadow directories
Date: Oct 18, 1:47 pm 2007
On Fri, Oct 19, 2007 at 06:07:45AM +0930, David Newall wrote:
> >considerations of this whole scheme. Linux, like most Unix systems,
> >has never allowed hard links to directories for a number of reasons;
>
> The claim is wrong. UNIX systems have traditionally allowed the
> superuser to create hard links to directories. See link(2) for 2.10BSD
> <http://www.freebsd.org/cgi/man.cgi?query=link&sektion=2&manpath=2.10+BSD>.
> Having got that wrong throws doubt on the argument; perhaps a path can
> simultaneously be a file and a directory.
Learn to read. Linux has never allowed that. Most of the Unix systems
do not allow that. Original _did_ allow that, but at the cost of very
easily triggered fs corruption (and it didn't have things like rename(2) -
it _did_ have userland implementation, of course, in suid-root mv(1),
but that sucker had been extremely racy and could be easily used to
screw filesystem to hell and back; adding rename(2) to the set of primitives
combined with multiple links to directories leads to very nasty issues on
_any_ system).
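Al Viro's statement that Linux does not allow hard links to directories is easy to observe. The sketch below assumes a Linux system; the function name is hypothetical. It asks the kernel for a second link to a freshly created directory and returns the error number of the refusal.

```python
import os
import tempfile


def try_directory_hard_link():
    """Attempt link(2) on a directory; return the errno, or None if it worked."""
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "dir")
        os.mkdir(target)
        try:
            os.link(target, os.path.join(base, "alias"))
        except OSError as error:
            return error.errno
        return None


if __name__ == "__main__":
    # On Linux the attempt is expected to be refused, even for root.
    print(try_directory_hard_link())
```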
From: David Newall
Subject: Re: [RFC PATCH 0/5] Shadow directories
Date: Oct 18, 7:57 pm 2007
Al Viro wrote:
> On Fri, Oct 19, 2007 at 06:07:45AM +0930, David Newall wrote:
>
>>> considerations of this whole scheme. Linux, like most Unix systems,
>>> has never allowed hard links to directories for a number of reasons;
>>>
>> The claim is wrong. UNIX systems have traditionally allowed the
>> superuser to create hard links to directories. See link(2) for 2.10BSD
>> <http://www.freebsd.org/cgi/man.cgi?query=link&sektion=2&manpath=2.10+BSD>.
>> Having got that wrong throws doubt on the argument; perhaps a path can
>> simultaneously be a file and a directory.
>>
>
> Learn to read. Linux has never allowed that. Most of the Unix systems
> do not allow that.
I did read the claim and it is ambiguous, in that it can reasonably be
read to mean that most UNIX systems never allowed such links, which is
wrong. All UNIX systems allowed it until relatively recently.
From: Al Viro
Subject: Re: [RFC PATCH 0/5] Shadow directories
Date: Oct 18, 10:37 pm 2007
On Fri, Oct 19, 2007 at 12:27:16PM +0930, David Newall wrote:
> >Learn to read. Linux has never allowed that. Most of the Unix systems
> >do not allow that.
>
> I did read the claim and it is ambiguous, in that it can reasonably be
> read to mean that most UNIX systems never allowed such links, which is
> wrong. All UNIX systems allowed it until relatively recently.
FVO "relatively recently" exceeding a decade and half. In any case,
it's _trivial_ to get fs corruption on any system with such links -
play with rename() races a bit and you'll get it. And yes, it does
include 4.4BSD and quite a chunk of even later history.
Anyway, you are quite welcome to propose a sane locking scheme capable
of dealing with that mess.
As for the posted patch, AFAICS it's FUBAR in handling of .. in such
directories. Moreover, how are you going to keep that shadow tree
in sync with the main one if somebody starts doing renames in the
latter? Or mount --move, or...
Not a job for the kernel
Good for Jan shooting this down. This is exactly what pipes, temporary directories, and subshells are for. This is not a job for the kernel. Man, there are some scary bad ideas floating around.
Not a job for the kernel
I would have to agree. This sounds like the Microsoftization of the Linux kernel to me. What's so difficult about unzipping the file?
More examples.
# cd hello.lzo/src/
# gcc -c hello_world.c -o hello_world
Error: blablabla. <- needs aufs or unionfs over this read-only directory (mounted?).
# mkdir -p jail
# chroot jail/ <- works?
Be careful with those characters inside of filepaths:
blank space, <- backspace, ', ", \, &, |, `, $, etc.
I have tons of e-mails.
I have 4.5 GiB of .zip compressed .mboxes from many sites on my DVD (200 GiB uncompressed).
cd /mnt/cdrom
Does find work? Does grep work?
e.g. find . -iname '*POTATO*' --permit-nested-zips <- works?
e.g. grep . -iR 'NANOTECH' --permit-nested-zips <- works?
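The nested-archive question can be explored in user space without any kernel support. The following is a rough sketch, not an existing find option: find_in_zip is a hypothetical helper that mimics a case-insensitive name match and descends into .zip members of a zip archive. It reads each nested archive fully into memory, which would be a poor fit for multi-gigabyte collections.

```python
import fnmatch
import io
import zipfile


def find_in_zip(data, pattern, prefix=""):
    """Yield member paths whose base name matches pattern, ignoring case.

    data is the raw bytes of a zip archive; nested .zip members are searched
    recursively and reported with a '^/' marker after the archive name.
    """
    wanted = pattern.lower()
    with zipfile.ZipFile(io.BytesIO(data)) as bundle:
        for name in bundle.namelist():
            base = name.rsplit("/", 1)[-1].lower()
            if fnmatch.fnmatchcase(base, wanted):
                yield prefix + name
            if base.endswith(".zip"):
                nested = bundle.read(name)
                yield from find_in_zip(nested, pattern, prefix + name + "^/")
```

Called as list(find_in_zip(open("mail.zip", "rb").read(), "*potato*")), where mail.zip is a placeholder file name, it lists matching members at every nesting level.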
Why not implement this with
Why not implement this with FUSE? Transparently allowing zip/7z/tar.{gz|bz2} files to be accessed as directories within a mounted archive is much cleaner in userspace than altering the core model of a directory/archive. Why would anyone want such a feature in the kernel itself, any more than one would want the x server and coreutils embedded in the kernel?
Why would anyone want such a
You should RTFA more carefully:
While this could also, in theory, be implemented as a FUSE layer, the performance impact would likely be noticeable and you simply wouldn't use it for any mount points where you actually store data on.
RTF*'ing more carefully
Perhaps you should have RTFC more carefully? The comment you replied to above is about auto FUSE mounting (_in-kernel_ provision of the redirection) vs. userland mounting and redirection. I doubt it is much harder to have something automounted over a temporary mount point by a wrapper than to push the equivalent functionality into the kernel via the caret character.
don't make me laugh
Don't make me laugh with all this talk about performance.
Have you ever measured it? With archives like zip, all the CPU time will be spent on unzipping, and you are unlikely to notice any performance degradation.
Before talking about performance, please always supply some numeric data. Otherwise all we get is pure speculation, just because some people want to commit more crappy code to mainstream.
don't make me laugh
Laugh all you want. But make sure you understand the parent post before you publicly announce it.
Yes, you are absolutely right. Except that I was not talking about the performance impact when decompressing files.
See, if you want compressed files to be *transparently* accessible as directories (just like the "caret escape" explained here), you need to have some kind of a layer which recognizes them, and does the mounting when required. This KernelTrap post documents a means of doing this in kernel space.
If you implemented *this* layer in user space, it would mean that this user space file system would have to wrap *every* operation on the file system, even when you aren't accessing archives, because otherwise they couldn't perform their magic with archives.
When implementing this layer in the user space, instead of the normal chain:
app->fread()->kernel->ext3
a FUSE file system wrapper would leave you with:
app->fread()->kernel->FUSE_wrapper->fread()->kernel->ext3
So, instead of a single fread() syscall, you now have two fread() syscalls *AND* two context switches (app to FUSE_wrapper, and back to app) just to perform a simple read!
And it's not just that the call chain is twice as long. FUSE file systems are particularly inefficient because they don't have direct access to kernel data structures like native file systems (ext3, etc) have, which means that there is lots of copying.
Consider that, if the contents of a file are already cached in RAM, your application will be CPU- or RAM-bound (not HDD-bound, because no HDD accesses are done). Adding operations like context switches to a callpath which used to be really fast will kill expected performance.
Now, I would very much like to measure the performance impact myself, but I couldn't find a simple FUSE "wrapper" file system that wouldn't have any extra overheads, and I can't be bothered with writing my own.
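A measurement along these lines could start from a small timing harness like the one below. It is a sketch under stated assumptions rather than a benchmark: seconds_per_read is a hypothetical helper, it only times small reads of a file that is already in the page cache, and it says nothing about any particular FUSE wrapper. The idea is to run it once on a file path on the native file system and once on the same file reached through a pass-through FUSE mount, then compare the two averages.

```python
import os
import time


def seconds_per_read(path, iterations=1000, size=4096):
    """Average wall-clock time of one small pread() from an already cached file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # One untimed read first, so the timed reads are served from cache.
        os.pread(fd, size, 0)
        start = time.perf_counter()
        for _ in range(iterations):
            os.pread(fd, size, 0)
        return (time.perf_counter() - start) / iterations
    finally:
        os.close(fd)
```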
'zgrep' and 'bzgrep' exist
'zgrep' and 'bzgrep' exist on most systems. Install strigi (from the KDE guys) and you'll have the very powerful 'deepfind' and 'deepgrep'.
Ok Something I don't get.
Why can't the kernel just allow a file and a directory to have the same name?
Makes shadow so much simpler.
File hello.zip
directory hello.zip/
Simple no extra chars required.
How a stat (or similar) call
How should a stat (or similar) call determine whether to use the file or the directory?
stat(2) will return what it
stat(2) will return what it traditionally would, a file or directory. You use POSIX file attributes to signal that the file is a container, and that any attempts to open the file like a directory would succeed.
Next step is to slowly upgrade all the existing utilities and software to be able to make use of this new feature, all the while remaining backward compatible.
Problem solved. Except this isn't "elegant" enough for people (despite the fact that it would work just fine now and forever, and isn't particularly any more ugly than other features bolted onto Unix over the past 30 years, like BSD sockets), so instead it'll never happen.
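The stat(2) behaviour discussed in this exchange can be seen with a short sketch; the function name kind is hypothetical. The file-type bits of st_mode hold a single type value, so a path reports as a directory or as a regular file but not both, which is why a "container" marker would have to be carried somewhere else, as the comment above suggests.

```python
import os
import stat


def kind(path):
    """Classify a path by the file-type bits that stat(2) reports."""
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"
```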
Exactly what I was getting at.
And why I see it as kinda pointless to need an extra char.
Stuff elegant if it's going to cause problems.
sounds like an old Mac
That just becomes a mess and makes files less portable across systems.
Remember the Mac and the "data fork/resource fork" ? I still have nightmares about it.
Linux has resource forks now...
... they're just called extended attributes. Is it safe to assume they give you nightmares as well?
They do give nightmares. Try
They do give nightmares. Try backing up files with extended attributes using tar. Where did your attributes go?
Already can do that with Fuse
There already is a file system on Fuse that can extract archives: http://www.nongnu.org/unpackfs/