Re: silent semantic changes with reiser4

Cc: <linux-fsdevel@...>, <linux-kernel@...>
Date: Tuesday, August 24, 2004 - 4:25 pm

After looking through the code and mailing lists I'm quite unhappy with
a bunch of user-visible changes that Hans sneaked in, which make reiser4
incompatible with other filesystems and have a slight potential to break
things even in the kernel.

 o files as directories
   - O_DIRECTORY opens succeed on all files on reiser4.  Besides breaking
     .htaccess handling in apache and glibc compilation, this also renders
     the flag entirely useless and reopens the races it is meant to
     prevent.
   - meaning of the -x permission.  This one has different meanings on
     directories vs files on UNIX systems.  If we want to support
     directories as files we'll probably have to find a way to work
     around this.
   - dentry aliasing.  I can't find a formal guarantee in the code that
     this can't happen.
 
 o metafiles - ..metas as a magic name that's just taken out of the
   namespace doesn't sound like a good idea.  If we want this it should
   be a VFS-level option and there should be a translation-layer to
   xattrs.  Not doing this will again confuse applications greatly that
   expect uniform filesystem behaviour.

Given these problems I request that these interfaces be removed from
reiser4 for the kernel merge, and added later, if at all, at the proper
VFS level after discussion on linux-kernel and linux-fsdevel, like we did
for xattrs.
Date: Tuesday, September 14, 2004 - 12:27 am

(Apologies for beating on a dead horse here...)

It seems one source of confusion here is that everyone has a different
idea of which semantics we're talking about.  I see two main ones:

Hans's: A file, a directory, and an attribute are functionally 
equivalent (except for S_ISxxx and hardlinks).  That is, /usr/bin/metas 
makes sense, and it's not talking about a program called metas.  This 
also means that /foo/metas/metas might exist and needs dealing with.

Linus's (I think):  A directory is just a directory (no attributes and 
no read()able data).  A file can contain attributes, where attributes 
can be "file" attributes or "directory" attributes.  That is, a file is 
also a subtree with posix-like semantics (except for hardlink stuff). 
So doing "touch /tmp/foo; cat /tmp/foo/metas" fails, rather than doing 
something that's probably useless.  "touch /tmp/foo; touch /tmp/foo/bar; 
touch /tmp/foo/bar/baz" fails on the last touch because bar is a file 
attribute and recursive magic is disallowed.

Which one are we talking about?

FWIW, I like the latter version a lot better, as it removes a lot of 
ambiguity.  If I see a path like /tmp/foo/metas/uid, it is either a uid 
attribute (i.e. writing it has security implications) or it is a 
standard file (i.e. writing it just writes it).  But _I can tell which 
one_ by fstat()ing /tmp/foo!  If it's a directory, then I have either a 
named stream called uid or a genuine file called uid (and I can tell 
which by fstat()ing /tmp/foo/metas), but if it's a file then I have the 
magic uid.  And I'm guaranteed that there's no other funny business, 
because /tmp/foo is a "file with a subtree," which means that it's not 
an attribute.  This way I know exactly what I'm dealing with.
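The "file with a subtree" rule above comes down to a single stat() of the base object. Below is a minimal sketch that assumes the proposed semantics; nothing here exists in any kernel. The enum and function names are hypothetical. It uses stat() rather than fstat() so that no open descriptor is needed.

```c
#include <sys/stat.h>

enum meta_kind {
    META_UNKNOWN,            /* base missing, or neither file nor directory */
    META_ORDINARY_OR_STREAM, /* base is a directory: base/metas/uid is a plain
                                file or a named stream */
    META_MAGIC_ATTRIBUTE     /* base is a regular file: base/metas/uid is the
                                magic attribute */
};

/* Hypothetical classifier for the model described above: the type of the
 * base object alone decides how base/metas/... must be interpreted. */
enum meta_kind classify_meta_path(const char *base)
{
    struct stat st;

    if (stat(base, &st) != 0)
        return META_UNKNOWN;
    if (S_ISDIR(st.st_mode))
        return META_ORDINARY_OR_STREAM;
    if (S_ISREG(st.st_mode))
        return META_MAGIC_ATTRIBUTE;
    return META_UNKNOWN;
}
```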

As an added bonus, we could have O_NOMETAS which means that "files" may 
not be traversed.  Then someone who wants to make sure they get a real 
file can do it.  If recursive files-as-dirs were allowed, that might not 
do quite what the caller expected.  If y...
Cc: <akpm@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, Linus Torvalds <torvalds@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 3:53 pm

I had not intended to respond to this because I have nothing positive to 
say, but Andrew said I needed to respond and suggested I should copy 
Linus. Sigh.

Dear Christoph,

Let me see if I can summarize what you and your contingent are saying, 
and if I misconstrue anything, let me know.;-)

You ignored everything I said during the discussion of xattrs about how 
there is no need to have attributes when you can just have files and 
directories, and that xattrs reflected a complete ignorance of name 
space design principles.  When I said we should just add some nice 
optional features to files and directories so that they can do 
everything that attributes can do if they are used that way, you just 
didn't get it.  You instead went for the quick ugly hack called xattrs.  
You then got that ugly hack done first, because quick hacks are, well, 
quick.  I then went about doing it the right way for Reiser4, and got 
DARPA to fund doing it.  I was never silent about it.

Making files into directories caused only two applications out of the 
entire OS to notice the change, and that was because of a bug in what 
error code we returned that we are going to fix.  You think that was a 
disaster; I think it was a triumph.

Now a cleanly architected filesystem with no attributes and just files 
and directories that can do everything attributes are used for exists.  
You don't want it to have the competitive advantage.  Instead, you want 
it to have its clean design excised until you have something that 
duplicates it ready to go, and only then should it be allowed that users 
will use the features of your competitor's filesystem which you 
disdained implementing for so long.

Since you never studied or understood namespace design principles (or 
you would not have created and supported xattrs), you want to rename it 
to be called VFS, rewrite what we have done, and take over as the 
maintainer, mangling its design in a committee clusterfuck as you go. 

We have just implemented very tri...
Cc: Christoph Hellwig <hch@...>, <akpm@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, Linus Torvalds <torvalds@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 4:23 pm

Just curious about your comments on Jamie Lokier's suggestions for enabling 
files-as-directories semantics without breaking existing apps.

Chris
Cc: Christoph Hellwig <hch@...>, <akpm@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, Linus Torvalds <torvalds@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 4:20 pm

I don't want to comment on any of the technical issues about VFS etc. as
I would be completely out of my depth, however I do want to say 2 things. Firstly,
this is a feature that Samba users have been needing for many years to maintain
compatibility with NTFS and Windows clients. Microsoft no longer sell any servers
or clients without support for multiple data streams per file, and their latest
XP SP2 code *does* use this feature. Whatever the kernel issues I'm really glad
that Hans and Namesys have created something we can use to match this
functionality - soon we will need it in order to be able to exist in
a Microsoft client-dominated world.

My second point is the following. Hans - did you *really* have to reinvent
the wheel w.r.t. userspace API calls? Did you look at this work (done in 2001
for Solaris)?

http://bama.ua.edu/cgi-bin/man-cgi?fsattr+5
http://bama.ua.edu/cgi-bin/man-cgi?attropen+3C
http://bama.ua.edu/cgi-bin/man-cgi?openat+2

I'm complaining here as someone who will have to write portable code
to try and work on all these "files with streams" systems.

Jeremy.
Cc: Christoph Hellwig <hch@...>, <akpm@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, Linus Torvalds <torvalds@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 4:42 am

I agree that your work is important without agreeing that MS client 
domination will last.;-)  It is indeed my desire to give you every 
single feature you need to emulate MS streams within files, but doing it 
using directories that are files.  I would like to support you in 
I interviewed for the file system architect job at Sun in, I think, 
1999, and they offered me the job conditional on my giving up on my 
Linux work.  (After much trying and failing to convince them that it 
would be okay for me to work on Linux also, I declined the job, much to 
my fiscal loss and work satisfaction.)

They do not do a pure job of implementing attributes in the file 
namespace though.  The man pages below describe far more distinctions 
between files and attributes than are necessary, and those distinctions 
cause a loss of closure.  I can say more
Cc: Jeremy Allison <jra@...>, Christoph Hellwig <hch@...>, <akpm@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, Linus Torvalds <torvalds@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 9:27 am

1) how do you back up and restore files with streams inside ?

2) how do standard unix utilities handle them ?

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
Cc: Jeremy Allison <jra@...>, Christoph Hellwig <hch@...>, <akpm@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, Linus Torvalds <torvalds@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 7:53 pm

To repeat, I think it would be nice to implement a 
filename/pseudos/backup method for all the plugins.

Guys, we just have the beginnings in place.  One plugin method at a time 
it will all fall into place.

What we have now is useful now.  The more methods come into existence, 
the more compelling it becomes --- standard network economic theory 
applies here.

Hans
Cc: Hans Reiser <reiser@...>, Jeremy Allison <jra@...>, Christoph Hellwig <hch@...>, <akpm@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, Linus Torvalds <torvalds@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 9:56 am

Most likely they don't ;) That is, until they are fixed or replaced.
I've heard of people who already use star rather than GNU tar because 
they want xattrs to be backed up.

-- 
mjt

Cc: <riel@...>, <reiser@...>, <jra@...>, <hch@...>, <akpm@...>, <linux-fsdevel@...>, <linux-kernel@...>, <flx@...>, <torvalds@...>, <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 3:58 pm

Not good, in my view.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373
Cc: <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, Linus Torvalds <torvalds@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 4:08 pm

Actually in most of the discussion you simply didn't participate.  While
xattrs might not be the nicest interface, they have the advantage of not
breaking the SuS assumption of what directories vs files are, and they
do not break the Linux O_DIRECTORY semantics that are defined and need

For one thing _I_ didn't decide about xattrs anyway.  And I still
haven't seen a design from you on -fsdevel how you try to solve the

My competitor's filesystem?  If you look at MAINTAINERS I maintain only
vxfs and sysvfs, neither of which I'd suggest anyone to run their system

Hans, please stop the personal crap or the black helicopters will kidnap
you.   When was the last time you actually worked on kernel namespace
code instead of talking marketing bullshit and ignoring all real world

Could you pass on that crack pipe please?
Cc: Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 4:22 pm

Hey, files-as-directories are one of my pet things, so I have to side with 
Hans on this one. I think it just makes sense. A hell of a lot more sense 
than xattrs, anyway, since it allows scripts and other standard tools to 
touch the attributes.

It's the UNIX way.

And yes, the semantics can _easily_ be solved in very unixy ways.

One way to solve it is to just realize that a final slash at the end 
implies pretty strongly that you want to treat it as a directory. So what 
you do is:

 - without the slash, a file-as-dir won't open with O_DIRECTORY (ENOTDIR)
 - with the slash, it won't open _without_ O_DIRECTORY (EISDIR)

Problem solved. Very user-friendly, and very intuitive.

Will it potentially break something? Sure. Do we care? Me, I'll take that 
kind of extension _any_ day over xattrs, that are fundamentally flawed in 
my opinion and totally useless. The argument that applications like "tar" 
won't understand the file-as-directory thing is _flawed_, since legacy 
apps won't understand xattrs either.

Oh, add an O_NOXATTRS flag to force a path lookup to only use regular
directories, the same way we have O_NOFOLLOW and friends. That allows
people to see the difference, if they care (e.g. a file server might decide
that it doesn't want to expose things like this).

I never liked the xattr stuff. It makes little sense, and is totally 
useless for 99.9999% of everything. I still don't see the point of it, 
except for samba. Ugly.

		Linus
Cc: Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 8:18 pm

Stupid question: who will use it? And why?

Anyone can write a userspace library that implements a function
set_attribute(char *file, char *attribute, char *value), which creates
the directory ".attr/file" in the file's directory and stores the attribute there.
(and you can get list of attributes from shell too:
ls `echo "$filename" |sed 's/\/\([^\/]*\)$/\/\.attr\/\1/'`
). There's no need to add extra functionality to kernel and filesystem.

Advantage:
- you don't add bloat to kernel or filesystem
- you don't need to teach tar/cp -a/mc about attributes
- you won't lose attributes after editing file in vim (it creates another

The only way xattrs are useful is that backup/restore software doesn't
have to know about every filesystem with its specific attributes and
every magic ioctl for setting them. Instead it can save/restore
filesystem-specific attributes without understanding what they mean.
However, there's no reason why an application should use them. And no
application does.

I can't imagine anyone shipping an application with "this app requires
reiser4" prerequisite. Why should anyone use it if he can store attributes
in an ".attr" directory or wherever and make the application work on any OS
and any filesystem?

Mikulas
Cc: Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 8:27 pm

..and the above is, roughly, what I understand samba etc falls back on.

The problem ends up being that the above isn't in any way safe from people 
moving files around (oops, where did those attributes go?) nor does it 
have any consistency guarantees. So it only works well if _one_ 
application does this, and that application follows all the locking rules.


If no application does, then why back them up? Why implement them in the 
first place?

In other words - some apps obviously do want to use them. Sadly.

		Linus
Cc: Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 8:51 pm

You can add more functionality to the filesystem and use xattrs to control it.
For example:
- acls
- compress file
- encrypt file (copy user's password into task_struct and use it to
encrypt his files)
- preallocate file in 4MB contiguous chunks, because it needs real-time
multimedia access
- sync/append-only/immutable
etc.
However, there's no reason why an application should care whether the file is
compressed, whether it has acls, or so. And applications don't.
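For comparison, this is roughly what "using xattrs" looks like from userspace on Linux, through setxattr() and getxattr() from <sys/xattr.h>. The attribute name is arbitrary but must live in a namespace such as "user.". Whether the call succeeds depends on the filesystem and mount options, so the sketch reports errors rather than assuming support. The wrapper name is hypothetical.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Sets the extended attribute `name` (e.g. "user.comment") on `path` and
 * reads it back into `out`. Returns the number of bytes read back, or
 * -errno on failure: for example ENOTSUP when the filesystem lacks xattr
 * support, or typically EPERM for "user." attributes on device files. */
ssize_t xattr_roundtrip(const char *path, const char *name, const char *value,
                        char *out, size_t outsize)
{
    if (setxattr(path, name, value, strlen(value), 0) != 0)
        return -errno;
    ssize_t n = getxattr(path, name, out, outsize);
    return n < 0 ? -errno : n;
}
```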

And I think this is the only legitimate use for xattrs. Who else uses them
except samba? I don't see how reiser4's hybrids would help.

Mikulas
Cc: Mikulas Patocka <mikulas@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 4:36 am

Streams are quite ugly.  However, if you decompose streams into all of 
the little pieces that are needed to emulate them, the pieces are quite 
nice.

For instance, inheriting stat data from a common parent is nice, and 
inheritance is nice, and being able to cat dirname/pseudos/cat and get a 
concatenation of all of the files is nice, and being able to cat 
dirname/pseudos/tar and get an archive of the directory is nice, and, 
well, if you decompose all of the features of streams into little 
features you get a bunch of fun little features much nicer than streams.

Hans
Cc: Linus Torvalds <torvalds@...>, Mikulas Patocka <mikulas@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 6:53 am

Yes.  Being able to cd into filename.tar.gz and filename.iso is also
nice, but all of these features should be supported by the VFS
generically, not in any specific filesystem, and there should be a
hook to invoke the various fun filesystem-independent handlers by name.

-- Jamie
Cc: Hans Reiser <reiser@...>, Linus Torvalds <torvalds@...>, Mikulas Patocka <mikulas@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 6:59 am

It doesn't belong in the kernel at all.  If anywhere, it belongs in a
userspace filesystem, but even in that case the magic detection of
which one to use is kinda hard.  You absolutely don't want to hardcode
file formats in the kernel.
Cc: Jamie Lokier <jamie@...>, Hans Reiser <reiser@...>, Linus Torvalds <torvalds@...>, Mikulas Patocka <mikulas@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 7:17 am

Do you mean a user-level file system, as a VFS handled by user
applications, or an intermediate file system layer between any
application and the real file system?  The latter would be good
enough, as it would still be transparent to the applications.

~S
Date: Thursday, August 26, 2004 - 7:07 am

Oh, I agree userspace should be involved.

-- Jamie
Cc: Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 8:57 pm

I've got a stupid question too.  How do you back up these
things ?

If your backup program reads them as a file and restores
them as a file, you might lose your directory-inside-the-file
magic.

If your backup program dives into the file despite stat()
saying it's a file and you restore your backup, how are the
"file is a file" semantics preserved ?

Obviously this is something that needs to be sorted out at
the VFS layer.  A filesystem specific backup and restore
program isn't desirable, if only because then there'd be
no way for Hans's users to switch to reiser5 in 2010 ;)

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
Cc: Mikulas Patocka <mikulas@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 4:40 am

It needs to be sorted out, whether it is sorted out at the VFS layer is 
It might be that we need a filenameA/metas/backup method for all of our 
file plugins, which if cat'd gives a set of instructions which if 
executed are adequate for restoring filenameA.

Hans
Cc: Mikulas Patocka <mikulas@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 10:46 am


So what exactly is wrong with sorting it out at the VFS layer, and why
do you _insist_ on sorting it in the reiserfs4 core? I'm missing
something, please fill me in on the details.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering	   \\\  /// 
SUSE Labs, Research and Development \honk/ 
SUSE LINUX AG - A Novell company     \\//
Cc: Mikulas Patocka <mikulas@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 3:51 am

Sure, this sort of thing must be sorted out at the VFS layer.
And a backup program working on such a filesystem
will need to know that something can be a file, a directory - or both.

So an old "tar" won't get this right as it will assume that an object
is either file or directory.  The change to get it right won't be
that big - just notice that an object is both, then backup the
ordinary file contents as usual, before recursing into the
directory it also provides and backup stuff there as usual.

The resulting .tar can of course only be unpacked properly
on a fs supporting file-as-directory, similar to how a .tar of
a fs with links only will unpack properly on a fs supporting links.
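Helge's approach for a backup tool can be sketched on a toy in-memory tree. Everything here is hypothetical. struct node models an object that may have both contents and children. The one-line "F"/"D" record format is invented for illustration and is not the tar format.

```c
#include <stdio.h>

/* Hypothetical object that may carry file contents, a directory part, or both. */
struct node {
    const char *name;
    const char *data;              /* file contents, or NULL if none */
    int is_dir;                    /* nonzero if it also acts as a directory */
    const struct node *children;
    int nchildren;
};

/* Archives the ordinary file contents first, then recurses into the
 * directory part, as described above. */
void archive(const struct node *n, const char *prefix, FILE *out)
{
    char path[1024];
    snprintf(path, sizeof path, "%s%s", prefix, n->name);

    if (n->data)
        fprintf(out, "F %s %s\n", path, n->data);
    if (n->is_dir) {
        char child_prefix[1024];
        fprintf(out, "D %s\n", path);
        snprintf(child_prefix, sizeof child_prefix, "%s/", path);
        for (int i = 0; i < n->nchildren; i++)
            archive(&n->children[i], child_prefix, out);
    }
}
```

Take a file foo containing "hello" that also has a child uid containing "500". archive() writes "F foo hello", "D foo" and "F foo/uid 500". A tar that trusts stat() alone would only record the first of these.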

I don't see many problems for userland.  Old apps will keep working,
as the new features are a superset.  Those who care about
file-as-directory extras will provide patches for "tar" and friends,
after which the extras become usable.

Helge Hafting
Cc: <riel@...>, <mikulas@...>, <torvalds@...>, <hch@...>, <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, <flx@...>, <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 5:21 am

There are many backup apps, not just one.  I've written a few myself,
none of which will ever be worthy of notice.  The sourceforge
Topic.System.Archiving.Backup lists 335 projects at present.

I find the idea that most backup tools and scripts will silently
stop working correctly to be pretty scary.  

And then there's archiving, installation, distribution, administration,
emulation, file system and partition managers, and on and on.

===

I wonder if we can make this "modal" somehow.

The one consistency I see is that apps that want the "enhanced" view
need to ask for it, somehow.  It is the new views of the data that are
being added - let the app announce to the kernel (usually via
specialized code in some shared library that the app is using to get the
alternate views) that either per-task, or per-file descriptor, it is to
see the "enhanced" view, as a side effect of trying to access it.

Old stuff, or even new stuff that is content to work with the "classic"
view that a file is a single data stream, and that directories only
have pathnames, not data, would by default see that view, and see
_all_ the data, presented somehow in that view, perhaps as additional
files with magic names.

This still leaves the breakage that such tools don't know, and don't
preserve, the magic linkage between such magic files.  But that is
much less of an issue, in my view.  Programs such as backups that are
manipulating the files of apps they know nothing about already have
to presume that all the files are important in inscrutable ways, and
just be careful to preserve or copy or backup all of them.

Yeah - I realize that there will be a few followups denouncing modal
architectures.  I might even agree with some of them.

If this were easy, it would have been done years ago.

The onus should be on the new stuff to request the enhanced view,
rather than on the old dogs to learn new tricks.

-- 
                          I won't rest till it's the best ...
                          Progr...
Cc: <riel@...>, <mikulas@...>, <torvalds@...>, <hch@...>, <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, <flx@...>, <reiserfs-list@...>
Date: Friday, August 27, 2004 - 8:33 am

They won't stop working, they will merely not support
the new features.  That is only a problem if
you actually use those features.  If all you have
is plain files and plain directories - no problem.  Not
if files-as-directories are implemented right.

Helge Hafting
Cc: Helge Hafting <helge.hafting@...>, <riel@...>, <mikulas@...>, <torvalds@...>, <hch@...>, <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, <flx@...>, <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 6:47 am

Too late.  We have xattrs already; many programs don't store them.

-- Jamie
Cc: <helge.hafting@...>, <riel@...>, <mikulas@...>, <torvalds@...>, <hch@...>, <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, <flx@...>, <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 7:19 am

> Too late.  We have xattrs already; many programs don't store them.

If by your "too late" you mean we can stop worrying about any more
breakage of file system utilities, because there exists an example
in which some were already broken, then you are absolutely wrong.

Just because we caused some breakage doesn't give us license to cause
even more.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373
Cc: <mikulas@...>, <torvalds@...>, <hch@...>, <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, <flx@...>, <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 5:44 am

Encode the magic in the names, by stealing a bit of the existing
filename space to encode it.

This works pretty well as part of the magic to map long filenames
into DOS 8.3 names on my FAT partitions.

Apps linked with the appropriate Windows library see nice fancy
long names.

The rest of the world, including DOS apps and my Unix backup
scripts, see the primitive 8.3 names, including one or a few
extra files per directory, which are nothing special to them.

So long as these other apps don't presume to know that they can
keep some of the files in an apps directory, and drop others, then
it works well enough.  And no self-respecting general purpose
backup program is going to presume such knowledge anyway.
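Here is a minimal sketch of stealing part of the filename space. It assumes a hypothetical reserved separator "::"; any byte sequence could serve, and the point is that ordinary names may no longer contain it. The functions and the separator are illustrative only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical reserved separator: names containing it are "stolen" from
 * the ordinary namespace to represent (file, stream) pairs. */
#define STREAM_SEP "::"

/* Encodes stream `stream` of `file` as one plain filename.
 * Returns 0 on success, -1 if the result does not fit. */
int encode_stream_name(char *out, size_t size, const char *file,
                       const char *stream)
{
    int n = snprintf(out, size, "%s" STREAM_SEP "%s", file, stream);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Splits an encoded name into file and stream. Returns 0 for a stream
 * name, -1 for an ordinary name (no separator) or if a part does not fit. */
int decode_stream_name(const char *name, char *file, size_t fsize,
                       char *stream, size_t ssize)
{
    const char *sep = strstr(name, STREAM_SEP);
    if (!sep)
        return -1;

    size_t flen = (size_t)(sep - name);
    const char *s = sep + strlen(STREAM_SEP);
    if (flen >= fsize || strlen(s) >= ssize)
        return -1;

    memcpy(file, name, flen);
    file[flen] = '\0';
    strcpy(stream, s);
    return 0;
}
```

The cost is exactly the stolen bit: a legitimate file named "a::b" can no longer exist as a plain file. Programs that do not know the convention simply see report.doc::summary as one more file, and a backup tool copies it like any other.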

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373
Cc: Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 4:43 am

I think we should require people to care enough to supply an O_NOMETAS
Cc: Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>, Nikita Danilov <Nikita@...>
Date: Wednesday, August 25, 2004 - 4:41 pm

I just want to add that I AM capable of working with the other 
filesystem developers in a team-player way, and I am happy to cooperate 
with making portions more reusable where there is serious interest from 
other filesystems in that, but Christoph is a puppy who has never 
written or designed a major filesystem from scratch, and Nikita is a big 
dog who has written stuff very few projects are lucky enough to see the 
likes of, and when Christoph insults Nikita's code, or my design 
guidance for that code, it is not going to bring out my good side.  The 
plugin and metafiles code needs many improvements, but Christoph does 
not have the expertise to understand what those needed improvements are 
because he hasn't invested the work into understanding the code.

Christoph is a bright and clever young fellow, who just hasn't had the 
years of study of the field yet.  I wish him well, and away.;-)

Hans
Cc: Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>, Nikita Danilov <Nikita@...>
Date: Wednesday, August 25, 2004 - 5:03 pm

It's not Christoph who's shown more bark than bite in this thread.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
Cc: Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>, Nikita Danilov <Nikita@...>
Date: Thursday, August 26, 2004 - 5:00 am

The bite is at www.namesys.com/download.html


Hans
Cc: Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>, Nikita Danilov <Nikita@...>
Date: Wednesday, August 25, 2004 - 4:58 pm

I see this as an opportunity where you can share some of your
experience with Christoph and many others and work to get the semantics
into VFS.

Please make this work :)

-- 
mjt
Cc: Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>, Nikita Danilov <Nikita@...>
Date: Wednesday, August 25, 2004 - 4:51 pm

Prove it.  Stop replying for today and come back tomorrow with some
useful discussions.  Christoph suggested that some of the v4 semantics
belong in the VFS and therefore in Linux as a whole.  He's helping you to
make sure the semantics fit nicely with the rest of the kernel
interfaces and are race free.

Take him up on the offer.

-chris
Cc: Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 9:53 am

If I may chime in here...


This is an issue that directly affects work I am doing in extended
cryptfs:

http://www.linuxsymposium.org/2004/view_abstract.php?content_key=55
http://halcrow.us/~mhalcrow/ols2004.pdf
http://halcrow.us/~mhalcrow/ols_cryptfs.sxi

The basic idea is that the cryptographic context for every file is
correlated with the individual file via xattrs.  A file is a unit of
data that should, as it stands, contain all the information requisite
for the encrypting filesystem layer to transparently decrypt (and
encrypt, when the file is written to).  This allows for a key->file
granularity, as opposed to a key->block device (dm-crypt) or a
key->mount point (CFS) granularity.

My grand vision is to have a policy that determines whether or not the
encrypted version of the file or the decrypted version of the file is
read, dependent on whether or not the file is leaving the security
domain (the storage device under the control of the currently running
kernel).  For example, if the ``cp'' command is copying a file from a
filesystem mounted from /dev/hda1 to a filesystem mounted from
/dev/fd0, then the policy would indicate that (unless otherwise noted
in the .cryptfsrc file in the root of the filesystem mounted from
/dev/fd0, which might also contain the default security context for
that filesystem or directory - like whose public keys should be used
to encrypt the symmetric key for data) the file is leaving the
security domain, and the encrypted contents of the file should be
given to cp.  Same with mutt reading an email attachment (as opposed
to, say, .muttrc, where, more likely than not, the unencrypted version
is wanted).
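The policy above can be sketched as a small decision function. It assumes the filesystem layer somehow knows where the bytes are headed, which the mail leaves open; the device names, the `security_domain` set and the `overrides` mapping (standing in for a `.cryptfsrc` on the destination) are hypothetical.

```python
from typing import AbstractSet, Mapping, Optional


def view_for_read(dst_dev: Optional[str],
                  security_domain: AbstractSet[str],
                  overrides: Mapping[str, str]) -> str:
    """Return "decrypted" or "encrypted" for a read of a protected file.

    dst_dev: device the bytes are headed to, or None if the reader keeps
             them locally (an editor opening .muttrc, say).
    security_domain: devices under the control of the running kernel.
    overrides: destination device -> forced view; a stand-in for the
               .cryptfsrc file at the root of that filesystem.
    """
    if dst_dev is None:
        return "decrypted"            # data stays inside the domain
    if dst_dev in overrides:
        return overrides[dst_dev]     # destination's .cryptfsrc wins
    if dst_dev in security_domain:
        return "decrypted"            # e.g. /dev/hda1 -> /dev/hda2
    return "encrypted"                # leaving the domain, e.g. -> /dev/fd0
```

For the `cp` example, copying from /dev/hda1 to /dev/fd0 with `security_domain={"/dev/hda1"}` yields the encrypted view unless the floppy's `.cryptfsrc` says otherwise.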

The goal is to enable an ``encrypted by default'' policy, in which
files on the storage devices are independent encrypted units that
remain encrypted until an application that actually needs to see the
decrypted contents opens them.  Then the encryption and decryption is
done transparently by the fs layer, as long as the user has th...
Cc: Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 6:26 pm

Reiser4 has an encryption plugin that will ship sometime this year.  You 
might want to talk to edward@namesys.com about it.
Cc: Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 5:52 pm

I thought the UNIX way is "everything's a file", not "everything's a

There's always the option that they're both broken.

-- 
Mathematics is the supreme nostalgia of our time.
Cc: Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 6:21 pm

It really was. Directories were historically largely just files too, 
although with the special "lookup" operation. 

Historic unix didn't have readdir/rmdir/mkdir/rename or really much _any_
special directory handling. Directories were just files, and you read them 
like files.

Of course, even in that early unix, "directories" were very much a 
reality even apart from the fact that they happened to be implemented 
pretty much like files. Nobody has ever claimed that the UNIX way is 

Yes. Highly likely. However, something like that _does_ end up being what a 
Windows fileserver wants. IOW, even if it's broken, _something_ is likely 
forced on us by that nasty thing we call "real users". Damn them.

		Linus
Cc: Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 4:35 pm

That would solve the O_DIRECTORY issue, the dentry aliasing still needs
work though with the semantics for link/unlink/rename.

Maybe Hans & you should start 2.7 to work this out? :)
Cc: Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 4:42 pm

Not if you allow link(2) on them.  And not if you design and market your
stuff as a general-purpose backdoor into kernel.  Note how *EVERY* *DAMN*
*OPERATION* is made possible to override by "plugins".  Which is the reason
for deadlocks in question, BTW.

Don't fool yourself - that's what Hans is selling.  Target market: ISV.
Marketed product: a set of hooks, the wider the better, no matter how
little sense it makes.  The reason for doing that outside of core kernel:
bypassing any review and being able to control the product being sold (see
above).

Shame that it got an actual filesystem mixed in with the marketing plans
and general insanity...
Cc: Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 5:00 pm

Heh. I don't think that's a very strong argument against being "unixy", 
considering how traditional unix _used_ to handle directories.

mkdir/rmdir/rename only came later. Now, obviously they did come later for 
a good reason, but still..

The interesting part is that thanks to the dcache, we should be perfectly
able to actually _see_ circular links etc, so some of the problems with 
linking directories should actually be quite solvable - something that is 
_not_ true for a traditional UNIX VFS layer.
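When every directory's parent is known in memory, refusing a directory link that would create a loop reduces to an ancestor walk, the same kind of check rename(2) needs to refuse moving a directory beneath itself. This is a toy model: `parent_of` is a plain dict playing the role of dentry parent pointers, and it assumes the existing hierarchy is still a tree.

```python
from typing import Dict, Optional


def would_create_cycle(parent_of: Dict[str, Optional[str]],
                       new_parent: str, directory: str) -> bool:
    """True if linking `directory` under `new_parent` would make a loop.

    parent_of maps each directory to its parent (None for the root).
    """
    node: Optional[str] = new_parent
    while node is not None:
        if node == directory:
            return True       # directory is an ancestor of new_parent
        node = parent_of[node]
    return False
```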

Of course, the dcache introduces some new problems of its own wrt
directory aliasing, but I don't actually think that should be fundamental
either. Treating them more as a "static mountpoint" from an FS angle and
less as a traditional Unix hardlink should be doable, I'd have thought.

(Also, it's entirely possible that the filesystem may not support some of
the more esoteric linking/renaming operations. For example, in a
traditional xattrs setup where the xattr is linked on-disk with the file
it is associated with, you simply _can't_ link it somewhere else, or
rename it to any other directory. That's not a VFS layer issue, obviously,
but I thought I'd bring up the point that file-as-dir cases may have

Now that's a separate argument, and not one I'm personally interested in
arguing at least right now. I haven't actually looked at the reiser4 code,
so I'm really _only_ arguing against special-case attributes.

		Linus
Cc: Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 5:25 pm

Yeah, if we ditch the "mountpoints are busy and untouchable" stuff.  Which
I'd love to, but it's a hell of a visible (and admin-visible) change.

FWIW, current deadlocks are unrelated to actual operation succeeding.
Look: we have sys_link() making sure that parent of target is a directory
(PATH_LOOKUP, in a "it has -&gt;lookup()" sense), then locking target's parent,
then checking that it has -&gt;link() (everyone on reiser4 does) and then
checking that source (old link to file) is *not* a directory (in S_ISDIR
sense).  Then we lock source.

Note that currently it's OK - we get "all non-directories are always locked
after all directories".  With filesystem that provides hybrid objects with
non-NULL ->link() it's not true and we are in deadlock country.  Before
we get anywhere near fs code.
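The ordering argument can be checked with a toy model. The two tests and the parent-then-source lock order follow the description above; the `Obj` type and the helper names are hypothetical, and the `->link()` check and real locking are omitted. A hybrid object passes the lookup test as a parent yet is not S_ISDIR, so it can appear on either side of a link.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Obj:
    name: str
    has_lookup: bool  # usable as a directory by path lookup
    is_dir: bool      # S_ISDIR(mode)


def link_lock_order(target_parent: Obj, source: Obj) -> Optional[Tuple[Obj, Obj]]:
    """Locks taken by the modelled sys_link(), or None if link is refused."""
    if not target_parent.has_lookup:
        return None                  # parent of the new name must be a dir
    if source.is_dir:
        return None                  # no hard links to directories
    return (target_parent, source)   # lock parent first, then source


def abba(a: Tuple[Obj, Obj], b: Tuple[Obj, Obj]) -> bool:
    """True if two threads taking these lock pairs can deadlock."""
    return a[0] == b[1] and a[1] == b[0]
```

With two hybrids, `link(h2, h1/name)` and `link(h1, h2/name)` are both accepted and take the locks in opposite orders. With ordinary objects the S_ISDIR test keeps every directory ahead of every non-directory, so that inversion cannot form.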

I'm not saying that this particular instance is hard to fix, but it wasn't
even looked at.  All it would take is checking the description of current
locking scheme and looking through the proof of correctness (present in the
tree).  That's the first point where said proof breaks if we have hybrids.
And it's what, about 4 screenfuls of text?

I have no problems with discussing such stuff and no problems with having it
merged if it actually works.  But let's start with something better than
"let's hope nothing breaks if we just add such objects and do nothing else,
'cause hybrid files/directories are good, mmmkay?"
Cc: Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 8:11 pm

This message suggests a way to extend the VFS safe locking rules to
include files-as-directories.


Is this a problem if we treat entering a file-as-directory as crossing
a mount point (i.e. like auto-mounting)?

Simply doing a path walk would lock the file and then cross the mount
point to a directory.

A way to ensure that preserves the lock order is to require that the
metadata is in a different filesystem to its file (i.e. not crossing a
bind mount to the same filesystem).

That has the side effect of preventing hard links between metadata
files and non-metadata, which in my opinion is fine.

Path walking will lock the file, and then lock the directory on a
different filesystem.  Lock order is still safe, provided a strict
order is maintained between the two filesystems.

The strict order is ensured by preventing bind mounts which create a
path cycle containing a file->metadata edge.  One way to ensure that
is to prevent mounts on the metadata filesystems, but the rule doesn't
have to be that strict.  This condition only needs to be checked in
the mount() syscall.
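The mount-time rule can be sketched as a graph check: nodes are filesystems, edges are mounts or file-to-metadata crossings, and a mount is refused if it closes a cycle through a metadata edge. The edge tuple and function names are hypothetical; a real check would have to walk the mount tree of the caller's namespace.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, bool]  # (from_fs, to_fs, is_metadata_edge)


def _reachable(adj: Dict[str, List[str]], start: str, goal: str) -> bool:
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj.get(node, []))
    return False


def mount_allowed(edges: List[Edge], new_edge: Edge) -> bool:
    """Refuse a mount if some file->metadata edge ends up on a cycle,
    i.e. the metadata fs could reach back to the fs holding the file."""
    all_edges = edges + [new_edge]
    adj: Dict[str, List[str]] = {}
    for src, dst, _ in all_edges:
        adj.setdefault(src, []).append(dst)
    return not any(_reachable(adj, dst, src)
                   for src, dst, is_meta in all_edges if is_meta)
```

Only cycles through a metadata edge are rejected, which is the "doesn't have to be that strict" variant rather than banning mounts on metadata filesystems outright.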

-- Jamie
Cc: Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 8:30 pm

Yes - mountpoints can't be e.g. unlinked.  Moreover, having directory

*Ugh*

What would happen if you open that directory or chdir there?  If it's


You really don't want to lock mountpoint on path lookup, so I don't see
how that would be relevant - it's a hell to clean up, for one thing
(I've crossed ten mountpoints on the way, when do I unlock them and
how do I prevent deadlocks from that?)  Besides, different namespaces
can have completely different mount trees, so tracking down all that
stuff would be hell in its own right.

The main issue I see with all schemes in that direction (and something
like that could be made workable) is the semantics of unlink() on
mountpoints.  *Especially* with users being able to see attributes of
files they do not own (e.g. reiser4 mode/uid/gid stuff).  Ability to
pin down any damn file on the system and make it impossible to replace
is not something you want to give to any user.
Cc: Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, Hans Reiser <reiser@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, August 25, 2004 - 9:00 pm

Ok, so can we make it so mountpoints can be unlinked? :)

The mount would continue to exist, but with no name, until its last

I think the underlying file does not stay locked, and once you've
entered it as a directory, it can be unlinked.

If you have the directory open or chdir into it, then it _may_ have
the effect of keeping the file's storage allocated when you unlink it
-- just like when a file is unlinked while opened.  As that is not a
user-visible property, it's a filesystem-specific implementation
detail as to whether it keeps the file's data in existence while the

I didn't mean locking a chain of mountpoints, I meant the temporary
state where two dentries and/or inodes are locked, parent and child,
during a path walk.  However I'm not very familiar with that part of
the VFS and I see that the current RCU dcache might not lock that much

I agree, users shouldn't be able to pin down a file.

I think unlink() should succeed on a file while something is visiting
inside its metadata directory.

It's a filesystem quality-of-implementation feature whether that
actually releases the file's data.  It's a desirable feature because
one user shouldn't be able to pin another user's quota'd data if they
don't have permission to open the file, but if it's not implemented by
a filesystem then it doesn't break anything fundamental.
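The proposed semantics mirror the familiar open-but-unlinked rule. A hypothetical model, assuming the filesystem implements the desirable variant in which storage is released when the last visitor leaves; it deliberately says nothing about when metadata such as "uid" or "mode" disappears.

```python
class FileObject:
    """Hypothetical model: storage is released only once the file has no
    names left and nobody has it open or is visiting its metadata dir."""

    def __init__(self, nlink: int = 1) -> None:
        self.nlink = nlink      # directory entries naming the file
        self.visitors = 0       # open fds, cwds, walks into the metadata dir
        self.freed = False

    def enter(self) -> None:
        """Open the file or step into its metadata directory by name."""
        if self.nlink == 0:
            raise FileNotFoundError("no name left to look up")
        self.visitors += 1

    def leave(self) -> None:
        if self.visitors == 0:
            raise RuntimeError("leave() without matching enter()")
        self.visitors -= 1
        self._maybe_release()

    def unlink(self) -> None:
        """Always succeeds while a name exists: visitors cannot pin it."""
        if self.nlink == 0:
            raise FileNotFoundError("no name left to unlink")
        self.nlink -= 1
        self._maybe_release()

    def _maybe_release(self) -> None:
        if self.nlink == 0 and self.visitors == 0:
            self.freed = True
```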

It's a semantics question whether unlinking a file makes the metadata
(i.e. "uid", "mode", "content-type" etc.) disappear at the same time,
or if the metadata stays around until the last visitor leaves it.  A
filesystem might be able to keep the metadata in existence even if it
deletes the file's storage on unlink(), but it would be nice for the
VFS to declare which semantic is preferred.

One of the big potential uses for file-as-directory is to go inside
archive files, ELF files, .iso files and so on in a convenient way.
In those cases, if you open one of the virtually generated "archive
content" files, then you might expect the data to continue t...
Cc: <viro@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 4:49 am

Yes, this was part of the plan, tar file-directory plugins would be cute.
Cc: Jamie Lokier <jamie@...>, <viro@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 2:35 pm

Question:  Is "cat /foo/bar/baz.tar.gz/metas" the attribute
directory or a directory in the tarball named "metas"?

Joel

-- 

"I'm so tired of being tired,
 Sure as night will follow day.
 Most things I worry about
 Never happen anyway."

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
Date: Friday, August 27, 2004 - 5:19 am

This has been fought over on the reiserfs-list ad nauseam, but
it's a valid point.

That's why I tend to rename metas/ into ..metas/ to avoid
name clashes, even if I've never had a directory named metas/
apart from what Reiser4 ships.

It is then debatable if it should be renamed before it's too
late, have it renamable in the kernel configs (and the name
exported via /sys or something) or just leave it be.

-- 
mjt
Cc: Hans Reiser <reiser@...>, Jamie Lokier <jamie@...>, <viro@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Friday, August 27, 2004 - 8:26 am

http://packages.debian.org/cgi-bin/search_contents.pl?word=metas&searchmode=searchfilesanddirs&case=insensitive&version=unstable&arch=i386

OK, those are capital METAS rather than junior metas, but it does show
this is not a unique word to reiser4.

-- 
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain
Date: Thursday, August 26, 2004 - 3:53 pm

This needs to be designed.

Perhaps /foo/bar/baz.tar.gz/tar/metas is the directory in the tarball
named "metas".

Or perhaps /foo/bar/baz.tar.gz/x/metas is: it's independent of archive
format, and I personally tend to extract things into a directory
called "x". [*]

Or perhaps /foo/bar/baz.tar.gz/metas is, and the attribute directory
is /foo/bar/baz.tar.gz/../metas, to be perverse ;)

I prefer the second one, ("x/metas"), but not with any conviction.

-- Jamie


[*] Actually I prefer:

      /foo/bar/baz.tar.gz/content/metas
      /foo/bar/baz-0.01.tar.gz/content/baz-0.01/metas

           Archives always in "content".  One layer of decompression
           always tried for .tar files and other uncompressed archive
           formats.

      /foo/bar/baz.tar.gz/x -> content/
      /foo/bar/baz-0.01.tar.gz/x -> content/baz-0.01/

           If the root of the archive contains a single directory, "x"
           is a symlink to it.  Otherwise "x" is a symlink to the root
           directory of the archive.  This is comfortable with the
           common practice by which archives are distributed, without
           making a mess when someone forgets to put everything in a
           top-level directory.
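The "x" rule is easy to state in code. A minimal sketch using Python's `tarfile`, where `content/` and `x` are the names proposed above and `x_target` is a hypothetical helper returning where "x" would point, relative to the archive.

```python
import io
import posixpath
import tarfile
from typing import Dict, Set


def x_target(archive_bytes: bytes) -> str:
    """content/<top>/ if the archive holds exactly one top-level entry
    and it is a directory, otherwise content/ itself."""
    tops: Set[str] = set()
    is_dir: Dict[str, bool] = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            path = posixpath.normpath(member.name.lstrip("/"))
            if path == ".":
                continue                      # entries like "./"
            head, sep, _ = path.partition("/")
            tops.add(head)
            if sep or member.isdir():         # head has children or is a dir
                is_dir[head] = True
    if len(tops) == 1:
        (top,) = tops
        if is_dir.get(top, False):
            return f"content/{top}/"
    return "content/"
```

Mode `"r:*"` lets the same helper read .tar and .tar.gz, matching the "one layer of decompression" idea.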
Cc: Jamie Lokier <jamie@...>, <viro@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 10:05 am

Silly question:

GNU Midnight Commander has for ages allowed going into e.g. tar files, so I
know the benefits of this. Additionally, in GNU Midnight Commander, this
works no matter which file system I use (e.g. it works on iso9660), and 
it even works the same way on other OS's like e.g. Solaris and NetBSD.

What is the technical reason why a tar plugin should be reiser4 
specific, instead of a generic VFS or userspace solution that would 
allow the same also on other fs like e.g. iso9660?

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

Cc: Hans Reiser <reiser@...>, <viro@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, August 26, 2004 - 11:02 am

It should be a generic VFS plugin, not reiser4 or userspace.  The VFS
plugin should call out to userspace for most actions (except handling
cached data), and it should take advantage of special reiser4 features
for storage and performance optimisations.  But it should still work
over a standard filesystem, when those special features aren't
available.  I guess FUSE and many earlier projects are heading in this
direction.

A generic userspace solution doesn't let you "cd" into a tar file from
all programs like you can inside Midnight Commander.

Gnome and KDE take the approach that every userspace system call
should be intercepted and filtered, to create the illusion of virtual
data.  As a result, different programs see different virtual data: you
can't just cut and paste a path from Gnome or KDE into any other
program.  It's not just a "social problem of libraries" thing:
sometimes I have programs which don't link to libc.  Sometimes I have
programs which mustn't link with anything that calls malloc().  It'd
be silly for them to have a different view of the filesystem just
because they can't link with some userspace library.

The Gnome/KDE/Midnight Commander pure userspace solution is silly: if
_every_ program in the system should get the same view, it makes much
more sense for the kernel to filter the system calls and redirect the
virtual accesses to a userspace daemon, while keeping the real
accesses at full speed.
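The redirection described here starts with one decision per lookup: does the path walk through a regular file? A minimal sketch, assuming a caller-supplied `is_regular_file` predicate (during a real lookup this would come from the inode); `split_virtual` is a hypothetical name and implies no existing kernel or FUSE API.

```python
from typing import Callable, Optional, Tuple


def split_virtual(path: str, is_regular_file: Callable[[str], bool]
                  ) -> Tuple[str, Optional[str]]:
    """Split a path at the first component that is a regular file.

    Returns (path, None) for an ordinary access served at full speed, or
    (container, inner) when the walk goes "into" a file and should be
    handed to a userspace handler.
    """
    parts = [p for p in path.split("/") if p]
    # Stop before the last component: a path that *ends* at a file is a
    # normal open, not a walk into it.
    for i in range(1, len(parts)):
        prefix = "/" + "/".join(parts[:i])
        if is_regular_file(prefix):
            return prefix, "/".join(parts[i:])
    return path, None
```

On a live system `os.path.isfile` could stand in for the predicate; every program would then see the same split, unlike per-library interception.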

Furthermore, it makes much more sense for the kernel's page cache to
hold those uncompressed pages, than for every userspace application to
try and cooperatively manage a cache of uncompressed fragments in the
most inefficient way.

There's another problem with the Midnight Commander approach.  If I
"cd" into a tar file, and then a program writes to the tar file, I
don't always see the changes straight away. The two views aren't
coherent.  This isn't an easy problem to solve, but it should be
solved.
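A userspace approximation of a coherent member cache, assuming the archive's stat identity is an adequate change signal. `MemberCache` is a hypothetical helper: it shows why a write to the tar file must invalidate cached members, not how the kernel would do it.

```python
import os
import tarfile
from typing import Dict, Tuple

_Key = Tuple[str, str]              # (archive path, member name)
_Ident = Tuple[int, int, int, int]  # (st_dev, st_ino, st_size, st_mtime_ns)


class MemberCache:
    """Cache extracted archive members; drop them when the archive changes.

    A writer that leaves all four identity fields unchanged goes unnoticed,
    which is exactly the gap kernel-side coherency hooks would close.
    """

    def __init__(self) -> None:
        self._entries: Dict[_Key, Tuple[_Ident, bytes]] = {}
        self.misses = 0

    def read(self, archive: str, member: str) -> bytes:
        st = os.stat(archive)
        ident = (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)
        hit = self._entries.get((archive, member))
        if hit is not None and hit[0] == ident:
            return hit[1]                   # still coherent: no re-read
        self.misses += 1
        with tarfile.open(archive, mode="r:*") as tar:
            f = tar.extractfile(member)     # KeyError if absent
            if f is None:
                raise IsADirectoryError(member)
            data = f.read()
        self._entries[(archive, member)] = (ident, data)
        return data
```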

When a simple "cd" into .tar.gz or .iso is implemented prope...
Cc: Adrian Bunk <bunk@...>, Hans Reiser <reiser@...>, <viro@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Saturday, August 28, 2004 - 7:14 pm

Jamie Lokier <jamie@shareable.org> said:


Nonsense. The .iso or .tar or whatever would have to be kept un-isoed or
un-tarred in memory (or on disk cache) for this to be true, and that takes
quite a long time. Each time you want to peek anew at linux/Makefile, the
whole tarfile will have to be read and stored somewhere, and that is just
too slow for my taste. The .tar format is optimized for compact storage,
the on-disk format of a filesystem is optimized for fast access and
modifiability. Now go ahead and enlarge a file on your .iso/.tar a bit...it
will take ages to rebuild the whole thing. There is a _reason_ why there
are filesystems and archives, and they use different formats. If it weren't
so, everybody and Aunt Tillie would just carry .ext3's around, and would

Perpetuum mobile is a nice idea too.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513
Cc: Adrian Bunk <bunk@...>, Hans Reiser <reiser@...>, <viro@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Wednesday, September 1, 2004 - 4:08 pm

I'm going to explain why filesystem support for .tar.gz or other
"document container"-like formats is useful.  This does _not_ mean tar
in the kernel (I know someone who can't read will think that if I
don't say it isn't); it does mean hooks in the kernel for maintaining
coherency between different views and filesystem support for caching.

The vision I'm going for here is:

  1. OpenOffice and similar programs store compound documents in
     standard file formats, such as .tar.gz, compressed XML and such.

     Fs support can reduce CPU time handling these sorts of files, as
     I explain below, while still working with the standard file formats.

     With appropriate userspace support, programs can be written which
     have access to the capabilities on all platforms, but reduced CPU
     time occurs only on platforms with the fs support.

  2. Real-time indexing and local search engine tools.  This isn't
     just things like local Google; it's also your MP3 player scanning
     for titles & artists, your email program scanning for subject
     lines to display the summary fast, your blog server caching built
     pages, your programming environment scanning for tags, and your
     file transfer program scanning for shared deltas to reduce bandwidth.

     I won't explain how these work as it would make this mail too
     long.  It should be clear to anyone who thinks about it why the
     coherency mechanism is essential for real-time, and a consistent
     interface to container internals helps with performance.


Wrong.  "So long as it remains in the on-disk cache" means each time
you peek at linux/Makefile, the tarfile is _not_ read.

For a tarfile it's slow the first time and whenever it falls out of the
on-disk cache; otherwise, for component files you are using regularly
(even over a long time), it's as fast as reading a plain file.

You obviously know this, as you mentioned on-disk cache in the reply,
so I infer from the rest of your mail that what you're try...
Cc: Adrian Bunk <bunk@...>, Hans Reiser <reiser@...>, <viro@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, September 2, 2004 - 10:06 am

Nobody disagrees there (I think), the disagreement is on whether the

"Coherency" and "different views" implies atomic transactions, and being
able to tell that an access to the file requires updating random junk
about it. It requires being able to guess if it is worth updating now
(given that the file might be modified a dozen times more before the junk

And they are doing fine AFAICS. Besides, they won't exactly jump on the
possibility of leaving behind all other OSes on which they run to become a

I don't buy this one. A tar.gz must be uncompressed and unpacked, and

Userspace support isn't there on any of the platforms right now, if ever it
will be a strange-Linux-installation thing for quite some time to come. Not

Sure! Gimme the CPU power and disk throughput for that, pretty please. [No,

With no description on how this is supposed to work, this is pure science

Coherency is essential, but it isn't free. Far from it. The easiest way of
getting coherency is having _one_ authoritative source. That way you don't
need coherency, and don't pay for it. Anything in this class must by force
be just hints, to be recomputed at a moment's notice. I.e., let the
application who might use it check and recompute as needed.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513
Cc: Adrian Bunk <bunk@...>, Hans Reiser <reiser@...>, <viro@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, September 2, 2004 - 1:32 pm

[Empty message]
Cc: Horst von Brand <vonbrand@...>, Adrian Bunk <bunk@...>, Hans Reiser <reiser@...>, <viro@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, September 2, 2004 - 2:23 pm

Can be done with dnotify/inotify and a cache daemon keeping track of
mtime.  Yes, this will need a kernel change to make sure mtime always



And so on.

  /Christer

-- 
"Just how much can I get away with and still go to heaven?"

Freelance consultant specializing in device driver programming for Linux 
Christer Weinigel <christer@weinigel.se>  http://www.weinigel.se
Cc: Horst von Brand <vonbrand@...>, Adrian Bunk <bunk@...>, Hans Reiser <reiser@...>, <viro@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Thursday, September 2, 2004 - 5:47 pm

    - Can the daemon keep track of _every_ file on my disk like this?
      That's more than a million files, and about 10^5 directories.
      dnotify would require the daemon to open all the directories.
      I'm not sure what inotify offers.

    - What happens at reboot - I guess the daemon has to call stat()
      on every file to verify its indexes? Have you any idea how long
      it takes to call stat() on every file in my home directory?

    - The ordering problem: I write to a file, then the program
      returns.  System is very busy compiling.  2 minutes later, I
      execute a search query.  The file I wrote two minutes ago doesn't
      appear in the search results.  What's wrong?

      Due to scheduling, the daemon hasn't caught up yet.  Ok, we can
      accept that's just hard life.  Sometimes it takes a while for
      something I write to appear in search results.

      But!  That means I can't use these optimised queries as drop-in
      replacements for calling grep and find, or for making Make-like
      programs run faster (by eliminating parsing and stat() calls).
      That's a shame, it would have been nice to have a mechanism that
      could transparently optimise programs that do calculations....
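The gap between "eventually" and "synchronous" results can be made concrete. In this hypothetical model writes enqueue notifications and a daemon folds them into the index in the background; a query that passes the sequence number of the caller's last write first waits for the index to catch up. `notify_write` and the sequence numbers are assumptions for illustration, not an inotify or dnotify API, and whether a writer could obtain such a number is the open question.

```python
from collections import deque
from typing import Deque, Dict, Optional, Set, Tuple


class Indexer:
    """Hypothetical search index fed by write notifications that a
    background daemon processes whenever it gets CPU time."""

    def __init__(self) -> None:
        self.pending: Deque[Tuple[int, str, str]] = deque()
        self.index: Dict[str, Set[str]] = {}
        self.next_seq = 0
        self.done_seq = -1      # last event folded into the index

    def notify_write(self, path: str, text: str) -> int:
        """Queue a change event; returns its sequence number."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending.append((seq, path, text))
        return seq

    def step(self) -> None:
        """Fold one queued event into the index (the daemon's work)."""
        seq, path, text = self.pending.popleft()
        for paths in self.index.values():
            paths.discard(path)             # forget the old contents
        for word in text.split():
            self.index.setdefault(word, set()).add(path)
        self.done_seq = seq

    def query(self, word: str, after_seq: Optional[int] = None) -> Set[str]:
        """Without after_seq the answer may be stale; with it, the index
        first catches up to that event, so the caller sees its own write."""
        while after_seq is not None and self.done_seq < after_seq and self.pending:
            self.step()
        return set(self.index.get(word, ()))
```

Only the barrier form is safe as a drop-in for grep or find; the plain form behaves like the lagging search results described above.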

Do you see what I'm getting at?  There's building some nice GUI
and search engine like functionality, where changes made by one
program _eventually_ show up in another (i.e. not synchronously).

That's easy.

And then there's optimising things like grep, find, perl, gcc,
make, httpd, rsync, in a way that's semantically transparent, but
executes faster _as if_ they had recalculated everything they

No, not 3, 4 or 6.  For correct behaviour those require synchronous
query results.  Think about 6, where one important cached query is
"what is the MD5 sum of this file", and another critical one, which
can only work through indexing, is "give me the name of any file whose
MD5 sum matches $A_SPECIFIC_MD5".  Trusting the async results for
those kind of qu...
Cc: Christer Weinigel <christer@...>, Horst von Brand <vonbrand@...>, Adrian Bunk <bunk@...>, Hans Reiser <reiser@...>, <viro@...>, Linus Torvalds <torvalds@...>, Christoph Hellwig <hch@...>, <linux-fsdevel@...>, <linux-kernel@...>, Alexander Lyamin aka FLX <flx@...>, ReiserFS List <reiserfs-list@...>
Date: Monday, September 6, 2004 - 11:55 am

I don't think dnotify/inotify handles subdirectories well yet, so I

The daemon saves state before it shuts down and reloads the state
after a reboot.  You have to make sure that it is started first and
stopped last during the boot process.  How would a kernel plugin
handle things that happen before or after the plugin module has been

Sure you can.  First of all, you can just wait for the daemon to
finish indexing any files that it has been notified about changes in.
This is no different from you having to wait for the kernel to finish
indexing the files.  Or are you suggesting that the kernel should stop


So how do you calculate the MD5 sum of a file that is in the process
of being modified?  It's not possible to do that unless you block all
other access to that file and recalculate the MD5 sum after each
write.  With a notifie