After looking through the code and mailing lists I'm quite unhappy with
a bunch of user-visible changes that Hans sneaked in, which make reiser4
incompatible with other filesystems and have a slight potential to break
things even in the kernel.
o files as directories
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.
- meaning of the -x permission. This one has different meanings on
directories vs files on UNIX systems. If we want to support
directories as files we'll probably have to find a way to work
around this.
- dentry aliasing. I can't find a formal guarantee in the code that this
can't happen.
o metafiles - ..metas as a magic name that's just taken out of the
namespace doesn't sound like a good idea. If we want this it should
be a VFS-level option and there should be a translation layer to
xattrs. Not doing this will again greatly confuse applications that
expect uniform filesystem behaviour.
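The O_DIRECTORY point is easiest to see from the application side. Code that wants "open this only if it is a directory" can fold the type check into the open() call itself, so nothing can swap a directory for a file between a separate stat() and open(). A minimal Python sketch of that pattern follows; the helper name `open_dir_only` is ours, not taken from apache or glibc. On a conventional filesystem a regular file makes the open fail with ENOTDIR, which is exactly the behaviour the complaint says reiser4 loses.

```python
import errno
import os


def open_dir_only(path):
    """Open `path` as a directory in one step, or return None if it is not one.

    Folding the type check into open() avoids the stat()-then-open() race:
    nothing can replace a directory with a file between two separate calls.
    """
    try:
        return os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    except OSError as e:
        if e.errno == errno.ENOTDIR:
            return None  # a regular file: the caller handles it differently
        raise
```

On a filesystem where O_DIRECTORY opens succeed on regular files, `open_dir_only()` would hand back a descriptor for a plain file, and the caller would go on to treat it as a directory.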
Given these problems I request that these interfaces be removed from
reiser4 for the kernel merge and, if added later, be added at the proper
VFS level after discussion on linux-kernel and linux-fsdevel, like we did
for xattrs.
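As a rough illustration of the translation layer suggested above, the magic name could be handled purely by path rewriting: `file/..metas/name` becomes the extended attribute `user.name` on `file`. The mapping, including the choice of the `user.` namespace and the one-level limit, is an assumption made for this sketch, not an existing VFS or reiser4 interface.

```python
METAS = "..metas"  # the magic component name discussed above


def metas_to_xattr(path):
    """Split 'file/..metas/name' into (file, xattr_name), or return None.

    Hypothetical rule: attribute 'name' under a file's ..metas directory
    maps to the extended attribute 'user.name'. Only one level is accepted.
    """
    parts = path.split("/")
    if METAS not in parts:
        return None  # an ordinary path, left untouched
    i = parts.index(METAS)
    rest = parts[i + 1:]
    if len(rest) != 1 or not rest[0] or rest[0] == METAS:
        raise ValueError("expected exactly one attribute name after " + METAS)
    base = "/".join(parts[:i]) or "/"
    return base, "user." + rest[0]
```

A real layer would then pass the pair to something like Linux's getxattr(2) (exposed in Python as `os.getxattr`), so applications that never look for `..metas` would see no difference.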
(Apologies for beating on a dead horse here...) It seems like one confusion here is that everyone has a different idea of what the semantics we're talking about are. I see two main ones: Hans's: A file, a directory, and an attribute are functionally equivalent (except for S_ISxxx and hardlinks). That is, /usr/bin/metas makes sense, and it's not talking about a program called metas. This also means that /foo/metas/metas might exist and needs dealing with. Linus's (I think): A directory is just a directory (no attributes and no read()able data). A file can contain attributes, where attributes can be "file" attributes or "directory" attributes. That is, a file is also a subtree with POSIX-like semantics (except for hardlink stuff). So doing "touch /tmp/foo; cat /tmp/foo/metas" fails, rather than doing something that's probably useless. "touch /tmp/foo; touch /tmp/foo/bar; touch /tmp/foo/bar/baz" fails on the last touch because bar is a file attribute and recursive magic is disallowed. Which one are we talking about? FWIW, I like the latter version a lot better, as it removes a lot of ambiguity. If I see a path like /tmp/foo/metas/uid, it is either a uid attribute (i.e. writing it has security implications) or it is a standard file (i.e. writing it just writes it). But _I can tell which one_ by fstat()ing /tmp/foo! If it's a directory, then I have either a named stream called uid or a genuine file called uid (and I can tell which by fstat()ing /tmp/foo/metas), but if it's a file then I have the magic uid. And I'm guaranteed that there's no other funny business, because /tmp/foo is a "file with a subtree," which means that it's not an attribute. This way I know exactly what I'm dealing with. As an added bonus, we could have O_NOMETAS which means that "files" may not be traversed. Then someone who wants to make sure they get a real file can do it. If recursive files-as-dirs were allowed, that might not do quite what the caller expected. If y...
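The second ("Linus's") semantics can be captured in a small in-memory model, which makes the "I can tell which one by fstat()ing /tmp/foo" argument concrete: the kind of the real object a name hangs off decides whether it is a magic attribute. The kind names and the `lookup()`/`is_magic()` helpers are illustrative only, and the tree passed in is assumed to be internally consistent (attributes only under files or directory attributes).

```python
# Kinds in this model: "dir", "file", "file-attr" and "dir-attr".
# Files may carry attributes; a "dir-attr" (such as metas) may hold further
# attributes; a "file-attr" can never be traversed (no recursive magic).


def lookup(tree, path):
    """Resolve `path` in `tree` (absolute path -> kind) and return its kind."""
    cur, kind = "", "dir"  # start at the root directory
    for name in [p for p in path.split("/") if p]:
        if kind == "file-attr":
            # the "touch /tmp/foo/bar/baz" case: bar is a file attribute
            raise NotADirectoryError(path)
        cur = cur + "/" + name
        kind = tree.get(cur)
        if kind is None:
            raise FileNotFoundError(cur)
    return kind


def is_magic(tree, path):
    """True if `path` names an attribute rather than an ordinary object."""
    return lookup(tree, path).endswith("-attr")
```

With this rule, /tmp/foo/metas/uid is an attribute exactly when /tmp/foo is a regular file, and an ordinary file when /tmp/foo and /tmp/foo/metas are plain directories.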
I had not intended to respond to this because I have nothing positive to say, but Andrew said I needed to respond and suggested I should copy Linus. Sigh. Dear Christoph, Let me see if I can summarize what you and your contingent are saying, and if I misconstrue anything, let me know.;-) You ignored everything I said during the discussion of xattrs about how there is no need to have attributes when you can just have files and directories, and that xattrs reflected a complete ignorance of name space design principles. When I said we should just add some nice optional features to files and directories so that they can do everything that attributes can do if they are used that way, you just didn't get it. You instead went for the quick ugly hack called xattrs. You then got that ugly hack done first, because quick hacks are, well, quick. I then went about doing it the right way for Reiser4, and got DARPA to fund doing it. I was never silent about it. Making files into directories caused only two applications out of the entire OS to notice the change, and that was because of a bug in what error code we returned that we are going to fix. You think that was a disaster; I think it was a triumph. Now a cleanly architected filesystem with no attributes and just files and directories that can do everything attributes are used for exists. You don't want it to have the competitive advantage. Instead, you want it to have its clean design excised until you have something that duplicates it ready to go, and only then should it be allowed that users will use the features of your competitor's filesystem which you disdained implementing for so long. Since you never studied or understood namespace design principles (or you would not have created and supported xattrs), you want to rename it to be called VFS, rewrite what we have done, and take over as the maintainer, mangling its design in a committee clusterfuck as you go. We have just implemented very tri...
Just curious about your comments on Jamie Lokier's suggestions for enabling files-as-directories semantics without breaking existing apps. Chris
I don't want to comment on any of the technical issues about VFS etc. as I would be completely out of my depth, however I do want to say 2 things. Firstly, this is a feature that Samba users have been needing for many years to maintain compatibility with NTFS and Windows clients. Microsoft no longer sell any servers or clients without support for multiple data streams per file, and their latest XP SP2 code *does* use this feature. Whatever the kernel issues, I'm really glad that Hans and Namesys have created something we can use to match this functionality - soon we will need it in order to be able to exist in a Microsoft client-dominated world. My second point is the following. Hans - did you *really* have to reinvent the wheel w.r.t. userspace API calls? Did you look at this work (done in 2001 for Solaris)? http://bama.ua.edu/cgi-bin/man-cgi?fsattr+5 http://bama.ua.edu/cgi-bin/man-cgi?attropen+3C http://bama.ua.edu/cgi-bin/man-cgi?openat+2 I'm complaining here as someone who will have to write portable code to try and work on all these "files with streams" systems. Jeremy.
I agree that your work is important without agreeing that MS client domination will last.;-) It is indeed my desire to give you every single feature you need to emulate MS streams within files, but doing it using directories that are files. I would like to support you in I interviewed for the file system architect job at Sun in, I think, 1999, and they offered me the job conditional on my giving up on my Linux work. (After much trying and failing to convince them that it would be okay for me to work on Linux also, I declined the job, much to my fiscal loss and work satisfaction.) They do not do a pure job of implementing attributes in the file namespace though. There are far more distinctions between files and attributes than are necessary that are described in these man pages below, and those distinctions cause a loss of closure. I can say more
1) how do you back up and restore files with streams inside ? 2) how do standard unix utilities handle them ? -- "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan
To repeat, I think it would be nice to implement a filename/pseudos/backup method for all the plugins. Guys, we just have the beginnings in place. One plugin method at a time it will all fall into place. What we have now is useful now. The more methods come into existence, the more compelling it becomes --- standard network economic theory applies here. Hans
Most likely they don't ;) That is, until they are fixed or replaced. I've heard of people who want xattrs to be backed up and so already use star, not GNU tar. -- mjt
Not good, in my view.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
Actually, in most of the discussion you simply didn't participate. While xattrs might not be the nicest interface, they have the advantage of not breaking the SUS assumption of what directories vs files are, and they do not break the Linux O_DIRECTORY semantics that are defined and need For one thing, _I_ didn't decide about xattrs anyway. And I still haven't seen a design from you on -fsdevel how you try to solve the My competitor's filesystem? If you look at MAINTAINERS, I maintain only vxfs and sysvfs, neither of which I'd suggest anyone to run their system Hans, please stop the personal crap or the black helicopters will kidnap you. When was the last time you actually worked on kernel namespace code instead of talking marketing bullshit and ignoring all real world Could you pass on that crack pipe please?
Hey, files-as-directories are one of my pet things, so I have to side with Hans on this one. I think it just makes sense. A hell of a lot more sense than xattrs, anyway, since it allows scripts and other standard tools to touch the attributes. It's the UNIX way. And yes, the semantics can _easily_ be solved in very unixy ways. One way to solve it is to just realize that a final slash at the end implies pretty strongly that you want to treat it as a directory. So what you do is: - without the slash, a file-as-dir won't open with O_DIRECTORY (ENOTDIR) - with the slash, it won't open _without_ O_DIRECTORY (EISDIR) Problem solved. Very user-friendly, and very intuitive. Will it potentially break something? Sure. Do we care? Me, I'll take that kind of extension _any_ day over xattrs, which are fundamentally flawed in my opinion and totally useless. The argument that applications like "tar" won't understand the file-as-directory thing is _flawed_, since legacy apps won't understand xattrs either. Oh, add an O_NOXATTRS flag to force a path lookup to only use regular directories, the same way we have O_NOFOLLOW and friends. That allows people to see the difference, if they care (i.e. a file server might decide that it doesn't want to expose things like this). I never liked the xattr stuff. It makes little sense, and is totally useless for 99.9999% of everything. I still don't see the point of it, except for samba. Ugly. Linus
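The trailing-slash rule proposed here is small enough to write down as a decision function. This is a model of the proposal only: the name `hybrid_open_error` and the `is_hybrid` parameter are hypothetical, and nothing like this function exists in the kernel. It covers just file-as-directory objects and leaves ordinary files and directories to the normal open() rules.

```python
import errno
import os


def hybrid_open_error(path, flags, is_hybrid):
    """Return the errno the proposed rule would give, or 0 for success.

    For a file-as-directory ("hybrid") object:
      * no trailing slash + O_DIRECTORY  -> ENOTDIR (it behaves as a file)
      * trailing slash without O_DIRECTORY -> EISDIR (it behaves as a dir)
    Returns None for non-hybrid objects, which keep the usual semantics.
    """
    if not is_hybrid:
        return None  # not modeled: normal open() rules apply
    wants_dir = bool(flags & os.O_DIRECTORY)
    slash = path.endswith("/")
    if wants_dir and not slash:
        return errno.ENOTDIR
    if slash and not wants_dir:
        return errno.EISDIR
    return 0
```

The design keeps both views available while making each caller state which one it wants, so an O_DIRECTORY open of a bare name still fails on a hybrid exactly as it would on a regular file.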
Stupid question: who will use it? And why? Anyone can write a userspace library that implements a function set_attribute(char *file, char *attribute, char *value), which creates the directory ".attr/file" in the file's directory and stores the attribute there. (And you can get the list of attributes from the shell too: ls `echo "$filename" |sed 's/\/\([^\/]*\)$/\/\.attr\/\1/'`.) There's no need to add extra functionality to the kernel and filesystem. Advantages: - you don't add bloat to the kernel or filesystem - you don't need to teach tar/cp -a/mc about attributes - you won't lose attributes after editing a file in vim (it creates another The only way xattrs are useful is that backup/restore software doesn't have to know about every filesystem with its specific attributes and every magic ioctl for setting them. Instead it can save/restore filesystem-specific attributes without understanding what they mean. However, there's no reason why applications should use them. And no application does. I can't imagine anyone shipping an application with a "this app requires reiser4" prerequisite. Why should anyone use it if he can store attributes in an ".attr" directory or wherever and make the application work on any OS and any filesystem? Mikulas
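A Python rendering of the userspace library Mikulas describes, for illustration only: `attr_dir()` performs the same rewrite as the sed one-liner ("dir/file" becomes "dir/.attr/file"), `set_attribute()` follows his C signature, and `get_attribute()` and `list_attributes()` are our additions. Error handling and locking are deliberately left out.

```python
import os


def attr_dir(filename):
    """Map 'dir/file' to 'dir/.attr/file', like the sed expression above."""
    head, tail = os.path.split(filename)
    return os.path.join(head, ".attr", tail)


def set_attribute(filename, attribute, value):
    """Store `value` as `attribute` of `filename`, entirely in userspace."""
    d = attr_dir(filename)
    os.makedirs(d, exist_ok=True)
    with open(os.path.join(d, attribute), "w") as f:
        f.write(value)


def get_attribute(filename, attribute):
    with open(os.path.join(attr_dir(filename), attribute)) as f:
        return f.read()


def list_attributes(filename):
    """Equivalent of the ls/sed one-liner: attribute names, sorted."""
    try:
        return sorted(os.listdir(attr_dir(filename)))
    except FileNotFoundError:
        return []
```

Nothing here ties the `.attr` entry to the file itself; the association exists only by naming convention.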
..and the above is, roughly, what I understand samba etc falls back on. The problem ends up being that the above isn't in any way safe from people moving files around (oops, where did those attributes go?) nor does it have any consistency guarantees. So it only works well if _one_ application does this, and that application follows all the locking rules. If no application does, then why back them up? Why implement them in the first place? In other words - some apps obviously do want to use them. Sadly. Linus
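The "where did those attributes go?" problem can be shown directly with the same hypothetical `.attr` layout: an ordinary rename(2), as mv performs within one filesystem, moves the file and leaves its side directory behind under the old name. A sketch only; `rename_plain()` is an illustrative name.

```python
import os


def attr_dir(filename):
    """Hypothetical '.attr' layout: 'dir/file' keeps attributes in 'dir/.attr/file'."""
    head, tail = os.path.split(filename)
    return os.path.join(head, ".attr", tail)


def rename_plain(src, dst):
    """Rename the way existing tools do, via rename(2), and report whether
    the side directory holding the attributes followed the file."""
    os.rename(src, dst)
    return os.path.isdir(attr_dir(dst))
```

Every tool that renames, copies or replaces files would have to know about the convention for the attributes to stay attached, which is why it only works when a single cooperating application manages them.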
You can add more functionality to a filesystem and use xattrs to control it. For example: - ACLs - compress file - encrypt file (copy user's password into task_struct and use it to encrypt his files) - preallocate file in 4MB contiguous chunks, because it needs real-time multimedia access - sync/append-only/immutable etc. However, there's no reason why an application should care whether the file is compressed, whether it has ACLs, or so. And applications don't. And I think this is the only legitimate use for xattrs. Who else uses them except samba? I don't see how reiser4's hybrids would help. Mikulas
Streams are quite ugly. However, if you decompose streams into all of the little pieces that are needed to emulate them, the pieces are quite nice. For instance, inheriting stat data from a common parent is nice, and inheritance is nice, and being able to cat dirname/pseudos/cat and get a concatenation of all of the files is nice, and being able to cat dirname/pseudos/tar and get an archive of the directory is nice, and, well, if you decompose all of the features of streams into little features you get a bunch of fun little features much nicer than streams. Hans
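Two of the features Hans lists, dirname/pseudos/cat and dirname/pseudos/tar, can be approximated in userspace, which is roughly the argument made in the replies that follow. The sketch below is such an approximation, not reiser4's implementation; the function names, the sort order, and skipping subdirectories in `pseudo_cat()` are all assumptions.

```python
import io
import os
import tarfile


def pseudo_cat(dirname):
    """Concatenate the regular files directly inside `dirname`, sorted by name."""
    out = bytearray()
    for name in sorted(os.listdir(dirname)):
        path = os.path.join(dirname, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                out += f.read()
    return bytes(out)


def pseudo_tar(dirname):
    """Return an uncompressed tar archive of `dirname` as bytes."""
    buf = io.BytesIO()
    base = os.path.basename(os.path.normpath(dirname))
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(dirname, arcname=base)  # recursive by default
    return buf.getvalue()
```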
Yes. Being able to cd into filename.tar.gz and filename.iso is also nice, but all of these features should be supported by the VFS generically, not in any specific filesystem, and there should be a hook to invoke the various fun filesystem-independent handlers by name. -- Jamie
It doesn't belong in the kernel at all. If anywhere, it belongs in a userspace filesystem, but even in that case the magic detection of which handler to use is kinda hard. You absolutely don't want to hardcode file formats in the kernel.
Do you mean a user-level file system as a VFS handled by user applications, or an intermediate file system layer between any application and the real file system? The latter would be good enough as it would still be transparent to the applications. ~S
Oh, I agree userspace should be involved. -- Jamie
I've got a stupid question too. How do you back up these things ? If your backup program reads them as a file and restores them as a file, you might lose your directory-inside-the-file magic. If your backup program dives into the file despite stat() saying it's a file and you restore your backup, how are the "file is a file" semantics preserved ? Obviously this is something that needs to be sorted out at the VFS layer. A filesystem specific backup and restore program isn't desirable, if only because then there'd be no way for Hans's users to switch to reiser5 in 2010 ;)
It needs to be sorted out, whether it is sorted out at the VFS layer is It might be that we need a filenameA/metas/backup method for all of our file plugins, which if cat'd gives a set of instructions which if executed are adequate for restoring filenameA. Hans
On 2004-08-26T01:40:29,
So what exactly is wrong with sorting it out at the VFS layer, and why
do you _insist_ on sorting it out in the reiser4 core? I'm missing
something, please fill me in on the details.
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering \\\ ///
SUSE Labs, Research and Development \honk/
SUSE LINUX AG - A Novell company \\//
Sure, this sort of thing must be sorted out at the VFS layer. And a backup program working on such a filesystem will need to know that something can be a file, a directory - or both. So an old "tar" won't get this right as it will assume that an object is either a file or a directory. The change to get it right won't be that big - just notice that an object is both, then back up the ordinary file contents as usual, before recursing into the directory it also provides and backing up the stuff there as usual. The resulting .tar can of course only be unpacked properly on a fs supporting file-as-directory, similar to how a .tar of a fs with links will only unpack properly on a fs supporting links. I don't see many problems for userland. Old apps will keep working, as the new features are a superset. Those who care about file-as-directory extras will provide patches for "tar" and friends; after that the extras become usable. Helge Hafting
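Helge's proposed change to tar can be modeled on an in-memory tree in which a node has file contents, directory entries, or both. `backup()` emits the file contents first and then recurses into the directory side, as he describes; `legacy_backup()` stands in for an old tool that assumes an object is one or the other. The node layout and the record format are invented for this sketch.

```python
# A node is a dict with an optional "data" (bytes) key, an optional
# "children" (name -> node) key, or both for a file-as-directory.


def backup(node, path="."):
    """Records (path, kind, payload): file contents, then the directory side."""
    records = []
    if "data" in node:
        records.append((path, "file", node["data"]))
    if "children" in node:
        records.append((path, "dir", None))
        for name in sorted(node["children"]):
            records.extend(backup(node["children"][name], path + "/" + name))
    return records


def legacy_backup(node, path="."):
    """An old tool's view: anything with contents is just a file."""
    if "data" in node:
        return [(path, "file", node["data"])]
    records = [(path, "dir", None)]
    for name in sorted(node.get("children", {})):
        records.extend(legacy_backup(node["children"][name], path + "/" + name))
    return records
```

Run on a hybrid node, `legacy_backup()` silently drops everything below it, which is the failure mode raised in the next message.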
There are many backup apps, not just one. I've written a few myself,
none of which will ever be worthy of notice. The sourceforge
Topic.System.Archiving.Backup lists 335 projects at present.
I find the idea that most backup tools and scripts will silently
stop working correctly to be pretty scary.
And then there's archiving, installation, distribution, administration,
emulation, file system and partition managers, and on and on.
===
I wonder if we can make this "modal" somehow.
The one consistency I see is that apps that want the "enhanced" view
need to ask for it, somehow. It is the new views of the data that are
being added - let the app announce to the kernel (usually via
specialized code in some shared library that the app is using to get the
alternate views) that either per-task, or per-file descriptor, it is to
see the "enhanced" view, as a side effect of trying to access it.
Old stuff, or even new stuff that is content to work with the "classic"
view that a file is a single data stream, and that directories only
have pathnames, not data, would by default see that view, and see
_all_ the data, presented somehow in that view, perhaps as additional
files with magic names.
This still leaves the breakage that such tools don't know, and don't
preserve, the magic linkage between such magic files. But that is
much less of an issue, in my view. Programs such as backups that are
manipulating the files of apps they know nothing about already have
to presume that all the files are important in inscrutable ways, and
just be careful to preserve or copy or backup all of them.
Yeah - I realize that there will be a few followups denouncing modal
architectures. I might even agree with some of them.
If this were easy, it would have been done years ago.
The onus should be on the new stuff to request the enhanced view,
rather than on the old dogs to learn new tricks.
They won't stop working, they will merely not support the new features. That is only a problem if you actually use those features. If all you have is plain files and plain directories - no problem. Not if files-as-directories are implemented right. Helge Hafting
Too late. We have xattrs already; many programs don't store them. -- Jamie
> Too late. We have xattrs already; many programs don't store them.
If by your "too late" you mean we can stop worrying about any more
breakage of file system utilities, because there exists an example
in which some were already broken, then you are absolutely wrong.
Just because we caused some breakage doesn't give us license to cause
even more.
Encode the magic in the names, by stealing a bit of the existing
filename space to encode it.
Such works pretty well as part of the magic to map long filenames
into DOS 8.3 names on my FAT partitions.
Apps linked with the appropriate Windows library see nice fancy
long names.
The rest of the world, including DOS apps and my Unix backup
scripts, see the primitive 8.3 names, including one or a few
extra files per directory, which are nothing special to them.
So long as these other apps don't presume to know that they can
keep some of the files in an app's directory, and drop others, then
it works well enough. And no self-respecting general purpose
backup program is going to presume such knowledge anyway.
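Paul's name-encoding idea can be sketched as a pair of functions: one flattens a file plus its streams into plain names that any old tool can copy, and one rebuilds the enhanced view afterwards. The reserved "::" marker is an arbitrary choice for illustration (it plays the role of the "bit of the existing filename space" being stolen), not a convention any Linux filesystem uses.

```python
MARK = "::"  # hypothetical reserved sequence, forbidden in ordinary names


def to_classic(files):
    """Flatten {name: [stream, ...]} into plain names old tools can copy."""
    names = []
    for name, streams in files.items():
        if MARK in name:
            raise ValueError("name uses the reserved marker: " + name)
        names.append(name)
        names.extend(name + MARK + s for s in streams)
    return sorted(names)


def from_classic(names):
    """Rebuild {name: [stream, ...]} from a flat listing, e.g. after a restore."""
    files = {}
    for n in sorted(names):
        base, sep, stream = n.partition(MARK)
        streams = files.setdefault(base, [])
        if sep:
            streams.append(stream)
    return files
```

As in the 8.3 analogy, a backup tool that simply copies every name preserves the streams without understanding them; only tools that drop some entries of a directory break the linkage.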
I think we should require people to care enough to supply an O_NOMETAS
I just want to add that I AM capable of working with the other filesystem developers in a team-player way, and I am happy to cooperate with making portions more reusable where there is serious interest from other filesystems in that, but Christoph is a puppy who has never written or designed a major filesystem from scratch, and Nikita is a big dog who has written stuff very few projects are lucky enough to see the likes of, and when Christoph insults Nikita's code, or my design guidance for that code, it is not going to bring out my good side. The plugin and metafiles code needs many improvements, but Christoph does not have the expertise to understand what those needed improvements are because he hasn't invested the work into understanding the code. Christoph is a bright and clever young fellow, who just hasn't had the years of study of the field yet. I wish him well, and away.;-) Hans
It's not Christoph who's shown more bark than bite in this thread.
The bite is at www.namesys.com/download.html Hans
I see this as an opportunity to share some of your experience with Christoph and many others, and to work on getting the semantics into the VFS. Please make this work :) -- mjt
Prove it. Stop replying for today and come back tomorrow with some useful discussions. Christoph suggested that some of the v4 semantics belong in the VFS and therefore in Linux as a whole. He's helping you make sure the semantics fit nicely with the rest of the kernel interfaces and are race-free. Take him up on the offer. -chris
If I may chime in here... This is an issue that directly affects work I am doing in extended cryptfs: http://www.linuxsymposium.org/2004/view_abstract.php?content_key=55 http://halcrow.us/~mhalcrow/ols2004.pdf http://halcrow.us/~mhalcrow/ols_cryptfs.sxi The basic idea is that the cryptographic context for every file is correlated with the individual file via xattrs. A file is a unit of data that should, as it stands, contain all the information requisite for the encrypting filesystem layer to transparently decrypt (and encrypt, when the file is written to). This allows for a key->file granularity, as opposed to a key->block device (dm-crypt) or a key->mount point (CFS) granularity. My grand vision is to have a policy that determines whether or not the encrypted version of the file or the decrypted version of the file is read, dependent on whether or not the file is leaving the security domain (the storage device under the control of the currently running kernel). For example, if the ``cp'' command is copying a file from a filesystem mounted from /dev/hda1 to a filesystem mounted from /dev/fd0, then the policy would indicate that (unless otherwise noted in the .cryptfsrc file in the root of the filesystem mounted from /dev/fd0, which might also contain the default security context for that filesystem or directory - like whose public keys should be used to encrypt the symmetric key for data) the file is leaving the security domain, and the encrypted contents of the file should be given to cp. Same with mutt reading an email attachment (as opposed to, say, .muttrc, where, more likely than not, the unencrypted version is wanted). The goal is to enable an ``encrypted by default'' policy, in which files on the storage devices are independent encrypted units that remain encrypted until an application that actually needs to see the decrypted contents opens them. Then the encryption and decryption is done transparently by the fs layer, as long as the user has th...
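A toy version of the "is the file leaving the security domain?" test, approximating the security domain by the device number that stat() reports. The function names, the `trusted_devs` parameter (standing in for a .cryptfsrc override) and the device-based rule itself are assumptions for illustration, not part of the cryptfs design described above.

```python
import os


def leaving_domain(src_path, dst_dir):
    """True if copying src_path into dst_dir crosses to another device."""
    return os.stat(src_path).st_dev != os.stat(dst_dir).st_dev


def wants_ciphertext(src_path, dst_dir, trusted_devs=frozenset()):
    """Hypothetical policy: hand out encrypted bytes when data leaves its
    device, unless the destination device is explicitly trusted."""
    if not leaving_domain(src_path, dst_dir):
        return False
    return os.stat(dst_dir).st_dev not in trusted_devs
```

A real implementation would need the kernel to know the destination of a read, which a plain read() does not reveal; the sketch only captures the decision once that is known.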
Reiser4 has an encryption plugin that will ship sometime this year. You might want to talk to edward@namesys.com about it.
I thought the UNIX way is "everything's a file", not "everything's a There's always the option that they're both broken. -- Mathematics is the supreme nostalgia of our time.
It really was. Directories were historically largely just files too, although with the special "lookup" operation. Historic unix didn't have readdir/rmdir/mkdir/rename or really much _any_ special directory handling. Directories were just files, and you read them like files. Of course, even in that early unix, "directories" were very much a reality even apart from the fact that they happened to be implemented pretty much like files. Nobody has ever claimed that the UNIX way is Yes. Highly likely. However, something like that _does_ end up what a Windows fileserver wants. IOW, even if it's broken, _something_ is likely forced on us by that nasty thing we call "real users". Damn them. Linus
That would solve the O_DIRECTORY issue, the dentry aliasing still needs work though with the semantics for link/unlink/rename. Maybe Hans & you should start 2.7 to work this out? :)
Not if you allow link(2) on them. And not if you design and market your stuff as a general-purpose backdoor into kernel. Note how *EVERY* *DAMN* *OPERATION* is made possible to override by "plugins". Which is the reason for deadlocks in question, BTW. Don't fool yourself - that's what Hans is selling. Target market: ISV. Marketed product: a set of hooks, the wider the better, no matter how little sense it makes. The reason for doing that outside of core kernel: bypassing any review and being able to control the product being sold (see above). Shame that it got an actual filesystem mixed in with the marketing plans and general insanity...
Heh. I don't think that's a very strong argument against being "unixy", considering how traditional unix _used_ to handle directories. mkdir/rmdir/rename only came later. Now, obviously they did come later for a good reason, but still.. The interesting part is that thanks to the dcache, we should be perfectly able to actually _see_ circular links etc, so some of the problems with linking directories should actually be quite solvable - something that is _not_ true for a traditional UNIX VFS layer. Of course, the dcache introduces some new problems of its own wrt directory aliasing, but I don't actually think that should be fundamental either. Treating them more as a "static mountpoint" from an FS angle and less as a traditional Unix hardlink should be doable, I'd have thought. (Also, it's entirely possible that the filesystem may not support some of the more esoteric linking/renaming operations. For example, in a traditional xattrs setup where the xattr is linked on-disk with the file it is associated with, you simply _can't_ link it somewhere else, or rename it to any other directory. That's not a VFS layer issue, obviously, but I thought I'd bring up the point that file-as-dir cases may have Now that's a separate argument, and not one I'm personally interested in arguing at least right now. I haven't actually looked at the reiser4 code, so I'm really _only_ arguing against special-case attributes. Linus
Yeah, if we ditch the "mountpoints are busy and untouchable" stuff. Which I'd love to, but it's a hell of a visible (and admin-visible) change. FWIW, current deadlocks are unrelated to actual operation succeeding. Look: we have sys_link() making sure that parent of target is a directory (PATH_LOOKUP, in a "it has ->lookup()" sense), then locking target's parent, then checking that it has ->link() (everyone on reiser4 does) and then checking that source (old link to file) is *not* a directory (in S_ISDIR sense). Then we lock source. Note that currently it's OK - we get "all non-directories are always locked after all directories". With filesystem that provides hybrid objects with non-NULL ->link() it's not true and we are in deadlock country. Before we get anywhere near fs code. I'm not saying that this particular instance is hard to fix, but it wasn't even looked at. All it would take is checking the description of current locking scheme and looking through the proof of correctness (present in the tree). That's the first point where said proof breaks if we have hybrids. And it's what, about 4 screenfuls of text? I have no problems with discussing such stuff and no problems with having it merged if it actually works. But let's start with something better than "let's hope nothing breaks if we just add such objects and do nothing else, 'cause hybrid files/directories are good, mmmkay?"
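The ordering argument can be checked mechanically. The toy model below follows the sys_link() steps described above: lock the target's parent if it has ->lookup(), refuse S_ISDIR sources, then lock the source. With ordinary objects every parent locked this way is a directory and every source is not, so "directories before non-directories" holds. A hybrid passes both tests, so two concurrent link() calls can lock the same pair in opposite orders. The object flags and function names are hypothetical, and the real VFS path has more steps than this.

```python
def link_lock_order(src, dst_parent):
    """Return the order in which the modeled sys_link() takes its locks."""
    if not dst_parent["has_lookup"]:
        raise NotADirectoryError(dst_parent["name"])
    if src["is_dir"]:  # S_ISDIR sources are refused (EPERM)
        raise PermissionError(src["name"])
    return [dst_parent["name"], src["name"]]


def has_deadlock(orders):
    """True if the per-call lock orders contain an AB/BA style cycle."""
    after = {}
    for seq in orders:
        for a, b in zip(seq, seq[1:]):
            after.setdefault(a, set()).add(b)

    state = {}  # node -> "active" while on the DFS stack, then "done"

    def visit(n):
        state[n] = "active"
        for m in after.get(n, ()):
            if state.get(m) == "active":
                return True  # back edge: a lock-order cycle
            if m not in state and visit(m):
                return True
        state[n] = "done"
        return False

    return any(n not in state and visit(n) for n in list(after))
```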
This message suggests a way to extend the VFS safe locking rules to include files-as-directories. Is this a problem if we treat entering a file-as-directory as crossing a mount point (i.e. like auto-mounting)? Simply doing a path walk would lock the file and then cross the mount point to a directory. A way to ensure that this preserves the lock order is to require that the metadata is in a different filesystem to its file (i.e. not crossing a bind mount to the same filesystem). That has the side effect of preventing hard links between metadata files and non-metadata, which in my opinion is fine. Path walking will lock the file, and then lock the directory on a different filesystem. Lock order is still safe, provided a strict order is maintained between the two filesystems. The strict order is ensured by preventing bind mounts which create a path cycle containing a file->metadata edge. One way to ensure that is to prevent mounts on the metadata filesystems, but the rule doesn't have to be that strict. This condition only needs to be checked in the mount() syscall. -- Jamie
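Jamie's mount()-time condition can be sketched as a graph check: filesystems are nodes, file-to-metadata relations and mounts are edges, and a new mount is refused if it would close a cycle through a metadata edge. The edge representation and the filesystem names are invented for the sketch; it checks the condition as stated, not any real kernel code.

```python
def reachable(edges, start, goal):
    """Depth-first search over (src, dst, kind) edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(dst for src, dst, _kind in edges if src == node)
    return False


def mount_allowed(edges, new_edge):
    """Refuse a mount that closes a cycle through a file->metadata edge.

    `edges` lists (from_fs, to_fs, kind) with kind "meta" (entering a
    file's metadata) or "mount" (a mount or bind mount).
    """
    trial = list(edges) + [new_edge]
    return not any(reachable(trial, dst, src)
                   for src, dst, kind in trial if kind == "meta")
```

Because the check runs only when the mount table changes, path walks themselves never need to look for cycles.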
Yes - mountpoints can't be e.g. unlinked. Moreover, having directory *Ugh* What would happen if you open that directory or chdir there? If it's You really don't want to lock mountpoint on path lookup, so I don't see how that would be relevant - it's a hell to clean up, for one thing (I've crossed ten mountpoints on the way, when do I unlock them and how do I prevent deadlocks from that?) Besides, different namespaces can have completely different mount trees, so tracking down all that stuff would be hell in its own right. The main issue I see with all schemes in that direction (and something like that could be made workable) is the semantics of unlink() on mountpoints. *Especially* with users being able to see attributes of files they do not own (e.g. reiser4 mode/uid/gid stuff). Ability to pin down any damn file on the system and make it impossible to replace is not something you want to give to any user.
Ok, so can we make it so mountpoints can be unlinked? :) The mount would continue to exist, but with no name, until its last I think the underlying file does not stay locked, and once you've entered it as a directory, it can be unlinked. If you have the directory open or chdir into it, then it _may_ have the effect of keeping the file's storage allocated when you unlink it -- just like when a file is unlinked while opened. As that is not a user-visible property, it's a filesystem-specific implementation detail as to whether it keeps the file's data in existence while the I didn't mean locking a chain of mountpoints, I meant the temporary state where two dentries and/or inodes are locked, parent and child, during a path walk. However I'm not very familiar with that part of the VFS and I see that the current RCU dcache might not lock that much I agree, users shouldn't be able to pin down a file. I think unlink() should succeed on a file while something is visiting inside its metadata directory. It's a filesystem quality-of-implementation feature whether that actually releases the file's data. It's a desirable feature because one user shouldn't be able to pin another user's quota'd data if they don't have permission to open the file, but if it's not implemented by a filesystem then it doesn't break anything fundamental. It's a semantics question whether unlinking a file makes the metadata (i.e. "uid", "mode", "content-type" etc.) disappear at the same time, or if the metadata stays around until the last visitor leaves it. A filesystem might be able to keep the metadata in existence even if it deletes the file's storage on unlink(), but it would be nice for the VFS to declare which semantic is preferred. One of the big potential uses for file-as-directory is to go inside archive files, ELF files, .iso files and so on in a convenient way. In those cases, if you open one of the virtually generated "archive content" files, then you might expect the data to continue t...
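The existing behaviour Jamie appeals to, unlinking a file while it is open, is easy to demonstrate: the name goes away immediately, while the data stays reachable through the open descriptor until the last reference is dropped. This is standard POSIX behaviour, shown here in Python on an ordinary file; whether visitors inside a file's metadata should follow the same model is the open question above.

```python
import os


def read_after_unlink(path):
    """Open `path`, unlink it, then read through the still-open descriptor.

    Returns (name_is_gone, data_read_after_unlink).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.unlink(path)
        gone = not os.path.exists(path)
        data = os.read(fd, 1 << 16)  # the inode is still alive via fd
    finally:
        os.close(fd)  # last reference: storage can now be released
    return gone, data
```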
Yes, this was part of the plan, tar file-directory plugins would be cute.
Question: Is "cat /foo/bar/baz.tar.gz/metas" the attribute directory or a directory in the tarball named "metas"? Joel -- "I'm so tired of being tired, Sure as night will follow day. Most things I worry about Never happen anyway." Joel Becker Senior Member of Technical Staff Oracle Corporation E-mail: joel.becker@oracle.com Phone: (650) 506-8127
This has been fought over on the reiserfs-list ad nauseam, but it's a valid point. That's why I tend to rename metas/ to ..metas/ to avoid name clashes, even though I've never had a directory named metas/ apart from what Reiser4 ships. It is then debatable whether it should be renamed before it's too late, made renamable in the kernel config (with the name exported via /sys or something), or just left be. -- mjt
http://packages.debian.org/cgi-bin/search_contents.pl?word=metas&searchmode=searchfilesanddirs&case=insensitive&version=unstable&arch=i386

OK, those are capital METAS rather than junior metas, but it does show this is not a unique word to reiser4.
-- 
"Next the statesmen will invent cheap lies, putting the blame upon the nation that is attacked, and every man will be glad of those conscience-soothing falsities, and will diligently study them, and refuse to examine any refutations of them; and thus he will by and by convince himself that the war is just, and will thank God for the better sleep he enjoys after this process of grotesque self-deception." -- Mark Twain
This needs to be designed.
Perhaps /foo/bar/baz.tar.gz/tar/metas is the directory in the tarball
named "metas".
Or perhaps /foo/bar/baz.tar.gz/x/metas is: it's independent of archive
format, and I personally tend to extract things into a directory
called "x". [*]
Or perhaps /foo/bar/baz.tar.gz/metas is, and the attribute directory
is /foo/bar/baz.tar.gz/../metas, to be perverse ;)
I prefer the second one, ("x/metas"), but not with any conviction.
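A rough sketch of how a lookup could disambiguate under the second option ("x/metas"), together with the "x" symlink rule from the footnote. Everything here is hypothetical: the reserved names, the function names, and the assumption that an archive plugin sees the path components that follow the archive file.

```python
from typing import List, Tuple

ATTR_DIR = "metas"   # attribute directory name as shipped by reiser4
ARCHIVE_DIR = "x"    # hypothetical: archive contents live under <archive>/x/


def classify(components: List[str]) -> Tuple[str, List[str]]:
    """Classify the path components that follow an archive file.

    <archive>/metas/...  -> the file's attributes
    <archive>/x/...      -> members of the archive, so a tarball member
                            called "metas" is <archive>/x/metas and is never
                            confused with the attribute directory
    """
    if components and components[0] == ATTR_DIR:
        return "attributes", components[1:]
    if components and components[0] == ARCHIVE_DIR:
        return "archive", components[1:]
    return "not-found", components


def x_target(member_names: List[str]) -> str:
    """Footnote rule: "x" points at the single top-level directory if the
    archive has exactly one, otherwise at the archive root ("content/")."""
    parts = [n.strip("/").split("/") for n in member_names if n.strip("/")]
    tops = {p[0] for p in parts}
    # With one top-level name, it is a directory if anything lives below it.
    if len(tops) == 1 and any(len(p) > 1 for p in parts):
        return "content/" + tops.pop() + "/"
    return "content/"
```

For example, `x_target(["baz-0.01/Makefile", "baz-0.01/src/main.c"])` returns `"content/baz-0.01/"`, while an archive with several top-level entries gets `"content/"`.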
-- Jamie
[*] Actually I prefer:
/foo/bar/baz.tar.gz/content/metas
/foo/bar/baz-0.01.tar.gz/content/baz-0.01/metas
Archives always in "content". One layer of decompression
always tried for .tar files and other uncompressed archive
formats.
/foo/bar/baz.tar.gz/x -> content/
/foo/bar/baz-0.01.tar.gz/x -> content/baz-0.01/
If the root of the archive contains a single directory, "x"
is a symlink to it. Otherwise "x" is a symlink to the root
directory of the archive. This is comfortable with the
common practice by which archives are distributed, without
making a mess when someone forgets to put everything in a
top-level directory.

Silly question:
GNU Midnight Commander has for ages allowed you to go into e.g. tar files, so I
know the benefits of this. Additionally, in GNU Midnight Commander, this
works no matter which file system I use (e.g. it works on iso9660), and
it even works the same way on other OS's like e.g. Solaris and NetBSD.
What is the technical reason why a tar plugin should be reiser4
specific, instead of a generic VFS or userspace solution that would
allow the same also on other fs like e.g. iso9660?
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
It should be a generic VFS plugin, not reiser4 or userspace. The VFS plugin should call out to userspace for most actions (except handling cached data), and it should take advantage of special reiser4 features for storage and performance optimisations. But it should still work over a standard filesystem when those special features aren't available. I guess FUSE and many earlier projects are heading in this direction.

A generic userspace solution doesn't let you "cd" into a tar file from all programs like you can inside Midnight Commander. Gnome and KDE take the approach that every userspace system call should be intercepted and filtered, to create the illusion of virtual data. As a result, different programs see different virtual data: you can't just cut and paste a path from Gnome or KDE into any other program.

It's not just a "social problem of libraries" thing: sometimes I have programs which don't link to libc. Sometimes I have programs which mustn't link with anything that calls malloc(). It'd be silly for them to have a different view of the filesystem just because they can't link with some userspace library.

The Gnome/KDE/Midnight Commander pure userspace solution is silly: if _every_ program in the system should get the same view, it makes much more sense for the kernel to filter the system calls and redirect the virtual accesses to a userspace daemon, while keeping the real accesses at full speed. Furthermore, it makes much more sense for the kernel's page cache to hold those uncompressed pages than for every userspace application to try to cooperatively manage a cache of uncompressed fragments in the most inefficient way.

There's another problem with the Midnight Commander approach. If I "cd" into a tar file, and then a program writes to the tar file, I don't always see the changes straight away. The two views aren't coherent. This isn't an easy problem to solve, but it should be solved. When a simple "cd" into .tar.gz or .iso is implemented prope...
Jamie Lokier <jamie@shareable.org> said: Nonsense. The .iso or .tar or whatever would have to be kept un-isoed or un-tarred in memory (or in an on-disk cache) for this to be true, and that takes quite a long time. Each time you want to peek anew at linux/Makefile, the whole tarfile will have to be read and stored somewhere, and that is just too slow for my taste. The .tar format is optimized for compact storage; the on-disk format of a filesystem is optimized for fast access and modifiability. Now go ahead and enlarge a file on your .iso/.tar a bit... it will take ages to rebuild the whole thing. There is a _reason_ why there are filesystems and archives, and they use different formats. If it weren't so, everybody and Aunt Tillie would just carry .ext3's around, and would

Perpetuum mobile is a nice idea too.
-- 
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
I'm going to explain why filesystem support for .tar.gz or other
"document container"-like formats is useful. This does _not_ mean tar
in the kernel (I know someone who can't read will think that if I
don't say it isn't); it does mean hooks in the kernel for maintaining
coherency between different views and filesystem support for caching.
The vision I'm going for here is:
1. OpenOffice and similar programs store compound documents in
standard file formats, such as .tar.gz, compressed XML and such.
Fs support can reduce CPU time handling these sorts of files, as
I explain below, while still working with the standard file formats.
With appropriate userspace support, programs can be written which
have access to the capabilities on all platforms, but reduced CPU
time occurs only on platforms with the fs support.
2. Real-time indexing and local search engine tools. This isn't
just things like local Google; it's also your MP3 player scanning
for titles & artists, your email program scanning for subject
lines to display the summary fast, your blog server caching built
pages, your programming environment scanning for tags, and your
file transfer program scanning for shared deltas to reduce bandwidth.
I won't explain how these work as it would make this mail too
long. It should be clear to anyone who thinks about it why the
coherency mechanism is essential for real-time, and a consistent
interface to container internals helps with performance.
Wrong. "So long as it remains in the on-disk cache" means each time
you peek at linux/Makefile, the tarfile is _not_ read.
For a tarfile it's slow the first time, and when it falls out of the
on-disk cache, otherwise, for component files you are using regularly
(even over a long time) it's as fast as reading a plain file.
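The "slow the first time, then as fast as a plain file" behaviour can be sketched in userspace: extract a member once and serve later reads from a cache keyed on the archive's (mtime, size). This is a hypothetical illustration using Python's standard `tarfile` module, not the kernel-side mechanism under discussion; `MemberCache` and `build_archive` are invented names.

```python
import io
import os
import tarfile
import tempfile
from typing import Dict, Tuple


class MemberCache:
    """Serve tar members from memory while the archive looks unchanged."""

    def __init__(self) -> None:
        # (archive, member) -> ((st_mtime_ns, st_size), member bytes)
        self._entries: Dict[Tuple[str, str], Tuple[Tuple[int, int], bytes]] = {}
        self.archive_reads = 0  # how often the tarball was actually parsed

    def read(self, archive: str, member: str) -> bytes:
        st = os.stat(archive)
        stamp = (st.st_mtime_ns, st.st_size)
        hit = self._entries.get((archive, member))
        if hit is not None and hit[0] == stamp:
            return hit[1]                    # cached: no decompression at all
        self.archive_reads += 1
        with tarfile.open(archive) as tar:   # default mode "r" auto-detects gzip
            f = tar.extractfile(member)      # KeyError if the member is absent
            if f is None:
                raise ValueError(f"{member} is not a regular file")
            data = f.read()
        self._entries[(archive, member)] = (stamp, data)
        return data


def build_archive(path: str, files: Dict[str, bytes]) -> None:
    """Write a .tar.gz containing the given members (demo helper)."""
    with tarfile.open(path, "w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "linux.tar.gz")
        build_archive(path, {"linux/Makefile": b"VERSION = 2\n"})
        cache = MemberCache()
        for _ in range(3):
            cache.read(path, "linux/Makefile")
        print(cache.archive_reads)  # prints 1
```

A rewrite that lands within the same timestamp tick and keeps the same size would be missed, so a stamp like this is a hint for invalidation, not a coherency guarantee.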
You obviously know this, as you mentioned on-disk cache in the reply,
so I infer from the rest of your mail that what you're try...

Nobody disagrees there (I think); the disagreement is on whether "coherency" and "different views" imply atomic transactions, and being able to tell that an access to the file requires updating random junk about it. It requires being able to guess whether it is worth updating now (given that the file might be modified a dozen times more before the junk

And they are doing fine AFAICS. Besides, they won't exactly jump on the possibility of leaving behind all other OSes on which they run to become a

I don't buy this one. A tar.gz must be uncompressed and unpacked, and

Userspace support isn't there on any of the platforms right now; if it ever arrives, it will be a strange-Linux-installation thing for quite some time to come. Not

Sure! Gimme the CPU power and disk throughput for that, pretty please. [No,

With no description of how this is supposed to work, this is pure science

Coherency is essential, but it isn't free. Far from it. The easiest way of getting coherency is having _one_ authoritative source. That way you don't need coherency, and don't pay for it. Anything in this class must by force be just hints, to be recomputed at a moment's notice. I.e., let the application that might use it check and recompute as needed.
-- 
Dr. Horst H. von Brand
Can be done with dnotify/inotify and a cache daemon keeping track of mtime. Yes, this will need a kernel change to make sure mtime always

And so on.

/Christer
-- 
"Just how much can I get away with and still go to heaven?"
Freelance consultant specializing in device driver programming for Linux
Christer Weinigel <christer@weinigel.se> http://www.weinigel.se
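The restart path of such a cache daemon, as a sketch: it persists an (mtime, size) stamp per tracked file and, after coming back up, stats every tracked file to find entries that went stale while it was down. The function names are hypothetical, and the sketch only detects changed or deleted files; files created during the downtime would need a directory scan on top. Its cost is one stat() per tracked file.

```python
import os
from typing import Dict, Iterable, List, Tuple

Stamp = Tuple[int, int]  # (st_mtime_ns, st_size)


def snapshot(paths: Iterable[str]) -> Dict[str, Stamp]:
    """State the daemon would persist at shutdown."""
    saved = {}
    for p in paths:
        st = os.stat(p)
        saved[p] = (st.st_mtime_ns, st.st_size)
    return saved


def stale_after_restart(saved: Dict[str, Stamp]) -> List[str]:
    """Paths whose index entries can no longer be trusted after downtime."""
    stale = []
    for p, stamp in saved.items():   # one stat() per tracked file
        try:
            st = os.stat(p)
        except FileNotFoundError:
            stale.append(p)          # deleted while the daemon was down
            continue
        if (st.st_mtime_ns, st.st_size) != stamp:
            stale.append(p)          # changed while the daemon was down
    return sorted(stale)
```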
- Can the daemon keep track of _every_ file on my disk like this?
That's more than a million files, and about 10^5 directories.
dnotify would require the daemon to open all the directories.
I'm not sure what inotify offers.
- What happens at reboot - I guess the daemon has to call stat()
on every file to verify its indexes? Have you any idea how long
it takes to call stat() on every file in my home directory?
- The ordering problem: I write to a file, then the program
returns. System is very busy compiling. 2 minutes later, I
execute a search query. The file I wrote two minutes ago doesn't
appear in the search results. What's wrong?
Due to scheduling, the daemon hasn't caught up yet. Ok, we can
accept that's just hard life. Sometimes it takes a while for
something I write to appear in search results.
But! That means I can't use these optimised queries as drop-in
replacements for calling grep and find, or for making Make-like
programs run faster (by eliminating parsing and stat() calls).
That's a shame, it would have been nice to have a mechanism that
could transparently optimise programs that do calculations....
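The ordering problem is easy to reproduce in a toy model. In this hypothetical sketch, a write enqueues its change notification immediately, a "daemon" indexes notifications only when it gets CPU time (the `budget` argument stands in for a machine busy compiling), `query()` answers from whatever has been indexed so far, and `query_sync()` first drains the queue, i.e. it waits for the daemon to catch up before answering.

```python
from collections import deque
from typing import Deque, Dict, Set, Tuple


class AsyncIndex:
    """Word index fed by change notifications that are processed later."""

    def __init__(self) -> None:
        self.pending: Deque[Tuple[str, str]] = deque()  # (path, new text)
        self.index: Dict[str, Set[str]] = {}           # word -> paths

    def write(self, path: str, text: str) -> None:
        """A program writes a file; the notification is queued, not indexed."""
        self.pending.append((path, text))

    def drain(self, budget: int) -> None:
        """Daemon step: index at most `budget` queued notifications."""
        while self.pending and budget > 0:
            path, text = self.pending.popleft()
            for paths in self.index.values():
                paths.discard(path)              # forget the old contents
            for word in text.split():
                self.index.setdefault(word, set()).add(path)
            budget -= 1

    def query(self, word: str) -> Set[str]:
        """Asynchronous answer: may miss recent writes or report old ones."""
        return set(self.index.get(word, ()))

    def query_sync(self, word: str) -> Set[str]:
        """Synchronous answer: catch up on every pending event first."""
        self.drain(len(self.pending))
        return self.query(word)
```

After `idx.write("notes.txt", "reiser4 metas")` with no daemon time, `idx.query("metas")` is empty while `idx.query_sync("metas")` finds the file; only the synchronous form could stand in for grep or find.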
Do you see what I'm getting at? There's building some nice GUI
and search engine like functionality, where changes made by one
program _eventually_ show up in another (i.e. not synchronously).
That's easy.
And then there's optimising things like grep, find, perl, gcc,
make, httpd, rsync, in a way that's semantically transparent, but
executes faster _as if_ they had recalculated everything they
No, not 3, 4 or 6. For correct behaviour those require synchronous
query results. Think about 6, where one important cached query is
"what is the MD5 sum of this file", and another critical one, which
can only work through indexing, is "give me the name of any file whose
MD5 sum matches $A_SPECIFIC_MD5". Trusting the async results for
those kinds of qu...

I don't think dnotify/inotify handles subdirectories well yet, so I

The daemon saves state before it shuts down and reloads the state after a reboot. You have to make sure that it is started first and stopped last during the boot process. How would a kernel plugin handle things that happen before or after the plugin module has been

Sure you can. First of all, you can just wait for the daemon to finish indexing any files that it has been notified about changes in. This is no different from you having to wait for the kernel to finish indexing the files. Or are you suggesting that the kernel should stop

So how do you calculate the MD5 sum of a file that is in the process of being modified? It's not possible to do that unless you block all other access to that file and recalculate the MD5 sum after each write. With a notifie
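Both cached queries from point 6 ("what is the MD5 sum of this file" and "which files have this MD5 sum") can be sketched with a path-to-digest map that is revalidated against (mtime, size) on every use. This is a hypothetical illustration, not a design proposed in the thread: a file modified while it is being hashed, or rewritten within one timestamp tick without changing size, can still produce a wrong answer, and the reverse lookup only knows about files that have already been hashed.

```python
import hashlib
import os
from typing import Dict, List, Tuple

Stamp = Tuple[int, int]  # (st_mtime_ns, st_size)


class Md5Index:
    """Cache of MD5 digests, revalidated against (mtime, size) on each use."""

    def __init__(self) -> None:
        self.by_path: Dict[str, Tuple[Stamp, str]] = {}

    def md5_of(self, path: str) -> str:
        """Forward query: the MD5 sum of one file."""
        st = os.stat(path)
        stamp = (st.st_mtime_ns, st.st_size)
        cached = self.by_path.get(path)
        if cached is not None and cached[0] == stamp:
            return cached[1]
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        digest = h.hexdigest()
        self.by_path[path] = (stamp, digest)
        return digest

    def files_with_md5(self, digest: str) -> List[str]:
        """Reverse query over already-hashed files, revalidating each match."""
        matches = []
        for path, (_, cached) in list(self.by_path.items()):
            if cached != digest:
                continue
            try:
                if self.md5_of(path) == digest:
                    matches.append(path)
            except FileNotFoundError:
                del self.by_path[path]   # the file is gone: drop its entry
        return sorted(matches)
```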
