Linux: Semantic Changes with Reiser4

Submitted by Kedar Sovani on August 27, 2004 - 2:48pm.
Linux

As expected, merging Reiser4 into Andrew Morton's -mm tree brought with it a lot of additional features and semantic changes. Christoph Hellwig expressed some unhappiness over these semantic changes, spawning a lengthy thread on the lkml. Specifically, he mentioned that the handling of files-as-directories (multiple streams within files) could cause problems for user-space applications, as well as dcache problems.

A lot of opposition was expressed. Some argued that the handling of multiple streams is really a userspace issue, whereas others pointed out that legacy applications may not handle multiple streams properly, which could lead to the loss of user data. This led Hans Reiser to say in support:

"Andrew, we need to compete with WinFS and Dominic Giampaolo's filesystem for Apple, and that means we need to put search engine and database functionality into the filesystem."

Linus was supportive of having such features in the OS, with the competition (Windows, OSX) already supporting such features. The discussion then moved to whether part of the feature set supported by Reiser4 should be moved to the VFS layer, and if any of the features should be included in the kernel.



Mail Thread Archive : silent semantic changes with Reiser4

Reiser4 file semantics

August 27, 2004 - 12:31pm
Anonymous

I have been reading some dissenting voices about the Reiser4 file semantics and the problems that they will present to the Linux community. In a nutshell, every file would now look like a directory and could be opened as a directory. The names in that directory are not new files but metadata associated with the file. This is well documented by Hans Reiser on the Namesys site. The change is in some ways sneaky, but in reality Hans has been writing about it for years; most of us did not pay much attention. The immediate response in the community has been that this is too big a change and should be withdrawn.

I humbly propose that this is a challenge we should face head on now or we may not have an opportunity to do so in the future. The best way for open source to fight patents is to create prior art and you can only create prior art if you have a problem to solve. WinFS is going to give Microsoft the opportunity to discover the problems that have to be solved when faced with a file system that offers rich meta data. IMHO we have to innovate to prevent patents corralling all open source development to the old Unix domain. The only way we can fight patents is to create prior art. If we are too conservative about the challenge of change, we will simply be spectators while the likes of Microsoft patent all the ‘trivial ideas’ around the rich meta-data semantics that Reiser4 has to offer. We should give the community the opportunity to discover and solve the problems that we will face when using new ways of looking at files and information.

I am generally of the opinion that much of the 'innovation' in computing is largely trivial or useless from a long term point of view. A few years ago we were told that Unix was a relic of the past and Windows NT was the operating system of the future. Well, we now see the future having to reinvent that past relic 'bit by bit'. Microsoft had many good ideas but also many worthless ones, and they are having to retrofit much that had been implemented in Unix all those years ago. But it has not been a one way street: we too have borrowed many ideas from Microsoft.

The challenge of WinFS is not that it will be so great in the beginning, but that it gives Microsoft first crack at tackling and patenting all the trivial little solutions to the problems that integrating WinFS into an existing computing environment poses. If we face those issues first, we have the opportunity to create the prior art necessary to defend against the mostly trivial patents that Microsoft and others will be filing furiously. If we are too conservative, there will be no prior art to meet the challenge. Whatever your opinion about patents, they will stop us dead in our tracks if we do not innovate first.

The new file semantics are both a challenge and an opportunity, and one where the 'many eye-balls' of open source should shine. Yes, this changes the way we view a file system and what it can be used for. As others have mentioned, user space solutions would be unworkable because of the huge task of getting everybody to agree on libraries and converting the huge number of applications to use common libraries; the kernel is the common library all applications are forced to use. I strongly agree with Hans that the semantics should not be removed from Reiser4, but here lies the challenge: how do you write a simple file copy utility? We can no longer use a simple OPEN, READ, WRITE, and CLOSE and get a perfect copy. But of course it has never been that simple to actually copy a file: files have always had other attributes (security, timestamps, ownership).
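The copy problem exists even with today's ordinary attributes, as a short Python sketch can show. This is only an illustration, not a Reiser4 tool: `copy_bytes` and `demo` are hypothetical names, and the only metadata compared is the modification time.

```python
import os
import shutil
import tempfile


def copy_bytes(src, dst, chunk_size=64 * 1024):
    """Copy only the byte stream: open, read, write, close (hypothetical helper)."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


def demo():
    """Compare a data-only copy with shutil.copy2, which also copies timestamps."""
    workdir = tempfile.mkdtemp()
    try:
        src = os.path.join(workdir, "original.txt")
        with open(src, "w") as f:
            f.write("hello\n")
        # Give the source a recognisable, old modification time (September 2001).
        old = 1_000_000_000
        os.utime(src, (old, old))

        plain = os.path.join(workdir, "plain_copy.txt")
        full = os.path.join(workdir, "full_copy.txt")
        copy_bytes(src, plain)   # data only
        shutil.copy2(src, full)  # data plus mode and timestamps

        return {
            "same_bytes": read_bytes(src) == read_bytes(plain),
            "plain_keeps_mtime": int(os.stat(plain).st_mtime) == old,
            "copy2_keeps_mtime": int(os.stat(full).st_mtime) == old,
        }
    finally:
        shutil.rmtree(workdir)
```

Both copies have identical bytes, but only the `shutil.copy2` copy keeps the original timestamp. Any richer per-file metadata would need the same kind of explicit handling by every copy tool.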

Perhaps we have always needed a separate form of file open: OPEN_READ_WITH_META_DATA and OPEN_WRITE_WITH_METADATA (choose any name you like). This form of open would maintain the original 'file as bytestream' concept and would read all the metadata first, followed by the actual data for the file. Clearly the encoding of the meta-data is left as an open question (I would prefer the meta-data be encoded in XML utf8 format). I see the challenge as coming from completely internalizing the richer semantics that are proposed. The main issue is to enable file system utilities such as backup and restore, file copy and move, and of course the most interesting of all, version control such as CVS, to work well with rich meta-data. Microsoft faces this problem also. Let us not wait for it to patent all the simple ideas around this interesting extension to the file semantics.
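A minimal sketch of the "metadata first, then data" bytestream idea, in Python. The function names and the framing (a 4-byte length prefix) are assumptions made up for this illustration, and JSON is used instead of the XML the comment prefers simply to keep the example short.

```python
import io
import json
import struct


def pack_with_metadata(metadata, data):
    """Return one byte stream: 4-byte header length, JSON metadata, then file data."""
    header = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return struct.pack(">I", len(header)) + header + data


def unpack_with_metadata(stream):
    """Read the metadata first, then treat the remaining bytes as the file data."""
    (header_len,) = struct.unpack(">I", stream.read(4))
    metadata = json.loads(stream.read(header_len).decode("utf-8"))
    return metadata, stream.read()


# Usage: a backup tool could write the packed stream and restore both parts later.
blob = pack_with_metadata({"owner": "alice", "project": "notes"}, b"file contents")
restored_meta, restored_data = unpack_with_metadata(io.BytesIO(blob))
```

A tool that only understands plain byte streams can still move the packed blob around unchanged, which is the property the comment is after.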

I strongly believe in the philosophy that Namesys is proposing: a unified name space for rich data. The Unix file model was absolutely great for its time, but it is showing its limits as we attempt to handle terabytes of disk space and files. IMHO the real challenge that open source faces is how to balance the desire for conservative incremental change with pushing the limits of our view of computing. In the last two decades we have seen a 10,000x improvement in the power of the machines we have available, yet the underlying software infrastructure has improved in relatively modest terms. We need to show that open source is not simply a rehash of other people's technology but is capable of expanding our view of computing; otherwise, 20 years from now, we will still be asking how to compete effectively with Microsoft.

I humbly propose that this is

August 28, 2004 - 1:12am
Anonymous

I humbly propose that this is a challenge we should face head on now or we may not have an opportunity to do so in the future. The best way for open source to fight patents is to create prior art and you can only create prior art if you have a problem to solve. WinFS is going to give Microsoft the opportunity to discover the problems that have to be solved when faced with a file system that offers rich meta data. IMHO we have to innovate to prevent patents corralling all open source development to the old Unix domain. The only way we can fight patents is to create prior art.

How is this related to the problem we face here? ReiserFS4 already exists. Whether it is merged into the -mm kernel or not, the prior art is there.
If, however, you argue that including ReiserFS4 makes Hans Reiser more likely to innovate (think ReiserFS5 and 6), which makes Microsoft less likely to patent WinFS-related semantics, then it makes sense. If that is what you argue, donating some money to him or buying services also contributes.

Consider that anything Microsoft has researched and developed already may also be patent pending. They don't have to file their patent request right after they 'discovered' something related to . Also, WinFS won't be the only player in this field (think about work done by BeOS already and work done by Apple), and Microsoft will put a damn lot more in their patent portfolio, as they've already stated. Longhorn will have a lot of new toys, WinFS being one of them.

The point isn't whether Reise

August 28, 2004 - 1:51am
Anonymous

The point isn't whether ReiserFS X will come about. The point is that these new semantics will be a launch pad for further exploration and further prior art derivation by everybody, not just Hans.

Nothing new, but is it useful?

August 29, 2004 - 4:57am
Anonymous

Believe me, they have already thought through all the implications of what they are doing, and have for a while. That's apparently not the case in the Linux world. But maybe ReiserFS4 asks the right questions...
Is it useful?

NTFS has implemented multi-stream files for a long time already. I fear Microsoft may already have found all the mess that can cause.

However, an NTFS file cannot have "directories", but this "functionality" has also been implemented in Windows for nearly a decade now, through the OLE interface IPersistStorage.

In fact, OLE storage was already a (second) attempt to make the filesystem evolve. It already enabled document searching (which MS had already done) through special metadata streams in OLE documents. This is how file-finding tools can get indexes and other identifying information from inside closed file formats: all the files need is to be OLE storages and to have metadata in specified streams.

It seems that Microsoft has already tested all these things. However, they are not yet visible; end users simply cannot access them for now (not quite true, but they are not much used). I suspect WinFS is based on these existing facilities, which the core OS has had almost since the beginning. So far, every new version of Windows I've seen seemed to unveil and give access to old functionality (that was finalized for this purpose). As far as I'm concerned, Windows was "finished" 10 years ago already (I really mean it: the NT4 core already had nearly all the "great" functionality Longhorn will expose).

The only things they could add to NTFS that are presently missing are database support and automatic file versioning. This may be what they are actually doing, but I suspect only a great marketing revamp of existing Windows technologies, along with a new, easier API to tell ISVs: "Now you should really use it... or your software will appear obsolete." I mean there is nothing new under the sun (see VMS).

When I compared Linux filesystems with NTFS/OLE storage, I finally came to the conclusion that all these functionalities were in fact great ideas, but simply didn't take off since... they were not a necessity. Multi-stream files were fun to play with, but let's look at it straight: virtually no application uses them or is aware of them (notepad is!). More important, backup tools are not aware of them.

It looks to me like advanced attributes of files and ACLs (by the way, you can have multi-stream files and store metadata for search and indexing that way in Unix too; you simply need to define some standard stream names and standard stream formats): great, but in the end not workable (ACLs are a small exception, since the tools are beginning to be usable).

The only thing I wonder is whether these functionalities are unused because they are not useful for now, or because MS hasn't promoted them yet (by using them intensively in all of their products, or by revamping them through new APIs, as .Net seems to be doing). In other words, I don't know if they will create the need, or if the lack of use will prevail because all this is useless...

I really do not understand the way Windows is evolving, with advanced user functionalities that I do not need and that end up bothering me (I really hate all these clipboard / find utilities / indexing tools / spyware evolutions).

In other words, I'm reluctant to adopt new functionalities introduced for marketing reasons. I simply do not need them. When you have to store something, store it in a file. If you want it hierarchical, use plain directories.

Do not use complex file structures or non-standard OS APIs that bury data in things that are difficult to use. Data should be easily accessible; that's why XML succeeded, by formalizing things. These "functionalities" ARE REGRESSIONS (not technically, but as far as FREEDOM is concerned). Why do you think MS is developing this? Yeah, we can read NTFS files under Linux, but they already planned to integrate OLE storages in the filesystem 5 years ago to kill interoperability!

If some Linux filesystem (at least one!) is developed in order to compete at the marketing level with Windows, great. But let's not forget to keep the OS simple and to do the simple things that are the useful ones. One can put complexity in the kernel if it is required to help *many* user applications (in other words: if it is useful). But let's not mess it up with Windowsisms; that would mean fighting on their battlefield, and they are better there.

To conclude, KISS and DIIU (Do It If Useful).

PS: this is not a troll attempt, these are simply my thoughts on "new" technologies.

CC

USEFUL rich metadata has been around for a while

August 30, 2004 - 8:37pm
Anonymous

The fact that Microsoft fucked up multiple forks under NT tells you a whole lot about Microsoft, but very little about the usefulness of this facility. A much better example is to look at the old style MacOS with the resource fork. The resource fork performed a bunch of roles under the old MacOS, some of which were a bad idea, and some of which are not relevant here.

What IS relevant is that they allowed various useful metadata to be attached to files in a way that
(1) persisted through moving the file around, including to other machines, compressing it and so on
(2) that did not modify or affect the "main" data of the file, meaning that an application that did not understand the metadata could still use the "main" data of a file
(3) was extensible so that various other apps could add metadata to a file, conventions could change with time about the naming and structure of common metadata and so on.

There were many many examples of this sort of thing. Here are some:
(1) When files were downloaded from the internet (HTTP, FTP etc) a string giving the URL of the file was added as metadata.
(2) Editing applications (ie vi/emacs type things) would store in the metadata of a file the position of the window holding that file on the screen, the offset into the file that the window was scrolled to, the current selection and so on. This meant you could close a text file, re-open it, and be right back where you left off working.
(3) Various metadata appropriate to pictures could be placed in resources --- EXIF type info, a small icon for use by browsers and the file system and so on.

Now OF COURSE you can work around these issues. JPEG has a (lame) method of adding various bits of metadata, because such metadata is really necessary. MP3 has a different lame method. Quicktime files (and therefore MPEG4 and AAC files) have a third method which isn't lame, but is once again different. Executable files are nothing but collections of different pieces of data, with occasional new types being devised and added.
Now COME ON. It is clearly absurd to see the same problem being solved over and over again, in different ways that mean more work for everyone, and to conclude from this that a better solution is not needed. The current system not only foists limited metadata schemes upon us in some areas (eg JPEG, MP3), or no metadata at all (eg text files), but provides no comprehensive mechanism for browsing and editing this data.
To read or write your MP3 data, you are limited to what your MP3 app provides; likewise for your JPEG data.

Maynard Handley

Matthew C. Tedder

August 31, 2004 - 2:30pm
Anonymous

Of course, we sorely need to move on this system. The implications are profound in terms of productivity increases for end users, as well as many other benefits. Imagine the following ascii art as a GUI for querying your files or creating "views" that can be thrown on your desktop as icons:


    Show:        [X]                [X]             [X]
    Attributes:  [creationDate \/]  [name \/]       [project \/]
    Order:       [Descending \/]    [Ascending \/]  [None \/]
    Where:       > 25-06-2004                       'homework' or 'notes'
    Or:          < 25-06-2004                       'class prep'



The attributes are drop-down boxes. You will *NEVER* lose a file again. Hierarchies really suck, in comparison. And imagine the command line parallels to this... The "ls" command only offers a couple of options for how a directory's or subdirectory's contents are formatted, and only wildcards on filenames for filtering. Here we could hand code the same thing like:



ls -bysql "SELECT creationDate, name, project WHERE (creationDate > 25-06-2004 and (project = 'homework' or project = 'notes')) or (creationDate < 25-06-2004 and project = 'class prep') ORDER BY creationDate desc, name;"



Command line users will have no problems with this and GUI users will have no problems with the GUI version. Overall, it'll REALLY DRAMATICALLY improve the whole computing experience...be it for backup systems or desktop users.
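To make the query above concrete, here is a small Python sketch that runs the same filter over an in-memory SQLite table. It says nothing about how Reiser4 would store or query attributes: the `files` table, the `query_files` helper and the sample rows are all made up for illustration, and the dates are written in ISO form (2004-06-25) so that plain string comparison orders them correctly.

```python
import sqlite3


def query_files(rows):
    """Run the comment's query over (creationDate, name, project) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (creationDate TEXT, name TEXT, project TEXT)")
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        """
        SELECT creationDate, name, project FROM files
        WHERE (creationDate > '2004-06-25' AND project IN ('homework', 'notes'))
           OR (creationDate < '2004-06-25' AND project = 'class prep')
        ORDER BY creationDate DESC, name
        """
    )
    result = cursor.fetchall()
    conn.close()
    return result


# Hypothetical file attributes, as they might be harvested from a filesystem.
sample = [
    ("2004-07-01", "essay.txt", "homework"),
    ("2004-06-01", "slides.txt", "class prep"),
    ("2004-06-01", "old.txt", "homework"),
    ("2004-08-15", "todo.txt", "notes"),
    ("2004-07-01", "budget.txt", "finance"),
]
matches = query_files(sample)
```

With the sample rows, the query keeps `todo.txt`, `essay.txt` and `slides.txt`, newest first, and drops the old homework file and the unrelated finance file.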



I have been waiting for YEARS for this to come out.. If I were not so busy, I'd get to writing these new tools for Reiser4 right now.



Matthew C. Tedder

One point that some people ke

August 31, 2004 - 8:08pm
Anonymous

One point that some people keep bringing up, though, is that you can accomplish much if not all of this "user experience" through user-space applications, without needing to make the actual filesystem more complex. As a simple example, you can accomplish much of what you addressed through the simple 'find' utility: you can search the filesystem with pattern matches on file or directory names, filter the results by modification, access, or status-change time (absolute or relative), and tell it to execute commands on files that match your conditions, such as greps to look for matching content, etc. As for the GUI for this, you can store any additional metadata in special hidden directories throughout the hierarchy, or through any number of alternate methods. The point is that you don't have to modify the filesystem for this: you can keep it simple (fast) and lean (lower chance for bugs), while building up the "user experience" in user-space (applications, utilities, etc.)
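The hidden-directory approach can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `.meta` directory name, the JSON sidecar format and the helper names are invented, not an existing convention.

```python
import json
import os

META_DIR = ".meta"  # hypothetical name for the hidden per-directory store


def _meta_path(path):
    """Return the sidecar file that holds the attributes for `path`."""
    directory, filename = os.path.split(os.path.abspath(path))
    return os.path.join(directory, META_DIR, filename + ".json")


def get_attributes(path):
    """Return the stored attributes for `path`, or an empty dict if none exist."""
    try:
        with open(_meta_path(path)) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def set_attribute(path, key, value):
    """Record key=value for `path` in a hidden sidecar file next to it."""
    meta_path = _meta_path(path)
    os.makedirs(os.path.dirname(meta_path), exist_ok=True)
    attributes = get_attributes(path)
    attributes[key] = value
    with open(meta_path, "w") as f:
        json.dump(attributes, f)
```

This works without touching the filesystem, but a plain rename or copy of the file leaves the sidecar behind unless every tool knows the convention, which is the persistence concern raised elsewhere in this thread.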

(Naturally that doesn't mean that we shouldn't investigate and play around with new methods like what Reiser4 is offering. :) )

-jesse

OS/2 HPFS had this

November 8, 2004 - 5:14pm
Anonymous

and it was available through the GUI or through REXX, but not in the SQL-like manner you desire

hear, hear!

August 28, 2004 - 2:28am
Anonymous

I agree with this; competition with other filesystems is VERY important in order for Linux to make the grade with end users.

I am equally concerned about breaking apps or moving userspace problems into kernel space.

I think it might be time to rethink the idea of a file and stream instead of trying to merge new FS concepts into the old FS interface model.

My brother's computer uses ex

August 28, 2004 - 5:02pm
Anonymous

My brother's computer uses ext3. He is a complete newbie who just uses the computer to surf the web. He is the end user, and I'm very sure that he doesn't know what the hell a filesystem is.

WinFS is just something to tell the people: Hey! We have something here that is the kick ass of all time! They said that when win95 came out. LOL!

Yes, but...

August 29, 2004 - 6:42am
Anonymous

It is likely he uses some of Ext3's features, such as journaling. If he used Ext2 instead, he'd experience the fscks, which he wouldn't with NTFS (but would with FAT32), so he does experience (some) benefits even though he doesn't know why or how. Overall, such features together form the user experience, which is one of the aspects one could use to choose product X over product Y.

But since ReiserFS4 ain't tested much (not enough), I'd not recommend it to people who "don't know what they're doing" (NOFI).

No--The Difference Will Be Profoundly Visible To All

August 31, 2004 - 2:39pm
Anonymous

Not only the performance improvements, but when you can query your files (see my post up above a little) and create desktop icons that show your files organized in certain ways... When you no longer need to search through ugly hierarchies to find things, your whole computer life will be profoundly and visibly different.



Productivity will be greatly increased and people's frustrations greatly diminished. Reiser4 is perhaps the single most important innovation in the high tech world today. It is the gateway to this and many more great new capabilities.



Matthew C. Tedder

In a few years' time...

August 28, 2004 - 3:38am
Anonymous

...we'll wonder how we ever managed to get by without Reiser4. One thing in particular grates on me about the current state of computing:

We have filesystems, and then we have databases. They do the same thing: store data in a structured, easily retrievable way. Now, when the db sits on top of the filesystem, as in your everyday mysql installation, how can we still expect it to be faster than the fs? Doesn't the fact that the db exists mean that the fs failed one of its basic requirements?

Reiser 4 will solve this, finally.

I don't think it is as simple

August 28, 2004 - 11:51am
Anonymous

A filesystem is geared toward storing files of arbitrary size together with a certain set of associated metadata, typically things like owner, last access time, last modify time, etc. ReiserFS4 promises to extend this metadata to an arbitrary set.

But if you want to store, say an ordinary sql table of particular structure using the filesystem as data storage backend, then you'll probably find that the excess metadata (all the normally useful file-related data that doesn't matter crap for the application that wishes to treat the filesystem as database) kills you. Think about it: you can't have structure inside the file, so every row must be a directory (probably named by primary key), then inside each such directory you'd have a file named by the column name, with the content as bytes, and you'd have to encode the datatype information somewhere as well so that foo the char(4) and foo the integer can be told apart.
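The row-as-directory layout described above can be sketched in Python to show what it costs. This is only an illustration on an ordinary filesystem: the helper names and the choice to encode the datatype as a file suffix (`foo.int`, `foo.str`) are assumptions made for the example.

```python
import os


def write_row(table_dir, primary_key, row):
    """Store one row as a directory; each column is a file named 'column.type'."""
    row_dir = os.path.join(table_dir, str(primary_key))
    os.makedirs(row_dir, exist_ok=True)
    for column, value in row.items():
        type_name = type(value).__name__  # 'int' or 'str' in this sketch
        with open(os.path.join(row_dir, f"{column}.{type_name}"), "w") as f:
            f.write(str(value))


def read_row(table_dir, primary_key):
    """Rebuild the row, using the file suffix to recover each column's type."""
    converters = {"int": int, "str": str}
    row = {}
    row_dir = os.path.join(table_dir, str(primary_key))
    for filename in os.listdir(row_dir):
        column, type_name = filename.rsplit(".", 1)
        with open(os.path.join(row_dir, filename)) as f:
            row[column] = converters[type_name](f.read())
    return row
```

Every column value becomes its own file with its own owner, timestamps and directory entry, which is exactly the excess metadata the comment argues a database can avoid.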

Now your filesystem must implement an interface for indexing: you want to find out all the values of primary key where foo takes a particular value very quickly. etc. etc. In other words, I think we'll keep on having files pretty much like today even after ReiserFS4 has come and gone. It seems to me that a FS is not such a great match for all kinds of databases.

And from my discussion it may be apparent to you why the database can be "faster" than the filesystem. It can be faster, even after paying the filesystem cost, because it can pack data in more efficiently and avoid keeping the metadata it doesn't need.

re: I don't think it is as simple

August 28, 2004 - 8:10pm

You're assuming that the existing filesystem API will be the only API to get to the column data. The current filesystem API is not set up correctly for relational queries, so we will have to add new calls. In the context of a new call, accessing the column data should be just as fast as in existing databases.

Agreed, but...

August 29, 2004 - 8:39am
Anonymous

if we are going to blur the distinction between a database and filesystem, assuming a fs can be made to work like db, then obviously it'll be so. This might not be different at all from how Oracle for instance maintains its own data partitions, bypassing as much of the kernel as possible for performance reasons. It could be very interesting if we simply built this all into kernel, whipped up the new interfaces as needed, and so on, but then again it would probably bloat the kernel considerably to have the flexibility to twist into these directions. All in all, not a week or a month's hack, this.

I thought this was exactly wa

August 29, 2004 - 11:07am
Anonymous

I thought this was exactly what Reiser4 was all about: making it possible to add these kinds of extensions in a reasonable time, and secondly to provide this specific "module" as part of the "frequently used modules" set.

Isn't this exactly what all the fuss about silent semantic changes on lkml is all about?

Not Trying to Build a Database

August 31, 2004 - 2:49pm
Anonymous

Reiser4 isn't going to outperform MySQL or provide the rich feature set of PostgreSQL (though it will improve the performance of both). It's going to provide database-like querying capabilities to end users, among other things.

So, even if you save your files as you always have, it'll be useful to you. If you add attributes to better categorize your files, it'll be that much more useful to you.

Most importantly, hierarchies really suck.. If you enjoy digging down deep into folders under folders under folders to find your important files....and then matching dates between multiple copies of a file strewn throughout this hierarchy to see which is the most recent, you must also enjoy burning yourself with matches.



Matthew C. Tedder

package catalog

August 31, 2004 - 10:07pm

Hi,
you seem to know what you are talking about. What I wonder: would it be possible, for example, to add "package" and "package-version" metadata? That could specify which package a given file comes from, like what the dpkg database currently holds on Debian. It could be especially useful when you build something yourself with "make" and "make install", which normally just starts to fill up the file system because the package manager can't remove it. Of course one would still have to use the debconf database (until elektra can do that job as well).

Yes, you would be able to do

June 23, 2005 - 1:08pm
JanC (not verified)

Yes, you would be able to do that (and you can do that on NTFS since NT4 or maybe even earlier!).

If you use the proper tools,

August 31, 2004 - 11:07pm
Anonymous

If you use the proper tools, then hierarchies become mostly transparent, and help to organize data (at least, that's one usage paradigm.)

Again, the 'find' utility is indispensable for querying your filesystem. For example, if you wish to find all *.doc files and then sort them by modification time (the filesystem's file modification time), this is very easy once you have read through several pages of the 'find' man page:

    find /some/path/ -type f -name "*.doc" -printf "%T@ %p\n" | sort -n
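For readers who would rather script this than remember the find options, a rough Python equivalent of the find-and-sort query looks like the following. The function name `files_by_mtime` is made up for this sketch, and it only approximates find's behaviour (for example, it follows whatever `os.path.isfile` considers a regular file).

```python
import fnmatch
import os


def files_by_mtime(root, pattern="*.doc"):
    """Return (mtime, path) pairs for files matching `pattern`, oldest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, filename)
            if os.path.isfile(path):
                found.append((os.stat(path).st_mtime, path))
    return sorted(found)
```

Like the shell pipeline, this can only sort on what the filesystem already records (names, sizes, timestamps); anything richer has to come from the file contents or from some separate store.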

You can piece together a couple simple 'awk' or 'sed' or 'grep' or 'perl one-liners' to filter/sort/manipulate your queries in a very powerful way. Do you see any ways in which providing more flexible meta-data would improve upon this (I'm curious)?

-jesse

hum

August 31, 2004 - 3:47am
Anonymous

---
Methinks Linus is right in saying:

"So there's really no point in trying to push your agenda by trying to scare people with MS activities. Linux kernel developers do what's right because it is _right_, not because somebody else does it"

If this FS model is needed and the _right_ thing to do, it'll be implemented.

And enough of this competition talk and MBA babble; the community will surely die playing ball by Microsoft's rules.

Linux's goal is to offer the best free kernel you can have, not to further the agenda of those who seek a Microsoft remedy.

As stated before... this will only be a side effect.

Fabrice.

XML-like semantics for VFS

August 31, 2004 - 5:17pm
Anonymous

Why not use semantics that have proved to work for a lot of things, like those of XML? Instead of having files and directories, just have elements. Elements can have attributes associated with them and may contain further elements or raw data.

-Richi

try adding a attribute to the

September 6, 2004 - 6:26pm
Anonymous

try adding an attribute to the middle of a 320 gigabyte xml file, without disturbing the millions of other operations going on at the same time.

This may sound either crazy o

March 4, 2005 - 5:04am
Anonymous (not verified)

This may sound either crazy or sarcastic, but I don't really care. Why not simply keep the filesystem exactly as it is, but instead of considering each file to be something with subelements inside it, just use directories where once you would have used files? Everything people have been talking about with regard to multiple forks, metadata, and the like could be implemented this way. You download a "file", but what you actually get is a directory, with a file in it, and possibly other files giving metadata, such as where the file is from. But instead of all these little files in the directory being generated on the fly by some plugin system, just have them actually be files.

uh...yeah....

July 12, 2005 - 11:23am
bocomowo (not verified)

Yep. That's pretty much exactly what Reiser4 does, hence what we are talking about. Every file can also be accessed as a directory containing attributes as files.

What I really want to know is whether we can index them. I don't find much discussion of that, but it's the important part for me. I want to be able to add arbitrary metadata and then index by each label, opening up the promise of a truly new storage paradigm.
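What "index by each label" could mean is easy to show as a user-space toy, though it says nothing about whether or how Reiser4 itself would index attributes. In this Python sketch the `build_index` name and the sample attribute data are invented for illustration.

```python
from collections import defaultdict


def build_index(attributes_by_path):
    """Map each (label, value) pair to the sorted list of paths carrying it."""
    index = defaultdict(list)
    for path, attributes in attributes_by_path.items():
        for label, value in attributes.items():
            index[(label, value)].append(path)
    return {key: sorted(paths) for key, paths in index.items()}


# Hypothetical per-file attributes, e.g. read from files-as-directories.
catalog = {
    "/home/me/essay.txt": {"project": "homework", "author": "me"},
    "/home/me/todo.txt": {"project": "notes", "author": "me"},
    "/home/me/lab.txt": {"project": "homework", "author": "sam"},
}
index = build_index(catalog)
```

A lookup such as `index[("project", "homework")]` then returns the matching paths directly instead of scanning every file; the hard part in a real filesystem is keeping such an index consistent as attributes change.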
