Reading that article about reiser4 got me thinking. I have been researching reiser4 myself, and after a year of trying to understand it I still don't get it.
So what is my problem with filesystems?
Search. I want fast searches for particular kinds of files in a big directory hierarchy holding gigabytes of data, and today that is very slow. So what is needed? A highly efficient indexing system? Files containing data that describe other files? How does this uber-index track changes to the files: notifications? And how do we optimise reading the large files these indexes can grow into? Split them into small files? Files and directories are basically chunks of disk space, so what is the most efficient way to work with them? Directories are just files, so a file with a special kind of layout is similar to a directory with higher-level semantics.
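One partial answer to the search question, as a minimal sketch in Python: walk the tree once and keep an in-memory map from file extension to paths, so "all files of a particular kind" becomes a dictionary lookup instead of another full scan. The function names are hypothetical, and treating the extension as the file's "kind" is an assumption; the sketch does nothing about keeping the index current, which is the harder question.

```python
import os
from collections import defaultdict


def build_extension_index(root):
    """Walk `root` once and map each lowercase file extension to its paths.

    Queries then cost a dictionary lookup instead of another tree walk.
    """
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()  # "" for files without one
            index[ext].append(os.path.join(dirpath, name))
    return index


def find_by_extension(index, ext):
    """Return the indexed paths whose extension matches `ext` (e.g. ".mp3")."""
    # .get() avoids inserting empty lists into the defaultdict on a miss.
    return sorted(index.get(ext.lower(), []))
```

For example, `find_by_extension(build_extension_index(os.path.expanduser("~")), ".mp3")` lists every MP3 under the home directory; the walk is paid once, and later queries stay cheap until the tree changes.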
Logging. Most of my databases are small files, and they are memory-bound: the usage pattern is mostly read once, modify for a while, then close. Maybe I could use append-only log files instead, which are very hard to corrupt. How do we optimise the on-disk layout of these log files?
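A minimal sketch of the append-only idea, assuming a record layout chosen here purely for illustration (a 4-byte length and a CRC32 checksum in front of each payload; not any existing format). A crash in the middle of an append can only damage the tail, and the reader detects that with the length and checksum and stops at the last intact record. The function names are hypothetical.

```python
import os
import struct
import zlib

# Hypothetical record header: payload length, CRC32 of payload (big-endian).
HEADER = struct.Struct(">II")


def append_record(path, payload: bytes):
    """Append one length-prefixed, checksummed record and fsync it."""
    header = HEADER.pack(len(payload), zlib.crc32(payload))
    with open(path, "ab") as f:
        f.write(header + payload)
        f.flush()
        os.fsync(f.fileno())  # conservative: one fsync per record


def read_records(path):
    """Return (records, good_length): every intact record in order, and the
    byte offset where the intact prefix ends. A torn or corrupted tail
    (e.g. from a crash mid-append) is ignored rather than raising."""
    with open(path, "rb") as f:
        data = f.read()  # fine for the small, memory-bound files described
    records, offset = [], 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # first damaged record: everything after it is untrusted
        records.append(payload)
        offset = start + length
    return records, offset


def recover(path):
    """Truncate the log to its last intact record so new appends stay readable."""
    records, good = read_records(path)
    with open(path, "r+b") as f:
        f.truncate(good)
    return records
```

Calling `os.fsync` after every record is the cautious choice; batching several records per fsync trades a little durability for much higher throughput.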
Custom file formats. Maybe this is a non-issue; the answer is to use similar formats for files with a similar purpose.
You need:
updatedb
locate name
updatedb
I need something more powerful than that. How about content indexing? I don't think you want to run updatedb from FAM.
Depends on what you want to index.
I suppose the amount of useful information to index is limited in practice: length and codec information for big movie files, tags on MP3 files, the text portion of PDF and DOC files, file creation dates and owners, etc. The new indexable filesystem could track "dirty" files that need to be indexed at the next opportunity.
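The "dirty" tracking can be approximated in user space along these lines. This is a sketch under stated assumptions: a file counts as dirty when its modification time or size differs from what was recorded at the last indexing, extractors are plain functions keyed by extension (real MP3 tag or PDF text extraction would need their own libraries), and the `DirtyIndex` class is hypothetical. A filesystem-level implementation would mark files dirty on write instead of rescanning.

```python
import os


class DirtyIndex:
    """Hypothetical sketch: remember (mtime_ns, size) for each indexed file and
    treat any file whose current stat differs as dirty."""

    def __init__(self, extractors):
        self.extractors = extractors  # lowercase extension -> function(path) -> dict
        self.seen = {}                # path -> (mtime_ns, size) when last indexed
        self.metadata = {}            # path -> extracted fields

    def find_dirty(self, root):
        """Return files under `root` that are new or changed since they were last
        indexed, and forget files that have disappeared (assumes a single root)."""
        present, dirty = set(), []
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                present.add(path)
                if self.seen.get(path) != (st.st_mtime_ns, st.st_size):
                    dirty.append(path)
        for gone in set(self.seen) - present:
            del self.seen[gone]
            self.metadata.pop(gone, None)
        return dirty

    def reindex(self, paths):
        """Extract metadata for each path and record the stat it was taken from."""
        for path in paths:
            st = os.stat(path)  # taken first, so a change during extraction re-dirties it
            fields = {"size": st.st_size, "owner_uid": st.st_uid}
            extract = self.extractors.get(os.path.splitext(path)[1].lower())
            if extract is not None:
                fields.update(extract(path))  # e.g. tags, codec info, text
            self.metadata[path] = fields
            self.seen[path] = (st.st_mtime_ns, st.st_size)
```

Because this still walks the whole tree to find dirty files, it only moves the expensive extraction off the critical path. Timestamp granularity also means a same-size rewrite within one tick could be missed, which is where write notifications would help.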
The search engine would also need to keep track of what information it gives out. If I recall correctly, locate once had a bug that let unprivileged users search for filenames in directories they had no access to.
One particular thing in the reiser4 fuss surprised me a little. If I understood them correctly, some people want to use multiple streams of a file to help parse file formats. As a stupid example, reading the file ~/mypicture.jpg/resolution would give me the resolution, while reading the base file mypicture.jpg would give me the file in a "serialized" format suitable for transporting to another system (that is, the jpg file itself as we know it today :-).
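For comparison, here is roughly what the ~/mypicture.jpg/resolution stream amounts to when done with ordinary parsing code: a minimal Python sketch that scans the JPEG marker segments for the first start-of-frame (SOFn) segment and reads the height and width stored there. It assumes a well-formed file, the function name is hypothetical, and a real program would more likely call an existing image library.

```python
import struct

# SOFn markers carry the frame dimensions; 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC)
# share the 0xC0-0xCF range but are not frame headers.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_resolution(data: bytes):
    """Return (width, height) read from the first start-of-frame segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:                 # fill byte: the real marker comes later
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                       # standalone markers have no length field
            continue
        if marker in (0xD9, 0xDA):         # end of image / start of scan:
            break                          # the frame header should already be behind us
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # includes its own 2 bytes
        if marker in SOF_MARKERS:
            # After the length: precision (1 byte), height (2), width (2).
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length
    raise ValueError("no start-of-frame segment found")
```

Usage would be along the lines of opening `~/mypicture.jpg` in binary mode and passing `f.read()` to `jpeg_resolution`; no filesystem support is involved.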
Their argument is that re-implementing parsing code for file formats would no longer be necessary. I can't really see how this is any better than using parser libraries. Who is going to standardize the mapping of file formats to namespaces? File formats will surely still be defined in their original serialized form by their designers, with the standard mapping coming second. That probably won't change until the whole world agrees to switch to namespace-based file formats :)
If somebody really thinks this interface would be good, what stops them from offering functions like openjpg("mypicture.jpg/resolution") in the jpeg lib?
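Such a function is easy to write on top of ordinary libraries. The sketch below is hypothetical (`open_attr` and its extractor registry are invented names, not part of libjpeg or any real API): it splits "file.ext/attr" into a real file and an attribute name and dispatches to a registered extractor, so the namespace-style interface needs no filesystem support at all.

```python
import os


def open_attr(path, extractors):
    """Hypothetical namespace-style accessor: 'dir/file.ext/attr' -> value.

    `extractors` maps (extension, attribute) to a function(filename) -> value;
    the extension "*" matches any file.
    """
    base, attr = os.path.split(path)  # "mypicture.jpg/resolution" -> file, attribute
    if not os.path.isfile(base):
        raise FileNotFoundError(base)
    ext = os.path.splitext(base)[1].lower()
    func = extractors.get((ext, attr)) or extractors.get(("*", attr))
    if func is None:
        raise KeyError("no extractor for %r on %s files" % (attr, ext or "extensionless"))
    return func(base)
```

For instance, with `extractors = {("*", "size"): os.path.getsize}`, `open_attr("mypicture.jpg/size", extractors)` returns the file size; a JPEG resolution parser could be registered under `(".jpg", "resolution")` the same way.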
/Tero