my problem with filesystems

August 29, 2004 - 12:55pm
Submitted by biscuitman on August 29, 2004 - 12:55pm.
Linux

Reading that article about reiser4 got me thinking. I have been researching reiser4 myself, and after a year of trying to understand it I still don't get it.

So what is my problem with filesystems?

Search. Fast search for particular kinds of files in a big directory hierarchy containing gigabytes of data. It's very slow now. So what is needed? A highly efficient indexing system? Files containing data that describe other files? How does this uber-index track changes to the files? Notifications? And how do we optimise reading of the large files that these indexes can become? Split them into small files? Files and directories are basically chunks of disk space. What is the most efficient way to work with directories and files? Directories are just files, so a file with a special kind of layout is similar to a directory with higher-level semantics.

Logging. Most of my databases are small files, and they are memory bound. The pattern is mostly read once, modify for some time, then close. Maybe I could use append-only log files instead, since those are very hard to corrupt. How do we optimise the layout of these log files on disk?
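To make the append-only idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration, not something reiser4 or any existing tool provides: the names `append_record` and `read_records` are made up, and the one-line-per-record layout with a CRC32 prefix is just one simple way to detect a torn write at the end of the file.

```python
import json
import os
import zlib


def append_record(path, record):
    """Append one record as a single line: '<crc32 hex> <json>'."""
    payload = json.dumps(record, sort_keys=True)
    crc = zlib.crc32(payload.encode("utf-8"))
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{crc:08x} {payload}\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the record to disk


def read_records(path):
    """Return every intact record, stopping at the first damaged line."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            crc_hex, sep, payload = line.rstrip("\n").partition(" ")
            if not sep:
                break  # no separator: truncated line
            try:
                expected = int(crc_hex, 16)
            except ValueError:
                break  # checksum field is not hex: damaged line
            if zlib.crc32(payload.encode("utf-8")) != expected:
                break  # payload does not match its checksum
            records.append(json.loads(payload))
    return records
```

Because old records are never rewritten, a crash can only damage the tail of the file, and the reader simply ignores whatever follows the last good record. This says nothing about the on-disk layout question, which is still up to the filesystem.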

Custom file formats. Maybe this is a non-issue. The answer is to use similar formats for files with a similar purpose.

You need:

August 29, 2004 - 1:10pm
Anonymous

updatedb
locate name

updatedb

August 29, 2004 - 7:51pm

I need something more powerful than that. How about content indexing? I don't think you want to run updatedb from FAM (the File Alteration Monitor).

depends on what to index

August 30, 2004 - 10:06am
Anonymous

I suppose the amount of useful information to be indexed is limited in reality: length and codec information for big movie files, tags on MP3 files, the text portion of PDF and DOC files, file creation dates and owners, etc. The new indexable filesystem could track "dirty" files that need to be indexed at the next possible occasion.
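The "dirty" tracking can be approximated in user space. The sketch below is only an illustration under assumptions: it polls with `os.walk` and compares size and modification time, whereas a real indexable filesystem would presumably mark files dirty itself. The function names and the shape of the `index` dictionary are made up for this example, and `extract` stands in for whatever metadata reader (tags, codec info, text) is wanted.

```python
import os


def find_dirty(root, index):
    """Return paths under root whose size or mtime differ from the index."""
    dirty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            stamp = (st.st_mtime_ns, st.st_size)
            entry = index.get(path)
            if entry is None or entry["stamp"] != stamp:
                dirty.append(path)
    return dirty


def reindex(root, index, extract):
    """Run extract(path) on dirty files only, then drop vanished files."""
    dirty = find_dirty(root, index)
    for path in dirty:
        st = os.stat(path)
        index[path] = {
            "stamp": (st.st_mtime_ns, st.st_size),
            "meta": extract(path),
        }
    # Forget index entries whose file no longer exists.
    for path in list(index):
        if not os.path.exists(path):
            del index[path]
    return dirty
```

The point of the design is that the expensive step, `extract`, runs only for files that changed since the last pass. The walk itself still touches every directory, which is exactly the cost a filesystem-level dirty flag would remove.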

The search engine would also need to keep track of what information to give out. As I distantly recall, locate had a bug that made it possible for an unprivileged user to search filenames in directories they had no access to.

One particular thing about the reiser4 fuss surprised me a little. If I understood them correctly, some people want to use multiple streams of a file to help with parsing file formats. As a stupid example: reading the file ~/mypicture.jpg/resolution would give me the resolution. Reading the base file mypicture.jpg would give me the file in a "serialized" format suitable for transporting to another system (that is, the jpg file itself as we know it today :-).

Their argument is that re-implementing parsing code for file formats wouldn't be necessary anymore. I cannot really see how this is any better than using parser libraries. Who is going to standardize the mapping of file formats to namespaces? File formats will surely still be defined in the original serialized form by their designers, with the standard mapping coming second. This probably won't change before the whole world agrees on switching to namespace-based file formats :)

If somebody really thinks this interface would be good, then what stops the jpeg lib from offering functions like openjpg("mypicture.jpg/resolution")?
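For what it's worth, such a library-level interface is not hard to sketch. The Python below is a hypothetical illustration, not any real jpeg library's API: `read_virtual` and `jpeg_resolution` are made-up names, only the `resolution` attribute is supported, and the parser assumes a plain sequence of length-prefixed segments before the first start-of-frame marker.

```python
import os
import struct

# Start-of-frame markers carry the image size (0xC4, 0xC8, 0xCC are not SOF).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_resolution(data):
    """Return (width, height) from the first start-of-frame segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        if marker in SOF_MARKERS:
            if pos + 9 > len(data):
                raise ValueError("truncated frame header")
            # Layout: FF, marker, length(2), precision(1), height(2), width(2)
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        # Assumption: every other segment here has a 2-byte length field.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    raise ValueError("no start-of-frame segment found")


def read_virtual(path):
    """Read 'file' for the raw bytes or 'file/attribute' for a parsed value."""
    if os.path.isfile(path):
        with open(path, "rb") as f:
            return f.read()  # base file: the serialized form
    base, attr = os.path.split(path)
    if not os.path.isfile(base):
        raise FileNotFoundError(path)
    with open(base, "rb") as f:
        data = f.read()
    if attr == "resolution":
        width, height = jpeg_resolution(data)
        return f"{width}x{height}".encode("ascii")
    raise KeyError(attr)
```

So `read_virtual("mypicture.jpg/resolution")` would hand back something like a `WIDTHxHEIGHT` string, and `read_virtual("mypicture.jpg")` the unchanged file. The naming of the attributes is still decided by the library author alone, which is the standardization problem all over again.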

/Tero
