Tux3 Acting Like A Filesystem

Submitted by Jeremy
on September 4, 2008 - 8:44am

Daniel Phillips noted that his new Tux3 versioning filesystem is now operating like a filesystem: "the last burst of checkins has brought Tux3 to the point where it undeniably acts like a filesystem: one can write files, go away, come back later and read those files by name. We can see some of the hoped for attractiveness starting to emerge: Tux3 clearly does scale from the very small to the very big at the same time. We have our Exabyte file with 4K blocksize and we can also create 64 Petabyte files using 256 byte blocks." He went on to discuss the major features yet to be implemented: atomic commits, versioning, coalesce on delete, a kernel port, extents, locking, and extended attributes.

Reviewing this list, Daniel decided to work next on coalesce on delete, noting, "without this we can still delete files but we cannot recover file index blocks, only empty them, not so good." He added that for now he would implement only enough of it to support file truncation: "as soon as file truncation is added to the test mix we will see much more interesting behavior from the bitmap allocator, and we will discover some great ways to generate horrible fragmentation issues. Yummy." Daniel also pointed out that Tux3 is an open source project that welcomes new contributors: "whoever wants to carve their initials on what is starting to look like a for-real Linux filesystem, now is a great time to take a flyer. The code base is still tiny, builds fast, has lots of interactive feedback and is easy to work on. And you get to put your email address near the beginning of the list, which will naturally write its way into the history of open source. Probably."


From: Daniel Phillips
Subject: [Tux3] Time to truncate
Date: Sep 1, 6:24 pm 2008

The last burst of checkins has brought Tux3 to the point where it
undeniably acts like a filesystem: one can write files, go away,
come back later and read those files by name.  We can see some of the
hoped for attractiveness starting to emerge: Tux3 clearly does scale
from the very small to the very big at the same time.  We have our
Exabyte file with 4K blocksize and we can also create 64 Petabyte
files using 256 byte blocks.  How cool is that?  Not much chance for
internal fragmentation with 256 byte blocks.

   http://en.wikipedia.org/wiki/Fragmentation_(computer)
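
[Editor's sketch, not part of the original message.] The two file-size figures above are consistent with each other if a file can address 2^48 logical blocks: 2^48 blocks of 4 KiB is one exabyte, and 2^48 blocks of 256 bytes is 64 petabytes. The 48-bit index width is an inference from that arithmetic, not something the post states, and the function names below are made up for illustration. The second function shows why small blocks leave little room for internal fragmentation.

```python
# Assumption: every file can address 2**48 logical blocks. This width is
# inferred from the sizes quoted in the post; the post does not state it.
ASSUMED_BLOCK_INDEX_BITS = 48

EXABYTE = 2**60
PETABYTE = 2**50


def max_file_size(block_size, index_bits=ASSUMED_BLOCK_INDEX_BITS):
    """Largest file in bytes addressable with index_bits-bit block numbers."""
    return block_size << index_bits


def internal_waste(file_size, block_size):
    """Bytes left unused in a file's final block (internal fragmentation)."""
    return -file_size % block_size


print(max_file_size(4096) // EXABYTE)   # 1  (exabyte, 4K blocks)
print(max_file_size(256) // PETABYTE)   # 64 (petabytes, 256-byte blocks)
print(internal_waste(1000, 4096))       # 3096 bytes wasted with 4K blocks
print(internal_waste(1000, 256))        # 24 bytes wasted with 256-byte blocks
```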

I wonder how well Tux3 will perform with 256 byte blocks.  Actually,
I don't really see big problems.  We should probably be working mostly
with tiny blocks in initial development, because little blocks generate
bushy trees, and bushy trees expose boundary conditions much faster
than big blocks.  Which is exactly what we need now if we want to get
stable early.  Plus it helps focus on allocation strategy: more little
blocks means more chances for things to go wrong by fragmentation.
Let's keep that issue front and center throughout the entire course of
Tux3 development.
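
[Editor's sketch, not part of the original message.] The "bushy trees" point can be made concrete by counting index levels. The sketch assumes a plain tree in which each index block holds block_size / entry_size pointers; the 8-byte entry size is a hypothetical value chosen for illustration, and real Tux3 index blocks may be laid out differently.

```python
def index_levels(file_size, block_size, entry_size=8):
    """Index levels needed to map every data block of a file.

    entry_size is a hypothetical pointer size, not a Tux3 constant.
    """
    data_blocks = -(-file_size // block_size)   # ceiling division
    fanout = block_size // entry_size           # pointers per index block
    levels, capacity = 1, fanout
    while capacity < data_blocks:
        capacity *= fanout
        levels += 1
    return levels


one_mib = 2**20
print(index_levels(one_mib, 256))    # 3 levels: fanout 32, 4096 data blocks
print(index_levels(one_mib, 4096))   # 1 level:  fanout 512, 256 data blocks
```

Under these assumptions a 1 MiB test file already exercises a three-level tree with 256-byte blocks, while with 4K blocks it fits under a single index block, so splits and merges would be reached only with much larger test data.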

(When we get closer to the kernel port I will switch to working mainly
with 512 byte blocks, which is the finest granularity supported by
Linux block devices at present.)

Anyway, the question naturally arises: what next?  There are so many
issues remaining, big and small.  Some of the big ones:

  * Atomic Commit - we want to know if Tux3's new forward logging
    strategy is as good as I have boasted, and indeed, does it work
    at all?  And what is the commit algorithm exactly?

  * Versioning - very nearly the entire reason for Tux3 to exist,
    although we are now beginning to see evidence that even as a
    conventional non-versioning filesystem, Tux3 is not without its
    attractions.

  * Coalesce on delete - without this we can still delete files but we
    cannot recover file index blocks, only empty them, not so good.

  * Kernel port - no kernel port, no proof of concept, no hordes of
    enthusiastic kernel developers flocking to help.  Imagining how
    well Tux3 will work in kernel is no substitute for actually being
    able to mount a Tux3 filesystem and take it for a spin.

  * Extents - without extents we are going to get hammered (pun
    intentional) by the competition in various benchmarks.  Not all
    benchmarks, but some important ones.  We cannot enter the
    benchmark sweepstakes until extents are working.  There is a big
    messy interaction between extents and versioning: versioned
    extents are much harder to do than versioned pointers because the
    number of boundary conditions in the algorithms explodes and
    new, very subtle block (de)allocation issues arise.  Not a
    weekend project, more like a couple of weeks.
    
  * Locking - often the biggest source of bugs and bottlenecks in a
    Linux kernel subsystem, not to mention the way it tends to force
    unnatural algorithmic modifications on the unfortunate coder, to
    get around roadblocks like not being able to sleep in spinlocks or
    interrupt context, situations that are encountered frequently in
    any kernel system having to do with storage.

  * Extended attributes.  Ok, so nobody exactly uses them.  Well,
    except Samba, which is very sensitive to xattr performance, and...
    security people, who love to play with weird and wonderful schemes
    for doing security more securely with the help of xattrs and acls.

So with all those big projects to do, and a host of little ones
besides, really, what next?

OK, I decided.  It's going to be coalesce on delete, just enough of
that to implement file truncation.  So it is now time to truncate.  As
soon as file truncation is added to the test mix we will see much more
interesting behavior from the bitmap allocator, and we will discover
some great ways to generate horrible fragmentation issues.  Yummy.
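
[Editor's sketch, not part of the original message.] A toy model shows how truncation can provoke fragmentation from a bitmap allocator. It assumes a naive first-fit policy that hands out one block at a time; the class and function names are hypothetical, and the post does not describe the allocation strategy Tux3 actually uses.

```python
class BitmapAllocator:
    """Toy first-fit bitmap allocator (illustrative, not Tux3's allocator)."""

    def __init__(self, total_blocks):
        self.used = [False] * total_blocks

    def alloc(self):
        """Mark and return the lowest-numbered free block."""
        for block, in_use in enumerate(self.used):
            if not in_use:
                self.used[block] = True
                return block
        raise MemoryError("no free blocks")

    def free(self, block):
        self.used[block] = False


def truncate(allocator, blocks, keep):
    """Free every block past the first `keep` and return the kept list."""
    for block in blocks[keep:]:
        allocator.free(block)
    return blocks[:keep]


def fragments(blocks):
    """Count runs of physically contiguous blocks in a file's block list."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


allocator = BitmapAllocator(16)
file_a = [allocator.alloc() for _ in range(4)]   # blocks 0..3
file_b = [allocator.alloc() for _ in range(4)]   # blocks 4..7
file_a = truncate(allocator, file_a, 1)          # frees blocks 1..3
file_c = [allocator.alloc() for _ in range(5)]   # fills the gap, then spills
print(file_c)              # [1, 2, 3, 8, 9]
print(fragments(file_c))   # 2
```

Before truncation every file in this model is laid out contiguously; one truncate followed by one new file is enough to split that file into two extents.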

One approachable project that pretty well anybody on the list here
could jump into while I am going at truncation: leaf methods to check
integrity of the two kinds of btree leaves we now have in use, file
data index leaves (dleaf.c) and inode table leaf blocks (ileaf.c).
Whoever wants to carve their initials on what is starting to look like
a for-real Linux filesystem, now is a great time to take a flyer.  The
code base is still tiny, builds fast, has lots of interactive feedback
and is easy to work on.  And you get to put your email address near
the beginning of the list, which will naturally write its way into the
history of open source.  Probably.
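
[Editor's sketch, not part of the original message.] To give a feel for what a leaf integrity check might involve, here is a check over a made-up leaf layout: a list of (logical key, physical block) pairs. The real dleaf.c and ileaf.c formats are not described in the post, so the layout, the function name, and the specific invariants below are all assumptions for illustration only.

```python
def check_leaf(entries, capacity, total_blocks):
    """Return a list of problems found in a leaf; empty means it passed.

    entries is a hypothetical leaf layout: (logical_key, physical_block)
    pairs. This is not the real dleaf.c or ileaf.c on-disk format.
    """
    problems = []
    if len(entries) > capacity:
        problems.append("too many entries")
    keys = [key for key, _ in entries]
    if any(later <= earlier for earlier, later in zip(keys, keys[1:])):
        problems.append("keys not strictly increasing")
    if any(not 0 <= block < total_blocks for _, block in entries):
        problems.append("block pointer out of range")
    return problems


print(check_leaf([(0, 10), (1, 11), (5, 12)], capacity=4, total_blocks=100))
print(check_leaf([(1, 10), (0, 500)], capacity=4, total_blocks=100))
```

Returning a list of problems rather than a single boolean makes a failing check easier to act on, which fits the interactive test style the message describes.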

Regards,

Daniel

_______________________________________________
Tux3 mailing list
Tux3@tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3

New Filesystem Operations?

Lawrence D'Oliveiro (not verified)
on
September 8, 2008 - 12:35am

Is anybody looking at new kinds of filesystem operations? For example, inserting new blocks in the middle of a file, or deleting blocks from the middle of a file. That kind of thing could be useful for editing in a video-recorder type of application.

That's just one example.

That'd have to be at the VFS level, wouldn't it?

Mr_Z
on
September 8, 2008 - 9:38am

Those sorts of splicing operations would have to exist at the VFS level, wouldn't they? And, I suppose there would need to be a new userspace API for it.

I forget, but is there a way in the existing APIs to zero out a portion of an existing file so as to make a non-sparse segment sparse? That could get you most of the way there, since you could at least "sparse-ify" deleted sections, reclaiming the disk space, and use other metadata in user-space to indicate the logical order of file fragments. Unfortunately, the only way I know of currently to make a sparse file is to use fseek() to seek beyond the end of the current file.
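
For reference, the seek-past-the-end technique looks roughly like this. It is a minimal sketch using Python's os module in place of C's fseek(); the helper name is made up. Note that the seek alone does not change the file: the hole appears only once something is written after it. Whether the hole actually occupies no disk space depends on the filesystem, so the allocated size is printed rather than assumed (st_blocks is available on POSIX systems).

```python
import os
import tempfile


def make_sparse_file(path, hole_size, tail=b"end"):
    """Create a file with a hole_size-byte hole followed by `tail`."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.lseek(fd, hole_size, os.SEEK_SET)   # move past the end of the file
        os.write(fd, tail)                     # the write makes the hole real
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse.bin")
    make_sparse_file(path, 1 << 20)
    st = os.stat(path)
    print("logical size:", st.st_size)
    # st_blocks counts 512-byte units; the value is filesystem dependent.
    print("bytes allocated:", st.st_blocks * 512)
```

Reading back the hole returns zero bytes, so applications see an ordinary file; only the allocation differs.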

Actually splicing at the filesystem level by dropping or reordering blocks seems like it would be very sensitive to a specific filesystem's block size, and it would require either filesystem-specific APIs, or new VFS support and teaching existing filesystems new tricks.

--
Program Intellivision and play Space Patrol!

New Filesystem Operations?

Daniel Phillips (not verified)
on
October 3, 2008 - 7:25am

I just added "support hole punch" to the Tux3 things to do list.

I just did it yesterday, it

bebamoron
on
October 30, 2008 - 7:59pm

I just did it yesterday, it worked like a charm

What about...

Anonymous (not verified)
on
September 9, 2008 - 7:24am

Why don't they just merge the benefits of all these shiny new filesystems into a single, super-fast, distributed, and versioned filesystem? Something like ext4_new = ext4_old + btrfs + tux3 + etc.?

Otherwise there's more overhead in testing our apps on different filesystems :P

better yet

Anonymous (not verified)
on
September 9, 2008 - 12:30pm

Why don't they just give up and use OS X? They will never match ZFS, and it is sad (but amusing at the same time) to see them try.

Uhhh...

Anonymous (not verified)
on
September 9, 2008 - 6:50pm

Because OS X is slow as molasses?

OS X

Anonymous (not verified)
on
September 10, 2008 - 6:09am

OS X is proprietary.
I'd rather run OpenSolaris, then.

OS X

Anonymous (not verified)
on
September 10, 2008 - 9:50am

Hmm... can't say I see any OS X *OR* OpenSolaris embedded devices around, which would seem to indicate *NEITHER* is as insanely customizable as Linux.

Extra filesystems mean extra choices and, in turn, extra roles Linux can potentially fill.

Is the iPhone not OS X on ARM

Anonymous (not verified)
on
September 10, 2008 - 10:45am

Is the iPhone not OS X on ARM?

Ooooh, they compiled BSD on

Anonymous (not verified)
on
August 20, 2009 - 3:30pm

Ooooh, they compiled BSD on ARM, oohhh...

Shoo, Apple fanbois. Begone pest!

FreeBSD supports ZFS

Anonymous (not verified)
on
September 10, 2008 - 10:45am

FreeBSD supports ZFS

But does it already do SMP?

Anonymous (not verified)
on
September 10, 2008 - 1:50pm

But does it already do SMP?

Yup, with fine grained

Anonymous (not verified)
on
September 10, 2008 - 4:54pm

Yup, with fine grained locking

Linux has fuse-zfs.

Anonymous (not verified)
on
September 12, 2008 - 1:33am

Linux has fuse-zfs.

..which is absolutely

Anonymous (not verified)
on
September 16, 2008 - 11:36am

..which is absolutely useless.

OK I take that back. Tested

Anonymous (not verified)
on
September 17, 2008 - 9:19am

OK, I take that back. I tested the newer version 0.5.0, and it works superbly!

I wonder if the FUSE implementation also features atomic writes and always-valid-on-disk state of files.

Anyway, I'm really looking forward to seeing Tux3 running, but given that Sun needed 5 years for ZFS, it might take a while.

FreeBSD doesn't support ZFS

Anonymous (not verified)
on
October 22, 2008 - 4:52pm

"FreeBSD supports ZFS" is great for gratuitous trolling, but the reality differs.

The ZFS code in FreeBSD is still marked experimental for a reason. It's definitely too unstable for production use.
I keep trying it, on 7-STABLE and on very recent -current, and I often experience odd behavior (the system either locks up or becomes so slow that it is unusable). I also had a corrupted filesystem that was just unfixable. As soon as I entered a directory, the system rebooted.

Having some ZFS code doesn't mean that FreeBSD supports ZFS. Please advocate things you have actually used in real life.

By suggesting so I'm

Anonymous (not verified)
on
September 12, 2008 - 11:49am

By suggesting so I'm assuming that you have given up and are, in fact, using OS X. So that begs my question. Why in the hell are you posting on this forum then? Go off into the distance and enjoy your OS X love.....

The trolling is much better

Anonymous (not verified)
on
September 12, 2008 - 7:09pm

The trolling is much better here ;)

Ah, but OS X does have UNIX

Mr_Z
on
September 12, 2008 - 7:33pm

There *is* a UNIX-derived kernel under OS X (specifically, BSD UNIX-derived). This site isn't specific to Linux.

--
Program Intellivision and play Space Patrol!
