Hi all,
Since everybody seems to be having fun building new filesystems these
days, I thought I should join the party. Tux3 is the spiritual and
moral successor of Tux2, the most famous filesystem that was never
released.[1] In the ten years since Tux2 was prototyped on Linux
2.2.13 we have all learned a thing or two about filesystem design. Tux3
is a write anywhere, atomic commit, btree based versioning filesystem.
As part of this work, the venerable HTree design used in Ext3 and
Lustre is getting a rev to better support NFS and possibly become more
efficient.
The main purpose of Tux3 is to embody my new ideas on storage data
versioning. The secondary goal is to provide a more efficient
snapshotting and replication method for the Zumastor NAS project, and a
tertiary goal is to be better than ZFS.
Tux3 is big endian as the Great Penguin intended.
General Description
In broad outline, Tux3 is a conventional node/file/directory design
with wrinkles. A Tux3 inode table is a btree with versioned attributes
at the leaves. A file is an inode attribute that is a btree with
versioned extents at the leaves. Directory indexes are mapped into
directory file blocks as with HTree. Free space is mapped by a btree
with extents at the leaves.
The interesting part is the way inode attributes and file extents are
versioned. Unlike the currently fashionable recursive copy on write
designs with one tree root per version[2], Tux3 stores all its
versioning information in the leaves of btrees using the versioned
pointer algorithms described in detail here:
http://lwn.net/Articles/288896/
This method promises a significant shrinkage of metadata for heavily
versioned filesystems as compared to ZFS and Btrfs. The distinction
between Tux3 style versioning and WAFL style versioning used by ZFS and
Btrfs is analogous to the distinction between delta and weave encoding
for version control systems. In fact Tux3's pointer versioning
algorithms were derived from a binary weave ...