Re: HAMMER filesystem update - design document

Previous thread: HAMMER filesystem update by Matthew Dillon on Wednesday, October 10, 2007 - 2:41 pm. (1 message)

Next thread: Re: HAMMER filesystem update - design document by Matthew Dillon on Wednesday, October 10, 2007 - 5:25 pm. (3 messages)
To: <kernel@...>
Date: Wednesday, October 10, 2007 - 3:33 pm

Ok, here's the final design document that I am now implementing.
Again, I expect most or all of these features to be ready and the
filesystem to be beta-quality by the December release.

Hammer Filesystem

(I) General Storage Abstraction

HAMMER uses a basic 16K filesystem buffer for all I/O. Buffers are
collected into clusters, clusters are collected into volumes, and a
single HAMMER filesystem may span multiple volumes.

HAMMER maintains a small hinted radix tree for block management in
each layer. A small radix tree in the volume header manages cluster
allocations within a volume, one in the cluster header manages buffer
allocations within a cluster, and most buffers (pure data buffers
excepted) will embed a small tree to manage item allocations within
the buffer.
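
The layered allocation idea above can be illustrated with a minimal
in-memory sketch. It is an assumption-laden toy, not HAMMER's on-disk
format: the radix of 32, the two-level depth, and every name
(HintedRadixAllocator, alloc, free) are hypothetical. A summary word
records which leaf bitmaps are full, and a hint remembers where the last
search succeeded so the next one can start there.

```python
RADIX = 32                    # assumed fan-out, not taken from the design document
FULL = (1 << RADIX) - 1       # a word with every bit set


class HintedRadixAllocator:
    """Toy two-level radix bitmap with a search hint (hypothetical layout)."""

    def __init__(self, nblocks):
        if not 0 < nblocks <= RADIX * RADIX:
            raise ValueError("this sketch handles 1..%d blocks" % (RADIX * RADIX))
        self.nblocks = nblocks
        nleaves = (nblocks + RADIX - 1) // RADIX
        self.leaves = [0] * nleaves            # bit set = block allocated
        pad = nleaves * RADIX - nblocks
        if pad:
            # Bits past the end of the managed range are marked allocated.
            self.leaves[-1] = FULL & ~((1 << (RADIX - pad)) - 1)
        # Summary level: bit i set = leaf i has no free block (or does not exist).
        self.summary = FULL & ~((1 << nleaves) - 1)
        self.hint = 0                          # leaf index where the search starts

    def alloc(self):
        """Return a free block number, or None if everything is allocated."""
        if self.summary == FULL:
            return None
        for step in range(RADIX):
            leaf = (self.hint + step) % RADIX
            if not (self.summary >> leaf) & 1:
                break
        word = self.leaves[leaf]
        bit = (~word & (word + 1)).bit_length() - 1   # lowest clear bit
        word |= 1 << bit
        self.leaves[leaf] = word
        if word == FULL:
            self.summary |= 1 << leaf
        self.hint = leaf
        return leaf * RADIX + bit

    def free(self, block):
        """Release a previously allocated block."""
        if not 0 <= block < self.nblocks:
            raise ValueError("block out of range")
        leaf, bit = divmod(block, RADIX)
        if not (self.leaves[leaf] >> bit) & 1:
            raise ValueError("block is not allocated")
        self.leaves[leaf] &= ~(1 << bit)
        self.summary &= ~(1 << leaf)
        self.hint = min(self.hint, leaf)       # pull the hint back to the freed space
```

The same structure could in principle be stacked once per layer (clusters
within a volume, buffers within a cluster, items within a buffer), but how
HAMMER actually encodes its trees is not described in this message.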

Volumes are typically specified as disk partitions, with one volume
designated as the root volume containing the root cluster. The root
cluster does not need to be contained in volume 0 nor does it have to
be located at any particular offset.

Data can be migrated on a cluster-by-cluster or volume-by-volume basis
and any given volume may be expanded or contracted while the filesystem
is live. Whole volumes can be added and (with appropriate data
migration) removed.

HAMMER's storage management limits it to 32768 volumes, 32768 clusters
per volume, and 32768 16K filesystem buffers per cluster. A volume
is thus limited to 16TB and a HAMMER filesystem as a whole is limited
to 524288TB. HAMMER's on-disk structures are designed so that these
limits can be raised in the future. In particular, the volume id is
intended to be expanded to a full 32 bits, and using a larger buffer
size will also greatly increase the cluster and volume size limits by
increasing the number of elements the buffer-restricted radix trees
can manage.
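
The size limits quoted above follow directly from the three 32768-entry
layers and the 16K buffer size. A short check of the arithmetic, assuming
"K" and "TB" are the binary units (1024 bytes and 1024^4 bytes); the
constant names are made up for this example:

```python
BUFFER_SIZE = 16 * 1024          # 16K filesystem buffer
BUFFERS_PER_CLUSTER = 32768
CLUSTERS_PER_VOLUME = 32768
VOLUMES_PER_FILESYSTEM = 32768

MB = 1024 ** 2
TB = 1024 ** 4

cluster_bytes = BUFFER_SIZE * BUFFERS_PER_CLUSTER          # 2**29 bytes
volume_bytes = cluster_bytes * CLUSTERS_PER_VOLUME         # 2**44 bytes
filesystem_bytes = volume_bytes * VOLUMES_PER_FILESYSTEM   # 2**59 bytes

print(cluster_bytes // MB, "MB per cluster")
print(volume_bytes // TB, "TB per volume")
print(filesystem_bytes // TB, "TB per filesystem")
```

So a cluster tops out at 512MB, which is where the 16TB volume and
524288TB filesystem figures come from.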

HAMMER breaks all of its ...

To: <kernel@...>
Date: Saturday, October 13, 2007 - 1:24 pm

Hi,

I hope this question has not been implicitly answered before, but how
does Hammer handle quotas? Filesystems like XFS and ZFS maintain quota
information internally so that a quotacheck after a system crash does
not take ages. It seems to me that Hammer could manage quotas as a
part of its cluster allocation strategy. Is this the case?

TIA,
Riggs

To: <kernel@...>
Date: Thursday, October 11, 2007 - 3:22 am

Wow, this seems pretty good.

What about data corruption issues?

Have you thought about implementing some sort of checksumming mechanism?

We cannot assume hardware to be absolutely reliable. There may be some
silent corruption going on in the disk or network layers, etc...

More on this in this article:
http://kerneltrap.org/Linux/Data_Errors_During_Drive_Communication

--
Francois Tigeot

To: <kernel@...>
Date: Thursday, October 11, 2007 - 4:38 am

Quoting from Matt's announcement:

" All information in a HAMMER filesystem is CRCd to detect corruption."

'All'

So the question - if there is one - is 'how good' that check is.

Otherwise, not the fs' job.

It *must* presume a 'generally reliable' environment beyond a certain point.

Error prevention, detection, (possible) correction, and friends more properly
should exist in the storage hardware, I/O, and link layers.

As they do. Or do not.

.. just as the article you cited points out.... hardware and driver selection
issues, or even suboptimal silicon.

Bill

To: <kernel@...>
Date: Thursday, October 11, 2007 - 3:55 am

According to Matt's design document:

"All information in a HAMMER filesystem is CRCd to detect
corruption."
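
As a rough illustration of what CRC-based detection looks like, here is a
minimal sketch. The design document does not say which CRC HAMMER uses or
where it is stored, so the CRC-32 from Python's zlib module, the 4-byte
little-endian prefix, and the helper names seal and verify are all
assumptions made for the example.

```python
import zlib

BUFFER_SIZE = 16 * 1024   # matches the 16K buffer size in the design document


def seal(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 (assumed layout, not HAMMER's)."""
    return zlib.crc32(payload).to_bytes(4, "little") + payload


def verify(block: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the stored one."""
    stored = int.from_bytes(block[:4], "little")
    return zlib.crc32(block[4:]) == stored


payload = bytes(BUFFER_SIZE - 4)       # an all-zero payload filling the buffer
block = seal(payload)

damaged = bytearray(block)
damaged[100] ^= 0x01                   # flip a single bit, as silent corruption might

print(verify(block))                   # intact block passes
print(verify(bytes(damaged)))          # single-bit damage is caught
```

A CRC of this kind only detects corruption; it says nothing about how to
recover the original data.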

Regards,

Michael

To: <kernel@...>
Date: Thursday, October 11, 2007 - 2:04 am

Any specific reason not to go with a B+-Tree or B#-Tree, which have
been shown to have advantages?

Also, what, if any, will be the locking policy for multiple I/O threads
accessing a single HAMMER filesystem? Shared/exclusive mutexes on the
cluster level?

(Whoever invented early mornings does not deserve brownie points),
--
Thomas E. Spanjaard
tgen@netphreax.net

To: <kernel@...>
Date: Wednesday, October 10, 2007 - 7:12 pm

Interesting. What exactly are those database files used for? Is a
database file attached to each file to store ACLs, for example? Or can
it be used like btree(3)? Do they have their own namespace in the
filesystem?

Wow! I am really looking forward to trying out HAMMER!!!

Regards,

Michael

To: <kernel@...>
Date: Thursday, October 11, 2007 - 4:18 am

HAMMER uses a B-Tree (maybe a B+Tree, the more I look at it). In any
case, because HAMMER uses a B-Tree, all lookups are basically key
searches, even when looking up an offset in a file. Since B-Tree
elements specify records which can reference variable-length data,
there really is very little difference between a database record
indexed with a key and regular file data indexed with an offset.

Records are typed so any given filesystem object can contain multiple
key spaces. One space will hold ACLs, one will be for regular file
offsets, and there's nothing preventing us from having a key space
directly accessible by userland.

A HAMMER-aware database would be able to store its records using the
key space directly. It opens up some intriguing possibilities.
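
A small sketch of the typed key space idea described above. It is not
HAMMER's record format: a sorted Python list stands in for the B-Tree, and
the record type constants, the (object id, record type, key) tuple layout,
and the class and method names are all hypothetical.

```python
import bisect

# Hypothetical record types, one per key space within a filesystem object.
REC_DATA = 1   # regular file data, keyed by byte offset
REC_ACL = 2    # access control entries
REC_USER = 3   # a key space exposed to userland, keyed by arbitrary bytes


class KeySpaceIndex:
    """Sorted-list stand-in for a B-Tree keyed by (obj_id, rec_type, key)."""

    def __init__(self):
        self._keys = []      # kept sorted
        self._values = {}

    def insert(self, obj_id, rec_type, key, value):
        full_key = (obj_id, rec_type, key)
        if full_key not in self._values:
            bisect.insort(self._keys, full_key)
        self._values[full_key] = value

    def lookup(self, obj_id, rec_type, key):
        """Exact-match search, as a database record lookup would do."""
        return self._values.get((obj_id, rec_type, key))

    def lookup_le(self, obj_id, rec_type, key):
        """Find the record with the largest key <= key in the same key space.

        This is how a file offset maps to the record that covers it.
        """
        i = bisect.bisect_right(self._keys, (obj_id, rec_type, key))
        if i == 0:
            return None
        found = self._keys[i - 1]
        if found[:2] != (obj_id, rec_type):
            return None
        return found[2], self._values[found]


index = KeySpaceIndex()
index.insert(7, REC_DATA, 0, b"first extent")
index.insert(7, REC_DATA, 16384, b"second extent")
index.insert(7, REC_ACL, 0, b"owner:rw")
index.insert(7, REC_USER, b"alice", b"database row")

print(index.lookup_le(7, REC_DATA, 20000))     # offset 20000 falls in the second extent
print(index.lookup(7, REC_USER, b"alice"))     # same index, used as a keyed record store
```

Both calls go through the same ordered index; only the record type and the
meaning of the key differ.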

-Matt
Matthew Dillon
<dillon@backplane.com>

To: <kernel@...>
Date: Thursday, October 11, 2007 - 4:49 am

.. including implementing a ZFS-like DB/fs crossbreed *ATOP* HAMMER.

Or a Venti workalike (surpassed already on feature-set, but not storage
efficiency AFAICS).

IF one really wanted to do either badly enough. And had a need.

Too complicated for my taste, but a PostgreSQL data store might be another
matter entirely.

Bill

To: <kernel@...>
Date: Wednesday, October 10, 2007 - 4:30 pm

*snip*

Matt,

Awesome!

Tells me: "ZFS, bend over, grab your ankles and kiss your an(atomy) 'Goodbye'"

From the amount of work that has HAD to go into this, it also tells me you are:

A) probably single, or soon will be and

B) don't sleep much anyway!

;-)

Looking forward to a 'test drive'...

Bill Hacker
