Re: HAMMER filesystem update - design document

Previous thread: HAMMER filesystem update by Matthew Dillon on Wednesday, October 10, 2007 - 2:41 pm. (1 message)

Next thread: Re: HAMMER filesystem update - design document by Matthew Dillon on Wednesday, October 10, 2007 - 5:25 pm. (3 messages)
To: <kernel@...>
Date: Wednesday, October 10, 2007 - 3:33 pm

Ok, here's the final design document that I am now implementing.
    Again, I expect most or all of these features to be ready and the
    filesystem to be beta-quality by the December release.


			       Hammer Filesystem

(I) General Storage Abstraction

    HAMMER uses a basic 16K filesystem buffer for all I/O.  Buffers are
    collected into clusters, clusters are collected into volumes, and a
    single HAMMER filesystem may span multiple volumes.

    HAMMER maintains a small hinted radix tree for block management in
    each layer.  A small radix tree in the volume header manages cluster
    allocations within a volume, one in the cluster header manages buffer
    allocations within a cluster, and most buffers (pure data buffers
    excepted) will embed a small tree to manage item allocations within
    the buffer.
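
    As a rough illustration of hinted allocation (a toy sketch, not
    HAMMER's on-disk layout), the Python below keeps a two-level tree
    over a block bitmap.  The class name, the 32-blocks-per-leaf
    fan-out, and the hint policy are all assumptions made for this
    example; the design document only states that a small hinted radix
    tree manages allocations in each layer.

```python
class HintedRadixAllocator:
    """Toy two-level radix tree over a block bitmap (illustrative only)."""

    LEAF_BITS = 32  # blocks tracked per leaf; an arbitrary choice for this sketch

    def __init__(self, nblocks):
        self.nblocks = nblocks
        nleaves = (nblocks + self.LEAF_BITS - 1) // self.LEAF_BITS
        self.leaves = [0] * nleaves  # one bitmask per leaf; bit set = allocated
        # Upper level: free-block count per leaf (the last leaf may be partial).
        self.free = [min(self.LEAF_BITS, nblocks - i * self.LEAF_BITS)
                     for i in range(nleaves)]
        self.hint = 0                # leaf where the next search starts

    def alloc(self):
        """Return a free block number, or None if everything is allocated."""
        nleaves = len(self.leaves)
        for step in range(nleaves):
            leaf = (self.hint + step) % nleaves
            if self.free[leaf] == 0:
                continue             # the upper level lets us skip full leaves
            limit = min(self.LEAF_BITS, self.nblocks - leaf * self.LEAF_BITS)
            for bit in range(limit):
                if not self.leaves[leaf] & (1 << bit):
                    self.leaves[leaf] |= 1 << bit
                    self.free[leaf] -= 1
                    self.hint = leaf  # remember where space was last found
                    return leaf * self.LEAF_BITS + bit
        return None

    def release(self, block):
        """Mark a block free and pull the hint back toward it."""
        leaf, bit = divmod(block, self.LEAF_BITS)
        if self.leaves[leaf] & (1 << bit):
            self.leaves[leaf] &= ~(1 << bit)
            self.free[leaf] += 1
            self.hint = min(self.hint, leaf)


allocator = HintedRadixAllocator(70)
blocks = [allocator.alloc() for _ in range(70)]  # fills all 70 blocks in order
exhausted = allocator.alloc()                     # nothing left to hand out
allocator.release(5)
reused = allocator.alloc()                        # the hint leads back to block 5
```

    The same shape could be stacked (clusters in a volume, buffers in a
    cluster, items in a buffer), with only the unit being allocated
    changing from layer to layer.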

    Volumes are typically specified as disk partitions, with one volume
    designated as the root volume containing the root cluster.  The root
    cluster does not need to be contained in volume 0 nor does it have to
    be located at any particular offset.

    Data can be migrated on a cluster-by-cluster or volume-by-volume basis
    and any given volume may be expanded or contracted while the filesystem
    is live.   Whole volumes can be added and (with appropriate data
    migration) removed.

    HAMMER's storage management limits it to 32768 volumes, 32768 clusters
    per volume, and 32768 16K filesystem buffers per cluster.   A volume
    is thus limited to 16TB and a HAMMER filesystem as a whole is limited
    to 524288TB.  HAMMER's on-disk structures are designed to allow future
    expansion through expansion of these limits.  In particular, the volume
    id is intended to be expanded to a full 32 bits in the future and using
    a larger buffer size will also greatly increase the cluster and volume
    size limitations by increasing the number of elements the buffer-
    restricted radix trees can manage.
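
    The limits above follow directly from the three 32768-entry counts
    and the 16K buffer size.  A quick check in Python, assuming "TB"
    here means 2^40 bytes (which is what makes the quoted figures come
    out exact):

```python
# Limits as stated in the design document.
BUFFER_SIZE = 16 * 1024           # 16K filesystem buffer
BUFFERS_PER_CLUSTER = 32768
CLUSTERS_PER_VOLUME = 32768
VOLUMES_PER_FILESYSTEM = 32768

MB = 1024 ** 2
TB = 1024 ** 4                    # assumption: binary terabytes

cluster_bytes = BUFFER_SIZE * BUFFERS_PER_CLUSTER
volume_bytes = cluster_bytes * CLUSTERS_PER_VOLUME
filesystem_bytes = volume_bytes * VOLUMES_PER_FILESYSTEM

print(cluster_bytes // MB, "MB per cluster")        # 512
print(volume_bytes // TB, "TB per volume")          # 16
print(filesystem_bytes // TB, "TB per filesystem")  # 524288
```

    Each cluster therefore works out to 512MB, a figure the text does
    not state but which follows from the same numbers.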

    HAMMER breaks all of its ...
To: <kernel@...>
Date: Saturday, October 13, 2007 - 1:24 pm

Hi,

I hope this question has not been implicitly answered before, but how
does Hammer handle quotas? Filesystems like XFS and ZFS maintain quota
information internally so that a quotacheck after a system crash does
not take ages. It seems to me that Hammer could manage quotas as a
part of its cluster allocation strategy. Is this the case?

TIA,
RIggs
To: <kernel@...>
Date: Thursday, October 11, 2007 - 3:22 am

Wow, this seems pretty good.

What about data corruption issues?

Have you thought about implementing some sort of checksumming mechanism?

We cannot assume hardware to be absolutely reliable. There may be some
silent corruption going on in the disk or network layers, etc...

More on this in this article:
http://kerneltrap.org/Linux/Data_Errors_During_Drive_Communication

-- 
Francois Tigeot
To: <kernel@...>
Date: Thursday, October 11, 2007 - 4:38 am

Quoting from Matt's announcement:

"    All information in a HAMMER filesystem is CRCd to detect corruption."

'All'

So the question - if there is one - is 'how good' that check is.
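
For a feel of what such a check buys, here is a small Python sketch that
seals a 16K buffer with a CRC and then flips one bit. It assumes a plain
CRC-32 as provided by zlib; the announcement does not say which CRC or
what record layout HAMMER actually uses, so seal() and verify() are
hypothetical helpers for illustration only.

```python
import zlib

BUFFER_SIZE = 16 * 1024  # HAMMER's stated buffer size


def seal(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload (layout invented for this sketch)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def verify(stored: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the stored one."""
    payload, crc = stored[:-4], stored[-4:]
    return zlib.crc32(payload) == int.from_bytes(crc, "little")


payload = bytes(range(256)) * (BUFFER_SIZE // 256)
stored = seal(payload)

damaged = bytearray(stored)
damaged[1000] ^= 0x01            # simulate a single silent bit flip on disk

intact_ok = verify(stored)            # True
damaged_ok = verify(bytes(damaged))   # False: CRC-32 catches any single-bit error
```

A CRC only detects damage; it cannot repair it, and a 32-bit check will
still miss a randomly corrupted buffer roughly once in 2^32 cases.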

Otherwise, not the fs' job.

It *must* presume a 'generally reliable' environment beyond a certain point.

Error prevention, detection, (possible) correction, and friends more properly 
should exist in the storage hardware, I/O, and link layers.

As they do. Or do not.

.. just as the article you cited points out.... hardware and driver selection 
issues, or even suboptimal silicon.


Bill
To: <kernel@...>
Date: Thursday, October 11, 2007 - 3:55 am

According to Matt's design document:

     "All information in a HAMMER filesystem is CRCd to detect
      corruption."

Regards,

   Michael
To: <kernel@...>
Date: Thursday, October 11, 2007 - 2:04 am

Any specific reason not to go with a B+-Tree or B#-Tree which have shown 
to have advantageous effects?

Also, what, if any, will be the locking policy for multiple I/O threads 
accessing a single HAMMER filesystem? Shared/exclusive mutexes on the 
cluster level?

(Whoever invented early mornings does not deserve brownie points),
-- 
         Thomas E. Spanjaard
         tgen@netphreax.net
To: <kernel@...>
Date: Wednesday, October 10, 2007 - 7:12 pm

Interesting. What exactly are those database files used for? Is a 
database file attached to each file to store ACLs, for example? Or can 
it be used like btree(3)? Do they have their own namespace in the 
filesystem?

Wow! I am really looking forward to trying out HAMMER!!!

Regards,

   Michael
To: <kernel@...>
Date: Thursday, October 11, 2007 - 4:18 am

Because HAMMER uses a B-Tree (maybe a B+Tree the more I look at it)..
     in any case, because HAMMER uses a B-Tree all lookups are basically
     key searches, even when looking up an offset in a file.  Since B-Tree
     elements specify records which can reference variable-length data,
     there really is very little difference between a database record
     indexed with a key and regular file data indexed with an offset.

     Records are typed so any given filesystem object can contain multiple
     key spaces.  One space will hold ACLs, one will be for regular file
     offsets, and there's nothing preventing us from having a key space
     directly accessible by userland.

     A HAMMER-aware database would be able to store its records using the
     key space directly.  It opens up some intriguing possibilities.
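
     To make the "everything is a key search" idea concrete, here is a
     small Python sketch.  A sorted list stands in for the B-Tree, and
     the (obj_id, rec_type, key) layout, the REC_* constants, and the
     ObjectIndex class are all invented for illustration; none of them
     is HAMMER's actual record format.

```python
import bisect

# Hypothetical record types; names and values are made up for this sketch.
REC_DATA = 1   # regular file data, keyed by byte offset
REC_ACL = 2    # ACL entries
REC_USER = 3   # a key space exposed to userland


class ObjectIndex:
    """Sorted list standing in for a B-Tree; keys are (obj_id, rec_type, key)."""

    def __init__(self):
        self.keys = []
        self.values = []

    def insert(self, obj_id, rec_type, key, value):
        k = (obj_id, rec_type, key)
        i = bisect.bisect_left(self.keys, k)
        if i < len(self.keys) and self.keys[i] == k:
            self.values[i] = value          # overwrite an existing record
        else:
            self.keys.insert(i, k)
            self.values.insert(i, value)

    def lookup(self, obj_id, rec_type, key):
        """Exact-match key search, as a database record lookup would do."""
        k = (obj_id, rec_type, key)
        i = bisect.bisect_left(self.keys, k)
        if i < len(self.keys) and self.keys[i] == k:
            return self.values[i]
        return None

    def lookup_offset(self, obj_id, offset):
        """Find the variable-length data record covering a file offset."""
        i = bisect.bisect_right(self.keys, (obj_id, REC_DATA, offset)) - 1
        if i < 0:
            return None
        o, t, base = self.keys[i]
        data = self.values[i]
        if o == obj_id and t == REC_DATA and base <= offset < base + len(data):
            return base, data
        return None


idx = ObjectIndex()
idx.insert(7, REC_DATA, 0, b"a" * 16384)
idx.insert(7, REC_DATA, 16384, b"b" * 100)
idx.insert(7, REC_ACL, 0, b"owner:rw")
idx.insert(7, REC_USER, 42, b"row 42")

hit = idx.lookup_offset(7, 16400)     # lands in the record starting at 16384
miss = idx.lookup_offset(7, 20000)    # past the end of the stored data
acl = idx.lookup(7, REC_ACL, 0)
row = idx.lookup(7, REC_USER, 42)
```

     The file-offset lookup and the record lookup differ only in the
     record type and in whether the match is exact or a range.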

					    -Matt
					    Matthew Dillon
					    <dillon@backplane.com>
To: <kernel@...>
Date: Thursday, October 11, 2007 - 4:49 am

.. including implementing a ZFS-like DB/fs crossbreed *ATOP* HAMMER.

Or a Venti workalike (surpassed already on feature-set, but not storage 
efficiency AFAICS).

IF one really wanted to do either badly enough.  And had a need.

Too complicated for my taste, but PostgreSQL data store might be another matter 
entirely.

Bill
To: <kernel@...>
Date: Wednesday, October 10, 2007 - 4:30 pm

*snip*

Matt,

Awesome!

Tells me: "ZFS, bend over, grab your ankles and kiss your an(atomy) 'Goodbye'"

From the amount of work that has HAD to go into this, it also tells me you are:

A) probably single, or soon will be and

B) don't sleep much anyway!

;-)

Looking forward to a 'test drive'...

Bill Hacker