"Speaking of on-disk B-Trees, ReiserFS' biggest problems are all based on its use of flexible B-Trees," suggested a reader on the DragonFlyBSD Kernel mailing list, pointing to the difficulty of detecting a failed node and then of rebuilding the B-Tree. HAMMER filesystem designer and author, Matt Dillon, explained, "if a cluster needs to be recovered, HAMMER will simply throw away the B-Tree and regenerate it from scratch using the cluster's record list. This way all B-Tree I/O operations can be asynchronous and do not have to be flushed on fsync. At the same time HAMMER will remove any records whose creation transaction id's are too large (i.e. not synchronized with the cluster header), and will zero out the delete transaction id for any records whos deletion transaction id's are too large." Matt then acknowledged:
"The real performance issue for HAMMER is going to be B-Tree insertions and rebalancing across clusters. I think most of the issues can be resolved with appropriate heuristics and by a background process to slowly rebalance clusters. This will require a lot of work, though, and only minimal rebalancing will be in [the end-of-the-year] release."
From: Dmitri Nikulin <dnikulin@...>
Subject: Re: HAMMER Update
[0]Date: Nov 5, 1:26 am 2007
On 11/2/07, Matthew Dillon <dillon@apollo.backplane.com> wrote:
> My B-Tree implementation also allows HAMMER to cache B-Tree nodes
> and start lookups from any internal node rather then having to start at
> the root. You can do this in a standard B-Tree too but it isn't
> necessarily efficient for certain boundary cases. In my implementation
> I store boundaries for the left AND right side which means a search
> starting in the middle of the tree knows exactly where to go and will
> never have to retrace its steps.
Speaking of on-disk B-Trees, ReiserFS' biggest problems are all based
on its use of flexible B-Trees. For instance, the difficulty of
correctly rebuilding the tree if a node is damaged, and correctly
detecting if a ReiserFS system is hosted on a file in another
(supposedly damaged) ReiserFS system, are noted as big problems that
are supposedly solved for Reiser4. Are there likely to be similar
issues in Hammer for the time being, or have you already planned far
ahead for these cases with extra information in the nodes?
--
Dmitri Nikulin
Centre for Synchrotron Science
Monash University
Victoria 3800, Australia
From: Matthew Dillon <dillon@...>
Subject: Re: HAMMER Update
[0]Date: Nov 5, 4:27 pm 2007
:Speaking of on-disk B-Trees, ReiserFS' biggest problems are all based
:on its use of flexible B-Trees. For instance, the difficulty of
:correctly rebuilding the tree if a node is damaged, and correctly
:detecting if a ReiserFS system is hosted on a file in another
:(supposedly damaged) ReiserFS system, are noted as big problems that
:are supposedly solved for Reiser4. Are there likely to be similar
:issues in Hammer for the time being, or have you already planned far
:ahead for these cases with extra information in the nodes?
:
:--
:Dmitri Nikulin
HAMMER has an independant B-Tree for each cluster. The clusters are
then glued together (leaf -> new cluster root) to form a unified B-Tree.
Except for lookups and no-free-space cases, B-Tree operations are
local to a cluster.
A cluster in HAMMER is around 64MB and can have up to 4096 records.
If a cluster needs to be recovered, HAMMER will simply throw away
the B-Tree and regenerate it from scratch using the cluster's
record list. This way all B-Tree I/O operations can be asynchronous
and do not have to be flushed on fsync.
At the same time HAMMER will remove any records whos creation
transaction id's are too large (i.e. not synchronized with the cluster
header), and will zero out the delete transaction id for any records
whos deletion transaction id's are too large.
Essentially recovery is handled by performing a rollback on a
cluster-by-cluster basis.
HAMMER doesn't bother to scan clusters at mount time (that would be
expensive!). Instead it will recover a cluster on the fly when the
cluster is first accessed and HAMMER notices that it is marked 'open'
when it shouldn't be.
--
The real performance issue for HAMMER is going to be B-Tree insertions
and rebalancing across clusters. I think most of the issues can be
resolved with appropriate heuristics and by a backup ground process to
slowly rebalance clusters. This will require a lot of work, though,
and only minimal rebalancing will be in for the release.
-Matt
From: Matthew Dillon <dillon@...>
Subject: Re: HAMMER Update
[0]Date: Nov 5, 6:05 pm 2007
: resolved with appropriate heuristics and by a backup ground process to
: slowly rebalance clusters. This will require a lot of work, though,
: and only minimal rebalancing will be in for the release.
Bleh, I meant to say 'background process' not 'backup ground process'.
-Matt
Matthew Dillon
<dillon@backplane.com>
Related links:
- Archive of above thread [0]
- Archive of above thread [0]
- Archive of above thread [0]