HAMMER Balancing

Submitted by Jeremy
on January 13, 2008 - 2:09pm

"HAMMER is progressing very well with only 3-4 big-ticket items left to do," noted DragonFlyBSD creator Matthew Dillon regarding the ongoing development of his highly available clustering filesystem, "I'm really happy with the progress I'm making". He listed "on-the-fly recovery, balancing, refactoring of the spike code, and retention policy scan" as the remaining items needing to be implemented. "everything else is now working and reasonably stable. Of the remaining items only the spike coding has any real algorithmic complexity. Recovery and balancing just require brute force and the physical record deletion. The retention policy scan needs is already coded and working (just not the scan itself)."

Matt then defined what he meant by 'spike', "basically, when a cluster (a 64MB block of the disk) fills up a 'spike' needs to be driven into that cluster's B-Tree in order to expand it into a new cluster. The spike basically forwards a portion of the B-Tree's key space to a new cluster." He added, "refactoring the spike code means doing a better job selecting the amount of key space the spike can represent." He noted that balancing refers to the act of balancing the B-Tree representation of the filesystem, "we want to slowly move physical data records from higher level clusters to lower level clusters, eventually winding up with a situation where the higher level clusters contain only spikes and lower level clusters are mostly full." Matt continued:

"Keep in mind that HAMMER is designed to handle very large filesystems... in particular, filesystems that are big enough that you don't actually fill them up under normal operation, or at least do not quickly fill them up and then quickly clean them out. The balancing code is expected to need upwards of a day (or longer) to slowly iron out storage inefficiencies. If a situation comes up where faster action is needed, then faster action can be taken. I intend to take advantage of the fact that most filesystems (and, really, any large filesystem), takes quite a while to actually become full."


From: Matthew Dillon <dillon@...>
Subject: HAMMER update - 1/11/2008
Date: Jan 11, 6:09 pm 2008

HAMMER is progressing very well with only 3-4 big-ticket items left
to do:

* On-the-fly Recovery (<--- current WIP)
* Balancing
* Refactoring of the spike code
* Retention policy scan

Everything else is now working and reasonably stable. Of the remaining
items only the spike coding has any real algorithmic complexity. Recovery
and balancing just require brute force and the physical record deletion
the retention policy scan needs to do is already coded and working (just
not the scan itself).

I'm really happy with the progress I'm making on HAMMER.

--

I'm going to talk about the spike and balancing code for a moment. What
is a spike? Basically, when a cluster (a 64MB block of of the disk) fills
up a 'spike' needs to be driven into that cluster's B-Tree in order to
expand it into a new cluster. The spike basically forwards a portion of
the B-Tree's key space to a new cluster.

At the moment the spike takes the B-Tree cursor at a leaf after a failed
insertion (due to running out of space) and copies that whole leaf to a
new cluster, then replaces the reference in the internal node pointing
to that leaf with a pointer to the new cluster (the 'spike').

This is extremely inefficient because the key space covered by the spike
is usually fairly minimal, so the newly spiked cluster might not fill up
before another spike is needed.

Refactoring the spike code means doing a better job selecting the amount
of key space the spike can represent.

--

The balancing code is responsible for slowly cleaning up any
inefficiencies that build up in the filesystem. As its name
implies, the idea is to 'balance' the overall B-Tree representing
the filesystem. We want to slowly move physical data records
from higher level clusters to lower level clusters, eventually
winding up with a situation where the higher level clusters contain
only spikes and lower level clusters are mostly full.

--

The idea is for the spike code to use heuristics to try to select
an optimal key range to copy into the new spike to try to avoid
unnecessary copying later on, and if it doesn't make good choices
the balancing code will come along and finish the job.

Keep in mind that HAMMER is designed to handle very large filesystems...
in particular, filesystems that are big enough that you don't actually
fill them up under normal operation, or at least do not quickly fill
them up and then quickly clean them out. The balancing code is expected
to need upwards of a day (or longer) to slowly iron out storage
inefficiencies. If a situation comes up where faster action is needed,
then faster action can be taken. I intend to take advantage of the fact
that most filesystems (and, really, any large filesystem), takes quite
a while to actually become full.

-Matt

From: Matthew Dillon <dillon@...>
Subject: 2.0 Release Timeframe
Date: Jan 11, 5:36 pm 2008

I've decided to delay the 2.0 release as long as necessary to get
HAMMER into a fully working alpha or early beta state.

This means that we will probably not release until mid to late February.
I'm not going to call an actual date until the last big ticket items
are coded for HAMMER.

-Matt


Is separate optimization runs a good idea?

Anonymous (not verified)
on
January 13, 2008 - 4:52pm

I mean... reiser4 was supposed to come with a rebalancer tool too, and it basically didn't take off very well because of the administration overhead. Perhaps what I'd like to know is, is there precedent for a high-performance (or high-reliability, whatever) filesystem that both did batch optimization and was successful in the real world?

It just feels strongly like "we'll rebalance later" is mostly just a cop-out to justify tossing prototype designs into real development. Of course HAMMER is just as much an experimental design like DragonFly altogether, but still...

I'm not sure, but I think

Eriwik (not verified)
on
January 14, 2008 - 4:04am

I'm not sure, but I think that the rebalancer will be just a low-priority kernel-thread, but it might be a separate user-land process as well. Either way I do not see much administration required.

As for highperformance/high reliability I do not think that is the goal of HAMMER. The main goal is to have a file-system that can easily be replicated in a clustered environment, though of course performance and reliability are important too.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.