From: Matthew Dillon <dillon@...>
Subject: HAMMER Update 13-June-2008 - HEADS UP: MEDIA CHANGED AGAIN
Date: Jun 13, 10:01 pm 2008
Commit 55 to the filesystem and utilities changes the media format
again. A new blockmap has been added as a stop-gap while I
research storage model issues more. Please note:
* You must recompile the filesystem AND its utilities (hammer and
newfs_hammer).
* You must re-newfs_hammer your test filesystems.
This commit also fixes a panic and *might* also have fixed the reported
machine lockup. It does not fix the memory exhaustion issue yet.
There is a known issue (that's been around for a while) where the
HAMMER flusher threads can become cpu bound, creating large noticeable
pauses in machine operation. That is related to B-Tree overheads and
has not yet been dealt with.
I have made progress on the blogbench performance drop issue when run
on the same directory over and over again. Blogbench does its best
to really mess up B-Trees :-). I have narrowed the issue down to
directory/namespace lookups in the face of massive directory rewriting.
blogbench does that in spades by creating .tmp files and renaming them
over the originals.
Unfortunately, the only way to really fix it is to introduce true
locality of reference on disk on a file-by-file and directory-by-
directory basis, kinda like how UFS does with cylinder groups.
In HAMMER's case this means adding a field to the blockmap's
layer2 structure (which represents an 8MB chunk of disk space) to
allow incremental appends anywhere on the disk. This is probably what
I will be working on for the rest of this month.
-Matt
Matthew Dillon
From: strangepics <strangepics@...>
Subject: Re: HAMMER Update 13-June-2008 - HEADS UP: MEDIA CHANGED AGAIN
Date: Jun 15, 2:22 am 2008
Matthew Dillon wrote:
> on the same directory over and over again. Blogbench does its best
> to really mess up B-Trees :-). I have narrowed the issue down to
That's probably why they don't use B-trees for ReiserFS.
What about crit-bit? Other trees?
From: Matthew Dillon <dillon@...>
Subject: Re: HAMMER Update 13-June-2008 - HEADS UP: MEDIA CHANGED AGAIN
Date: Jun 15, 10:28 pm 2008
:
:Matthew Dillon wrote:
:> on the same directory over and over again. Blogbench does its best
:> to really mess up B-Trees :-). I have narrowed the issue down to
:
:That's probably why they don't use B-trees for ReiserFS.
:
:What about crit-bit? Other trees?
I am not too familiar with Reiser so I can't really come to that
conclusion, but it doesn't seem likely that the reasons are similar.
HAMMER's issue insofar as the B-Tree goes is mainly due to its history
retention practices. If I mount with -o nohistory then the issue
becomes one of locality of reference due to HAMMER not immediately
reusing space freed by the rename-over that blogbench does.
After running the test over the weekend the culprit seems to pointing
more towards HAMMER's low level storage allocation model and away
from the B-Tree per-say. I was already planning on making some major
changes there so we'll see what pans out.
-Matt
Matthew Dillon
From: Dan M <strangepics@...>
Subject: Re: HAMMER Update 13-June-2008 - HEADS UP: MEDIA CHANGED AGAIN
Date: Jun 16, 4:18 pm 2008
On Sun, Jun 15, 2008 at 10:28 PM, Matthew Dillon
wrote:
> I am not too familiar with Reiser so I can't really come to that
> conclusion, but it doesn't seem likely that the reasons are similar.
They used B+ in earlier versions. In Reiser4 they decided to use
http://en.wikipedia.org/wiki/B*-tree
and
http://en.wikipedia.org/wiki/Dancing_trees
--
Dan
From: Matthew Dillon <dillon@...>
Subject: Re: HAMMER Update 13-June-2008 - HEADS UP: MEDIA CHANGED AGAIN
Date: Jun 16, 5:37 pm 2008
: wrote:
:> I am not too familiar with Reiser so I can't really come to that
:> conclusion, but it doesn't seem likely that the reasons are similar.
:
:They used B+ in earlier versions. In Reiser4 they decided to use
:
:http://en.wikipedia.org/wiki/B*-tree
:
:and
:
:http://en.wikipedia.org/wiki/Dancing_trees
:
:--
:Dan
I can never keep all the variations straight, I need a Wiki reference
every time to get the name right :-).
These are all minor variations of B+Trees. The B* variation keeps
nodes 2/3rds full instead of half full. It is an optimization which
is intended to maintain higher fill ratios to make better use of system
caches.
A Dancing tree ... hehehehe. Looks like Hans invented that one. I'm
not sure it even counts as a variation of a B+Tree. It doesn't apply
to HAMMER at all because HAMMER makes no modifications to the B-Tree
whatsoever on the front-end. When you create, delete, rename, write,
etc... when you do those operations HAMMER caches them in a
virtualization layer in memory and doesn't make any modifications to
its on-media data structures (or their in-memory representations) at
all until the meta-data is synced to disk.
HAMMER doesn't have to make real-time on-media modifications for any
meta-data change for ANY operation. Not a single one. Not rename, not
hard or soft links, not truncations, nothing.
HAMMER uses a modified B+Tree for its on-disk representation, which is
a B-Tree with only keys at internal nodes and only records at the leafs.
This was done to reduce structural bloat, allow for a leaf->leaf
linking optimization in the future, and for other reasons. The Wiki
definition isn't quite right. The leaf->leaf links are optional (and
HAMMER doesn't implement them, though I've left room to do so). It's
still a B+Tree if you don't have them.
HAMMER's custom modification is to formalize both a left AND a right
bounding element. So a HAMMER B+Tree looks like this:
B B B B B B <---- internal node (e.g. 6 elements w/ 5 leaf ptrs)
L L L L L <---- leaf nodes
Whereas a normal B+Tree looks like this (note the missing right bound):
B B B B B <---- internal node
L L L L L <---- leaf nodes
Note that HAMMER uses a radix of 64, so e.g. 64 internal elements but one
is a right-bound so only 63 pointers to the next level. The leafs then
have 64 elements. So HAMMER's B+Tree recursion is 63x63x63....x64.
HAMMER's internal nodes have a left and right bounding element. A
standard B+Tree only has a left bounding element. By adding a right
bounding element HAMMER can cache pointers into its B+Tree and 'pick up'
searches, insertions, and deletions relative to the cached pointers instead
of having to start at the root of the tree. More importantly, it can
pickup searches, insertions, and deletions at internal nodes, not just
leaf nodes. So I can cache a proximity pointer and if I do a good job
I never have to traverse the B+Tree above that point.
Without the right bound a search cannot be picked up from mid-tree
without potentially going down the wrong branch and then having to
backtrack.
--
Many of the textbook optimizations listed on e.g. Wiki have serious
issues... you can't make the blanket statement "method A is better
then method B". Balancing optimizations can cause massive issues with
deadlocks, for example, or a need to lock extra nodes, or a need to
traverse a greater portion of the tree. In fact, for that reason, a
lot of people don't even try to balance B-Tree fill ratios in the
critical path, and neither does HAMMER. Putting in better balancing and
fill-ratio algorithms are something I'll probably do in the pruner or
reblocker at some point, but never in the critical path.
I've left the implementation open to additional optimizations, such
as the leaf-linking optimization and better balancing optimizations
such as you get with B*. There are plenty of others as well, and in fact
WHEN you do the optimizations is often more important then WHAT
optimizations you choose to do.
HAMMER does a bunch of other stuff as well. But HAMMER's main
optimization is the addition of the right bounding element and its
ability to cache pointers into the middle of the B+Tree to short-cut
searches.
-Matt
Matthew Dillon