HAMMER Performance

October 14, 2007 - 6:07am
Submitted by Jeremy on October 14, 2007 - 6:07am.
DragonFlyBSD

"I've never looked at the Reiser code though the comments I get from friends who use it are on the order of 'extremely reliable but not the fastest filesystem in the world'," Matt Dillon explained when asked to compare his new clustering HAMMER filesystem with ReiserFS, both of which use B-Trees to organize objects and records. He continued, "I don't expect HAMMER to be slow. A B-Tree typically uses a fairly small radix in the 8-64 range (HAMMER uses 8 for now). A standard indirect block methodology typically uses a much larger radix, such as 512, but is only able to organize information in a very restricted, linear way." He went on to describe numerous plans for optimizing performance, concluding, "my expectation is that this will lead to a fairly fast filesystem. We will know in about a month :-)"
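
To see what the radix difference means in practice, the short Python sketch below computes the minimum number of tree levels needed to index one million entries at radix 8 (HAMMER's current choice), 64, and 512. It assumes every node is completely full, which is the best case; real B-Tree nodes are usually only partly full, so actual trees are somewhat deeper. The function name `levels_needed` and the one-million figure are illustrative assumptions, not taken from HAMMER.

```python
def levels_needed(num_items: int, radix: int) -> int:
    """Minimum depth of a tree whose nodes each hold `radix` entries,
    assuming every node is completely full (the best case)."""
    if num_items < 1 or radix < 2:
        raise ValueError("need num_items >= 1 and radix >= 2")
    levels, capacity = 1, radix
    # Each extra level multiplies the number of reachable entries by `radix`.
    while capacity < num_items:
        capacity *= radix
        levels += 1
    return levels


for radix in (8, 64, 512):
    print(f"radix {radix:3d}: {levels_needed(1_000_000, radix)} levels")
```

It prints 7, 4, and 3 levels respectively: the small radix costs extra levels per lookup, which is why the I/O-layout tricks Dillon describes matter so much for B-Tree performance.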

Among the planned optimizations, Matt explained, "the main thing you want to do is to issue large I/Os which cover multiple B-Tree nodes and then arrange the physical layout of the B-Tree such that a linear I/O will cover the most likely path(s), thus reducing the actual number of physical I/O's needed." He noted, "HAMMER will also be able to issue 100% asynchronous I/Os for all B-Tree operations, because it doesn't need an intact B-Tree for recovery of the filesystem." He went on to describe another optimization the filesystem's design makes possible: "HAMMER is designed to allow cluster-by-cluster reoptimization of the storage layout. Anything that isn't optimally laid out at the time it was created can be re-laid out at some later time, e.g. with a continuously running background process or a nightly cron job or something of that ilk. This will allow HAMMER to choose to use an expedient layout instead of an optimal one in its critical path and then 'fix' the layout later on to make re-accesses optimal."
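
The idea of making one linear I/O cover the likely lookup path can be illustrated with a toy model. The Python sketch below lays out a full radix-8 tree of four levels in two ways, plain breadth-first order and a clustered order that stores each node right next to its eight children, then counts how many fixed-size I/O chunks a root-to-leaf lookup must read. The four-level tree, the nine-node chunk size, and the clustering rule are all assumptions for illustration; this is a generic subtree-clustering layout, not HAMMER's actual on-disk format, and it clusters by tree shape rather than by observed access patterns.

```python
from itertools import product

RADIX = 8          # B-Tree radix mentioned in the article
LEVELS = 4         # hypothetical tree height: root plus three levels
CHUNK = RADIX + 1  # hypothetical I/O size in nodes: one parent plus its children


def breadth_first_order(radix, levels):
    """Nodes, identified by their path of child indices, in level order."""
    return [path for level in range(levels)
            for path in product(range(radix), repeat=level)]


def clustered_order(radix, levels):
    """Place each node directly before its children, then recurse two levels down.

    Every group of (1 + radix) nodes forms one aligned chunk when `levels` is even.
    """
    order = []

    def emit_group(node):
        order.append(node)
        if len(node) + 1 < levels:
            children = [node + (d,) for d in range(radix)]
            order.extend(children)
            for child in children:
                if len(child) + 1 < levels:
                    for d in range(radix):
                        emit_group(child + (d,))

    emit_group(())
    return order


def chunks_touched(order, leaf, chunk_nodes):
    """Number of distinct I/O chunks read while walking from the root to `leaf`."""
    position = {node: i for i, node in enumerate(order)}
    path = [leaf[:k] for k in range(len(leaf) + 1)]
    return len({position[node] // chunk_nodes for node in path})


leaf = (3, 5, 2)
for name, order in (("breadth-first", breadth_first_order(RADIX, LEVELS)),
                    ("clustered", clustered_order(RADIX, LEVELS))):
    print(name, chunks_touched(order, leaf, CHUNK))
```

For the leaf (3, 5, 2) it prints `breadth-first 3` and `clustered 2`, and every other leaf gives the same counts. The gap widens with deeper trees, since breadth-first order needs roughly one chunk per level while the clustered order needs one chunk per pair of levels.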


From: Chris Turner <c.turner@...>
Subject: Re: HAMMER filesystem update - design document
Date: Oct 12, 6:06 pm 2007

Matthew Dillon wrote:
> 
>     It will be cluster-by-cluster to begin with.  I don't expect it to cause
>     any issues, the BxTree in each cluster will be fairly compact and well
>     cached and, most importantly, nearly all write I/O can be asynchronous
>     so locks simply will not be held all that long.
> 
>     Eventually it will be possible to use inherent buffer cache locks to
>     lock the BxTree operations but it's a little dicey to try to do
>     that level of fine-grained locking by default due to the allocation
>     model.
> 

Anyone up on ReiserFS ?

(but still capable of a 'clean room' description :)

As I recall, according to their docs it seems to have been one of the
first to use BTrees in the general sense for internal structuring ..

also as I recall, there were some performance problems in specific areas
of requiring extra CPU for basic IO (due to having to compute tree
operations rather than do simple pointer manipulations) and also
concurrent IO (due to the need for complex tree locking types of things,
possibly compounded by the extra cpu time)

this kind of a thing is more for replication than 100% raw speed, but
in any case.. just some topics for discussion I suppose ..

I still need to read the more detailed commits.

looking forward to it.

- Chris

From: Matthew Dillon <dillon@...>
Subject: Re: HAMMER filesystem update - design document
Date: Oct 13, 8:59 pm 2007

:Anyone up on ReiserFS ?
:
:(but still capable of a 'clean room' description :)
:...
:As I recall, according to their docs it seems to have been one of the
:first to use BTrees in the general sense for internal structuring ..
:
:also as I recall, there were some performance problems in specific areas
:of requiring extra CPU for basic IO (due to having to compute tree
:operations rather than do simple pointer manipulations) and also
:concurrent IO (due to the need for complex tree locking types of things,
:possibly compounded by the extra cpu time)
:
:this kind of a thing is more for replication than 100% raw speed, but
:in any case.. just some topics for discussion I suppose ..
:
:I still need to read the more detailed commits.
:
:looking forward to it.
:
:- Chris

I've never looked at the Reiser code though the comments I get from friends who use it are on the order of 'extremely reliable but not the fastest filesystem in the world'.

I don't expect HAMMER to be slow. A B-Tree typically uses a fairly small radix in the 8-64 range (HAMMER uses 8 for now). A standard indirect block methodology typically uses a much larger radix, such as 512, but is only able to organize information in a very restricted, linear way.

There are several tricks to making a B-Tree operate efficiently but the main thing you want to do is to issue large I/Os which cover multiple B-Tree nodes and then arrange the physical layout of the B-Tree such that a linear I/O will cover the most likely path(s), thus reducing the actual number of physical I/O's needed. Locality of reference is important.

HAMMER will also be able to issue 100% asynchronous I/Os for all B-Tree operations, because it doesn't need an intact B-Tree for recovery of the filesystem.
It can reconstruct the B-Tree for a cluster by scanning the records in the cluster and using a stored transaction id versus the transaction id in the records to determine what can be restored and what still may have had pending asynchronous I/O and thus cannot.

HAMMER will implement one B-Tree per cluster (where a cluster is e.g. 64MB), and then hook clusters together at B-Tree leaf nodes (B-Tree leaf -> root of some other cluster). This means that HAMMER will be able to lock modifying operations cluster-by-cluster at the very least and hopefully greatly improve the amount of parallelism supported by the filesystem.

HAMMER uses an index-record-data approach. Each cluster has three types of information in it: Indexes, records, and data. The index is a B-Tree and B-Tree nodes will replicate most of the contents of the records as well as supply a direct pointer to the related data. B-Tree nodes will be localized in typed filesystem buffers (that is, grouped with other B-Tree nodes), and B-Tree filesystem buffers will be intermixed with data filesystem buffers to a degree, so it should have extremely good caching characteristics. I tried to take into consideration how hard drives cache data (which is typically whole tracks to begin with) and incorporate that into the design.

Finally, HAMMER is designed to allow cluster-by-cluster reoptimization of the storage layout. Anything that isn't optimally laid out at the time it was created can be re-laid out at some later time, e.g. with a continuously running background process or a nightly cron job or something of that ilk. This will allow HAMMER to choose to use an expedient layout instead of an optimal one in its critical path and then 'fix' the layout later on to make re-accesses optimal. I've left a ton of bytes free in the filesystem buffer headers for records and clusters for (future) usage-pattern tracking heuristics.
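
The recovery rule Dillon describes (scanning a cluster's records and comparing a stored transaction id with each record's own transaction id) can be sketched in a few lines of Python. The `Record` fields, the name `synced_tid`, the "less than or equal" comparison, and the use of a sorted list in place of a real B-Tree are all assumptions made for illustration; HAMMER's actual record format and recovery code may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    obj_id: int       # hypothetical: object (inode) the record belongs to
    key: int          # hypothetical: e.g. a file offset or directory hash
    tid: int          # transaction id stamped on the record when it was written
    data_offset: int  # hypothetical pointer to the record's data in the cluster


def rebuild_index(records, synced_tid):
    """Rebuild a cluster's index from a linear scan of its records.

    Records stamped at or below `synced_tid` are assumed to be fully on disk
    and are restored; newer ones may still have had asynchronous I/O in
    flight, so they are dropped.
    """
    restored = [r for r in records if r.tid <= synced_tid]
    dropped = [r for r in records if r.tid > synced_tid]
    # A sorted list stands in for the B-Tree: it orders entries by the same key.
    index = sorted(restored, key=lambda r: (r.obj_id, r.key, r.tid))
    return index, dropped


records = [
    Record(obj_id=1, key=0, tid=100, data_offset=4096),
    Record(obj_id=1, key=8192, tid=105, data_offset=8192),
    Record(obj_id=2, key=0, tid=99, data_offset=12288),
    Record(obj_id=1, key=0, tid=110, data_offset=16384),
]
index, dropped = rebuild_index(records, synced_tid=105)
print([(r.obj_id, r.key, r.tid) for r in index])
print([(r.obj_id, r.key, r.tid) for r in dropped])
```

Here the record stamped with transaction id 110 is dropped because it is newer than the stored value of 105, while the other three are reinserted in key order. Since the rebuild reads only records and never the old index nodes, those nodes could be written asynchronously without affecting what this procedure recovers.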

The radix tree bitmap allocator, which has been committed so you can take a look at it if you want, is extremely sophisticated. It should be able to optimally allocate and free various types of information all the way from megabyte+ sized chunks down to a 64-byte boundary, in powers of 2.

My expectation is that this will lead to a fairly fast filesystem. We will know in about a month :-)

-Matt
Matthew Dillon
<dillon@backplane.com>
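
The allocation granularity Dillon describes, power-of-two sizes down to a 64-byte boundary, can be illustrated with a deliberately simplified Python sketch. It rounds each request up to a power of two of at least 64 bytes and searches a flat bitmap of 64-byte units for a free block aligned to its own size. This is a stand-in, not the committed allocator: a radix tree bitmap allocator, as its name suggests, layers a tree over the bitmaps so it need not scan every bit, whereas this sketch scans linearly. The names `size_class` and `BitmapAllocator` are hypothetical.

```python
MIN_BLOCK = 64  # smallest allocation unit, matching the 64-byte boundary in the text


def size_class(size):
    """Round a request up to a power of two, never below MIN_BLOCK bytes."""
    size = max(size, MIN_BLOCK)
    return 1 << (size - 1).bit_length()


class BitmapAllocator:
    """Toy power-of-two allocator over a flat bitmap of 64-byte units."""

    def __init__(self, total_bytes):
        self.units = total_bytes // MIN_BLOCK
        self.used = [False] * self.units

    def alloc(self, size):
        """Return the byte offset of a free, naturally aligned block, or None."""
        n = size_class(size) // MIN_BLOCK
        # Step by n so every candidate block starts on a multiple of its size.
        for start in range(0, self.units - n + 1, n):
            if not any(self.used[start:start + n]):
                self.used[start:start + n] = [True] * n
                return start * MIN_BLOCK
        return None

    def free(self, offset, size):
        """Release a block previously returned by alloc() for the same size."""
        start = offset // MIN_BLOCK
        n = size_class(size) // MIN_BLOCK
        self.used[start:start + n] = [False] * n


pool = BitmapAllocator(1 << 20)  # 1 MiB region, 16384 units of 64 bytes
a = pool.alloc(100)              # rounded up to 128 bytes
b = pool.alloc(64)
c = pool.alloc(200)              # rounded up to 256 bytes, aligned to 256
print(a, b, c, pool.alloc(1 << 20))
```

This prints `0 128 256 None`: the 200-byte request skips past the partly used first 256 bytes to the next 256-byte boundary, and the whole-region request fails until the smaller blocks are freed.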


I've never looked at the

October 14, 2007 - 4:28pm
Anonymous (not verified)

I've never looked at the Reiser code though the comments I get from friends who use it are on the order of 'extremely reliable but not the fastest filesystem in the world'.

Either he typed that out wrong, or I suspect his friends are having a laugh.

I'm pretty sure that if you asked a hundred Linux users about ReiserFS, at least 99 of them would describe it as "extremely fast but not the most reliable filesystem in the World".

Now, I never had any reliability problems with ReiserFS when I was using it, and I suspect that most of the people who have done are either exaggerating or shot themselves in the foot due to inexperience. But whatever the truth of the matter, there is no way on Earth you can say that ReiserFS has a reputation of being very reliable but slow. Exactly the opposite in fact.

I've been using ReiserFS

October 14, 2007 - 6:38pm
Anonymous (not verified)

I've been using ReiserFS (version 3.6) on my current laptop, for the last few years. I've never had an issue with it.

I had an older laptop with Reiser 3.5, which managed to put a file in such a state that I could not delete it. I had to use a CVS version of reiserfsck to fix the issue. That was about 5 years ago. From then on, it's been working flawlessly.

I'm quite happy with the performance and reliability of ReiserFS. It'll be interesting to see how version 4 shapes up.

A laptop isn't generally the

October 15, 2007 - 9:05am
Anonymous (not verified)

A laptop isn't generally the best test environment; a server that supports millions of users gives a much better view of stability.

What makes ReiserFS

October 14, 2007 - 9:13pm
Anonymous (not verified)

What makes ReiserFS seemingly unreliable is that it can do bad things in BAD situations. However, if you have a normal stable environment, it's fine. The unusual organization of the data on a drive makes it highly susceptible to corruption when things like bad sectors occur. However, if your drives are in good shape, no problem.

ReiserFS suffered from Reiser attitude

October 14, 2007 - 11:37pm
Anonymous (not verified)

I started using ReiserFS on a home server a few months after it was introduced in the kernel and in distributions. I hit several corruptions (about four or five) in less than three months.

I was hit by two problems: the first was a race condition in ReiserFS open() that was identified only one year later, and the second was a few bad sectors on the disk.

What was disgusting was Reiser's attitude toward issue reports. He always denied there were problems in ReiserFS. For him, it was always bad hardware.

He was partially right in my case. But after seeing corruption again on new, healthy hardware, I lost trust in ReiserFS (and in Reiser too). I got tired of spending hours on fsck too.

I doubt I will try ReiserFS again on my hardware.

hello! it depends one one's

October 15, 2007 - 2:12am
Anonymous (not verified)

hello!
it depends on one's experience; I've been using reiserfs on a lot of systems (some with bad disks) and didn't see a single failure or corruption. At the same time I suffered some issues with ext3 (that was 4 years ago).
After 2 years of distrust, I tried ext3 again and I'm now a happy ext3 user (and a happy reiserfs user too).
