HAMMER Stabilizing

Submitted by Jeremy
on May 14, 2008 - 6:11am

Matthew Dillon sent out a series of updates about his developing HAMMER filesystem, noting that he is currently focusing on the reblocking and pruning code and tracking down a number of bugs that result in B-Tree corruption. He also noted that HAMMER previously consisted of three components: B-Tree nodes, records, and data. In his latest cleanups he has removed the record structure entirely: "this will seriously improve the performance of directory and inode access." The change required an on-media format change: "I know I have said this before, but there's a very good chance that no more on-media changes will be made after this point. The official freeze of the on-media format will not occur until the 2.0 release, however."

Matt added, "HAMMER is stable enough now that I am able to run it on my LAN backup box. I'm using it to test that the snapshots work as expected as well as to test the long term effects of reblocking and pruning." He then cautioned:

"Please note that HAMMER is not ready for production use yet, there is still the filesystem-full handling to implement and much more serious testing of the reblocking and pruning code is required, not to mention the crash recovery code. I expect to find a few more bugs, but I'm really happy with the results so far."


From: Matthew Dillon
Subject: HAMMER update 12-May-2008
Date: May 12, 10:01 am 2008

I'm holding off the filesystem-full handling work for another week
    and instead I am going to focus on the reblocking and pruning code.
    There are still numerous bugs in the reblocking and pruning code
    that are resulting in a small amount of corruption of the B-Tree.

    I am also going to do one more major change to the on-media format.
    As I test the reblocking and pruning code more and more, and also 
    test HAMMER's performance, it has become apparent that the record
    abstraction is creating a bigger problem than it is solving.

    HAMMER is broken down into three major components:  B-Tree nodes, records,
    and data.  B-Tree nodes reference both records and data and also 
    duplicate a big chunk of the information found in records.  In fact,
    the ONLY information in a record that is not found in a B-Tree node
    exists for inode records and directory entries, and only a few fields.

    What I am going to do is move the remaining information found in the
    record structure into the data, and get rid of the record structure
    entirely so HAMMER only has B-Tree nodes and data.  This will seriously
    improve the performance of directory and inode access.

    These changes are actually fairly minor in the larger scheme of things.
    The records are barely accessed as it stands now, so removing them will
    only take a day.

						-Matt

From: Matthew Dillon
Subject: HEADS UP - HAMMER on-media format changed 12-May-2008
Date: May 12, 2:33 pm 2008

For those people testing HAMMER, the HAMMER on-media format has
    changed so you will have to newfs any HAMMER filesystems.

    I know I have said this before, but there's a very good chance that no
    more on-media changes will be made after this point.  The official
    freeze of the on-media format will not occur until the 2.0 release,
    however.

    The testing of the reblocking and pruning code continues.  There
    are still a handful of bugs related to parallel operations while
    reblocking and pruning which I expect to be worked out this week.

						-Matt

From: Matthew Dillon
Subject: Backup statistics - using HAMMER on my LAN backup box
Date: May 11, 11:13 am 2008

HAMMER is stable enough now that I am able to run it on my
    LAN backup box.  I'm using it to test that the snapshots work
    as expected as well as to test the long term effects of reblocking
    and pruning.  The LAN backup box NFS mounts all the other boxes' primary
    partitions and uses that to create daily snapshots from 5 machines
    (apollo, crater, leaf, pkgbox, and my office workstation), covering
    around 90G of backed-up data.  The box mirrors the latest daily
    snapshot off-site once a week so I can afford to lose the data if
    I hit a bug.

    With UFS I had to use the hardlink trick (w/ cpdup) to generate backups.
    It took 4-6 hours every day for the backup box to create the snapshots,
    and I couldn't use more than half the 700G of backup space because
    using more resulted in having too many inodes (> 40 million) for UFS's
    fsck to be able to fsck without running out of memory.

STARTING MIRRORS Mon Mar 31 01:15:00 PDT 2008 level 2
DONE MIRRORING   Mon Mar 31 05:01:51 PDT 2008		~4 hrs
STARTING MIRRORS Wed Apr  2 01:15:00 PDT 2008 level 2
DONE MIRRORING   Wed Apr  2 07:02:00 PDT 2008		~6 hrs
STARTING MIRRORS Fri Apr  4 01:15:00 PDT 2008 level 2
DONE MIRRORING   Fri Apr  4 07:32:33 PDT 2008		~6 hrs
STARTING MIRRORS Sat Apr  5 01:15:00 PDT 2008 level 2
DONE MIRRORING   Sat Apr  5 05:09:13 PDT 2008		~4 hrs

    With HAMMER I don't have to use the hardlink trick.  I can just
    cpdup straight out and then create a @@ softlink to the snapshot.
    It takes less than an hour to do a daily backup that way.

STARTING MIRRORS Tue May  6 01:15:01 PDT 2008 level 2
DONE MIRRORING   Tue May  6 02:11:17 PDT 2008		~56 min
STARTING MIRRORS Sat May 10 01:15:00 PDT 2008 level 2
DONE MIRRORING   Sat May 10 02:17:36 PDT 2008		~62 min
STARTING MIRRORS Sun May 11 01:15:01 PDT 2008 level 2
DONE MIRRORING   Sun May 11 02:09:02 PDT 2008		~54 min
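
As an illustration that is not part of Matt's message, the approximate durations noted beside the log lines can be recomputed with a short Python sketch. The log text is copied from the two excerpts above with the trailing "~4 hrs" style notes dropped; the names `parse_stamp` and `mirror_minutes` are made up for this example, and the month parsing assumes an English locale.

```python
from datetime import datetime

# Log lines copied from the message; the hand-written duration notes are omitted.
UFS_LOG = """\
STARTING MIRRORS Mon Mar 31 01:15:00 PDT 2008 level 2
DONE MIRRORING   Mon Mar 31 05:01:51 PDT 2008
STARTING MIRRORS Wed Apr  2 01:15:00 PDT 2008 level 2
DONE MIRRORING   Wed Apr  2 07:02:00 PDT 2008
STARTING MIRRORS Fri Apr  4 01:15:00 PDT 2008 level 2
DONE MIRRORING   Fri Apr  4 07:32:33 PDT 2008
STARTING MIRRORS Sat Apr  5 01:15:00 PDT 2008 level 2
DONE MIRRORING   Sat Apr  5 05:09:13 PDT 2008
"""

HAMMER_LOG = """\
STARTING MIRRORS Tue May  6 01:15:01 PDT 2008 level 2
DONE MIRRORING   Tue May  6 02:11:17 PDT 2008
STARTING MIRRORS Sat May 10 01:15:00 PDT 2008 level 2
DONE MIRRORING   Sat May 10 02:17:36 PDT 2008
STARTING MIRRORS Sun May 11 01:15:01 PDT 2008 level 2
DONE MIRRORING   Sun May 11 02:09:02 PDT 2008
"""


def parse_stamp(line):
    """Extract the timestamp from one log line (the time zone field is ignored)."""
    fields = line.split()
    # fields[2] is the weekday; the next five are month, day, time, zone, year.
    month, day, clock, _tz, year = fields[3:8]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def mirror_minutes(log):
    """Return the elapsed whole minutes for each STARTING/DONE pair in the log."""
    minutes = []
    start = None
    for line in log.splitlines():
        if line.startswith("STARTING MIRRORS"):
            start = parse_stamp(line)
        elif line.startswith("DONE MIRRORING") and start is not None:
            elapsed = parse_stamp(line) - start
            minutes.append(int(elapsed.total_seconds()) // 60)
            start = None
    return minutes


if __name__ == "__main__":
    print("UFS minutes:   ", mirror_minutes(UFS_LOG))
    print("HAMMER minutes:", mirror_minutes(HAMMER_LOG))
```

Both timestamps in each pair carry the same zone, so ignoring the zone field does not affect the differences.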

    So far the integrity of the snapshots is good.  I am doing a
    tar cf - <softlink>/. | md5 on each fixed snapshot and will check
    to see if the value changes over time.  I already see that I might
    want to create a mount option to update mtime as a record update
    instead of as an in-place update, to guarantee that it does not
    change from the point of view of a snapshot.  Being able to
    integrity-check a snapshot will likely become an important aspect of
    the filesystem.
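
As an illustration that is not part of Matt's message: tar stores each file's mtime in its archive header, so an md5 over a tar stream changes whenever an mtime changes, even if the file data is identical. The Python sketch below is a different technique from the tar pipeline, not a reproduction of it. It digests only relative paths and file contents, so a touched mtime leaves the result unchanged. The function name `tree_digest` is hypothetical, and the sketch covers only regular files and symlinks to files; other entries are skipped.

```python
import hashlib
import os


def tree_digest(root):
    """Return an MD5 hex digest over relative paths and file contents under root.

    Timestamps, ownership, and permissions are deliberately left out.
    """
    md5 = hashlib.md5()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix the traversal order so the digest is reproducible
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            md5.update(rel.encode("utf-8", "surrogateescape") + b"\0")
            if os.path.islink(path):
                # Hash the link target rather than following the link.
                md5.update(os.readlink(path).encode("utf-8", "surrogateescape"))
            elif os.path.isfile(path):
                with open(path, "rb") as fh:
                    for chunk in iter(lambda: fh.read(1 << 20), b""):
                        md5.update(chunk)
            # Anything else (FIFOs, devices, sockets) contributes only its name.
            md5.update(b"\0")
    return md5.hexdigest()
```

Running `tree_digest` on the same snapshot path on different days and comparing the hex strings would serve the same purpose as the check described above, under the stated limitations.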

    I am seeing a certain degree of fragmentation, particularly when
    listing directories.  It will be interesting to see what kind of
    effect reblocking has on that.

    Please note that HAMMER is not ready for production use yet, there
    is still the filesystem-full handling to implement and much more serious
    testing of the reblocking and pruning code is required, not to mention
    the crash recovery code.  I expect to find a few more bugs, but I'm
    really happy with the results so far.

						-Matt

From: Matthew Dillon
Subject: Blogbench results for HAMMER
Date: May 10, 2:21 pm 2008

I ran blogbench on a HAMMER partition and on a UFS partition and
    got some rather interesting results.

    I fully expected HAMMER's write performance to be bad compared to UFS,
    because HAMMER is still double-buffering its data.  Indeed, as the
    test began UFS seemed to be outdoing HAMMER.  But as the number of files
    grew and the kernel started to have to recycle vnodes and buffers, UFS's
    performance went completely to hell while HAMMER was able to maintain good
    throughput.  The basic blog benchmark creates, reads, and writes around
    20,000 files and goes for a lot of parallelism.

    I don't know why UFS's write performance went to hell.. it pretty much
    died completely after a very promising start.  But even ignoring that
    as some sort of implementation fluke the read performance numbers speak
    for themselves.

    I haven't run bonnie++ yet.  I think UFS still does very well vs HAMMER
    on saturated single-file I/O.

						-Matt


test29# blogbench -d /usr/obj/bench		(HAMMER MOUNT)

Frequency = 10 secs
Scratch dir = [/usr/obj/bench]
Direct I/O: disabled
Spawning 3 writers...
Spawning 1 rewriters...
Spawning 5 commenters...
Spawning 100 readers...
Benchmarking for 30 iterations.
The test will run during 5 minutes.

  Nb blogs   R articles    W articles    R pictures    W pictures    R comments    W comments
        17        90598           894         64890           945         44719          2268
        22        82772           362         63112           348         52860          1002
        32        75915           537         53145           484         49000          1482
        34        86616           188         58819           213         54302           542
        38        85506           179         60253           195         51557           474
        43        73030           441         51141           390         43208          1582
        45        72860           156         51320           226         40755           634
        48        63925           262         47448            87         37990           578
        53        65461           338         48538           370         37215          1199
        55        60703           189         44439            97         37096           487
        55        61601           111         44742           110         34605           401
        60        60006           497         45219           232         34962          1413
        61        62211            66         43301           104         36553           394
        62        61530            47         43645           123         34151           381
        70        59738           380         43176           286         34783          1567
        70        60988            70         42115           132         36931           407
        71        61319            76         42675            90         35336           323
        75        62402           398         44539           224         37923          1132
        75        60812            66         43790           116         34839           373
        77        62885            82         45267            72         35848           310
        80        60077           154         44181           393         32197          1513
        81        60118            35         46024            59         39169           190
        83        61791           115         46716            44         39592           295
        87        57090           181         43096           244         35117          1229
        87        62665            84         45634            44         41626           296
        89        59524            91         44228            52         37435           264
        92        57822           121         43098            81         37357           622
        94        62745           194         46117           248         43361          1280
        96        61023            58         46515            45         39916           202
        96        64832            49         47852            29         44019           166

Final score for writes:            96
Final score for reads :         40279


test29# blogbench -d /usr/obj/bench	(UFS + softupdates)

Frequency = 10 secs
Scratch dir = [/usr/obj/bench]
Direct I/O: disabled
Spawning 3 writers...
Spawning 1 rewriters...
Spawning 5 commenters...
Spawning 100 readers...
Benchmarking for 30 iterations.
The test will run during 5 minutes.

  Nb blogs   R articles    W articles    R pictures    W pictures    R comments    W comments
        21        41840          1138         38517          1168         26698          5056
        33        71941           800         52572           668         52682          4824
        46        60342           625         40127           512         39293          3301
        53        70209           709         44394           479         51812          3297
        65        53748           689         35700           491         36768          3025
        65        18636             0         12765             0         13882             2
        65        19329             0         13227             0         13244             0
        67        34110           199         23500           164         23938          1109
        67        19850             0         13136             4         12844            10
        67        19394             0         12692             0         13231             0
        67        19452             0         12909             0         13442             0
        67        19523             0         13231             0         13644             0
        67        19941             0         13162             0         12295             0
        67        20134             0         12781             0         13061             0
        67        19832             0         13066             0         13343             0
        67        19672             0         12471             0         12996             0
        67        19353             0         12842             0         13634             0
        67        19516             0         12775             1         13401             1
        67        19399             0         12927             0         13596             0
        67        20434             0         12915             0         13345             0
        67        19534             0         12528             0         14222             0
        67        20034             0         12667             0         13535             0
        67        19090             0         12707             0         14163             0
        67        20591             0         13061             0         12392             0
        67        19419             0         12702             0         13495             1
        67        18881             0         12697             0         13638             0
        67        19308             0         12430             0         13816             0
        67        18406             0         12878             0         14477             0
        67        18697             0         12448             0         14445             0
        67        19444             0         12322             0         13796             0

Final score for writes:            67
Final score for reads :         16022
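
To put the two runs side by side, here is a small Python sketch (an illustration, not part of Matt's message). It uses only numbers printed above: the "W articles" column of each table and the final scores. It makes no assumption about how blogbench derives its final scores. Each row is treated as one reporting interval, which appears to be 10 seconds given "Frequency = 10 secs" and 30 iterations over 5 minutes. The names `stalled_intervals` and `read_ratio` are made up for this example.

```python
# "W articles" column, one value per reporting interval, copied from the tables.
HAMMER_W_ARTICLES = [
    894, 362, 537, 188, 179, 441, 156, 262, 338, 189,
    111, 497, 66, 47, 380, 70, 76, 398, 66, 82,
    154, 35, 115, 181, 84, 91, 121, 194, 58, 49,
]
UFS_W_ARTICLES = [
    1138, 800, 625, 709, 689, 0, 0, 199, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
]

# Final scores exactly as printed by the two blogbench runs.
FINAL = {
    "HAMMER": {"writes": 96, "reads": 40279},
    "UFS": {"writes": 67, "reads": 16022},
}


def stalled_intervals(column):
    """Count the intervals in which no article was written at all."""
    return sum(1 for value in column if value == 0)


def read_ratio(final):
    """Ratio of the HAMMER final read score to the UFS final read score."""
    return final["HAMMER"]["reads"] / final["UFS"]["reads"]


if __name__ == "__main__":
    print("HAMMER intervals with zero article writes:",
          stalled_intervals(HAMMER_W_ARTICLES))
    print("UFS intervals with zero article writes:   ",
          stalled_intervals(UFS_W_ARTICLES))
    print("Read score ratio, HAMMER / UFS: %.2f" % read_ratio(FINAL))
```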

Interesting, but what am I looking at?

Mr_Z on May 14, 2008 - 8:39am

That blogbench result is interesting, but I'm not sure what I'm looking at. What are the numbers in the 7 columns? The number of transactions in the last time quantum?

If so, it seems rather odd that the number of write transactions goes to ZERO for the UFS + Soft Update case. That's not a slowdown. That's an outright death!

--
Program Intellivision and play Space Patrol!


Anonymous (not verified) on May 16, 2008 - 12:27pm

Indeed. Has anybody been able to reproduce this?
