
HAMMER Crash Recovery

Submitted by Jeremy
on April 24, 2008 - 5:20pm

"HAMMER is going to be a little unstable as I commit the crash recovery code," began DragonFly BSD creator Matthew Dillon, adding, "I'm about half way through it." He went on to list what remains before crash recovery will work in HAMMER, his new clustering filesystem: "I have to flush the undo buffers out before the meta-data buffers; then I have to flush the volume header so mount can see the updated undo info; then I have to flush out the meta-data buffers that the UNDO info refers to; and, finally, the mount code must scan the UNDO buffers and perform any required UNDOs." He continued:

"The idea being that if a crash occurs at any point in the above sequence, HAMMER will be able to run the UNDOs to undo any partially written meta-data. HAMMER would be able to do this at mount-time and it would probably take less than a second, so basically this gives us our instant crash-recovery feature."

Matt added that the work has largely separated the frontend VFS operations from the backend disk I/O, and that he hopes to use this separation in the coming weeks to remove the remaining stalls and significantly improve HAMMER's performance.


From: Matthew Dillon
Subject: HAMMER update 24-Apr-2008
Date: Apr 24, 2:57 pm 2008

    HAMMER is going to be a little unstable as I commit the crash
    recovery code.  I'm about half way through it.  Meta-data updates
    to the disk media have now been separated out.  I have a few things
    left to do before crash recovery will actually work:

    * I have to flush the undo buffers out before the meta-data buffers
    * Then I have to flush the volume header so mount can see the updated
      undo info.
    * Then I have to flush out the meta-data buffers that the UNDO
      info refers to.
    * And, finally, the mount code must scan the UNDO buffers and perform
      any required UNDOs.

    The idea being that if a crash occurs at any point in the above
    sequence, HAMMER will be able to run the UNDOs to undo any partially
    written meta-data.  HAMMER would be able to do this at mount-time and
    it would probably take less than a second, so basically this gives us
    our instant crash-recovery feature.

    One interesting outcome of the separation work I just committed is
    that the frontend VOPs are *massively* disconnected from backend disk
    I/O now.  In coming weeks I hope to take advantage of this separation
    to remove the remaining stalls and significantly improve HAMMER's
    performance.

						-Matt
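
The flush ordering listed in the message can be illustrated with a toy model. The following is a minimal sketch, not HAMMER's actual code: the `Media` class, the `flush` and `mount_recover` functions, and the block names are all hypothetical, and the final "retire the UNDO records" write is an assumption that the message does not describe. It simulates a crash after every possible number of media writes and then runs the mount-time UNDO scan.

```python
class Media:
    """Toy stand-in for on-disk state; it survives a simulated crash."""

    def __init__(self, meta):
        self.meta = dict(meta)      # meta-data blocks: block id -> contents
        self.undo = []              # UNDO area: (block id, old contents)
        self.header_undo_len = 0    # volume header: number of valid UNDO records


def flush(media, updates, crash_after=None):
    """Write `updates` to media in the order listed in the message.

    `crash_after` stops after that many individual media writes,
    simulating a crash part-way through the sequence.
    """
    writes = []
    # 1. UNDO records (the old contents) go out before any meta-data.
    for blk in updates:
        writes.append(("undo", blk, media.meta.get(blk)))
    # 2. Volume header, so mount can see the updated UNDO info.
    writes.append(("header", len(updates), None))
    # 3. The meta-data buffers that the UNDO records refer to.
    for blk, new in updates.items():
        writes.append(("meta", blk, new))
    # 4. ASSUMPTION (not in the message): retire the UNDO records.
    writes.append(("header", 0, None))

    for i, (kind, a, b) in enumerate(writes):
        if crash_after is not None and i >= crash_after:
            return False            # simulated crash
        if kind == "undo":
            media.undo.append((a, b))
        elif kind == "header":
            media.header_undo_len = a
            if a == 0:
                media.undo.clear()
        else:
            media.meta[a] = b
    return True


def mount_recover(media):
    """Mount-time scan: apply the valid UNDO records, newest first."""
    for blk, old in reversed(media.undo[:media.header_undo_len]):
        if old is None:
            media.meta.pop(blk, None)   # block did not exist before the flush
        else:
            media.meta[blk] = old
    media.undo.clear()
    media.header_undo_len = 0


old = {"inode7": "nlinks=1", "dir3": "entries=a"}
new = {"inode7": "nlinks=2", "dir3": "entries=a,b"}
for n in range(7):                  # 2 undo + 1 header + 2 meta + 1 retire
    m = Media(old)
    flush(m, new, crash_after=n)
    mount_recover(m)
    print(n, m.meta)
```

In this model a crash at any point before the last write leaves the meta-data in its old, consistent state after mount, and only a fully completed sequence leaves the new state in place. Partially written meta-data is never visible after recovery, which is the property the message is after.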

File system journal?

Anonymous (not verified)
on April 25, 2008 - 12:46am

So why not call it a journal like every other file system developer does?

It does not appear to keep a

Anonymous (not verified)
on April 25, 2008 - 3:01pm

It does not appear to keep a journal.

DragonflyBSD

Jonas (not verified)
on April 27, 2008 - 3:09am

Speaking of DragonFly, I believe the key difference from FreeBSD is its use of a more conservative threading/process model. From what I've seen of the version 7 benchmarks, FreeBSD did a very good job of integrating the new model. Apache and MySQL performance is up, which is what I personally care about most.

Has it not been proven that the new model was a good one? Will the DragonFly fork potentially close? Will HAMMER be ported to FreeBSD?

update

on April 27, 2008 - 4:26am

From: Matthew Dillon
Subject: HAMMER update 26-Apr-2008
Date: Apr 26, 12:40 pm 2008

    The UNDO buffer and crash recovery code is now in-place and working
    fairly well.  There may still be a few cases where inode link counts and
    data get out of sync after a crash but the filesystem should have
    much better integrity now.  I can ^\+^C my vkernel, restart it,
    mount, and HAMMER doesn't blow up.  I still have work to do but I am
    making really excellent progress.

    The pruning and reblocking ioctls are currently non-working due to
    the changes, but I expect to have those in working order again (and
    crash recoverable as well) in the next few days.

    I also still have to do the filesystem full handling before I can
    officially declare the filesystem 'alpha'.

    --

    Large-file write performance is pretty nasty right now, but small
    file write performance is insanely good.  Even with all the debugging
    console spew and other inefficiencies with the backend flusher, copying
    the pkgsrc tree to a HAMMER partition is already competitive with
    UFS+softupdates:

	(tar cf - /usr/pkgsrc a few times to stabilize the NFS server)

	/usr/bin/time -l cpdup /usr/pkgsrc/. /home/pkgsrc
	208.94 real         4.35 user        32.25 sys

	/usr/bin/time -l cpdup /usr/pkgsrc/. /mnt/pkgsrc
	183.00 real         4.73 user        47.46 sys

	umount /home
	umount /mnt
	mount /home
	mount /mnt

	/usr/bin/time -l rm -rf /home/pkgsrc
	72.12 real         0.10 user         3.77 sys

	/usr/bin/time -l rm -rf /mnt/pkgsrc
	73.37 real         0.23 user        13.12 sys


    I expect to be able to drop those times a lot more before I'm done.
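
For readers comparing the figures, the following small sketch works out the relative differences in wall-clock ("real") and system time between the two targets. The numbers are copied from the timings above. The message does not say which mount point is the HAMMER partition, so the sketch only compares `/mnt` against `/home` without labelling either one; the `relative_diff` helper is illustrative.

```python
# Seconds copied from the /usr/bin/time output above: (real, sys).
timings = {
    "cpdup":  {"/home/pkgsrc": (208.94, 32.25), "/mnt/pkgsrc": (183.00, 47.46)},
    "rm -rf": {"/home/pkgsrc": (72.12, 3.77),   "/mnt/pkgsrc": (73.37, 13.12)},
}


def relative_diff(a, b):
    """Percentage by which b differs from a (negative means b is smaller)."""
    return (b - a) / a * 100.0


for op, t in timings.items():
    home_real, home_sys = t["/home/pkgsrc"]
    mnt_real, mnt_sys = t["/mnt/pkgsrc"]
    print(f"{op}: /mnt vs /home real {relative_diff(home_real, mnt_real):+.1f}%, "
          f"sys {relative_diff(home_sys, mnt_sys):+.1f}%")
```

The copy to `/mnt` takes roughly 12% less wall-clock time than the copy to `/home`, and the removal times are within about 2% of each other, while system time is noticeably higher on `/mnt` in both cases.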

    The large-file write performance issue is just an implementation issue.
    Because data is not overwritten it is possible for the frontend to
    directly allocate media space and issue the data portion of the writes
    directly to the media, only queueing the meta-data portion to the backend.
    But I don't do that yet, and right now the backend synchronizer is
    being woken up way, way too often due to the data buffer load.

    Please note that HAMMER spews a lot of junk to the console at the
    moment, too.  This is getting rather exciting!

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
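
The direct-write idea from the 26-Apr update (data goes straight to newly allocated media space, and only the meta-data is queued to the backend) can be sketched roughly as follows. This is an illustration under stated assumptions, not HAMMER's design: the `Volume` class, its append-only allocator, and the `frontend_write`, `backend_flush`, and `read` methods are all hypothetical names invented for the example.

```python
from collections import deque


class Volume:
    """Toy no-overwrite volume: data is appended, meta-data is queued."""

    def __init__(self):
        self.media = bytearray()    # append-only data area (toy allocator)
        self.meta_queue = deque()   # meta-data records awaiting the backend
        self.index = {}             # flushed meta-data: (file, offset) -> extent

    def frontend_write(self, file_id, file_off, data):
        # Data is never overwritten, so fresh space can be allocated at once
        # and the data portion written directly to the media.
        media_off = len(self.media)
        self.media += data
        # Only the small meta-data record is queued for the backend.
        self.meta_queue.append(((file_id, file_off), (media_off, len(data))))

    def backend_flush(self):
        """Apply queued meta-data records; returns how many were flushed."""
        flushed = 0
        while self.meta_queue:
            key, extent = self.meta_queue.popleft()
            self.index[key] = extent
            flushed += 1
        return flushed

    def read(self, file_id, file_off):
        media_off, length = self.index[(file_id, file_off)]
        return bytes(self.media[media_off:media_off + length])


vol = Volume()
vol.frontend_write(1, 0, b"aaaa")
vol.frontend_write(1, 0, b"bbbb")   # rewrite of the same range gets new space
print(len(vol.media), len(vol.meta_queue))
vol.backend_flush()
print(vol.read(1, 0))
```

In this model the backend only ever handles small meta-data records, however large the data writes are, which is the reason the message gives for expecting the synchronizer to be woken up less often once the change is made.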
