Yah, I agree. Here's a quick summary of the issues:
* UNDO records are used to compartmentalize atomic changes which
cover multiple disk blocks. For example, if you 'rm' a file
and a crash occurs, you want the state of the filesystem to
either show the file and its directory entry both removed, or show
the file and its directory entry both still present.
* Updates to the inode_data, which holds the stat/chmod info for
a file object, typically require rolling a new inode_data record
with the old one still available via the filesystem history. For
example, if you append some stuff to an existing file, an old
version of the inode_data must be present in order to 'see' the
previous state of the file (in particular, the previous st_size
of the file).
* BUT, having to do any of the above when updating atime and mtime
would be really expensive.
- atime gets updated all the time. We definitely do not want to
roll UNDO records *or* new inode_data records.
- mtime gets updated all the time in certain situations, such as
when overwriting a file (e.g. in ways that do not modify the
- mtime is often used to uniquely determine whether a file has
* And, finally, we want mirroring to work properly even if the
filesystem is mounted 'nohistory' (told not to roll new
inode_data records). Or, for that matter, if individual files
are chflagged 'nohistory'.
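
To make the first point concrete, here is a toy sketch of what UNDO records buy you. It is not HAMMER code: ToyVolume, its block names, and its methods are all made up for illustration, and "blocks" are just dictionary entries. The idea is only that the old contents of every block touched by a multi-block change are logged first, so an interrupted change can be rolled back as a unit.

```python
class ToyVolume:
    """Hypothetical model: named blocks plus an UNDO log of prior contents."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.undo_log = []  # (block_id, old_contents) pairs, oldest first

    def modify(self, block_id, new_contents):
        # Record the old contents *before* overwriting the block.
        self.undo_log.append((block_id, self.blocks.get(block_id)))
        if new_contents is None:
            self.blocks.pop(block_id, None)   # None means "free the block"
        else:
            self.blocks[block_id] = new_contents

    def commit(self):
        # The whole multi-block change is complete; UNDO data is obsolete.
        self.undo_log.clear()

    def recover(self):
        # After a crash, undo uncommitted changes in reverse order.
        while self.undo_log:
            block_id, old = self.undo_log.pop()
            if old is None:
                self.blocks.pop(block_id, None)
            else:
                self.blocks[block_id] = old
```

An 'rm' would modify the directory block and the inode block and then commit. If the crash hits after only the directory block was modified, recover() puts the directory entry back, so the file and its entry are both still present. A real implementation also has to order the log writes against the block writes on disk, which this toy ignores.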
The bane of HAMMER's design is that we absolutely do not want to roll
new inode_data records unless we have to, so here is what I am going to
do:
* ATime will be updated asynchronously and will not be CRCd, so
the B-Tree element's CRC field does not have to be updated
(thus no UNDO records need to be generated either).
* MTime will be updated semi-synchronously and will be CRCd.
(It will be fully synchronous from the point of view of
anyone using the filesystem, of course). UNDO ...
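
The atime/mtime split above can be sketched with a toy record. Everything here is an assumption for illustration: ToyInodeData, its field layout, and the use of zlib.crc32 are not HAMMER's on-disk format, and the CRC is kept in the record itself rather than in a B-Tree element. The only point is that a field left out of the CRC can be rewritten in place without touching the CRC, while a covered field cannot.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass
class ToyInodeData:
    """Hypothetical record: the CRC covers size, mode and mtime, not atime."""
    size: int
    mode: int
    mtime: int
    atime: int
    crc: int = 0

    def covered_bytes(self):
        # atime is deliberately left out of the checksummed bytes.
        return struct.pack("<QIQ", self.size, self.mode, self.mtime)

    def seal(self):
        # Recompute the CRC after changing any covered field.
        self.crc = zlib.crc32(self.covered_bytes())

    def crc_ok(self):
        return self.crc == zlib.crc32(self.covered_bytes())
```

With this layout, assigning a new atime leaves crc_ok() true with no further work, whereas assigning a new mtime makes crc_ok() false until seal() is called again.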
On Fri, Jun 20, 2008 at 12:57 PM, Matthew Dillon
Pardon my ignorance if I am missing something; I haven't looked much
into HAMMER yet.
Will the FS have the same atomic update features that UFS has? That is,
fsync(2) returns only when all directory entries are safely on the
disk (whether through softupdates-style ordering or journaling). This is
important for mail servers and the like, so they don't lose messages on
a power failure or crash. If you dig around the mailing lists, you'll
find interesting stories of people who ran mail servers on filesystems
mounted async (the default Linux EXT2/3 mount) and found some of the
messages in lost+found. AFAIK, at least on Linux, fsync returns early in
that case and is not atomic, so software written with BSD behavior in
mind wasn't safe to run without patching.
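
For reference, the application-side pattern this question is about usually looks something like the sketch below. It is a generic illustration, not taken from any particular mail server: durable_write and the ".tmp" suffix are made-up names, and it assumes a POSIX system where a directory can be opened read-only and passed to fsync. Whether the extra directory fsync is needed depends on the filesystem, which is exactly what is being asked.

```python
import os


def durable_write(directory, name, data):
    """Write data to directory/name so it should survive a crash (sketch)."""
    tmp_path = os.path.join(directory, name + ".tmp")
    final_path = os.path.join(directory, name)

    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:                      # os.write may write only part
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # flush the file's data and metadata
    finally:
        os.close(fd)

    os.rename(tmp_path, final_path)      # atomically give it its final name

    # Flush the directory too, for filesystems where fsync on the file
    # alone does not guarantee that the directory entry is on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return final_path
```

Software written with the BSD behavior in mind would typically stop after the first fsync; the directory fsync at the end is the kind of patch the stories above refer to.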
Also, will there be a feature to grow and/or shrink the FS live without
having to unmount? I can do this right now with XFS and LVM on Linux
(grow, but not shrink), and it's working amazingly well and very
quickly to boot.