Those are interesting comments. I think I found Daniel's email address
so I am adding him to the To: Dan, feel free to post this on your Tux
groups if you want.I did consider multiple-parentage... that is the ability to have a
writable snapshot that 'forks' the filesystem history. It would be
an ultra cool feature to have but I couldn't fit it into the B-Tree
model I was using. Explicit snapshotting would be needed to make it
work, and the snapshot id would have to be made part of the B-Tree key,
which is fine. HAMMER is based around implicit snapshots (being able
to do an as-of historical lookup without having explicitly snapshotted
bits of the history). I would caution against using B-Tree iterations
in historical lookups, B-Trees are only fast if you can pretty much
zero in on the exact element you want as part of the primary search
mechanic. Once you have to iterate or chain performance goes out the
window. I know this because there are still two places where HAMMER
sometimes has to iterate due to the inexact nature of an as-of lookup.
Multiple-parentage almost certainly require two inequality searches,
or one inequality search and an iteration. A single timeline only
requires one inequality search.I couldn't get away from having a delete_tid (the 'death version
numbers'). I really tried :-) There are many cases where one is only
deleting, rather then overwriting. Both numbers are then needed to
be able to properly present a historical view of the filesystem.
For example, when one is deleting a directory entry a snapshot after
that deletion must show the directory entry gone. Both numbers are
also needed to be able to properly prune out unwanted historical data
from the filesystem. HAMMER's pruning algorithm (cleaning out old
historical data which is no longer desired) creates holes in the
sequence so once you start pruning out unwanted hist...
(resent after subscribing to the the kernel@crater)
Yes, that is the main difference indeed, essentially "log everything" vs
"commit" style versioning. The main similarity is the lifespan orientedOnce I get down to the leaf level I binary search on logical address in
the case of a file index btree or on version in the case of an inode
table block, so this cost is still Log(N) with a small k. For a
heavily versioned inode or file region this might sometimes result in
an overflow block or two that has to be linearly searched which is not
a big problem so long as it is a rare case, which it really ought to be
for the kinds of filesystem loads I have seen. A common example of a
worst case is /var/log/messages, where the mtime and size are going to
change in pretty much every version, so if you have hourly snapshots
and hold them for three months it adds up to about 2200 16 byte inode
table attributes to record, about 8 inode table leaf blocks. I really
do not see that as a problem. If it goes up to 100 leaf blocks to
search then that could be a problem.Usually, I will only be interested in the first and last of those
blocks, the first holding the rarely changing attributes including the
root of the file index btree and the last holding the most recent
incarnation of the mtime/size attribute.The penultimate inode table index block tells me how many blocks a
given inode lives in because several blocks will have the same inum
key. So the lookup algorithm for a massively versioned file becomes:
1) read the first inode table block holding that inum; 2) read the last
block with the same inum. The latter operation only needs to consult
the immediate parent index block, which is locked in the page cache at
that point.It will take a little care and attention to the inode format,
especially the inode header, to make sure that I can reliably do that
first/last probe optimization for the "head" version, but it does seem
worth the effort. For starters I will just do a mindle...
| Greg Kroah-Hartman | [PATCH 001/196] Chinese: Add the known_regression URI to the HOWTO |
| Andrew Morton | -mm merge plans for 2.6.23 |
| david | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Bart Van Assche | Integration of SCST in the mainstream Linux kernel |
git: | |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| PJ Waskiewicz | [ANNOUNCE] ixgbe: Data Center Bridging (DCB) support for ixgbe |
| David Miller | Re: [GIT]: Networking |
