Re: [Tux3] Feature interaction between multiple volumes and atomic update

To: Daniel Phillips <phillips@...>
Cc: <tux3@...>
Date: Friday, August 29, 2008 - 11:31 pm

:It turns out that multiple independent volumes sharing the same
:allocation space is a feature that does not quite come for free as I
:had earlier claimed. The issue is this:
:...
: * Therefore it seems logical that Tux3 should have a separate forward
: log for each subvolume to allow independent syncing of subvolumes.
: But global allocation state must always be consistent regardless of
: the order in which subvolumes are synced.

I had a lot of trouble trying to implement multiple logs in HAMMER
(the idea being to improve I/O throughput). I eventually gave up
and went with a single log (well, UNDO fifo in HAMMER's case). So
e.g. even though HAMMER does implement pseudo-filesystem spaces
for mirroring slaves and such, everything still uses a single log
space.

: 3) When the first subvolume is remounted after a crash, implicitly
: remount and replay all subvolumes that were also mounted at the time
: of the crash, roll up the logs, and unmount them.

If you synchronize the transaction id spaces between the subvolumes
then the crash recovery code could use a single number to determine
how far to replay each subvolume. That sounds like it ought to work.
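
A minimal sketch of that idea, not Tux3 or HAMMER code. It assumes every
subvolume log stamps its records from one shared transaction id counter,
and that recovery is handed a single cutoff id from somewhere durable.
The names Subvolume, replay_subvolume, recover and cutoff_txid are
hypothetical, as is the (txid, operation) record layout.

```python
from dataclasses import dataclass, field


@dataclass
class Subvolume:
    name: str
    log: list  # (txid, operation) pairs in ascending txid order
    applied: list = field(default_factory=list)


def replay_subvolume(subvol, cutoff_txid):
    """Apply logged operations with txid <= cutoff_txid and skip the rest."""
    for txid, op in subvol.log:
        if txid > cutoff_txid:
            break  # everything past the cutoff is treated as uncommitted
        subvol.applied.append(op)
    return len(subvol.applied)


def recover(subvolumes, cutoff_txid):
    """Replay every subvolume to the same point using one number."""
    return {sv.name: replay_subvolume(sv, cutoff_txid) for sv in subvolumes}


# Two subvolumes whose records interleave in the shared txid space.
home = Subvolume("home", [(1, "alloc A"), (3, "write A"), (6, "free B")])
var = Subvolume("var", [(2, "alloc C"), (4, "write C"), (5, "alloc D")])
counts = recover([home, var], cutoff_txid=4)
```

Because the ids come from one counter, no per-subvolume replay position
has to be stored; the cutoff alone says where each log stops.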

: 4) Partition the allocation space so that each subvolume allocates
: from a completely independent allocation space, which is separately
: logged and synced. Either implement this by providing an
: additional level of indirection so that each subvolume has its own
: map of the complete volume which may be expanded from time to time
: by large increments, or record in each subvolume allocation map
: only those regions that are free and available to the subvolume.

I tried this in an earlier HAMMER implementation and it was a
nightmare. I gave up on it. Also, in an earlier iteration, I
had a blockmap translation layer to support the above. That
worked fairly well as long as the blocks were very large (at least
8MB). When I went to the single global B-Tree model I didn't
need the layer any more and devolved it back down to a simple
2-layer freemap.
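
For illustration only, a blockmap translation layer of the kind described
in option 4 might look roughly like the sketch below. This is not the
earlier HAMMER layer, whose details are not given here. The class
BlockMap, its methods, and the first-free assignment policy are all
assumptions; only the 8MB big-block size comes from the text above.

```python
BIG_BLOCK = 8 * 1024 * 1024  # 8MB granularity, as mentioned above


class BlockMap:
    """Hypothetical per-subvolume map from virtual to physical big-blocks."""

    def __init__(self, total_big_blocks):
        self.free = list(range(total_big_blocks))  # unassigned physical blocks
        self.maps = {}  # subvolume name -> list of physical big-block numbers

    def grow(self, subvol):
        """Extend a subvolume by one big-block (the 'large increment')."""
        phys = self.free.pop(0)
        self.maps.setdefault(subvol, []).append(phys)
        return phys

    def translate(self, subvol, vaddr):
        """Translate a subvolume-relative byte address to a physical one."""
        index, offset = divmod(vaddr, BIG_BLOCK)
        return self.maps[subvol][index] * BIG_BLOCK + offset


bm = BlockMap(total_big_blocks=4)
bm.grow("a")  # subvolume "a" receives one physical big-block
bm.grow("b")  # subvolume "b" receives the next one
bm.grow("a")  # "a" grows again, so its blocks are not contiguous
paddr = bm.translate("a", BIG_BLOCK + 5)
```

The larger the big-block, the smaller the per-subvolume map, which fits
the observation that the layer only worked well with very large blocks.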

:something Tux3 wishes to avoid. We would be better advised to improve
:the volume manager so that it is capable enough to provide such
:incremental allocation itself in a way that maps well to the needs of
:filesystems such as Tux3.
:
:I CC'd this one to Matt Dillon, perhaps mainly for sympathy. Hammer
:does not have this issue as it does not support subvolumes, perhaps
:wisely.

Yah. We do support pseudo-filesystems within a HAMMER filesystem,
but they are implemented using a field in the B-Tree element key.
They aren't actually separate filesystems, they just use totally
independent key spaces within the global B-Tree.

We use the PFSs as replication sources and targets. This also allows
the inode numbers to be replicated (each PFS gets its own inode
numbering space).
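
A rough sketch of how a PFS id in the key keeps the spaces apart inside
one index. It uses a sorted list in place of a real B-Tree, and the key
layout (pfs_id, inode, offset) plus the names GlobalIndex, insert and
scan_pfs are assumptions for illustration, not HAMMER's actual element
format.

```python
import bisect


class GlobalIndex:
    """One ordered index shared by all PFSs; stands in for the B-Tree."""

    def __init__(self):
        self._keys = []   # sorted (pfs_id, inode, offset) tuples
        self._vals = {}   # key -> record

    def insert(self, pfs_id, inode, offset, value):
        key = (pfs_id, inode, offset)
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def scan_pfs(self, pfs_id):
        """Return all records of one PFS as a single contiguous key range."""
        lo = bisect.bisect_left(self._keys, (pfs_id,))
        hi = bisect.bisect_left(self._keys, (pfs_id + 1,))
        return [(k, self._vals[k]) for k in self._keys[lo:hi]]


idx = GlobalIndex()
idx.insert(0, 1, 0, "master: inode 1")
idx.insert(1, 1, 0, "slave: inode 1")   # same inode number, different PFS
idx.insert(0, 2, 0, "master: inode 2")
master = idx.scan_pfs(0)
slave = idx.scan_pfs(1)
```

Since the PFS id is the leading key field, inode 1 can exist in both
PFSs without colliding, which is what lets a replication target reuse
the source's inode numbers.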

:Regards,
:
:Daniel

-Matt
Matthew Dillon

_______________________________________________
Tux3 mailing list
Tux3@tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3
