Hmm. Those seem fairly complex. How would you deal with incidental
operations on the B-Tree such as doing a split? A single insert
could wind up requiring a split of every internal node on the way
down to the leaf. I guess you could clone the blocks, but there are
a ton of parent and child linkages that have to be changed when you
split a node. It sounds pretty expensive.Maybe implementing UNDOs for incidental B-Tree operations is the ticket.
Incidental meaning those operations (such as splits) which lead up to
an insertion or deletion but do not actually perform the insertion
or deletion. On crash recovery you would undo partially completed
B-Tree operations and then REDO the related logical operations.Having to allocate new blocks for B-Tree deletions could create a big
issue when the filesystem becomes full or near-full.-Matt
Matthew Dillon
<dillon@backplane.com>_______________________________________________
Tux3 mailing list
Tux3@tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3
Hi,
The following is summary of cleanup and small fix, and sparse warning
fix. "make tests" and some test seem to work.static-http://userweb.kernel.org/~hirofumi/tux3/
Actual patches are the above hg repo. If those are ok, please pull.
Thanks.
changeset: 421:1b9f30bfd9043fc008e6ad32141001144306089f
tag: tip
user: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
date: Thu Nov 13 20:56:30 2008 +0900
files: user/tux3graph.c
description:
tux3graph update to current format
The order of fields in "struct extent" was changed, use same order.changeset: 420:bc24a3a2b39e585b95d81d8d371d7cb02bbbd0c0
user: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
date: Thu Nov 13 20:56:30 2008 +0900
files: user/tux3graph.c
description:
tux3graph.c cleanup for sparce
This is just for making sparse happy.changeset: 419:c3e007f67e074dc187b7af3d76da4a1d4c619ff9
user: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
date: Thu Nov 13 20:56:30 2008 +0900
files: user/tux3fuse.c
description:
tux3fuse.c cleanup for sparce
fd in MAIN is shadow of global fd, and global one is used only in
tux3_init(). So, move global to tux3_init().changeset: 418:e3dfc7902cb2cadb68cb9e37e682b3e01c772671
user: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
date: Thu Nov 13 20:56:30 2008 +0900
files: user/btree.c
description:
btree.c cleanup for sparce
This is just for making sparse happy.changeset: 417:cf3766ead4852b93407fca03164abf5918b2a296
user: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
date: Thu Nov 13 20:56:30 2008 +0900
files: user/balloc.c
description:
balloc.c cleanup for sparce
super.volblocks is be_u64, use collect endian.changeset: 416:8f732c872c6c66ea06763a8d8c89d0ea9c104da2
user: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
date: Thu Nov 13 20:56:30 2008 +0900
files: user/filemap.c
description:
filema...
In this case a log transaction is created containing all of the split
nodes as physical updates and possibly some merged nodes from the free
tree. Btree splitting can always be committed atomically and
independently of any other activity on the filesystem. It is nicely
bounded by the btree depth. Allocations that do not require splitting
the free tree are logged logically, leaving the affected free tree
blocks dirty in cache. So the allocation updates come "for free",
being tucked into the same commit block that governs the split btreeI now see how UNDO works, thankyou for your patient explanations. The
point I overlooked is, fsync (and friends) is indeed the only barrier
you have to worry about.Still, I think it is good to try to get as much as possible of what was
going on in a bit burst of activity with no fsync durably onto disk.
Redo will clearly leave the filesystem in a consistent state less far
back in time than undo.So my general strategy is to log big changes like splits as "physical"
full block updates and small ones like allocating an extent as logical
edit records in the commit blocks of the physical updates.The complexity you noted above is due to making sure that the on-disk
image of a block is always what the logical edit record expects itYes, allocate-in-free is usually nasty. I need to ensure that there is
always a reserve of blocks available on disk that is more than the
maximum possible transaction that can be accepted. This simple idea
can get really complex, I know. One nice trick to simplify things a
little is to have a really generous limit on the maximum number of
dirty blocks that are allowed, when the filesystem has lots of free
space, and shrink that limit progressively as free space approaches
zero.I now see what you were driving at with your point about interlocking
namespace transactions, etc. While each VFS transaction can indeed be
committed on its own, atomically and independently, to do so would be
death for throughpu...
| Jeff Garzik | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Christoph Hellwig | Re: [malware-list] [RFC 0/5] [TALPA] Intro to a linux interface for on access scan... |
| Heiko Carstens | Re: -mm merge plans for 2.6.23 -- sys_fallocate |
| Greg KH | [GIT PATCH] driver core patches against 2.6.24 |
git: | |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Arjan van de Ven | Re: [GIT]: Networking |
| Jens Axboe | Re: [BUG] New Kernel Bugs |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| Emmanuel Dreyfus | fixing send(2) semantics (kern/29750) |
| Christos Zoulas | Re: Melting down your network [Subject changed] |
| Juan RP | Changing the I/O scheduler on-the-fly |
| Emmanuel Dreyfus | Re: fixing send(2) semantics (kern/29750) |
