Well, you've described both sides of the debate quite well. I for
one am in the 0-risk camp. ReiserFS got into trouble depending on
a collision space (albeit a small one). I did extensive testing of
64 bit collision spaces when I wrote Diablo (A USENET news system
contemporary with INN, over 10 years ago) and even with a 64 bit space
collisions could occur quite often simply due to the volume of article
traffic.

I think the cost can be reduced to the point where there's no need
to allow any risk at all. After all, we're only talking about meta-data
updates here for the most part. Even under heavy loads it can take
30 seconds for enough dirty meta-data to build up to require a flush.
One can flush the mess, then seek/update the volume header.

Or doing something like using a fixed LOG area, allowing it to be
preformatted... that would remove 100% of the risk and not require
a volume header update (drafting note: edited up from further below).

I know volume headers seem old-school. It was a quick and dirty
Ok, this is simpler. Hmm. That brings up the question with regards
to data allocations and the log. Are you going to use a fixed log
space and allocate data-store separately or is the log a continuously
running pointer across the disk?

Ok, here I spent about 30 minutes constructing a followup but then
you answered some of the points later on, so what I am going to do

Wait, it isn't? I thought it was. I think it has to be because
the related physical B-Tree modifications required can be unbounded,
and because physical B-Tree modifications are occurring in parallel the
related physical operations cause the logical operations to become
bound together, meaning the logical ops *cannot* be independently
backed out

Here's an example:
rm a/b/c <---- 1TB file
rmdir a/b
rmdir a

There are three problems here.
First, the l...
This is due to a well-known mathematical property called the birthday
problem or birthday paradox. As a result of that property, for an ideal
hash function of size N bits you only need about 2^(N/2) random inputs
to generate a collision with sufficient probability. Therefore, for a
64 bit hash function you can expect a collision after roughly 2^32 inputs.

http://en.wikipedia.org/wiki/Birthday_attack
-Maxim
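
The birthday estimate above can be checked numerically. The following is a minimal Python sketch, not part of the original post; it assumes an ideal hash with uniformly distributed outputs, and the function names are made up for illustration. It uses the standard approximation p ~= 1 - exp(-n(n-1) / 2^(N+1)).

```python
import math

def collision_probability(n_inputs: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among n_inputs
    uniformly random values of an ideal hash_bits-wide hash."""
    space = 1 << hash_bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n_inputs * (n_inputs - 1) / (2 * space))

def inputs_for_probability(p: float, hash_bits: int) -> float:
    """Approximate number of inputs needed to reach collision probability p."""
    space = 1 << hash_bits
    return math.sqrt(2 * space * math.log(1 / (1 - p)))

if __name__ == "__main__":
    for n in (1 << 24, 1 << 28, 1 << 32):
        p = collision_probability(n, 64)
        print(f"{n:>12} inputs, 64 bit hash: p = {p:.6g}")
    print(f"inputs for a 50% chance: {inputs_for_probability(0.5, 64):.3e}")
```

Under these assumptions, 2^32 inputs into a 64 bit space give roughly a 39% chance of at least one collision, which is the sense in which 2^(N/2) inputs are "enough".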
So I do not want users to fixate on that detail. The mount option
allows them to choose between "fast but theoretically riskier" and
"warm n fuzzy zero risk but not quite so fast". If the example of Ext3
is anything to go by, almost everybody chooses the "ordered data" mode
over the "journal data" mode given the tradeoff that the latter is
about 30% slower but offers better data integrity for random file
rewrites.

The tradeoff for the option in Tux3 will be maybe 1 - 10% slower in
return for a minuscule reduction in the risk of a false positive on
replay. Along the lines of deciding to live underground to avoid the
risk of being hit by a meteorite. Anyway, Tux3 will offer the option
and everybody will be happy.

Actually implementing the option is pretty easy because the behavior
Yes, a continuous running space, not preallocated. The forward log
will insert itself into any free blocks that happen to be near the
transaction goal location. Such coopted free space will be
implicitly unavailable for block allocation, slightly complicating the
block allocation code which has to take into consideration both the

Logical logging is not idempotent by nature because of the uncertainty
of the state of the object edited by a logical operation: has the
operation already been applied or not? If you know something about the
structure of the target you can usually tell. Suppose the operation is
a dirent create. It has been applied to the target if the dirent
already exists, otherwise not. I do not like this style of special
case hacking to force the logical edit to be idempotent.

Instead, I choose to be sure about the state of the target object by
introducing the rule that after a logical log operation has been
generated, nothing is allowed to write to the object being edited. The
logical operation pins the state of the disk image of the target object.
The object stays pinned until the logical log entry has been retired
by a physical commit to the object that updates the object...
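
The pinning rule described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Tux3 code: the Volume class, its methods, and the tuple-of-names "disk image" are all hypothetical. The logged operation is a deliberately non-idempotent append, so replay gives the right answer only because the on-disk image cannot change while a log entry is pending.

```python
class PinnedError(Exception):
    """Raised when something tries to write an object pinned by the log."""

class Volume:
    def __init__(self):
        self.disk = {}   # object id -> on-disk image (a tuple of names)
        self.log = []    # pending logical log entries: (object id, name)

    def is_pinned(self, obj):
        return any(target == obj for target, _ in self.log)

    def log_append(self, obj, name):
        """Record a logical 'append name' op; this pins obj's disk image."""
        self.log.append((obj, name))

    def write_image(self, obj, image):
        """Direct write to the object's disk image; refused while pinned."""
        if self.is_pinned(obj):
            raise PinnedError(obj)
        self.disk[obj] = tuple(image)

    def replay(self, obj):
        """Current state = pinned disk image plus pending ops, applied once."""
        names = list(self.disk.get(obj, ()))
        names.extend(name for target, name in self.log if target == obj)
        return tuple(names)

    def physical_commit(self, obj):
        """Write the updated object, then retire its logical log entries."""
        self.disk[obj] = self.replay(obj)
        self.log = [(t, n) for t, n in self.log if t != obj]

if __name__ == "__main__":
    vol = Volume()
    vol.write_image("dir", ["a"])
    vol.log_append("dir", "b")
    print(vol.replay("dir"))     # state seen after a crash and replay
    vol.physical_commit("dir")   # unpins "dir"
    print(vol.disk["dir"], vol.is_pinned("dir"))
```

Because nothing can write the pinned image, replay never has to ask whether an operation was already applied, which is the question the dirent special case tries to answer by inspection.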
