Hi Evgeniy,
On Sat, 14 Jun 2008, Evgeniy Polyakov wrote:
By synchronous/asynchronous, are you talking about whether writepages()
blocks until the write is acked by the server? (Really, any FS that does
writeback is writing asynchronously...)
Well... Ceph writes synchronously (i.e. waits for ack in write()) only
when write-sharing on a single file between multiple clients, when it is
needed to preserve proper write ordering semantics. The rest of the time,
it generates nice big writes via writepages(). The main performance issue
is with small files... the fact that writepages() waits for an ack and is
usually called from only a handful of threads limits overall throughput.
If the writeback path were asynchronous as well, that would definitely help
(provided writeback is still appropriately throttled). Is that what
you're doing in POHMELFS?
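
A back-of-the-envelope model of the small-file limit described above (not a measurement; the function names and any thread count, round-trip time, or in-flight limit you plug in are illustrative assumptions, not Ceph or POHMELFS values):

```python
def sync_writeback_rate(threads: int, rtt_s: float) -> float:
    """Upper bound on small writes per second when every writeback thread
    blocks until the server acks, i.e. one write in flight per thread."""
    return threads / rtt_s


def async_writeback_rate(max_in_flight: int, rtt_s: float) -> float:
    """Upper bound when writes are issued without waiting for acks and the
    only limit is a throttle on the number of unacked writes."""
    return max_in_flight / rtt_s


if __name__ == "__main__":
    rtt = 0.001  # assumed 1 ms round trip to the server
    print("blocking, 4 threads:      ", sync_writeback_rate(4, rtt), "writes/s")
    print("throttled, 256 in flight: ", async_writeback_rate(256, rtt), "writes/s")
```

With a handful of blocking threads the ceiling is set by the thread count over the round-trip time, regardless of how fast the server is; an asynchronous path moves the ceiling to whatever the throttle allows.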
Your meaning of "transaction" confused me as well. It sounds like you
just mean that the read/write operation is retried (asynchronously), and
may be redirected at another server if need be. And that writes can be
directed at multiple servers, waiting for an ack from each. Is that
right?
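
A toy sketch of that reading of "transaction", to make the question concrete. Everything here is hypothetical: the `Server` callable, `send_with_failover`, and `replicated_write` are invented for illustration and are not POHMELFS interfaces, and the retry loop is sequential for brevity rather than asynchronous.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a network send: returns True on ack,
# raises ConnectionError if the server is unreachable.
Server = Callable[[bytes], bool]


def send_with_failover(data: bytes, servers: Sequence[Server],
                       max_attempts: int = 3) -> int:
    """Retry the operation, moving to the next server after each failure.
    Returns the index of the server that acked."""
    if not servers:
        raise ValueError("no servers configured")
    for attempt in range(max_attempts):
        idx = attempt % len(servers)
        try:
            if servers[idx](data):
                return idx
        except ConnectionError:
            continue  # redirect: the next attempt goes to the next server
    raise ConnectionError("no server acked the operation")


def replicated_write(data: bytes, replicas: Sequence[Server]) -> bool:
    """A write sent to several servers completes only if every one acks."""
    return all(replica(data) for replica in replicas)
```

In this picture a "transaction" is nothing more than an operation that is remembered until acked, so that it can be resent or redirected.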
In my view the writeback metadata cache is definitely the most exciting
part about this project. Is there a document that describes where the
design ended up? I seem to remember a string of posts describing your
experiments with client-side inode number assignment and how that is
reconciled with the server. Keeping things consistent between clients is
definitely the tricky part, although I suspect that even something with
very coarse granularity (e.g., directory/subtree-based locking/leasing)
will capture most of the performance benefits for most workloads.
Cheers-
sage
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html