Re: Distributed storage. Security attributes and documentation update.

Previous thread: [PATCH 1/3] VFS: make notify_change pass ATTR_KILL_S*ID to setattr operations by Jeff Layton on Thursday, August 30, 2007 - 11:06 am. (3 messages)

Next thread: Re: [00/36] Large Blocksize Support V6 by Christoph Lameter on Friday, August 31, 2007 - 9:11 pm. (1 message)
To: <netdev@...>
Cc: <linux-kernel@...>, <linux-fsdevel@...>
Date: Friday, August 31, 2007 - 12:06 pm

On Tue, Jul 31, 2007 at 09:13:47PM +0400, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
Hi.

I'm pleased to announce the third release of the distributed storage
subsystem, which allows forming a storage on top of remote and local
nodes, which in turn can be exported to another storage as a node to
form tree-like storages.

This release includes the following changes:
* security attributes (permission mask assigned to addresses, allowed to
connect to given local export node)
* big documentation update (userspace documentation on the site also
includes various use case examples and a description of the
configuration utility, protocols and userspace target)
* mirror algorithm has been moved from per-page to per-sector dirty
bitmask
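
To illustrate the last item, here is a minimal userspace sketch of a
per-sector dirty bitmask. It is not the DST code: the names (dirty_map,
mark_dirty, mark_clean, dirty_bytes) and the 512-byte sector size are
assumptions made for illustration only. The idea is that one bit per
sector lets a resync copy only the sectors that actually went out of
sync, rather than every page that contains one.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names and sizes; not taken from the DST patch. */
#define SECTOR_SIZE   512u
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct dirty_map {
	uint64_t nr_sectors;  /* sectors covered by the map */
	unsigned long *bits;  /* one bit per sector, 1 = out of sync */
};

/* Allocate an all-clean map; returns 0 on success, -1 on allocation failure. */
int dirty_map_init(struct dirty_map *m, uint64_t nr_sectors)
{
	size_t words = (nr_sectors + BITS_PER_WORD - 1) / BITS_PER_WORD;

	m->bits = calloc(words ? words : 1, sizeof(unsigned long));
	if (!m->bits)
		return -1;
	m->nr_sectors = nr_sectors;
	return 0;
}

void dirty_map_free(struct dirty_map *m)
{
	free(m->bits);
	m->bits = NULL;
}

/* Mark sectors [start, start + count) dirty, e.g. after a write that
 * did not reach one of the mirror nodes. */
void mark_dirty(struct dirty_map *m, uint64_t start, uint64_t count)
{
	assert(start <= m->nr_sectors && count <= m->nr_sectors - start);
	for (uint64_t s = start; s < start + count; s++)
		m->bits[s / BITS_PER_WORD] |= 1UL << (s % BITS_PER_WORD);
}

/* Mark sectors [start, start + count) clean once they have been resynced. */
void mark_clean(struct dirty_map *m, uint64_t start, uint64_t count)
{
	assert(start <= m->nr_sectors && count <= m->nr_sectors - start);
	for (uint64_t s = start; s < start + count; s++)
		m->bits[s / BITS_PER_WORD] &= ~(1UL << (s % BITS_PER_WORD));
}

int is_dirty(const struct dirty_map *m, uint64_t sector)
{
	return (m->bits[sector / BITS_PER_WORD] >> (sector % BITS_PER_WORD)) & 1UL;
}

/* Number of bytes a resync would have to copy. */
uint64_t dirty_bytes(const struct dirty_map *m)
{
	uint64_t n = 0;

	for (uint64_t s = 0; s < m->nr_sectors; s++)
		n += (uint64_t)is_dirty(m, s);
	return n * SECTOR_SIZE;
}
```

Under these assumptions a single modified sector costs a 512-byte
resync, where a per-page map with 4096-byte pages would have to copy
the whole page.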

Further TODO list includes:
* implement optional saving of mirroring/linear information on the remote
nodes (simple)
* implement netlink based setup (simple)
* new redundancy algorithm (complex)

Homepage:
http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>

diff --git a/Documentation/dst/algorithms.txt b/Documentation/dst/algorithms.txt
new file mode 100644
index 0000000..bfc6984
--- /dev/null
+++ b/Documentation/dst/algorithms.txt
@@ -0,0 +1,115 @@
+Each storage by itself is just a set of contiguous logical blocks, with
+allowed number of operations. Nodes, each of which has own start and size,
+are placed into storage by appropriate algorithm, which remaps
+logical sector number into real node's sector. One can create
+own algorithms, since DST has pluggable interface for that.
+Currently mirrored and linear algorithms are supported.
+
+Let's briefly describe how they work.
+
+Linear algorithm.
+Simple approach of concatenating storages into single device with
+increased size is used in this algorithm. Essentially new device
+has size equal to sum of sizes of underlying nodes and nodes are
+placed one after another.
+
+ /----- Node 1 ---\ ...
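
The remapping described for the linear algorithm (nodes placed one
after another, logical sector translated into a sector of the owning
node) can be sketched in userspace C as follows. This is an
illustration only, not the code from the patch: struct node,
linear_layout() and linear_remap() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node descriptor; names are illustrative, not from DST. */
struct node {
	uint64_t start;  /* first logical sector served by this node */
	uint64_t size;   /* number of sectors on this node */
};

/* Place the nodes one after another and return the size of the
 * resulting storage, i.e. the sum of the node sizes. */
uint64_t linear_layout(struct node *nodes, size_t nr)
{
	uint64_t next = 0;

	for (size_t i = 0; i < nr; i++) {
		nodes[i].start = next;
		next += nodes[i].size;
	}
	return next;
}

/* Translate a logical sector into (node index, sector inside that node).
 * Returns 0 on success, -1 if the sector is beyond the storage. */
int linear_remap(const struct node *nodes, size_t nr, uint64_t logical,
		 size_t *idx, uint64_t *local)
{
	assert(idx && local);

	for (size_t i = 0; i < nr; i++) {
		if (logical >= nodes[i].start &&
		    logical - nodes[i].start < nodes[i].size) {
			*idx = i;
			*local = logical - nodes[i].start;
			return 0;
		}
	}
	return -1;
}
```

With two nodes of 100 and 50 sectors the storage is 150 sectors long,
and logical sector 120 lands on sector 20 of the second node. A linear
scan is used for brevity; a real implementation with many nodes would
probably want a faster lookup.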

To: Evgeniy Polyakov <johnpol@...>
Cc: <netdev@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Monday, September 17, 2007 - 2:22 pm

How is this different from raid0/1 over nbd? Or raid0/1 over

storage?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

To: Pavel Machek <pavel@...>
Cc: <netdev@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Saturday, September 22, 2007 - 7:31 am

Hi Pavel.

I will repeat a quote I made for the previous release:

It has number of advantages, outlined in the first release and on the
project homepage, namely:

* non-blocking processing without busy loops (compared to iSCSI and NBD)
* small, pluggable architecture
* failover recovery (reconnect to remote target)
* autoconfiguration
* no additional allocations (not including network part) - at least two
in device mapper for fast path
* very simple - try to compare with iSCSI
* works with different network protocols
* storage can be formed on top of remote nodes and be exported
simultaneously (iSCSI is peer-to-peer only, NBD
requires device mapper, is synchronous and wants
special userspace thread)

DST allows any node to be removed and later turned back into the
storage without breaking the dataflow. The DST core reconnects
automatically to failed remote nodes, and detached devices can be used
just like ordinary filesystems (unless the device was formed as part of
a linear storage, since in that case meta information is spread between
nodes).

It does not require special processes on behalf of the network
connection; everything is performed automatically by the DST core
workers. It allows exporting a new device, created on top of a mirror
or linear combination of the others, which in turn can be formed on top
of another and so on...

This was designed to allow creating a distributed storage with
completely transparent failover recovery, with the ability to detach
remote nodes from a mirror array to become standalone realtime backups
(or snapshots) and turn them back into the storage without stopping main

Yep, thanks.

--
Evgeniy Polyakov

To: Evgeniy Polyakov <johnpol@...>
Cc: <netdev@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Monday, September 10, 2007 - 6:14 pm

A couple questions below, but otherwise looks good from an RCU viewpoint.

This function is called under rcu_read_lock() or similar, right?
(Can't tell from this patch.) It is also OK to call it from under the

I see one call to this function that appears to be under the update-side
mutex, but I cannot tell if the other calls are safe. (Safe as in either

> +st

To: Paul E. McKenney <paulmck@...>
Cc: <netdev@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, September 13, 2007 - 8:22 am

Hi Paul.

Thanks for your comments, and sorry for the late reply; I was at KS/London

Actually not, but it does not require it, since the entry cannot be removed
during these operations, since the appropriate reference counter for the given node is

The same here - those processing functions are called from
generic_make_request() from any lock on top of them. Each node is linked
into the list of the first added node, whose reference counter is
increased in a higher layer. Right now there is no way to add or remove
nodes after the array has been started; such functionality requires the
storage tree lock to be taken, and RCU cannot be used (since it requires
sleeping and I did not investigate sleepable RCU for this purpose).
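
The argument above (a held reference keeps the node from being removed,
so the lookup itself needs no RCU protection) can be illustrated with a
plain reference counter. This is a minimal userspace sketch using C11
atomics; struct node, node_init(), node_get() and node_put() are
hypothetical names, not the DST functions, and kernel code would use its
own atomic primitives instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node object; names are illustrative, not from DST. */
struct node {
	atomic_int refcnt;               /* number of holders */
	void (*release)(struct node *);  /* optional, runs on the last put */
};

/* The creator of the node holds the initial reference. */
void node_init(struct node *n, void (*release)(struct node *))
{
	atomic_init(&n->refcnt, 1);
	n->release = release;
}

/* Take an additional reference. The caller must already hold one,
 * which is what guarantees that the node cannot go away underneath it. */
void node_get(struct node *n)
{
	int old = atomic_fetch_add(&n->refcnt, 1);

	assert(old > 0);
	(void)old;
}

/* Drop a reference. Returns 1 if this was the last one, in which case
 * the release callback (if any) has been invoked, and 0 otherwise. */
int node_put(struct node *n)
{
	if (atomic_fetch_sub(&n->refcnt, 1) == 1) {
		if (n->release)
			n->release(n);
		return 1;
	}
	return 0;
}
```

As long as a higher layer holds its reference for the lifetime of the
array, every node_put() from the processing path returns 0 and the node
stays valid without any read-side locking.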

So, essentially RCU is not used in DST :)

Thanks for review, Paul.

--
Evgeniy Polyakov

To: Evgeniy Polyakov <johnpol@...>
Cc: <netdev@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, September 13, 2007 - 11:03 am

Ah! Yes, it is OK to use _rcu in this case, but should be avoided
unless doing so eliminates duplicate code or some such. So, agree

Thanx, Paul
