[PATCH 3/6] ceph: message layer can send/receive data in bio

From: Yehuda Sadeh
Date: Tuesday, April 13, 2010 - 4:29 pm

The following series implements a Linux rados block device. It allows striping of data across rados, the ceph distributed object store, and is binary compatible with Christian Brunner's kvm-rbd implementation. Similar to the Sheepdog project, it stripes the block device across 4MB (or other configurable size) objects stored by the distributed object store, and benefits from all the rados features (high availability, scalability, etc.). Future versions will be able to use the rados snapshots mechanism.
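
To make the striping arithmetic concrete, here is a minimal user-space sketch of how a byte offset on the device could map onto fixed-size objects. It assumes a simple linear layout (object index = offset / object size) and the 4MB default mentioned above; struct obj_extent and map_extent() are hypothetical names for illustration and do not come from the rbd driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default from the text: 4MB objects; the size is configurable. */
#define OBJ_SIZE (4ULL << 20)

/* Hypothetical result type: where a device range lands in the object set. */
struct obj_extent {
	uint64_t objno;   /* index of the backing object */
	uint64_t off;     /* byte offset inside that object */
	uint64_t len;     /* bytes of the request that fit in this object */
};

/* Map the start of a device range [dev_off, dev_off + len) to one object. */
static struct obj_extent map_extent(uint64_t dev_off, uint64_t len,
				    uint64_t obj_size)
{
	struct obj_extent ex;

	assert(obj_size > 0);
	ex.objno = dev_off / obj_size;
	ex.off = dev_off % obj_size;
	/* clamp at the object boundary; the caller maps the remainder */
	ex.len = len < obj_size - ex.off ? len : obj_size - ex.off;
	return ex;
}
```

A request that crosses an object boundary would be split by calling map_extent() again with the offset advanced by the returned len, so each piece touches exactly one object.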

A use case for this device would be to create a local file system on it and use it in conjunction with kvm for migration and related tasks. Another option would be to use it as a data device for other distributed file systems (e.g., ocfs2, gfs2).

The actual device driver is implemented in the last patch of the series, and is based on osdblk. Currently, it resides under the fs/ceph tree and is not a separate module, so the ceph.ko module contains both the ceph file system and the rbd block device. Another option would be to make it a separate module under drivers/block; however, it would still depend on the ceph.ko module.

Any comments, questions, suggestions are more than welcome,

Yehuda

---
 fs/ceph/Makefile     |    3 +-
 fs/ceph/caps.c       |    2 +-
 fs/ceph/debugfs.c    |   11 +-
 fs/ceph/file.c       |   56 +++-
 fs/ceph/mds_client.c |   11 +-
 fs/ceph/messenger.c  |  185 +++++++--
 fs/ceph/messenger.h  |    5 +-
 fs/ceph/mon_client.c |   18 +-
 fs/ceph/msgpool.c    |    4 +-
 fs/ceph/osd_client.c |  193 ++++++---
 fs/ceph/osd_client.h |   27 ++
 fs/ceph/rbd.c        | 1224 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/rbd.h        |    8 +
 fs/ceph/super.c      |  165 ++++++--
 fs/ceph/super.h      |   31 ++-
 15 files changed, 1786 insertions(+), 157 deletions(-)
--

From: Yehuda Sadeh
Date: Tuesday, April 13, 2010 - 4:29 pm

This capability is being added so that data from the rados
block device can be sent and received without copying it.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
---
 fs/ceph/messenger.c  |  175 +++++++++++++++++++++++++++++++++++++++++--------
 fs/ceph/messenger.h  |    3 +
 fs/ceph/osd_client.c |   14 ++++-
 fs/ceph/osd_client.h |    4 +-
 4 files changed, 164 insertions(+), 32 deletions(-)
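
The bio iterator helpers added below (init_bio_iter, iter_bio_next) can be tried outside the kernel with a stand-in structure. The following is a user-space sketch only: struct mock_bio is a hypothetical stand-in carrying just the three fields the helpers read (bi_next, bi_idx, bi_vcnt), assert() replaces BUG_ON(), and count_segments() is an illustrative helper that is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct bio; only the fields the helpers use. */
struct mock_bio {
	struct mock_bio *bi_next;  /* next bio in the chain */
	unsigned short bi_idx;     /* first segment still to process */
	unsigned short bi_vcnt;    /* number of segments in this bio */
};

/* Point the iterator at a bio and its first pending segment. */
static void init_bio_iter(struct mock_bio *bio, struct mock_bio **iter,
			  int *seg)
{
	if (!bio) {
		*iter = NULL;
		*seg = 0;
		return;
	}
	*iter = bio;
	*seg = bio->bi_idx;
}

/* Advance one segment, moving to the next bio when this one is done. */
static void iter_bio_next(struct mock_bio **bio_iter, int *seg)
{
	if (*bio_iter == NULL)
		return;

	assert(*seg < (*bio_iter)->bi_vcnt);  /* BUG_ON() in the patch */

	(*seg)++;
	if (*seg == (*bio_iter)->bi_vcnt)
		init_bio_iter((*bio_iter)->bi_next, bio_iter, seg);
}

/* Hypothetical helper: count pending segments across the whole chain. */
static int count_segments(struct mock_bio *head)
{
	struct mock_bio *iter;
	int seg, n = 0;

	init_bio_iter(head, &iter, &seg);
	while (iter) {
		n++;
		iter_bio_next(&iter, &seg);
	}
	return n;
}
```

For a chain of two bios with (bi_idx, bi_vcnt) of (0, 3) and (1, 4), the walk visits 3 + 3 segments. Note that a bio whose bi_idx already equals bi_vcnt would trip the assertion on the first advance, so this sketch assumes every bio in the chain has at least one pending segment.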

diff --git a/fs/ceph/messenger.c b/fs/ceph/messenger.c
index b6cff8e..3f77210 100644
--- a/fs/ceph/messenger.c
+++ b/fs/ceph/messenger.c
@@ -8,6 +8,8 @@
 #include <linux/net.h>
 #include <linux/socket.h>
 #include <linux/string.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
 #include <net/tcp.h>
 
 #include "super.h"
@@ -529,8 +531,11 @@ static void prepare_write_message(struct ceph_connection *con)
 	if (le32_to_cpu(m->hdr.data_len) > 0) {
 		/* initialize page iterator */
 		con->out_msg_pos.page = 0;
-		con->out_msg_pos.page_pos =
-			le16_to_cpu(m->hdr.data_off) & ~PAGE_MASK;
+		if (m->pages)
+			con->out_msg_pos.page_pos =
+				le16_to_cpu(m->hdr.data_off) & ~PAGE_MASK;
+		else
+			con->out_msg_pos.page_pos = 0;
 		con->out_msg_pos.data_pos = 0;
 		con->out_msg_pos.did_page_crc = 0;
 		con->out_more = 1;  /* data + footer will follow */
@@ -712,6 +717,29 @@ out:
 	return ret;  /* done! */
 }
 
+static void init_bio_iter(struct bio *bio, struct bio **iter, int *seg)
+{
+	if (!bio) {
+		*iter = NULL;
+		*seg = 0;
+		return;
+	}
+	*iter = bio;
+	*seg = bio->bi_idx;
+}
+
+static void iter_bio_next(struct bio **bio_iter, int *seg)
+{
+	if (*bio_iter == NULL)
+		return;
+
+	BUG_ON(*seg >= (*bio_iter)->bi_vcnt);
+
+	(*seg)++;
+	if (*seg == (*bio_iter)->bi_vcnt)
+		init_bio_iter((*bio_iter)->bi_next, bio_iter, seg);
+}
+
 /*
  * Write as much message data payload as we can.  If we finish, queue
  * up the footer.
@@ -726,14 +754,20 @@ static int write_partial_msg_pages(struct ceph_connection *con)
 	size_t len;
 	int crc = con->msgr->nocrc;
 	int ...