[PATCH] Allow NBD to be used locally

From: Laurent Vivier
Date: Friday, February 1, 2008 - 6:25 am

This patch allows a Network Block Device to be mounted locally.

It creates a kthread to avoid the deadlock described in the NBD tools
documentation. So, if nbd-client hangs waiting for pages, the kblockd thread
can continue its work and free pages.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
---
 drivers/block/nbd.c |  146 ++++++++++++++++++++++++++++++++++-----------------
 include/linux/nbd.h |    4 +-
 2 files changed, 100 insertions(+), 50 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index b4c0888..de6685e 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -29,6 +29,7 @@
 #include <linux/kernel.h>
 #include <net/sock.h>
 #include <linux/net.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 #include <asm/system.h>
@@ -434,6 +435,87 @@ static void nbd_clear_que(struct nbd_device *lo)
 }
 
 
+static void nbd_handle_req(struct nbd_device *lo, struct request *req)
+{
+	if (!blk_fs_request(req))
+		goto error_out;
+
+	nbd_cmd(req) = NBD_CMD_READ;
+	if (rq_data_dir(req) == WRITE) {
+		nbd_cmd(req) = NBD_CMD_WRITE;
+		if (lo->flags & NBD_READ_ONLY) {
+			printk(KERN_ERR "%s: Write on read-only\n",
+					lo->disk->disk_name);
+			goto error_out;
+		}
+	}
+
+	req->errors = 0;
+
+	mutex_lock(&lo->tx_lock);
+	if (unlikely(!lo->sock)) {
+		mutex_unlock(&lo->tx_lock);
+		printk(KERN_ERR "%s: Attempted send on closed socket\n",
+		       lo->disk->disk_name);
+		req->errors++;
+		nbd_end_request(req);
+		return;
+	}
+
+	lo->active_req = req;
+
+	if (nbd_send_req(lo, req) != 0) {
+		printk(KERN_ERR "%s: Request send failed\n",
+				lo->disk->disk_name);
+		req->errors++;
+		nbd_end_request(req);
+	} else {
+		spin_lock(&lo->queue_lock);
+		list_add(&req->queuelist, &lo->queue_head);
+		spin_unlock(&lo->queue_lock);
+	}
+
+	lo->active_req = NULL;
+	mutex_unlock(&lo->tx_lock);
+	wake_up_all(&lo->active_wq);
+
+	return;
+
+error_out:
+	req->errors++;
+	nbd_end_request(req);
+}
+
+static int ...
From: Pavel Machek
Date: Saturday, February 2, 2008 - 4:23 am

Hmm, and if there are no other pages that can be freed? Unlikely, but
can happen AFAICT.


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Jan Engelhardt
Date: Saturday, February 2, 2008 - 4:52 am

Local NBD is good for when the content you want to make available
through the block device is dynamic (generated on-the-fly),
non-linear or supersparse.

Take for example VMware virtual disks. Just a guess, but
they roughly can look like this:

  kilobytes  0.. 1: header
  kilobytes  1..10: correspond to LBA 0..20
  kilobytes 11..20: correspond to LBA 40..60
  kilobytes 21..22: correspond to LBA 22..23

So what we have is non-linearity -- LBA 22 comes after LBA 40 -- loop
does not deal with that.

And there is supersparsity -- the VMDK file itself is complete, but
unallocated regions like LBA 24..40 are sparse/zero when projected
onto a file/block device, respectively; loop cannot deal with that
either.

In fact, VMware uses local nbd today for its vmware-loop helper
utility, most likely because of the above-mentioned reasons. (Though
it quite often hung last time I tried.)
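
A minimal sketch of the lookup that a userspace server behind a local NBD
device would need for the layout above. It is only an illustration: the
extent table copies the guessed numbers from the example, 512-byte sectors
and half-open LBA ranges are assumptions, and the names (struct extent,
lba_to_file_offset, example_map) are hypothetical, not taken from VMware or
from the NBD tools.

```c
#include <stddef.h>

#define SECTOR_SIZE 512ULL  /* assumed sector size */

/* One contiguous run of LBAs stored at some byte offset in the image file. */
struct extent {
	unsigned long long lba_start;   /* first LBA covered */
	unsigned long long lba_count;   /* number of LBAs covered */
	unsigned long long file_offset; /* byte offset of lba_start in the file */
};

/* The guessed layout from the example: file order differs from LBA order. */
const struct extent example_map[] = {
	{  0, 20,  1 * 1024ULL }, /* kilobytes  1..10: LBA 0..19  */
	{ 40, 20, 11 * 1024ULL }, /* kilobytes 11..20: LBA 40..59 */
	{ 22,  2, 21 * 1024ULL }, /* kilobytes 21..22: LBA 22..23 */
};
const size_t example_map_len = sizeof(example_map) / sizeof(example_map[0]);

/*
 * Translate an LBA into a byte offset in the image file.
 * Returns -1 when the LBA is unallocated (sparse); the server would then
 * answer a read with zeros instead of touching the file.
 */
long long lba_to_file_offset(const struct extent *map, size_t n,
			     unsigned long long lba)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct extent *e = &map[i];

		if (lba >= e->lba_start && lba < e->lba_start + e->lba_count)
			return (long long)(e->file_offset +
					   (lba - e->lba_start) * SECTOR_SIZE);
	}
	return -1;
}
```

With this table, LBA 22 resolves to a higher file offset than LBA 40
(non-linearity), and LBA 24 resolves to -1 (sparse). A plain offset mapping
of the kind loop provides cannot express either case.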
--

From: Laurent Vivier
Date: Saturday, February 2, 2008 - 8:26 am

It allows one to write a user-level block device. In my case, I can mount disk

Correct. The patch improves the NBD behavior even if it is not perfect.
And I think if no other page can be freed, your system is in very bad
shape ;-)

Laurent
-- 
----------------- Laurent.Vivier@bull.net  ------------------
  "La perfection est atteinte non quand il ne reste rien à
ajouter mais quand il ne reste rien à enlever." Saint Exupéry

--

From: Miklos Szeredi
Date: Saturday, February 2, 2008 - 9:13 am

Not necessarily.  Problems start when the system wants to free memory
by writing out pages through NBD, and the userspace process servicing
it tries to allocate some memory in order to accomplish this.

Recent kernels have gotten much better at coping with this, so it
might not be easy to make local NBD deadlock under normal
circumstances.  But if you try hard enough, it's not impossible:
throttle_vm_writeout() can stall an allocation until pending writes
have completed, all with plenty of memory available in the system.

BTW, you can basically substitute local NBD with fuse-over-loop, and
get a similar kind of service, with similar problems.

Miklos
--

From: Pavel Machek
Date: Saturday, February 2, 2008 - 1:54 pm

So the description should be 

"This patch lowers the probability of deadlock if you mount a Network Block Device locally"

Hmm.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
