[PATCH 3/3] block: add a queue flag for request stacking support

Previous thread: [PATCH] [UIO] Add alignment warnings for uio-mem by Wolfram Sang on Thursday, September 18, 2008 - 7:46 am. (4 messages)

Next thread: Re: ACPI "Soft-off" power button only rebooting system, not powering off by Bodo Eggert on Thursday, September 18, 2008 - 8:04 am. (1 message)
From: Kiyoshi Ueda
Date: Thursday, September 18, 2008 - 7:43 am

Hi Jens,

The following patches are changes of the block layer for
request-based dm-multipath.
The patches are created on the following commit of for-2.6.28
in linux-2.6-block.
---------------------------------------------------------------
commit 1fd75f9b2e1c273ab145c9bf3e7a8afcbd8be87a
Author: Jens Axboe <jens.axboe@oracle.com>
Date:   Tue Sep 16 15:09:28 2008 -0700

    block: cleanup some of the integrity stuff in blkdev.h
---------------------------------------------------------------

No major changes since the last post (http://lkml.org/lkml/2008/9/12/100),
except for following up the differences from 2.6.27-rc6.
(As I said, I don't include patches for refactoring of similar
 interfaces at this time.  The refactoring patches will be posted
 after the request-based dm-multipath work is done.)

Summary of the patches:
  1/3: block: add a request data completion interface
  2/3: block: add a request submission interface
  3/3: block: add a queue flag for request stacking support

Please review and apply.

Thanks,
Kiyoshi Ueda
--

From: Kiyoshi Ueda
Date: Thursday, September 18, 2008 - 7:45 am

This patch adds blk_update_request(), which updates struct request
with completing its data part, but doesn't complete the struct
request itself.
Though it looks like end_that_request_first() of older kernels,
blk_update_request() should be used only by request stacking drivers.

Request-based dm will use it in bio->bi_end_io callback to update
the original request when a data part of a cloned request completes.
Followings are additional background information of why request-based
dm needs this interface.

  - Request stacking drivers can't use blk_end_request() directly from
    the lower driver's completion context (bio->bi_end_io or rq->end_io),
    because some device drivers (e.g. ide) may try to complete
    their request with queue lock held, and it may cause deadlock.
    See below for detailed description of possible deadlock:
    <http://marc.info/?l=linux-kernel&m=120311479108569&w=2>

  - To solve that, request-based dm offloads the completion of
    cloned struct request to softirq context (i.e. using
    blk_complete_request() from rq->end_io).

  - Though it is possible to use the same solution from bio->bi_end_io,
    it will delay the notification of bio completion to the original
    submitter.  Also, it will cause inefficient partial completion,
    because the lower driver can't perform the cloned request anymore
    and request-based dm needs to requeue and redispatch it to
    the lower driver again later.  That's not good.

  - So request-based dm needs blk_update_request() to perform the bio
    completion in the lower driver's completion context, which is more
    efficient.


Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
---
 block/blk-core.c       |   57 +++++++++++++++++++++++++++++++++++++++++--------
 include/linux/blkdev.h |    2 +
 2 files changed, 50 insertions(+), 9 deletions(-)

Index: ...
From: Kiyoshi Ueda
Date: Thursday, September 18, 2008 - 7:45 am

This patch adds blk_insert_cloned_request(), a generic request
submission interface for request stacking drivers.
Request-based dm will use it to submit their clones to underlying
devices.

blk_rq_check_limits() is also added because it is possible that
the lower queue has stronger limitations than the upper queue
if multiple drivers are stacking at request-level.
Not only for blk_insert_cloned_request()'s internal use, the function
will be used by request-based dm when the queue limitation is
modified (e.g. by replacing dm's table).


Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
---
 block/blk-core.c       |   81 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |    3 +
 2 files changed, 84 insertions(+)

Index: linux-2.6-block/block/blk-core.c
===================================================================
--- linux-2.6-block.orig/block/blk-core.c
+++ linux-2.6-block/block/blk-core.c
@@ -1522,6 +1522,87 @@ void submit_bio(int rw, struct bio *bio)
 EXPORT_SYMBOL(submit_bio);
 
 /**
+ * blk_rq_check_limits - Helper function to check a request for the queue limit
+ * @q:  the queue
+ * @rq: the request being checked
+ *
+ * Description:
+ *    @rq may have been made based on weaker limitations of upper-level queues
+ *    in request stacking drivers, and it may violate the limitation of @q.
+ *    Since the block layer and the underlying device driver trust @rq
+ *    after it is inserted to @q, it should be checked against @q before
+ *    the insertion using this generic function.
+ *
+ *    This function should also be useful for request stacking drivers
+ *    in some cases below, so export this fuction.
+ *    Request stacking drivers like request-based dm may change the queue
+ *    limits while requests are in the queue (e.g. dm's table swapping).
+ *    Such request stacking drivers should check those requests agaist
+ *    the ...
From: Kiyoshi Ueda
Date: Thursday, September 18, 2008 - 7:46 am

This patch adds a queue flag to indicate the block device can be
used for request stacking.

Request stacking drivers need to stack their devices on top of
only devices of which q->request_fn is functional.
Since bio stacking drivers (e.g. md, loop) basically initialize
their queue using blk_alloc_queue() and don't set q->request_fn,
the check of (q->request_fn == NULL) looks enough for that purpose.

However, dm will become both types of stacking driver (bio-based and
request-based).  And dm will always set q->request_fn even if the dm
device is bio-based of which q->request_fn is not functional actually.
So we need something else to distinguish the type of the device.
Adding a queue flag is a solution for that.


The reason why dm always sets q->request_fn is to keep
the compatibility of dm user-space tools.
Currently, all dm user-space tools are using bio-based dm without
specifying the type of the dm device they use.
To use request-based dm without changing such tools, the kernel
must decide the type of the dm device automatically.
The automatic type decision can't be done at the device creation time
and needs to be deferred until such tools load a mapping table,
since the actual type is decided by dm target type included in
the mapping table.

So a dm device has to be initialized using blk_init_queue()
so that we can load either type of table.
Then, all queue stuffs are set (e.g. q->request_fn) and we have
no element to distinguish that it is bio-based or request-based,
even after a table is loaded and the type of the device is decided.

By the way, some stuffs of the queue (e.g. request_list, elevator)
are needless when the dm device is used as bio-based.
But the memory size is not so large (about 20[KB] per queue on ia64),
so I hope the memory loss can be acceptable for bio-based dm users.


Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
---
 block/blk-core.c     ...
From: Jens Axboe
Date: Thursday, September 18, 2008 - 9:25 am

Thank you, I've merged these three patches.

-- 
Jens Axboe

--

Previous thread: [PATCH] [UIO] Add alignment warnings for uio-mem by Wolfram Sang on Thursday, September 18, 2008 - 7:46 am. (4 messages)

Next thread: Re: ACPI "Soft-off" power button only rebooting system, not powering off by Bodo Eggert on Thursday, September 18, 2008 - 8:04 am. (1 message)