This is a follow-up to my earlier mail http://lkml.org/lkml/2008/9/23/294 ([PATCH] BUG: ll_merge_requests_fn() updates req->nr_phys_segments wrongly) It is possible for the merging code to create lesser no of phys segments than hw segments, but every hw segment needs atleast one new phys segment. This triggers the BUG() on scsi_init_sgtable() as blk_rq_map_sg() returns more no of segments than rq->nr_phys_segments The following blktrace shows a sequence of bio's to trigger such condition on my machine with max_sectors_kb=512 & max_hw_sectors_kb=32767. 8,0 0 12 1.943680621 2261 Q R 6184075 + 55 [bash] 8,0 0 13 1.943684671 2261 G R 6184075 + 55 [bash] 8,0 0 14 1.943690189 2261 P N [bash] 8,0 0 15 1.943692075 2261 I R 6184075 + 55 [bash] 8,0 0 16 1.943712119 2261 D R 6184075 + 55 [bash] 8,0 0 17 1.943718684 2261 Q R 6184130 + 55 [bash] 8,0 0 18 1.943721897 2261 G R 6184130 + 55 [bash] 8,0 0 19 1.943726576 2261 P N [bash] 8,0 0 20 1.943727763 2261 I R 6184130 + 55 [bash] 8,0 0 21 1.943731675 2261 Q R 6184241 + 56 [bash] 8,0 0 22 1.943735167 2261 G R 6184241 + 56 [bash] 8,0 0 23 1.943739497 2261 I R 6184241 + 56 [bash] 8,0 0 24 1.943742081 2261 Q R 6184185 + 56 [bash] 8,0 0 25 1.943744875 2261 M R 6184185 + 56 [bash] 8,0 0 26 1.943753535 2261 Q R 6184352 + 55 [bash] 8,0 0 27 1.943756538 2261 G R 6184352 + 55 [bash] 8,0 0 28 1.943760868 2261 I R 6184352 + 55 [bash] 8,0 0 29 1.943763522 2261 Q R 6184407 + 55 [bash] 8,0 0 30 1.943766316 2261 M R 6184407 + 55 [bash] 8,0 0 31 1.943770506 2261 Q R 6184297 + 55 [bash] 8,0 0 32 1.943772951 2261 F R 6184297 + 55 [bash] 8,0 0 33 ...
Um, don't you mean this the other way around? I can see this problem occurring if the block layer gets tricked into doing a physical merge where sector limits forbid a virtual merge. The bug would appear to be that we sometimes only look at q->max_sectors when deciding on mergability. Either we have to insist on max_sectors <= hw_max_sectors, or we have to start using min(q->max_sectors, q->max_hw_sectors) for this. James --
q->max_sectors MUST always be <= q->max_hw_sectors, otherwise we could be sending down requests that are too large for the device to handle. So that condition would be a big bug. The sysfs interface checks for this, and blk_queue_max_sectors() makes sure that is true as well. The fixes proposed still look weird. There is no phys vs hw segment constraints, the request must adhere to the limits set by both. It's mostly a moot point anyway, as 2.6.28 will get rid of the hw accounting anyway. -- Jens Axboe --
Yes, that seems always to be enforced. Perhaps there are other ways of tripping this problem ... I'm still sure if it occurs it's because we do Agree all three proposed fixes look wrong. However, if it's what I think, just getting rid of hw accounting won't fix the problem because we'll still have the case where a physical merge is forbidden by iommu constraints ... this still needs to be accounted for. What we really need is a demonstration of what actually is going wrong ... James --
Yep, IIRC we both asked for that the last time as well... Nikanth? -- Jens Axboe --
Yes. This patch fixes it.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
---
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 5efc9e7..9c2fe49 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -392,7 +392,11 @@ static int ll_merge_requests_fn(struct request_queue *q,
struct request *req,
return 0;
total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
- if (blk_phys_contig_segment(q, req->biotail, next->bio))
+ if (blk_phys_contig_segment(q, req->biotail, next->bio) &&
+ BIOVEC_VIRT_MERGEABLE(__BVEC_END(req->biotail),
+ __BVEC_START(next->bio)) &&
+ !BIOVEC_VIRT_OVERSIZE(req->biotail->bi_hw_back_size
+ + next->bio->bi_hw_front_size))
total_phys_segments--;
Sorry, I had already sent the blktrace and some code snippet that would
reproduce the condition. Here is the module and user-space program that can
trigger this condition.
diskbiomod.c
---
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/blkdev.h>
#include <linux/miscdevice.h>
/*
** Shared between the driver module and application.
*/
#define MOD_NODE "/dev/DiskBio"
#define MOD_NAME "DiskBio"
/*
** IOCTLs
*/
#define DB_IOCTL_DISK 0x00000001
#define DB_IOCTL_GENDISK 0x00000002
/*
** MODULE Defines.
*/
MODULE_DESCRIPTION("DiskBio - Initiate Disk I/O Traffic");
MODULE_LICENSE("GPL");
/*
** Module Defines.
*/
#define MOD_VERSION "09-11-2008-1"
#define NUM_BIO 8
#define NUMBER_OF_VECS 16
#define START_LBA 0x5e5c8b
/*
** Module Globals.
*/
static struct gendisk *(*get_GenDisk)(dev_t, int *) = NULL;
static int ErrorsOccurred = 0;
typedef struct {
int IoSize;
char *Mem;
} DB_MEM_LIST, *pDB_MEM_LIST;
DB_MEM_LIST dbMemList[NUM_BIO] = {
{ 0x6e00, NULL },
{ 0x6e00, NULL },
{ 0x7000, NULL },
{ 0x7000, NULL },
{ 0x6e00, NULL },
{ 0x6e00, NULL },
{ 0x6e00, NULL },
{ 0x6e00, NULL },
};
/*
** Wait ...On Thu, 2 Oct 2008 19:13:57 +0200 Possibly, blk_phys_contig_segment might miscalculate q->max_segment_size? blk_phys_contig_segment does: req->biotail->bi_size + next_req->bio->bi_size > q->max_segment_size; But it's possible that req->biotail and the previous bio are supposed be merged into one segment? Then we could create too large segment here. --
Hmm yes, that looks like it could indeed be a problem! We could fix this
with similar logic to what we used to do for the hw merging, keep track
of the current segment size that this bio belongs to, so it would end up
ala
if (blk_phys_contig_segment(q, req->biotail, next->bio) &&
rq->biotail->bi_seg_size + next->bio->bi_size <= q->max_segment_size) {
total_phys_segments--;
next->bio->bi_seg_size = rq->biotail->bi_seg_size + next->bio->bi_size;
}
for the merge part.
--
Jens Axboe
--
On Fri, 10 Oct 2008 14:03:44 +0200 Yeah, exactly. --
Lets fix it. It wont be part of the initial merge, since it'll need some dedicated testing, but we can get it there for 2.6.28. Shall I interpret your message as willingness to write up the fix? :) -- Jens Axboe --
On Fri, 10 Oct 2008 14:37:19 +0200 Yeah, it's on this weekend todo list. :) I want to look at the code again and make sure I correctly understand the problem. --
Oops, that was incomplete information to reproduce the issue.
typedef struct {
int IoSize;
char *Mem;
} DB_MEM_LIST, *pDB_MEM_LIST;
DB_MEM_LIST dbMemList[NUM_BIO] = {
{ 0x6e00, NULL },
{ 0x6e00, NULL },
{ 0x7000, NULL },
{ 0x7000, NULL },
{ 0x6e00, NULL },
{ 0x6e00, NULL },
{ 0x6e00, NULL },
{ 0x6e00, NULL },
};
// Allocate Memory.
char *Mem0 = kmalloc(dbMemList[0].IoSize, GFP_KERNEL);
char *Mem1 = kmalloc(dbMemList[1].IoSize, GFP_KERNEL);
char *Mem2 = kmalloc((dbMemList[2].IoSize + dbMemList[3].IoSize +
dbMemList[4].IoSize), GFP_KERNEL);
char *Mem5 = kmalloc(dbMemList[5].IoSize, GFP_KERNEL);
char *Mem6 = kmalloc(dbMemList[6].IoSize, GFP_KERNEL);
char *Mem7 = kmalloc(dbMemList[7].IoSize, GFP_KERNEL);
dbMemList[0].Mem = Mem0;
dbMemList[1].Mem = Mem1;
dbMemList[2].Mem = Mem2;
dbMemList[3].Mem = dbMemList[2].Mem + dbMemList[2].IoSize;
dbMemList[4].Mem = dbMemList[3].Mem + dbMemList[3].IoSize;
dbMemList[5].Mem = Mem5;
dbMemList[6].Mem = Mem6;
dbMemList[7].Mem = Mem7;
int Loop;
struct bio *BioList[NUM_BIO] = { NULL };
struct bio *Bio = NULL;
long Sector;
/*
** Allocate BIOs.
*/
for(Loop = 0; Loop < NUM_BIO; Loop++)
BioList[Loop] = bio_alloc(GFP_KERNEL, NUMBER_OF_VECS);
for(Loop = 0; Loop < NUM_BIO; Loop++) {
Bio = BioList[Loop];
Bio->bi_sector = Sector;
Bio->bi_bdev = Bdev;
Bio->bi_end_io = db_end_io;
if (Loop == 0) {
if (!bio_add_page(BioList[Loop],
virt_to_page(dbMemList[Loop].Mem),
0x3000,
offset_in_page(dbMemList[Loop].Mem))) {
printk("NIK: bio add page failed- %dA\n", Loop);
}
if (!bio_add_page(BioList[Loop],
virt_to_page(dbMemList[7].Mem),
0x3e00,
offset_in_page(dbMemList[7].Mem))) {
printk("NIK: bio add page failed- %dB\n", Loop);
}
} else if (Loop == 7) {
if (!bio_add_page(BioList[Loop],
virt_to_page(dbMemList[Loop].Mem),
0x2000,
offset_in_page(dbMemList[Loop].Mem))) {
printk("NIK: bio add page failed- %dA\n", Loop);
}
if ...