Re: Block integrity patches for 2.6.28

Previous thread: none

Next thread: [PATCH 0/6] memcg update v6 (for review and discuss) by KAMEZAWA Hiroyuki on Wednesday, October 1, 2008 - 12:52 am. (26 messages)
From: Martin K. Petersen
Date: Wednesday, October 1, 2008 - 12:38 am

This patch series contains block layer integrity updates for 2.6.28.

There are a few fixes to the core bits, including a switch from
block_device to gendisk in the integrity compare function.  I also had
to add a helper to access a gendisk's integrity profile.  This is used
by MD.

The DM bits have been cleaned up and a separate dm_table_set_integrity
function has been introduced.  This allows us to avoid changing
dm_table_set_restrictions() and its callers.

The MD patch now correctly handles devices that come and go (as
requested by Neil).

The patches are against for-2.6.28 and depend on

    d7533ad0e132f92e75c1b2eb7c26387b25a583c1

being reverted.

--
Martin K. Petersen      Oracle Linux Engineering


--

From: Martin K. Petersen
Date: Wednesday, October 1, 2008 - 12:38 am

- kobject_del already puts the parent.

 - Set integrity profile to NULL to prevent stale data.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/blk-integrity.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index 69023da..e3817a0 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -376,7 +376,7 @@ void blk_integrity_unregister(struct gendisk *disk)
 
 	kobject_uevent(&bi->kobj, KOBJ_REMOVE);
 	kobject_del(&bi->kobj);
-	kobject_put(&disk_to_dev(disk)->kobj);
 	kmem_cache_free(integrity_cachep, bi);
+	disk->integrity = NULL;
 }
 EXPORT_SYMBOL(blk_integrity_unregister);
-- 
1.6.0.2

--

From: Martin K. Petersen
Date: Wednesday, October 1, 2008 - 12:38 am

A filesystem might supply its own integrity metadata.  Introduce a
flag that indicates whether the filesystem or the block layer owns the
integrity buffer.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 fs/bio-integrity.c  |    3 ++-
 include/linux/bio.h |    1 +
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index c3e174b..8e409ff 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -107,7 +107,8 @@ void bio_integrity_free(struct bio *bio, struct bio_set *bs)
 	BUG_ON(bip == NULL);
 
 	/* A cloned bio doesn't own the integrity metadata */
-	if (!bio_flagged(bio, BIO_CLONED) && bip->bip_buf != NULL)
+	if (!bio_flagged(bio, BIO_CLONED) && !bio_flagged(bio, BIO_FS_INTEGRITY)
+	    && bip->bip_buf != NULL)
 		kfree(bip->bip_buf);
 
 	mempool_free(bip->bip_vec, bs->bvec_pools[bip->bip_pool]);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 6520ee1..6aba97d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -109,6 +109,7 @@ struct bio {
 #define BIO_EOPNOTSUPP	7	/* not supported */
 #define BIO_CPU_AFFINE	8	/* complete bio on same CPU as submitted */
 #define BIO_NULL_MAPPED 9	/* contains invalid user pages */
+#define BIO_FS_INTEGRITY 10	/* fs owns integrity data, not block layer */
 #define bio_flagged(bio, flag)	((bio)->bi_flags & (1 << (flag)))
 
 /*
-- 
1.6.0.2

--

From: Martin K. Petersen
Date: Wednesday, October 1, 2008 - 12:38 am

Helper function to find the sector offset in a bio given bvec index
and page offset.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 fs/bio.c            |   34 ++++++++++++++++++++++++++++++++++
 include/linux/bio.h |    1 +
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index e56e768..55ee2fd 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1300,6 +1300,40 @@ struct bio_pair *bio_split(struct bio *bi, mempool_t *pool, int first_sectors)
 	return bp;
 }
 
+/**
+ *	bio_sector_offset - Find hardware sector offset in bio
+ *	@bio:		bio to inspect
+ *	@index:		bio_vec index
+ *	@offset:	offset in bv_page
+ *
+ *	Return the number of hardware sectors between beginning of bio
+ *	and an end point indicated by a bio_vec index and an offset
+ *	within that vector's page.
+ */
+sector_t bio_sector_offset(struct bio *bio, unsigned short index,
+			   unsigned int offset)
+{
+	struct bio_vec *bv;
+	unsigned int sector_sz = bio->bi_bdev->bd_disk->queue->hardsect_size;
+	sector_t sectors;
+	int i;
+
+	sectors = 0;
+
+	BUG_ON(index >= bio->bi_vcnt);
+
+	bio_for_each_segment(bv, bio, i) {
+		if (i == index) {
+			if (offset > bv->bv_offset)
+				sectors += (offset - bv->bv_offset) / sector_sz;
+			return sectors;
+		}
+
+		sectors += bv->bv_len / sector_sz;
+	}
+
+	BUG();
+}
 
 /*
  * create memory pools for biovec's in a bio_set.
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 6aba97d..386a1df 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -327,6 +327,7 @@ extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
 			   unsigned int, unsigned int);
 extern int bio_get_nr_vecs(struct block_device *);
+extern sector_t bio_sector_offset(struct bio *, unsigned short, unsigned int);
 extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
 				unsigned long, ...
From: Jens Axboe
Date: Wednesday, October 1, 2008 - 1:10 am

Too may BUG's there, me thinks, and the interface looks fragile. What if
the bio is alreday done (bi_idx == bi_vcnt)? I'd prefer something ala:

sector_t bio_sector_offset(struct bio *bio, unsigned short index,
			   unsigned int offset)
{
	unsigned int sector_sz = queue_hardsect_size(bio->bi_bdev->bd_disk->queue);
	struct bio_vec *bv;
	sector_t sectors;
	int i;

	sectors = 0;

        /* total start of bio current start? */
#if 0
        if (index >= bio->bi_vcnt)
                index = bio->bi_vcnt - 1;

        __bio_for_each_segment(bv, bio, i, 0) {
#else
        if (index > bio->bi_idx)
                return 0;

	bio_for_each_segment(bv, bio, i) {
#endif
		if (i == index) {
			if (offset > bv->bv_offset)
				sectors += (offset - bv->bv_offset) / sector_sz;
                        break;
		}

		sectors += bv->bv_len / sector_sz;
	}

        return sectors;
}

Depending on which interface you want, pick one of the if/else parts :-)

-- 
Jens Axboe

--

From: Martin K. Petersen
Date: Wednesday, October 1, 2008 - 7:42 pm

>>>>> "Jens" == Jens Axboe <jens.axboe@oracle.com> writes:

Jens> Too may BUG's there, me thinks, and the interface looks
Jens> fragile. What if the bio is alreday done (bi_idx == bi_vcnt)?

Good point!  Obviously I've only used the function for bio splitting
and not at completion time.  Updated patch below.


block: Find bio sector offset given idx and offset

Helper function to find the sector offset in a bio given bvec index
and page offset.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 fs/bio.c            |   36 ++++++++++++++++++++++++++++++++++++
 include/linux/bio.h |    1 +
 2 files changed, 37 insertions(+)

diff --git a/fs/bio.c b/fs/bio.c
index e56e768..a5af580 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1300,6 +1300,42 @@ struct bio_pair *bio_split(struct bio *bi, mempool_t *pool, int first_sectors)
 	return bp;
 }
 
+/**
+ *      bio_sector_offset - Find hardware sector offset in bio
+ *      @bio:           bio to inspect
+ *      @index:         bio_vec index
+ *      @offset:        offset in bv_page
+ *
+ *      Return the number of hardware sectors between beginning of bio
+ *      and an end point indicated by a bio_vec index and an offset
+ *      within that vector's page.
+ */
+sector_t bio_sector_offset(struct bio *bio, unsigned short index,
+			   unsigned int offset)
+{
+	unsigned int sector_sz = queue_hardsect_size(bio->bi_bdev->bd_disk->queue);
+	struct bio_vec *bv;
+	sector_t sectors;
+	int i;
+
+	sectors = 0;
+
+	if (index >= bio->bi_idx)
+		index = bio->bi_vcnt - 1;
+
+	__bio_for_each_segment(bv, bio, i, 0) {
+		if (i == index) {
+			if (offset > bv->bv_offset)
+				sectors += (offset - bv->bv_offset) / sector_sz;
+			break;
+		}
+
+		sectors += bv->bv_len / sector_sz;
+	}
+
+	return sectors;
+}
+EXPORT_SYMBOL(bio_sector_offset);
 
 /*
  * create memory pools for biovec's in a bio_set.
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 6aba97d..386a1df 100644
--- ...
From: Jens Axboe
Date: Thursday, October 2, 2008 - 10:07 am

Looks good, I've applied patches 1-5.

-- 
Jens Axboe

--

From: Martin K. Petersen
Date: Wednesday, October 1, 2008 - 12:38 am

If all subdevices support the same protection format the MD device is
flagged as capable.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/md/md.c |   34 ++++++++++++++++++++++++++++++++++
 1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 45521da..35542f2 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1411,6 +1411,38 @@ static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2)
 
 static LIST_HEAD(pending_raid_disks);
 
+static void md_integrity_check(mdk_rdev_t *rdev, mddev_t *mddev)
+{
+	struct mdk_personality *pers = mddev->pers;
+	struct gendisk *disk = mddev->gendisk;
+	struct blk_integrity *bi_rdev = bdev_get_integrity(rdev->bdev);
+	struct blk_integrity *bi_mddev = blk_get_integrity(disk);
+
+	/* Data integrity passthrough not supported on RAID 4, 5 and 6 */
+	if (pers && pers->level >= 4 && pers->level <= 6)
+		return;
+
+	/* If rdev is integrity capable, register profile for mddev */
+	if (!bi_mddev && bi_rdev) {
+		if (blk_integrity_register(disk, bi_rdev))
+			printk(KERN_ERR "%s: %s Could not register integrity!\n",
+			       __func__, disk->disk_name);
+		else
+			printk(KERN_NOTICE "Enabling data integrity on %s\n",
+			       disk->disk_name);
+		return;
+	}
+
+	/* Check that mddev and rdev have matching profiles */
+	if (blk_integrity_compare(disk, rdev->bdev->bd_disk) < 0) {
+		printk(KERN_ERR "%s: %s/%s integrity mismatch!\n", __func__,
+		       disk->disk_name, rdev->bdev->bd_disk->disk_name);
+		printk(KERN_NOTICE "Disabling data integrity on %s\n",
+		       disk->disk_name);
+		blk_integrity_unregister(disk);
+	}
+}
+
 static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
 {
 	char b[BDEVNAME_SIZE];
@@ -1471,6 +1503,7 @@ static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
 	}
 	list_add_rcu(&rdev->same_set, &mddev->disks);
 	bd_claim_by_disk(rdev->bdev, rdev->bdev->bd_holder, ...
From: Martin K. Petersen
Date: Wednesday, October 1, 2008 - 12:38 am

This is a wrapper for accessing a gendisk's integrity bits.  It allows
the integrity support in MD to be compiled with BLK_DEV_INTEGRITY off.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 include/linux/blkdev.h |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index eec0842..ca08439 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1027,6 +1027,11 @@ static inline struct blk_integrity *bdev_get_integrity(struct block_device *bdev
 	return bdev->bd_disk->integrity;
 }
 
+static inline struct blk_integrity *blk_get_integrity(struct gendisk *disk)
+{
+	return disk->integrity;
+}
+
 static inline unsigned int bdev_get_tag_size(struct block_device *bdev)
 {
 	struct blk_integrity *bi = bdev_get_integrity(bdev);
@@ -1069,6 +1074,7 @@ static inline int blk_integrity_rq(struct request *rq)
 #define blk_rq_count_integrity_sg(a)		(0)
 #define blk_rq_map_integrity_sg(a, b)		(0)
 #define bdev_get_integrity(a)			(0)
+#define blk_get_integrity(a)			(0)
 #define bdev_get_tag_size(a)			(0)
 #define blk_integrity_compare(a, b)		(0)
 #define blk_integrity_register(a, b)		(0)
-- 
1.6.0.2

--

From: Martin K. Petersen
Date: Wednesday, October 1, 2008 - 12:38 am

If all subdevices support the same protection format the DM device is
flagged as capable.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/md/dm-table.c |   36 ++++++++++++++++++++++++++++++++++++
 drivers/md/dm.c       |   17 +++++++++++++++++
 drivers/md/dm.h       |    1 +
 3 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 61f4414..f089404 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -860,7 +860,43 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q)
 		queue_flag_clear_unlocked(QUEUE_FLAG_CLUSTER, q);
 	else
 		queue_flag_set_unlocked(QUEUE_FLAG_CLUSTER, q);
+}
+
+void dm_table_set_integrity(struct dm_table *t, struct mapped_device *md)
+{
+	struct list_head *devices = dm_table_get_devices(t);
+	struct dm_dev *prev, *cur;
+
+	/*
+	 * Run through all devices to ensure they have matching
+	 * integrity profile
+	 */
+	cur = prev = NULL;
+
+	list_for_each_entry(cur, devices, list) {
+
+		if (prev && blk_integrity_compare(prev->bdev->bd_disk,
+						  cur->bdev->bd_disk) < 0) {
+			printk(KERN_ERR "%s: %s %s Integrity mismatch!\n",
+			       __func__, prev->bdev->bd_disk->disk_name,
+			       cur->bdev->bd_disk->disk_name);
+			return;
+		}
+		prev = cur;
+	}
 
+	/* Register dm device as being integrity capable */
+	if (prev && bdev_get_integrity(prev->bdev)) {
+		struct gendisk *disk = dm_disk(md);
+
+		if (blk_integrity_register(dm_disk(md),
+					   bdev_get_integrity(prev->bdev)))
+			printk(KERN_ERR "%s: %s Could not register integrity!\n",
+			       __func__, disk->disk_name);
+		else
+			printk(KERN_INFO "Enabling data integrity on %s\n",
+			       disk->disk_name);
+	}
 }
 
 unsigned int dm_table_get_num_targets(struct dm_table *t)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8d40369..4f65342 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -669,6 +669,13 @@ static struct bio ...
From: Martin K. Petersen
Date: Wednesday, October 1, 2008 - 12:38 am

The DM and MD integrity support now depends on being able to use
gendisks instead of block_devices when comparing integrity profiles.
Change function parameters accordingly.

Also update comparison logic so that two NULL profiles are a valid
configuration.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/blk-integrity.c  |   28 ++++++++++++++--------------
 include/linux/blkdev.h |    2 +-
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index e3817a0..61a8e2f 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -108,51 +108,51 @@ new_segment:
 EXPORT_SYMBOL(blk_rq_map_integrity_sg);
 
 /**
- * blk_integrity_compare - Compare integrity profile of two block devices
- * @bd1:	Device to compare
- * @bd2:	Device to compare
+ * blk_integrity_compare - Compare integrity profile of two disks
+ * @gd1:	Disk to compare
+ * @gd2:	Disk to compare
  *
  * Description: Meta-devices like DM and MD need to verify that all
  * sub-devices use the same integrity format before advertising to
  * upper layers that they can send/receive integrity metadata.  This
- * function can be used to check whether two block devices have
+ * function can be used to check whether two gendisk devices have
  * compatible integrity formats.
  */
-int blk_integrity_compare(struct block_device *bd1, struct block_device *bd2)
+int blk_integrity_compare(struct gendisk *gd1, struct gendisk *gd2)
 {
-	struct blk_integrity *b1 = bd1->bd_disk->integrity;
-	struct blk_integrity *b2 = bd2->bd_disk->integrity;
+	struct blk_integrity *b1 = gd1->integrity;
+	struct blk_integrity *b2 = gd2->integrity;
 
-	BUG_ON(bd1->bd_disk == NULL);
-	BUG_ON(bd2->bd_disk == NULL);
+	if (!b1 && !b2)
+		return 0;
 
 	if (!b1 || !b2)
-		return 0;
+		return -1;
 
 	if (b1->sector_size != b2->sector_size) {
 		printk(KERN_ERR "%s: %s/%s sector sz %u != %u\n", __func__,
-		       bd1->bd_disk->disk_name, ...
From: Jens Axboe
Date: Thursday, October 2, 2008 - 3:56 am

As far as I can tell, most of that commit is still fine. You want
bdev_get_integrity() in blkdev.h, the 3 other moves and the unused
bdev_get_tag_size() do not look like they are being used by this patch
set.

Correct?

-- 
Jens Axboe

--

From: Martin K. Petersen
Date: Thursday, October 2, 2008 - 6:54 am

>>>>> "Jens" == Jens Axboe <jens.axboe@oracle.com> writes:

Jens> As far as I can tell, most of that commit is still fine. You
Jens> want bdev_get_integrity() in blkdev.h, the 3 other moves and the
Jens> unused bdev_get_tag_size() do not look like they are being used
Jens> by this patch set.

bdev_get_integrity() and bdev_get_tag_size() are being used by
stacking drivers and filesystems to prepare I/O.  It's correct that
none of the in-tree stuff currently uses bdev_get_tag_size().  That's
coming with the btrfs support.  If you want to pull that out for now
and have me put that back later in that's ok.  Just adds another
two-stage merge dependency for a later cycle.

bdev_integrity_enabled() and blk_integrity_tuple_size() are only being
used from within bio-integrity.c and can move there.  I originally put
them in blkdev.h because they are block device functions and not bio
ditto.

Want me to submit a new patch shuffling bdev_get_integrity() back
where it came from?

-- 
Martin K. Petersen	Oracle Linux Engineering
--

From: Jens Axboe
Date: Thursday, October 2, 2008 - 9:29 am

Well, I would not have added it in the first place, but it was there. I
already did the bdev_get_integrity() addon instead of the revert, so

Do we need any on top of current for-2.6.28?

I'll apply your series with the modified patch #5, it'll probably need a
hand edit or two since I didn't revert the commit in question, but
should be trivial to resolve.


-- 
Jens Axboe

--

Previous thread: none

Next thread: [PATCH 0/6] memcg update v6 (for review and discuss) by KAMEZAWA Hiroyuki on Wednesday, October 1, 2008 - 12:52 am. (26 messages)