This patch series contains block layer integrity updates for 2.6.28.
There are a few fixes to the core bits, including a switch from
block_device to gendisk in the integrity compare function. I also had
to add a helper to access a gendisk's integrity profile. This is used
by MD.
The DM bits have been cleaned up and a separate dm_table_set_integrity
function has been introduced. This allows us to avoid changing
dm_table_set_restrictions() and its callers.
The MD patch now correctly handles devices that come and go (as
requested by Neil).
The patches are against for-2.6.28 and depend on
d7533ad0e132f92e75c1b2eb7c26387b25a583c1
being reverted.
--
Martin K. Petersen Oracle Linux Engineering
--
- kobject_del already puts the parent. - Set integrity profile to NULL to prevent stale data. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> --- block/blk-integrity.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/block/blk-integrity.c b/block/blk-integrity.c index 69023da..e3817a0 100644 --- a/block/blk-integrity.c +++ b/block/blk-integrity.c @@ -376,7 +376,7 @@ void blk_integrity_unregister(struct gendisk *disk) kobject_uevent(&bi->kobj, KOBJ_REMOVE); kobject_del(&bi->kobj); - kobject_put(&disk_to_dev(disk)->kobj); kmem_cache_free(integrity_cachep, bi); + disk->integrity = NULL; } EXPORT_SYMBOL(blk_integrity_unregister); -- 1.6.0.2 --
A filesystem might supply its own integrity metadata. Introduce a
flag that indicates whether the filesystem or the block layer owns the
integrity buffer.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
fs/bio-integrity.c | 3 ++-
include/linux/bio.h | 1 +
2 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index c3e174b..8e409ff 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -107,7 +107,8 @@ void bio_integrity_free(struct bio *bio, struct bio_set *bs)
BUG_ON(bip == NULL);
/* A cloned bio doesn't own the integrity metadata */
- if (!bio_flagged(bio, BIO_CLONED) && bip->bip_buf != NULL)
+ if (!bio_flagged(bio, BIO_CLONED) && !bio_flagged(bio, BIO_FS_INTEGRITY)
+ && bip->bip_buf != NULL)
kfree(bip->bip_buf);
mempool_free(bip->bip_vec, bs->bvec_pools[bip->bip_pool]);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 6520ee1..6aba97d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -109,6 +109,7 @@ struct bio {
#define BIO_EOPNOTSUPP 7 /* not supported */
#define BIO_CPU_AFFINE 8 /* complete bio on same CPU as submitted */
#define BIO_NULL_MAPPED 9 /* contains invalid user pages */
+#define BIO_FS_INTEGRITY 10 /* fs owns integrity data, not block layer */
#define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag)))
/*
--
1.6.0.2
--
Helper function to find the sector offset in a bio given bvec index
and page offset.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
fs/bio.c | 34 ++++++++++++++++++++++++++++++++++
include/linux/bio.h | 1 +
2 files changed, 35 insertions(+), 0 deletions(-)
diff --git a/fs/bio.c b/fs/bio.c
index e56e768..55ee2fd 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1300,6 +1300,40 @@ struct bio_pair *bio_split(struct bio *bi, mempool_t *pool, int first_sectors)
return bp;
}
+/**
+ * bio_sector_offset - Find hardware sector offset in bio
+ * @bio: bio to inspect
+ * @index: bio_vec index
+ * @offset: offset in bv_page
+ *
+ * Return the number of hardware sectors between beginning of bio
+ * and an end point indicated by a bio_vec index and an offset
+ * within that vector's page.
+ */
+sector_t bio_sector_offset(struct bio *bio, unsigned short index,
+ unsigned int offset)
+{
+ struct bio_vec *bv;
+ unsigned int sector_sz = bio->bi_bdev->bd_disk->queue->hardsect_size;
+ sector_t sectors;
+ int i;
+
+ sectors = 0;
+
+ BUG_ON(index >= bio->bi_vcnt);
+
+ bio_for_each_segment(bv, bio, i) {
+ if (i == index) {
+ if (offset > bv->bv_offset)
+ sectors += (offset - bv->bv_offset) / sector_sz;
+ return sectors;
+ }
+
+ sectors += bv->bv_len / sector_sz;
+ }
+
+ BUG();
+}
/*
* create memory pools for biovec's in a bio_set.
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 6aba97d..386a1df 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -327,6 +327,7 @@ extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
unsigned int, unsigned int);
extern int bio_get_nr_vecs(struct block_device *);
+extern sector_t bio_sector_offset(struct bio *, unsigned short, unsigned int);
extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
unsigned long, ...Too may BUG's there, me thinks, and the interface looks fragile. What if
the bio is alreday done (bi_idx == bi_vcnt)? I'd prefer something ala:
sector_t bio_sector_offset(struct bio *bio, unsigned short index,
unsigned int offset)
{
unsigned int sector_sz = queue_hardsect_size(bio->bi_bdev->bd_disk->queue);
struct bio_vec *bv;
sector_t sectors;
int i;
sectors = 0;
/* total start of bio current start? */
#if 0
if (index >= bio->bi_vcnt)
index = bio->bi_vcnt - 1;
__bio_for_each_segment(bv, bio, i, 0) {
#else
if (index > bio->bi_idx)
return 0;
bio_for_each_segment(bv, bio, i) {
#endif
if (i == index) {
if (offset > bv->bv_offset)
sectors += (offset - bv->bv_offset) / sector_sz;
break;
}
sectors += bv->bv_len / sector_sz;
}
return sectors;
}
Depending on which interface you want, pick one of the if/else parts :-)
--
Jens Axboe
--
>>>>> "Jens" == Jens Axboe <jens.axboe@oracle.com> writes:
Jens> Too may BUG's there, me thinks, and the interface looks
Jens> fragile. What if the bio is alreday done (bi_idx == bi_vcnt)?
Good point! Obviously I've only used the function for bio splitting
and not at completion time. Updated patch below.
block: Find bio sector offset given idx and offset
Helper function to find the sector offset in a bio given bvec index
and page offset.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
fs/bio.c | 36 ++++++++++++++++++++++++++++++++++++
include/linux/bio.h | 1 +
2 files changed, 37 insertions(+)
diff --git a/fs/bio.c b/fs/bio.c
index e56e768..a5af580 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1300,6 +1300,42 @@ struct bio_pair *bio_split(struct bio *bi, mempool_t *pool, int first_sectors)
return bp;
}
+/**
+ * bio_sector_offset - Find hardware sector offset in bio
+ * @bio: bio to inspect
+ * @index: bio_vec index
+ * @offset: offset in bv_page
+ *
+ * Return the number of hardware sectors between beginning of bio
+ * and an end point indicated by a bio_vec index and an offset
+ * within that vector's page.
+ */
+sector_t bio_sector_offset(struct bio *bio, unsigned short index,
+ unsigned int offset)
+{
+ unsigned int sector_sz = queue_hardsect_size(bio->bi_bdev->bd_disk->queue);
+ struct bio_vec *bv;
+ sector_t sectors;
+ int i;
+
+ sectors = 0;
+
+ if (index >= bio->bi_idx)
+ index = bio->bi_vcnt - 1;
+
+ __bio_for_each_segment(bv, bio, i, 0) {
+ if (i == index) {
+ if (offset > bv->bv_offset)
+ sectors += (offset - bv->bv_offset) / sector_sz;
+ break;
+ }
+
+ sectors += bv->bv_len / sector_sz;
+ }
+
+ return sectors;
+}
+EXPORT_SYMBOL(bio_sector_offset);
/*
* create memory pools for biovec's in a bio_set.
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 6aba97d..386a1df 100644
--- ...Looks good, I've applied patches 1-5. -- Jens Axboe --
If all subdevices support the same protection format the MD device is
flagged as capable.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
drivers/md/md.c | 34 ++++++++++++++++++++++++++++++++++
1 files changed, 34 insertions(+), 0 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 45521da..35542f2 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1411,6 +1411,38 @@ static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2)
static LIST_HEAD(pending_raid_disks);
+static void md_integrity_check(mdk_rdev_t *rdev, mddev_t *mddev)
+{
+ struct mdk_personality *pers = mddev->pers;
+ struct gendisk *disk = mddev->gendisk;
+ struct blk_integrity *bi_rdev = bdev_get_integrity(rdev->bdev);
+ struct blk_integrity *bi_mddev = blk_get_integrity(disk);
+
+ /* Data integrity passthrough not supported on RAID 4, 5 and 6 */
+ if (pers && pers->level >= 4 && pers->level <= 6)
+ return;
+
+ /* If rdev is integrity capable, register profile for mddev */
+ if (!bi_mddev && bi_rdev) {
+ if (blk_integrity_register(disk, bi_rdev))
+ printk(KERN_ERR "%s: %s Could not register integrity!\n",
+ __func__, disk->disk_name);
+ else
+ printk(KERN_NOTICE "Enabling data integrity on %s\n",
+ disk->disk_name);
+ return;
+ }
+
+ /* Check that mddev and rdev have matching profiles */
+ if (blk_integrity_compare(disk, rdev->bdev->bd_disk) < 0) {
+ printk(KERN_ERR "%s: %s/%s integrity mismatch!\n", __func__,
+ disk->disk_name, rdev->bdev->bd_disk->disk_name);
+ printk(KERN_NOTICE "Disabling data integrity on %s\n",
+ disk->disk_name);
+ blk_integrity_unregister(disk);
+ }
+}
+
static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
{
char b[BDEVNAME_SIZE];
@@ -1471,6 +1503,7 @@ static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
}
list_add_rcu(&rdev->same_set, &mddev->disks);
bd_claim_by_disk(rdev->bdev, rdev->bdev->bd_holder, ...This is a wrapper for accessing a gendisk's integrity bits. It allows
the integrity support in MD to be compiled with BLK_DEV_INTEGRITY off.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
include/linux/blkdev.h | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index eec0842..ca08439 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1027,6 +1027,11 @@ static inline struct blk_integrity *bdev_get_integrity(struct block_device *bdev
return bdev->bd_disk->integrity;
}
+static inline struct blk_integrity *blk_get_integrity(struct gendisk *disk)
+{
+ return disk->integrity;
+}
+
static inline unsigned int bdev_get_tag_size(struct block_device *bdev)
{
struct blk_integrity *bi = bdev_get_integrity(bdev);
@@ -1069,6 +1074,7 @@ static inline int blk_integrity_rq(struct request *rq)
#define blk_rq_count_integrity_sg(a) (0)
#define blk_rq_map_integrity_sg(a, b) (0)
#define bdev_get_integrity(a) (0)
+#define blk_get_integrity(a) (0)
#define bdev_get_tag_size(a) (0)
#define blk_integrity_compare(a, b) (0)
#define blk_integrity_register(a, b) (0)
--
1.6.0.2
--
If all subdevices support the same protection format the DM device is
flagged as capable.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
drivers/md/dm-table.c | 36 ++++++++++++++++++++++++++++++++++++
drivers/md/dm.c | 17 +++++++++++++++++
drivers/md/dm.h | 1 +
3 files changed, 54 insertions(+), 0 deletions(-)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 61f4414..f089404 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -860,7 +860,43 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q)
queue_flag_clear_unlocked(QUEUE_FLAG_CLUSTER, q);
else
queue_flag_set_unlocked(QUEUE_FLAG_CLUSTER, q);
+}
+
+void dm_table_set_integrity(struct dm_table *t, struct mapped_device *md)
+{
+ struct list_head *devices = dm_table_get_devices(t);
+ struct dm_dev *prev, *cur;
+
+ /*
+ * Run through all devices to ensure they have matching
+ * integrity profile
+ */
+ cur = prev = NULL;
+
+ list_for_each_entry(cur, devices, list) {
+
+ if (prev && blk_integrity_compare(prev->bdev->bd_disk,
+ cur->bdev->bd_disk) < 0) {
+ printk(KERN_ERR "%s: %s %s Integrity mismatch!\n",
+ __func__, prev->bdev->bd_disk->disk_name,
+ cur->bdev->bd_disk->disk_name);
+ return;
+ }
+ prev = cur;
+ }
+ /* Register dm device as being integrity capable */
+ if (prev && bdev_get_integrity(prev->bdev)) {
+ struct gendisk *disk = dm_disk(md);
+
+ if (blk_integrity_register(dm_disk(md),
+ bdev_get_integrity(prev->bdev)))
+ printk(KERN_ERR "%s: %s Could not register integrity!\n",
+ __func__, disk->disk_name);
+ else
+ printk(KERN_INFO "Enabling data integrity on %s\n",
+ disk->disk_name);
+ }
}
unsigned int dm_table_get_num_targets(struct dm_table *t)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8d40369..4f65342 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -669,6 +669,13 @@ static struct bio ...The DM and MD integrity support now depends on being able to use
gendisks instead of block_devices when comparing integrity profiles.
Change function parameters accordingly.
Also update comparison logic so that two NULL profiles are a valid
configuration.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
block/blk-integrity.c | 28 ++++++++++++++--------------
include/linux/blkdev.h | 2 +-
2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index e3817a0..61a8e2f 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -108,51 +108,51 @@ new_segment:
EXPORT_SYMBOL(blk_rq_map_integrity_sg);
/**
- * blk_integrity_compare - Compare integrity profile of two block devices
- * @bd1: Device to compare
- * @bd2: Device to compare
+ * blk_integrity_compare - Compare integrity profile of two disks
+ * @gd1: Disk to compare
+ * @gd2: Disk to compare
*
* Description: Meta-devices like DM and MD need to verify that all
* sub-devices use the same integrity format before advertising to
* upper layers that they can send/receive integrity metadata. This
- * function can be used to check whether two block devices have
+ * function can be used to check whether two gendisk devices have
* compatible integrity formats.
*/
-int blk_integrity_compare(struct block_device *bd1, struct block_device *bd2)
+int blk_integrity_compare(struct gendisk *gd1, struct gendisk *gd2)
{
- struct blk_integrity *b1 = bd1->bd_disk->integrity;
- struct blk_integrity *b2 = bd2->bd_disk->integrity;
+ struct blk_integrity *b1 = gd1->integrity;
+ struct blk_integrity *b2 = gd2->integrity;
- BUG_ON(bd1->bd_disk == NULL);
- BUG_ON(bd2->bd_disk == NULL);
+ if (!b1 && !b2)
+ return 0;
if (!b1 || !b2)
- return 0;
+ return -1;
if (b1->sector_size != b2->sector_size) {
printk(KERN_ERR "%s: %s/%s sector sz %u != %u\n", __func__,
- bd1->bd_disk->disk_name, ...As far as I can tell, most of that commit is still fine. You want bdev_get_integrity() in blkdev.h, the 3 other moves and the unused bdev_get_tag_size() do not look like they are being used by this patch set. Correct? -- Jens Axboe --
>>>>> "Jens" == Jens Axboe <jens.axboe@oracle.com> writes: Jens> As far as I can tell, most of that commit is still fine. You Jens> want bdev_get_integrity() in blkdev.h, the 3 other moves and the Jens> unused bdev_get_tag_size() do not look like they are being used Jens> by this patch set. bdev_get_integrity() and bdev_get_tag_size() are being used by stacking drivers and filesystems to prepare I/O. It's correct that none of the in-tree stuff currently uses bdev_get_tag_size(). That's coming with the btrfs support. If you want to pull that out for now and have me put that back later in that's ok. Just adds another two-stage merge dependency for a later cycle. bdev_integrity_enabled() and blk_integrity_tuple_size() are only being used from within bio-integrity.c and can move there. I originally put them in blkdev.h because they are block device functions and not bio ditto. Want me to submit a new patch shuffling bdev_get_integrity() back where it came from? -- Martin K. Petersen Oracle Linux Engineering --
Well, I would not have added it in the first place, but it was there. I already did the bdev_get_integrity() addon instead of the revert, so Do we need any on top of current for-2.6.28? I'll apply your series with the modified patch #5, it'll probably need a hand edit or two since I didn't revert the commit in question, but should be trivial to resolve. -- Jens Axboe --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian | [RFC/PATCH] RE: git calls SSH_ASKPASS even if DISPLAY is not set |
| Junio C Hamano | Re: Rss produced by git is not valid xml? |
