Hello, I've noticed that the size of softraid(4) disks is one sector too large. In the following description, the native disk is sd4c (which is of type RAID), and the corresponding virtual softraid(4) disk will be sd5. I will assume that sd5 is of discipline CRYPTO, but the problem should be the same for all disciplines. The size for sd5 gets eventually stored in struct sr_metadata as sd_meta->ssdi.ssd_size in function sr_crypto_create (dev/softraid_crypto.c). Its value gets initially calculated in function sr_meta_native_probe (dev/softraid.c): size = DL_GETPSIZE(&label.d_partitions[part]) - SR_DATA_OFFSET; where SR_DATA_OFFSET = 528 If the size (as shown via disklabel(8)) of the underlying disk sd4c is 'n', the variable 'size' will be calculated as 'n - 528'. However, according to disklabel(8), the resulting softraid(4) disk sd5c will be of size 'n - 528 + 1'. I think this is because function scsi_size in scsi/scsi_base.c returns 'max_addr + 1' where max_addr corresponds to sd_meta->ssdi.ssd_size. For example, newfs(8) on /dev/rsd5c tries to write to the last sector of sd5 (first wtfs call in mkfs.c) which is one sector beyond the native disk size of sd4. This gives an error if label sd4c was used as the native device. It results in a kernel panic in case another label of sd4 was used as the underlying native device (i.e., sd4a). The attached diff fixes the problem for all newly created softraid volumes. TODO: fixing the problem for existing softraid(4) volumes would require to update sd_meta->ssdi.ssd_size to the correct size (and probably some other metadata for consistency). Best regards Andreas Index: softraid.c =================================================================== RCS file: /usr/cvsync/cvs/src/sys/dev/softraid.c,v retrieving revision 1.216 diff -u -r1.216 softraid.c --- softraid.c 6 Nov 2010 23:01:56 -0000 1.216 +++ softraid.c 19 Dec 2010 13:01:14 -0000 @@ -1386,7 +1386,7 @@ goto unwind; } - size = ...
I could swear we had the sizes right but I'll have another look at this. What raid type did you test this with?
I've only tested CRYPTO, but sr_meta_native_probe() seems to be used by all disciplines. Try newfs /dev/rsdXc where X is a virtual softraid(4) disk in order to trigger the test case.
partition 'c' as a RAID partition? That's not good, is it? Or am I missing something? Nick.
It is not agood idea, but since newfs write the last sector of a partition (as a general consistency check in mkfs.c:203) it is a good test to see if that last sector is indeed writable. -Otto
I can repro. Crypto raid. sd0g: size 20986560 this is the real disk sd1a: 20985952 offset 64 sd1c: 20986033 softraid reserves metaoffset + metasize + loadersize + bootblocks == 528 sectors. 20986560 - 528 = 20986032 So that's one smaller than the fake disk thinks it is. Lucky for me, my 'a' partition with the filesystem is only 20985952 + 64 = 20986016 sectors, so it didn't try to use the last few. Numbers again, minus prefix: Base disk: 560 Space after SR meta: 32 SR disk's claimed 'c' size: 33
im pretty sure scsi reports the last addressable sector, as opposed to the total number of sectors. will have a diff shortly. dlg
compiles. can someone test?
Index: softraid.c
===================================================================
RCS file: /cvs/src/sys/dev/softraid.c,v
retrieving revision 1.216
diff -u -p -r1.216 softraid.c
--- softraid.c 6 Nov 2010 23:01:56 -0000 1.216
+++ softraid.c 20 Dec 2010 08:21:33 -0000
@@ -3507,21 +3507,22 @@ sr_raid_read_cap(struct sr_workunit *wu)
struct scsi_read_cap_data rcd;
struct scsi_read_cap_data_16 rcd16;
int rv = 1;
+ int64_t size = sd->sd_meta->ssdi.ssd_size - 1;
DNPRINTF(SR_D_DIS, "%s: sr_raid_read_cap\n", DEVNAME(sd->sd_sc));
if (xs->cmd->opcode == READ_CAPACITY) {
bzero(&rcd, sizeof(rcd));
- if (sd->sd_meta->ssdi.ssd_size > 0xffffffffllu)
+ if (size > 0xffffffffllu)
_lto4b(0xffffffff, rcd.addr);
else
- _lto4b(sd->sd_meta->ssdi.ssd_size, rcd.addr);
+ _lto4b(size, rcd.addr);
_lto4b(512, rcd.length);
sr_copy_internal_data(xs, &rcd, sizeof(rcd));
rv = 0;
} else if (xs->cmd->opcode == READ_CAPACITY_16) {
bzero(&rcd16, sizeof(rcd16));
- _lto8b(sd->sd_meta->ssdi.ssd_size, rcd16.addr);
+ _lto8b(size, rcd16.addr);
_lto4b(512, rcd16.length);
sr_copy_internal_data(xs, &rcd16, sizeof(rcd16));
rv = 0;
Pretty much identical to the one I sent directly out to marco, Andreas
and tedu. Andreas has reported it worked for him. Mine is below for
completeness. I like my variable name better, but int64_t may be the
better type to use.
.... Ken
Index: softraid.c
===================================================================
RCS file: /cvs/src/sys/dev/softraid.c,v
retrieving revision 1.216
diff -u -p -r1.216 softraid.c
--- softraid.c 6 Nov 2010 23:01:56 -0000 1.216
+++ softraid.c 19 Dec 2010 21:37:17 -0000
@@ -3506,22 +3506,24 @@ sr_raid_read_cap(struct sr_workunit *wu)
struct scsi_xfer *xs = wu->swu_xs;
struct scsi_read_cap_data rcd;
struct scsi_read_cap_data_16 rcd16;
+ daddr64_t addr;
int rv = 1;
DNPRINTF(SR_D_DIS, "%s: sr_raid_read_cap\n", DEVNAME(sd->sd_sc));
+ addr = sd->sd_meta->ssdi.ssd_size - 1;
if (xs->cmd->opcode == READ_CAPACITY) {
bzero(&rcd, sizeof(rcd));
- if (sd->sd_meta->ssdi.ssd_size > 0xffffffffllu)
+ if (addr > 0xffffffffllu)
_lto4b(0xffffffff, rcd.addr);
else
- _lto4b(sd->sd_meta->ssdi.ssd_size, rcd.addr);
+ _lto4b(addr, rcd.addr);
_lto4b(512, rcd.length);
sr_copy_internal_data(xs, &rcd, sizeof(rcd));
rv = 0;
} else if (xs->cmd->opcode == READ_CAPACITY_16) {
bzero(&rcd16, sizeof(rcd16));
- _lto8b(sd->sd_meta->ssdi.ssd_size, rcd16.addr);
+ _lto8b(addr, rcd16.addr);
_lto4b(512, rcd16.length);
sr_copy_internal_data(xs, &rcd16, sizeof(rcd16));
rv = 0;
If anyone can test some rebuilds with this or dlg@'s diff, at least some RAID 1 rebuilds, that would be very appreciated confirmation that this doesn't affect any internal logic of softraid. .... Ken
You should NEVER use 'c' partition. Amoung other things, it is always set by the kernel to encompass the whole disk, everytime the disklabel is read. If you restore to a different sized disk interesting things might happen. I thought we also always set it to UNUSED, but if not I will revisit that.
On 12/19/10 22:28, Kenneth R Westerback wrote: I've used the native disk sd4c as RAID only for testing, since this improved my understanding of the problem. Is there also a problem when newfs is used on the virtual softraid(4) disk 'c' partition (rsd5c in my example)? I originally discovered the problem when I tried newfs(8) on rsd5c where the native disk was sd4a. As I wrote in my initial mail, this results in a kernel panic. The kernel panics in sr_crypto_finish_io(), and before this, bounds_check_with_label (in kern/subr_disk.c) (which gets called in sd.c:572 for the softraid(4) disk sd5) returns '-1'. I didn't further inspect if this panic could/should be avoided like in the test where the native RAID disk was sd4c.
The same issue ('c' being set by kernel) will apply to softraid
devices too. You seem to have discovered a bug, so your test process
was good from that standpoint. :-). But you shouldn't use 'c' for
a filesystem or RAID.
.... Ken
Can you provide the details on the exact panic you saw? Just to make sure when an issue is reproduced we know we are working on the same problem. .... Ken
On 12/20/10 18:26, Kenneth R Westerback wrote: Debugger panic sr_crypto_finish_io sr_crypto_intr sdstrategy spec_strategy VOP_STRATEGY sr_startwu_callback sd.c:572 calls bounds_check_with_label (in kern/subr_disk.c) which returns '-1' in line 698. My assumption was that this eventually leads to the panic.
I am working on this bug and it is a side effect of wrapping VOP_* calls. krw meanwhile commited a fix for the bug that exposed the panic and should make your test case happy. The underlying bug needs more work though but I am working on it.
