softraid(4) disks are of wrong size

Previous thread: VISION WITHOUT GLASSES by Dr. William H. Bates on Saturday, December 18, 2010 - 4:32 pm. (1 message)

Next thread: EkonomiTürk Haber Ajansı by EkonomiTürk Haber Ajansı on Sunday, December 19, 2010 - 7:04 am. (1 message)
From: Andreas Bartelt
Date: Sunday, December 19, 2010 - 6:17 am

Hello,

I've noticed that the size of softraid(4) disks is one sector too large.

In the following description, the native disk is sd4c (which is of type 
RAID), and the corresponding virtual softraid(4) disk will be sd5. I 
will assume that sd5 is of discipline CRYPTO, but the problem should be 
the same for all disciplines.

The size for sd5 gets eventually stored in struct sr_metadata as 
sd_meta->ssdi.ssd_size in function sr_crypto_create 
(dev/softraid_crypto.c). Its value gets initially calculated in function 
sr_meta_native_probe (dev/softraid.c):
size = DL_GETPSIZE(&label.d_partitions[part]) - SR_DATA_OFFSET;
where SR_DATA_OFFSET = 528

If the size (as shown via disklabel(8)) of the underlying disk sd4c is 
'n', the variable 'size' will be calculated as 'n - 528'. However, 
according to disklabel(8), the resulting softraid(4) disk sd5c will be 
of size 'n - 528 + 1'. I think this is because function scsi_size in 
scsi/scsi_base.c returns 'max_addr + 1' where max_addr corresponds to 
sd_meta->ssdi.ssd_size.

For example, newfs(8) on /dev/rsd5c tries to write to the last sector of 
sd5 (first wtfs call in mkfs.c) which is one sector beyond the native 
disk size of sd4. This gives an error if label sd4c was used as the 
native device. It results in a kernel panic in case another label of sd4 
was used as the underlying native device (i.e., sd4a).

The attached diff fixes the problem for all newly created softraid volumes.

TODO: fixing the problem for existing softraid(4) volumes would require 
to update sd_meta->ssdi.ssd_size to the correct size (and probably some 
other metadata for consistency).

Best regards
Andreas
Index: softraid.c
===================================================================
RCS file: /usr/cvsync/cvs/src/sys/dev/softraid.c,v
retrieving revision 1.216
diff -u -r1.216 softraid.c
--- softraid.c	6 Nov 2010 23:01:56 -0000	1.216
+++ softraid.c	19 Dec 2010 13:01:14 -0000
@@ -1386,7 +1386,7 @@
 		goto unwind;
 	}
 
-	size = ...
From: Marco Peereboom
Date: Sunday, December 19, 2010 - 9:15 am

I could swear we had the sizes right but I'll have another look at this.

What raid type did you test this with?


From: Andreas Bartelt
Date: Sunday, December 19, 2010 - 9:47 am

I've only tested CRYPTO, but sr_meta_native_probe() seems to be used by 
all disciplines.

Try newfs /dev/rsdXc where X is a virtual softraid(4) disk in order to 
trigger the test case.

From: Nick Holland
Date: Sunday, December 19, 2010 - 9:51 am

partition 'c' as a RAID partition?  That's not good, is it?
Or am I missing something?

Nick.

From: Otto Moerbeek
Date: Sunday, December 19, 2010 - 10:16 am

It is not agood idea, but since newfs write the last sector of a
partition (as a general consistency check in mkfs.c:203) it is a good
test to see if that last sector is indeed writable. 

	-Otto

From: Ted Unangst
Date: Sunday, December 19, 2010 - 3:32 pm

I can repro.  Crypto raid.

sd0g: size 20986560  this is the real disk

sd1a: 20985952 offset 64
sd1c: 20986033

softraid reserves metaoffset + metasize + loadersize + bootblocks ==
528 sectors.

20986560 - 528 = 20986032

So that's one smaller than the fake disk thinks it is.

Lucky for me, my 'a' partition with the filesystem is only 20985952 +
64 = 20986016 sectors, so it didn't try to use the last few.

Numbers again, minus prefix:

Base disk: 560
Space after SR meta: 32
SR disk's claimed 'c' size: 33

From: David Gwynne
Date: Sunday, December 19, 2010 - 11:26 pm

im pretty sure scsi reports the last addressable sector, as opposed to the
total number of sectors. will have a diff shortly.

dlg


From: David Gwynne
Date: Monday, December 20, 2010 - 1:24 am

compiles. can someone test?

Index: softraid.c
===================================================================
RCS file: /cvs/src/sys/dev/softraid.c,v
retrieving revision 1.216
diff -u -p -r1.216 softraid.c
--- softraid.c	6 Nov 2010 23:01:56 -0000	1.216
+++ softraid.c	20 Dec 2010 08:21:33 -0000
@@ -3507,21 +3507,22 @@ sr_raid_read_cap(struct sr_workunit *wu)
 	struct scsi_read_cap_data rcd;
 	struct scsi_read_cap_data_16 rcd16;
 	int			rv = 1;
+	int64_t			size = sd->sd_meta->ssdi.ssd_size - 1;
 
 	DNPRINTF(SR_D_DIS, "%s: sr_raid_read_cap\n", DEVNAME(sd->sd_sc));
 
 	if (xs->cmd->opcode == READ_CAPACITY) {
 		bzero(&rcd, sizeof(rcd));
-		if (sd->sd_meta->ssdi.ssd_size > 0xffffffffllu)
+		if (size > 0xffffffffllu)
 			_lto4b(0xffffffff, rcd.addr);
 		else
-			_lto4b(sd->sd_meta->ssdi.ssd_size, rcd.addr);
+			_lto4b(size, rcd.addr);
 		_lto4b(512, rcd.length);
 		sr_copy_internal_data(xs, &rcd, sizeof(rcd));
 		rv = 0;
 	} else if (xs->cmd->opcode == READ_CAPACITY_16) {
 		bzero(&rcd16, sizeof(rcd16));
-		_lto8b(sd->sd_meta->ssdi.ssd_size, rcd16.addr);
+		_lto8b(size, rcd16.addr);
 		_lto4b(512, rcd16.length);
 		sr_copy_internal_data(xs, &rcd16, sizeof(rcd16));
 		rv = 0;

From: Kenneth R Westerback
Date: Monday, December 20, 2010 - 6:06 am

Pretty much identical to the one I sent directly out to marco, Andreas
and tedu. Andreas has reported it worked for him. Mine is below for
completeness. I like my variable name better, but int64_t may be the
better type to use.

.... Ken

Index: softraid.c
===================================================================
RCS file: /cvs/src/sys/dev/softraid.c,v
retrieving revision 1.216
diff -u -p -r1.216 softraid.c
--- softraid.c	6 Nov 2010 23:01:56 -0000	1.216
+++ softraid.c	19 Dec 2010 21:37:17 -0000
@@ -3506,22 +3506,24 @@ sr_raid_read_cap(struct sr_workunit *wu)
 	struct scsi_xfer	*xs = wu->swu_xs;
 	struct scsi_read_cap_data rcd;
 	struct scsi_read_cap_data_16 rcd16;
+	daddr64_t		addr;
 	int			rv = 1;
 
 	DNPRINTF(SR_D_DIS, "%s: sr_raid_read_cap\n", DEVNAME(sd->sd_sc));
 
+	addr = sd->sd_meta->ssdi.ssd_size - 1;
 	if (xs->cmd->opcode == READ_CAPACITY) {
 		bzero(&rcd, sizeof(rcd));
-		if (sd->sd_meta->ssdi.ssd_size > 0xffffffffllu)
+		if (addr > 0xffffffffllu)
 			_lto4b(0xffffffff, rcd.addr);
 		else
-			_lto4b(sd->sd_meta->ssdi.ssd_size, rcd.addr);
+			_lto4b(addr, rcd.addr);
 		_lto4b(512, rcd.length);
 		sr_copy_internal_data(xs, &rcd, sizeof(rcd));
 		rv = 0;
 	} else if (xs->cmd->opcode == READ_CAPACITY_16) {
 		bzero(&rcd16, sizeof(rcd16));
-		_lto8b(sd->sd_meta->ssdi.ssd_size, rcd16.addr);
+		_lto8b(addr, rcd16.addr);
 		_lto4b(512, rcd16.length);
 		sr_copy_internal_data(xs, &rcd16, sizeof(rcd16));
 		rv = 0;

From: Kenneth R Westerback
Date: Monday, December 20, 2010 - 8:02 am

If anyone can test some rebuilds with this or dlg@'s diff, at least
some RAID 1 rebuilds, that would be very appreciated confirmation
that this doesn't affect any internal logic of softraid.

.... Ken

From: Kenneth R Westerback
Date: Sunday, December 19, 2010 - 2:28 pm

You should NEVER use 'c' partition. Amoung other things, it is
always set by the kernel to encompass the whole disk, everytime the
disklabel is read. If you restore to a different sized disk interesting
things might happen. I thought we also always set it to UNUSED, but
if not I will revisit that.


From: Andreas Bartelt
Date: Monday, December 20, 2010 - 3:50 am

On 12/19/10 22:28, Kenneth R Westerback wrote:

I've used the native disk sd4c as RAID only for testing, since this 
improved my understanding of the problem.

Is there also a problem when newfs is used on the virtual softraid(4) 
disk 'c' partition (rsd5c in my example)?

I originally discovered the problem when I tried newfs(8) on rsd5c where 
the native disk was sd4a. As I wrote in my initial mail, this results in 
a kernel panic. The kernel panics in sr_crypto_finish_io(), and before 
this, bounds_check_with_label (in kern/subr_disk.c) (which gets called 
in sd.c:572 for the softraid(4) disk sd5) returns '-1'. I didn't further 
inspect if this panic could/should be avoided like in the test where the 
native RAID disk was sd4c.

From: Kenneth R Westerback
Date: Monday, December 20, 2010 - 6:09 am

The same issue ('c' being set by kernel) will apply to softraid
devices too. You seem to have discovered a bug, so your test process
was good from that standpoint. :-). But you shouldn't use 'c' for
a filesystem or RAID.

.... Ken

From: Kenneth R Westerback
Date: Monday, December 20, 2010 - 10:26 am

Can you provide the details on the exact panic you saw? Just to make sure
when an issue is reproduced we know we are working on the same problem.

.... Ken

From: Andreas Bartelt
Date: Monday, December 20, 2010 - 10:59 am

On 12/20/10 18:26, Kenneth R Westerback wrote:

Debugger
panic
sr_crypto_finish_io
sr_crypto_intr
sdstrategy
spec_strategy
VOP_STRATEGY
sr_startwu_callback

sd.c:572 calls bounds_check_with_label (in kern/subr_disk.c) which 
returns '-1' in line 698. My assumption was that this eventually leads 
to the panic.

From: Marco Peereboom
Date: Monday, December 20, 2010 - 12:18 pm

I am working on this bug and it is a side effect of wrapping VOP_*
calls.  krw meanwhile commited a fix for the bug that exposed the panic
and should make your test case happy.  The underlying bug needs more
work though but I am working on it.


Previous thread: VISION WITHOUT GLASSES by Dr. William H. Bates on Saturday, December 18, 2010 - 4:32 pm. (1 message)

Next thread: EkonomiTürk Haber Ajansı by EkonomiTürk Haber Ajansı on Sunday, December 19, 2010 - 7:04 am. (1 message)