RE: 2.6.23-rc9 boot failure (megaraid?)

To: <linux-kernel@...>
Date: Tuesday, October 2, 2007 - 12:48 pm

2.6.23-rc9 fails to boot for me; 2.6.22.9 works fine.

System is a Dell PowerEdge with a PERC 2/DC controller and a RAID1 volume.

From 2.6.23-rc9:

Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller at PCI slot 0000:00:07.1
eth1: Optical link UP (Full Duplex, Flow Control: )
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
hdc: SAMSUNG SC-140B, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hdc: ATAPI 40X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide
ACPI: PCI Interrupt 0000:00:0d.1[A] -> GSI 17 (level, low) -> IRQ 18
megaraid: found 0x8086:0x1960:bus 0:slot 13:func 1
scsi0:Found MegaRAID controller at 0xf8812000, IRQ:18
megaraid: [1.06:1p00] detected 1 logical drives.
megaraid: channel[0] is raid.
megaraid: channel[1] is raid.
scsi0 : LSI Logic MegaRAID 1.06 254 commands 16 targs 5 chans 7 luns
scsi0: scanning scsi channel 0 for logical drives.
scsi 0:0:0:0: Direct-Access MegaRAID LD0 RAID1 8568R 1.06 PQ: 0 ANSI: 2
scsi0: scanning scsi channel 4 [P0] for physical devices.
scsi0: scanning scsi channel 5 [P1] for physical devices.
st: Version 20070203, fixed bufsize 32768, s/g segs 256
sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
sd 0:0:0:0: [sda] 1 512-byte hardware sectors (0 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Asking for cache data failed
sd 0:0:0:0: [sda] Assuming drive cache: write through
sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
sd 0:0:0:0: [sda] 1 512-byte hardware sectors (0 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Asking for cache data failed
sd 0:0:0:0: [sda] Assuming drive cache: write through
sda: sda1
sda: p1 exceeds device capacity
sd 0:0:0:0: [sda] Attached SCSI disk
PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,...

To: Burton Windle <bwindle@...>
Cc: <linux-kernel@...>, Jens Axboe <jens.axboe@...>, FUJITA Tomonori <fujita.tomonori@...>, Sumant Patro <sumant.patro@...>, James Bottomley <James.Bottomley@...>, <megaraidlinux@...>, <linux-scsi@...>
Date: Tuesday, October 2, 2007 - 2:15 pm

Cc's added; the complete bug report is at
http://lkml.org/lkml/2007/10/2/243

Thanks for your report.

Diffing the two dmesg outputs shows:

<-- snip -->

scsi0: scanning scsi channel 4 [P0] for physical devices.
scsi0: scanning scsi channel 5 [P1] for physical devices.
st: Version 20070203, fixed bufsize 32768, s/g segs 256
-sd 0:0:0:0: [sda] 17547264 512-byte hardware sectors (8984 MB)
+sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
+sd 0:0:0:0: [sda] 1 512-byte hardware sectors (0 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Asking for cache data failed
sd 0:0:0:0: [sda] Assuming drive cache: write through
-sd 0:0:0:0: [sda] 17547264 512-byte hardware sectors (8984 MB)
+sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
+sd 0:0:0:0: [sda] 1 512-byte hardware sectors (0 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Asking for cache data failed
sd 0:0:0:0: [sda] Assuming drive cache: write through
sda: sda1
+ sda: p1 exceeds device capacity

<-- snip -->

Does reverting the commit below fix the problem?

cu
Adrian

commit 3f6270ef76f2ce5c134615a470685d6c2a66c07e
Author: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Mon May 14 20:17:27 2007 +0900

[SCSI] megaraid_old: convert to use the data buffer accessors

- remove the unnecessary map_single path.

- convert to use the new accessors for the sg lists and the
parameters.

Jens Axboe <jens.axboe@oracle.com> did the for_each_sg cleanup.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Sumant Patro <sumant.patro@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>

diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
index 40ee07d..3907f67 100644
--- a/drivers/scsi/megaraid.c
+++ b/drivers/scsi/megaraid.c
@@ -523,10 +523,8 @@ mega_build_cmd(adapter_t *adapter, Scsi_Cmnd *cmd, int *busy)
/*
* filter...

To: Adrian Bunk <bunk@...>
Cc: Burton Windle <bwindle@...>, <linux-kernel@...>, Jens Axboe <jens.axboe@...>, FUJITA Tomonori <fujita.tomonori@...>, Sumant Patro <sumant.patro@...>, <megaraidlinux@...>, <linux-scsi@...>
Date: Tuesday, October 2, 2007 - 4:38 pm

This is the problem piece I think. We've reintroduced a very old bug:

commit 51c928c34fa7cff38df584ad01de988805877dba
Author: James Bottomley <James.Bottomley@SteelEye.com>
Date: Sat Oct 1 09:38:05 2005 -0500

[SCSI] Legacy MegaRAID: Fix READ CAPACITY

Some Legacy megaraid cards can't actually cope with the scatter/gather
version of the READ CAPACITY command (which is what we now send them
since altering all SCSI internal I/O to go via the block layer). Fix
this (and a few other broken megaraid driver assumptions) by sending
the non-sg version of the command if the sg list only has a single
element.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>

So what we have to do is put back the check for use_sg == 1 and send
that as a bulk transfer command.

James

-
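
For context on the two sd lines in the failing boot log: a READ CAPACITY(10) reply is eight bytes, a big-endian last LBA followed by a big-endian block length. The sketch below is a stand-alone illustration, not code from sd.c or megaraid.c; the names `struct capacity`, `decode_read_capacity10` and `capacity_mb` are hypothetical. It assumes, as the log suggests but the thread does not state outright, that the reply buffer reached the sd layer still zeroed, and it mirrors the logged fallback of assuming 512 when a sector size of 0 is reported.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the decoded reply; not a kernel type. */
struct capacity {
	uint64_t sectors;      /* last LBA + 1 */
	uint32_t sector_size;  /* bytes per sector, after the fallback */
};

/* Read a 32-bit big-endian value, the byte order SCSI reply fields use. */
static uint32_t be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode an 8-byte READ CAPACITY(10) reply: last LBA, then block length. */
static struct capacity decode_read_capacity10(const uint8_t reply[8])
{
	struct capacity c;

	assert(reply);
	c.sectors = (uint64_t)be32(reply) + 1;
	c.sector_size = be32(reply + 4);
	if (c.sector_size == 0)
		c.sector_size = 512;   /* "Sector size 0 reported, assuming 512." */
	return c;
}

/* Capacity in decimal megabytes (10^6 bytes), rounded down. */
static uint64_t capacity_mb(struct capacity c)
{
	return c.sectors * c.sector_size / 1000000ULL;
}
```

Fed eight zero bytes, this yields 1 sector of 512 bytes and 0 MB, matching the 2.6.23-rc9 log. Fed a last LBA of 17547263 and a block length of 512, it yields 17547264 sectors and 8984 MB, matching the log from the working kernel.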

To: <James.Bottomley@...>
Cc: <bunk@...>, <bwindle@...>, <linux-kernel@...>, <jens.axboe@...>, <fujita.tomonori@...>, <sumant.patro@...>, <megaraidlinux@...>, <linux-scsi@...>
Date: Tuesday, October 2, 2007 - 8:00 pm

On Tue, 02 Oct 2007 15:38:13 -0500

Sorry about this. Does this fix the problem?

Thanks,

diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
index 3907f67..da56163 100644
--- a/drivers/scsi/megaraid.c
+++ b/drivers/scsi/megaraid.c
@@ -1753,6 +1753,14 @@ mega_build_sglist(adapter_t *adapter, scb_t *scb, u32 *buf, u32 *len)

 	*len = 0;

+	if (scsi_sg_count(cmd) == 1 && !adapter->has_64bit_addr) {
+		sg = scsi_sglist(cmd);
+		scb->dma_h_bulkdata = sg_dma_address(sg);
+		*buf = (u32)scb->dma_h_bulkdata;
+		*len = sg_dma_len(sg);
+		return 0;
+	}
+
 	scsi_for_each_sg(cmd, sg, sgcnt, idx) {
 		if (adapter->has_64bit_addr) {
 			scb->sgl64[idx].address = sg_dma_address(sg);
-
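
The single-element shortcut in the hunk above can be modelled outside the kernel. What follows is a simplified user-space sketch, not the driver's code: `struct segment`, `struct sg_entry` and `build_transfer` are hypothetical stand-ins for the mapped scatterlist, the controller's 32-bit sg table and `mega_build_sglist`. The 64-bit table layout is left out, so `has_64bit_addr` here only disables the shortcut.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist element. */
struct segment { uint32_t dma_addr; uint32_t dma_len; };

/* Hypothetical stand-in for one entry of the controller's 32-bit sg table. */
struct sg_entry { uint32_t address; uint32_t length; };

/*
 * Fill in the address/length pair handed to the controller and return the
 * number of sg table entries used; 0 means "bulk transfer, no sg table".
 */
static int build_transfer(const struct segment *segs, int nsegs,
			  int has_64bit_addr,
			  struct sg_entry *table, uint32_t table_dma_addr,
			  uint32_t *buf, uint32_t *len)
{
	int i;

	assert(nsegs >= 1);
	*len = 0;

	/* One segment on a 32-bit-only adapter: point straight at the data. */
	if (nsegs == 1 && !has_64bit_addr) {
		*buf = segs[0].dma_addr;
		*len = segs[0].dma_len;
		return 0;
	}

	/* Otherwise copy every segment into the table and total the length. */
	for (i = 0; i < nsegs; i++) {
		table[i].address = segs[i].dma_addr;
		table[i].length = segs[i].dma_len;
		*len += segs[i].dma_len;
	}
	*buf = table_dma_addr;
	return nsegs;
}
```

A one-segment request on a 32-bit-only adapter returns 0 with `*buf` pointing straight at the data, which is the non-sg form the thread says legacy cards need for READ CAPACITY. Every other request goes through the table.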

To: FUJITA Tomonori <fujita.tomonori@...>, <James.Bottomley@...>
Cc: <bunk@...>, <bwindle@...>, <linux-kernel@...>, <jens.axboe@...>, DL-MegaRAID Linux <megaraidlinux@...>, <linux-scsi@...>
Date: Wednesday, October 3, 2007 - 7:32 pm

With this patch I see the correct logical disk size reported.
Thanks.

Sumant
-

To: <Sumant.Patro@...>
Cc: <fujita.tomonori@...>, <James.Bottomley@...>, <bunk@...>, <bwindle@...>, <linux-kernel@...>, <jens.axboe@...>, <megaraidlinux@...>, <linux-scsi@...>
Date: Wednesday, October 3, 2007 - 7:46 pm

On Wed, 3 Oct 2007 17:32:55 -0600

Great, thanks for testing!

Can you try the following patch instead of the above patch?

http://marc.info/?l=linux-scsi&m=119137033016550&w=2

I know the changes are pretty trivial and it should work...
-

To: FUJITA Tomonori <fujita.tomonori@...>
Cc: <Sumant.Patro@...>, <James.Bottomley@...>, <bunk@...>, <bwindle@...>, <linux-kernel@...>, <megaraidlinux@...>, <linux-scsi@...>
Date: Thursday, October 4, 2007 - 3:28 am

Tomo, this is the patch I added.

--
Jens Axboe

-

To: Jens Axboe <jens.axboe@...>
Cc: FUJITA Tomonori <fujita.tomonori@...>, <Sumant.Patro@...>, <James.Bottomley@...>, <bwindle@...>, <linux-kernel@...>, <megaraidlinux@...>, <linux-scsi@...>
Date: Thursday, October 4, 2007 - 6:48 am

Please excuse my comment in case this was already clear:

You are aware that this bug is a regression in 2.6.23-rc and the patch

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: <bunk@...>
Cc: <jens.axboe@...>, <fujita.tomonori@...>, <Sumant.Patro@...>, <James.Bottomley@...>, <bwindle@...>, <linux-kernel@...>, <megaraidlinux@...>, <linux-scsi@...>
Date: Thursday, October 4, 2007 - 6:55 am

On Thu, 4 Oct 2007 12:48:58 +0200

Oops, you are right. This should go via scsi-rc-fixes tree ASAP.
-

To: FUJITA Tomonori <fujita.tomonori@...>
Cc: <bunk@...>, <Sumant.Patro@...>, <James.Bottomley@...>, <bwindle@...>, <linux-kernel@...>, <megaraidlinux@...>, <linux-scsi@...>
Date: Thursday, October 4, 2007 - 7:00 am

Irk, the scsi accessor stuff is already in, I forgot and thought it was
pending for 2.6.24. So rush the patch upstream please!

--
Jens Axboe

-

To: <jens.axboe@...>
Cc: <fujita.tomonori@...>, <Sumant.Patro@...>, <James.Bottomley@...>, <bunk@...>, <bwindle@...>, <linux-kernel@...>, <megaraidlinux@...>, <linux-scsi@...>
Date: Thursday, October 4, 2007 - 6:20 am

On Thu, 4 Oct 2007 09:28:34 +0200

Thanks. I thought it would be sent via scsi-misc because the scsi
accessor patch introduced this bug. But either is ok with me.

BTW, please add my sign-off.

-
[SCSI] megaraid_old: fix scatter/gather for legacy megaraid cards

Some legacy megaraid cards (!has_64bit_addr case) can't cope with the
scatter/gather version of the READ CAPACITY command. We need to send
the non-sg version of the command if the sg list only has a single
element.

commit 3f6270ef76f2ce5c134615a470685d6c2a66c07e reintroduced this bug,
which was fixed long ago (commit 51c928c34fa7cff38df584ad01de988805877dba).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
-

To: FUJITA Tomonori <fujita.tomonori@...>
Cc: <Sumant.Patro@...>, <James.Bottomley@...>, <bunk@...>, <bwindle@...>, <linux-kernel@...>, <megaraidlinux@...>, <linux-scsi@...>
Date: Thursday, October 4, 2007 - 6:36 am

If it only affects the driver _after_ the scsi accessor patch and as
such doesn't screw over git-block, then I'll drop it for sure.

--
Jens Axboe

-

To: Jens Axboe <jens.axboe@...>
Cc: FUJITA Tomonori <fujita.tomonori@...>, <Sumant.Patro@...>, <bunk@...>, <bwindle@...>, <linux-kernel@...>, <megaraidlinux@...>, <linux-scsi@...>
Date: Thursday, October 4, 2007 - 8:50 am

No, this is a release critical fix ... I'll roll it up and send it in
for 2.6.23.

James

-

To: <James.Bottomley@...>
Cc: <bunk@...>, <bwindle@...>, <linux-kernel@...>, <jens.axboe@...>, <fujita.tomonori@...>, <sumant.patro@...>, <megaraidlinux@...>, <linux-scsi@...>
Date: Tuesday, October 2, 2007 - 8:09 pm

On Tue, 02 Oct 2007 15:38:13 -0500

Sorry again. The patch needs to check the sg count before DMA mapping.

diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
index 3907f67..ae0b220 100644
--- a/drivers/scsi/megaraid.c
+++ b/drivers/scsi/megaraid.c
@@ -1737,9 +1737,12 @@ mega_build_sglist(adapter_t *adapter, scb_t *scb, u32 *buf, u32 *len)
 	Scsi_Cmnd *cmd;
 	int sgcnt;
 	int idx;
+	int bulkdata;

 	cmd = scb->cmd;

+	bulkdata = (scsi_sg_count(cmd) == 1) ? 1 : 0;
+
 	/*
 	 * Copy Scatter-Gather list info into controller structure.
 	 *
@@ -1753,6 +1756,14 @@ mega_build_sglist(adapter_t *adapter, scb_t *scb, u32 *buf, u32 *len)

 	*len = 0;

+	if (bulkdata && !adapter->has_64bit_addr) {
+		sg = scsi_sglist(cmd);
+		scb->dma_h_bulkdata = sg_dma_address(sg);
+		*buf = (u32)scb->dma_h_bulkdata;
+		*len = sg_dma_len(sg);
+		return 0;
+	}
+
 	scsi_for_each_sg(cmd, sg, sgcnt, idx) {
 		if (adapter->has_64bit_addr) {
 			scb->sgl64[idx].address = sg_dma_address(sg);
-

To: Adrian Bunk <bunk@...>
Cc: <linux-kernel@...>, Jens Axboe <jens.axboe@...>, FUJITA Tomonori <fujita.tomonori@...>, Sumant Patro <sumant.patro@...>, James Bottomley <James.Bottomley@...>, <megaraidlinux@...>, <linux-scsi@...>
Date: Tuesday, October 2, 2007 - 2:46 pm

Confirmed; reverting the above (snipped) patch does fix the issue.

--
Burton Windle bwindle@fint.org

-

To: Burton Windle <bwindle@...>
Cc: Adrian Bunk <bunk@...>, <linux-kernel@...>, Jens Axboe <jens.axboe@...>, FUJITA Tomonori <fujita.tomonori@...>, Sumant Patro <sumant.patro@...>, James Bottomley <James.Bottomley@...>, <megaraidlinux@...>, <linux-scsi@...>
Date: Tuesday, October 2, 2007 - 3:55 pm

I've created a bugzilla entry for your report at:

http://bugzilla.kernel.org/show_bug.cgi?id=9113

Please add a summary of your observations in there.

Greetings,
Rafael
-
