Re: INITIO scsi driver fails to work properly

From: Filippos Papadopoulos
Date: Wednesday, December 19, 2007 - 1:48 am

I tried patch[2] (the addition of sg++) on 2.6.24-rc5-mm1, but the
system hangs a few seconds after the initio driver loads.
I will try patch[1] next week to see what happens.

--

From: Boaz Harrosh
Date: Wednesday, December 19, 2007 - 3:08 am

Please pull from the scsi-rc-fixes git tree first. It has a couple
of other fixes for initio, plus patch[2] included.
(Maybe it's already in the -mm tree; I'm not sure.)
I would prefer the linux-scsi ml.

<snip>

Boaz
--

From: Matthew Wilcox
Date: Wednesday, December 19, 2007 - 6:29 am

No, it wouldn't.  Bugzilla is a place where bug reports go to be
ignored.  Witness 9370 where despite my best efforts to move discussion
to the mailing list, it's been thoroughly ignored because the original
reporter insists on posting additional information there instead of to
the mailing list.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: James Bottomley
Date: Wednesday, December 19, 2007 - 9:50 am

That's a bit harsh on bugzilla.  It is of use to people whose job it is
to track outstanding bugs.

However, Matthew is completely correct: it's useless for getting bugs
fixed *if* the information isn't on the mailing list.  The reason for
using the mailing list is the more-eyes principle:  if you email
linux-scsi, all the SCSI experts will see it, not just the one email
listed as owner in bugzilla.  Likewise, as the bug goes through
analysis, if it turns out to be in a different area, that area's
mailing list can be added to the Cc list.

So, to get the best of both worlds, file a bugzilla and note the bugid.
Then email a complete report to the relevant list, but add [BUG <bugid>]
to the subject line and cc bugme-daemon@bugzilla.kernel.org.  If you do
this, bugzilla will keep track of the entire discussion as it progresses
and allow those who track bugs through bugzilla to get a pretty accurate
idea of the status.  You should never need to touch bugzilla again once
the initial bug report is filed: all future information flow is via the
mailing lists.
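
As a small illustration of the subject-line convention described above, the
following is a minimal sketch only. The helper name format_bug_subject() and
its return convention are hypothetical; the "[BUG <bugid>]" tag and the cc
address are the ones given in this message.

```c
#include <assert.h>
#include <stdio.h>

/* Address to cc so that bugzilla can follow the list discussion. */
#define BUGME_CC "bugme-daemon@bugzilla.kernel.org"

/*
 * Hypothetical helper: prefix a summary with the "[BUG <bugid>]" tag.
 * Returns the subject length on success, or -1 if buf is too small.
 */
static int format_bug_subject(char *buf, size_t len, unsigned int bugid,
			      const char *summary)
{
	int n;

	assert(buf != NULL && summary != NULL);
	n = snprintf(buf, len, "[BUG %u] %s", bugid, summary);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output error or truncated subject */
	return n;
}
```

The resulting string goes in the Subject header of the report sent to the
relevant list, with BUGME_CC added to the Cc header.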

James


--

From: Matthew Wilcox
Date: Wednesday, December 19, 2007 - 10:05 am

The problem is that it appears to the casual observer as if they can
then add information to the bug through the web interface.  But that
information will never be forwarded to the mailing list.  Unless there's
a way of marking bugs as 'unchangeable through the web interface' or 'all
messages appended to this bug need to be forwarded', Bugzilla just
doesn't fit our needs.

The Debian BTS fits our way of working much better.  Perhaps somebody
should investigate a migration.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Natalie Protasevich
Date: Thursday, December 20, 2007 - 2:32 am

This is an excellent observation by Matthew and James. There is no magic
in bugzilla not being loved; it is just "not the right set of features
for effective work on a problem". It doesn't support collaboration
among multiple developers well.
This distaste is not universal, since some people don't have a problem
with bugzilla as is, maybe those who tend to work on problems
"alone"...
But making it a workable tool for everyone is definitely worth it.
Any other favorite bugzillas that are nice to work with and that have
--

From: James Bottomley
Date: Thursday, December 20, 2007 - 8:08 am

We have actually been trying for over two years to get bugzilla fixed so
that it suits our email and list publishing workflow for fixing bugs.  I
surmise that 90% of our problems with bugzilla could be solved if it
simply tipped a SCSI bug report onto the SCSI list when it was created
in such a way that all replies were gathered back into bugzilla.
Unfortunately, no-one who maintains our bugzilla has actually been able
to make this happen.  The other 10% of the problem is that bugzilla
doesn't seem to have a way properly to integrate people who insist on
using its web interface to reply into the email flow.

James


--

From: Chuck Ebbert
Date: Friday, December 21, 2007 - 12:30 pm

There is also a Fedora bug report against 2.6.23. The user has
applied commit e9e42faf47255274a1ed0b9bf1c46118023ec5fa from
2.6.24-rc plus the two additional fixes under discussion and it
hangs for him too.

https://bugzilla.redhat.com/show_bug.cgi?id=390531
--

From: James Bottomley
Date: Friday, December 21, 2007 - 2:03 pm

It really sounds like there's some problem applying the patches.  The
consistent report throughout is this one:

initio: I/O port range 0x0 is busy.

Which should be fixed by 99f1f534922a2f2251ba05b14657a1c62882a80e.  I
didn't actually find that in the bug thread anywhere, but maybe I missed
it?


--

From: Chuck Ebbert
Date: Friday, December 21, 2007 - 3:43 pm

The "I/O port 0" bug just prints the message and the system continues
to run. It's only after that is fixed that the system just hangs on
boot shortly after loading the driver.
--

From: James Bottomley
Date: Friday, December 21, 2007 - 3:49 pm

That shouldn't happen unless the PCI BAR is genuinely misconfigured; it's
saying we got zero when we requested the starting address of BAR0.  What
does lspci -vv show for this device?

James


--

From: James Bottomley
Date: Thursday, January 10, 2008 - 10:16 pm

This proves the BAR0 to be non zero, but I also take it from your report
that the

initio: I/O port range 0x0 is busy.


I think there's still one remaining bug from the sg_list conversion,
namely that cblk->sglen is never set, but it is used to count the number
of elements in the sg array.  Could you try this patch (on top of
everything else) and see if the problem is finally fixed?

Thanks,

James

---
diff --git a/drivers/scsi/initio.c b/drivers/scsi/initio.c
index 01bf018..d038459 100644
--- a/drivers/scsi/initio.c
+++ b/drivers/scsi/initio.c
@@ -2603,6 +2603,7 @@ static void initio_build_scb(struct initio_host * host, struct scsi_ctrl_blk * c
 	nseg = scsi_dma_map(cmnd);
 	BUG_ON(nseg < 0);
 	if (nseg) {
+		cblk->sglen = nseg;
 		dma_addr = dma_map_single(&host->pci_dev->dev, &cblk->sglist[0],
 					  sizeof(struct sg_entry) * TOTAL_SG_ENTRY,
 					  DMA_BIDIRECTIONAL);
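
To illustrate the class of bug this patch addresses, here is a user-space
sketch, not driver code. The names sglen, sglist, struct sg_entry and
TOTAL_SG_ENTRY come from the diff; the field layout, the value 32, and the
helpers build_sglist() and sg_total_len() are assumptions made for the
example. It models a producer that fills the list and a consumer that trusts
sglen to know how many entries are valid.

```c
#include <assert.h>
#include <stddef.h>

#define TOTAL_SG_ENTRY 32	/* assumed value for this sketch */

struct sg_entry {
	unsigned int data;	/* segment address (unused here) */
	unsigned int len;	/* segment length in bytes */
};

struct scsi_ctrl_blk {
	unsigned char sglen;	/* how many sglist entries are valid */
	struct sg_entry sglist[TOTAL_SG_ENTRY];
};

/* Producer: fill nseg entries; set_sglen toggles the one-line fix. */
static void build_sglist(struct scsi_ctrl_blk *cblk, const unsigned int *lens,
			 int nseg, int set_sglen)
{
	assert(nseg >= 0 && nseg <= TOTAL_SG_ENTRY);
	if (set_sglen)
		cblk->sglen = (unsigned char)nseg;
	for (int i = 0; i < nseg; i++) {
		cblk->sglist[i].data = 0;
		cblk->sglist[i].len = lens[i];
	}
}

/* Consumer: walks only the first sglen entries. */
static unsigned int sg_total_len(const struct scsi_ctrl_blk *cblk)
{
	unsigned int total = 0;

	for (size_t i = 0; i < cblk->sglen; i++)
		total += cblk->sglist[i].len;
	return total;
}
```

With a zero-initialised block and set_sglen == 0, the consumer sees no
entries even though three were written; with set_sglen == 1 it sees all of
them. In a real driver a stale count could equally be a leftover non-zero
value, which is no better.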


--

From: James Bottomley
Date: Friday, January 11, 2008 - 8:44 am

Sorry ... we appear to have several reporters of different bugs in this
thread.  That message was copied by Chuck Ebbert from a Red Hat

First off, has this driver ever worked for you in 2.6?  Just booting
SLES9 (2.6.5) or RHEL4 (2.6.9) ... or one of their open equivalents to
check a really old kernel would be helpful.  If you can get it to work,
then we can proceed with a patch reversion regime based on the
assumption that the problem is a recent commit.

Thanks,

James


--

From: Filippos Papadopoulos
Date: Friday, January 11, 2008 - 9:44 am

On Jan 11, 2008 5:44 PM, James Bottomley

Yes it works under 2.6.16.13.  See the beginning of this thread, i
--

From: James Bottomley
Date: Friday, January 11, 2008 - 10:01 am

Could you try with a vanilla 2.6.22 kernel?  The reason for all of this
is that 2.6.22 predates Alan's conversion of this driver (which was my
95% candidate for the source of the bug).  I want you to try the vanilla
kernel just in case the opensuse one contains a backport.

Thanks,

James


--

From: Filippos Papadopoulos
Date: Sunday, January 13, 2008 - 5:28 am

Yes, you are right. I compiled vanilla 2.6.22 and the initio driver works.
--

From: FUJITA Tomonori
Date: Tuesday, January 15, 2008 - 10:59 pm

On Tue, 15 Jan 2008 09:16:06 -0600

Can you try this patch?

Thanks,

diff --git a/drivers/scsi/initio.c b/drivers/scsi/initio.c
index 01bf018..6891d2b 100644
--- a/drivers/scsi/initio.c
+++ b/drivers/scsi/initio.c
@@ -2609,6 +2609,7 @@ static void initio_build_scb(struct initio_host * host, struct scsi_ctrl_blk * c
 		cblk->bufptr = cpu_to_le32((u32)dma_addr);
 		cmnd->SCp.dma_handle = dma_addr;
 
+		cblk->sglen = nseg;
 
 		cblk->flags |= SCF_SG;	/* Turn on SG list flag       */
 		total_len = 0;
--

From: James Bottomley
Date: Wednesday, January 16, 2008 - 7:57 am

We already tried a variant of this here:

http://marc.info/?l=linux-scsi&m=120002863806103&w=2

The answer was negative.  Although I've saved the patch because it's
clearly one of the bugs.

James


--

From: Alan Cox
Date: Monday, January 21, 2008 - 3:20 pm

OK, my attempt to get the card failed, so we are going to have to do this
the hard way. See where this patch crashes and what it prints.

(On top of the other patches)

diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.24-rc8-mm1/drivers/scsi/initio.c linux-2.6.24-rc8-mm1/drivers/scsi/initio.c
--- linux.vanilla-2.6.24-rc8-mm1/drivers/scsi/initio.c	2008-01-19 14:22:43.000000000 +0000
+++ linux-2.6.24-rc8-mm1/drivers/scsi/initio.c	2008-01-21 14:54:48.000000000 +0000
@@ -2537,10 +2537,12 @@
 	struct Scsi_Host *dev = dev_id;
 	unsigned long flags;
 	int r;
-	
+
+	printk("ISR\n");	
 	spin_lock_irqsave(dev->host_lock, flags);
 	r = initio_isr((struct initio_host *)dev->hostdata);
 	spin_unlock_irqrestore(dev->host_lock, flags);
+	printk("ISR DONE %d\n", r);
 	if (r)
 		return IRQ_HANDLED;
 	else
@@ -2643,6 +2645,7 @@
 	struct initio_host *host = (struct initio_host *) cmd->device->host->hostdata;
 	struct scsi_ctrl_blk *cmnd;
 
+	printk("SCB QUEUE\n");
 	cmd->scsi_done = done;
 
 	cmnd = initio_alloc_scb(host);
@@ -2650,7 +2653,9 @@
 		return SCSI_MLQUEUE_HOST_BUSY;
 
 	initio_build_scb(host, cmnd, cmd);
+	printk("SCB EXEC\n");
 	initio_exec_scb(host, cmnd);
+	printk("SCB EXEC DONE\n");
 	return 0;
 }
 
@@ -2766,6 +2771,8 @@
 	struct scsi_cmnd *cmnd;	/* Pointer to SCSI request block */
 	struct initio_host *host;
 	struct scsi_ctrl_blk *cblk;
+	
+	printk("SCB POST\n");
 
 	host = (struct initio_host *) host_mem;
 	cblk = (struct scsi_ctrl_blk *) cblk_mem;
@@ -2934,9 +2941,11 @@
 
 	pci_set_drvdata(pdev, shost);
 
+	printk("SAH\n");
 	error = scsi_add_host(shost, &pdev->dev);
 	if (error)
 		goto out_free_irq;
+	printk("SSH\n");
 	scsi_scan_host(shost);
 	return 0;
 out_free_irq:
--

From: Filippos Papadopoulos
Date: Tuesday, January 22, 2008 - 10:50 am

I get the following:
SAH
SSH
SCB Q
SCB EXEC
SCB EXEC DONE

After ~3 secs the system freezes.
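
One way to read this trace is to check which of the debug patch's markers
never appear. The sketch below is only an illustration: marker_seen() is a
hypothetical helper, and the trace array is the five lines exactly as
posted above (including "SCB Q" as written in the report).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `marker` appears as a whole line in the captured trace. */
static int marker_seen(const char *const *lines, size_t n, const char *marker)
{
	assert(lines != NULL && marker != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(lines[i], marker) == 0)
			return 1;
	return 0;
}

/* The five lines reported before the freeze, as posted. */
static const char *const reported_trace[] = {
	"SAH", "SSH", "SCB Q", "SCB EXEC", "SCB EXEC DONE",
};
#define REPORTED_LINES (sizeof(reported_trace) / sizeof(reported_trace[0]))
```

Checking for "ISR" and "SCB POST" finds neither in the posted output, so in
this run the command was built and handed to the controller, but nothing
from the interrupt handler or the post routine was logged before the freeze.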




--

From: James Bottomley
Date: Friday, January 25, 2008 - 9:49 am

Actually, I suspect your issues should be fixed by this patch:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e2d435ea40...

Could you download 2.6.24 and try it out to see if they are?

Thanks,

James


--

From: Filippos Papadopoulos
Date: Friday, January 25, 2008 - 2:04 pm

Well, 2.6.24 fixes the problem.
Thanks to all of you!
--

From: Alan Cox
Date: Friday, January 11, 2008 - 10:01 am

It worked (ish.. it has problems and always has had) before the big
updates, and according to my tester after the big update + two patches
that escaped somewhere in the process. Unfortunately my tester no longer
has the card to dig further.

The 0x0 bug was fixed a while ago but seems to have sat in -mm for a bit.
Don't know about further stuff.
--

From: James Bottomley
Date: Friday, January 11, 2008 - 10:33 am

The statement that OpenSuse 10.3, based on 2.6.22.5, also fails
indicates there may be something else that predates your reorganisation
at the root of this (depending on whether the vendor kernel contains a
back port or not).  That's why I want to see what happens on this system
with a vanilla 2.6.22.

James


--

From: Chuck Ebbert
Date: Friday, January 11, 2008 - 10:52 am

Our reporter has applied patches since then and now reports the exact
same symptoms that Filippos does. (It just hangs after loading the driver.)
--
