I tried patch[2] (addition of sg++) at 2.6.24-rc5-mm1 but the system hangs after some seconds when the initio driver loads. I will try patch[1] next week to see what happens. --
Please pull from the scsi-rc-fixes git tree first. It has a couple of other fixes for initio plus patch[2] included (maybe it's already in the -mm tree, I'm not sure). I would prefer linux-scsi ml <snip> Boaz --
No, it wouldn't. Bugzilla is a place where bug reports go to be ignored. Witness 9370 where, despite my best efforts to move discussion to the mailing list, it's been thoroughly ignored because the original reporter insists on posting additional information there instead of to the mailing list. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
That's a bit harsh on bugzilla. It is of use to people whose job it is to track outstanding bugs. However, Matthew is completely correct, it's useless for getting bugs fixed *if* the information isn't on the mailing list. The reason for using the mailing list is the more eyes principle: if you email linux-scsi, all the SCSI experts will see it, not just the one email listed as owner in bugzilla. Likewise, as the bug goes through analysis, if it turns out to be in a different area, that area's mailing list can be added to the Cc list. So, to get the best of both worlds, file a bugzilla and note the bugid. Then email a complete report to the relevant list, but add [BUG <bugid>] to the subject line and cc bugme-daemon@bugzilla.kernel.org. If you do this, bugzilla will keep track of the entire discussion as it progresses and allow those who track bugs through bugzilla to get a pretty accurate idea of the status. You should never need to touch bugzilla again once the initial bug report is filed: all future information flow is via the mailing lists. James --
The problem is that it appears to the casual observer as if they can then add information to the bug through the web interface. But that information will never be forwarded to the mailing list. Unless there's a way of marking bugs as 'unchangeable through the web interface' or 'all messages appended to this bug need to be forwarded', Bugzilla just doesn't fit our needs. The Debian BTS fits our way of working much better. Perhaps somebody should investigate a migration. --
This is an excellent observation by Matthew and James. There is no magic in bugzilla not being loved, it is just "not the right set of features for effective work on a problem". It doesn't support multiple developers' collaboration well. This distaste is not universal, since some people don't have a problem with bugzilla as is, maybe those who tend to work on problems "alone"... But making it a workable tool for everyone is definitely worth it. Any other favorite bugzillas that are nice to work with and that have --
We have actually been trying for over two years to get bugzilla fixed so that it suits our email and list publishing workflow for fixing bugs. I surmise that 90% of our problems with bugzilla could be solved if it simply tipped a SCSI bug report onto the SCSI list when it was created in such a way that all replies were gathered back into bugzilla. Unfortunately, no-one who maintains our bugzilla has actually been able to make this happen. The other 10% of the problem is that bugzilla doesn't seem to have a way properly to integrate people who insist on using its web interface to reply into the email flow. James --
There is also a Fedora bug report against 2.6.23. The user has applied commit e9e42faf47255274a1ed0b9bf1c46118023ec5fa from 2.6.24-rc plus the two additional fixes under discussion and it hangs for him too. https://bugzilla.redhat.com/show_bug.cgi?id=390531 --
It really sounds like there's some problem applying the patches. The consistent report throughout is this one: initio: I/O port range 0x0 is busy. Which should be fixed by 99f1f534922a2f2251ba05b14657a1c62882a80e. I didn't actually find that in the bug thread anywhere, but maybe I missed it? --
The "I/O port 0" bug just prints the message and the system continues to run. It's only after that is fixed that the system just hangs on boot shortly after loading the driver. --
That shouldn't happen unless the PCI BAR is genuinely misconfigured; it's saying we got zero when we requested the starting address of BAR0. What does lspci -vv show for this device? James --
This proves the BAR0 to be non zero, but I also take it from your report
that the
initio: I/O port range 0x0 is busy.
I think there's still one remaining bug from the sg_list conversion,
namely that cblk->sglen is never set, but it is used to count the number
of elements in the sg array. Could you try this patch (on top of
everything else) and see if the problem is finally fixed?
Thanks,
James
---
diff --git a/drivers/scsi/initio.c b/drivers/scsi/initio.c
index 01bf018..d038459 100644
--- a/drivers/scsi/initio.c
+++ b/drivers/scsi/initio.c
@@ -2603,6 +2603,7 @@ static void initio_build_scb(struct initio_host * host, struct scsi_ctrl_blk * c
 	nseg = scsi_dma_map(cmnd);
 	BUG_ON(nseg < 0);
 	if (nseg) {
+		cblk->sglen = nseg;
 		dma_addr = dma_map_single(&host->pci_dev->dev, &cblk->sglist[0],
 					  sizeof(struct sg_entry) * TOTAL_SG_ENTRY,
 					  DMA_BIDIRECTIONAL);
--
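The effect of a never-set cblk->sglen can be illustrated with a small user-space sketch. It is not the driver code: struct ctrl_blk, build_sg(), total_transfer_len(), the set_sglen switch and the TOTAL_SG_ENTRY value are all hypothetical stand-ins made up for illustration. It only shows why a consumer that counts entries through sglen sees an empty list when the builder fills sglist but forgets the count.

```c
#include <stddef.h>
#include <stdint.h>

#define TOTAL_SG_ENTRY 32	/* hypothetical size, chosen for this sketch */

/* Simplified stand-ins for the driver structures (not the real layout). */
struct sg_entry {
	uint32_t data;	/* segment bus address */
	uint32_t len;	/* segment length in bytes */
};

struct ctrl_blk {
	uint8_t sglen;				/* number of valid entries in sglist */
	struct sg_entry sglist[TOTAL_SG_ENTRY];
};

/* A consumer that trusts sglen to know how many entries to walk. */
static uint32_t total_transfer_len(const struct ctrl_blk *cblk)
{
	uint32_t total = 0;

	for (size_t i = 0; i < cblk->sglen; i++)
		total += cblk->sglist[i].len;
	return total;
}

/*
 * Fill the list from nseg mapped segments. set_sglen toggles the
 * one-line fix from the patch above (hypothetical switch for the demo).
 */
static void build_sg(struct ctrl_blk *cblk, const uint32_t *lens,
		     int nseg, int set_sglen)
{
	if (set_sglen)
		cblk->sglen = (uint8_t)nseg;
	for (int i = 0; i < nseg; i++) {
		cblk->sglist[i].data = 0x1000u * (uint32_t)(i + 1);	/* dummy address */
		cblk->sglist[i].len = lens[i];
	}
}
```

With three segments of 4096, 4096 and 512 bytes on a zero-initialised block, total_transfer_len() returns 0 after build_sg(..., 3, 0) and 8704 after build_sg(..., 3, 1). Whether this is what the controller actually does with a zero count is not established in this thread.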
Sorry ... we appear to have several reporters of different bugs in this thread. That message was copied by Chuck Ebbert from a Red Hat bug report. First off, has this driver ever worked for you in 2.6? Just booting SLES9 (2.6.5) or RHEL4 (2.6.9) ... or one of their open equivalents to check a really old kernel would be helpful. If you can get it to work, then we can proceed with a patch reversion regime based on the assumption that the problem is a recent commit. Thanks, James --
On Jan 11, 2008 5:44 PM, James Bottomley Yes it works under 2.6.16.13. See the beginning of this thread, i --
Could you try with a vanilla 2.6.22 kernel? The reason for all of this is that 2.6.22 predates Alan's conversion of this driver (which was my 95% candidate for the source of the bug). I want you to try the vanilla kernel just in case the opensuse one contains a backport. Thanks, James --
Yes you are right. I compiled the vanilla 2.6.22 and initio driver works. --
On Tue, 15 Jan 2008 09:16:06 -0600
Can you try this patch?
Thanks,
diff --git a/drivers/scsi/initio.c b/drivers/scsi/initio.c
index 01bf018..6891d2b 100644
--- a/drivers/scsi/initio.c
+++ b/drivers/scsi/initio.c
@@ -2609,6 +2609,7 @@ static void initio_build_scb(struct initio_host * host, struct scsi_ctrl_blk * c
 		cblk->bufptr = cpu_to_le32((u32)dma_addr);
 		cmnd->SCp.dma_handle = dma_addr;
+		cblk->sglen = nseg;
 		cblk->flags |= SCF_SG;	/* Turn on SG list flag */
 		total_len = 0;
--
We already tried a variant of this here: http://marc.info/?l=linux-scsi&m=120002863806103&w=2 The answer was negative. Although I've saved the patch because it's clearly one of the bugs. James --
Ok my attempt to get the card failed so we are going to have to do this the hard way. See where this patch crashes and what it prints (On top of the other patches)
diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.24-rc8-mm1/drivers/scsi/initio.c linux-2.6.24-rc8-mm1/drivers/scsi/initio.c
--- linux.vanilla-2.6.24-rc8-mm1/drivers/scsi/initio.c	2008-01-19 14:22:43.000000000 +0000
+++ linux-2.6.24-rc8-mm1/drivers/scsi/initio.c	2008-01-21 14:54:48.000000000 +0000
@@ -2537,10 +2537,12 @@
 	struct Scsi_Host *dev = dev_id;
 	unsigned long flags;
 	int r;
-
+
+	printk("ISR\n");
 	spin_lock_irqsave(dev->host_lock, flags);
 	r = initio_isr((struct initio_host *)dev->hostdata);
 	spin_unlock_irqrestore(dev->host_lock, flags);
+	printk("ISR DONE %d\n", r);
 	if (r)
 		return IRQ_HANDLED;
 	else
@@ -2643,6 +2645,7 @@
 	struct initio_host *host = (struct initio_host *) cmd->device->host->hostdata;
 	struct scsi_ctrl_blk *cmnd;
+	printk("SCB QUEUE\n");
 	cmd->scsi_done = done;
 	cmnd = initio_alloc_scb(host);
@@ -2650,7 +2653,9 @@
 		return SCSI_MLQUEUE_HOST_BUSY;
 	initio_build_scb(host, cmnd, cmd);
+	printk("SCB EXEC\n");
 	initio_exec_scb(host, cmnd);
+	printk("SCB EXEC DONE\n");
 	return 0;
 }
@@ -2766,6 +2771,8 @@
 	struct scsi_cmnd *cmnd;	/* Pointer to SCSI request block */
 	struct initio_host *host;
 	struct scsi_ctrl_blk *cblk;
+
+	printk("SCB POST\n");
 	host = (struct initio_host *) host_mem;
 	cblk = (struct scsi_ctrl_blk *) cblk_mem;
@@ -2934,9 +2941,11 @@
 	pci_set_drvdata(pdev, shost);
+	printk("SAH\n");
 	error = scsi_add_host(shost, &pdev->dev);
 	if (error)
 		goto out_free_irq;
+	printk("SSH\n");
 	scsi_scan_host(shost);
 	return 0;
 out_free_irq:
--
I get the following:
SAH
SSH
SCB Q
SCB EXEC
SCB EXEC DONE
After ~3 secs the system freezes. --
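One way to read the marker output is to compare it against the order in which the debug patch would print markers for a command that completes, and look for the first one that never shows up. The sketch below does this in plain user-space C. The helper first_missing(), the two arrays, and in particular the assumed marker order (the interrupt markers following "SCB EXEC DONE", with "SCB POST" between "ISR" and "ISR DONE") are assumptions made for illustration and are not confirmed anywhere in this thread. The reported "SCB Q" is read here as shorthand for "SCB QUEUE".

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the first marker in expected[] that never appears in seen[],
 * or NULL if every expected marker was seen.
 */
static const char *first_missing(const char *const *expected, size_t n_expected,
				 const char *const *seen, size_t n_seen)
{
	for (size_t i = 0; i < n_expected; i++) {
		int found = 0;

		for (size_t j = 0; j < n_seen; j++) {
			if (strcmp(expected[i], seen[j]) == 0) {
				found = 1;
				break;
			}
		}
		if (!found)
			return expected[i];
	}
	return NULL;
}

/* Assumed order for one successfully completed command (hypothetical). */
static const char *const expected_markers[] = {
	"SAH", "SSH", "SCB QUEUE", "SCB EXEC", "SCB EXEC DONE",
	"ISR", "SCB POST", "ISR DONE",
};

/* Output as reported, reading "SCB Q" as "SCB QUEUE" (assumption). */
static const char *const reported_markers[] = {
	"SAH", "SSH", "SCB QUEUE", "SCB EXEC", "SCB EXEC DONE",
};
```

Under those assumptions, first_missing(expected_markers, 8, reported_markers, 5) yields "ISR": the command was queued and handed to the controller, but the interrupt handler marker was never printed before the freeze. That narrows where to look without by itself identifying the cause.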
Actually, I suspect your issues should be fixed by this patch: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e2d435ea40... Could you download 2.6.24 and try it out to see if they are? Thanks, James --
Well, 2.6.24 fixes the problem. Thanks to all of you! --
It worked (ish.. it has problems and always has had) before the big updates, and according to my tester after the big update + two patches that escaped somewhere in the process. Unfortunately my tester no longer has the card to dig further. The 0x0 bug was fixed a while ago but seems to have sat in -mm for a bit. Don't know about further stuff. --
The statement that OpenSuse 10.3, based on 2.6.22.5, also fails indicates there may be something else that predates your reorganisation at the root of this (depending on whether the vendor kernel contains a backport or not). That's why I want to see what happens on this system with a vanilla 2.6.22. James --
Our reporter has applied patches since then and now reports the exact same symptoms that Filippos does. (It just hangs after loading the driver.) --
