Re: Kernel panic during boot in usb_add_task

Previous thread: Re: HAMMER update - 15 nov 2007 by Justin C. Sherrill on Thursday, November 15, 2007 - 9:19 pm. (1 message)

Next thread: Is there analog of `softint_establish and co.` in DFly kernel? by Dmitry Komissaroff on Monday, November 26, 2007 - 2:03 am. (1 message)
From: Michael Neumann
Date: Tuesday, November 20, 2007 - 1:52 am

Hi,

I tried DragonFly on my brand new HP Compaq 6710b laptop, but while booting
the installer CD it "throws" a page fault:

   uhub0: 2 ports ...
   uhub0: <Intel UHCI root hub, ...>

   Fatal trap 12: page fault while in kernel mode
   fault virtual address = 0x0
   fault code            = supervisor write, page not present
   instruction pointer   = 0x8:0xc04a9c5c
   stack pointer         = 0x10:0xc25f8d38
   frame pointer         = 0x10:0xc25f8d48
   code segment          = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, def32 1, gran 1
   processor eflags      = interrupt enabled, resume, IOPL = 0
   current process       = Idle
   current thread        = pri 46 (CRIT)

   kernel: type 12, code=2
   stopped at       usb_add_task+0x4c:    movl     %edi,0(%eax)


This happens with the latest snapshot version as of yesterday and also with the
1.10 release.

FreeBSD 7.0-BETA3 silently hangs during boot, while NetBSD 4.0RC_4 works like a 
charm (it can even dual-boot Windows natively)!

Any hints?

Regards,

   Michael
From: Bill Hacker
Date: Tuesday, November 20, 2007 - 7:27 am

For starters, what does NetBSD report as the hardware and can you capture *any* 
scan, dmesg, or other report from FreeBSD 7-BETA3 or DFLY?

And have you tried DFLY and/or FreeBSD with the various boot options selected - 
or can you not get far enough to do so?

Bill


From: Michael Neumann
Date: Wednesday, November 21, 2007 - 9:18 am

Dmesg of NetBSD-4.0_RC4 is appended.

I can start FreeBSD 7-BETA3 if I disable the LAN in the BIOS and boot without 
ACPI as explained here:

http://www.nabble.com/RELENG_7-and-HEAD:-bge-causes-system-hang-t4804426.html


With DragonFly I can get to the boot options, but they don't help much. I tried 
"unset acpi_load", but that gives the same result (panic). I thought it might be 
a SATA problem, but I disabled "native SATA mode" in the BIOS, so I guess DF now 
sees it as a (P)ATA controller.

Well, I just wanted to share this; maybe it will help others. For now I don't 
want to install DF anyway, because it lacks Bluetooth and wpi support (and 
probably X3100 graphics support).

Thanks!

Regards,

   Michael
From: Michael Neumann
Date: Wednesday, November 21, 2007 - 9:46 am

Forgot the attachment.

Regards,

   Michael
From: Michael Neumann
Date: Monday, November 26, 2007 - 1:54 pm

I tracked down where the panic occurs:

http://opengrok.creo.hu/dragonfly/xref/src/sys/bus/usb/usb.c#374

More specifically:

   http://opengrok.creo.hu/dragonfly/xref/src/sys/sys/queue.h#428

   *(head)->tqh_last = (elm);

This expands to:

   *(&taskq->tasks)->tqh_last = task;

Somehow a NULL pointer is dereferenced there.

usb_add_task is called from uhci_timeout:

   http://opengrok.creo.hu/dragonfly/xref/src/sys/bus/usb/uhci.c#1428

It only seems to get called when a timeout occurs. Maybe that's why I seem 
to be the only one having this problem :)

I couldn't track it down further. My pure guess is that it would not 
panic if "uhci_abort_xfer(&uxfer->xfer, USBD_TIMEOUT);" were called 
instead (the sc->sc_dying == 1 case), but I can't build a kernel right 
now, so I can't change the code and build an ISO image.

Any further ideas?

Regards,

   Michael
From: Bill Hacker
Date: Monday, November 26, 2007 - 2:24 pm

Looks to be decent detective work to me!

But I quit coding in 'C' the same year the 386-16 was launched,

;-)

...so we'll now need to wait for a developer to weigh in.

Meanwhile - it is not *supposed to* matter - but on any OS that you are able to 
boot with the devices all optioned ON, can you confirm use of shared interrupts 
or NOT?

Ex:

Tyan Tomcat - some sharing:

triligon# ps -xa | grep '\[irq'
    22  ??  WL     0:00.00 [irq9: acpi0]
    23  ??  WL     0:07.04 [irq16: bge0 uhci3]
    24  ??  WL     0:00.02 [irq17: bge1]
    25  ??  WL     0:00.00 [irq23: uhci0 ehci0]
    29  ??  WL     0:23.97 [irq19: uhci1+]
    31  ??  WL     0:00.00 [irq18: uhci2+]
    35  ??  WL     0:00.00 [irq14: ata0]
    36  ??  WL     0:00.00 [irq15: ata1]
    37  ??  WL     0:00.00 [irq1: atkbd0]
    38  ??  WL     0:00.00 [irq12: psm0]
    40  ??  WL     0:00.00 [irq7: ppc0]
  3401  p0  S+     0:00.00 grep \\[irq


HP-Compaq Proliant - nothing shared:

datareplica# ps -xa | grep '\[irq'
    19  ??  WL     0:00.00 [irq9: acpi0]
    20  ??  WL     0:07.15 [irq76: ciss0]
    22  ??  WL     0:01.68 [irq72: em0]
    23  ??  WL     3:58.75 [irq73: em1]
    24  ??  WL     0:11.54 [irq24: mpt0]
    26  ??  WL     0:00.00 [irq25: mpt1]
    28  ??  WL     0:04.47 [irq16: uhci0]
    31  ??  WL     0:00.00 [irq19: uhci1]
    33  ??  WL     0:00.00 [irq23: ehci0]
    35  ??  WL     2:15.89 [irq17: bge0]
    36  ??  WL     0:00.00 [irq14: ata0]
    37  ??  WL     0:00.00 [irq15: ata1]
    40  ??  WL     0:00.00 [irq1: atkbd0]
    41  ??  WL     0:00.00 [irq12: psm0]
    42  ??  WL     0:00.00 [irq7: ppc0]

Asus P5K (with the 'fast' PCI-e slot empty).
Poop hits the fan bigtime PCI-wise (one PCI-attached NIC, max) if that slot is used.

    25  ??  WL     0:00.00 [irq9: acpi0]
    26  ??  WL     0:00.00 [irq16: de0 uhci0+]
    30  ??  WL     0:00.00 [irq21: uhci1]
    32  ??  WL     0:00.00 [irq18: rl1 uhci2++]
    35  ??  WL     0:00.12 [irq17: rl0 atapci1]
    36  ??  WL     0:00.00 ...