Re: Urgent problem - 7.x doesn't work on HP servers

Previous thread: fixing using 3rd party mount tools from fstab by Dominic Fandrey on Monday, February 18, 2008 - 4:42 am. (1 message)

Next thread: hptrr driver panics on 7.0-RC2 by Alex Trull on Monday, February 18, 2008 - 7:12 pm. (6 messages)
To: <freebsd-current@...>
Date: Monday, February 18, 2008 - 7:56 am

Hi,

I've again encountered the problem of FreeBSD 7 not wanting to boot on a
HP server. The last time was early in 7.x development on a HP blade
(2xdual-core Opteron), without any solution (reported on this list about
a year ago). This time it's on an ML 350 G5 machine, with a quad-core Xeon.

The problem is very hard to diagnose - the entire machine locks up
during pci bus/device detection - the kernel debugger doesn't work, the
keyboard lights (PS/2 keyboard) don't work, it's completely frozen.

This is on both i386 and AMD64 kernels.

The machine freezes after detecting pcib6. The working 6.x kernel
detects up to pcib16, and the first device detected after pcib6 is the
CISS controller, so maybe it's the controller driver, but the first
machine (the blade) didn't have CISS controllers.

Any ideas?

To: Ivan Voras <ivoras@...>
Cc: <freebsd-current@...>
Date: Monday, February 18, 2008 - 12:34 pm

FYI, I'm not seeing anything like this on the two DL 145 boxes I'm using for
10gbps testing with 7.x / 8.x. I did have problems with at least a couple of
the BIOS revs in the past, so I'd repeat the advice offered elsewhere in the
thread and make sure that it's up-to-date. There was one BIOS rev where I
couldn't use a boot loader cross-built from i386 to amd64, but both the i386
boot loader and the natively built amd64 boot loader worked fine. The BIOS
upgrade made the problem entirely go away, go figure...

Otherwise, you're probably down to the printf model for debugging, unless you
have an NMI button that can get into DDB? Mine have NMI buttons on the
motherboard, I believe, but it requires opening the case to get to them.

Robert N M Watson
Computer Laboratory
University of Cambridge
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

To: Robert Watson <rwatson@...>
Cc: <freebsd-current@...>, Ivan Voras <ivoras@...>
Date: Monday, February 18, 2008 - 3:43 pm

You can generate NMI from the iLO and iLO2 interface.

--
Regards, Ulf.

---------------------------------------------------------------------
Ulf Zimmermann, 1525 Pacific Ave., Alameda, CA-94501, #: 510-865-0204
You can find my resume at: http://www.Alameda.net/~ulf/resume.html

To: <freebsd-current@...>
Date: Monday, February 18, 2008 - 1:09 pm

Hi!

It is possible to invoke an NMI through the Integrated Lights-Out
management system that is built into the server.

Look under Diagnostics when you get into the iLO system.

Kind regards

Morten Strårup


To: <freebsd-current@...>
Date: Monday, February 18, 2008 - 2:55 pm

Thanks for the ideas, Robert and Morten - I'll try them. The reason I
thought it was something well known/common was that this is the second
HP system I tried 7.0 on (very different from each other) and both of
them failed in what looks like the same place :( It might just be bad luck.

I have the machine on my desk so any other hardware-related ideas are
also welcome.

To: Ivan Voras <ivoras@...>
Cc: <freebsd-current@...>
Date: Monday, February 18, 2008 - 8:04 am

Hi Ivan,

I don't have a ML 350 G5 machine at hand, but fwiw I do have a HP BL465c G1
blade. It's a 2 dual core AMD Opteron blade.
Right now it's running FreeBSD 7.0-BETA4. I'm running make world right now
to get a recent RELENG_7

Meanwhile:
[root@marian46-23] <~>uname -a
[12:01:36 on 08-02-18]
FreeBSD marian46-23 7.0-BETA4 FreeBSD 7.0-BETA4 #0: Tue Dec 18 12:07:27 CET 2007 root@marian46-23:/usr/obj/usr/src/sys/MOBILE amd64

dmesg:
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-BETA4 #0: Tue Dec 18 12:07:27 CET 2007
root@marian46-23:/usr/obj/usr/src/sys/MOBILE
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Dual-Core AMD Opteron(tm) Processor 2218 (2600.11-MHz K8-class CPU)
Origin = "AuthenticAMD" Id = 0x40f13 Stepping = 3
Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
Features2=0x2001<SSE3,CX16>
AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
Cores per package: 2
usable memory = 4280225792 (4081 MB)
avail memory = 4118867968 (3928 MB)
ACPI APIC Table: <HP 00000083>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
cpu2 (AP): APIC ID: 2
cpu3 (AP): APIC ID: 3
ioapic0 <Version 1.1> irqs 0-15 on motherboard
ioapic1 <Version 1.1> irqs 16-31 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Dec 18 2007 12:07:17)
acpi0: <HP A13> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-safe" frequency 3...

To: Ivan Voras <ivoras@...>
Cc: <freebsd-current@...>
Date: Monday, February 18, 2008 - 8:05 am

Hi Ivan,

Try building a kernel with VERBOSE_SYSINIT option in it - you'll see
which functions it's calling, and you can use this to see where it gets
stuck.
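
For reference, a sketch of the kernel configuration lines this suggestion
implies, assuming they are added to a copy of GENERIC. The KDB/DDB lines are
an addition here, not part of the suggestion above: as far as I recall,
VERBOSE_SYSINIT relies on the DDB symbol lookup to print function names
instead of raw addresses, so treat that pairing as an assumption to verify
against your source tree.

```
# Added to a copy of the GENERIC kernel config (assumed starting point)
options 	VERBOSE_SYSINIT		# print each SYSINIT function as it is called
options 	KDB			# kernel debugger framework
options 	DDB			# in-kernel debugger; assumed needed for symbol names
```

The last function name printed before the freeze should then point at the
subsystem that never returns.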

Cheers,

--
Rink P.W. Springer - http://rink.nu
"Anyway boys, this is America. Just because you get more votes doesn't
mean you win." - Fox Mulder

To: Ivan Voras <ivoras@...>
Cc: <freebsd-current@...>
Date: Monday, February 18, 2008 - 8:03 am

Ivan, good day.

I have a couple of BL640c and older BL<something>p running 7.0 --
no problems encountered. While this is not a direct answer to
your question, have you tried updating the blade firmware to the
latest versions with the Firmware Maintenance CD? Sometimes it helps...
--
Eygene

To: <freebsd-current@...>
Date: Monday, February 18, 2008 - 2:48 pm

Thanks for the suggestion, but the blade server is deployed now and will
stay with 6.x until there's a reason to move.

To: Ivan Voras <ivoras@...>
Cc: <freebsd-current@...>
Date: Monday, February 18, 2008 - 3:04 pm

Ivan,

But you can try to update firmware images on the ML350. Firmware
Maintenance CD 7.91 supports this beast:
http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=...
http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp...
--
Eygene

To: <freebsd-current@...>
Date: Tuesday, February 19, 2008 - 8:59 am

Updating the firmware didn't help. I generated an NMI and have the
debugger running. Apparently it's stuck in DELAY; the trace (transcribed
by hand) is:

DELAY()
vpd_nextbyte()
pci_read_device()
pci_add_children()
acpi_pci_attach()
device_attach()
bus_generic_attach()
acpi_pcib_attach()
acpi_pcib_pci_attach()
device_attach()
bus_generic_attach()
...

this stack goes on... note repetition of device_attach in the stack,
it's repeated at least three more times. I don't know if this is normal.

Any suggestion what to do while in the debugger?
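
The top of the trace (DELAY() called from vpd_nextbyte()) is consistent with
a busy-wait poll whose completion condition never becomes true. A minimal
user-space sketch of that pattern follows, with a bounded retry count so the
loop gives up instead of hanging. All names here (fake_dev, fake_read_status,
fake_delay, wait_ready_bounded, FAKE_READY_FLAG) are hypothetical stand-ins
for illustration; this is not the FreeBSD PCI code and makes no claim about
what that code actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_READY_FLAG 0x8000u	/* hypothetical "transfer complete" bit */

/* Hypothetical simulated device; not a FreeBSD structure. */
struct fake_dev {
	int polls_until_ready;	/* negative: the ready flag never sets */
	int polls_seen;		/* how many times the status was read */
};

/* Simulated status read: reports ready only after enough polls. */
static uint16_t
fake_read_status(struct fake_dev *dev)
{
	dev->polls_seen++;
	if (dev->polls_until_ready >= 0 &&
	    dev->polls_seen > dev->polls_until_ready)
		return (FAKE_READY_FLAG);
	return (0);
}

/* Placeholder for a short busy-wait such as the kernel's DELAY(). */
static void
fake_delay(void)
{
}

/*
 * Poll until the ready flag is seen or max_polls reads have been made.
 * Returns true on ready, false on timeout. Without the max_polls bound,
 * a device that never sets the flag would spin here forever.
 */
static bool
wait_ready_bounded(struct fake_dev *dev, int max_polls)
{
	assert(dev != NULL);
	for (int i = 0; i < max_polls; i++) {
		if (fake_read_status(dev) & FAKE_READY_FLAG)
			return (true);
		fake_delay();
	}
	return (false);
}
```

With a simulated device that never becomes ready, wait_ready_bounded()
returns false after max_polls reads; the same loop without the counter would
match the kind of hang seen in the trace.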

To: <freebsd-current@...>
Date: Tuesday, February 19, 2008 - 9:34 am

Hmm, new data! It works on 8-CURRENT!

Something's fishy here. I'll try and investigate more, but if anyone has
more ideas about where to look, I'd appreciate them - I don't want to
run a -CURRENT system in production.

To: Ivan Voras <ivoras@...>
Cc: <freebsd-current@...>
Date: Wednesday, February 20, 2008 - 5:47 pm

Would you please check whether the latest snapshot, say an RC3 image,
works? IIRC there was a known issue in RC2 with ciss(4), which is widely
used on HP servers; it has since been fixed (the new code is now disabled
by default).

Cheers,
--
Xin LI <delphij@delphij.net> http://www.delphij.net/
FreeBSD - The Power to Serve!

To: <freebsd-current@...>
Date: Wednesday, February 20, 2008 - 8:13 pm

I cannot find RC3 images on
ftp://ftp.freebsd.org/pub/FreeBSD/releases/amd64/ISO-IMAGES/7.0/ but
booting a RELENG_7 kernel works (!) so it looks like you have
found the problem!
