Re: 4.6 reboots x336 ibm server(s)

From: FRLinux
Date: Thursday, October 22, 2009 - 12:36 pm

Hello,

I have several IBM x series 336 servers and attempted to upgrade them
today. My usual way is to use a build server which makes a release for
my servers. It went well on that server (which is the only one that is
not an IBM x336, that will teach me...) so I decided to deploy the new
build to the IBM servers.

When it was applied and I issued a reboot, the server rebooted after
locking up at this line:

"Intel E7520 Error Reporting" rev 0x0c at pci0 dev 0 function 1 not configured
ppb0 at pci0 dev 2 function 0 "Intel E7520 PCIE" rev 0x0c

At this stage, server reboots and its BIOS issues the following:
re-booting due to unexpected NMI at 0000:0000

Now, I have tested my build and the official 4.6 ISO which both show
exactly the same behavior. Thinking it might have been a system issue,
I tried 3 other servers which ALL reported the same NMI issue. That
leads me to believe that my systems do not have a hardware issue (as
the NMI message would imply).

So, it looks like something in the 4.6 kernel code triggers that
behavior and I can test many things and provide output, please let me
know where I can start.

# dmesg
OpenBSD 4.5-stable (GENERIC) #0: Tue Aug 18 09:09:22 IST 2009
    root@puffy:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Xeon(TM) CPU 3.20GHz ("GenuineIntel" 686-class) 3.21 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,CNXT-ID,CX16,xTPR
real mem  = 3623231488 (3455MB)
avail mem = 3517079552 (3354MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 02/15/07, BIOS32 rev. 0 @ 0xfd6f1, SMBIOS rev. 2.3 @ 0xf5f9e (52 entries)
bios0: vendor IBM version "-[APE137AUS-1.14]-" date 02/15/2007
bios0: IBM eserver xSeries 336 -[883722Y]-
acpi0 at bios0: rev 2
acpi0: tables DSDT FACP APIC MCFG
acpi0: wakeup devices PCI0(S5)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: ...
From: richardtoohey
Date: Thursday, October 22, 2009 - 1:20 pm

[cut]

I've seen the same on IBM x346 - install goes fine, reboot, and then it does not
want to play nice.

Also got the unexpected NMI at 0000:0000 message.

(This was a clean install, not an upgrade, so don't know if 4.5 works on this
box or not.)

Thanks.

From: FRLinux
Date: Thursday, October 22, 2009 - 2:12 pm

Installs and boot of all previous versions up until 4.6 work. I rolled
back the server to 4.5 home release and it is back and running.

From: FRLinux
Date: Friday, October 23, 2009 - 3:08 am

Hello

I have generated a verbose trace using a com port on the server, this
is by booting the official 4.6 i386 install CD.

boot> boot -c
booting cd0a:/4.6/i386/bsd.rd: 5651156+913072 [52+211008+196339]=0x6a6260
entry point at 0x200120

Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2009 OpenBSD. All rights reserved.  http://www.OpenBSD.org

OpenBSD 4.6 (RAMDISK_CD) #53: Thu Jul  9 21:41:35 MDT 2009
    deraadt@i386.openbsd.org:/usr/src/sys/arch/i386/compile/RAMDISK_CD
cpu0: Intel(R) Xeon(TM) CPU 3.20GHz ("GenuineIntel" 686-class) 3.21 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CR
real mem  = 3623231488 (3455MB)
avail mem = 3518279680 (3355MB)
User Kernel Config
UKC> verbose
autoconf verbose enabled
UKC> quit
bios0 at mainbus0: AT/286+ BIOS, date 02/15/07, BIOS32 rev. 0 @ 0xfd6f1, SMBIOS)
bios0: vendor IBM version "-[APE137AUS-1.14]-" date 02/15/2007
acpi0 at bios0: rev 2
cpu0 at mainbus0: apid 0 (boot processor)
acpiprt3 at acpi0: bus 0 (PCI0)
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pci1 at ppb0 bus 2

Hope this helps...

From: FRLinux
Date: Friday, October 23, 2009 - 3:34 am

Alright, disabling ACPI allows me to install the system, but then on
reboot, even disabling ACPI makes the system restart:

boot> boot -c
booting hd0a:/bsd: 6563548+1052072 [52+345584+327881]=0x7e7ce8
entry point at 0x200120

[ using 673892 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2009 OpenBSD. All rights reserved.  http://www.OpenBSD.org

OpenBSD 4.6 (GENERIC) #58: Thu Jul  9 21:24:42 MDT 2009
    deraadt@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Xeon(TM) CPU 3.20GHz ("GenuineIntel" 686-class) 3.21 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CR
real mem  = 3623231488 (3455MB)
avail mem = 3516555264 (3353MB)
User Kernel Config
UKC> disable acpi
466 acpi0 disabled
UKC> quit
Continuing...
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 02/15/07, BIOS32 rev. 0 @ 0xfd6f1, SMBIOS)
bios0: vendor IBM version "-[APE137AUS-1.14]-" date 02/15/2007
bios0: IBM eserver xSeries 336 -[883722Y]-
acpi at bios0 function 0x0 not configured
mpbios0 at bios0: Intel MP Specification 1.4
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: apic clock running at 200MHz
cpu at mainbus0: not configured
mpbios0: bus 0 is type PCI
mpbios0: bus 1 is type PCI
mpbios0: bus 2 is type PCI
mpbios0: bus 3 is type PCI
mpbios0: bus 4 is type PCI
mpbios0: bus 5 is type PCI
mpbios0: bus 6 is type PCI
mpbios0: bus 7 is type PCI
mpbios0: bus 8 is type ISA
ioapic0 at mainbus0: apid 14 pa 0xfec00000, version 20, 24 pins
ioapic1 at mainbus0: apid 13 pa 0xfec82000, version 20, 24 pins
ioapic2 at mainbus0: apid 12 pa 0xfec82400, version 20, 24 pins
pcibios0 at bios0: rev 2.1 @ 0xf0000/0xffff
pcibios0: PCI BIOS has 11 Interrupt Routing table entries
pcibios0: PCI Exclusive IRQs: 9 10 11 15
pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82801EB/ER LPC" rev 0x00)
pcibios0: PCI bus #7 is the last bus
bios0: ROM ...
From: Vadim Zhukov
Date: Friday, October 23, 2009 - 4:07 am

My $0.02: try to disable intagp, agp, inteldrm, drm devices.

--
  Best wishes,
    Vadim Zhukov

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

From: FRLinux
Date: Friday, October 23, 2009 - 5:28 am

Thanks, tried disabling a fair few based on successful boot log but
still fails after wsdisplay0

Also, as pointed out, before 4.6, it just works.

Cheers,
Steph

From: FRLinux
Date: Monday, October 26, 2009 - 1:24 pm

What else can I provide to help fix this? I am no developer but
would really love to see this issue fixed :)

Cheers,
Steph

From: Marco Peereboom
Date: Monday, October 26, 2009 - 2:03 pm

Does it have broadcom nics?

If so, disable those and try again.


From: FRLinux
Date: Tuesday, October 27, 2009 - 5:00 am

I do. I'll try that tomorrow.

On a related matter, can anyone tell me which switches are disabled
during an OpenBSD install (using the official ISO)? That would help
me narrow the problem down since I was able to install 4.6 from the
official CD without hassle.

Cheers,
Steph

From: FRLinux
Date: Wednesday, October 28, 2009 - 8:36 am

Hello, still the same problem. Out of curiosity, I tried to boot off the
amd64 CD but it fails the same way. Suggestions?

As I asked, can anyone tell me which flags are disabled during the
install? (Disabling acpi during install was enough to get the system
installed but then it won't boot...)

Cheers,
Steph

From: Vadim Zhukov
Date: Wednesday, October 28, 2009 - 8:43 am

You can just diff /usr/src/sys/arch/`uname -m`/conf/GENERIC
and /usr/src/sys/arch/`uname -m`/conf/RAMDISK.

--
  Best wishes,
    Vadim Zhukov

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

From: Joachim Schipper
Date: Wednesday, October 28, 2009 - 9:13 am

Just to check the obvious: did you disable acpi when booting after the
install? (And did you try both bsd and bsd.mp? The latter is less like
the install kernel than the former.)

Otherwise, you could look at
/usr/src/sys/arch/<foo>/conf/{GENERIC,RAMDISK_CD}. But that's likely a
bit daunting.
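
The comparison can be narrowed to device attachments only. Below is a minimal Python sketch, assuming both config files have been copied somewhere readable and that a simple regular expression is good enough to recognise "name at parent" lines. It does not follow include directives, so anything pulled in from the machine-independent GENERIC is missed. The function names are made up for illustration.

```python
import re

# Matches lines such as "ppb*  at pci?" or "com0 at isa? port 0x3f8".
# The unit number or "*" wildcard after the driver name is dropped.
ATTACH = re.compile(r"^\s*([a-z][a-z0-9_]*?)(?:\d+|\*)?\s+at\s+")


def attached_devices(config_text):
    """Return the set of driver names that have an uncommented attach line."""
    devices = set()
    for line in config_text.splitlines():
        line = line.split("#", 1)[0]  # strip comments, including commented-out devices
        match = ATTACH.match(line)
        if match:
            devices.add(match.group(1))
    return devices


def only_in_first(first_text, second_text):
    """Sorted driver names attached in the first config but not in the second."""
    return sorted(attached_devices(first_text) - attached_devices(second_text))
```

Calling only_in_first() with the text of GENERIC and then RAMDISK_CD lists drivers that the installed kernel attaches but the install kernel does not, which is a shorter list to work through with "disable" at the UKC> prompt.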

		Joachim

From: FRLinux
Date: Thursday, October 29, 2009 - 11:06 am

On Wed, Oct 28, 2009 at 4:13 PM, Joachim Schipper


Hello, the problem is related to the network cards alright. Disabling
ppb* allows it to boot. My problem is that even if I disable a card in
the BIOS, I cannot boot the system. I tried to disable ppb2 but it
doesn't seem to take it. What am I missing?

Cheers,
Steph

From: Joachim Schipper
Date: Thursday, October 29, 2009 - 12:11 pm

I'm not really sure what you are asking. Is your question answered by
pointing you at the -u option of config(8) (i.e. showing you how to get
the 'disable ppb*' to stick)? If not, you'll have to rephrase it or hope
someone else understands it...

		Joachim

From: FRLinux
Date: Friday, October 30, 2009 - 4:27 am

On Thu, Oct 29, 2009 at 7:11 PM, Joachim Schipper

Sorry, let me rephrase this.

I have established that the problem lies with the PCI Express bus as
disabling ppb* allows the server to boot. Unfortunately in that state,
you have no network, slightly annoying...

When doing a boot -c, I try to specify 'disable ppb2' but it does not
take it; only 'disable ppb*' reports 'ppb* disabled'. Is there a way to
disable only part of it?

Another test I did was to disable both network cards in the bios but
that still doesn't work. I have however noticed that a shit load of
devices share the same IRQ. Unfortunately IBM Bios does not allow you
to disable one device at a time, you can just select another IRQ.

If anyone has insight on what else I can do to get workable systems,
I'd be grateful. Fitting an alternate PCI network card is not an
option as I have about 10 more servers in prod awaiting 4.6.

Cheers,
Steph

From: Marco Peereboom
Date: Friday, October 30, 2009 - 6:48 am

Run a -current system.  This could be a pci resource allocation issue.


From: FRLinux
Date: Friday, October 30, 2009 - 9:08 am

Hello again,

It surely is, even leaving acpi enabled, the only way to allow the
machine to boot is to disable ppb*

It affects all the IBM xSeries 336 that I have.

I just tried snapshot 28/10/2009 which has the same symptoms... only
disabling ppb* allows it to boot.

It seems that my problem is ppb2, but I cannot disable that one only, can I?

Steph

From: Marco Peereboom
Date: Friday, October 30, 2009 - 9:20 am

no.  kettenis needs to see a pcidump -v -xx of this machine.  Send in the
acpidump -o as well.  I can't volunteer his time so he'll look at it
whenever he'll look at it.


From: FRLinux
Date: Monday, November 2, 2009 - 10:07 am

# pcidump -v -xx
Domain /dev/pci0:
 0:0:0: Intel E7520 Host
        0x0000: Vendor ID: 8086 Product ID: 3590
        0x0004: Command: 0146 Status ID: 0090
        0x0008: Class: 06 Subclass: 00 Interface: 00 Revision: 0c
        0x000c: BIST: 00 Header Type: 80 Latency Timer: 00 Cache Line Size: 00
        0x0010: BAR empty (00000000)
        0x0014: BAR empty (00000000)
        0x0018: BAR empty (00000000)
        0x001c: BAR empty (00000000)
        0x0020: BAR empty (00000000)
        0x0024: BAR empty (00000000)
        0x0028: Cardbus CIS: 00000000
        0x002c: Subsystem Vendor ID: 1014 Product ID: 02dc
        0x0030: Expansion ROM Base Address: 00000000
        0x0038: 00000000
        0x003c: Interrupt Pin: 00 Line: 00 Min Gnt: 00 Max Lat: 00
        0x0040: Capability 0x09: Vendor Specific
        0x0000: 35908086 00900146 0600000c 00800000
        0x0010: 00000000 00000000 00000000 00000000
        0x0020: 00000000 00000000 00000000 02dc1014
        0x0030: 00000000 00000040 00000000 00000000
        0x0040: 41050009 00000010 00000000 00000000
        0x0050: 000a200c 00000000 01111000 11110000
        0x0060: 10100808 20201818 00000000 00000000
        0x0070: 0e0e0e0e 00000000 555e1144 2c20021e
        0x0080: 00411248 00000000 f0000180 00000000
        0x0090: 00000000 39092a00 301caaaa 070208d5
        0x00a0: 00000001 00000000 00000001 00000000
        0x00b0: 77bbddee 00000000 00000000 00000000
        0x00c0: 3350c044 0040d800 000a0049 e0000020
        0x00d0: 0e002802 00000007 b5930000 01040000
        0x00e0: 00000000 00000000 00004036 00000000
        0x00f0: 00000000 00420132 000c0f80 00000000
 0:0:1: Intel E7520 Error Reporting
        0x0000: Vendor ID: 8086 Product ID: 3591
        0x0004: Command: 0100 Status ID: 0000
        0x0008: Class: ff Subclass: 00 Interface: 00 Revision: 0c
        0x000c: BIST: 00 Header Type: 00 Latency Timer: 00 Cache Line Size: 00
        0x0010: BAR empty (00000000)
        0x0014: BAR empty (00000000)
    ...
From: FRLinux
Date: Thursday, November 5, 2009 - 8:51 am

Here is the acpidump from 4.5 running on the same server:

# acpidump
/*
RSD PTR: Checksum=85, OEMID=IBM, RsdtAddress=0xd7fcff80
 */
/*
RSDT: Length=48, Revision=1, Checksum=61,
        OEMID=IBM, OEM Table ID=SERONYXP, OEM Revision=0x1001,
        Creator ID=IBM, Creator Revision=0x45444f43
 */
/*
        Entries={ 0xd7fcfe40, 0xd7fcfd80, 0xd7fcfd40 }
 */
/*
        DSDT=0xd7fccf00
        INT_MODEL=APIC
        SCI_INT=9
        SMI_CMD=0xb2, ACPI_ENABLE=0xf0, ACPI_DISABLE=0xf1, S4BIOS_REQ=0x0
        PM1a_EVT_BLK=0x580-0x583
        PM1a_CNT_BLK=0x584-0x585
        PM2_TMR_BLK=0x588-0x58b
        PM2_GPE0_BLK=0x5a8-0x5af
        P_LVL2_LAT=101ms, P_LVL3_LAT=1001ms
        FLUSH_SIZE=0, FLUSH_STRIDE=0
        DUTY_OFFSET=1, DUTY_WIDTH=3
        DAY_ALRM=68, MON_ALRM=69, CENTURY=0
        Flags={WBINVD,PROC_C1,SLP_BUTTON}
 */
/*
DSDT: Length=8990, Revision=2, Checksum=246,
        OEMID=IBM, OEM Table ID=SERTURQU, OEM Revision=0x1000,
        Creator ID=INTL, Creator Revision=0x20041203
 */
DefinitionBlock (
"acpi_dsdt.aml",        //Output filename
"DSDT",                 //Signature
0x2,                    //DSDT Revision
"IBM",                  //OEMID
"SERTURQU",             //TABLE ID
0x1000                  //OEM Revision
)

{
Scope(\) {
    Method(CWRT, 3) {
        Name(TMPB, Buffer(0x10) {0x88, 0xd, 0x0, 0x0, 0xc, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0 })
        Store(Arg2, Index(TMPB, 0x3))
        If(LEqual(Arg2, 0x0)) {
            Store(0x1, Index(TMPB, 0x5))
        }
        Store(And(Arg0, 0xff), Index(TMPB, 0x8))
        Store(And(ShiftRight(Arg0, 0x8), 0xff), Index(TMPB, 0x9))
        Store(And(Arg1, 0xff), Index(TMPB, 0xa))
        Store(And(ShiftRight(Arg1, 0x8), 0xff), Index(TMPB, 0xb))
        Store(Add(Subtract(Arg1, Arg0), 0x1), Local7)
        Store(And(Local7, 0xff), Index(TMPB, 0xe))
        Store(And(ShiftRight(Local7, 0x8), 0xff), Index(TMPB, 0xf))
        Return(TMPB)
    }
    Method(CDRT, 3) {
     ...
From: FRLinux
Date: Saturday, November 21, 2009 - 1:20 pm

*bump*


From: Marcin
Date: Monday, January 11, 2010 - 3:03 pm

Hello,

I ran into the same issue with IBM x336 while trying to launch 4.6
after installation.
I checked 4.6 ISO and -current, neither of them booted successfully.

My x336s are fairly standard machines (4GB of RAM, 2x3.2GHz Xeon, 2x73GB SCSI)
however I have Intel em(4) adapter installed.

I equipped my test machine with management card so I am happy to provide
more information in addition to what has been already sent by Steph.

I can also confirm another problem with the IBM x336: while loading the
kernel it freezes for several minutes, just after the "entry point at ....."
message. The freeze can be skipped by pressing any key. The same behaviour
was observed with 4.5.

Finally, does anyone successfully use ipmi with the x336? I was hoping to
use the watchdog, but it was very unstable and led to a kernel panic.

Many thanks,
Marcin

From: FRLinux
Date: Monday, January 11, 2010 - 4:00 pm

I can confirm that latest snapshot does not boot on my x336 servers either.

Marcin, can you run the following on your server:

pcidump -v -xx
acpidump

Then paste in reply? Developers here might find something in there
explaining why this happens to these servers. Right now i have 7
servers in production stuck in 4.5 until this can be fixed. I hope
someone can really look into that, that would be really appreciated :)

Cheers,
Steph

From: Marcin
Date: Tuesday, January 12, 2010 - 1:43 am

Sure - hope it is of any use.
Please let me know if anything more is required.

pcidump -v
###########################################

Domain /dev/pci0:
 0:0:0: Intel E7520 Host
	0x0000: Vendor ID: 8086 Product ID: 3590
	0x0004: Command: 0146 Status ID: 0090
	0x0008: Class: 06 Subclass: 00 Interface: 00 Revision: 0c
	0x000c: BIST: 00 Header Type: 80 Latency Timer: 00 Cache Line Size: 00
	0x0010: BAR empty (00000000)
	0x0014: BAR mem 32bit addr: 0xff000000
	0x0018: BAR empty (00000000)
	0x001c: BAR empty (00000000)
	0x0020: BAR empty (00000000)
	0x0024: BAR empty (00000000)
	0x0028: Cardbus CIS: 00000000
	0x002c: Subsystem Vendor ID: 1014 Product ID: 02dc
	0x0030: Expansion ROM Base Address: 00000000
	0x0038: 00000000
	0x003c: Interrupt Pin: 00 Line: 00 Min Gnt: 00 Max Lat: 00
	0x0040: Capability 0x09: Vendor Specific
 0:0:1: Intel E7520 Error Reporting
	0x0000: Vendor ID: 8086 Product ID: 3591
	0x0004: Command: 0100 Status ID: 0000
	0x0008: Class: ff Subclass: 00 Interface: 00 Revision: 0c
	0x000c: BIST: 00 Header Type: 00 Latency Timer: 00 Cache Line Size: 00
	0x0010: BAR empty (00000000)
	0x0014: BAR empty (00000000)
	0x0018: BAR empty (00000000)
	0x001c: BAR empty (00000000)
	0x0020: BAR empty (00000000)
	0x0024: BAR empty (00000000)
	0x0028: Cardbus CIS: 00000000
	0x002c: Subsystem Vendor ID: 1014 Product ID: 02dc
	0x0030: Expansion ROM Base Address: 00000000
	0x0038: 00000000
	0x003c: Interrupt Pin: 00 Line: 00 Min Gnt: 00 Max Lat: 00
 0:2:0: Intel E7520 PCIE
	0x0000: Vendor ID: 8086 Product ID: 3595
	0x0004: Command: 0147 Status ID: 0010
	0x0008: Class: 06 Subclass: 04 Interface: 00 Revision: 0c
	0x000c: BIST: 00 Header Type: 01 Latency Timer: 00 Cache Line Size: 10
	0x0010: 00000000
	0x0014: 00000000
	0x0018: Primary Bus: 0 Secondary Bus: 2 Subordinate Bus: 2
	        Secondary Latency Timer: 00
	0x001c: I/O Base: 40 I/O Limit: 30 Secondary Status: 0000
	0x0020: Memory Base: df00 Memory Limit: def0
	0x0024: Prefetch Memory Base: df01 Prefetch Memory ...
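
For reading the bridge entry above: in the standard PCI-to-PCI bridge header, a forwarding window whose base lies above its limit is closed, meaning the bridge forwards nothing in that range. Below is a small Python sketch of the decoding, assuming pcidump prints the raw 16-bit memory and 8-bit I/O base/limit registers, and ignoring the separate upper-16-bit I/O registers. Function names are illustrative. Whether closed windows have anything to do with the hang is not established in this thread.

```python
def mem_window(base_reg, limit_reg):
    """Decode the 16-bit Memory Base/Limit registers into byte addresses.

    Bits 15:4 of each register hold address bits 31:20; the window is
    1 MB aligned, so the limit's low 20 bits are all ones.
    """
    base = (base_reg & 0xFFF0) << 16
    limit = ((limit_reg & 0xFFF0) << 16) | 0xFFFFF
    return base, limit


def io_window(base_reg, limit_reg):
    """Decode the 8-bit I/O Base/Limit registers (16-bit I/O decoding only).

    Bits 7:4 hold address bits 15:12; the window is 4 KB aligned.
    """
    base = (base_reg & 0xF0) << 8
    limit = ((limit_reg & 0xF0) << 8) | 0xFFF
    return base, limit


def is_open(window):
    """A window forwards transactions only when base <= limit."""
    base, limit = window
    return base <= limit


# Values as printed for 0:2:0 above.
mem = mem_window(0xDF00, 0xDEF0)
io = io_window(0x40, 0x30)
print(hex(mem[0]), hex(mem[1]), is_open(mem))
print(hex(io[0]), hex(io[1]), is_open(io))
```

With the values shown for 0:2:0, both windows decode with the base above the limit, so under these assumptions the bridge is not forwarding memory or I/O at the time of the dump.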
From: J.D. Bronson
Date: Tuesday, January 12, 2010 - 4:44 am

I just joined this thread today, but had a similar issue with an IBM 305 
machine.

On 4.5, it would randomly just shut down. No reason. Nothing in any 
logs, it was as if the power was pulled.

I have 2 identical IBM 305 machines and it was happening on both, so 
that technically ruled out any specific hardware failure.
What I did notice (in the BIOS events) was that the IBM reported fan 
#1,2,3 loss. Something seemed to disrupt the fan speed to bios reporting 
and I suspect the machine powered down since it thought it was 
overheating? - I could go a day or 2 weeks. Very random.

4.6 hasn't done this (yet) and uptime has been over a month.
However, even though both IBMs are the same in every way, 4.6-REL will
boot on machine #2 but I have no networking. If I use a 4.6-CUR
snapshot, it comes up fine. That makes NO sense, yet another user
reported the same exact thing.

-- 
J.D. Bronson

From: Kenneth R Westerback
Date: Tuesday, January 12, 2010 - 8:10 am

Please try -current as of today (Jan 13, 2010 Melbourne time); there were a
number of significant fixes committed in the last couple of days.

.... Ken

From: J.D. Bronson
Date: Tuesday, January 12, 2010 - 8:16 am

I would try a -current but the 4.6-STABLE I have in use on Machine #1
has been running fine and I am not seeing reboots or unexpected 
shutdowns as the OP has been experiencing.

The Machine #2 will only run -current and I can't figure that out when 
they are identical. I suspect 4.7 will run fine on both machines..

-- 
J.D. Bronson

From: Marcin
Date: Tuesday, January 12, 2010 - 11:05 am

Hi,

I tried current - the good news is the problem with freeze at startup is gone
 - kernel boots immediately.

However, it hangs later on just after printing out following lines:

pci0 at mainbus0 bus 0: configuration mode 1 (bios)
mem address conflict 0xff000000/0x1000
pchb0 at pci0 dev 0 function 0 "Intel E7520 Host" rev 0x0c
"Intel E7520 Error Reporting" rev 0x0c at pci0 dev 0 function 1 not configured
ppb0 at pci0 dev 2 function 0 "Intel E7520 PCIE" rev 0x0c


Thanks,
Marcin

From: FRLinux
Date: Tuesday, January 12, 2010 - 1:26 pm

Yup, same error here, precisely at that line.

Just to confirm that we have the same issue, can you try disabling
ppb* on boot -c then see if it goes to the login prompt?

Cheers,
Steph

From: Marcin
Date: Wednesday, January 13, 2010 - 8:33 am

It was even worse for me here: although disabling ppb* makes the
kernel go slightly further, it has a nasty side effect of disabling
the SCSI controller.

However, I have just checked out and compiled -current and can confirm
the issue is gone - machine booted and all network interfaces are
accessible.

Many thanks to everyone involved in fixing that!

Regards,
Marcin

From: Ted Unangst
Date: Wednesday, January 13, 2010 - 9:28 am

One more time, for the record.  If the kernel hangs after printing out
a line, that's NOT the device that caused trouble.  The lines that are
printed out mean "I did this", not "I'm about to do this."  This
should be obvious if you think about it: network cards print out their
MAC addresses.  How could the kernel do that if the device wasn't
attached yet?  [There are some more details, but that's the high
level.]

The reason disabling the last line /sometimes/ works is that if it's a
bus, you then prevent probing of all the attached devices.
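
Put differently, the suspect is whatever a working kernel attaches next. Below is a rough Python sketch of that lookup, assuming a dmesg saved from a kernel that boots on the same hardware, and assuming both kernels attach devices in the same order (which is not guaranteed across releases). The function name is illustrative, and the sample text is stitched together from lines quoted earlier in this thread; it is not a real capture.

```python
def next_attach_after(good_dmesg, last_line_seen):
    """Return the line that follows last_line_seen in a known-good dmesg.

    Returns None if the line is not found or is the final line.
    """
    lines = [l.strip() for l in good_dmesg.splitlines() if l.strip()]
    target = last_line_seen.strip()
    for i, line in enumerate(lines[:-1]):
        if line == target:
            return lines[i + 1]
    return None


# Illustrative sample only, assembled from lines quoted in this thread.
sample_good = """\
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "Intel E7520 Host" rev 0x0c
ppb0 at pci0 dev 2 function 0 "Intel E7520 PCIE" rev 0x0c
pci1 at ppb0 bus 2
"""

last_seen = 'ppb0 at pci0 dev 2 function 0 "Intel E7520 PCIE" rev 0x0c'
print(next_attach_after(sample_good, last_seen))  # prints: pci1 at ppb0 bus 2
```

The line it returns names the next thing to look at, not a confirmed culprit; if that line is a bus, everything attached behind it is a candidate as well.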

From: FRLinux
Date: Wednesday, January 13, 2010 - 10:36 am

Sorry, my formulation was not the most accurate. I meant that in my
case, when disabling the whole ppb*, it is the only way to get the
server booting to login prompt on 4.6, and as a side effect of
disabling devices on that bus, i have no network card available then.

Cheers,
Steph

From: FRLinux
Date: Thursday, March 11, 2010 - 8:00 am

Hey guys, sent an acpi dump with dmesg info a couple of months ago to
this list hoping the developers might be able to fix this. Just
letting you know that 4.7 snapshot still reboots the box unless you
disable ppb*. Any way i can help?

Cheers,
Steph

From: FRLinux
Date: Friday, March 12, 2010 - 4:43 am

Would donating such a server help in the matter? If so, to whom?

Cheers,
Steph

From: Joel Sing
Date: Friday, March 12, 2010 - 6:55 am

The issue has already been investigated and kettenis@ committed a fix during 
n2k10. However, the fix that allowed these servers to work happened to break 
other systems that were previously working, hence the change was backed out. 
I believe that an alternate fix is being worked on, however if you want to 
use this hardware in the meantime you can revert dev/pci/pci.c to r1.72.
-- 

   "Stop assuming that systems are secure unless demonstrated insecure;
    start assuming that systems are insecure unless designed securely."
          - Bruce Schneier

From: FRLinux
Date: Friday, March 12, 2010 - 12:07 pm

Hello, thanks for your reply, i'll look into that soon.

Cheers,
Steph

From: FRLinux
Date: Thursday, May 27, 2010 - 2:10 am

Hey guys, just to let you know, the issue is still present on stock
4.7 CDs. Any hope that I might use a -current containing a fix, or do
you consider that changing that would break too many other types of
servers?

Thanks for your replies,
Steph
