Re: e1000e: sporadic "hardware error"s with Intel 82563EB on Supermicro X7DB3

From: Hillier, Gernot
Date: Thursday, October 9, 2008 - 6:18 am

Dear David,

First of all, thanks for your quick answer! This is what I call great
support from a hardware vendor!! :-)

Graham, David wrote:

That sounds quite promising and seems to fit our problem.

However, one detail confuses us: we can currently reproduce this problem on
two machines. One of them is equipped with an optional IPMI card, the other
one isn't. (The Supermicro X7DB3 doesn't include full IPMI support onboard,
but has a "LP IPMI 2.0 (SIMLP) Slot" where you can place an optional card).

The box with the IPMI card shows the hardware errors quite often (about once
in 200 tries), while the other box still shows the problem, but much more
rarely (less than once in 1000 tries). Now we wonder whether the BMC is on
the IPMI card or on the board itself. In the first case, I'm not sure your
thesis fully explains the problems we see.

And there's another detail I'd like to mention: we first found the problem
by doing continuous reboots as originally described, but we later found that
we can also reproduce it with an endless loop of "rmmod;sleep 3;modprobe".
Does this somehow contradict your thesis?
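
A minimal sketch of such a reload loop in Python, for anyone who wants to count
failures instead of watching the log by hand. It is an illustration and not the
script used for the numbers above: the function names, the second 3-second wait
after modprobe, and the use of dmesg to read the kernel log are all assumptions.
It needs root, and it compares marker counts before and after each cycle, so it
can miss errors if the kernel ring buffer wraps in between.

```python
import subprocess
import time

MODULE = "e1000e"          # driver under test, as named in this thread
MARKER = "Hardware Error"  # message text the driver prints in the kernel log


def read_kernel_log():
    """Return the current kernel ring buffer as text (needs dmesg access)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


def reload_once(run=subprocess.run, sleep=time.sleep, read_log=read_kernel_log):
    """Unload and reload the module once; return True if new errors were logged."""
    before = read_log().count(MARKER)
    run(["rmmod", MODULE], check=True)
    sleep(3)  # same pause as in "rmmod;sleep 3;modprobe"
    run(["modprobe", MODULE], check=True)
    sleep(3)  # assumption: give the probe a moment to write its messages
    return read_log().count(MARKER) > before


def count_failures(tries, **kwargs):
    """Run a bounded number of reload cycles and count the failing ones."""
    return sum(reload_once(**kwargs) for _ in range(tries))
```

Calling count_failures(1000) as root runs 1000 reload cycles and returns how many
of them logged at least one new "Hardware Error" line, which makes the failure
rates of two machines directly comparable.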


Yes, I did. Unfortunately, 0.4.1.7 still shows the problem - on both machines:

e1000e: Intel(R) PRO/1000 Network Driver - 0.4.1.7-NAPI
e1000e: Copyright (c) 1999-2008 Intel Corporation.
ACPI: PCI Interrupt 0000:06:00.0[A] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:06:00.0 to 64
0000:06:00.0: 0000:06:00.0: Hardware Error
0000:06:00.0: eth0: (PCI Express:2.5GB/s:Width x4) 00:30:48:66:c7:06
0000:06:00.0: eth0: Intel(R) PRO/1000 Network Connection
0000:06:00.0: eth0: MAC: 5, PHY: 5, PBA No: 2050ff-0ff
ACPI: PCI Interrupt 0000:06:00.1[B] -> GSI 19 (level, low) -> IRQ 19
PCI: Setting latency timer of device 0000:06:00.1 to 64
0000:06:00.1: eth1: (PCI Express:2.5GB/s:Width x4) 00:30:48:66:c7:07
0000:06:00.1: eth1: Intel(R) PRO/1000 Network Connection
0000:06:00.1: eth1: MAC: 5, PHY: 5, PBA No: 2050ff-0ff
0000:06:00.0: eth0: Hardware Error
0000:06:00.0: eth0: Hardware Error
0000:06:00.0: eth0: Hardware Error
0000:06:00.0: eth0: Hardware Error
0000:06:00.0: eth0: Hardware Error
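
To see which PCI function the errors belong to, a log like the one above can be
tallied per device. The following is a rough sketch only; the function name and
the regular expression are assumptions based on the line format shown in this
message, not on anything the driver guarantees.

```python
import re
from collections import Counter

# A PCI address such as 0000:06:00.0, followed later on the same line by the
# "Hardware Error" text. Using search() tolerates a leading dmesg timestamp.
ERROR_LINE = re.compile(
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): .*Hardware Error\s*$"
)


def errors_per_function(log_text):
    """Count 'Hardware Error' lines per PCI function in a kernel log excerpt."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            counts[match.group("pci")] += 1
    return counts
```

Applied to the excerpt above, every error line is attributed to 0000:06:00.0
(eth0) and none to 0000:06:00.1 (eth1).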

Is there any further debug code I could add to narrow down things?


Sorry, but we can't provide any further details about this yet. We are still
trying to get through to the Supermicro developers, but so far our FAE contact
insists on telling us "don't use e1000e, e1000 is the right driver for your
hardware".

-- 
Gernot Hillier
Siemens AG, CT SE 2