Re: e1000e crashes with 2.6.34.x and ThinkPad T60

Previous thread: support for drives larger than 2TiB by Tejun Heo on Saturday, July 24, 2010 - 2:58 am. (20 messages)

Next thread: [PATCH] CEPH: Correct obvious typo of Kconfig variable "CRYPTO_AES" by Robert P. J. Day on Saturday, July 24, 2010 - 3:41 am. (1 message)
From: Marc Haber
Date: Saturday, July 24, 2010 - 2:26 am

Hi,

I have a new notebook, a ThinkPad T60, which freezes at random
intervals (anywhere from 30 minutes to two days) whenever I am using
the on-board wired ethernet interface, an e1000e [8086:109a]. As long
as I stay on the WLAN, the system runs for weeks despite frequent
suspend/resume cycles etc. The crashes really seem to be tied to using
the wired ethernet. This is a hard freeze: nothing happens on the
system any more, and only a long press on the power button helps.

Additionally, sometimes, probably after suspend/resume, the wired
ethernet does not come up properly again: ip addr claims "NO CARRIER"
even though the LEDs on the interface and on the switch indicate that
there is a link. No packets are received by the interface while it is
in this state.

Both issues appear with 2.6.34 and 2.6.34.1. I have not tested an
older kernel, since 2.6.34 was already out when I started using the
T60.

To rule out defective hardware, I have tried with a second T60, with
the same results.

Full dmesg and lspci -nn output attached; please say if you need more.

Greetings
Marc

P.S.: Please Cc: me on replies; both linux-kernel and netdev are too
big for me to follow in a timely manner. I am subscribed to both
lists, but a Cc helps in getting a faster reply.

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
--

From: Allan, Bruce W
Date: Monday, July 26, 2010 - 9:13 am

Adding e1000-devel (the Intel LAN developers list).

Please supply the full dmesg you meant to attach with the original report, as well as the output of lspci -vvv.

Thanks,
Bruce.
--

From: Marc Haber
Date: Friday, July 30, 2010 - 5:56 am

Stupid me.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
From: Allan, Bruce W
Date: Friday, July 30, 2010 - 10:42 am

Please also provide an eeprom dump from the wired LOM via 'ethtool -e ethX'.

Thanks,
Bruce.
--

From: Marc Haber
Date: Sunday, August 1, 2010 - 12:14 pm

Offset          Values
------          ------
0x0000          00 16 41 aa be 37 30 0b b2 ff 51 00 ff ff ff ff
0x0010          53 00 03 01 6b 02 01 20 aa 17 9a 10 86 80 df 80
0x0020          00 00 00 20 54 7e 00 00 14 00 da 00 04 00 00 27
0x0030          c9 6c 50 31 3e 07 0b 04 8b 29 00 00 00 f0 02 0f
0x0040          08 10 00 00 04 0f ff 7f 01 4d ff ff ff ff ff ff
0x0050          14 00 1d 00 14 00 1d 00 af aa 1e 00 00 00 1d 00
0x0060          00 01 00 40 1f 12 07 40 ff ff ff ff ff ff ff ff
0x0070          ff ff ff ff ff ff ff ff ff ff ff ff ff ff 7a a7
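
As a rough cross-check of the dump above, the following Python sketch parses the `ethtool -e` text and reads three things out of it: the first six bytes (conventionally the MAC address on Intel parts), the little-endian words at word offsets 0x0D and 0x0E (which here match the [8086:109a] PCI ID from the original report), and the 16-bit sum of the first 0x40 words, which the e1000-family drivers expect to equal 0xBABA. The function names are made up for this example and are not part of any tool.

```python
DUMP = """\
Offset          Values
------          ------
0x0000          00 16 41 aa be 37 30 0b b2 ff 51 00 ff ff ff ff
0x0010          53 00 03 01 6b 02 01 20 aa 17 9a 10 86 80 df 80
0x0020          00 00 00 20 54 7e 00 00 14 00 da 00 04 00 00 27
0x0030          c9 6c 50 31 3e 07 0b 04 8b 29 00 00 00 f0 02 0f
0x0040          08 10 00 00 04 0f ff 7f 01 4d ff ff ff ff ff ff
0x0050          14 00 1d 00 14 00 1d 00 af aa 1e 00 00 00 1d 00
0x0060          00 01 00 40 1f 12 07 40 ff ff ff ff ff ff ff ff
0x0070          ff ff ff ff ff ff ff ff ff ff ff ff ff ff 7a a7
"""


def parse_eeprom_dump(text):
    """Return the EEPROM contents as bytes from 'ethtool -e' style text."""
    data = bytearray()
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an offset such as 0x0010; skip header rows.
        if not fields or not fields[0].startswith("0x"):
            continue
        data.extend(int(byte, 16) for byte in fields[1:])
    return bytes(data)


def word(data, index):
    """Read the 16-bit little-endian word at the given word index."""
    return data[2 * index] | (data[2 * index + 1] << 8)


def summarize(data):
    """Return (mac, vendor_id, device_id, checksum) for the dump."""
    mac = ":".join(f"{b:02x}" for b in data[:6])
    device_id = word(data, 0x0D)
    vendor_id = word(data, 0x0E)
    # Sum of words 0x00..0x3F, truncated to 16 bits.
    checksum = sum(word(data, i) for i in range(0x40)) & 0xFFFF
    return mac, vendor_id, device_id, checksum


if __name__ == "__main__":
    mac, vendor, device, checksum = summarize(parse_eeprom_dump(DUMP))
    print(f"MAC {mac}, PCI ID [{vendor:04x}:{device:04x}], "
          f"checksum {checksum:#06x}")
```

For this dump the sum comes out to 0xbaba, so the image at least passes the checksum test; that says nothing about whether the individual settings are sensible.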

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
--

From: Chuck Ebbert
Date: Thursday, August 5, 2010 - 10:46 pm

On Fri, 30 Jul 2010 14:56:14 +0200

That's not very useful.

The pcie capabilities are completely missing.
--

From: Marc Haber
Date: Tuesday, August 10, 2010 - 5:02 am

Again, apologies. The attached lspci -vvv is from 2.6.35, right after
the first freeze with this kernel version.

Greetings
Marc

00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
	Subsystem: Lenovo ThinkPad T60
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Latency: 0
	Capabilities: [e0] Vendor Specific Information: Len=09 <?>

00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express PCI Express Root Port (rev 03) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 00002000-00002fff
	Memory behind bridge: ee100000-ee1fffff
	Prefetchable memory behind bridge: 00000000d8000000-00000000dfffffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [88] Subsystem: Lenovo Device 2014
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee0300c  Data: 4161
	Capabilities: [a0] Express (v1) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- RBE- FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, ...
From: Tantilov, Emil S
Date: Tuesday, July 27, 2010 - 5:33 pm

When the crashes occur - is there a trace on the screen?

Do you know of a way to reproduce the issue? For example were
you downloading files, browsing internet, or using the ethernet device


Doesn't seem that there were any attachments to this email. 

Thanks,
Emil
From: Marc Haber
Date: Friday, July 30, 2010 - 5:54 am

Hi,


Not that I can see any. I'm working in X11, and the screen simply
freezes. Mouse doesn't move any more, clock stops. System is not
pingable, does not react to Ctrl-Alt-Bksp nor to any Magic Sysrq, nor
does it shut down when I have the power button issue an acpi shutdown

Sadly, no. I have it happen when typing in a remote ssh session, while
on the other hand hundreds of megabytes can be downloaded just fine.
It just happens "at random", two or three times a day. I have resorted
to using the WLAN instead of the wired ethernet, but that's not a


Attached as well.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
From: Tantilov, Emil S
Date: Wednesday, August 4, 2010 - 11:23 am

I have a T60 running with 2.6.34.1 using your config and so far no issues. 

Looking at your lspci output - your system has slightly different hardware, but I don't know if this is significant.

Are you loading the kernel with any parameters (cat /proc/cmdline)?

Do you have firewall configured (iptables -L)?


Thanks,
Emil

From: Marc Haber
Date: Tuesday, August 10, 2010 - 5:04 am

BOOT_IMAGE=/vmlinuz-2.6.35-zgws1 root=/dev/mapper/root ro

I am working pretty intensively with virtual machines which are natted
here and there. I have a handful of MASQUERADE rules in the

Tried, no improvement.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
--

From: Allan, Bruce W
Date: Tuesday, August 10, 2010 - 9:34 am

[adding e1000-devel, the Intel wired ethernet developers mailing list]

We have had other recent reports of issues with this part that are due to
ASPM L1 being enabled.  Would you please try disabling L1 after the driver
is loaded as follows (assuming your adapter is still at PCI
bus:device.function 02:00.0 as indicated in the lspci output you
provided earlier):
1) First check the hexadecimal value of the LnkCtl register -
# setpci -s 2:0.0 0xf0
2) Disable ASPM (both L0s and L1) by zeroing out bits 0 and 1 in the value
returned by the previous step.  For example, if it returned 42 (hex 42,
that is) -
# setpci -s 2:0.0 0xf0=0x40
3) Confirm ASPM is disabled by checking the output from lspci again.
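
The bit arithmetic in step 2 can be sketched in Python as follows. This only illustrates clearing bits 0 and 1 of the value read in step 1; the helper name is an assumption for this example, and the value still has to be written back with setpci as described above.

```python
ASPM_MASK = 0x03  # bits 0 (L0s) and 1 (L1) of the PCIe Link Control register


def disable_aspm(lnkctl):
    """Return the LnkCtl byte with both ASPM enable bits cleared."""
    return lnkctl & ~ASPM_MASK & 0xFF


if __name__ == "__main__":
    # setpci prints the register in hex without a 0x prefix, e.g. "42".
    current = int("42", 16)
    print(f"write back {disable_aspm(current):#04x}")
```

All other bits of the register are left as they were, which is why the example value 42 becomes 40 rather than 00.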

Please let us know if this helps your situation, thanks.
Bruce.
--

From: Marc Haber
Date: Thursday, August 12, 2010 - 3:50 am

Hi,



$ sudo setpci --version
setpci version 3.1.7                   
$ sudo setpci -s 2:0.0 0xf0
setpci: Missing width.
Try `setpci --help' for more information.
$

Looking at --help doesn't help me, sorry.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
--

From: Allan, Bruce W
Date: Thursday, August 12, 2010 - 9:12 am

Hmm, that's a newer version than I am familiar with.  Apparently in
more recent versions, the tool is requiring a width be specified for
unnamed registers and/or registers for which the width is unknown.
That being the case, append the width specifier .B (one byte) to the
register address.  For example:

# setpci -s 2:0.0 0xf0.B

HTH,
Bruce.
--

From: Marc Haber
Date: Thursday, August 12, 2010 - 6:25 pm

Hi Bruce,


It returned 42, and after setting 0x40, LnkCtl now says "ASPM Disabled".
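
To make the before and after values concrete, here is a small Python sketch that decodes the low two bits of the LnkCtl byte the way the PCI Express specification defines the ASPM Control field, using the same wording lspci prints. The function name is made up for this example.

```python
# ASPM Control field: bits 1:0 of the PCIe Link Control register.
ASPM_STATES = {
    0b00: "Disabled",
    0b01: "L0s Enabled",
    0b10: "L1 Enabled",
    0b11: "L0s L1 Enabled",
}


def decode_aspm(lnkctl):
    """Return the ASPM state encoded in the low two bits of LnkCtl."""
    return ASPM_STATES[lnkctl & 0x3]


for value in (0x42, 0x40):
    print(f"LnkCtl {value:#04x}: ASPM {decode_aspm(value)}")
```

Bit 6 (0x40), which is set in both values, is the Common Clock Configuration bit and is unrelated to ASPM.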

I'll dock the system now and will report after the weekend or after
the first crash.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
--

From: Marc Haber
Date: Friday, August 13, 2010 - 1:07 am

Didn't work, freeze within the first 60 minutes after starting serious
work. The system did sit idle for the night though without freezing.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
--

From: Marc Haber
Date: Tuesday, August 24, 2010 - 2:45 am

Hi,


As of 2.6.35.3, this issue has become worse. It now happens even
directly after a fresh boot that I cannot get a link on the wired
ethernet.

What information do you need to find out what's going on there?

Please Cc: me on replies

Greetings
Marc

$ sudo ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: pumbag
        Wake-on: g
        Current message level: 0x00000001 (1)
        Link detected: no
$
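
When this kind of report has to be collected repeatedly (for example after each suspend/resume cycle), the relevant fields can be pulled out of the ethtool output with a few lines of Python. This is a sketch that only understands the simple "Key: value" lines shown above; the function name and the shortened sample text are assumptions for illustration.

```python
def parse_ethtool(text):
    """Map single-line 'Key: value' fields of ethtool output to strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Keep only lines that carry a value after the colon. The
        # "Settings for eth0:" header and the continuation lines of
        # multi-line lists are skipped.
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


SAMPLE = """\
Settings for eth0:
        Speed: Unknown!
        Duplex: Unknown! (255)
        Auto-negotiation: on
        Link detected: no
"""

info = parse_ethtool(SAMPLE)
link_up = info.get("Link detected") == "yes"
print("link up" if link_up else "no link", "| speed:", info.get("Speed"))
```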

[    0.000000]   #38 [0002810840 - 0002810850]         BOOTMEM
[    0.000000]   #39 [0002810880 - 00028108f3]         BOOTMEM
[    0.000000]   #40 [0002810900 - 0002810973]         BOOTMEM
[    0.000000]   #41 [0002c00000 - 0002c10000]         BOOTMEM
[    0.000000]   #42 [0002e00000 - 0002e10000]         BOOTMEM
[    0.000000]   #43 [0002812980 - 0002812984]         BOOTMEM
[    0.000000]   #44 [00028129c0 - 00028129c4]         BOOTMEM
[    0.000000]   #45 [0002812a00 - 0002812a08]         BOOTMEM
[    0.000000]   #46 [0002812a40 - 0002812a48]         BOOTMEM
[    0.000000]   #47 [0002812a80 - 0002812b28]         BOOTMEM
[    0.000000]   #48 [0002812b40 - 0002812ba8]         BOOTMEM
[    0.000000]   #49 [0002812bc0 - 0002816bc0]         BOOTMEM
[    0.000000]   #50 [0002816bc0 - 0002896bc0]         BOOTMEM
[    0.000000]   #51 [0002896bc0 - 00028d6bc0]         BOOTMEM
[    ...
From: Tantilov, Emil S
Date: Wednesday, August 25, 2010 - 11:01 am

[Empty message]
From: Marc Haber
Date: Thursday, August 26, 2010 - 4:27 am

Attached, for an interface in the working state. I'll deliver another

Will do in the next days.

I guess it would be better to file two bugs, one for the crashes and
one for the no-link-in-some-situations issue, right? Or is it likely
that they both have the same cause?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
From: Marc Haber
Date: Thursday, August 26, 2010 - 10:14 am

Here we go, with a non-working interface.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
From: Tantilov, Emil S
Date: Thursday, August 26, 2010 - 2:31 pm

After reviewing the output from dmidecode I determined that your model T60 is slightly different than mine. It appears that you have the widescreen version. Is that correct?

Also you seem to be running a fairly old version of the BIOS (1.08). The latest is 1.18:
http://www-307.ibm.com/pc/support/site.wss/MIGR-67020.html

I would recommend that you upgrade your BIOS. If that does not help we can continue with the investigation. I will also try to locate a widescreen T60 that would hopefully help me with the repro. 



Thanks,
Emil
From: Marc Haber
Date: Friday, September 3, 2010 - 11:33 pm

Hi,


The dmidecode output is from the widescreen model, yes, but I also
have two "normal" T60s with the non-widescreen 15" display (with
1400x1050 pixels). The freezes happen on all three. The one I have at
hand is running BIOS 2.26 dated 2010-04-01. I will also try updating
the Widescreen unit which is - not surprisingly - the one I use the

Thanks for that pointer, I am having difficulties in navigating the

I can give you ssh access to mine if you want to. Do you have IPv6

Please note that the freezes usually happen when the network is only
lightly loaded, for example when I'm typing into an ssh window with
nothing else happening on the box. When I do things that are rather
traffic intensive, such as a backup, the box is fine. The "no link"
issue appears most frequently on a system that has been running for
some time and suspend-to-ram was used. I am traveling a lot, and every
change of train or bus involves a suspend-resume cycle.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
--

From: Tantilov, Emil S
Date: Tuesday, September 7, 2010 - 12:05 pm

[Empty message]
From: Marc Haber
Date: Sunday, September 12, 2010 - 2:04 pm

Hi,


No, IPv6 is just some kind of luxury. The majority of work is done via
IPv4. It is just easier to make the box accessible from the outside

I will try doing so. At the moment, I am pretty much traveling five
days a week and do not have much opportunity to use the wired
Ethernet, but I haven't seen the crashes recently. So it might as well

I will do that as soon as I see the issue again.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
--

From: Tantilov, Emil S
Date: Thursday, August 26, 2010 - 10:22 am

[Empty message]
From: Maciej W. Rozycki
Date: Thursday, August 26, 2010 - 11:51 am

If that helps -- there's a serial port option available for the T60's 
UltraBay too that works as ttyS0.

  Maciej
--

From: Marc Haber
Date: Friday, September 3, 2010 - 11:34 pm

Hi,


I have indeed recently bought a docking station. How do I obtain a
trace from a frozen system?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
--
