Hi,
I have one HP DL 140 G1 running Debian stable and a
locally-built, vanilla kernel. With 2.6.25.11, everything is fine.
But, after updating to 2.6.26.2, I noticed that pinging the host on
the local ethernet sometimes (e.g. several times a minute, but not
always) results in a round-trip time of more than two seconds. The
packets are not lost, though; they are only severely delayed. For an ssh
session to the host, this feels like somebody rocking a bad network
connector. The same behavior is visible with 2.6.26 and 2.6.26.1.
Going back to 2.6.25.11 immediately fixes the issue for me.
Syslog doesn't say anything conspicuous, unfortunately. As I don't
have local access to the box, I cannot say whether it's only the
network that freezes or whether it's the entire box.
Does it make sense to test any later 2.6.25.x kernel, or is there any
post-2.6.26 patch available that may fix the issue for me?
Strangely, another DL 140 from the same batch runs just fine with
2.6.26.2.
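The symptom described above (replies that arrive, but only after more than two seconds) can be counted rather than eyeballed. The following is a minimal sketch, assuming the per-reply line format printed by iputils ping (`icmp_seq=N ... time=X ms`); the function name, the threshold default, and the sample lines are made up for illustration and are not measurements from the affected box.

```python
import re

# Matches the sequence number and round-trip time in one ping reply line.
# Assumes iputils-style output such as "icmp_seq=2 ttl=64 time=2431 ms".
RTT_RE = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+) ms")


def slow_replies(ping_output, threshold_ms=2000.0):
    """Return (icmp_seq, rtt_ms) for every reply slower than threshold_ms."""
    slow = []
    for line in ping_output.splitlines():
        match = RTT_RE.search(line)
        if match is None:
            continue  # header, summary, or otherwise unrelated line
        seq = int(match.group(1))
        rtt_ms = float(match.group(2))
        if rtt_ms > threshold_ms:
            slow.append((seq, rtt_ms))
    return slow


# Illustrative sample only (192.0.2.10 is a documentation address).
SAMPLE = """\
64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=0.212 ms
64 bytes from 192.0.2.10: icmp_seq=2 ttl=64 time=2431 ms
64 bytes from 192.0.2.10: icmp_seq=3 ttl=64 time=0.198 ms
"""

delayed = slow_replies(SAMPLE)
```

Feeding it the captured output of a longer ping run on each kernel would give a rough count of delayed replies to compare between 2.6.25.11 and 2.6.26.2.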
Here is the output of lspci -vvv for the network interface:
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 02)
	Subsystem: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64 (16000ns min), Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 19
	Region 0: Memory at febc0000 (64-bit, non-prefetchable) [size=64K]
	Region 2: Memory at febb0000 (64-bit, non-prefetchable) [size=64K]
	Expansion ROM at <ignored> [disabled]
	Capabilities: [40] PCI-X non-bridge device
		Command: DPERE- ERO- RBC=2048 OST=1
		Status: Dev=02:00.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
	Capabilities: [48] Power Management version ...

From: Marc Haber <mh+linux-kernel@zugschlus.de>
Date: Tue, 19 Aug 2008 19:20:19 +0200
--
This may be the 2.5 second polling that was mistakenly added to the code instead of the intended 2.5 msec. It was fixed a few days ago in the net-2.6 tree:

    tg3: Fix firmware event timeouts

and it should be in Linus' tree very soon. This reminds us to send the same patch to -stable. Thanks.
--
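The unit mix-up described above (seconds where milliseconds were intended) amounts to a factor of 1000 in the worst-case wait. The following is a minimal sketch of that arithmetic only, not the tg3 driver code: the function name, the 10-microsecond poll interval, and the loop model are all assumptions made for illustration.

```python
def worst_case_wait_us(timeout_value, unit_in_us, poll_interval_us=10):
    """Worst-case time, in microseconds, that a polling loop spends waiting.

    timeout_value    -- the numeric timeout as written in the code
    unit_in_us       -- how many microseconds one unit of timeout_value is
    poll_interval_us -- hypothetical delay between two polls
    """
    budget_us = timeout_value * unit_in_us
    iterations = budget_us // poll_interval_us
    # If the awaited event never arrives, every iteration sleeps once.
    return iterations * poll_interval_us


# The same number, 2500, read in two different units:
intended_us = worst_case_wait_us(2500, unit_in_us=1)      # 2500 us = 2.5 msec
mistaken_us = worst_case_wait_us(2500, unit_in_us=1000)   # 2500 ms = 2.5 sec
```

A wait of about 2.5 seconds per stalled poll would be consistent with ping replies that are delayed by more than two seconds but never lost.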
Not having much clue about git, can you send me the patch to try locally? I am not even sure that I have a network issue here.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Email address in header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
--
Can you try the attached patch? The patch reduces the delay back to what it should have been. If this helps, then it means you are being bitten by the same bug the upstream patch fixed.
It looks like the issue is fixed now. Thanks for your help. Will this fix be in 2.6.26.3 and/or 2.6.27?

Now I need to understand why the other, nearly[1] identical box didn't need the patch to function properly.

Greetings
Marc

[1] only difference is that the working box has two e1000 interfaces on a PCI card in addition to the two tg3 interfaces on board (all four of them being in use)
--
It was just submitted to -stable, so it should appear in 2.6.26.3 or .4.

Whether a box is affected depends on whether you have ASF enabled or not and what version of ASF you have. ASF is management firmware running inside the NIC. When --
Both machines have that string in their dmesg.

Greetings
Marc
--
It may be different versions of ASF. Try ethtool -i eth0. It may tell us the firmware version. --
$ sudo ethtool -i eth0
driver: tg3
version: 3.92.1
firmware-version: 5704-v3.26
bus-info: 0000:02:00.0

The other box is the same, only that bus-info ends in .1, and the .0 device is unused.

Greetings
Marc
--
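Comparing the `ethtool -i` output of the two boxes field by field, as done by hand above, can be sketched as follows. This is a minimal illustration, assuming the `key: value` layout shown in the output above; the function names are hypothetical, and the second box's full bus-info value (`0000:02:00.1`) is inferred from the remark that it "ends in .1".

```python
def parse_ethtool_info(text):
    """Parse `ethtool -i` style "key: value" lines into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so "0000:02:00.0" stays intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def differing_keys(a, b):
    """Return the sorted keys whose values differ between two parsed outputs."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


BOX_A = """\
driver: tg3
version: 3.92.1
firmware-version: 5704-v3.26
bus-info: 0000:02:00.0
"""

# Assumed from the description: identical except for the bus-info suffix.
BOX_B = BOX_A.replace("0000:02:00.0", "0000:02:00.1")

info_a = parse_ethtool_info(BOX_A)
info_b = parse_ethtool_info(BOX_B)
diff = differing_keys(info_a, info_b)
```

With the values reported in this thread, only bus-info differs, so the reported driver and firmware version strings alone do not explain why one box is affected and the other is not.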
