Re: 2.6.26/tg3 ping roundtrip times > 2000 ms on local network

From: Marc Haber
Date: Tuesday, August 19, 2008 - 10:20 am

Hi,

I have one HP DL 140 G1 running with Debian stable and a
locally-built, vanilla kernel. With 2.6.25.11, everything is fine.

But after updating to 2.6.26.2, I noticed that pinging the host on
the local Ethernet sometimes (several times a minute, but not
always) results in a round-trip time of more than two seconds. The
packets are not lost, though; they are only severely delayed. In an ssh
session to the host, this feels like somebody wiggling a bad network
connector. The same behavior is visible with 2.6.26 and 2.6.26.1.
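
For anyone who wants to quantify the symptom described above, here is a minimal sketch that scans saved ping output for replies slower than a threshold. It assumes the reply-line format printed by Linux iputils ping (`icmp_seq=... time=... ms`); the function name, the 2000 ms default, and the sample lines (including the 192.0.2.10 address) are made up for illustration and are not taken from the affected host.

```python
import re

# Matches the icmp_seq and "time=... ms" fields of an iputils ping reply line.
RTT_RE = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+) ms")


def slow_replies(ping_output, threshold_ms=2000.0):
    """Return (icmp_seq, rtt_ms) for every reply slower than threshold_ms."""
    slow = []
    for line in ping_output.splitlines():
        match = RTT_RE.search(line)
        if match:
            seq, rtt = int(match.group(1)), float(match.group(2))
            if rtt > threshold_ms:
                slow.append((seq, rtt))
    return slow


# Invented sample data, only to show the expected input shape.
SAMPLE = """\
64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=0.212 ms
64 bytes from 192.0.2.10: icmp_seq=2 ttl=64 time=2315 ms
64 bytes from 192.0.2.10: icmp_seq=3 ttl=64 time=0.198 ms
"""

print(slow_replies(SAMPLE))  # [(2, 2315.0)]
```

Feeding it the captured output of a long-running ping against each kernel version would give a rough count of delayed replies per run.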

Going back to 2.6.25.11 immediately fixes the issue for me.

Syslog doesn't say anything conspicuous, unfortunately. As I don't
have local access to the box, I cannot say whether it's only the
network that freezes or whether it's the entire box.

Does it make sense to test any later 2.6.25.x kernel, or is there any
post-2.6.26 patch available that may fix the issue for me?

Strangely, another DL 140 from the same batch runs just fine with
2.6.26.2.

Here is the output of lspci -vvv for the network interface:

02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 02)
        Subsystem: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (16000ns min), Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 19
        Region 0: Memory at febc0000 (64-bit, non-prefetchable) [size=64K]
        Region 2: Memory at febb0000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [40] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=2048 OST=1
                Status: Dev=02:00.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
        Capabilities: [48] Power Management version ...
From: David Miller
Date: Tuesday, August 19, 2008 - 1:20 pm

From: Marc Haber <mh+linux-kernel@zugschlus.de>
Date: Tue, 19 Aug 2008 19:20:19 +0200

--

From: Michael Chan
Date: Tuesday, August 19, 2008 - 10:29 am

This may be the 2.5 second polling that was mistakenly added to the code
instead of the intended 2.5 msec.

It was fixed a few days ago in the net-2.6 tree:

tg3: Fix firmware event timeouts

and it should be in Linus' tree very soon.  This reminds us to send the
same patch to -stable.

Thanks.
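
To illustrate how a seconds-versus-milliseconds slip of this kind turns into multi-second stalls, here is a rough sketch of a bounded polling loop. It is not the tg3 driver code: the function names, the 10 microsecond poll interval, and the injected sleep callback are all hypothetical. The point is only that when the polled condition is never met, the loop runs for the full timeout, so a value that is 1000 times too large costs 2.5 s instead of 2.5 ms on every wait.

```python
# Hypothetical names and values; this is NOT the tg3 driver code.
POLL_INTERVAL_USEC = 10


def wait_for_ack(is_acked, timeout_usec, sleep_usec):
    """Poll is_acked() until it returns True or timeout_usec has elapsed.

    Returns the number of microseconds spent waiting.
    """
    waited = 0
    while waited < timeout_usec:
        if is_acked():
            break
        sleep_usec(POLL_INTERVAL_USEC)
        waited += POLL_INTERVAL_USEC
    return waited


def never_acked():
    return False  # worst case: the acknowledgement never arrives


def no_sleep(_usec):
    pass  # stand-in for a real delay so the sketch runs instantly


intended = wait_for_ack(never_acked, 2_500, no_sleep)      # 2.5 ms budget
mistaken = wait_for_ack(never_acked, 2_500_000, no_sleep)  # 2.5 s budget
print(intended / 1000, "ms vs", mistaken / 1000, "ms")
```

If the acknowledgement arrives promptly, both budgets behave the same, which is one possible reason such a bug would only show up on some machines.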


--

From: Marc Haber
Date: Tuesday, August 19, 2008 - 3:14 pm

I don't have much of a clue about git; can you send me the patch to try
locally?

I am not even sure that I have a network issue here.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
--

From: Matt Carlson
Date: Tuesday, August 19, 2008 - 3:30 pm

Can you try the attached patch?  The patch reduces the delay back to
what it should have been.  If this helps, then it means you are being
bitten by the same bug the upstream patch fixed.

From: Marc Haber
Date: Wednesday, August 20, 2008 - 10:47 am

It looks like the issue is fixed now. Thanks for your help.

Will this fix be in 2.6.26.3 and/or 2.6.27?

Now I need to understand why the other, nearly[1] identical box didn't
need the patch to function properly.

Greetings
Marc

[1] only difference is that the working box has two e1000 interfaces
on a PCI card in addition to the two tg3 interfaces on board (all four
of them being in use)

--

From: Michael Chan
Date: Wednesday, August 20, 2008 - 7:11 am

It was just submitted to -stable so it should appear in 2.6.26.3 or .4.

It depends on whether you have ASF enabled or not and what version of
ASF you have.  ASF is management firmware running inside the NIC.  When


--

From: Marc Haber
Date: Thursday, August 21, 2008 - 9:09 am

Both machines have that string in their dmesg.

Greetings
Marc

--

From: Michael Chan
Date: Thursday, August 21, 2008 - 9:21 am

It may be different versions of ASF.  Try ethtool -i eth0.  It
may tell us the firmware version.

--

From: Marc Haber
Date: Friday, August 22, 2008 - 4:33 am

$ sudo ethtool -i eth0
driver: tg3
version: 3.92.1
firmware-version: 5704-v3.26
bus-info: 0000:02:00.0

The other box reports the same, except that bus-info ends in .1 and the
.0 device is unused.
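
A comparison like this can be automated. The sketch below parses `ethtool -i` output into a dictionary and lists the fields that differ between two hosts. The helper names are hypothetical, and the second box's output is reconstructed from the description above on the assumption that only the final digit of bus-info changes.

```python
def parse_ethtool_info(text):
    """Turn 'key: value' lines from `ethtool -i` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only; bus-info values contain colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def differing_fields(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


BOX_A = """\
driver: tg3
version: 3.92.1
firmware-version: 5704-v3.26
bus-info: 0000:02:00.0
"""
# Assumption: the other box differs only in the last digit of bus-info.
BOX_B = BOX_A.replace("0000:02:00.0", "0000:02:00.1")

print(differing_fields(parse_ethtool_info(BOX_A), parse_ethtool_info(BOX_B)))
# {'bus-info': ('0000:02:00.0', '0000:02:00.1')}
```

With identical driver and firmware-version fields, the comparison shows no firmware difference between the two boxes, only the port in use.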

Greetings
Marc

--
