Re: tg3 driver not advertising 1000mbit

Previous thread: net: fix network drivers ndo_start_xmit() return values (part 3) by Patrick McHardy on Friday, June 12, 2009 - 7:08 am. (2 messages)

Next thread: net: fix network drivers ndo_start_xmit() return values (part 4) by Patrick McHardy on Friday, June 12, 2009 - 7:37 am. (1 message)
From: Jean-Louis Dupond
Date: Friday, June 12, 2009 - 7:05 am

Hello,

I'm experiencing a problem with the "Broadcom Corporation NetXtreme
BCM5722 Gigabit Ethernet PCI Express" network card in my Dell R300
servers. When the server boots, the network card sometimes doesn't
advertise gigabit speeds, so it auto-negotiates at 10 Mbit. The speed
can then be set to gigabit with mii-tool, but that's not a good solution!
On the other hand, sometimes it boots and just works perfectly: it
advertises gigabit speeds and auto-negotiates at gigabit!

@ 10mbit & gigabit.


On line 809, frame_val is filled with a read (tr32) from the device.
On line 810, we check whether the MI_COM_BUSY bit is clear.
If it is, we delay 5 (microseconds) and then read it again?!
I don't know why the value is read twice! I checked some other
drivers (also Broadcom), and there the read command was given a
different argument when checking the BUSY state, and then inside the
if it really fetched the data. But in this case, we pass the same
argument both times!

With the original code, the server came up at 10 Mbit on half of the
boots! When I removed lines 811 and 812, it came up at 10 Mbit only
1 in 20 times, which is much better, but it's still not fully fixed!

---------------------------------------------------------------------------

Today I found the programmer's documentation on the Broadcom website and it

Here you can see it's NOT necessary to read the value twice.
There is also no delay mentioned, so I removed it and rebooted 20 times
without the link coming up at 10 Mbit even once!


Feel free to suggest other solutions or comments!

Sincerely,
Jean-Louis Dupond
--

From: Michael Chan
Date: Friday, June 12, 2009 - 11:37 am

This code was written like this to make sure we get the correct MDIO
data.  The data is supposed to be valid when the MI_COM_BUSY bit is
cleared.  But on some chips, the data may not be ready until some
microseconds after the BUSY bit is cleared.

When you see the "wrong" speed being established, please provide the mii
register dump using mii-tool -vvv eth0.  We'll then be able to see what
we advertised and what the other side advertised.



--

From: Jean-Louis Dupond
Date: Friday, June 12, 2009 - 2:51 pm

Hello!

Here is the mii-tool -vvv output from a box that doesn't advertise
gigabit speeds! It fails to advertise gigabit at random: sometimes it
does, sometimes not, without any pattern!

# mii-tool  -vvv
Using SIOCGMIIPHY=0x8947
eth0: link ok
    registers for MII PHY 1:
      1000 794d 0143 bed0 05e1 0000 0064 2001
      0000 0300 0000 0000 0000 0000 0000 3000
      0000 0101 0000 0000 0000 0000 0000 0000
      7477 0104 0000 ffff 2801 0000 8000 0000
    product info: vendor 00:50:ef, model 45 rev 0
    basic mode:   autonegotiation enabled
    basic status: link ok
    capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
    advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

# dmesg | grep tg3
[    3.331702] tg3.c:v3.92.1 (June 9, 2008)
[   18.238654] tg3: eth0: Link is up at 10 Mbps, half duplex.
[   18.238654] tg3: eth0: Flow control is off for TX and off for RX.


Sincerely,
Jean-Louis Dupond

--

From: Michael Chan
Date: Friday, June 12, 2009 - 3:01 pm

Register 1 shows that autoneg did not complete (bit 5 is not set).
The tg3 device has advertised 10/100/1000 in register 4 and register 9,
but registers 5 and 0xa (the link partner's advertisement registers) are 0.

When it works, these registers should look very different.




--

From: Jean-Louis Dupond
Date: Friday, June 12, 2009 - 3:16 pm

I definitely tried other cables. But it's surely not the cable, because:
1) We tried other cables
2) After a reboot, or after a mii-tool -R, it works
3) We have this issue on 100+ servers :(

The servers are connected on a Dell PowerConnect 6248 switch.

Sincerely

--

From: Michael Chan
Date: Friday, June 12, 2009 - 6:31 pm

I'll ask our lab to test this.  Thanks.

--

From: Matt Carlson
Date: Wednesday, June 24, 2009 - 10:36 am

Was there a version of the kernel where this device worked reliably?

Can you post the driver sign-on messages?  (I'm looking to see if ASF
is enabled.)


--

From: Jean-Louis Dupond
Date: Friday, June 26, 2009 - 2:33 pm

Hi,

All tested kernels had the same issue (2.6.22, 2.6.25.4 and 2.6.28.1).

dmesg |grep tg3:

tg3.c:v3.97 (December 10, 2008)
tg3 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
tg3 0000:01:00.0: setting latency timer to 64
tg3 0000:01:00.0: PME# disabled
tg3 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
tg3 0000:02:00.0: setting latency timer to 64
tg3 0000:02:00.0: PME# disabled
tg3 0000:01:00.0: PME# disabled
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.

This time it booted at gigabit speed!

--

From: Matt Carlson
Date: Friday, June 26, 2009 - 7:26 pm

O.K.  So that means the problem wasn't recently introduced.  Good to

The specific set of lines I'm looking for looks something like this:

eth0: Tigon3 [partno(BCM95722) rev a200] (PCI Express) MAC address 00:10:18:15:16:b6
eth0: attached PHY is 5722/5756 (10/100/1000Base-T Ethernet) (WireSpeed[1])
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
                                        ^^^^^^
eth0: dma_rwctrl[76180000] dma_mask[64-bit]


Like Michael, I'm tempted to look into the cabling too.  How long are
they?  Silly question, but are they Cat5e?  When link does come up, does

--

From: Jean-Louis Dupond
Date: Saturday, June 27, 2009 - 2:58 am

Hi,

I got the output for both gigabit and 100 Mbit now.

GBIT:
eth0: Tigon3 [partno(BCM95722) rev a200] (PCI Express) MAC address 00:22:19:be:c4:48
eth0: attached PHY is 5722/5756 (10/100/1000Base-T Ethernet) (WireSpeed[1])
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
eth0: dma_rwctrl[76180000] dma_mask[64-bit]
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.


100 Mbit (sometimes it comes up at 100 Mbit instead of gigabit, too):
eth0: Tigon3 [partno(BCM95722) rev a200 PHY(5722/5756)] (PCI Express) 10/100/1000Base-T Ethernet 00:22:19:c7:28:c3
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[76180000] dma_mask[64-bit]
tg3: eth0: Link is up at 100 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.


The cables are 2 meters long and Cat 5e. The servers are in racks, so
everything is close together :)


--

From: Matt Carlson
Date: Monday, June 29, 2009 - 11:50 am

O.K.  ASF is enabled.  Can you give me the output of 'ethtool -i eth0'
on as late a kernel (or driver version) as you can?  This should give me
the firmware version number.


--

From: Jean-Louis Dupond
Date: Tuesday, June 30, 2009 - 2:20 am

# ethtool -i eth0
driver: tg3
version: 3.97
firmware-version: 5722-v3.08, ASFIPMI v6.02
bus-info: 0000:01:00.0

Kernel version 2.6.29.4

--

From: Matt Carlson
Date: Thursday, July 2, 2009 - 9:42 am

Rats.  I mirrored your setup here, but I still can't reproduce the
problem.  I still suspect this is a bad driver <=> firmware interaction.

Can you apply the following patch and show me the resulting syslog
entries?  The patch just makes sure the firmware request to shut down
really goes through.


diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 46a3f86..900e28b 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -1124,6 +1124,9 @@ static void tg3_wait_for_event_ack(struct tg3 *tp)
 			break;
 		udelay(8);
 	}
+
+	if (i == delay_cnt)
+		printk( KERN_WARNING "Firmware didn't ack driver event!\n" );
 }
 
 /* tp->lock is held. */
@@ -6330,12 +6333,16 @@ static void tg3_stop_fw(struct tg3 *tp)
 		/* Wait for RX cpu to ACK the previous event. */
 		tg3_wait_for_event_ack(tp);
 
+		printk( KERN_NOTICE "%s: Stopping firmware.\n", tp->dev->name );
+
 		tg3_write_mem(tp, NIC_SRAM_FW_CMD_MBOX, FWCMD_NICDRV_PAUSE_FW);
 
 		tg3_generate_fw_event(tp);
 
 		/* Wait for RX cpu to ACK this event. */
 		tg3_wait_for_event_ack(tp);
+
+		printk( KERN_NOTICE "%s: Operation completed.\n", tp->dev->name );
 	}
 }
 
@@ -7537,6 +7544,8 @@ static void tg3_timer(unsigned long __opaque)
 		    !(tp->tg3_flags3 & TG3_FLG3_ENABLE_APE)) {
 			tg3_wait_for_event_ack(tp);
 
+			printk( KERN_NOTICE "%s: Sending keepalive event.\n", tp->dev->name );
+
 			tg3_write_mem(tp, NIC_SRAM_FW_CMD_MBOX,
 				      FWCMD_NICDRV_ALIVE3);
 			tg3_write_mem(tp, NIC_SRAM_FW_CMD_LEN_MBOX, 4);

--

From: Krzysztof Olędzki
Date: Wednesday, November 24, 2010 - 1:09 pm

Hello,

Have you been able to solve this issue? I have a similar problem with
Dell PowerEdge R300 servers connected to HP2610 100Mbps switches. The
servers contain two BCM5722 NICs, and after a reboot, with a probability
of about 70%, I end up with 10Mbps half duplex, mainly on the first NIC.

I discovered that it is enough to run:
  /sbin/mii-tool -R eth0
  /sbin/mii-tool -R eth1
to trigger a renegotiation that brings up the expected 100Mbps full
duplex. For now, I have added this to my startup scripts as a workaround.

This problem exists in 2.6.30-stable, 2.6.31-stable and 2.6.34-stable,
which I'm currently running.

Best regards,

			Krzysztof Olędzki
--

From: Jean-Louis Dupond
Date: Wednesday, November 24, 2010 - 3:27 pm

I didn't do any further research on the issue.

The guys at Broadcom advised me to do a BIOS update, so that the NIC
firmware gets updated.

Maybe you can try that as well?

Sincerely,
Jean-Louis Dupond

--
