Hello,

I'm experiencing a problem with the "Broadcom Corporation NetXtreme BCM5722 Gigabit Ethernet PCI Express" network card in my Dell R300 servers. When the server boots, the network card sometimes doesn't advertise gigabit speeds, so it auto-negotiates at 10 Mbit. The speed can then be set to gigabit with mii-tool, but that's not a good solution! On other boots it just works perfectly: it advertises gigabit speeds and auto-negotiates at gigabit.

On line 809, frame_val is filled with a read (tr32) from the device. On line 810 we check whether it is no longer 'MI_COM_BUSY'. If it isn't busy, there is a delay of 5, and then we read it again?! I don't know why the value is read twice! I checked some other drivers (also Broadcom), and there the read command was given a different argument when checking the BUSY state, and then, inside the if, it actually fetched the data. But in this case, both reads use the same argument!

With the original code, the server booted at 10 Mbit on half of the boots! When I removed lines 811 and 812, it booted at 10 Mbit only 1 time in 20, which is way better! But it's still not fully fixed!

---------------------------------------------------------------------------

Today I found the programmer's documentation on the Broadcom website. Here you can see that it's NOT necessary to read the value twice. There is also no delay mentioned, so I removed it, and rebooted 20 times without a single boot at 10 Mbit!

Feel free to give any other solutions / comments!

Sincerely,
Jean-Louis Dupond
--
This code was written like this to make sure we get the correct MDIO data. The data is supposed to be valid when the MI_COM_BUSY bit is cleared. But on some chips, the data may not be ready until some microseconds after the BUSY bit is cleared. When you see the "wrong" speed being established, please provide the mii register dump using mii-tool -vvv eth0. We'll then be able to see what we advertised and what the other side advertised. --
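The two positions above (read once after BUSY clears vs. wait and read again) can be made concrete with a small simulation. This is a minimal sketch, not the tg3 driver's code: MI_COM_BUSY and MI_COM_DATA_MASK are my recollection of tg3.h and should be treated as assumptions, and PHY_BUSY_LOOPS, fake_mi_com, fake_tr32, fake_udelay and read_phy are hypothetical stand-ins. It also assumes, as a modelling simplification, that "data valid some microseconds after BUSY clears" can be represented as "data valid one read later".

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the MI_COM bit layout in tg3.h (from memory). */
#define MI_COM_BUSY       0x20000000u
#define MI_COM_DATA_MASK  0x0000ffffu
#define PHY_BUSY_LOOPS    5000   /* hypothetical poll limit */

/* Hypothetical fake MI_COM register: reports BUSY for busy_reads reads,
 * then returns a stale data word for stale_reads reads, then the real one. */
struct fake_mi_com {
    int busy_reads;
    int stale_reads;
    uint16_t stale;
    uint16_t value;
};

static uint32_t fake_tr32(struct fake_mi_com *r)
{
    if (r->busy_reads > 0) {
        r->busy_reads--;
        return MI_COM_BUSY;
    }
    if (r->stale_reads > 0) {
        r->stale_reads--;
        return r->stale;
    }
    return r->value;
}

/* Stand-in for udelay(); the model ignores time and counts reads instead. */
static void fake_udelay(unsigned int usecs)
{
    (void)usecs;
}

/* Poll until BUSY clears.  With reread set, wait and read once more, as the
 * original driver code does; without it, use the first non-BUSY read, as in
 * the modified code.  Returns false if BUSY never clears. */
static bool read_phy(struct fake_mi_com *r, bool reread, uint16_t *val)
{
    uint32_t frame_val = 0;
    int loops = PHY_BUSY_LOOPS;

    while (loops != 0) {
        fake_udelay(10);
        frame_val = fake_tr32(r);
        if ((frame_val & MI_COM_BUSY) == 0) {
            if (reread) {
                fake_udelay(5);
                frame_val = fake_tr32(r);
            }
            break;
        }
        loops--;
    }
    if (loops == 0)
        return false;
    *val = (uint16_t)(frame_val & MI_COM_DATA_MASK);
    return true;
}
```

With a register that lags BUSY by one read, reread = false hands back the stale word, while reread = true picks up the settled value. Whether the BCM5722 actually behaves like this model is exactly the open question here; the sketch only shows why the second read exists.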
Hello!
Here is the mii-tool -vvv output from a box that doesn't advertise gigabit speeds! It
just fails to advertise gigabit at random: sometimes it does, sometimes it doesn't,
without any pattern!
# mii-tool -vvv
Using SIOCGMIIPHY=0x8947
eth0: link ok
registers for MII PHY 1:
1000 794d 0143 bed0 05e1 0000 0064 2001
0000 0300 0000 0000 0000 0000 0000 3000
0000 0101 0000 0000 0000 0000 0000 0000
7477 0104 0000 ffff 2801 0000 8000 0000
product info: vendor 00:50:ef, model 45 rev 0
basic mode: autonegotiation enabled
basic status: link ok
capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
# dmesg | grep tg3
[ 3.331702] tg3.c:v3.92.1 (June 9, 2008)
[ 18.238654] tg3: eth0: Link is up at 10 Mbps, half duplex.
[ 18.238654] tg3: eth0: Flow control is off for TX and off for RX.
Sincerely,
Jean-Louis Dupond
--
Register 1 shows that autoneg did not complete (bit 5 is not set). The tg3 device advertised 10/100/1000 in registers 4 and 9, but registers 5 and 0xa (the link partner's advertisement registers) are 0. When it works, these registers should look very different.
--
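The reading of the dump above can be checked mechanically. Below is a minimal C sketch that decodes the 32 register words printed by mii-tool -vvv, using the standard clause-22 bit values (the same values as MII_BMSR, BMSR_ANEGCOMPLETE, ADVERTISE_100FULL, ADVERTISE_1000FULL and LPA_1000FULL in <linux/mii.h>); the macro names, struct mii_summary and decode_mii() are hypothetical. The array is the failing-boot dump from earlier in this thread, registers 0 to 31 in the order mii-tool prints them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clause-22 register numbers and bits; values match <linux/mii.h>,
 * names are local to this sketch. */
#define REG_BMSR        0x01
#define REG_ADVERTISE   0x04
#define REG_LPA         0x05
#define REG_CTRL1000    0x09
#define REG_STAT1000    0x0a

#define BMSR_LINK_UP    0x0004  /* reg 1 bit 2 */
#define BMSR_ANEG_DONE  0x0020  /* reg 1 bit 5 */
#define ADV_100FULL     0x0100  /* reg 4: we advertise 100baseTx-FD */
#define ADV_1000FULL    0x0200  /* reg 9: we advertise 1000baseT-FD */
#define LPA_1000FULL    0x0800  /* reg 0xa: partner advertises 1000baseT-FD */

struct mii_summary {
    bool link_up;
    bool aneg_complete;
    bool adv_100full;
    bool adv_1000full;
    bool partner_lpa_nonzero;   /* reg 5 (link partner ability) non-zero */
    bool partner_1000full;
};

/* Registers 0-31 from the failing boot, 8 per row as mii-tool prints them. */
static const uint16_t failing_boot_dump[32] = {
    0x1000, 0x794d, 0x0143, 0xbed0, 0x05e1, 0x0000, 0x0064, 0x2001,
    0x0000, 0x0300, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x3000,
    0x0000, 0x0101, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
    0x7477, 0x0104, 0x0000, 0xffff, 0x2801, 0x0000, 0x8000, 0x0000,
};

static struct mii_summary decode_mii(const uint16_t regs[32])
{
    struct mii_summary s;

    s.link_up             = (regs[REG_BMSR] & BMSR_LINK_UP) != 0;
    s.aneg_complete       = (regs[REG_BMSR] & BMSR_ANEG_DONE) != 0;
    s.adv_100full         = (regs[REG_ADVERTISE] & ADV_100FULL) != 0;
    s.adv_1000full        = (regs[REG_CTRL1000] & ADV_1000FULL) != 0;
    s.partner_lpa_nonzero = regs[REG_LPA] != 0;
    s.partner_1000full    = (regs[REG_STAT1000] & LPA_1000FULL) != 0;
    return s;
}
```

On the failing-boot dump this reports link up but autoneg not complete, both 100baseTx-FD and 1000baseT-FD advertised locally, and nothing from the link partner, which is the pattern described above. Running it on a dump from a good boot would show where the two differ.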
I did try other cables, but it's definitely not the cable, because:
1) we tried other cables;
2) after a reboot, or after a mii-tool -R, it works;
3) we have this issue on more than 100 servers :(
The servers are connected to a Dell PowerConnect 6248 switch.
Sincerely
--
Was there a version of the kernel where this device worked reliably? Can you post the driver sign-on messages? (I'm looking to see if ASF is enabled.) --
Hi,
All tested kernels had the same issue (2.6.22, 2.6.25.4 & 2.6.28.1).
dmesg | grep tg3:
tg3.c:v3.97 (December 10, 2008)
tg3 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
tg3 0000:01:00.0: setting latency timer to 64
tg3 0000:01:00.0: PME# disabled
tg3 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
tg3 0000:02:00.0: setting latency timer to 64
tg3 0000:02:00.0: PME# disabled
tg3 0000:01:00.0: PME# disabled
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
This time it booted at gigabit speed!
--
O.K. So that means the problem wasn't recently introduced. Good to know.
The specific set of lines I'm looking for looks something like this:
eth0: Tigon3 [partno(BCM95722) rev a200] (PCI Express) MAC address 00:10:18:15:16:b6
eth0: attached PHY is 5722/5756 (10/100/1000Base-T Ethernet) (WireSpeed[1])
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
                                        ^^^^^^
eth0: dma_rwctrl[76180000] dma_mask[64-bit]
Like Michael, I'm tempted to look into the cabling too. How long are
they? Silly question, but are they Cat5e? When link does come up, does
--
Hi,
Got the output for both gigabit and 100 Mbit now.
GBIT:
eth0: Tigon3 [partno(BCM95722) rev a200] (PCI Express) MAC address 00:22:19:be:c4:48
eth0: attached PHY is 5722/5756 (10/100/1000Base-T Ethernet) (WireSpeed[1])
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
eth0: dma_rwctrl[76180000] dma_mask[64-bit]
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
100MBIT (sometimes it comes up at 100 Mbit instead of gigabit, too):
eth0: Tigon3 [partno(BCM95722) rev a200 PHY(5722/5756)] (PCI Express) 10/100/1000Base-T Ethernet 00:22:19:c7:28:c3
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[76180000] dma_mask[64-bit]
tg3: eth0: Link is up at 100 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
The cables are 2 meters long and Cat5e. The servers are in racks, so everything is close together :)
--
O.K. ASF is enabled. Can you give me the output of 'ethtool -i eth0' on as late a kernel (or driver version) as you can? This should give me the firmware version number. --
# ethtool -i eth0
driver: tg3
version: 3.97
firmware-version: 5722-v3.08, ASFIPMI v6.02
bus-info: 0000:01:00.0
Kernel version 2.6.29.4
--
Rats. I mirrored your setup here, but I still can't reproduce the
problem. I still suspect this is a bad driver <=> firmware interaction.
Can you apply the following patch and show me the resulting syslog
entries? The patch just checks that the request asking the firmware to
shut down really goes through.
diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 46a3f86..900e28b 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -1124,6 +1124,9 @@ static void tg3_wait_for_event_ack(struct tg3 *tp)
 			break;
 		udelay(8);
 	}
+
+	if (i == delay_cnt)
+		printk( KERN_WARNING "Firmware didn't ack driver event!\n" );
 }
 
 /* tp->lock is held. */
@@ -6330,12 +6333,16 @@ static void tg3_stop_fw(struct tg3 *tp)
 		/* Wait for RX cpu to ACK the previous event. */
 		tg3_wait_for_event_ack(tp);
 
+		printk( KERN_NOTICE "%s: Stopping firmware.\n", tp->dev->name );
+
 		tg3_write_mem(tp, NIC_SRAM_FW_CMD_MBOX, FWCMD_NICDRV_PAUSE_FW);
 
 		tg3_generate_fw_event(tp);
 
 		/* Wait for RX cpu to ACK this event. */
 		tg3_wait_for_event_ack(tp);
+
+		printk( KERN_NOTICE "%s: Operation completed.\n", tp->dev->name );
 	}
 }
 
@@ -7537,6 +7544,8 @@ static void tg3_timer(unsigned long __opaque)
 		    !(tp->tg3_flags3 & TG3_FLG3_ENABLE_APE)) {
 			tg3_wait_for_event_ack(tp);
 
+			printk( KERN_NOTICE "%s: Sending keepalive event.\n", tp->dev->name );
+
 			tg3_write_mem(tp, NIC_SRAM_FW_CMD_MBOX,
 				      FWCMD_NICDRV_ALIVE3);
 			tg3_write_mem(tp, NIC_SRAM_FW_CMD_LEN_MBOX, 4);
--
Hello,

Have you been able to solve this issue? I have a similar problem with Dell PowerEdge R300 servers connected to HP2610 100 Mbps switches. The servers contain two BCM5722 NICs, and after a reboot, about 70% of the time, I end up with 10 Mbps half duplex, mainly on the first NIC.

I discovered that it is enough to run:
/sbin/mii-tool -R eth0
/sbin/mii-tool -R eth1
to trigger a renegotiation that brings up the expected 100 Mbps full duplex. For now, I've added this to my startup scripts as a workaround.

This problem exists in 2.6.30-stable, 2.6.31-stable and 2.6.34-stable, which I'm currently running.

Best regards,
Krzysztof Olędzki
--
I didn't do any more research on the issue. The guys at Broadcom advised me to do a BIOS update, which also updates the NIC firmware. Maybe you can try that too?
Sincerely,
Jean-Louis Dupond
--
