r8169 chips on some Intel D945GSEJT boards fail to work after PXE boot

From: Simon Farnsworth
Date: Wednesday, September 23, 2009 - 9:57 am

Hello,

I'm having trouble getting Intel D945GSEJT boards to reliably install
via PXE boot. They all have apparently identical r8169 chips, and I'm
using the r8169 driver from Fedora's 2.6.30-1 kernel; I've also tried
porting the changes in r8169.c from Linus's git
85910a8e9f425656bb7202d0fc62800000ffa262 to the kernel I'm using,
without success.

Some boards are good, and just work, whether I boot via PXE or boot from
the local disk; dmesg.working and lspci.working are from a good board.

Some boards are bad; they work fine if I boot from local disk (networking
included), but if I PXE boot, the kernel cannot detect link or send or
receive data. dmesg.broken and lspci.broken are from a bad board.

I've tried disabling MSI, in case it's an interrupt issue, which hasn't
helped; unfortunately, the pungi-generated initramfs for PXE boot
doesn't have a shell I can use to interrogate the kernel.

I've updated them to the current BIOS revision,
JT94510H.86A.0037.2009.0820.1551, which hasn't helped. I'm happy to try
any suggestions, or to provide more information if needed.
-- 
Simon Farnsworth
From: Francois Romieu
Date: Wednesday, September 23, 2009 - 1:57 pm

Simon Farnsworth <simon.farnsworth@onelan.com> :

No cunning theory in sight, but does reducing the amount of memory on a
bad board from 1 GB to 512 MB turn it into a good one?

The failing board exhibits a correctable error status bit. Clearing it
is the least we can do.

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 50c6a3c..79bc4ab 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2200,6 +2200,11 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	tp->pcie_cap = pci_find_capability(pdev, PCI_CAP_ID_EXP);
 	if (!tp->pcie_cap && netif_msg_probe(tp))
 		dev_info(&pdev->dev, "no PCI Express capability\n");
+	else {
+		pci_write_config_word(pdev, tp->pcie_cap + PCI_EXP_DEVSTA,
+				      PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED |
+				      PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD);
+	}
 
 	RTL_W16(IntrMask, 0x0000);
 
-- 
Ueimor
--
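
The error bits in the PCI Express Device Status register are
write-1-to-clear, which is why the patch above writes the four flags
rather than zeros. A minimal userspace sketch of that behaviour follows;
the register constants match Linux's pci_regs.h, while rw1c_write() and
clear_devsta_errors() are hypothetical names used only for illustration
and are not part of r8169. As posted, the else pairs with the combined
condition, so it would also run when no capability is found and probe
messages are off; the sketch tests the capability offset on its own.

```c
#include <stdint.h>

/* Values as defined in Linux's include/linux/pci_regs.h. */
#define PCI_EXP_DEVSTA       10      /* Device Status register offset */
#define PCI_EXP_DEVSTA_CED   0x01    /* Correctable Error Detected */
#define PCI_EXP_DEVSTA_NFED  0x02    /* Non-Fatal Error Detected */
#define PCI_EXP_DEVSTA_FED   0x04    /* Fatal Error Detected */
#define PCI_EXP_DEVSTA_URD   0x08    /* Unsupported Request Detected */

#define DEVSTA_ERR_BITS (PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED | \
			 PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD)

/*
 * Hypothetical model of a write-1-to-clear register: every error bit
 * written as 1 is cleared, everything else keeps its current value.
 */
static uint16_t rw1c_write(uint16_t current, uint16_t written)
{
	return (uint16_t)(current & ~(written & DEVSTA_ERR_BITS));
}

/*
 * Hypothetical helper: clear all four error flags, but only when a
 * PCI Express capability was actually found (pcie_cap != 0).
 */
static uint16_t clear_devsta_errors(int pcie_cap, uint16_t devsta)
{
	if (!pcie_cap)
		return devsta;
	return rw1c_write(devsta, DEVSTA_ERR_BITS);
}
```

With a capability at offset 0x60 and a status of 0x0021 (correctable
error plus a read-only bit), clear_devsta_errors() leaves 0x0020; with
pcie_cap == 0 the value is returned untouched.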

From: Simon Farnsworth
Date: Thursday, September 24, 2009 - 4:12 am

We've tried this, and we've tried 2GB and 1GB modules; the failure to
boot sticks with the board, not with the memory module. On my most
recent attempt, the failing board isn't showing a correctable error
status, so I've not yet tried your patch, on the assumption that it just
clears the error status.

Is my assumption wrong? If not, is there anything else I can do that

-- 
Simon Farnsworth

--

From: Francois Romieu
Date: Wednesday, September 30, 2009 - 3:07 pm

Simon Farnsworth <simon.farnsworth@onelan.com> :

Try this against 2.6.31 or latest -rc.


diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 50c6a3c..74488a6 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -115,7 +115,9 @@ enum mac_version {
 	RTL_GIGA_MAC_VER_22 = 0x16, // 8168C
 	RTL_GIGA_MAC_VER_23 = 0x17, // 8168CP
 	RTL_GIGA_MAC_VER_24 = 0x18, // 8168CP
-	RTL_GIGA_MAC_VER_25 = 0x19  // 8168D
+	RTL_GIGA_MAC_VER_25 = 0x19, // 8168D
+	RTL_GIGA_MAC_VER_26 = 0x1a, // 8168D
+	RTL_GIGA_MAC_VER_27 = 0x1b  // 8168DP
 };
 
 #define _R(NAME,MAC,MASK) \
@@ -150,7 +152,9 @@ static const struct {
 	_R("RTL8168c/8111c",	RTL_GIGA_MAC_VER_22, 0xff7e1880), // PCI-E
 	_R("RTL8168cp/8111cp",	RTL_GIGA_MAC_VER_23, 0xff7e1880), // PCI-E
 	_R("RTL8168cp/8111cp",	RTL_GIGA_MAC_VER_24, 0xff7e1880), // PCI-E
-	_R("RTL8168d/8111d",	RTL_GIGA_MAC_VER_25, 0xff7e1880)  // PCI-E
+	_R("RTL8168d/8111d",	RTL_GIGA_MAC_VER_25, 0xff7e1880), // PCI-E
+	_R("RTL8168d/8111d",	RTL_GIGA_MAC_VER_26, 0xff7e1880), // PCI-E
+	_R("RTL8168dp/8111dp",	RTL_GIGA_MAC_VER_27, 0xff7e1880)  // PCI-E
 };
 #undef _R
 
@@ -253,6 +257,13 @@ enum rtl8168_8101_registers {
 	DBG_REG			= 0xd1,
 #define	FIX_NAK_1			(1 << 4)
 #define	FIX_NAK_2			(1 << 3)
+	EFUSEAR			= 0xdc,
+#define	EFUSEAR_FLAG			0x80000000
+#define	EFUSEAR_WRITE_CMD		0x80000000
+#define	EFUSEAR_READ_CMD		0x00000000
+#define	EFUSEAR_REG_MASK		0x03ff
+#define	EFUSEAR_REG_SHIFT		8
+#define	EFUSEAR_DATA_MASK		0xff
 };
 
 enum rtl_register_content {
@@ -568,6 +579,14 @@ static void mdio_patch(void __iomem *ioaddr, int reg_addr, int value)
 	mdio_write(ioaddr, reg_addr, mdio_read(ioaddr, reg_addr) | value);
 }
 
+static void mdio_plus_minus(void __iomem *ioaddr, int reg_addr, int p, int m)
+{
+	int val;
+
+	val = mdio_read(ioaddr, reg_addr);
+	mdio_write(ioaddr, reg_addr, (val | p) & ~m);
+}
+
 static void rtl_mdio_write(struct net_device *dev, int phy_id, int location,
 			   int val)
 {
@@ -651,6 +670,24 @@ static u32 ...
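
Two of the additions visible in the hunks above can be tried in
isolation. The sketch below applies the mdio_plus_minus() expression to
a plain integer and packs the EFUSEAR fields into a 32-bit word. The
patch is cut off before any code that uses EFUSEAR, so the read-command
layout here is an assumption inferred from the mask and shift names;
plus_minus(), efuse_read_cmd() and efuse_data() are hypothetical names,
not driver functions.

```c
#include <stdint.h>

/* Field values from the hunk above (u suffix added for unsigned math). */
#define EFUSEAR_FLAG       0x80000000u
#define EFUSEAR_WRITE_CMD  0x80000000u
#define EFUSEAR_READ_CMD   0x00000000u
#define EFUSEAR_REG_MASK   0x03ff
#define EFUSEAR_REG_SHIFT  8
#define EFUSEAR_DATA_MASK  0xff

/*
 * Same expression as mdio_plus_minus(), without the register access:
 * set the bits in p, then clear the bits in m. Because the clear is
 * applied last, a bit present in both p and m ends up cleared.
 */
static int plus_minus(int val, int p, int m)
{
	return (val | p) & ~m;
}

/* Assumed layout: command flag in bit 31, register index from bit 8 up. */
static uint32_t efuse_read_cmd(unsigned int reg)
{
	return EFUSEAR_READ_CMD |
	       ((uint32_t)(reg & EFUSEAR_REG_MASK) << EFUSEAR_REG_SHIFT);
}

/* Assumed layout: the data byte sits in the low eight bits. */
static uint8_t efuse_data(uint32_t efusear)
{
	return (uint8_t)(efusear & EFUSEAR_DATA_MASK);
}
```

For example, plus_minus(0x00f0, 0x000f, 0x0030) sets the low nibble and
clears bits 4 and 5, giving 0x00cf.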
From: Simon Farnsworth
Date: Monday, October 5, 2009 - 2:47 am

This worked for my boards.

Thanks for your help,



-- 
Simon Farnsworth

--

From: Francois Romieu
Date: Tuesday, October 6, 2009 - 2:56 pm

Francois Romieu <romieu@fr.zoreil.com> :

Can you check if this part of the patch is required to fix
your issue ?

I'd rather avoid including it under the 8168d support banner
if it is not needed.

-- 
Ueimor
--

From: Simon Farnsworth
Date: Wednesday, October 7, 2009 - 3:39 am

I can confirm that I don't need that hunk.
-- 
Simon Farnsworth

--
