nfe(4) hardware checksum support

Previous thread: YOUR LUCKY DAY HAS COME by DAVID BROWN on Friday, December 29, 2006 - 6:34 pm. (1 message)

Next thread: Question about line discipline linesw struct by J Sevy on Monday, January 1, 2007 - 10:05 pm. (1 message)
Cc: <tech-kern@...>, <chs@...>, <tsutsui@...>
Date: Monday, January 1, 2007 - 2:45 am

A happy new year,

Could anyone test the attached patch which enables
hardware checksumming on nfe(4), NVIDIA nForce
integrated Ethernet?

Values in descriptor flags are taken from FreeBSD's driver
and it seems to be working on my nForce3 250 100/10M Ethernet,
but I'm not sure how I can confirm that the RX part works properly,
and it would still be better to test it on other nForce chipsets,
including gigabit variants.
---
Izumi Tsutsui

Index: if_nfe.c
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/if_nfe.c,v
retrieving revision 1.11
diff -u -r1.11 if_nfe.c
--- if_nfe.c 1 Jan 2007 04:13:25 -0000 1.11
+++ if_nfe.c 1 Jan 2007 06:15:40 -0000
@@ -319,12 +319,12 @@
sc->sc_ethercom.ec_capabilities |=
ETHERCAP_VLAN_HWTAGGING | ETHERCAP_VLAN_MTU;
#endif
-#ifdef NFE_CSUM
if (sc->sc_flags & NFE_HW_CSUM) {
- ifp->if_capabilities |= IFCAP_CSUM_IPv4 | IFCAP_CSUM_TCPv4 |
- IFCAP_CSUM_UDPv4;
+ ifp->if_capabilities |=
+ IFCAP_CSUM_IPv4_Tx | IFCAP_CSUM_IPv4_Rx |
+ IFCAP_CSUM_TCPv4_Tx | IFCAP_CSUM_TCPv4_Rx |
+ IFCAP_CSUM_UDPv4_Tx | IFCAP_CSUM_UDPv4_Rx;
}
-#endif

sc->sc_mii.mii_ifp = ifp;
sc->sc_mii.mii_readreg = nfe_miibus_readreg;
@@ -801,19 +801,23 @@
m->m_pkthdr.len = m->m_len = len;
m->m_pkthdr.rcvif = ifp;

-#ifdef notyet
- if (sc->sc_flags & NFE_HW_CSUM) {
+ if ((sc->sc_flags & NFE_HW_CSUM) != 0) {
+ /*
+ * XXX
+ * no way to check M_CSUM_IPv4_BAD or non-IPv4 packets?
+ */
if (flags & NFE_RX_IP_CSUMOK)
- m->m_pkthdr.csum_flags |= M_IPV4_CSUM_IN_OK;
+ m->m_pkthdr.csum_flags |= M_CSUM_IPv4;
+ /*
+ * XXX
+ * no way to check M_CSUM_TCP_UDP_BAD or
+ * other protocols?
+ */
if (flags & NFE_RX_UDP_CSUMOK)
- m->m_pkthdr.csum_flags |= M_UDP_CSUM_IN_OK;
- if (flags & NFE_RX_TCP_CSUMOK)
- m->m_pkthdr.csum_flags |= M_TCP_CSUM_IN_OK;
- }
-#elif defined(NFE_CSUM)
- if ((sc...
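
The RX half of the patch above maps "checksum OK" bits in the receive descriptor onto checksum flags in the mbuf packet header. A minimal userland sketch of that mapping follows. Only the value of the IP bit (1 << 12) comes from the if_nfereg.h excerpt in this thread; every other constant, and all the SK_/sketch_ names, are hypothetical stand-ins and not the kernel's definitions.

```c
#include <stdint.h>

/* Descriptor bits. SK_RX_IP_CSUMOK mirrors NFE_RX_IP_CSUMOK (1 << 12)
 * from the thread; the UDP/TCP bit positions are ASSUMED. */
#define SK_RX_IP_CSUMOK   (1u << 12)
#define SK_RX_UDP_CSUMOK  (1u << 11)  /* hypothetical position */
#define SK_RX_TCP_CSUMOK  (1u << 10)  /* hypothetical position */

/* Stand-ins for the mbuf csum_flags values (real values differ). */
#define SK_CSUM_IPV4   0x01u
#define SK_CSUM_TCPV4  0x02u
#define SK_CSUM_UDPV4  0x04u

/* Translate descriptor flags into checksum flags for one packet.
 * Nothing is reported unless hardware checksumming is enabled. */
uint32_t sketch_rx_csum_flags(uint16_t desc_flags, int hw_csum_enabled)
{
    uint32_t csum = 0;

    if (!hw_csum_enabled)
        return 0;
    if (desc_flags & SK_RX_IP_CSUMOK)
        csum |= SK_CSUM_IPV4;
    if (desc_flags & SK_RX_UDP_CSUMOK)
        csum |= SK_CSUM_UDPV4;
    if (desc_flags & SK_RX_TCP_CSUMOK)
        csum |= SK_CSUM_TCPV4;
    return csum;
}
```

As the XXX comments in the patch point out, a mapping like this can only say "verified OK"; it has no way to mark a packet as having a bad checksum.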

Cc: <tech-net@...>, <tech-kern@...>, <chs@...>
Date: Saturday, January 6, 2007 - 2:15 pm

Has anything been done to fix the nfe freeze problem which causes the
packet reception to stop completely? The reception may or may not
resume after a few minutes. This problem forced me to install a PCI
networking card and stop using nfe.

Thanks,

-jm

Cc: Izumi Tsutsui <tsutsui@...>, <tech-net@...>, <tech-kern@...>, <chs@...>
Date: Saturday, January 6, 2007 - 9:18 pm

This should be fixed in -current; can you please try the latest revision
of the driver?

Cheers,
Jared

To: Jared D. McNeill <jmcneill@...>
Cc: Jukka Marin <jmarin@...>, Izumi Tsutsui <tsutsui@...>, <tech-net@...>, <tech-kern@...>, <chs@...>
Date: Wednesday, January 24, 2007 - 1:57 pm

After more testing - revision 1.12 still has problems under heavy packet
reception load, such as during large NFS operations. Reception seems to
recover after 30 seconds or so (with the older revisions, I had to reboot
to get nfe going again, so this is an improvement).

There seems to be no problem under regular use - but when I try to edit
video files located on a remote machine, the problems begin.

-jm

To: <jmarin@...>
Cc: <jmcneill@...>, <tech-net@...>, <tech-kern@...>, <chs@...>, <tsutsui@...>
Date: Thursday, January 25, 2007 - 11:01 am

How about the attached patch?
(though I can't test it under such a heavy load)
---
Izumi Tsutsui

Index: if_nfe.c
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/if_nfe.c,v
retrieving revision 1.13
diff -u -r1.13 if_nfe.c
--- if_nfe.c 9 Jan 2007 10:29:27 -0000 1.13
+++ if_nfe.c 25 Jan 2007 14:59:48 -0000
@@ -490,36 +490,50 @@
struct nfe_softc *sc = arg;
struct ifnet *ifp = &sc->sc_ethercom.ec_if;
uint32_t r;
+ int handled;

- if ((r = NFE_READ(sc, NFE_IRQ_STATUS)) == 0)
- return 0; /* not for us */
- NFE_WRITE(sc, NFE_IRQ_STATUS, r);
+ if ((ifp->if_flags & IFF_UP) == 0)
+ return 0;

- DPRINTFN(5, ("nfe_intr: interrupt register %x\n", r));
+ handled = 0;

NFE_WRITE(sc, NFE_IRQ_MASK, 0);

- if (r & NFE_IRQ_LINK) {
- NFE_READ(sc, NFE_PHY_STATUS);
- NFE_WRITE(sc, NFE_PHY_STATUS, 0xf);
- DPRINTF(("%s: link state changed\n", sc->sc_dev.dv_xname));
- }
-
- if (ifp->if_flags & IFF_RUNNING) {
- /* check Rx ring */
- nfe_rxeof(sc);
+ for (;;) {
+ r = NFE_READ(sc, NFE_IRQ_STATUS);
+ if ((r & NFE_IRQ_WANTED) == 0)
+ break;

- /* check Tx ring */
- nfe_txeof(sc);
+ NFE_WRITE(sc, NFE_IRQ_STATUS, r);
+ handled = 1;
+ DPRINTFN(5, ("nfe_intr: interrupt register %x\n", r));
+
+ if ((r & (NFE_IRQ_RXERR | NFE_IRQ_RX_NOBUF | NFE_IRQ_RX))
+ != 0) {
+ /* check Rx ring */
+ nfe_rxeof(sc);
+ }
+
+ if ((r & (NFE_IRQ_TXERR | NFE_IRQ_TXERR2 | NFE_IRQ_TX_DONE))
+ != 0) {
+ /* check Tx ring */
+ nfe_txeof(sc);
+ }
+
+ if ((r & NFE_IRQ_LINK) != 0) {
+ NFE_READ(sc, NFE_PHY_STATUS);
+ NFE_WRITE(sc, NFE_PHY_STATUS, 0xf);
+ DPRINTF(("%s: link state changed\n",
+ sc->sc_dev.dv_xname));
+ }
}

NFE_WRITE(sc, NFE_IRQ_MASK, NFE_IRQ_WANTED);

- if (ifp->if_flags & IFF_RUNNING &&
- !IF_IS_EMPTY(&ifp->if_snd))
+ if (handled && !IF_IS_EMPTY(&ifp->if_snd))
nfe_start(ifp);
...
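
The reworked handler in the patch above keeps re-reading the status register until no wanted bit is left, so an event that arrives while the rings are being processed is not lost. A minimal userland simulation of that loop follows. The register is modelled as write-one-to-clear, which is an assumption, and every SK_/sk_/sketch_ name and bit value is hypothetical.

```c
#include <stdint.h>

#define SK_IRQ_RX       0x01u
#define SK_IRQ_TX_DONE  0x02u
#define SK_IRQ_LINK     0x04u
#define SK_IRQ_WANTED   (SK_IRQ_RX | SK_IRQ_TX_DONE | SK_IRQ_LINK)

struct sk_dev {
    uint32_t irq_status;  /* simulated interrupt status register */
    uint32_t late_events; /* bits raised once, right after the first ack */
    int rx_calls, tx_calls, link_calls;
};

static uint32_t sk_read_status(struct sk_dev *d)
{
    return d->irq_status;
}

/* Acknowledge bits (assumed write-one-to-clear), then let any
 * simulated late event show up in the status register. */
static void sk_ack(struct sk_dev *d, uint32_t bits)
{
    d->irq_status &= ~bits;
    d->irq_status |= d->late_events;
    d->late_events = 0;
}

/* Returns 1 if at least one wanted interrupt was handled, else 0. */
int sketch_intr(struct sk_dev *d)
{
    int handled = 0;

    for (;;) {
        uint32_t r = sk_read_status(d);
        if ((r & SK_IRQ_WANTED) == 0)
            break;              /* nothing (more) for us */
        sk_ack(d, r);
        handled = 1;
        if (r & SK_IRQ_RX)
            d->rx_calls++;      /* stands in for nfe_rxeof() */
        if (r & SK_IRQ_TX_DONE)
            d->tx_calls++;      /* stands in for nfe_txeof() */
        if (r & SK_IRQ_LINK)
            d->link_calls++;    /* stands in for the PHY status ack */
    }
    return handled;
}
```

With a single read and acknowledge, as in the old code, the late RX event in this simulation would only be noticed on the next hardware interrupt.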

To: Izumi Tsutsui <tsutsui@...>
Cc: <jmarin@...>, <jmcneill@...>, <tech-net@...>, <tech-kern@...>, <chs@...>
Date: Monday, January 29, 2007 - 6:55 am

I applied the patch today and built a new kernel from the -current sources
of today. Unfortunately, the patch doesn't seem to fix the problem.
When accessing two NFS servers simultaneously, packet reception stops
every now and then for about 30 seconds at a time.

-jm

To: <jmarin@...>
Cc: <tech-net@...>, <tech-kern@...>, <tsutsui@...>
Date: Saturday, February 3, 2007 - 11:46 pm

Hmm, I'm afraid it's difficult to debug this by code inspection alone,
but could you check whether TX or RX causes the problem (with ttcp etc.)
and collect interrupt/network statistics during a stall (with vmstat -i,
netstat -i etc.)?

If FreeBSD or OpenBSD have a similar problem, maybe
we need chip docs (there are several magic values in the source).
---
Izumi Tsutsui

To: Izumi Tsutsui <tsutsui@...>
Cc: <jmarin@...>, <tech-net@...>, <tech-kern@...>
Date: Sunday, February 4, 2007 - 11:56 am

Not sure if it's related, but you might want to have a look at the
DragonFly modifications to the nfe driver to prevent watchdog timeouts:

http://www.dragonflybsd.org/cvsweb/src/sys/dev/netif/nfe/if_nfe.c?rev=1.1&content-type=text/x-cvsweb-markup

Cheers,
Jared

To: <jmcneill@...>
Cc: <jmarin@...>, <tech-net@...>, <tech-kern@...>, <tsutsui@...>
Date: Sunday, February 4, 2007 - 12:37 pm

I think this has been pulled via OpenBSD in if_nfe.c rev 1.11.
(though I have not checked their other changes yet)
---
Izumi Tsutsui

To: Izumi Tsutsui <tsutsui@...>
Cc: <jmarin@...>, <tech-net@...>, <tech-kern@...>
Date: Sunday, February 4, 2007 - 12:31 am

RX is definitely the problem. When I experience such stalls (which seem
to happen as soon as a high packet rate comes in), I can still see
packets coming out through nfe. I don't remember if I checked with
tcpdump, but pinging the broadcast address does blink the whole switch,
so I'm quite positive.

--
Quentin Garnier - cube@cubidou.net - cube@NetBSD.org
"You could have made it, spitting out benchmarks
Owe it to yourself not to fail"
Amplifico, Spitting Out Benchmarks, Hometakes Vol. 2, 2005.

To: <cube@...>
Cc: <jmarin@...>, <tech-net@...>, <tech-kern@...>, <tsutsui@...>
Date: Sunday, February 4, 2007 - 1:40 am

If it's caused by some race condition, how about the attached one?

BTW, which is your port, i386 or amd64?
At a quick glance I don't see any improper reordering around
descriptor access on i386, even without volatile, but I'm not sure.
---
Izumi Tsutsui

Index: if_nfereg.h
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/if_nfereg.h,v
retrieving revision 1.3
diff -u -r1.3 if_nfereg.h
--- if_nfereg.h 9 Jan 2007 10:29:27 -0000 1.3
+++ if_nfereg.h 4 Feb 2007 05:01:10 -0000
@@ -144,9 +147,9 @@

/* Rx/Tx descriptor */
struct nfe_desc32 {
- uint32_t physaddr;
- uint16_t length;
- uint16_t flags;
+ volatile uint32_t physaddr;
+ volatile uint16_t length;
+ volatile uint16_t flags;
#define NFE_RX_FIXME_V1 0x6004
#define NFE_RX_VALID_V1 (1 << 0)
#define NFE_TX_ERROR_V1 0x7808
@@ -155,12 +158,12 @@

/* V2 Rx/Tx descriptor */
struct nfe_desc64 {
- uint32_t physaddr[2];
- uint32_t vtag;
+ volatile uint32_t physaddr[2];
+ volatile uint32_t vtag;
#define NFE_RX_VTAG (1 << 16)
#define NFE_TX_VTAG (1 << 18)
- uint16_t length;
- uint16_t flags;
+ volatile uint16_t length;
+ volatile uint16_t flags;
#define NFE_RX_FIXME_V2 0x4300
#define NFE_RX_VALID_V2 (1 << 13)
#define NFE_RX_IP_CSUMOK (1 << 12)
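
The point of the volatile qualifiers in the patch above is that every access to a descriptor field becomes a real load or store that the compiler may not cache or drop. A minimal userland sketch of reading such a descriptor follows: take one snapshot of the flags, then test bits on the snapshot. The struct layout and the two bit values are copied from the excerpt; the sk_/sketch_ names are hypothetical, and the driver's descriptor-ownership handling is not shown in the excerpt and is not modelled here.

```c
#include <stdint.h>

#define SK_RX_VALID_V2   (1u << 13)  /* as NFE_RX_VALID_V2 in the excerpt */
#define SK_RX_IP_CSUMOK  (1u << 12)  /* as NFE_RX_IP_CSUMOK in the excerpt */

/* Same field layout as struct nfe_desc64 in the patch. */
struct sk_desc64 {
    volatile uint32_t physaddr[2];
    volatile uint32_t vtag;
    volatile uint16_t length;
    volatile uint16_t flags;
};

/* Returns the frame length if the descriptor holds a valid frame,
 * or -1 otherwise. *ip_csum_ok is only written for valid frames. */
int sketch_rx_frame_len(const struct sk_desc64 *d, int *ip_csum_ok)
{
    /* One load each; all later tests use the local copies so the
     * decision is made on a single consistent snapshot. */
    uint16_t flags = d->flags;
    uint16_t len = d->length;

    if ((flags & SK_RX_VALID_V2) == 0)
        return -1;
    *ip_csum_ok = (flags & SK_RX_IP_CSUMOK) != 0;
    return (int)len;
}
```

Note that volatile only constrains the compiler. It does not by itself order or flush DMA memory, so in a kernel driver it complements the bus_dma synchronization calls rather than replacing them.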

To: Izumi Tsutsui <tsutsui@...>
Cc: <cube@...>, <jmarin@...>, <tech-net@...>, <tech-kern@...>
Date: Saturday, March 10, 2007 - 11:47 am

I'm running i386.

All I need to do is to read files off two NFS servers simultaneously
and *freeze*, packet reception stops. If I ping some other machine
when this happens, the other machine sees all ping requests, but the
one with nfe doesn't see the replies. So, tx works and rx freezes
(until it comes back alive after 10-30 seconds and then dies again).

-jm

To: Izumi Tsutsui <tsutsui@...>
Cc: <cube@...>, <jmarin@...>, <tech-net@...>, <tech-kern@...>
Date: Sunday, February 4, 2007 - 4:51 am

I can confirm this - RX is the problem and transmitting packets during stall

Should it be applied on top of your previous patch or to clean NetBSD

i386. I have run amd64 on the system, too, but not long enough to be able
to tell if the nfe driver works better or worse there (had too many problems
with XFree and applications with amd64).

-jm

To: <jmarin@...>
Cc: <cube@...>, <tech-net@...>, <tech-kern@...>, <tsutsui@...>
Date: Sunday, February 4, 2007 - 6:23 am

Okay. Maybe we should check nfe_rxeof() especially around error paths
(I wonder what "FIXME" magic values mean ;-p) and which interrupt

It can be applied independently, but I'd prefer it together with the previous one.
---
Izumi Tsutsui

Cc: Jukka Marin <jmarin@...>, Izumi Tsutsui <tsutsui@...>, <tech-net@...>, <tech-kern@...>, <chs@...>
Date: Wednesday, January 10, 2007 - 2:29 pm

I finally rebooted a new kernel this morning. It still does that:

64 bytes from 10.1.0.2: icmp_seq=39679 ttl=255 time=0.234 ms
64 bytes from 10.1.0.2: icmp_seq=39680 ttl=255 time=0.142 ms
64 bytes from 10.1.0.2: icmp_seq=39711 ttl=255 time=0.120 ms
64 bytes from 10.1.0.2: icmp_seq=39712 ttl=255 time=0.150 ms

I'll keep using it for a while, but if it gets too bad, I'll go back
to (gasp) rtk..

-jm
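
The ping output above jumps from icmp_seq=39680 to icmp_seq=39711. A small sketch of the arithmetic for turning such a log into a count of missing replies follows; the function name is hypothetical.

```c
#include <stddef.h>

/* Count replies missing between consecutive icmp_seq values.
 * Also reports the largest single gap through *largest_gap. */
long sketch_count_lost(const long *seq, size_t n, long *largest_gap)
{
    long lost = 0;

    *largest_gap = 0;
    for (size_t i = 1; i < n; i++) {
        long gap = seq[i] - seq[i - 1] - 1;  /* replies skipped */
        if (gap > 0) {
            lost += gap;
            if (gap > *largest_gap)
                *largest_gap = gap;
        }
    }
    return lost;
}
```

For the four lines quoted above this gives 30 missing replies in one gap. Assuming ping was run at its default one-second interval, that is a stall of roughly 30 seconds, in line with the stall lengths reported elsewhere in this thread.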

Cc: <tech-net@...>, <tech-kern@...>, <tsutsui@...>
Date: Saturday, January 6, 2007 - 8:41 pm

I haven't used nfe(4) under such a heavy load, but there have been
some recent fixes for timeout problems:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/pci/if_nfe.c
---
Izumi Tsutsui

Cc: <tech-kern@...>
Date: Friday, January 5, 2007 - 2:19 pm

On Mon, 1 Jan 2007 15:45:15 +0900
Patch applied to my nForce 250 10/100, and looking for ways to test the
feature.

--
César Catrián Carreño

Cc: <tsutsui@...>
Date: Saturday, January 6, 2007 - 12:24 pm

After some debugging, I noticed that the bit definitions of
NFE_RX_UDP_CSUMOK and NFE_RX_TCP_CSUMOK in if_nfereg.h are
swapped (FreeBSD doesn't seem to distinguish them in RX csum_flags).
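
A minimal sketch of why swapped definitions can go unnoticed in a driver that does not distinguish the two bits: a decoder that only reports "some TCP/UDP checksum was OK" gives the same answer whichever bit is labelled TCP, while a decoder that reports them separately does not. The bit positions and all SK_/sketch_ names here are hypothetical.

```c
#include <stdint.h>

#define SK_BIT_A  (1u << 10)  /* hypothetical descriptor bit */
#define SK_BIT_B  (1u << 11)  /* hypothetical descriptor bit */

#define SK_L4_TCP_OK  0x1
#define SK_L4_UDP_OK  0x2

/* Reports TCP and UDP separately, so the labelling matters. */
int sketch_decode_distinct(uint16_t flags, uint16_t tcp_bit, uint16_t udp_bit)
{
    int r = 0;

    if (flags & tcp_bit)
        r |= SK_L4_TCP_OK;
    if (flags & udp_bit)
        r |= SK_L4_UDP_OK;
    return r;
}

/* Reports only "TCP or UDP checksum OK", so the labelling is irrelevant. */
int sketch_decode_merged(uint16_t flags, uint16_t tcp_bit, uint16_t udp_bit)
{
    return (flags & (tcp_bit | udp_bit)) != 0;
}
```

Passing the two bits in either order leaves the merged result unchanged but flips the distinct result, which is how a swap could stay hidden until the flags are reported per protocol.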

As Thomas E. Spanjaard said, ifconfig(8) shows/controls
hardware checksum features on each interface.

I'd like to see:
- there is no negative side effect
- any performance improvements on benchmarks like ttcp
(though I'm afraid we can't see it on 100baseTX with fast CPU)
- 'vmstat -e' output on kernel with "options TCP_CSUM_COUNTERS"
and "options UDP_CSUM_COUNTERS"
- packets which have bad checksum data are handled correctly
(I'm not sure how we can test it without special debugging programs)
etc.
---
Izumi Tsutsui

Index: if_nfe.c
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/if_nfe.c,v
retrieving revision 1.11
diff -u -r1.11 if_nfe.c
--- if_nfe.c 1 Jan 2007 04:13:25 -0000 1.11
+++ if_nfe.c 6 Jan 2007 16:18:23 -0000
@@ -319,12 +319,12 @@
sc->sc_ethercom.ec_capabilities |=
ETHERCAP_VLAN_HWTAGGING | ETHERCAP_VLAN_MTU;
#endif
-#ifdef NFE_CSUM
if (sc->sc_flags & NFE_HW_CSUM) {
- ifp->if_capabilities |= IFCAP_CSUM_IPv4 | IFCAP_CSUM_TCPv4 |
- IFCAP_CSUM_UDPv4;
+ ifp->if_capabilities |=
+ IFCAP_CSUM_IPv4_Tx | IFCAP_CSUM_IPv4_Rx |
+ IFCAP_CSUM_TCPv4_Tx | IFCAP_CSUM_TCPv4_Rx |
+ IFCAP_CSUM_UDPv4_Tx | IFCAP_CSUM_UDPv4_Rx;
}
-#endif

sc->sc_mii.mii_ifp = ifp;
sc->sc_mii.mii_readreg = nfe_miibus_readreg;
@@ -801,19 +801,31 @@
m->m_pkthdr.len = m->m_len = len;
m->m_pkthdr.rcvif = ifp;

-#ifdef notyet
- if (sc->sc_flags & NFE_HW_CSUM) {
- if (flags & NFE_RX_IP_CSUMOK)
- m->m_pkthdr.csum_flags |= M_IPV4_CSUM_IN_OK;
- if (flags & NFE_RX_UDP_CSUMOK)
- m->m_pkthdr.csum_flags |= M_UDP_CSUM_IN_OK;
- if (flags & NFE_RX_TCP_CSUMOK)
- m->m_pkthdr.csum_flags |= M_TCP_CSUM_IN_OK;
- }
-#elif defined(NFE_CSUM)
- if (...

Cc: <tech-kern@...>
Date: Monday, January 8, 2007 - 12:38 pm

On Sun, 7 Jan 2007 01:24:51 +0900
Applied.

Results obtained with iperf are shown below.

Pcap captures are available at ftp://mioficina.cjc.cl/pub/NetBSD

Regards

--
César Catrián Carreño

$ iperf -v
iperf version 2.0.2 (03 May 2005) pthreads
Server : PII 400MHz with vr0
Client : AMD 2800+ 1800MHz with nfe0 integrated on motherboard Asus K8N.
Both connected via crossover cable.

/* -------- SERVER OUTPUT -------- */

# uname -a
NetBSD 4.99.7 NetBSD 4.99.7 (GENERIC) #0: Thu Jan 4 20:34:00 CLST 2007 cetrox@core.cjc.cl:/home/src/netbsd-current/src/sys/arch/i386/compile/obj/GENERIC i386

# ifconfig vr0
vr0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
address: 00:11:95:e2:18:ae
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet 192.168.1.5 netmask 0xffffff00 broadcast 192.168.1.255
inet6 fe80::211:95ff:fee2:18ae%vr0 prefixlen 64 scopeid 0x1

# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 32.0 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.1.5 port 5001 connected with 192.168.1.1 port 65475
[ 4] 0.0-10.4 sec 3.68 MBytes 2.97 Mbits/sec

# iperf -s -u
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 40.6 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.5 port 5001 connected with 192.168.1.1 port 59906
[ 3] 0.0- 0.6 sec 102 KBytes 1.39 Mbits/sec 19.596 ms 655/ 726 (90%)
[ 3] 0.0- 0.6 sec 1 datagrams received out-of-order

/* TCP and UDP tests with hardware checksum enabled */

# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 32.0 KByte (default)
--------------------------------------------------...
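
A small sketch for cross-checking the iperf figures quoted above. It assumes iperf reports byte counts in binary units (1 MByte = 1024 * 1024 bytes) and rates in decimal units (1 Mbit = 10^6 bits), which is consistent with the numbers shown; the function names are hypothetical.

```c
/* Rate in Mbits/sec from a transfer size in MBytes and a duration.
 * ASSUMES binary MBytes and decimal Mbits, as described above. */
double sketch_mbits_per_sec(double mbytes, double seconds)
{
    return mbytes * 1024.0 * 1024.0 * 8.0 / seconds / 1e6;
}

/* UDP datagram loss as a percentage. */
double sketch_loss_percent(long lost, long total)
{
    return 100.0 * (double)lost / (double)total;
}
```

For the first TCP run, 3.68 MBytes over 10.4 seconds comes out at about 2.97 Mbits/sec, and 655 lost out of 726 datagrams is about 90%, matching the output. The inputs are already rounded by iperf, so small differences in the last digit are to be expected on other runs.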

Cc: <tech-kern@...>, <tsutsui@...>
Date: Tuesday, January 9, 2007 - 6:50 am

:

If you find any problem, please report here or via send-pr.
---
Izumi Tsutsui

Cc: Izumi Tsutsui <tsutsui@...>, <tech-kern@...>
Date: Friday, January 5, 2007 - 10:04 pm

ifconfig nfe0 ip4csum tcp4csum udp4csum; then see if IP/TCP/UDP traffic
over the interface still works (and has correct checksums - check with
e.g. wireshark).

Cheers,
--
Thomas E. Spanjaard
tgen@netphreax.net
