Re: [Bugme-new] [Bug 16626] New: Machine hangs with EIP at skb_copy_and_csum_dev

Previous thread: ip_tables.h can't be used in C++ programs. by Ben Greear on Friday, August 20, 2010 - 11:54 am. (3 messages)

Next thread: by pavel potoplyak on Friday, August 20, 2010 - 2:51 pm. (1 message)
From: Jarek Poplawski
Date: Friday, August 20, 2010 - 12:38 pm

If you're bored in the meantime I'd suggest to do check the realtek
driver eg:
- for locking with the patch below,
- to turn off with ethtool its tx-checksumming and/or scatter-gather,
or if possible try to reproduce this with other NIC.

Thanks,
Jarek P.
---

diff --git a/drivers/net/8139too.c b/drivers/net/8139too.c
index f5166dc..aaaccc5 100644
--- a/drivers/net/8139too.c
+++ b/drivers/net/8139too.c
@@ -1692,6 +1692,8 @@ static netdev_tx_t rtl8139_start_xmit (struct sk_buff *skb,
 	unsigned int len = skb->len;
 	unsigned long flags;
 
+	spin_lock_irqsave(&tp->lock, flags);
+
 	/* Calculate the next Tx descriptor entry. */
 	entry = tp->cur_tx % NUM_TX_DESC;
 
@@ -1700,14 +1702,14 @@ static netdev_tx_t rtl8139_start_xmit (struct sk_buff *skb,
 		if (len < ETH_ZLEN)
 			memset(tp->tx_buf[entry], 0, ETH_ZLEN);
 		skb_copy_and_csum_dev(skb, tp->tx_buf[entry]);
-		dev_kfree_skb(skb);
+		dev_kfree_skb_irq(skb);
 	} else {
+		spin_unlock_irqrestore(&tp->lock, flags);
 		dev_kfree_skb(skb);
 		dev->stats.tx_dropped++;
 		return NETDEV_TX_OK;
 	}
 
-	spin_lock_irqsave(&tp->lock, flags);
 	/*
 	 * Writing to TxStatus triggers a DMA transfer of the data
 	 * copied to tp->tx_buf[entry] above. Use a memory barrier
--

From: Jarek Poplawski
Date: Saturday, August 21, 2010 - 12:47 am

After rethinking, it's almost impossible this patch could change
anything here, so don't bother, but consider mainly the second
proposal.

Jarek P.
--

From: Eric Dumazet
Date: Saturday, August 21, 2010 - 12:50 am

Indeed ;)

Its true that not many nics use the skb_copy_and_csum_dev() helper,
maybe this one must be updated somehow ?



--

From: Jarek Poplawski
Date: Saturday, August 21, 2010 - 1:07 am

Yes, it seems it should be possible at least to handle the bug with
a warning and error return, considering Plamen's problems with getting
the trace.

Jarek P.
--

From: Plamen Petrov
Date: Monday, August 23, 2010 - 4:47 am

Well, here is the current status:

Last I promised I will stay on 2.6.36-rc1-git for as long as possible,

Yeah, 3 days and counting, right until I decided to try the freshly
announced 2.6.36-rc2.

So I upgraded the kernel, but left the scripts that turn GRO off for
the tg3 card still run at system startup. This way the system ran for
2 and a half hours, when I decided its time to try turning GRO on.

I first tried to turn GRO on for the tg3 nic, and the system oopsed
immediately (if the panic screen is necessary - please, ask for it).

After the system came back, I tried turning GRO on for the 2 RealTek
8139 nics, too, but ethtool only accepted turning GRO off.

And unfortunately, I can't test if other nics will fail the same way
as the motherboard integrated tg3 I have does, so for now, this is
only a tg3 + GRO on problem; I don't have any other hardware to test
with available.

Thanks,
Plamen
--

From: Jarek Poplawski
Date: Monday, August 23, 2010 - 5:47 am

A little misunderstanding: I was intersted with turning off some
features on realteks to change the packet path from tg3 with gro
to realtek without gro and without tx-checksumming etc.

But maybe you could try the patch below instead (so the patched
kernel, tg3 with gro on, and realteks without any change).

Thanks,
Jarek P.

--- (for debugging only)

diff --git a/net/core/dev.c b/net/core/dev.c
index 3721fbb..51823cd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1935,6 +1935,23 @@ static inline int skb_needs_linearize(struct sk_buff *skb,
 					      illegal_highdma(dev, skb))));
 }
 
+static int skb_csum_start_bug(struct sk_buff *skb)
+{
+
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		long csstart;
+
+		csstart = skb->csum_start - skb_headroom(skb);
+		if (WARN_ON(csstart > skb_headlen(skb))) {
+			pr_warning("csum_start %d, headroom %d, headlen %d\n",
+				   skb->csum_start, skb_headroom(skb),
+				   skb_headlen(skb));
+			return 1;
+		}
+	}
+	return 0;
+}
+
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			struct netdev_queue *txq)
 {
@@ -1955,11 +1972,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		skb_orphan_try(skb);
 
 		if (netif_needs_gso(dev, skb)) {
+			skb_csum_start_bug(skb);
 			if (unlikely(dev_gso_segment(skb)))
 				goto out_kfree_skb;
 			if (skb->next)
 				goto gso;
 		} else {
+			skb_csum_start_bug(skb);
 			if (skb_needs_linearize(skb, dev) &&
 			    __skb_linearize(skb))
 				goto out_kfree_skb;
@@ -1997,7 +2016,12 @@ gso:
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(nskb);
 
-		rc = ops->ndo_start_xmit(nskb, dev);
+		if (skb_csum_start_bug(skb)) {
+			kfree_skb(skb);
+			rc = NETDEV_TX_OK;
+		} else
+			rc = ops->ndo_start_xmit(nskb, dev);
+
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)
 				goto out_kfree_gso_skb;
--

From: Jarek Poplawski
Date: Monday, August 23, 2010 - 6:10 am

On Mon, Aug 23, 2010 at 03:00:43PM +0200, Eric Dumazet wrote:

Feel free to send it: I'm a bit in hurry now...

Jarek P.
--

From: Plamen Petrov
Date: Monday, August 23, 2010 - 6:43 am

Currently compiling the kernel with the attached patch.

And, by the way, if there are any patches to follow, would you
please, send them as attachments? I think my mail client is
line-wrapping the patches...

The results will be sent soon, too.

Plamen
From: Plamen Petrov
Date: Monday, August 23, 2010 - 7:05 am

Well, so far - so good. No oopses.

I think I'm hitting some compiler related issue here...

The kernel with the patch I sent applied is working pretty fine
even with "generic-receive-offload" ON for my tg3 nic...

What is the big difference there?

Or do we need to dig into the object files now? Because that's
what's too scary for me...

Plamen

P.S. The compiler I am using is :
Anything obviously special about it?
--

From: Jarek Poplawski
Date: Monday, August 23, 2010 - 7:14 am

Oopses should be replaced with warnings, so please check the
syslog etc from time to time.

Jarek P.
--

From: Plamen Petrov
Date: Monday, August 23, 2010 - 9:51 pm

So far - not so good. :(

After I left the machine, i rebooted once. Then I logged in
via ssh, and turned gro back ON for the tg3 nic - and in the
next 30 minutes - another reboot.

Today I will try to capture a picture of the oops/warning
with my phone camera, because contrary to what you suggest,
Jarek, there are no messages in the system logs.

Plamen
--

From: Eric Dumazet
Date: Monday, August 23, 2010 - 10:01 pm

Hmm... I was thinking adding a call __skb_linearize(), just in case...

diff --git a/drivers/net/8139too.c b/drivers/net/8139too.c
index f5166dc..10928a2 100644
--- a/drivers/net/8139too.c
+++ b/drivers/net/8139too.c
@@ -1696,7 +1696,7 @@ static netdev_tx_t rtl8139_start_xmit (struct sk_buff *skb,
 	entry = tp->cur_tx % NUM_TX_DESC;
 
 	/* Note: the chip doesn't have auto-pad! */
-	if (likely(len < TX_BUF_SIZE)) {
+	if (likely(len < TX_BUF_SIZE && !__skb_linearize(skb))) {
 		if (len < ETH_ZLEN)
 			memset(tp->tx_buf[entry], 0, ETH_ZLEN);
 		skb_copy_and_csum_dev(skb, tp->tx_buf[entry]);




--

From: Plamen Petrov
Date: Tuesday, August 24, 2010 - 1:43 am

Here is what I've got while running the kernel with the patch
I attached earlier plus the above one, after turning GRO on for
my tg3 nic:

[picture 10]
http://picpaste.com/37e37f9ff9504e3a003f49092f9b1be6.jpg

[picture 11]
http://picpaste.com/5663ca7c5041c0ed7a1f3e6a9aa17a9e.jpg

[picture 12]
http://picpaste.com/17f2ecaa409a1ebbf835a2e0519d3099.jpg

Now moving on to the second proposed patch from Jarek.

Thanks,
Plamen
--

From: Plamen Petrov
Date: Tuesday, August 24, 2010 - 6:27 am

The current status: if I enable GRO on the tg3 - the kernel oopses.
It just takes a different amount of time to trigger: somewhere from
30 seconds to 30 minutes.

The oopses looks the same, and here are the latest:

[picture 13]
http://picpaste.com/c8dbda8f5c15d9ce3e050dd7f245f5d0.jpg

[picture 14]
http://picpaste.com/646cca586b704c5b72d3cf9fa54c7344.jpg

I was wondering which debug options could help us track this down?

Thanks,
Plamen
--

From: Eric Dumazet
Date: Tuesday, August 24, 2010 - 8:08 am

Thanks, here is an updated patch (against linux-2.6)

diff --git a/net/core/dev.c b/net/core/dev.c
index 3721fbb..77c8eb7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1935,6 +1935,32 @@ static inline int skb_needs_linearize(struct sk_buff *skb,
 					      illegal_highdma(dev, skb))));
 }
 
+int skb_csum_start_bug(const struct sk_buff *skb, int pos)
+{
+
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		long csstart;
+
+		csstart = skb->csum_start - skb_headroom(skb);
+		if (WARN_ON(csstart > skb_headlen(skb))) {
+			int i;
+
+			pr_err("%d: csum_start %u, offset %u, headroom %d, headlen %d, len %d\n",
+				   pos, skb->csum_start, skb->csum_offset, skb_headroom(skb),
+				   skb_headlen(skb), skb->len);
+			pr_err("nr_frags=%u gso_size=%u ",
+					skb_shinfo(skb)->nr_frags,
+					skb_shinfo(skb)->gso_size);
+			for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+				pr_err("frag_size=%u ", skb_shinfo(skb)->frags[i].size);
+			}
+			pr_err("\n");
+			return 1;
+		}
+	}
+	return 0;
+}
+
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			struct netdev_queue *txq)
 {
@@ -1959,11 +1985,15 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				goto out_kfree_skb;
 			if (skb->next)
 				goto gso;
+			if (skb_csum_start_bug(skb, 10))
+				goto out_kfree_skb;
 		} else {
 			if (skb_needs_linearize(skb, dev) &&
 			    __skb_linearize(skb))
 				goto out_kfree_skb;
 
+			if (skb_csum_start_bug(skb, 20))
+				goto out_kfree_skb;
 			/* If packet is not checksummed and device does not
 			 * support checksumming for this protocol, complete
 			 * checksumming here.
@@ -1974,10 +2004,16 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				if (!dev_can_checksum(dev, skb) &&
 				     skb_checksum_help(skb))
 					goto out_kfree_skb;
+				if (skb_csum_start_bug(skb, 30))
+					goto out_kfree_skb;
 			}
 		}
 
-		rc = ops->ndo_start_xmit(skb, dev);
+		if (skb_csum_start_bug(skb, 40)) ...
From: Plamen Petrov
Date: Tuesday, August 24, 2010 - 10:25 am

Above patch applied, and happy to report the machine now spits data
in the logs instead of oopsing. Here is what we have now: 

[   10.721802] Ending clean XFS mount for filesystem: md12
[   11.669013] IPv4 FIB: Using LC-trie version 0.409
[   11.669101] eth2: link up, 100Mbps, full-duplex, lpa 0x45E1
[   11.746792] eth0: link up, 100Mbps, full-duplex, lpa 0x41E1
[   11.757230] tg3 0000:04:00.0: irq 44 for MSI/MSI-X
[   11.810133] ADDRCONF(NETDEV_UP): eth1: link is not ready
[   11.957523] sixxs_t: Disabled Privacy Extensions
[   14.843711] tg3 0000:04:00.0: eth1: Link is up at 1000 Mbps, full duplex
[   14.843717] tg3 0000:04:00.0: eth1: Flow control is on for TX and on for 
RX
[   14.843753] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[   15.854861] tun0: Disabled Privacy Extensions
[  699.375620] ------------[ cut here ]------------
[  699.475648] WARNING: at net/core/dev.c:1945 
skb_csum_start_bug+0x46/0xf2()
[  699.575667] Hardware name: PowerEdge SC440
[  699.675688] Pid: 2963, comm: FahCore_78.exe Not tainted 
2.6.36-rc2-FS-00103-g2d6fa25 #1
[  699.775706] Call Trace:
[  699.975744]  [<c102d86c>] ? warn_slowpath_common+0x67/0x8c
[  700.175779]  [<c12abc76>] ? skb_csum_start_bug+0x46/0xf2
[  700.375813]  [<c12abc76>] ? skb_csum_start_bug+0x46/0xf2
[  700.575848]  [<c102d8ac>] ? warn_slowpath_null+0x1b/0x1f
[  700.775882]  [<c12abc76>] ? skb_csum_start_bug+0x46/0xf2
[  700.975918]  [<c1024569>] ? __wake_up_sync_key+0x3c/0x52
[  701.175953]  [<c12a7bab>] ? skb_copy_and_csum_dev+0x2a/0xaf
[  701.375989]  [<c122483b>] ? rtl8139_start_xmit+0x4a/0x13a
[  701.576026]  [<c12ae29e>] ? dev_hard_start_xmit+0x220/0x4cc
[  701.776062]  [<c12bfbed>] ? sch_direct_xmit+0xac/0x174
[  701.976096]  [<c12c3f69>] ? nf_iterate+0x69/0x7c
[  702.176131]  [<c12e8976>] ? ip_finish_output+0x0/0x2b6
[  702.376165]  [<c12b00eb>] ? dev_queue_xmit+0xc7/0x355
[  702.576198]  [<c12e8976>] ? ip_finish_output+0x0/0x2b6
[  702.776232]  [<c12e8a92>] ? ip_finish_output+0x11c/0x2b6
[  702.976266]  ...
From: Jarek Poplawski
Date: Tuesday, August 24, 2010 - 11:22 am

Good question. IMHO it looks like skbs are overwritten, so better turn

Thanks a lot, Plamen and Eric, too!

Jarek P.
--

From: Eric Dumazet
Date: Tuesday, August 24, 2010 - 12:19 pm

Thanks !

csum_offset = 16.

so its offsetof(struct tcphdr, check)

maybe a bug in net/ipv4/netfilter/nf_nat_helper.c ?

We should trace all spots where we set csum_start/csum_offset

Or/And trace the skb content.

Please add a :

print_hex_dump(KERN_ERR, "skb data:", DUMP_PREFIX_OFFSET, 
               16, 1, skb->head, skb_end_pointer(skb)-skb->head,true);


call in skb_csum_start_bug(), right after the pr_err("\n") and before
the "return 1;"


int skb_csum_start_bug(const struct sk_buff *skb, int pos)
{

        if (skb->ip_summed == CHECKSUM_PARTIAL) {
                long csstart;

                csstart = skb->csum_start - skb_headroom(skb);
                if (WARN_ON(csstart > skb_headlen(skb))) {
                        int i;

                        pr_err("%d: csum_start %u, offset %u, headroom %d, headlen %d, len %d\n",
                                   pos, skb->csum_start, skb->csum_offset, skb_headroom(skb),
                                   skb_headlen(skb), skb->len);
                        pr_err("nr_frags=%u gso_size=%u ",
                                        skb_shinfo(skb)->nr_frags,
                                        skb_shinfo(skb)->gso_size);
                        for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
                                pr_err("frag_size=%u ", skb_shinfo(skb)->frags[i].size);
                        }
                        pr_err("\n");
                        print_hex_dump(KERN_ERR, "skb data:", DUMP_PREFIX_OFFSET,
                                16, 1, skb->head, skb_end_pointer(skb) - skb->head, true);
                        return 1;
                }
        }
        return 0;
}


--

From: Plamen Petrov
Date: Wednesday, August 25, 2010 - 12:05 am

I see you liked the previous one, here's some more. ;)

This one is based on Linus' latest tree,

The rest is in the attached file, in case you need to run it trough some 
debugging app...

Thanks,
Plamen
From: Eric Dumazet
Date: Sunday, August 29, 2010 - 12:48 am

It seems we forget to reset skb->ip_summed to CHECKSUM_NONE in various
tunnels (gre, ipip, sit, ip6_tunnel), before re-entering stack.

Add ip_summed initialization in skb_tunnel_rx(), and remove it from
ipmr / ip6mr

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
include/net/dst.h |    1 +
net/ipv4/ipmr.c   |    1 -
net/ipv6/ip6mr.c  |    1 -
3 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 81d1413..2ef5580 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -237,6 +237,7 @@ static inline void skb_dst_force(struct sk_buff *skb)
  */
 static inline void skb_tunnel_rx(struct sk_buff *skb, struct net_device *dev)
 {
+	skb->ip_summed = CHECKSUM_NONE;
 	skb->dev = dev;
 	/* TODO : stats should be SMP safe */
 	dev->stats.rx_packets++;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 179fcab..70fa43b 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1837,7 +1837,6 @@ static int __pim_rcv(struct mr_table *mrt, struct sk_buff *skb,
 	skb_pull(skb, (u8*)encap - skb->data);
 	skb_reset_network_header(skb);
 	skb->protocol = htons(ETH_P_IP);
-	skb->ip_summed = 0;
 	skb->pkt_type = PACKET_HOST;
 
 	skb_tunnel_rx(skb, reg_dev);
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 66078da..c4d0146 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -661,7 +661,6 @@ static int pim6_rcv(struct sk_buff *skb)
 	skb_pull(skb, (u8 *)encap - skb->data);
 	skb_reset_network_header(skb);
 	skb->protocol = htons(ETH_P_IPV6);
-	skb->ip_summed = 0;
 	skb->pkt_type = PACKET_HOST;
 
 	skb_tunnel_rx(skb, reg_dev);



--

From: Jesse Gross
Date: Sunday, August 29, 2010 - 8:35 am

This is intentional behavior.  For CHECKSUM_COMPLETE packets we
subtract off the checksum of the encapsulating headers that we remove
so we can still get the benefit of checksum hardware for the inner
packet as well.  GRE does this using skb_postpull_rcsum(), for IPIP it
is implicit because the outer IP header was already removed, etc.
--

From: Plamen Petrov
Date: Friday, August 27, 2010 - 1:44 am

Current status refresh:

Just tried 2.6.35.4, but without luck.

See the oops at http://picpaste.com/8240d73e92bd2ba25a9b4019010fcabf.jpg

Eric Dumazet and I are continuing to try and find a solution for my
problem via private email.

Thanks,
Plamen
--

From: Plamen Petrov
Date: Monday, August 23, 2010 - 10:19 pm

And here is what I've got:

[picture 7]
http://picpaste.com/31d6a54fec9e87de0d1550ee02d9c336.jpg

[picture 8]
http://picpaste.com/02db6ad8abec6281065328fb52d328cf.jpg

[picture 9]
http://picpaste.com/9fbaaa14c679f57c82e96884d3274090.jpg

Sorry for the really bad quality, even for a phone, but the problem
is that I don't know when its going to happen, so... you see
the results.


Ideas?

Thanks,
Plamen
--

From: Jarek Poplawski
Date: Monday, August 23, 2010 - 11:18 pm

On Tue, Aug 24, 2010 at 08:19:37AM +0300, Plamen Petrov wrote:


Try Eric's patch, and later maybe this one.

Thanks,
Jarek P.
From: Eric Dumazet
Date: Monday, August 23, 2010 - 6:00 am

I was about to suggest a similar patch ;)

Also prints skb->csum_offset and skb->len if possible

pr_err("csum_start %u, offset %u, headroom %d, headlen %d, len %d\n",
        skb->csum_start,
	skb->csum_offset,
	skb_headroom(skb),
        skb_headlen(skb),


--

From: Eric Dumazet
Date: Monday, August 23, 2010 - 5:35 am

There was no change in latest kernel in this area.

Should you have only tg3 cards, I guess there would be no bug.

Bug is probably a combination of :

1) tg3 + GRO , or any card enabling GRO
2) Some network code (netfilter ?)
3) a 8139too, or any card calling skb_copy_and_csum_dev()



--

Previous thread: ip_tables.h can't be used in C++ programs. by Ben Greear on Friday, August 20, 2010 - 11:54 am. (3 messages)

Next thread: by pavel potoplyak on Friday, August 20, 2010 - 2:51 pm. (1 message)