Re: Kernel panic during stress with igb in the upstream kernel

Previous thread: Re: no IPv6 Routers present by feldmaus on Friday, March 13, 2009 - 1:36 am. (2 messages)

Next thread: Re: no IPv6 Routers present by feldmaus on Friday, March 13, 2009 - 6:22 am. (2 messages)
From: Herbert Xu
Date: Friday, March 13, 2009 - 4:52 am

Sorry, it seems that I've managed to get the net/net-next patches
mixed up.  Does this help?

GRO: Move netpoll checks to correct location

As my netpoll fix for net doesn't really work for net-next, we
need this update to move the checks into the right place.  As it
stands we may pass freed skbs to netpoll_receive_skb.

This patch also introduces a netpoll_rx_on function to avoid GRO
completely if we're invoked through netpoll.  This might seem
paranoid but as netpoll may have an external receive hook it's
better to be safe than sorry.  I don't think we need this for
2.6.29 though since there's nothing immediately broken by it.

This patch also moves the GRO_* return values to netdevice.h since
VLAN needs them too (I tried to avoid this originally but alas
this seems to be the easiest way out).  This fixes a bug in VLAN
where it continued to use the old return value 2 instead of the
correct GRO_DROP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 493b065..be3ebd7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -330,6 +330,14 @@ enum
 	NAPI_STATE_NPSVC,	/* Netpoll - don't dequeue from poll_list */
 };
 
+enum {
+	GRO_MERGED,
+	GRO_MERGED_FREE,
+	GRO_HELD,
+	GRO_NORMAL,
+	GRO_DROP,
+};
+
 extern void __napi_schedule(struct napi_struct *n);
 
 static inline int napi_disable_pending(struct napi_struct *n)
diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index e38d3c9..5f239dc 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -63,6 +63,13 @@ static inline int netpoll_rx(struct sk_buff *skb)
 	return ret;
 }
 
+static inline int netpoll_rx_on(struct sk_buff *skb)
+{
+	struct netpoll_info *npinfo = skb->dev->npinfo;
+
+	return npinfo && (npinfo->rx_np || npinfo->rx_flags);
+}
+
 static inline int netpoll_receive_skb(struct sk_buff *skb)
 {
 	if (!list_empty(&skb->dev->napi_list))
diff --git ...
From: Tantilov, Emil S
Date: Friday, March 13, 2009 - 2:53 pm

Thanks Herbert,

With this patch the stress test has been going for more than 4 hours now (compared to minutes until panic before). I will leave the test running over the weekend, but at least for now looks like it's fixed.

Thanks,
Emil 

-----Original Message-----
From: Herbert Xu [mailto:herbert@gondor.apana.org.au] 
Sent: Friday, March 13, 2009 4:53 AM
To: Tantilov, Emil S
Cc: Duyck, Alexander H; Brandeburg, Jesse; Kirsher, Jeffrey T; David S. Miller; netdev@vger.kernel.org
Subject: Re: FW: Kernel panic during stress with igb in the upstream kernel


Sorry, it seems that I've managed to get the net/net-next patches
mixed up.  Does this help?

GRO: Move netpoll checks to correct location

As my netpoll fix for net doesn't really work for net-next, we
need this update to move the checks into the right place.  As it
stands we may pass freed skbs to netpoll_receive_skb.

This patch also introduces a netpoll_rx_on function to avoid GRO
completely if we're invoked through netpoll.  This might seem
paranoid but as netpoll may have an external receive hook it's
better to be safe than sorry.  I don't think we need this for
2.6.29 though since there's nothing immediately broken by it.

This patch also moves the GRO_* return values to netdevice.h since
VLAN needs them too (I tried to avoid this originally but alas
this seems to be the easiest way out).  This fixes a bug in VLAN
where it continued to use the old return value 2 instead of the
correct GRO_DROP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 493b065..be3ebd7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -330,6 +330,14 @@ enum
 	NAPI_STATE_NPSVC,	/* Netpoll - don't dequeue from poll_list */
 };
 
+enum {
+	GRO_MERGED,
+	GRO_MERGED_FREE,
+	GRO_HELD,
+	GRO_NORMAL,
+	GRO_DROP,
+};
+
 extern void __napi_schedule(struct napi_struct *n);
 
 static inline int napi_disable_pending(struct ...
From: David Miller
Date: Friday, March 13, 2009 - 2:54 pm

From: "Tantilov, Emil S" <emil.s.tantilov@intel.com>

Thanks for testing.

Herbert, want me to toss this into net-2.6 now?
--

From: Herbert Xu
Date: Friday, March 13, 2009 - 4:59 pm

No, the bug only exists in net-next-2.6 :)

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: David Miller
Date: Sunday, March 15, 2009 - 8:13 pm

From: Herbert Xu <herbert@gondor.apana.org.au>

Actually I introduced this problem when I did the net-2.6

Applied, thanks!
--

From: David Miller
Date: Sunday, March 15, 2009 - 8:18 pm

RnJvbTogRGF2aWQgTWlsbGVyIDxkYXZlbUBkYXZlbWxvZnQubmV0Pg0KRGF0ZTogU3VuLCAxNSBN
YXIgMjAwOSAyMDoxMzo1MyAtMDcwMCAoUERUKQ0KDQo+IEZyb206IEhlcmJlcnQgWHUgPGhlcmJl
cnRAZ29uZG9yLmFwYW5hLm9yZy5hdT4NCj4gRGF0ZTogRnJpLCAxMyBNYXIgMjAwOSAxOTo1Mjo0
NSArMDgwMA0KPiANCj4gPiBHUk86IE1vdmUgbmV0cG9sbCBjaGVja3MgdG8gY29ycmVjdCBsb2Nh
dGlvbg0KIC4uLg0KPiANCj4gQXBwbGllZCwgdGhhbmtzIQ0KDQpBY3R1YWxseSwgSSBoYWQgdG8g
cmV2ZXJ0LCBpdCBkb2Vzbid0IGJ1aWxkIHdpdGggbmV0cG9sbA0KZGlzYWJsZWQ6DQoNCm5ldC84
MDIxcS92bGFuX2NvcmUuYzogSW4gZnVuY3Rpb24g4oCYdmxhbl9ncm9fY29tbW9u4oCZOg0KbmV0
LzgwMjFxL3ZsYW5fY29yZS5jOjgyOiBlcnJvcjogaW1wbGljaXQgZGVjbGFyYXRpb24gb2YgZnVu
Y3Rpb24g4oCYbmV0cG9sbF9yeF9vbuKAmQ0KbWFrZVsyXTogKioqIFtuZXQvODAyMXEvdmxhbl9j
b3JlLm9dIEVycm9yIDENCm1ha2VbMV06ICoqKiBbbmV0LzgwMjFxXSBFcnJvciAyDQptYWtlOiAq
KiogW25ldF0gRXJyb3IgMg0KbWFrZTogKioqIFdhaXRpbmcgZm9yIHVuZmluaXNoZWQgam9icy4u
Li4NCg0KUGxlYXNlIGZpeCB0aGlzIHVwIGFuZCByZXN1Ym1pdCwgdGhhbmtzIQ0K
--

From: Herbert Xu
Date: Monday, March 16, 2009 - 6:22 am

Oops, forgot the #else case.

GRO: Move netpoll checks to correct location

As my netpoll fix for net doesn't really work for net-next, we
need this update to move the checks into the right place.  As it
stands we may pass freed skbs to netpoll_receive_skb.

This patch also introduces a netpoll_rx_on function to avoid GRO
completely if we're invoked through netpoll.  This might seem
paranoid but as netpoll may have an external receive hook it's
better to be safe than sorry.  I don't think we need this for
2.6.29 though since there's nothing immediately broken by it.

This patch also moves the GRO_* return values to netdevice.h since
VLAN needs them too (I tried to avoid this originally but alas
this seems to be the easiest way out).  This fixes a bug in VLAN
where it continued to use the old return value 2 instead of the
correct GRO_DROP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 493b065..be3ebd7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -330,6 +330,14 @@ enum
 	NAPI_STATE_NPSVC,	/* Netpoll - don't dequeue from poll_list */
 };
 
+enum {
+	GRO_MERGED,
+	GRO_MERGED_FREE,
+	GRO_HELD,
+	GRO_NORMAL,
+	GRO_DROP,
+};
+
 extern void __napi_schedule(struct napi_struct *n);
 
 static inline int napi_disable_pending(struct napi_struct *n)
diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index e38d3c9..de99025 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -63,6 +63,13 @@ static inline int netpoll_rx(struct sk_buff *skb)
 	return ret;
 }
 
+static inline int netpoll_rx_on(struct sk_buff *skb)
+{
+	struct netpoll_info *npinfo = skb->dev->npinfo;
+
+	return npinfo && (npinfo->rx_np || npinfo->rx_flags);
+}
+
 static inline int netpoll_receive_skb(struct sk_buff *skb)
 {
 	if (!list_empty(&skb->dev->napi_list))
@@ -99,6 +106,10 @@ static inline int netpoll_rx(struct sk_buff *skb)
 {
 	return 0;
 ...
From: David Miller
Date: Monday, March 16, 2009 - 10:50 am

RnJvbTogSGVyYmVydCBYdSA8aGVyYmVydEBnb25kb3IuYXBhbmEub3JnLmF1Pg0KRGF0ZTogTW9u
LCAxNiBNYXIgMjAwOSAyMToyMjo0MCArMDgwMA0KDQo+IE9uIFN1biwgTWFyIDE1LCAyMDA5IGF0
IDA4OjE4OjAyUE0gLTA3MDAsIERhdmlkIE1pbGxlciB3cm90ZToNCj4gPiANCj4gPiBBY3R1YWxs
eSwgSSBoYWQgdG8gcmV2ZXJ0LCBpdCBkb2Vzbid0IGJ1aWxkIHdpdGggbmV0cG9sbA0KPiA+IGRp
c2FibGVkOg0KPiA+IA0KPiA+IG5ldC84MDIxcS92bGFuX2NvcmUuYzogSW4gZnVuY3Rpb24g4oCY
dmxhbl9ncm9fY29tbW9u4oCZOg0KPiA+IG5ldC84MDIxcS92bGFuX2NvcmUuYzo4MjogZXJyb3I6
IGltcGxpY2l0IGRlY2xhcmF0aW9uIG9mIGZ1bmN0aW9uIOKAmG5ldHBvbGxfcnhfb27igJkNCj4g
PiBtYWtlWzJdOiAqKiogW25ldC84MDIxcS92bGFuX2NvcmUub10gRXJyb3IgMQ0KPiA+IG1ha2Vb
MV06ICoqKiBbbmV0LzgwMjFxXSBFcnJvciAyDQo+ID4gbWFrZTogKioqIFtuZXRdIEVycm9yIDIN
Cj4gPiBtYWtlOiAqKiogV2FpdGluZyBmb3IgdW5maW5pc2hlZCBqb2JzLi4uLg0KPiANCj4gT29w
cywgZm9yZ290IHRoZSAjZWxzZSBjYXNlLg0KPiANCj4gR1JPOiBNb3ZlIG5ldHBvbGwgY2hlY2tz
IHRvIGNvcnJlY3QgbG9jYXRpb24NCg0KQXBwbGllZCwgdGhhbmtzIQ0K
--

From: Jarek Poplawski
Date: Tuesday, March 17, 2009 - 2:37 am

On 16-03-2009 14:22, Herbert Xu wrote:

Probably I miss something, but you seem to assume here this skb will
be taken by netpoll later. But if it's not active (trapped) vlan
packet will be passed as "normal"?

Thanks,
Jarek P.
--

From: Herbert Xu
Date: Tuesday, March 17, 2009 - 3:02 am

Good point.  I'll see if this stuff can be simplified further,
but for now, let's go for the obvious fix.

gro: Fix vlan/netpoll check again

Jarek Poplawski pointed out that my previous fix is broken for
VLAN+netpoll as if netpoll is enabled we'd end up in the normal
receive path instead of the VLAN receive path.

This patch fixes it by calling the VLAN receive hook.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 6227248..654e45f 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -79,9 +79,6 @@ static int vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
 {
 	struct sk_buff *p;
 
-	if (netpoll_rx_on(skb))
-		return GRO_NORMAL;
-
 	if (skb_bond_should_drop(skb))
 		goto drop;
 
@@ -107,6 +104,9 @@ drop:
 int vlan_gro_receive(struct napi_struct *napi, struct vlan_group *grp,
 		     unsigned int vlan_tci, struct sk_buff *skb)
 {
+	if (netpoll_rx_on(skb))
+		return vlan_hwaccel_receive_skb(skb, grp, vlan_tci);
+
 	skb_gro_reset_offset(skb);
 
 	return napi_skb_finish(vlan_gro_common(napi, grp, vlan_tci, skb), skb);
@@ -121,6 +121,9 @@ int vlan_gro_frags(struct napi_struct *napi, struct vlan_group *grp,
 	if (!skb)
 		return NET_RX_DROP;
 
+	if (netpoll_rx_on(skb))
+		return vlan_hwaccel_receive_skb(skb, grp, vlan_tci);
+
 	return napi_frags_finish(napi, skb,
 				 vlan_gro_common(napi, grp, vlan_tci, skb));
 }

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: David Miller
Date: Tuesday, March 17, 2009 - 1:11 pm

From: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks Herbert.
--

Previous thread: Re: no IPv6 Routers present by feldmaus on Friday, March 13, 2009 - 1:36 am. (2 messages)

Next thread: Re: no IPv6 Routers present by feldmaus on Friday, March 13, 2009 - 6:22 am. (2 messages)