Re: [PATCH] ip: Fix ip_dev_loopback_xmit()

Previous thread: [PATCH] dst: don't inline dst_ifdown by Stephen Hemminger on Monday, April 12, 2010 - 10:38 am. (2 messages)

Next thread: Receive issues with bonding and vlans by Chris Leech on Monday, April 12, 2010 - 3:17 pm. (12 messages)
From: Eric Paris
Date: Monday, April 12, 2010 - 12:20 pm

I get about 9 of these near the start of the system while userspace is
first coming up.  The  virtual machine in question is mostly Fedora 14
with a kernel from linux-next as of today.  The hardware is RHEL 5.5
with KVM:

tree: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
tag: next-20100412
commit: bbeecf185fe464ccd7ee97ce6d3646ad686995b4

I'm not sure what else to collect, give, test, try, show, debug or what
have you so let me know.  Attached is my full dmesg and config.

[   14.203970] BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon/2093
[   14.204025] caller is netif_rx+0xfa/0x110
[   14.204032] Pid: 2093, comm: avahi-daemon Tainted: G        W  2.6.34-rc3-next-20100412+ #65
[   14.204035] Call Trace:
[   14.204064]  [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110
[   14.204070]  [<ffffffff8142163a>] netif_rx+0xfa/0x110
[   14.204090]  [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0
[   14.204095]  [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0
[   14.204099]  [<ffffffff8145d610>] ip_local_out+0x20/0x30
[   14.204105]  [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0
[   14.204119]  [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400
[   14.204125]  [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790
[   14.204137]  [<ffffffff814891d5>] inet_sendmsg+0x45/0x80
[   14.204149]  [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110
[   14.204177]  [<ffffffff810e3a89>] ? might_fault+0xb9/0xd0
[   14.204184]  [<ffffffff810e3a3e>] ? might_fault+0x6e/0xd0
[   14.204189]  [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380
[   14.204205]  [<ffffffff811107f1>] ? do_sync_write+0xd1/0x110
[   14.204211]  [<ffffffff810e3a3e>] ? might_fault+0x6e/0xd0
[   14.204233]  [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b
From: Eric Dumazet
Date: Monday, April 12, 2010 - 12:40 pm

Good spot, RPS changed a bit netif_rx() requirements.

I would change ip_dev_loopback_xmit() to call netif_rx_ni() instead...

David, Tom ?

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c65f18e..d1bcc9f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb)
 	newskb->pkt_type = PACKET_LOOPBACK;
 	newskb->ip_summed = CHECKSUM_UNNECESSARY;
 	WARN_ON(!skb_dst(newskb));
-	netif_rx(newskb);
+	netif_rx_ni(newskb);
 	return 0;
 }
 



--

From: Tom Herbert
Date: Monday, April 12, 2010 - 1:54 pm

Would it be better to disable preemption in netif_rx?  Also note that
with RFS we would be taking rcu_read_lock in netif_rx anyway and that
--

From: Eric Dumazet
Date: Tuesday, April 13, 2010 - 12:14 am

Ok that makes sense.

What do you think applying a small fix before RFS integration, it is
better to have smaller patches anyway :)

[PATCH net-next-2.6] net: netif_rx() must disable preemption

Eric Paris reported netif_rx() is calling smp_processor_id() from
preemptible context, in particular when caller is
ip_dev_loopback_xmit().

RPS commit added this smp_processor_id() call, this patch makes sure
preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
can dot it a bit earlier.

Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/dev.c |   19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index a10a216..a96ea6a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2206,6 +2206,7 @@ DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };
 /*
  * get_rps_cpu is called from netif_receive_skb and returns the target
  * CPU from the RPS map of the receiving queue for a given skb.
+ * rcu_read_lock must be held on entry.
  */
 static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
 {
@@ -2217,8 +2218,6 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
 	u8 ip_proto;
 	u32 addr1, addr2, ports, ihl;
 
-	rcu_read_lock();
-
 	if (skb_rx_queue_recorded(skb)) {
 		u16 index = skb_get_rx_queue(skb);
 		if (unlikely(index >= dev->num_rx_queues)) {
@@ -2296,7 +2295,6 @@ got_hash:
 	}
 
 done:
-	rcu_read_unlock();
 	return cpu;
 }
 
@@ -2392,7 +2390,7 @@ enqueue:
 
 int netif_rx(struct sk_buff *skb)
 {
-	int cpu;
+	int ret;
 
 	/* if netpoll wants it, pretend we never saw it */
 	if (netpoll_rx(skb))
@@ -2402,14 +2400,21 @@ int netif_rx(struct sk_buff *skb)
 		net_timestamp(skb);
 
 #ifdef CONFIG_RPS
+	{
+	int cpu;
+
+	rcu_read_lock();
 	cpu = get_rps_cpu(skb->dev, skb);
 	if (cpu < 0)
 		cpu = smp_processor_id();
+	ret = enqueue_to_backlog(skb, ...
From: David Miller
Date: Thursday, April 15, 2010 - 12:14 am

From: Eric Dumazet <eric.dumazet@gmail.com>

I've applied this with some coding style fixups.

Thanks!

--------------------
net: netif_rx() must disable preemption

Eric Paris reported netif_rx() is calling smp_processor_id() from
preemptible context, in particular when caller is
ip_dev_loopback_xmit().

RPS commit added this smp_processor_id() call, this patch makes sure
preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
can dot it a bit earlier.

Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/dev.c |   25 +++++++++++++++----------
 1 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 876b111..e8041eb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2206,6 +2206,7 @@ DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };
 /*
  * get_rps_cpu is called from netif_receive_skb and returns the target
  * CPU from the RPS map of the receiving queue for a given skb.
+ * rcu_read_lock must be held on entry.
  */
 static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
 {
@@ -2217,8 +2218,6 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
 	u8 ip_proto;
 	u32 addr1, addr2, ports, ihl;
 
-	rcu_read_lock();
-
 	if (skb_rx_queue_recorded(skb)) {
 		u16 index = skb_get_rx_queue(skb);
 		if (unlikely(index >= dev->num_rx_queues)) {
@@ -2296,7 +2295,6 @@ got_hash:
 	}
 
 done:
-	rcu_read_unlock();
 	return cpu;
 }
 
@@ -2392,7 +2390,7 @@ enqueue:
 
 int netif_rx(struct sk_buff *skb)
 {
-	int cpu;
+	int ret;
 
 	/* if netpoll wants it, pretend we never saw it */
 	if (netpoll_rx(skb))
@@ -2402,14 +2400,21 @@ int netif_rx(struct sk_buff *skb)
 		net_timestamp(skb);
 
 #ifdef CONFIG_RPS
-	cpu = get_rps_cpu(skb->dev, skb);
-	if (cpu < 0)
-		cpu = smp_processor_id();
+	{
+		int cpu;
+
+		rcu_read_lock();
+		cpu = ...
From: Changli Gao
Date: Thursday, April 15, 2010 - 12:30 am

Should netif_rx() be used only when preemption is disabled? If not,
netif_rx_ni() should be used instead.?


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

From: David Miller
Date: Thursday, April 15, 2010 - 12:37 am

From: Changli Gao <xiaosuo@gmail.com>

netif_rx() must be invoked from a hardware or software interrupt,
which implies preemption disabled.

In netif_rx_ni(), the "ni" means "not interrupt".
--

From: Changli Gao
Date: Thursday, April 15, 2010 - 12:47 am

yea, I know netif_rx_ni()'s meaning. It means that the following
changes aren't necessary.

 #else
-       cpu = smp_processor_id();
+       ret = enqueue_to_backlog(skb, get_cpu());
+       put_cpu();

ret = enqueue_to_backlog(skb, smp_processor_id()); should be OK.

 #endif
-
-       return enqueue_to_backlog(skb, cpu);
+       return ret;
 }
--

From: David Miller
Date: Thursday, April 15, 2010 - 12:57 am

From: Changli Gao <xiaosuo@gmail.com>

Why?  If we are in an interrupt (either soft or hard) then
smp_processor_id() is stable, and valid.

Changli, I find it very frustrating to communicate with you, you are
very terse in your descriptions and analysis and you make many simple
errors that would be avoided if you spent more time thinking about
things before sending your emails. :-/

Instead of just showing some pseudo patch, state WHY it is needed.
Talk about the execution state of environment and what rules or other
things are being violated which must be corrected.
--

From: Changli Gao
Date: Thursday, April 15, 2010 - 1:27 am

Sorry. English isn't my native language, so sometimes I can't describe
myself clearly.

I think the following patch from Eric should be applied instead.

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c65f18e..d1bcc9f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb)
       newskb->pkt_type = PACKET_LOOPBACK;
       newskb->ip_summed = CHECKSUM_UNNECESSARY;
       WARN_ON(!skb_dst(newskb));
-       netif_rx(newskb);
+       netif_rx_ni(newskb);
       return 0;
 }

As you know  "netif_rx() must be invoked from a hardware or software
interrupt, which implies preemption disabled.", obviously
ip_dev_loopback_xmit() doesn't obey this rule, so the crash isn't the
fault of net_rx(). If there are other users, who don't obey this rule,
they should be fixed too.

For this patch:

-       cpu = smp_processor_id();
+       ret = enqueue_to_backlog(skb, get_cpu());
+       put_cpu();

You said: " If we are in an interrupt (either soft or hard) then
smp_processor_id() is stable, and valid.". so we don't need to call
get_cpu() instead of smp_processor_id(). get_cpu() brings no good but
additional cost preempt_disable().

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

From: David Miller
Date: Thursday, April 15, 2010 - 1:33 am

From: Changli Gao <xiaosuo@gmail.com>

Yes, this looks more reasonable.  Eric if you agree please (re-)submit
this formally, I must have missed this somehow, sorry.

And this is a bug fix in any kernel, not just one's that have RPS
patches applied.

If we are not called from some interrupt context, there is no sure
trigger to make sure software interrupts will be executed after the
packet is queued locally.  netif_rx_ni() makes sure that any pending
software interrupts will run in such cases.
--

From: Eric Dumazet
Date: Thursday, April 15, 2010 - 1:58 am

Our mails crossed ;)

Yes I think it's more reasonable to fix it like this, I'll submit a
patch after fully testing it :)



--

From: Eric Dumazet
Date: Thursday, April 15, 2010 - 1:49 am

netif_rx is meant to be called from interrupts because it doesn't wake
up ksoftirqd.  For calling from outside interrupts, netif_rx_ni exists,
to make _sure_ do_softirq() is called.

However, netif_rx() _could_ be called from process context, it was safe,
but sofirq was a bit delayed.

Now, after RPS changes this can trigger this :

[   14.203970] BUG: using smp_processor_id() in preemptible [00000000]
code: avahi-daemon/2093
[   14.204025] caller is netif_rx+0xfa/0x110
[   14.204032] Pid: 2093, comm: avahi-daemon Tainted: G        W
2.6.34-rc3-next-20100412+ #65
[   14.204035] Call Trace:
[   14.204064]  [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110
[   14.204070]  [<ffffffff8142163a>] netif_rx+0xfa/0x110
[   14.204090]  [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0
[   14.204095]  [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0
[   14.204099]  [<ffffffff8145d610>] ip_local_out+0x20/0x30
[   14.204105]  [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0
[   14.204119]  [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400
[   14.204125]  [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790
[   14.204137]  [<ffffffff814891d5>] inet_sendmsg+0x45/0x80
[   14.204149]  [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110
[   14.204177]  [<ffffffff810e3a89>] ? might_fault+0xb9/0xd0
[   14.204184]  [<ffffffff810e3a3e>] ? might_fault+0x6e/0xd0
[   14.204189]  [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380
[   14.204205]  [<ffffffff811107f1>] ? do_sync_write+0xd1/0x110
[   14.204211]  [<ffffffff810e3a3e>] ? might_fault+0x6e/0xd0
[   14.204233]  [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b


We have two possibilities :

1) Make sure no netif_rx() caller is in process context, preemption
enabled.

2) Change netif_rx() to meet its initial behavior : It _can_ be called
from process context, preemption enabled. Frame might be delayed a bit
as before.

We chose 2), but David, I am not sure this is OK given git history, some
calling points were changed to avoid "'NOHZ: ...
From: David Miller
Date: Thursday, April 15, 2010 - 2:02 am

From: Eric Dumazet <eric.dumazet@gmail.com>

Ok, thanks for the analysis.

Since we keep coming back to this issue why don't we simply
solve it forever?  Let's make netif_rx() work in all contexts
and get rid of netif_rx_ni().

I think this is the thing to do because this whole netif_rx_ni()
vs. netif_rx() thing was meant to be an optimization of sorts (this
goes back to like 8+ years ago :-), and really I doubt it really
matters on that level any more.

What do you think?
--

From: Eric Dumazet
Date: Thursday, April 15, 2010 - 3:29 am

I was about to come to same idea indeed.

netif_receive_skb() is supposed to be used for modern devices anyway,
avoiding netif_rx() overhead...




--

From: Eric Dumazet
Date: Thursday, April 15, 2010 - 6:16 am

After some confusion, it seems this was the right fix after all :)

[PATCH] ip: Fix ip_dev_loopback_xmit()

Eric Paris got following trace with a linux-next kernel

[   14.203970] BUG: using smp_processor_id() in preemptible [00000000]
code: avahi-daemon/2093
[   14.204025] caller is netif_rx+0xfa/0x110
[   14.204035] Call Trace:
[   14.204064]  [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110
[   14.204070]  [<ffffffff8142163a>] netif_rx+0xfa/0x110
[   14.204090]  [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0
[   14.204095]  [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0
[   14.204099]  [<ffffffff8145d610>] ip_local_out+0x20/0x30
[   14.204105]  [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0
[   14.204119]  [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400
[   14.204125]  [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790
[   14.204137]  [<ffffffff814891d5>] inet_sendmsg+0x45/0x80
[   14.204149]  [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110
[   14.204189]  [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380
[   14.204233]  [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b

While current linux-2.6 kernel doesnt emit this warning, bug is latent
and might cause unexpected failures.

ip_dev_loopback_xmit() runs in process context, preemption enabled, so
must call netif_rx_ni() instead of netif_rx(), to make sure that we
process pending software interrupt.

Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c65f18e..d1bcc9f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb)
 	newskb->pkt_type = PACKET_LOOPBACK;
 	newskb->ip_summed = CHECKSUM_UNNECESSARY;
 	WARN_ON(!skb_dst(newskb));
-	netif_rx(newskb);
+	netif_rx_ni(newskb);
 	return 0;
 }
 


--

From: Eric Dumazet
Date: Thursday, April 15, 2010 - 12:07 pm

Oops silly me, I forgot ipv6

updated patch in a couple of minutes




--

From: Eric Dumazet
Date: Thursday, April 15, 2010 - 12:13 pm

[PATCH] ip: Fix ip_dev_loopback_xmit()

Eric Paris got following trace with a linux-next kernel

[   14.203970] BUG: using smp_processor_id() in preemptible [00000000]
code: avahi-daemon/2093
[   14.204025] caller is netif_rx+0xfa/0x110
[   14.204035] Call Trace:
[   14.204064]  [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110
[   14.204070]  [<ffffffff8142163a>] netif_rx+0xfa/0x110
[   14.204090]  [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0
[   14.204095]  [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0
[   14.204099]  [<ffffffff8145d610>] ip_local_out+0x20/0x30
[   14.204105]  [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0
[   14.204119]  [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400
[   14.204125]  [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790
[   14.204137]  [<ffffffff814891d5>] inet_sendmsg+0x45/0x80
[   14.204149]  [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110
[   14.204189]  [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380
[   14.204233]  [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b

While current linux-2.6 kernel doesnt emit this warning, bug is latent
and might cause unexpected failures.

ip_dev_loopback_xmit() runs in process context, preemption enabled, so
must call netif_rx_ni() instead of netif_rx(), to make sure that we
process pending software interrupt.

Same change for ip6_dev_loopback_xmit()

Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/ip_output.c  |    2 +-
 net/ipv6/ip6_output.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c65f18e..d1bcc9f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb)
 	newskb->pkt_type = PACKET_LOOPBACK;
 	newskb->ip_summed = CHECKSUM_UNNECESSARY;
 	WARN_ON(!skb_dst(newskb));
-	netif_rx(newskb);
+	netif_rx_ni(newskb);
 	return 0;
 }
 
diff --git ...
From: David Miller
Date: Thursday, April 15, 2010 - 2:26 pm

From: Eric Dumazet <eric.dumazet@gmail.com>

Applied to net-2.6, thanks Eric.
--

From: Changli Gao
Date: Thursday, April 15, 2010 - 5:03 pm

Now, I am doubting the correctness of  the following comment:

/*
 * The higher levels take care of making this non-reentrant (it's
 * called with bh's disabled).
 */
static netdev_tx_t loopback_xmit(struct sk_buff *skb,
                                 struct net_device *dev)
{
        struct pcpu_lstats __percpu *pcpu_lstats;
        struct pcpu_lstats *lb_stats;
        int len;

        skb_orphan(skb);

        skb->protocol = eth_type_trans(skb, dev);

And these lines:

        /* it's OK to use per_cpu_ptr() because BHs are off */
        pcpu_lstats = (void __percpu __force *)dev->ml_priv;
        lb_stats = this_cpu_ptr(pcpu_lstats);

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

From: David Miller
Date: Thursday, April 15, 2010 - 5:15 pm

From: Changli Gao <xiaosuo@gmail.com>

The ->hard_start_xmit() method always executes with software
interrupts disabled.
--

From: Changli Gao
Date: Thursday, April 15, 2010 - 5:19 pm

Oh, sorry. I traced to the wrong function.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

Previous thread: [PATCH] dst: don't inline dst_ifdown by Stephen Hemminger on Monday, April 12, 2010 - 10:38 am. (2 messages)

Next thread: Receive issues with bonding and vlans by Chris Leech on Monday, April 12, 2010 - 3:17 pm. (12 messages)