I get about 9 of these near the start of the system while userspace is first coming up. The virtual machine in question is mostly Fedora 14 with a kernel from linux-next as of today. The hardware is RHEL 5.5 with KVM: tree: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git tag: next-20100412 commit: bbeecf185fe464ccd7ee97ce6d3646ad686995b4 I'm not sure what else to collect, give, test, try, show, debug or what have you so let me know. Attached is my full dmesg and config. [ 14.203970] BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon/2093 [ 14.204025] caller is netif_rx+0xfa/0x110 [ 14.204032] Pid: 2093, comm: avahi-daemon Tainted: G W 2.6.34-rc3-next-20100412+ #65 [ 14.204035] Call Trace: [ 14.204064] [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110 [ 14.204070] [<ffffffff8142163a>] netif_rx+0xfa/0x110 [ 14.204090] [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0 [ 14.204095] [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0 [ 14.204099] [<ffffffff8145d610>] ip_local_out+0x20/0x30 [ 14.204105] [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0 [ 14.204119] [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400 [ 14.204125] [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790 [ 14.204137] [<ffffffff814891d5>] inet_sendmsg+0x45/0x80 [ 14.204149] [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110 [ 14.204177] [<ffffffff810e3a89>] ? might_fault+0xb9/0xd0 [ 14.204184] [<ffffffff810e3a3e>] ? might_fault+0x6e/0xd0 [ 14.204189] [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380 [ 14.204205] [<ffffffff811107f1>] ? do_sync_write+0xd1/0x110 [ 14.204211] [<ffffffff810e3a3e>] ? might_fault+0x6e/0xd0 [ 14.204233] [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b
Good spot, RPS changed a bit netif_rx() requirements. I would change ip_dev_loopback_xmit() to call netif_rx_ni() instead... David, Tom ? diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index c65f18e..d1bcc9f 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb) newskb->pkt_type = PACKET_LOOPBACK; newskb->ip_summed = CHECKSUM_UNNECESSARY; WARN_ON(!skb_dst(newskb)); - netif_rx(newskb); + netif_rx_ni(newskb); return 0; } --
Would it be better to disable preemption in netif_rx? Also note that with RFS we would be taking rcu_read_lock in netif_rx anyway and that --
Ok that makes sense.
What do you think applying a small fix before RFS integration, it is
better to have smaller patches anyway :)
[PATCH net-next-2.6] net: netif_rx() must disable preemption
Eric Paris reported netif_rx() is calling smp_processor_id() from
preemptible context, in particular when caller is
ip_dev_loopback_xmit().
RPS commit added this smp_processor_id() call, this patch makes sure
preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
can dot it a bit earlier.
Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/core/dev.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index a10a216..a96ea6a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2206,6 +2206,7 @@ DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };
/*
* get_rps_cpu is called from netif_receive_skb and returns the target
* CPU from the RPS map of the receiving queue for a given skb.
+ * rcu_read_lock must be held on entry.
*/
static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
{
@@ -2217,8 +2218,6 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
u8 ip_proto;
u32 addr1, addr2, ports, ihl;
- rcu_read_lock();
-
if (skb_rx_queue_recorded(skb)) {
u16 index = skb_get_rx_queue(skb);
if (unlikely(index >= dev->num_rx_queues)) {
@@ -2296,7 +2295,6 @@ got_hash:
}
done:
- rcu_read_unlock();
return cpu;
}
@@ -2392,7 +2390,7 @@ enqueue:
int netif_rx(struct sk_buff *skb)
{
- int cpu;
+ int ret;
/* if netpoll wants it, pretend we never saw it */
if (netpoll_rx(skb))
@@ -2402,14 +2400,21 @@ int netif_rx(struct sk_buff *skb)
net_timestamp(skb);
#ifdef CONFIG_RPS
+ {
+ int cpu;
+
+ rcu_read_lock();
cpu = get_rps_cpu(skb->dev, skb);
if (cpu < 0)
cpu = smp_processor_id();
+ ret = enqueue_to_backlog(skb, ...From: Eric Dumazet <eric.dumazet@gmail.com>
I've applied this with some coding style fixups.
Thanks!
--------------------
net: netif_rx() must disable preemption
Eric Paris reported netif_rx() is calling smp_processor_id() from
preemptible context, in particular when caller is
ip_dev_loopback_xmit().
RPS commit added this smp_processor_id() call, this patch makes sure
preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
can dot it a bit earlier.
Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/core/dev.c | 25 +++++++++++++++----------
1 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 876b111..e8041eb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2206,6 +2206,7 @@ DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };
/*
* get_rps_cpu is called from netif_receive_skb and returns the target
* CPU from the RPS map of the receiving queue for a given skb.
+ * rcu_read_lock must be held on entry.
*/
static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
{
@@ -2217,8 +2218,6 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
u8 ip_proto;
u32 addr1, addr2, ports, ihl;
- rcu_read_lock();
-
if (skb_rx_queue_recorded(skb)) {
u16 index = skb_get_rx_queue(skb);
if (unlikely(index >= dev->num_rx_queues)) {
@@ -2296,7 +2295,6 @@ got_hash:
}
done:
- rcu_read_unlock();
return cpu;
}
@@ -2392,7 +2390,7 @@ enqueue:
int netif_rx(struct sk_buff *skb)
{
- int cpu;
+ int ret;
/* if netpoll wants it, pretend we never saw it */
if (netpoll_rx(skb))
@@ -2402,14 +2400,21 @@ int netif_rx(struct sk_buff *skb)
net_timestamp(skb);
#ifdef CONFIG_RPS
- cpu = get_rps_cpu(skb->dev, skb);
- if (cpu < 0)
- cpu = smp_processor_id();
+ {
+ int cpu;
+
+ rcu_read_lock();
+ cpu = ...Should netif_rx() be used only when preemption is disabled? If not, netif_rx_ni() should be used instead.? -- Regards, Changli Gao(xiaosuo@gmail.com) --
From: Changli Gao <xiaosuo@gmail.com> netif_rx() must be invoked from a hardware or software interrupt, which implies preemption disabled. In netif_rx_ni(), the "ni" means "not interrupt". --
yea, I know netif_rx_ni()'s meaning. It means that the following changes aren't necessary. #else - cpu = smp_processor_id(); + ret = enqueue_to_backlog(skb, get_cpu()); + put_cpu(); ret = enqueue_to_backlog(skb, smp_processor_id()); should be OK. #endif - - return enqueue_to_backlog(skb, cpu); + return ret; } --
From: Changli Gao <xiaosuo@gmail.com> Why? If we are in an interrupt (either soft or hard) then smp_processor_id() is stable, and valid. Changli, I find it very frustrating to communicate with you, you are very terse in your descriptions and analysis and you make many simple errors that would be avoided if you spent more time thinking about things before sending your emails. :-/ Instead of just showing some pseudo patch, state WHY it is needed. Talk about the execution state of environment and what rules or other things are being violated which must be corrected. --
Sorry. English isn't my native language, so sometimes I can't describe
myself clearly.
I think the following patch from Eric should be applied instead.
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c65f18e..d1bcc9f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb)
newskb->pkt_type = PACKET_LOOPBACK;
newskb->ip_summed = CHECKSUM_UNNECESSARY;
WARN_ON(!skb_dst(newskb));
- netif_rx(newskb);
+ netif_rx_ni(newskb);
return 0;
}
As you know "netif_rx() must be invoked from a hardware or software
interrupt, which implies preemption disabled.", obviously
ip_dev_loopback_xmit() doesn't obey this rule, so the crash isn't the
fault of net_rx(). If there are other users, who don't obey this rule,
they should be fixed too.
For this patch:
- cpu = smp_processor_id();
+ ret = enqueue_to_backlog(skb, get_cpu());
+ put_cpu();
You said: " If we are in an interrupt (either soft or hard) then
smp_processor_id() is stable, and valid.". so we don't need to call
get_cpu() instead of smp_processor_id(). get_cpu() brings no good but
additional cost preempt_disable().
--
Regards,
Changli Gao(xiaosuo@gmail.com)
--
From: Changli Gao <xiaosuo@gmail.com> Yes, this looks more reasonable. Eric if you agree please (re-)submit this formally, I must have missed this somehow, sorry. And this is a bug fix in any kernel, not just one's that have RPS patches applied. If we are not called from some interrupt context, there is no sure trigger to make sure software interrupts will be executed after the packet is queued locally. netif_rx_ni() makes sure that any pending software interrupts will run in such cases. --
Our mails crossed ;) Yes I think it's more reasonable to fix it like this, I'll submit a patch after fully testing it :) --
netif_rx is meant to be called from interrupts because it doesn't wake up ksoftirqd. For calling from outside interrupts, netif_rx_ni exists, to make _sure_ do_softirq() is called. However, netif_rx() _could_ be called from process context, it was safe, but sofirq was a bit delayed. Now, after RPS changes this can trigger this : [ 14.203970] BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon/2093 [ 14.204025] caller is netif_rx+0xfa/0x110 [ 14.204032] Pid: 2093, comm: avahi-daemon Tainted: G W 2.6.34-rc3-next-20100412+ #65 [ 14.204035] Call Trace: [ 14.204064] [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110 [ 14.204070] [<ffffffff8142163a>] netif_rx+0xfa/0x110 [ 14.204090] [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0 [ 14.204095] [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0 [ 14.204099] [<ffffffff8145d610>] ip_local_out+0x20/0x30 [ 14.204105] [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0 [ 14.204119] [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400 [ 14.204125] [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790 [ 14.204137] [<ffffffff814891d5>] inet_sendmsg+0x45/0x80 [ 14.204149] [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110 [ 14.204177] [<ffffffff810e3a89>] ? might_fault+0xb9/0xd0 [ 14.204184] [<ffffffff810e3a3e>] ? might_fault+0x6e/0xd0 [ 14.204189] [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380 [ 14.204205] [<ffffffff811107f1>] ? do_sync_write+0xd1/0x110 [ 14.204211] [<ffffffff810e3a3e>] ? might_fault+0x6e/0xd0 [ 14.204233] [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b We have two possibilities : 1) Make sure no netif_rx() caller is in process context, preemption enabled. 2) Change netif_rx() to meet its initial behavior : It _can_ be called from process context, preemption enabled. Frame might be delayed a bit as before. We chose 2), but David, I am not sure this is OK given git history, some calling points were changed to avoid "'NOHZ: ...
From: Eric Dumazet <eric.dumazet@gmail.com> Ok, thanks for the analysis. Since we keep coming back to this issue why don't we simply solve it forever? Let's make netif_rx() work in all contexts and get rid of netif_rx_ni(). I think this is the thing to do because this whole netif_rx_ni() vs. netif_rx() thing was meant to be an optimization of sorts (this goes back to like 8+ years ago :-), and really I doubt it really matters on that level any more. What do you think? --
I was about to come to same idea indeed. netif_receive_skb() is supposed to be used for modern devices anyway, avoiding netif_rx() overhead... --
After some confusion, it seems this was the right fix after all :) [PATCH] ip: Fix ip_dev_loopback_xmit() Eric Paris got following trace with a linux-next kernel [ 14.203970] BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon/2093 [ 14.204025] caller is netif_rx+0xfa/0x110 [ 14.204035] Call Trace: [ 14.204064] [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110 [ 14.204070] [<ffffffff8142163a>] netif_rx+0xfa/0x110 [ 14.204090] [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0 [ 14.204095] [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0 [ 14.204099] [<ffffffff8145d610>] ip_local_out+0x20/0x30 [ 14.204105] [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0 [ 14.204119] [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400 [ 14.204125] [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790 [ 14.204137] [<ffffffff814891d5>] inet_sendmsg+0x45/0x80 [ 14.204149] [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110 [ 14.204189] [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380 [ 14.204233] [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b While current linux-2.6 kernel doesnt emit this warning, bug is latent and might cause unexpected failures. ip_dev_loopback_xmit() runs in process context, preemption enabled, so must call netif_rx_ni() instead of netif_rx(), to make sure that we process pending software interrupt. Reported-by: Eric Paris <eparis@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index c65f18e..d1bcc9f 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb) newskb->pkt_type = PACKET_LOOPBACK; newskb->ip_summed = CHECKSUM_UNNECESSARY; WARN_ON(!skb_dst(newskb)); - netif_rx(newskb); + netif_rx_ni(newskb); return 0; } --
Oops silly me, I forgot ipv6 updated patch in a couple of minutes --
[PATCH] ip: Fix ip_dev_loopback_xmit() Eric Paris got following trace with a linux-next kernel [ 14.203970] BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon/2093 [ 14.204025] caller is netif_rx+0xfa/0x110 [ 14.204035] Call Trace: [ 14.204064] [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110 [ 14.204070] [<ffffffff8142163a>] netif_rx+0xfa/0x110 [ 14.204090] [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0 [ 14.204095] [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0 [ 14.204099] [<ffffffff8145d610>] ip_local_out+0x20/0x30 [ 14.204105] [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0 [ 14.204119] [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400 [ 14.204125] [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790 [ 14.204137] [<ffffffff814891d5>] inet_sendmsg+0x45/0x80 [ 14.204149] [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110 [ 14.204189] [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380 [ 14.204233] [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b While current linux-2.6 kernel doesnt emit this warning, bug is latent and might cause unexpected failures. ip_dev_loopback_xmit() runs in process context, preemption enabled, so must call netif_rx_ni() instead of netif_rx(), to make sure that we process pending software interrupt. Same change for ip6_dev_loopback_xmit() Reported-by: Eric Paris <eparis@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- net/ipv4/ip_output.c | 2 +- net/ipv6/ip6_output.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index c65f18e..d1bcc9f 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb) newskb->pkt_type = PACKET_LOOPBACK; newskb->ip_summed = CHECKSUM_UNNECESSARY; WARN_ON(!skb_dst(newskb)); - netif_rx(newskb); + netif_rx_ni(newskb); return 0; } diff --git ...
From: Eric Dumazet <eric.dumazet@gmail.com> Applied to net-2.6, thanks Eric. --
Now, I am doubting the correctness of the following comment:
/*
* The higher levels take care of making this non-reentrant (it's
* called with bh's disabled).
*/
static netdev_tx_t loopback_xmit(struct sk_buff *skb,
struct net_device *dev)
{
struct pcpu_lstats __percpu *pcpu_lstats;
struct pcpu_lstats *lb_stats;
int len;
skb_orphan(skb);
skb->protocol = eth_type_trans(skb, dev);
And these lines:
/* it's OK to use per_cpu_ptr() because BHs are off */
pcpu_lstats = (void __percpu __force *)dev->ml_priv;
lb_stats = this_cpu_ptr(pcpu_lstats);
--
Regards,
Changli Gao(xiaosuo@gmail.com)
--
From: Changli Gao <xiaosuo@gmail.com> The ->hard_start_xmit() method always executes with software interrupts disabled. --
Oh, sorry. I traced to the wrong function. -- Regards, Changli Gao(xiaosuo@gmail.com) --
