Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression

Previous thread: [PATCH net-next 17/17] tcp: get rid of two unnecessary u16s in TCP skb flags copying by Ilpo Järvinen on Saturday, February 28, 2009 - 7:44 am. (5 messages)

Next thread: [PATCH] Assign linklocal addresses to isatap from link dev by Sascha Hlusiak on Saturday, February 28, 2009 - 5:34 pm. (4 messages)
From: Bart Trojanowski
Date: Saturday, February 28, 2009 - 11:05 am

Hi all,

2.6.29-rc* introduces a bug that causes a crash when a vlan is put into
another vlan, so called vlan trunking or QinQ.

I can reproduce it reliably with:

  $ modprobe 8021q
  $ vconfig add eth1 5
  $ ifconfig eth1.5 up
  $ vconfig add eth1.5 4

I have seen crashes on all 2.6.29-rc kernels, but it does work for me on
2.6.28 (and prior).  All my testing was done on ia32 and amd64 systems.
I've reproduced it with various configs, but if you need my config let
me know.

I was tired of crashing my devel box, so this Ooops came from a kvm VM
(with the default 8139cp driver) where I attempted a bisect to find the
source of the bug.  Unfortunately I was unable to bisect it because of
other unrelated crashes in the history that made it too time consuming.

I have discovered that by doing:

  $ git reset --hard origin/master            # to HEAD of torvalds/linux-2.6.git
  $ git revert cc883d16c3b7434c7da2c45b54a49c2a99e83db7
  $ git revert f7d1b9f5aafa371d7f51f644aa3c38bc914e9205
  $ git revert 656299f706e52e0409733d704c2761f1b12d6954

... the crash goes away.  I just validated the procedure with Linus'
778ef1e6cbb049c9bcbf405936ee6f2b6e451892.  And other than seeing...

  [  154.094561] eth1.5 (): not using net_device_ops yet
  [  154.220840] eth1.5.4 (): not using net_device_ops yet

... there is no trace of this bug.  I suspect that only 656299f need to
be reverted/fixed, but the other two patches are prerequisites for it to
apply cleanly.

Hope that makes the source of the bug apparent to someone.

Cheers,
-Bart

PS: I see another problem with my KVM setup, but I think that has
something to do with my KVM host kernel, not the guest.  More
specifically, when I do QinQ I only see the inner VLAN tags on the
underlying bridge device (under the KVM), but the outer most VLAN tag is
missing.  But that's an exercise for another day.

--- 8< ---

[ 1201.822546] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
[ 1201.830944] All bugs added by David ...
From: David Miller
Date: Wednesday, March 4, 2009 - 12:43 am

From: Bart Trojanowski <bart@jukie.net>

Stephen please fix this.
--

From: Patrick McHardy
Date: Wednesday, March 4, 2009 - 2:57 am

I'm maintaining vlan :) I haven't been able to look into this yet,
but I should be later today.


--

From: David Miller
Date: Wednesday, March 4, 2009 - 3:59 am

From: Patrick McHardy <kaber@trash.net>

Ok, I don't actually care who fixes it :-)

I asked Stephen because this appears like it might be netdev_ops
fallout.
--

From: Patrick McHardy
Date: Wednesday, March 4, 2009 - 4:45 am

Good point. I think I know whats happening. The VLAN devices are
registered with vlan_netdev_ops, the ->init function then potentially
replaces them based on whether the underlying device supports HW
acceleration. At this point the register_netdev() function already
has used the first netdev_ops structure to initialize the compat
pointers, meaning the new assignment is largely without effect and
causes incorrect ops to be used with HW acceleration

This probably causes the slab corruption since the HW accelerated
VLAN device doesn't increase the headroom, but the non-accelerated
functions try to add a hard header anyways.

This is a bit tricky to fix since we actually need some valid
ops before invoking ->init(). One way would be to move the compat
ops initialization to a seperate function and have VLAN use it to
switch its ops.
--

From: David Miller
Date: Wednesday, March 4, 2009 - 8:53 pm

From: Patrick McHardy <kaber@trash.net>

Mind if I push this into net-2.6?

vlan: Fix vlan-in-vlan crashes.

As analyzed by Patrick McHardy, vlan needs to reset it's
netdev_ops pointer in it's ->init() function but this
leaves the compat method pointers stale.

Add a netdev_resync_ops() and call it from the vlan code.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdevice.h |    1 +
 net/8021q/vlan_dev.c      |    1 +
 net/core/dev.c            |   54 +++++++++++++++++++++++++++-----------------
 3 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ec54785..6593667 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1079,6 +1079,7 @@ extern void		synchronize_net(void);
 extern int 		register_netdevice_notifier(struct notifier_block *nb);
 extern int		unregister_netdevice_notifier(struct notifier_block *nb);
 extern int		init_dummy_netdev(struct net_device *dev);
+extern void		netdev_resync_ops(struct net_device *dev);
 
 extern int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
 extern struct net_device	*dev_get_by_index(struct net *net, int ifindex);
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 4a19acd..b6e1f9e 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -639,6 +639,7 @@ static int vlan_dev_init(struct net_device *dev)
 		dev->hard_header_len = real_dev->hard_header_len + VLAN_HLEN;
 		dev->netdev_ops         = &vlan_netdev_ops;
 	}
+	netdev_resync_ops(dev);
 
 	if (is_vlan_dev(real_dev))
 		subclass = 1;
diff --git a/net/core/dev.c b/net/core/dev.c
index 9e4afe6..a06c6fa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4282,6 +4282,38 @@ unsigned long netdev_fix_features(unsigned long features, const char *name)
 }
 EXPORT_SYMBOL(netdev_fix_features);
 
+/* Some devices need to (re-)set their netdev_ops inside
+ * ->init() or similar.  If that happens, we have to setup
+ ...
From: Bart Trojanowski
Date: Wednesday, March 4, 2009 - 9:54 pm

David,


I tried this patch onto v2.6.29-rc7-3-g559595a, but I still get a crash.
I assume that this worked for you, so I am not putting much faith in my
results at this late hour.  I'll confirm tomorrow morning that it's not
something else.

Cheers,
-Bart

-- 
				WebSig: http://www.jukie.net/~bart/sig/
--

From: Bart Trojanowski
Date: Wednesday, March 4, 2009 - 9:59 pm

David,


... if you're interested, here is the Oops.  And like I said, I'll
retest tomorrow.

-Bart

[  231.748126] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
[  231.751563] All bugs added by David S. Miller <davem@redhat.com>
[  231.876164] PANIC: double fault, gdt at c364c000 [255 bytes]
[  231.876271] double fault, tss at c364fae0
[  231.876271] eip = f8163c65, esp = f3045000
[  231.876271] eax = f3d78800, ebx = f3d78800, ecx = f8163c62, edx = f3e4ce40
[  231.876271] esi = f3e4ce40, edi = c05228fc
[  232.156110] BUG: unable to handle kernel paging request at a0ad0eb4
[  232.159226] IP: [<c011fd49>] hrtick_start_fair+0x1f/0x17d
[  232.160050] *pde = 00000000 
[  232.160050] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[  232.160050] last sysfs file: /sys/class/net/lo/operstate
[  232.160050] Modules linked in: 8021q virtio_balloon virtio_pci thermal_sys
[  232.160050] 
[  232.160050] Pid: 0, comm: swapper Tainted: G S         (2.6.29-rc7-bisect-00004-g39fc204 #1) 
[  232.160050] EIP: 0060:[<c011fd49>] EFLAGS: 00010092 CPU: 0
[  232.160050] EIP is at hrtick_start_fair+0x1f/0x17d
[  232.160050] EAX: c0599d80 EBX: c364e200 ECX: c3651dd4 EDX: f8163c85
[  232.160050] ESI: f52f6198 EDI: f622b708 EBP: c0545ddc ESP: c0545dc4
[  232.160050]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  232.160050] Process swapper (pid: 0, ti=c0544000 task=c04fd380 task.ti=c0544000)
[  232.160050] Stack:
[  232.160050]  c3651d80 5f684afd 00000002 c3651dd4 f52f6198 c3651d80 c0545df0 c011ff82
[  232.160050]  f4796708 c03d6ad8 00000000 c0545e14 c011da63 0d94c928 00000036 00000000
[  232.160050]  c3651d80 c3651d80 c3648d80 00000522 c0545e20 c011daaa f4796708 c0545e34
[  232.160050] Call Trace:
[  232.160050]  [<c011ff82>] ? dequeue_task_fair+0x51/0x56
[  232.160050]  [<c011da63>] ? dequeue_task+0xd5/0xe4
[  232.160050]  [<c011daaa>] ? deactivate_task+0x19/0x1f
[  232.160050]  [<c011e504>] ? pull_task+0x11/0x3d
[  232.160050]  [<c011e6fc>] ? load_balance_fair+0x134/0x1ce
[  232.160050]  ...
From: Patrick McHardy
Date: Wednesday, March 4, 2009 - 10:51 pm

Thanks, I'll try to reproduce it here.
--

From: David Miller
Date: Wednesday, March 4, 2009 - 10:21 pm

From: Bart Trojanowski <bart@jukie.net>

I didn't test, so you the bug must be a different problem or
my patch is wrong :)

--

From: Patrick McHardy
Date: Wednesday, March 4, 2009 - 10:51 pm

This looks fine, thanks. Even if it doesn't fix this particular
report, I think its appropriate for net-2.6.
--

From: David Miller
Date: Wednesday, March 4, 2009 - 11:57 pm

From: Patrick McHardy <kaber@trash.net>

Thanks for looking at it, but I don't want to push it out until we
fully resolve this bug.
--

From: David Miller
Date: Thursday, March 5, 2009 - 12:00 am

I think this will fix it.

diff --git a/net/core/dev.c b/net/core/dev.c
index 9e4afe6..2dd484e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4339,6 +4339,7 @@ int register_netdevice(struct net_device *dev)
 		dev->do_ioctl = ops->ndo_do_ioctl;
 		dev->set_config = ops->ndo_set_config;
 		dev->change_mtu = ops->ndo_change_mtu;
+		dev->neigh_setup = ops->ndo_neigh_setup;
 		dev->tx_timeout = ops->ndo_tx_timeout;
 		dev->get_stats = ops->ndo_get_stats;
 		dev->vlan_rx_register = ops->ndo_vlan_rx_register;
--

From: Patrick McHardy
Date: Thursday, March 5, 2009 - 12:05 am

Good catch, that looks like a likely cause. Will test in a minute ...
--

From: David Miller
Date: Thursday, March 5, 2009 - 12:11 am

From: Patrick McHardy <kaber@trash.net>

We probably need both fixes to cover everything.
--

From: Patrick McHardy
Date: Thursday, March 5, 2009 - 12:12 am

Yes, just the second one still crashes. I'm about to retry using both.
--

From: David Miller
Date: Thursday, March 5, 2009 - 12:19 am

From: Patrick McHardy <kaber@trash.net>

Here is the updated version just for the record:

vlan: Fix vlan-in-vlan crashes.

As analyzed by Patrick McHardy, vlan needs to reset it's
netdev_ops pointer in it's ->init() function but this
leaves the compat method pointers stale.

Add a netdev_resync_ops() and call it from the vlan code.

Any other driver which changes ->netdev_ops after register_netdevice()
will need to call this new function after doing so too.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdevice.h |    1 +
 net/8021q/vlan_dev.c      |    1 +
 net/core/dev.c            |   56 +++++++++++++++++++++++++++-----------------
 3 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ec54785..6593667 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1079,6 +1079,7 @@ extern void		synchronize_net(void);
 extern int 		register_netdevice_notifier(struct notifier_block *nb);
 extern int		unregister_netdevice_notifier(struct notifier_block *nb);
 extern int		init_dummy_netdev(struct net_device *dev);
+extern void		netdev_resync_ops(struct net_device *dev);
 
 extern int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
 extern struct net_device	*dev_get_by_index(struct net *net, int ifindex);
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 4a19acd..b6e1f9e 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -639,6 +639,7 @@ static int vlan_dev_init(struct net_device *dev)
 		dev->hard_header_len = real_dev->hard_header_len + VLAN_HLEN;
 		dev->netdev_ops         = &vlan_netdev_ops;
 	}
+	netdev_resync_ops(dev);
 
 	if (is_vlan_dev(real_dev))
 		subclass = 1;
diff --git a/net/core/dev.c b/net/core/dev.c
index 2dd484e..f112970 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4282,6 +4282,39 @@ unsigned long netdev_fix_features(unsigned long features, const char *name)
 }
 ...
From: Patrick McHardy
Date: Thursday, March 5, 2009 - 12:26 am

This still crashes. I'll have another look at the code.
--

From: Patrick McHardy
Date: Thursday, March 5, 2009 - 12:31 am

This one combined with your patch fixes the crash. The code was calling
vlan_dev_neigh_setup recursively.

Signed-off-by: Patrick McHardy <kaber@trash.net>

(or Tested-by: in case you want to roll it into your patch).
From: David Miller
Date: Thursday, March 5, 2009 - 12:45 am

From: Patrick McHardy <kaber@trash.net>

Great work, thanks Patrick!

I'll get this all integrated and push it out to Linus.
--

From: Frank Blaschka
Date: Thursday, March 5, 2009 - 1:05 am

Hi Dave, Patrick,

sorry I could not follow the complete discussion of the fixes done for this problem
but does

 	if (netif_device_present(real_dev) && ops->ndo_neigh_setup)
-		err = ops->ndo_neigh_setup(dev, pa);
+		err = ops->ndo_neigh_setup(real_dev, pa);

not change the idea of the neigh_setup? Remind we want the neigh_setup of the
real device as the neigh setup function for the vlan device.

Frank



--

From: Patrick McHardy
Date: Thursday, March 5, 2009 - 1:27 am

An we still use it. The only difference is that we pass it the
correct device reference, which not only fixes the recursion,
but is also expected by the callbacks. Look at bonding or simply
vlan itself.

The setup itself is still done using the neigh_params passed to
VLAN, which appears to be what was originally intended.

--

From: David Miller
Date: Thursday, March 5, 2009 - 1:56 am

From: Patrick McHardy <kaber@trash.net>

Then bond_neigh_setup() has the same bug, doesn't it?
--

From: David Miller
Date: Thursday, March 5, 2009 - 1:59 am

From: David Miller <davem@davemloft.net>

Looking at the bond_main.c changes in:

commit 008298231abbeb91bc7be9e8b078607b816d1a4a
Author: Stephen Hemminger <shemminger@vyatta.com>
Date:   Thu Nov 20 20:14:53 2008 -0800

    netdev: add more functions to netdevice ops

shows that it always behaved that way.
--

From: Patrick McHardy
Date: Thursday, March 5, 2009 - 2:08 am

Yes, but that patch introduced the requirement to pass the correct
device down since now the handlers need it to get to the ops of the
underlying device. Previously they all relied on the handlers not
using their private data.

Signed-off-by: Patrick McHardy <kaber@trash.net>
From: Patrick McHardy
Date: Thursday, March 5, 2009 - 2:09 am

Oops, the last patch was broken.


From: David Miller
Date: Thursday, March 5, 2009 - 2:58 am

From: Patrick McHardy <kaber@trash.net>


Applied, thanks!
--

From: Maxime Bizon
Date: Thursday, March 5, 2009 - 5:30 am

On Wed, 2009-03-04 at 19:53 -0800, David Miller wrote:


Any reason dev->stop is not in this list ?

Regards,

-- 
Maxime


--

From: David Miller
Date: Thursday, March 5, 2009 - 5:55 am

From: Maxime Bizon <mbizon@freebox.fr>

dev->stop is only called from net/core/dev.c and it is
netdev_ops aware :-)
--

Previous thread: [PATCH net-next 17/17] tcp: get rid of two unnecessary u16s in TCP skb flags copying by Ilpo Järvinen on Saturday, February 28, 2009 - 7:44 am. (5 messages)

Next thread: [PATCH] Assign linklocal addresses to isatap from link dev by Sascha Hlusiak on Saturday, February 28, 2009 - 5:34 pm. (4 messages)