RE: [E1000-devel] 2.6.36-rc7-git2 - panic/GPF: e1000e/vlans?

From: Tantilov, Emil S
Date: Friday, October 15, 2010 - 9:58 am

Could you provide more details about the exact setup and the sequence of
commands that leads to the crash? If you can narrow it down to the actual
command that caused the crash, that would be very helpful.

Also - are you testing from Linus or net-next tree? If you are not using 

Could you provide the output of lspci -vvv and a kernel config?

Thanks,
Emil
--

From: Nikola Ciprich
Date: Sunday, October 17, 2010 - 11:12 pm

Hello Emil,
sure, I'll gladly provide all the required data. I'll prepare it on Wednesday,
sorry I can't do it sooner.
In the meantime, Rafael created a Bugzilla entry; here's the link for reference:

https://bugzilla.kernel.org/show_bug.cgi?id=20462

have a nice day
BR

-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------
--

From: Nikola Ciprich
Date: Wednesday, October 20, 2010 - 7:46 am

Hello Emil,

I tried it now, I can still 100% reproduce with 2.6.36-rc7-git2, but
with 2.6.36-rc8-git5 it works OK. So it certainly got fixed in the meantime!
I'll therefore close the bug in BZ.

have a nice day!

best regards

nik
--

From: Nikola Ciprich
Date: Wednesday, October 20, 2010 - 9:36 am

Unfortunately I have to take back what I just wrote :(
The problem still persists, it just seems to be more random,
so I'll try to isolate the exact command that causes the panic.
n.
--

From: Nikola Ciprich
Date: Thursday, October 21, 2010 - 12:02 pm

OK, here are the steps to reproduce the problem:

ip link set up dev eth0
vconfig add eth0 10
ip link set up dev eth0.10
brctl addbr brtest
# everything up to here was OK; what brings it down is:
brctl addif brtest eth0.10

The last command causes a panic within a few seconds.
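
For anyone who wants to retry this on another interface (for example eth1, which did not trigger the problem here), the steps can be wrapped in a small script. This is only a sketch: the variable names IFACE, VID, BR and the DRY_RUN switch are assumptions of this example and not part of the original report. It uses `brctl addif`, the bridge-utils subcommand for attaching a port to a bridge. By default it only prints the commands; running with DRY_RUN=0 executes them, needs root, and on an affected kernel is expected to panic the machine.

```shell
#!/bin/sh
# Hypothetical wrapper around the reproduction steps from this message.
# IFACE, VID and BR default to the values used in the report; DRY_RUN is
# an assumption of this sketch so the sequence can be inspected safely.
IFACE="${IFACE:-eth0}"
VID="${VID:-10}"
BR="${BR:-brtest}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print each command; execute it only when DRY_RUN=0 (needs root).
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

repro() {
    run ip link set up dev "$IFACE"
    run vconfig add "$IFACE" "$VID"
    run ip link set up dev "$IFACE.$VID"
    run brctl addbr "$BR"
    # Attaching the VLAN device to the bridge is the step that panics.
    run brctl addif "$BR" "$IFACE.$VID"
}

repro
```

Saved under a name of your choice (say repro.sh, also hypothetical), `IFACE=eth1 sh repro.sh` prints the same sequence for the second port.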

Interestingly, it's reproducible only on eth0, not on
eth1 (both are onboard 80003ES2LAN).

here's the lspci for those:
06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
	Subsystem: Super Micro Computer Inc Unknown device 0000
	Flags: bus master, fast devsel, latency 0, IRQ 65
	Memory at d8300000 (32-bit, non-prefetchable) [size=128K]
	I/O ports at 3000 [size=32]
	Capabilities: [c8] Power Management version 2
	Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
	Capabilities: [e0] Express Endpoint IRQ 0
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 5a-19-34-ff-ff-48-30-00

06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
	Subsystem: Super Micro Computer Inc Unknown device 0000
	Flags: bus master, fast devsel, latency 0, IRQ 66
	Memory at d8320000 (32-bit, non-prefetchable) [size=128K]
	I/O ports at 3020 [size=32]
	Capabilities: [c8] Power Management version 2
	Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
	Capabilities: [e0] Express Endpoint IRQ 0
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 5a-19-34-ff-ff-48-30-00

Last but not least, I've done a bisect and it leads to:
commit ae878ae280bea286ff2b1e1cb6e609dd8cb4501d
Author: Maciej Żenczykowski <maze@google.com>
Date:   Sun Oct 3 14:49:00 2010 -0700

    net: Fix IPv6 PMTU disc. w/ asymmetric routes

It doesn't seem to me to be related at all, but ...
--

From: Brandeburg, Jesse
Date: Thursday, October 21, 2010 - 12:09 pm

Adding netdev... beware the top post ordering in the thread.

--

From: Ben Greear
Date: Friday, October 22, 2010 - 11:15 am

Is there any more info, like a stack trace?  We just saw this on
one of our more complex setups.  Kernel is 2.6.36, with some patches,
including a proprietary module:

general protection fault: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:0f:01.0/class
CPU 2
Modules linked in: 8021q garp bridge veth arc4 michael_mic macvlan wanlink(P) pktgen iscsi_tcp libiscsi_]

Pid: 0, comm: kworker/0:1 Tainted: P            2.6.36-rc8+ #3 X7DBU/X7DBU
RIP: 0010:[<ffffffff813ccc35>]  [<ffffffff813ccc35>] vlan_hwaccel_do_receive+0x64/0xca
RSP: 0018:ffff880001a83c00  EFLAGS: 00010283
RAX: 0000000000000002 RBX: ffff880047c9ee00 RCX: ffff880074c18000
RDX: ffff8800ffffffff RSI: 0000000000004359 RDI: 0000000000000001
RBP: ffff880001a83c20 R08: 00000000000003eb R09: ffffffff810620af
R10: ffff880047c9ee28 R11: 00000000ffffffff R12: ffff880074c18000
R13: ffff1000766988d0 R14: ffffc900037e1dd8 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880001a80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000462073 CR3: 0000000074219000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:1 (pid: 0, threadinfo ffff88007d030000, task ffff88007d76f700)
Stack:
  0000000000014400 ffff880047c9ee00 ffff880074c18948 ffff880047c9ee08
<0> ffff880001a83c90 ffffffff813456ed ffff880001a83c40 ffffffff8100fbba
<0> ffff880001a83c70 ffffffff81061dad ffff880001b102c0 ffff880047c9ee00
Call Trace:
  <IRQ>
  [<ffffffff813456ed>] __netif_receive_skb+0x4b/0x444
  [<ffffffff8100fbba>] ? read_tsc+0x9/0x1b
  [<ffffffff81061dad>] ? getnstimeofday+0x5e/0xb4
  [<ffffffff8134697a>] netif_receive_skb+0x7c/0x83
  [<ffffffff813470b5>] napi_skb_finish+0x24/0x3b
  [<ffffffff813ccf16>] vlan_gro_receive+0x7b/0x7d
  [<ffffffffa02bff4b>] e1000_receive_skb+0x54/0x70 [e1000e]
  [<ffffffffa02c1cc9>] e1000_clean_rx_irq+0x1fe/0x2aa ...