Re: [PATCH] SLUB: clear c->freelist in __slab_alloc()/load_freelist:/SlabDebug path

To: Christoph Lameter <clameter@...>
Cc: Linux Kernel <linux-kernel@...>
Date: Monday, May 12, 2008 - 4:32 pm

In the __slab_alloc()/load_freelist:/SlabDebug(c->page) path we only
use the object at the head of c->page->freelist
and the tail goes back to c->page->freelist.
We then set c->node = -1 to force __slab_alloc() on the next allocation.
c->freelist therefore needs to be cleared as it is invalid at this point.
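
The sequence described above can be sketched as a small user-space model. This is not the mm/slub.c code: the struct layouts, the names page_model, cpu_slab_model and debug_alloc_model, and the int debug field are all assumptions made for illustration. Only the order of operations (take the head object, leave the tail on the page, set node to -1, clear the per-CPU freelist) follows the description.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (hypothetical layouts). */
struct page_model {
    void **freelist;          /* first free object in the slab page */
    int debug;                /* stands in for SlabDebug(page) */
};

struct cpu_slab_model {
    void **freelist;          /* per-CPU freelist used by the fast path */
    struct page_model *page;  /* slab page this CPU allocates from */
    int node;                 /* -1 forces the slow path next time */
    unsigned int offset;      /* word offset of the free pointer in an object */
};

/* Model of the debug branch: hand out one object, keep the rest on the page. */
static void *debug_alloc_model(struct cpu_slab_model *c)
{
    void **object;

    assert(c != NULL && c->page != NULL);
    object = c->page->freelist;
    if (object == NULL)
        return NULL;                        /* nothing left in this page */

    c->page->freelist = object[c->offset];  /* tail goes back to the page */
    c->node = -1;                           /* force the slow path next time */
    c->freelist = NULL;                     /* the clearing the patch proposes */
    return object;
}
```

In this model, a caller that later consults c->freelist sees NULL and has to go back through the slow path, which is the behaviour the patch description argues for.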

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
mm/slub.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

Hit while running cthon04 test from an IBM AIX client
against my nfs41 tree.

Stack trace excerpt:

May 12 11:18:19 client kernel: general protection fault: 0000 [2] SMP
May 12 11:18:19 client kernel: CPU 3
May 12 11:18:19 client kernel: Modules linked in: panfs(P) nfsd auth_rpcgss exportfs autofs4 hidp nfs lockd nfs_acl fuse rfcomm l2cap bluetooth sunrpc nf_conntrack_netbios_ns nf_conntrack_ipv4 ipt_REJECT iptable_filter ip_tables nf_conntrack_ipv6 xt_state nf_conntrack xt_tcpudp ip6t_ipv6header ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 dm_multipath video output sbs sbshc battery ac e1000e i5000_edac iTCO_wdt iTCO_vendor_support i2c_i801 edac_core button sr_mod pcspkr i2c_core sg cdrom floppy dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata shpchp pci_hotplug mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
May 12 11:18:19 client kernel: Pid: 2815, comm: nfsd Tainted: P D 2.6.25-nfs41 #2
May 12 11:18:19 client kernel: RIP: 0010:[<ffffffff8108c0c8>] [<ffffffff8108c0c8>] kmem_cache_alloc+0x3d/0x65
May 12 11:18:19 client kernel: RSP: 0018:ffff8104212c3de0 EFLAGS: 00010006
May 12 11:18:19 client kernel: RAX: 0000000000000000 RBX: 0000000000000246 RCX: ffffffff883546df
May 12 11:18:19 client kernel: RDX: 3200100010100000 RSI: 00000000000080d0 RDI: ffffffff813eadb8
May 12 11:18:19 client kernel: RBP: ffff810001029e60 R08: 0000000000000000 R09: ffff8103f118d130
May 12 11:18:19 client kernel: R10: ffff81041b076018 R11: ffffffff8826c31...

To: Benny Halevy <bhalevy@...>
Cc: Christoph Lameter <clameter@...>, Linux Kernel <linux-kernel@...>
Date: Tuesday, May 13, 2008 - 2:40 pm

Hi Benny,

But for debug pages, we never load c->page->freelist to c->freelist so

Looking at this, we're oopsing at:

0: 48 8b 04 c2 mov (%rdx,%rax,8),%rax

where rdx is c->freelist and rax is c->offset. The value for
c->freelist ("3200100010100000") doesn't make much sense. Furthermore,
if this really were a bug in __slab_alloc(), shouldn't we be
hitting it more often?
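
To illustrate the faulting load and why that rdx value looks wrong, here is a minimal sketch. The helper names next_free and is_canonical48 are made up for this example, and the check assumes 48-bit virtual addressing on x86-64, where bits 63..47 of a valid address must all be equal. An address that fails this test typically raises a general protection fault rather than a page fault, which would be consistent with the oops header.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Fast-path step as described in the thread: the pointer to the next free
 * object is stored 'offset' words into the current free object. This
 * corresponds to an indexed 8-byte load such as mov (%rdx,%rax,8),%rax.
 */
static void *next_free(void **freelist, unsigned long offset)
{
    return freelist[offset];
}

/* Assumes 48-bit virtual addresses: bits 63..47 must be all 0 or all 1. */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the 17 uppermost bits */

    return top == 0 || top == 0x1ffff;
}
```

Under that assumption, is_canonical48(0x3200100010100000) is false, while a kernel pointer from the same register dump, such as the RSP value ffff8104212c3de0, passes the check.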

How did you make SLUB hit the debug path since you have
CONFIG_SLUB_DEBUG_ON disabled?

Pekka
--

To: Pekka Enberg <penberg@...>
Cc: Christoph Lameter <clameter@...>, Linux Kernel <linux-kernel@...>
Date: Tuesday, May 13, 2008 - 3:34 pm

Hmm, I see. Then it might have got corrupted...
I'll keep looking for the root cause.

--

To: Benny Halevy <bhalevy@...>
Cc: Pekka Enberg <penberg@...>, Linux Kernel <linux-kernel@...>
Date: Wednesday, May 14, 2008 - 1:44 pm

I guess he passed slub_debug on the kernel command line.

--

To: Christoph Lameter <clameter@...>, Pekka Enberg <penberg@...>
Cc: Linux Kernel <linux-kernel@...>
Date: Wednesday, May 14, 2008 - 1:54 pm

Yeah, I've moved to SLAB and the mem corruption now pops up at a different

I did not.

I probably have misunderstood how the slub debugging infrastructure works
and did not execute the debug path at all.

Thanks for your help!

Benny
--

To: Benny Halevy <bhalevy@...>
Cc: Pekka Enberg <penberg@...>, Linux Kernel <linux-kernel@...>
Date: Wednesday, May 14, 2008 - 1:58 pm

Ahh.. So for some reason you set PG_error on a slab page which caused it
to go into the debug path? Doing I/O on slab objects?
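
The mechanism behind this question can be sketched as follows, assuming that the slab debug marker shares a page flag bit with PG_error, as the message implies. The bit position, the struct, and all names ending in _model are hypothetical; this is a user-space illustration, not the kernel's definition.

```c
#include <stdbool.h>

/* Made-up bit position standing in for PG_error. */
#define PG_ERROR_MODEL_BIT 1UL
/* Assumed: the slab debug marker reuses the same flag bit. */
#define SLABDEBUG_MODEL (1UL << PG_ERROR_MODEL_BIT)

struct page_flags_model {
    unsigned long flags;
};

/* True if the allocator would treat this page as a debug slab. */
static bool slab_debug_model(const struct page_flags_model *page)
{
    return (page->flags & SLABDEBUG_MODEL) != 0;
}

/* What an I/O error path might do to a page it believes it owns. */
static void set_page_error_model(struct page_flags_model *page)
{
    page->flags |= 1UL << PG_ERROR_MODEL_BIT;
}
```

If the two flags alias like this, code that marks a slab-backed page as having an I/O error would, as a side effect, send later allocations from that page down the debug path even without slub_debug on the command line.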

--

To: Benny Halevy <bhalevy@...>
Cc: Christoph Lameter <clameter@...>, Linux Kernel <linux-kernel@...>
Date: Tuesday, May 13, 2008 - 2:14 am

Makes sense. Christoph?
--
