Re: kmemcheck: Caught 32-bit read from uninitialized memory (f6038ec0)

Previous thread: [REQUEST] Clarification from Copyright Holders on FUSE/NDISWRAPPER by jmerkey on Sunday, September 28, 2008 - 10:15 am. (10 messages)

Next thread: [PATCH] x86: merge winchip-2 and winchip-2a cpu choices by Krzysztof Helt on Sunday, September 28, 2008 - 12:28 pm. (3 messages)
From: Vegard Nossum
Date: Sunday, September 28, 2008 - 11:06 am

Hi Ingo,

Here is the fixlets branch, including bitfields API. I think it would
be very nice if you could make this a *separate* branch in -tip, say
kmemcheck-fixlets or so, as it may touch any part of the kernel and
doesn't carry the acks of those maintainers.

With these patches, you should be able to include kmemcheck in auto-testing
again. At least it works for me :-)

You may also kill tip/kmemcheck-dma, which has been integrated already.


The following changes since commit 9f24b3ed3058f022dad24f8304b9ad53adf5aa71:
  Vegard Nossum (1):
        x86: use REP MOVS instruction for memcpy if kmemcheck is enabled

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck.git fixlets-for-tip

Vegard Nossum (6):
      kmemcheck: bitfield API
      net: use kmemcheck bitfields API for skbuff
      net: annotate bitfields in struct inet_sock
      ieee1394: annotate bitfield
      ieee1394/csr1212: fix false positive kmemcheck report
      kmemcheck: allocate struct siginfo with SLAB_NOTRACK

 drivers/ieee1394/csr1212.c |    2 ++
 drivers/ieee1394/nodemgr.c |    7 ++++++-
 include/linux/kmemcheck.h  |   39 +++++++++++++++++++++++++++++++++++++++
 include/linux/skbuff.h     |   31 +++++++++++++++++++------------
 include/net/inet_sock.h    |   24 +++++++++++++++---------
 kernel/signal.c            |    8 +++++++-
 net/core/skbuff.c          |    8 ++++++++
 7 files changed, 96 insertions(+), 23 deletions(-)

Thanks,


Vegard
--

From: Ingo Molnar
Date: Monday, September 29, 2008 - 12:54 am

branches are hard to kill in Git (anyone pushing back from a tree that 
still has the old branch will re-create it). The name is sufficiently 
generic so we keep keep it. The name can be reused in the future 

pulled into tip/kmemcheck-bitfields, thanks Vegard. I've also propagated 
this new branch into the integration rules of auto-kmemcheck-next.

	Ingo
--

From: Ingo Molnar
Date: Monday, September 29, 2008 - 12:59 am

Ingo
--

From: Ingo Molnar
Date: Monday, September 29, 2008 - 1:55 am

FYI, i've reactivated kmemcheck on one of the -tip auto-test boxes 
earlier today and it's looking good so far - for example a 32-bit 
allyesconfig-ish config booted in just fine with kmemcheck enabled. 
Also, the box is very usable interactively - while previous one could 
always tell whether there's kmemcheck active.

[ only one CPU is active, but we knew that. We've still got this 
  test-commit:

     21d01a4: x86: add functions for duplicating page tables

  it's not in tip/master but we still have it around. ]

btw., is there any easy way to tell from within a script what the 
current status of kmemcheck is? In particular, whether it's running. 
Normally i have this in the syslog:

 [    0.448022] kmemcheck: "Bugs, beware!"
 [    0.452002] kmemcheck: Limiting number of CPUs to 1.

but this time the log was too large so this bit was snipped out and i 
was unsure about it - needed a second bootup with a larger buffer to 
make sure. With lockdep we've got the 'debug_locks' /proc/lockdep_stats.

also, all kmemcheck warnings follow the usual WARN_ON() format, so that 
automated tests can pick it up, correct? -tip testing does so many 
bootups that there's no chance to notice non-system-crashing bugs and 
printouts but via automated means.

	Ingo
--

From: Vegard Nossum
Date: Monday, September 29, 2008 - 2:13 am

[Empty message]
From: Ingo Molnar
Date: Monday, September 29, 2008 - 2:25 am

a kernel log line beginning with:

 INFO:
 WARNING:
 BUG:

would be noticed. (That pattern has to be at the beginning of the line. 
Otherwise we'd match on things like 'DEBUG: ' - such printouts exist)

	Ingo
--

From: Ingo Molnar
Date: Monday, September 29, 2008 - 2:43 am

ok, it crashed with that, see below. Config attached - kernel was latest 
tip/master. Let me know if you need more debug info.

	Ingo

[  372.600318] memmap_init+0x0/0xc0 called from 1 returned 0 after 7 msecs
[  372.600318] pci_mmcfg_late_insert_resources+0x0/0x150 called from 1 returned 0 after 0 msecs
[  372.600318] tcp_congestion_default+0x0/0x10 called from 1 returned 0 after 0 msecs
[  372.600318] ip_auto_config+0x0/0xff0 called from 1 returned 0 after 0 msecs
[  372.600318] ---------------------------------
[  372.600318] Modules linked in:
[  372.600318] 
[  372.600318] Pid: 1306, comm: rcu_torture_fak Tainted: G        W (2.6.27-rc7-tip-00980-g072062d-dirty #37075) System Product Name
[  372.600318] EIP: 0060:[<c0123265>] EFLAGS: 00010086 CPU: 0
[  372.600318] EIP is at math_state_restore+0x15/0xa0
[  372.600318] EAX: 8005003b EBX: 00000000 ECX: 0000007b EDX: ffffffff
[  372.600318] ESI: f712d000 EDI: c18dad80 EBP: f712d74c ESP: f712d740
[  372.600318]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  372.600318] Process rcu_torture_fak (pid: 1306, ti=f712d000 task=f6a8a980 task.ti=f67d5000)
[  372.600318] Stack: f712d764 018ed000 c18dad80 f712d75c c18dadca ffffffff c27a1880 f712d7b4 
[  372.600318]        c18da04f c27a1880 c18d6d54 f6a8a980 018ed000 f7138000 f712d7b4 f6758600 
[  372.600318]        0000007b 0000007b 000000d8 ffffffff c0120f37 00000060 00010046 00000000 
[  372.600318] Call Trace:
[  372.600318]  [<c18dad80>] ? do_device_not_available+0x0/0x70
[  372.600318]  [<c18dadca>] ? do_device_not_available+0x4a/0x70
[  372.600318]  [<c18da04f>] ? error_code+0x77/0x7c
[  372.600318]  [<c18d6d54>] ? schedule+0x3a4/0x910
[  372.600318]  [<c0120f37>] ? __switch_to+0x47/0x220
[  372.600318]  [<c18d6d91>] ? schedule+0x3e1/0x910
[  372.600318]  [<c018e596>] ? trace_hardirqs_on_caller+0x16/0x1a0
[  372.600318]  [<c0b166ec>] ? lbs_process_event+0x3c/0x730
[  372.600318]  [<c0147e00>] ? native_flush_tlb_single+0x0/0x10
[  372.600318]  [<c01220e6>] ? ...
From: Ingo Molnar
Date: Monday, September 29, 2008 - 4:13 am

got another type of warning:

PM: Adding info for No Bus:vcs1
PM: Adding info for No Bus:vcsa1
------------[ cut here ]------------
WARNING: at kernel/smp.c:331 smp_call_function_mask+0x182/0x190()
Modules linked in:
Pid: 1261, comm: gzip Not tainted 2.6.27-rc7-tip-00980-g072062d-dirty #37099
 [<c012cdb4>] warn_on_slowpath+0x54/0x70
 [<c011eebe>] ? kmemcheck_pte_lookup+0xe/0x30
 [<c011e744>] ? kmemcheck_show_addr+0x44/0x60
 [<c011ae3f>] ? lookup_address+0x7f/0xa0
 [<c011eebe>] ? kmemcheck_pte_lookup+0xe/0x30
 [<c011e6e4>] ? kmemcheck_hide_addr+0x44/0x60
 [<c011e9df>] ? kmemcheck_hide+0x8f/0x120
 [<c011eb68>] ? kmemcheck_trap+0x88/0xd0
 [<c011ae3f>] ? lookup_address+0x7f/0xa0
 [<c011ae3f>] ? lookup_address+0x7f/0xa0
 [<c011ae3f>] ? lookup_address+0x7f/0xa0
 [<c011ae3f>] ? lookup_address+0x7f/0xa0
 [<c014cff2>] smp_call_function_mask+0x182/0x190
 [<c0113360>] ? do_flush_tlb_all+0x0/0x50
 [<c011ae3f>] ? lookup_address+0x7f/0xa0
 [<c011eebe>] ? kmemcheck_pte_lookup+0xe/0x30
 [<c011e744>] ? kmemcheck_show_addr+0x44/0x60
 [<c011ae3f>] ? lookup_address+0x7f/0xa0
 [<c01199c8>] ? default_spin_lock_flags+0x8/0x10
 [<c0113360>] ? do_flush_tlb_all+0x0/0x50
 [<c014d017>] smp_call_function+0x17/0x20
 [<c01319da>] on_each_cpu+0x1a/0x50
 [<c0113314>] flush_tlb_all+0x14/0x20
 [<c016ac05>] flush_all_zero_pkmaps+0xb5/0xd0
 [<c016ac32>] kmap_flush_unused+0x12/0x20
 [<c011b6b7>] change_page_attr_set_clr+0x57/0x320
 [<c0162eee>] ? __alloc_pages_internal+0x9e/0x400
 [<c011b9aa>] set_memory_4k+0x2a/0x30
 [<c011ec3b>] kmemcheck_hide_pages+0x1b/0xc0
 [<c011e9df>] ? kmemcheck_hide+0x8f/0x120
 [<c01802da>] kmemcheck_alloc_shadow+0x6a/0xc0
 [<c017eaef>] __slab_alloc+0x12f/0x3d0
 [<c017ee6b>] kmem_cache_alloc+0xbb/0xc0
 [<c016003e>] ? mempool_alloc_slab+0xe/0x10
 [<c016003e>] mempool_alloc_slab+0xe/0x10
 [<c016013c>] mempool_alloc+0x2c/0xd0
 [<c01a6d71>] bvec_alloc_bs+0x51/0xf0
 [<c01a70c5>] bio_alloc_bioset+0x45/0x80
 [<c01a710e>] bio_alloc+0xe/0x20
 [<c01aac50>] mpage_alloc+0x20/0x80
 ...
From: Ingo Molnar
Date: Monday, September 29, 2008 - 4:57 am

got the report below - config attached. This is probably bitfield 
initialization related. tip-072062d5.

	Ingo

----------------->
kmemcheck: Caught 32-bit read from uninitialized memory (f6038ec0)
0000000000000000d0e86bc0d0fa6bc070446cc060fa6bc0205c71c0103772c0
 u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u
 ^

Pid: 0, comm: swapper Tainted: G        W (2.6.27-rc7-tip-00980-g072062d-dirty #37099) System Product Name
EIP: 0060:[<c06beba6>] EFLAGS: 00010282 CPU: 0
EIP is at sk_clone+0x26/0x1d0
EAX: f6038d80 EBX: f6038d80 ECX: 00000020 EDX: c0ac19c0
ESI: f6038000 EDI: f6038d80 EBP: c0b67d54 ESP: c0b860e8
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: f6015800 CR3: 36105000 CR4: 00000680
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff4ff0 DR7: 00000400
 [<c0704f7a>] inet_csk_clone+0x1a/0x90
 [<c07185aa>] tcp_create_openreq_child+0x1a/0x3a0
 [<c071637f>] tcp_v4_syn_recv_sock+0x3f/0x1c0
 [<c07183c3>] tcp_check_req+0x283/0x450
 [<c0715e81>] tcp_v4_do_rcv+0x261/0x360
 [<c0717fa4>] tcp_v4_rcv+0x5d4/0x6b0
 [<c06fc66c>] ip_local_deliver+0x6c/0x1a0
 [<c06fca85>] ip_rcv+0x2e5/0x5a0
 [<c06c7621>] netif_receive_skb+0x3f1/0x480
 [<c047030f>] nv_napi_poll+0x38f/0x670
 [<c06c9699>] net_rx_action+0x129/0x1b0
 [<c0131eb2>] __do_softirq+0x82/0x100
 [<c0105fc5>] call_on_stack+0x15/0x30
 [<ffffffff>] 0xffffffff
From: Ingo Molnar
Date: Monday, September 29, 2008 - 6:14 am

another crash is below - config attached.

	Ingo

----------------->
initcall 0xc08eabf0 returned 0 after 7 msecs
calling  0xc08eae70 @ 1
PnPBIOS: Scanning system for PnP BIOS support...
PnPBIOS: Found PnP BIOS installation structure at 0xc00fc550
PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0xc580, dseg 0xf0000
BUG: unable to handle kernel paging request at 0000c6ef
IP: [<c011ee33>]
*pde = 00000000 
Oops: 0000 [#1] SMP 
Modules linked in:

Pid: 1, comm: swapper Not tainted (2.6.27-rc7-tip-00980-g072062d-dirty #37107) System Product Name
EIP: 0060:[<c011ee33>] EFLAGS: 00010046 CPU: 0
EAX: 0000c6ef EBX: 0000c6ef ECX: 00000001 EDX: 00fd6000
ESI: 0000c6ef EDI: c190e220 EBP: f703fd96 ESP: f703fd8e
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 1, ti=f703e000 task=f7040000 task.ti=f703e000)
Stack: 0000c6ef f703fe2e f703fdc6 c011e7c4 fdb4f708 00000001 f7080580 afeff703 
       7018c011 f703fdc6 c011afef f7080580 f703fe2e 00000002 f703fdde c011ebde 
       00000001 0000c695 00000034 c011a110 f703fe26 c011a56f fdf80000 ef22f703 
Call Trace:
 [<c011e7c4>] ? <0> [<c011afef>] ? <0> [<c011ebde>] ? <0> [<c011a110>] ? <0> [<c011a56f>] ? <0> [<c011a110>] ? <0> [<c06f2d12>] ? <0> =======================
Code: c7 02 02 00 00 00 eb df 8b 45 08 c7 00 01 00 00 00 eb d4 90 8d 74 26 00 55 89 e5 56 89 c6 53 eb 0a 8d b4 26 00 00 00 00 83 c6 01 <0f> b6 1e 89 d8 e8 f3 fe ff ff 84 c0 75 ef 81 e3 f0 00 00 00 31 
EIP: [<c011ee33>]  SS:ESP 0068:f703fd8e
Kernel panic - not syncing: Fatal exception

	Ingo
From: Vegard Nossum
Date: Sunday, October 5, 2008 - 5:42 am

Thanks, I can reproduce it. And that's very interesting. Look at the
faulting address and the line just above:

    PnP BIOS version 1.0, entry 0xf0000:0xc580

When I run it, I get something similar:

    PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0x67ba, dseg 0xf0000
    BUG: unable to handle kernel paging request at 000068ac

Enabling debug information also gives me the location of the crash. It
happens in kmemcheck's opcode decoder, when we dereference regs->eip.
Is it possible that PnPBIOS is executing some code which is located at
these weird addresses (like 16-bit code) and requires a different
segment register?

Maybe we should just check regs->flags to make sure that we are not in
VM86 mode or similar. Will experiment a bit with it. Thanks.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

Previous thread: [REQUEST] Clarification from Copyright Holders on FUSE/NDISWRAPPER by jmerkey on Sunday, September 28, 2008 - 10:15 am. (10 messages)

Next thread: [PATCH] x86: merge winchip-2 and winchip-2a cpu choices by Krzysztof Helt on Sunday, September 28, 2008 - 12:28 pm. (3 messages)