Hi Ingo,
Here is the fixlets branch, including bitfields API. I think it would
be very nice if you could make this a *separate* branch in -tip, say
kmemcheck-fixlets or so, as it may touch any part of the kernel and
doesn't carry the acks of those maintainers.
With these patches, you should be able to include kmemcheck in auto-testing
again. At least it works for me :-)
You may also kill tip/kmemcheck-dma, which has been integrated already.
The following changes since commit 9f24b3ed3058f022dad24f8304b9ad53adf5aa71:
Vegard Nossum (1):
x86: use REP MOVS instruction for memcpy if kmemcheck is enabled
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck.git fixlets-for-tip
Vegard Nossum (6):
kmemcheck: bitfield API
net: use kmemcheck bitfields API for skbuff
net: annotate bitfields in struct inet_sock
ieee1394: annotate bitfield
ieee1394/csr1212: fix false positive kmemcheck report
kmemcheck: allocate struct siginfo with SLAB_NOTRACK
drivers/ieee1394/csr1212.c | 2 ++
drivers/ieee1394/nodemgr.c | 7 ++++++-
include/linux/kmemcheck.h | 39 +++++++++++++++++++++++++++++++++++++++
include/linux/skbuff.h | 31 +++++++++++++++++++------------
include/net/inet_sock.h | 24 +++++++++++++++---------
kernel/signal.c | 8 +++++++-
net/core/skbuff.c | 8 ++++++++
7 files changed, 96 insertions(+), 23 deletions(-)
Thanks,
Vegard
--
branches are hard to kill in Git (anyone pushing back from a tree that still has the old branch will re-create it). The name is sufficiently generic so we keep keep it. The name can be reused in the future pulled into tip/kmemcheck-bitfields, thanks Vegard. I've also propagated this new branch into the integration rules of auto-kmemcheck-next. Ingo --
FYI, i've reactivated kmemcheck on one of the -tip auto-test boxes
earlier today and it's looking good so far - for example a 32-bit
allyesconfig-ish config booted in just fine with kmemcheck enabled.
Also, the box is very usable interactively - while previous one could
always tell whether there's kmemcheck active.
[ only one CPU is active, but we knew that. We've still got this
test-commit:
21d01a4: x86: add functions for duplicating page tables
it's not in tip/master but we still have it around. ]
btw., is there any easy way to tell from within a script what the
current status of kmemcheck is? In particular, whether it's running.
Normally i have this in the syslog:
[ 0.448022] kmemcheck: "Bugs, beware!"
[ 0.452002] kmemcheck: Limiting number of CPUs to 1.
but this time the log was too large so this bit was snipped out and i
was unsure about it - needed a second bootup with a larger buffer to
make sure. With lockdep we've got the 'debug_locks' /proc/lockdep_stats.
also, all kmemcheck warnings follow the usual WARN_ON() format, so that
automated tests can pick it up, correct? -tip testing does so many
bootups that there's no chance to notice non-system-crashing bugs and
printouts but via automated means.
Ingo
--
a kernel log line beginning with: INFO: WARNING: BUG: would be noticed. (That pattern has to be at the beginning of the line. Otherwise we'd match on things like 'DEBUG: ' - such printouts exist) Ingo --
ok, it crashed with that, see below. Config attached - kernel was latest tip/master. Let me know if you need more debug info. Ingo [ 372.600318] memmap_init+0x0/0xc0 called from 1 returned 0 after 7 msecs [ 372.600318] pci_mmcfg_late_insert_resources+0x0/0x150 called from 1 returned 0 after 0 msecs [ 372.600318] tcp_congestion_default+0x0/0x10 called from 1 returned 0 after 0 msecs [ 372.600318] ip_auto_config+0x0/0xff0 called from 1 returned 0 after 0 msecs [ 372.600318] --------------------------------- [ 372.600318] Modules linked in: [ 372.600318] [ 372.600318] Pid: 1306, comm: rcu_torture_fak Tainted: G W (2.6.27-rc7-tip-00980-g072062d-dirty #37075) System Product Name [ 372.600318] EIP: 0060:[<c0123265>] EFLAGS: 00010086 CPU: 0 [ 372.600318] EIP is at math_state_restore+0x15/0xa0 [ 372.600318] EAX: 8005003b EBX: 00000000 ECX: 0000007b EDX: ffffffff [ 372.600318] ESI: f712d000 EDI: c18dad80 EBP: f712d74c ESP: f712d740 [ 372.600318] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 372.600318] Process rcu_torture_fak (pid: 1306, ti=f712d000 task=f6a8a980 task.ti=f67d5000) [ 372.600318] Stack: f712d764 018ed000 c18dad80 f712d75c c18dadca ffffffff c27a1880 f712d7b4 [ 372.600318] c18da04f c27a1880 c18d6d54 f6a8a980 018ed000 f7138000 f712d7b4 f6758600 [ 372.600318] 0000007b 0000007b 000000d8 ffffffff c0120f37 00000060 00010046 00000000 [ 372.600318] Call Trace: [ 372.600318] [<c18dad80>] ? do_device_not_available+0x0/0x70 [ 372.600318] [<c18dadca>] ? do_device_not_available+0x4a/0x70 [ 372.600318] [<c18da04f>] ? error_code+0x77/0x7c [ 372.600318] [<c18d6d54>] ? schedule+0x3a4/0x910 [ 372.600318] [<c0120f37>] ? __switch_to+0x47/0x220 [ 372.600318] [<c18d6d91>] ? schedule+0x3e1/0x910 [ 372.600318] [<c018e596>] ? trace_hardirqs_on_caller+0x16/0x1a0 [ 372.600318] [<c0b166ec>] ? lbs_process_event+0x3c/0x730 [ 372.600318] [<c0147e00>] ? native_flush_tlb_single+0x0/0x10 [ 372.600318] [<c01220e6>] ? ...
got another type of warning: PM: Adding info for No Bus:vcs1 PM: Adding info for No Bus:vcsa1 ------------[ cut here ]------------ WARNING: at kernel/smp.c:331 smp_call_function_mask+0x182/0x190() Modules linked in: Pid: 1261, comm: gzip Not tainted 2.6.27-rc7-tip-00980-g072062d-dirty #37099 [<c012cdb4>] warn_on_slowpath+0x54/0x70 [<c011eebe>] ? kmemcheck_pte_lookup+0xe/0x30 [<c011e744>] ? kmemcheck_show_addr+0x44/0x60 [<c011ae3f>] ? lookup_address+0x7f/0xa0 [<c011eebe>] ? kmemcheck_pte_lookup+0xe/0x30 [<c011e6e4>] ? kmemcheck_hide_addr+0x44/0x60 [<c011e9df>] ? kmemcheck_hide+0x8f/0x120 [<c011eb68>] ? kmemcheck_trap+0x88/0xd0 [<c011ae3f>] ? lookup_address+0x7f/0xa0 [<c011ae3f>] ? lookup_address+0x7f/0xa0 [<c011ae3f>] ? lookup_address+0x7f/0xa0 [<c011ae3f>] ? lookup_address+0x7f/0xa0 [<c014cff2>] smp_call_function_mask+0x182/0x190 [<c0113360>] ? do_flush_tlb_all+0x0/0x50 [<c011ae3f>] ? lookup_address+0x7f/0xa0 [<c011eebe>] ? kmemcheck_pte_lookup+0xe/0x30 [<c011e744>] ? kmemcheck_show_addr+0x44/0x60 [<c011ae3f>] ? lookup_address+0x7f/0xa0 [<c01199c8>] ? default_spin_lock_flags+0x8/0x10 [<c0113360>] ? do_flush_tlb_all+0x0/0x50 [<c014d017>] smp_call_function+0x17/0x20 [<c01319da>] on_each_cpu+0x1a/0x50 [<c0113314>] flush_tlb_all+0x14/0x20 [<c016ac05>] flush_all_zero_pkmaps+0xb5/0xd0 [<c016ac32>] kmap_flush_unused+0x12/0x20 [<c011b6b7>] change_page_attr_set_clr+0x57/0x320 [<c0162eee>] ? __alloc_pages_internal+0x9e/0x400 [<c011b9aa>] set_memory_4k+0x2a/0x30 [<c011ec3b>] kmemcheck_hide_pages+0x1b/0xc0 [<c011e9df>] ? kmemcheck_hide+0x8f/0x120 [<c01802da>] kmemcheck_alloc_shadow+0x6a/0xc0 [<c017eaef>] __slab_alloc+0x12f/0x3d0 [<c017ee6b>] kmem_cache_alloc+0xbb/0xc0 [<c016003e>] ? mempool_alloc_slab+0xe/0x10 [<c016003e>] mempool_alloc_slab+0xe/0x10 [<c016013c>] mempool_alloc+0x2c/0xd0 [<c01a6d71>] bvec_alloc_bs+0x51/0xf0 [<c01a70c5>] bio_alloc_bioset+0x45/0x80 [<c01a710e>] bio_alloc+0xe/0x20 [<c01aac50>] mpage_alloc+0x20/0x80 ...
got the report below - config attached. This is probably bitfield initialization related. tip-072062d5. Ingo -----------------> kmemcheck: Caught 32-bit read from uninitialized memory (f6038ec0) 0000000000000000d0e86bc0d0fa6bc070446cc060fa6bc0205c71c0103772c0 u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u ^ Pid: 0, comm: swapper Tainted: G W (2.6.27-rc7-tip-00980-g072062d-dirty #37099) System Product Name EIP: 0060:[<c06beba6>] EFLAGS: 00010282 CPU: 0 EIP is at sk_clone+0x26/0x1d0 EAX: f6038d80 EBX: f6038d80 ECX: 00000020 EDX: c0ac19c0 ESI: f6038000 EDI: f6038d80 EBP: c0b67d54 ESP: c0b860e8 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: f6015800 CR3: 36105000 CR4: 00000680 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff4ff0 DR7: 00000400 [<c0704f7a>] inet_csk_clone+0x1a/0x90 [<c07185aa>] tcp_create_openreq_child+0x1a/0x3a0 [<c071637f>] tcp_v4_syn_recv_sock+0x3f/0x1c0 [<c07183c3>] tcp_check_req+0x283/0x450 [<c0715e81>] tcp_v4_do_rcv+0x261/0x360 [<c0717fa4>] tcp_v4_rcv+0x5d4/0x6b0 [<c06fc66c>] ip_local_deliver+0x6c/0x1a0 [<c06fca85>] ip_rcv+0x2e5/0x5a0 [<c06c7621>] netif_receive_skb+0x3f1/0x480 [<c047030f>] nv_napi_poll+0x38f/0x670 [<c06c9699>] net_rx_action+0x129/0x1b0 [<c0131eb2>] __do_softirq+0x82/0x100 [<c0105fc5>] call_on_stack+0x15/0x30 [<ffffffff>] 0xffffffff
another crash is below - config attached.
Ingo
----------------->
initcall 0xc08eabf0 returned 0 after 7 msecs
calling 0xc08eae70 @ 1
PnPBIOS: Scanning system for PnP BIOS support...
PnPBIOS: Found PnP BIOS installation structure at 0xc00fc550
PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0xc580, dseg 0xf0000
BUG: unable to handle kernel paging request at 0000c6ef
IP: [<c011ee33>]
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in:
Pid: 1, comm: swapper Not tainted (2.6.27-rc7-tip-00980-g072062d-dirty #37107) System Product Name
EIP: 0060:[<c011ee33>] EFLAGS: 00010046 CPU: 0
EAX: 0000c6ef EBX: 0000c6ef ECX: 00000001 EDX: 00fd6000
ESI: 0000c6ef EDI: c190e220 EBP: f703fd96 ESP: f703fd8e
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 1, ti=f703e000 task=f7040000 task.ti=f703e000)
Stack: 0000c6ef f703fe2e f703fdc6 c011e7c4 fdb4f708 00000001 f7080580 afeff703
7018c011 f703fdc6 c011afef f7080580 f703fe2e 00000002 f703fdde c011ebde
00000001 0000c695 00000034 c011a110 f703fe26 c011a56f fdf80000 ef22f703
Call Trace:
[<c011e7c4>] ? <0> [<c011afef>] ? <0> [<c011ebde>] ? <0> [<c011a110>] ? <0> [<c011a56f>] ? <0> [<c011a110>] ? <0> [<c06f2d12>] ? <0> =======================
Code: c7 02 02 00 00 00 eb df 8b 45 08 c7 00 01 00 00 00 eb d4 90 8d 74 26 00 55 89 e5 56 89 c6 53 eb 0a 8d b4 26 00 00 00 00 83 c6 01 <0f> b6 1e 89 d8 e8 f3 fe ff ff 84 c0 75 ef 81 e3 f0 00 00 00 31
EIP: [<c011ee33>] SS:ESP 0068:f703fd8e
Kernel panic - not syncing: Fatal exception
Ingo
Thanks, I can reproduce it. And that's very interesting. Look at the
faulting address and the line just above:
PnP BIOS version 1.0, entry 0xf0000:0xc580
When I run it, I get something similar:
PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0x67ba, dseg 0xf0000
BUG: unable to handle kernel paging request at 000068ac
Enabling debug information also gives me the location of the crash. It
happens in kmemcheck's opcode decoder, when we dereference regs->eip.
Is it possible that PnPBIOS is executing some code which is located at
these weird addresses (like 16-bit code) and requires a different
segment register?
Maybe we should just check regs->flags to make sure that we are not in
VM86 mode or similar. Will experiment a bit with it. Thanks.
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
--
