x86_64 SMP kernel v2.6.22.6 (not using callgraph).
sometimes oprofile works for a longer time... but not this time.2007-09-22 13:53:32.527237777 <1>[ 3372.390188] Unable to handle kernel NULL pointer dereference at 0000000000000650 RIP:
2007-09-22 13:53:32.527245948 <1>[ 3372.390195] [<ffffffff80652f44>] _spin_lock+0x4/0x20
2007-09-22 13:53:32.527247694 <4>[ 3372.390211] PGD 13a6c067 PUD 3ad91067 PMD 0
2007-09-22 13:53:32.527249371 <0>[ 3372.390216] Oops: 0002 [1] SMP
2007-09-22 13:53:32.527250837 <4>[ 3372.390220] CPU 0
2007-09-22 13:53:32.527252304 <4>[ 3372.390227] Modules linked in: i915 oprofile xt_CLASSIFY ipt_ECN xt_CONNMARK xt_connlimit xt_length ipt_set xt_multiport ip_set_iphash ip_set_nethash ip_set_portmap sch_esfq ipt_REJECT ip6t_LOG xt_limit ipt_LOG xt_hashlimit ipt_owner nf_conntrack_ipv4 xt_state xt_tcpudp ip6table_filter ip6table_mangle ip6_tables iptable_filter iptable_mangle iptable_raw ip_tables tcp_highspeed tcp_htcp tcp_hybla tcp_scalable tcp_vegas tcp_westwood ip_set sch_netem sch_hfsc sch_htb sch_sfq cls_fw cls_u32 cls_route sch_ingress sch_red sch_tbf sch_teql sch_prio sch_gred cls_rsvp cls_rsvp6 cls_tcindex sch_cbq sch_dsmark nf_conntrack xt_TARPIT x_tables perfctr dccp_diag dccp ioatdma cmtp kernelcapi l2cap bluetooth intelfb i2c_algo_bit i810 lp i2c_dev ftdi_sio usbserial usb_storage eeprom ohci_hcd irlan irda crc_ccitt binfmt_misc loop dm_mod video dock button battery ac tcp_cubic nvram snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_dev
ice snd_pcm_oss snd_mixer_oss parport_p
2007-09-22 13:53:32.527328363 c snd_pcm parport snd_timer ohci1394 ieee1394 snd i2c_i801 i2c_core soundcore ehci_hcd uhci_hcd e1000 snd_page_alloc iTCO_wdt iTCO_vendor_support
2007-09-22 13:53:32.527330808 <4>[ 3372.390363] Pid: 7, comm: events/0 Not tainted 2.6.22.6-cfs-v20.5-64-2 #7
2007-09-22 13:53:32.527332623 <4>[ 3372.390366] RIP: 0010:[<ffffffff80652f44>] [<ffffffff80652f44>] _spin_lock+0x4/0x...
Can you reproduce it with a non tainted kernel without any patches like CFS applied?
The oops happens because oprofile tries to reference a NULL task in
its event buffer for a context switch, which might well be a minor
merging mistake or similar in the CFS backport.-
CFS does not change that aspect of the oprofile code.
Ingo
-
I tried without CFS and it Oopsed just as nicely,
Call Trace:
[<ffffffff80231988>] get_task_mm+0x18/0x50
[<ffffffff88316b64>] :oprofile:sync_buffer+0xf4/0x450
[<ffffffff88316650>] :oprofile:wq_sync_buffer+0x0/0x60
[<ffffffff88316683>] :oprofile:wq_sync_buffer+0x33/0x60
[<ffffffff80289f10>] cache_reap+0x0/0x130
[<ffffffff80243269>] run_workqueue+0x89/0x120
[<ffffffff80243cf7>] worker_thread+0xc7/0x130
[<ffffffff802474d0>] autoremove_wake_function+0x0/0x40
[<ffffffff80243c30>] worker_thread+0x0/0x130
[<ffffffff802470cd>] kthread+0x4d/0x80
[<ffffffff8020aa78>] child_rip+0xa/0x12
[<ffffffff80247080>] kthread+0x0/0x80--
Do what you love because life is too short for anything else.-
This is easy to trigger with "ping -f 127.0.0.1" (as root).
Oopses inside 0.1s. If you can't reproduce, try more events and/or smaller
sample count.May I suggest something: if system is too slow to process events,
find out which event is triggering too often, double the sample count
for it, reset all counters, and log an informative kernel message.--
Do what you love because life is too short for anything else.-
On the per cpu buffer writer side oprofile_add_sample() use profile_pc()
to get the eip, profile_pc() can return ~0lu, but an eip == ~0lu is a
magic value = ESCAPE_CODE. The per cpu reader side in buffer_sync.c use
this value to know that the associated data is a task pointer but here
the associated data is a counter number.This has already been reported two years ago by Jesse Barnes on the same
sort of box, pentium D. This is not reproducible on a duo core nor I was
able to on a P4 box two years ago, I dunno why. Anyway profile_pc() is
broken() on both i386/x86_64, w/o frame pointer. For i386:000000b0 <_spin_lock_bh>:
b0: 53 push %ebx /* break profile_pc() */
b1: 89 c3 mov %eax,%ebxOn x86_64 it's broken with or w/o frame pointer.
I understand the motivation to get the eip calling a spinlock function,
but that's a cheat and it has a price. Beside that, the trouble is also
on oprofile side, magic value are evil. This bug exists since at least
2.6.13.Sami can you try the attached patch, the chunk in buffer_sync.c is
here only to avoid oopsing if another problem exists somewhere.--
Phe
I applied this patch and I'm not able to reproduce Oops anymore
with 2.6.22.10 or 2.6.23.
Haven't gotten "oprofile: Invalid event", either.Thanks a bunch!
--
Do what you love because life is too short for anything else.-
| James Bottomley | Re: Integration of SCST in the mainstream Linux kernel |
| Greg Kroah-Hartman | [PATCH 007/196] Chinese: add translation of stable_kernel_rules.txt |
| david | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Jan Engelhardt | intel iommu (Re: -mm merge plans for 2.6.23) |
git: | |
| Alexey Dobriyan | Re: [GIT]: Networking |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| David Miller | Re: [BUG] New Kernel Bugs |
