Re: [>=2.6.23][BUG] Oops on power disconnection

From: Sanjeev Aditya Naga
Date: Sunday, March 30, 2008 - 6:51 am

Hi,

 This happens every time there is a power disconnection
 (switching to battery). Complete dmesg attached. This one
 in particular is from kernel 2.6.24.3.

 Greetings,
 Kind Regards,
 Sanjeev

 BUG: unable to handle kernel NULL pointer dereference at virtual
 address 00000020
 printing eip: c04c4716 *pde = 578c2067
 Oops: 0000 [#1] SMP
 Modules linked in: cbc(U) geode_aes(U) blkcipher(U) aes_i586(U)
 aes_generic(U) dm_crypt(U) ipt_MASQUERADE(U) iptable_nat(U) nf_nat(U)
 bridge(U) autofs4(U) nf_conntrack_ipv4(U) xt_state(U) nf_conntrack(U)
 xt_tcpudp(U) ipt_REJECT(U) iptable_filter(U) ip_tables(U) x_tables(U)
 cpufreq_ondemand(U) acpi_cpufreq(U) fuse(U) loop(U) dm_mirror(U)
 dm_multipath(U) dm_mod(U) ipv6(U) snd_hda_intel(U) snd_seq_dummy(U)
 snd_seq_oss(U) snd_seq_midi_event(U) snd_seq(U) snd_seq_device(U)
 snd_pcm_oss(U) sr_mod(U) snd_mixer_oss(U) snd_pcm(U) 8139cp(U)
 snd_timer(U) button(U) 8139too(U) mii(U) snd_page_alloc(U) cdrom(U)
 video(U) output(U) snd_hwdep(U) ac(U) snd(U) pcspkr(U) i2c_piix4(U)
 i2c_core(U) battery(U) joydev(U) soundcore(U) sg(U) pata_atiixp(U)
 pata_acpi(U) sata_sil(U) ata_generic(U) libata(U) sd_mod(U)
 scsi_mod(U) ext3(U) jbd(U) mbcache(U) uhci_hcd(U) ohci_hcd(U)
 ehci_hcd(U)

 Pid: 69, comm: kacpi_notify Not tainted (2.6.24.3 #5)
 EIP: 0060:[<c04c4716>] EFLAGS: 00010246 CPU: 0
 EIP is at sysfs_addrm_start+0x21/0x81
 EAX: c04c47d7 EBX: 00000000 ECX: 00000000 EDX: f78b8000
 ESI: f78b8eb8 EDI: f78b8ec8 EBP: 00000000 ESP: f78b8ea4
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
 Process kacpi_notify (pid: 69, ti=f78b8000 task=f78d8000 task.ti=f78b8000)
 Stack: f229ff54 f229ff54 f5b8d390 fffffff4 c04c4b45 00000000 00000000 00000000
       00000000 f229ff54 00000000 00000000 f782601c c04c4bab f78b8ee0 c04fe65d
       f229ff54 c04fe8a8 f722e67f ffffffff ffffffff 00000007 f722e678 f78261d8
 Call Trace:
  [<c04c4b45>] create_dir+0x33/0x6c
  [<c04c4bab>] sysfs_create_dir+0x2d/0x40
  [<c04fe65d>] kobject_get+0xf/0x13
  [<c04fe8a8>] ...
From: Sanjeev Aditya Naga
Date: Sunday, March 30, 2008 - 9:00 pm

Hi,

Please find the dmesg below:
BTW, I have tried kernels later than 2.6.21 as well, and I get the
same result.

Kind Regards,
Sanjeev

Initializing cgroup subsys cpuset
Linux version 2.6.24.3 (root@draksha.cultuzz.in) (gcc version 4.3.0
20080222 (Red Hat 4.3.0-0.11) (GCC) ) #5 SMP Tue Apr 1 00:04:55 IST
2008
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
 BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000ce000 - 00000000000d0000 (reserved)
 BIOS-e820: 00000000000dc000 - 00000000000e0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000057e80000 (usable)
 BIOS-e820: 0000000057e80000 - 0000000057e96000 (ACPI data)
 BIOS-e820: 0000000057e96000 - 0000000057f00000 (ACPI NVS)
 BIOS-e820: 0000000057f00000 - 0000000058000000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
510MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f7130
Using x86 segment limits to approximate NX protection
Entering add_active_range(0, 0, 360064) 0 entries of 256 used
Zone PFN ranges:
  DMA             0 ->     4096
  Normal       4096 ->   229376
  HighMem    229376 ->   360064
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    0:        0 ->   360064
On node 0 totalpages: 360064
  DMA zone: 32 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 4064 pages, LIFO batch:0
  Normal zone: 1760 pages used for memmap
  Normal zone: 223520 pages, LIFO batch:31
  HighMem zone: 1021 pages used for memmap
  HighMem zone: 129667 pages, LIFO batch:31
  Movable zone: 0 pages used for memmap
DMI present.
Using APIC driver default
ACPI: RSDP 000F7070, 0014 (r0 TOSQCI)
ACPI: RSDT 57E8F4EF, 0038 (r1 TOSQCI TOSQCI00  6040000  ...
From: Andrew Morton
Date: Wednesday, April 2, 2008 - 12:14 am

Looks like a cpuidle problem (or at least acpi).

I seem to recall having seen other reports of this?
--

From: Sanjeev Aditya Naga
Date: Wednesday, April 2, 2008 - 6:08 am

Hello Andrew Morton,

Greetings!

Thank you for the update. Is there anything I can do
from my side?

I thought it was an ACPI (DSDT) problem. Based on
a tutorial, I tried to extract, fix, and recompile the DSDT
and use it with the kernel, but I still have the same problem.
Let me know if I should attach the original decompiled DSDT
code, if that helps.

Kind Regards,
Sanjeev

On Wed, Apr 2, 2008 at 12:44 PM, Andrew Morton
--

From: Thomas Renninger
Date: Wednesday, April 2, 2008 - 7:48 am

Hi,

	
This could be due to a general memory corruption problem in ACPICA.
If you get different backtraces across reboots even though you only modified
things that have nothing to do with the problem, it is probably that, and related
to:
http://bugzilla.kernel.org/show_bug.cgi?id=10339

You might want to try the latest kernel or the patch posted there.

Then it might be something else...

   Thomas

--

From: Sanjeev Aditya Naga
Date: Wednesday, April 2, 2008 - 9:13 am

Thank you for the update.
I have checked the bug and unfortunately it's not the
same issue. Things work absolutely fine when I'm
running on AC power. It even displays the exact
battery (and charging) status to me. It messes up
when AC power suddenly gets disconnected and the machine
switches to battery mode (that is when I get the Oops).
The system is still usable after switching to battery
mode and I still get correct battery stats until it's
completely discharged. However, most commands,
like kill, poweroff, and java, don't work after the Oops.

BTW there is one similarity with the referenced bug.
If I boot the computer without AC Power, it gives the
same Oops and stops during booting itself.

I shall try the latest kernel once and shall update you.

Kind Regards,
From: Thomas Renninger
Date: Wednesday, April 2, 2008 - 11:12 am

The bug is not related to battery, but to AML parsing and can therefore
That would be great.
If it works, please give the patch there a try, IMO this one should see
2.6.2[34].X stable kernels soon.

Thanks,


--

From: Sanjeev Aditya Naga
Date: Thursday, April 3, 2008 - 9:15 am

Hello Thomas,


I got the latest kernel, 2.6.25-rc8, today. I observed that
the referenced patch is already in that kernel. However, it didn't
solve the problem in question. I get the same Oops on this kernel
as well. Find the latest dmesg along with the Oops:

Regards,
Sanjeev

------------[ cut here ]------------
WARNING: at lib/kref.c:43 kref_get+0x17/0x1c()
Modules linked in: wlan_scan_sta ath_rate_sample ath_pci wlan
ath_hal(P) sit tunnel4 ipv6 cbc aes_i586 aes_generic dm_crypt
ipt_MASQUERADE iptable_nat nf_nat bridge autofs4 nf_conntrack_ipv4
xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter ip_tables
x_tables cpufreq_ondemand acpi_cpufreq fuse loop dm_mirror
dm_multipath dm_mod snd_hda_intel snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss
snd_pcm snd_timer sr_mod video 8139cp snd_page_alloc 8139too snd_hwdep
i2c_piix4 i2c_core pcspkr output snd battery soundcore ac mii joydev
sg cdrom button pata_atiixp pata_acpi sata_sil ata_generic libata
sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last
unloaded: microcode]
Pid: 70, comm: kacpi_notify Tainted: P         2.6.25-rc8 #2
 [<c0427dcc>] warn_on_slowpath+0x40/0x65
 [<c043007b>] switch_uid+0x5a/0x70
 [<c04147bc>] smp_call_function_single+0x27/0x47
 [<c04f1610>] number+0x120/0x1e6
 [<c04f1ed1>] vsnprintf+0x40a/0x447
 [<c04ef103>] kref_get+0x17/0x1c
 [<c04ee6a0>] kobject_get+0xf/0x13
 [<c04ee72f>] kobject_add_internal+0x42/0x13b
 [<c04ee8dc>] kobject_init_and_add+0x23/0x25
 [<c0596b90>] cpuidle_add_state_sysfs+0x63/0xd7
 [<c05144d4>] acpi_os_execute_deferred+0x0/0x25
 [<c0596498>] cpuidle_enable_device+0x35/0xac
 [<c0531c28>] acpi_processor_cst_has_changed+0x40/0x54
 [<c052fac1>] acpi_processor_notify+0x83/0xde
 [<c0519549>] acpi_ev_notify_dispatch+0x4c/0x57
 [<c05144f1>] acpi_os_execute_deferred+0x1d/0x25
 [<c0435441>] run_workqueue+0x74/0xef
 [<c0435572>] worker_thread+0xb6/0xc2
 [<c0437f8a>] autoremove_wake_function+0x0/0x2d
 [<c04354bc>] worker_thread+0x0/0xc2
 [<c0437d35>] ...
From: Thomas Renninger
Date: Thursday, April 3, 2008 - 2:04 pm

Ok.
It may be best to document the backtraces/oopses with kernel versions at
bugzilla.kernel.org and add dmesg and acpidump output.
It seems your machine notifies the OS that the C-state table changed.
AFAIK this is rare, and there might be a general bug in the cpuidle layer,
which I do not know well.
It would be best to add Venkatesh and Shaohua Li <shaohua.li@intel.com> to
the CC list of the bug.
While the backtrace shows a lot, cpuidle IMO is missing a general debug
option like the one in the cpufreq layer.
I couldn't find a single printk in the whole cpuidle/{cpuidle,sysfs}.c
files, even on error paths. Some debug printks in the cpuidle-specific
parts of drivers/acpi/processor_idle.c may also help with future bug
reports. It is very hard to guess what happened...
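
When attaching backtraces to a bug report, it can help to pull out just the frames that belong to the subsystems under suspicion. The following is a minimal sketch, not an established tool: the helper names (parse_frames, symbols_with_prefix) are hypothetical, and the regular expression assumes the " [<address>] symbol+0xoffset/0xsize" frame format seen in the traces in this thread. The sample frames are copied from the 2.6.25-rc8 trace above.

```python
import re

# Matches frames such as " [<c0596b90>] cpuidle_add_state_sysfs+0x63/0xd7".
# Truncated entries like " [<c0437d35>] ..." deliberately do not match.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(trace_text):
    """Return (address, symbol, offset, size) tuples for each frame found."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            addr, symbol, offset, size = match.groups()
            frames.append((int(addr, 16), symbol, int(offset, 16), int(size, 16)))
    return frames


def symbols_with_prefix(frames, prefixes):
    """Keep only the symbols that start with one of the given prefixes."""
    return [symbol for _, symbol, _, _ in frames if symbol.startswith(prefixes)]


# Frames taken from the 2.6.25-rc8 trace quoted earlier in this thread.
SAMPLE = """\
 [<c04ee8dc>] kobject_init_and_add+0x23/0x25
 [<c0596b90>] cpuidle_add_state_sysfs+0x63/0xd7
 [<c05144d4>] acpi_os_execute_deferred+0x0/0x25
 [<c0596498>] cpuidle_enable_device+0x35/0xac
 [<c0531c28>] acpi_processor_cst_has_changed+0x40/0x54
 [<c052fac1>] acpi_processor_notify+0x83/0xde
 [<c0437d35>] ...
"""

if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    # Print the ACPI and cpuidle frames, innermost first, as they appear.
    for symbol in symbols_with_prefix(frames, ("acpi_", "cpuidle_")):
        print(symbol)
```

Read from the bottom up, the filtered frames show the path from the ACPI processor notification into the cpuidle sysfs code, which is the part of the trace the C-state remark above is based on.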


--

From: Pallipadi, Venkatesh
Date: Thursday, April 3, 2008 - 5:31 pm

This looks like cpuidle and kobject interaction.
The latest oops looks different from the original one. Latest one is a
warn_on in lib/kref.c:43
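
For readers unfamiliar with that warning, here is a toy model of the situation it is presumably guarding against. It assumes the check at lib/kref.c:43 fires when a reference is taken on an object whose count is zero, i.e. one that was never initialized or has already been released. The Kref class and get_warns helper are illustrative stand-ins written for this note, not kernel code, and nothing here establishes which of those two cases applies to this bug.

```python
import warnings


class Kref:
    """Toy stand-in for a reference count; not the kernel's implementation."""

    def __init__(self):
        self.refcount = 0  # models an object that has not been initialized yet

    def init(self):
        self.refcount = 1

    def get(self):
        # Assumption: taking a reference on a zero count is what triggers
        # the warning seen in the trace.
        if self.refcount == 0:
            warnings.warn("get on zero refcount", RuntimeWarning)
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        return self.refcount == 0  # True when the last reference is dropped


def get_warns(kref):
    """Take a reference and report whether doing so raised a warning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        kref.get()
    return len(caught) > 0


if __name__ == "__main__":
    k = Kref()
    k.init()
    print(get_warns(k))  # normal use after init
    k.put()
    k.put()              # last reference dropped
    print(get_warns(k))  # reuse after release
```

In this model, adding an object back after its last reference was dropped, without initializing it again, is enough to trip the check.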

We (Shaohua or I) will take a deeper look and get back to you on this.

Thanks,
Venki

From: Sanjeev Aditya Naga
Date: Saturday, April 5, 2008 - 11:05 am

Hello Venki,

On Fri, Apr 4, 2008 at 6:01 AM, Pallipadi, Venkatesh


Thank you for the update. Let me know if I can be of any help!

Kind Regards,
From: Sanjeev Aditya Naga
Date: Saturday, April 5, 2008 - 6:15 am

Hello Thomas,


Thank you for the update:

I have just registered a bug at
http://bugzilla.kernel.org/show_bug.cgi?id=10394
as directed by you.

Kind Regards,
Sanjeev
From: Sanjeev Aditya Naga
Date: Saturday, May 3, 2008 - 2:55 am

Hello Thomas, Andrew,

The recent patch given by Venkatesh at
http://bugzilla.kernel.org/show_bug.cgi?id=10394
has fixed the problem. Thank you all for the support
extended regarding the same.

Kind Regards,
Sanjeev

On Sat, Apr 5, 2008 at 6:45 PM, Sanjeev Aditya Naga