Hello This has appeared in my kernel log: Zdenek ---- hwsleep-0322 [00] enter_sleep_state : Entering sleep state [S3] BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/18791 caller is do_machine_check+0xa9/0x500 Pid: 18791, comm: pm-suspend Not tainted 2.6.25-rc8 #32 Call Trace: [debug_smp_processor_id+196/208] debug_smp_processor_id+0xc4/0xd0 [do_machine_check+169/1280] do_machine_check+0xa9/0x500 [init_8259A+27/256] ? init_8259A+0x1b/0x100 [mce_init+86/240] mce_init+0x56/0xf0 [mce_resume+11/16] mce_resume+0xb/0x10 [__sysdev_resume+32/96] __sysdev_resume+0x20/0x60 [sysdev_resume+88/144] sysdev_resume+0x58/0x90 [device_power_up+9/16] device_power_up+0x9/0x10 [suspend_devices_and_enter+327/416] suspend_devices_and_enter+0x147/0x1a0 [enter_state+326/464] enter_state+0x146/0x1d0 [state_store+186/256] state_store+0xba/0x100 [kobj_attr_store+23/32] kobj_attr_store+0x17/0x20 [sysfs_write_file+202/320] sysfs_write_file+0xca/0x140 [vfs_write+203/368] vfs_write+0xcb/0x170 [sys_write+80/144] sys_write+0x50/0x90 [system_call_after_swapgs+123/128] system_call_after_swapgs+0x7b/0x80 Extended CMOS year: 2000 BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/18791 caller is retrigger_next_event+0x1c/0xf0 Pid: 18791, comm: pm-suspend Not tainted 2.6.25-rc8 #32 Call Trace: [debug_smp_processor_id+196/208] debug_smp_processor_id+0xc4/0xd0 [retrigger_next_event+28/240] retrigger_next_event+0x1c/0xf0 [_spin_unlock+48/96] ? _spin_unlock+0x30/0x60 [hres_timers_resume+30/80] hres_timers_resume+0x1e/0x50 [timekeeping_resume+293/352] timekeeping_resume+0x125/0x160 [__sysdev_resume+32/96] __sysdev_resume+0x20/0x60 [sysdev_resume+88/144] sysdev_resume+0x58/0x90 [device_power_up+9/16] device_power_up+0x9/0x10 [suspend_devices_and_enter+327/416] suspend_devices_and_enter+0x147/0x1a0 [enter_state+326/464] enter_state+0x146/0x1d0 [state_store+186/256] state_store+0xba/0x100 [kobj_attr_store+23/32] kobj_attr_stor...
[ CCs added ] From: Jiri Kosina <jkosina@suse.cz> x86: fix possible race in do_machine_check() It is not safe to call smp_processor_id() in cases we are not sure someone will not reschedule us. Signed-off-by: Jiri Kosina <jkosina@suse.cz> diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c index 9a699ed..bf92375 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_64.c +++ b/arch/x86/kernel/cpu/mcheck/mce_64.c @@ -189,6 +189,7 @@ void do_machine_check(struct pt_regs * regs, long error_code) * error. */ int kill_it = 0; + unsigned long flags; atomic_inc(&mce_entry); @@ -199,6 +200,8 @@ void do_machine_check(struct pt_regs * regs, long error_code) goto out2; memset(&m, 0, sizeof(struct mce)); + + local_irq_save(flags); m.cpu = smp_processor_id(); rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus); /* if the restart IP is not valid, we're done for */ @@ -315,6 +318,7 @@ void do_machine_check(struct pt_regs * regs, long error_code) wrmsrl(MSR_IA32_MCG_STATUS, 0); out2: atomic_dec(&mce_entry); + local_irq_restore(flags); } From: Jiri Kosina <jkosina@suse.cz> hrtimer: fix possible race in retrigger_next_event() It is not safe to perform per-cpu variable read and use it later in cases we are not sure someone will not reschedule us. Signed-off-by: Jiri Kosina <jkosina@suse.cz> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index 98bee01..c008096 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -486,6 +486,7 @@ static void retrigger_next_event(void *arg) struct hrtimer_cpu_base *base; struct timespec realtime_offset; unsigned long seq; + unsigned long flags; if (!hrtimer_hres_active()) return; @@ -497,6 +498,7 @@ static void retrigger_next_event(void *arg) -wall_to_monotonic.tv_nsec); } while (read_seqretry(&xtime_lock, seq)); + local_irq_save(flags); base = &__get_cpu_var(hrtimer_bases); /* Adjust CLOCK_REA...
thanks Jiri, i've applied both patches of yours. (let me know if there are any updated patches) do they solve the problem? Ingo --
Please unapply them, as discussed in the earlier thread they are not the correct fix. The original bug is still not found unfortunately, but it's most likely somewhere in the KVM suspend/resume code. -Andi --
Hi Ingo, thanks, but I don't think they are the proper fix anymore. We'd rather figure out why interrupts were enabled in the first place while the code was running. Which is quite difficult, because Zdenek doesn't seem to be able to reproduce it, so we even don't know whether it has been caused by KVM or not. -- Jiri Kosina SUSE Labs --
if sticking a WARN_ON_ONCE(!irqs_disabled()) somewhere would help in making the breakage more visible i'd be glad to apply such a patch. Ingo --
We are sure. Machine checks always run with interrupts off. Also BTW in a non preempt-rt kernel all exceptions run non preemptable anyways. -Andi --
I know. However preempt_count is a little bit inconsistent in such cases though. -- Jiri Kosina --
And? interrupts off beats preempt count anyways. Why did you write the patch? Was there a (incorrect) warning triggered? -Andi --
Reported at http://lkml.org/lkml/2008/4/7/130 BTW is also mce_init() (called from mce_resume()) guaranteed to run with IRQs off? -- Jiri Kosina --
... I wouldn't say so, otherwise the BUG wouldn't trigger, right? debug_smp_processor_id() is explicitly checking both for preempt_count value and irqs_disabled(). -- Jiri Kosina SUSE Labs --
[cc rafael] The mce resume is a sysdev. sysdevs were always supposed to run completely with interrupts off. If they don't anymore that's some kind of higher level resume code bug which you need to fix there, not hack around in the low level code. If it does that it likely broke more code too. Obviously turning on preemption anywhere around the machine check is fatal because it touches CPU state and if you reschedule you could switch to another CPU and change or access the wrong CPU's state. -Andi --
FWIW, at the point when sysdevs are resumed we are single-threaded. Thanks, Rafael --
So, any idea why mce_resume() -> mce_init() -> debug_smp_processor_id() triggers the warning? Apparently preempt_count is zero, irqs_disabled() returns false, and cpumask_of_cpu() is not equal to current->cpus_allowed. Is that really relevant here? We still could be switched over to another CPU, and that would break things. -- Jiri Kosina SUSE Labs --
We are single-threaded because we 'unplugged' all the other cpus... There are no other CPUs. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
That's unrelated, right? The real problem here is that irqs_disabled() call inside debug_smp_processor_id() signalized that interrupts are enabled, and that (correctly) triggered the bug. -- Jiri Kosina SUSE Labs --
Well, we can discuss it personally. Yes, it is probably bug that interrupts are enabled there. No, the bug can't potentially bite the user because there should be just one CPU "plugged in"... so it is kind of false-positive, too.. Does the warning go away with this? Pavel diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c index 6c90fb9..8195c37 100644 --- a/lib/smp_processor_id.c +++ b/lib/smp_processor_id.c @@ -35,6 +35,13 @@ unsigned int debug_smp_processor_id(void goto out; /* + * It is valid to assume CPU-locality if there's just one + * CPU active + */ + if (num_online_cpus() == 1) + goto out; + + /* * Avoid recursion: */ preempt_disable(); -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
thanks Pavel, i picked this up into sched-devel.git - it makes sense independently of whether it solves the warning. Ingo --
This means the CPU preempt sanity checking is effectively disabled on UP kernels. Even though more and more people should run multi-core now there is still a sizeable user base left on single core. Don't think this is a good idea. We need even the UP testers. If he wants this for suspend or machine check/oops (always useful there to disable such warnings) then there should be separate checks for this. Perhaps system_state could be enhanced for this? -Andi --
Agreed. Besides, I'd like to learn what caused the problem to appear in the first That would make sense. Thanks, Rafael --
Thanks!
(I just want to fix the underlying problem in suspend, too. I guess
I'll just do something like
diff --git a/drivers/base/sys.c b/drivers/base/sys.c
index 8e13fd9..adb7850 100644
--- a/drivers/base/sys.c
+++ b/drivers/base/sys.c
@@ -367,6 +367,7 @@ int sysdev_suspend(pm_message_t state)
/* Call auxillary drivers first */
list_for_each_entry(drv, &cls->drivers, entry) {
if (drv->suspend) {
+ BUG_ON(!in_interrupt());
ret = drv->suspend(sysdev, state);
if (ret)
goto aux_driver;
@@ -442,6 +443,7 @@ int sysdev_resume(void)
list_for_each_entry(sysdev, &cls->kset.list, kobj.entry) {
pr_debug(" %s\n", kobject_name(&sysdev->kobj));
+ BUG_ON(!in_interrupt());
__sysdev_resume(sysdev);
}
}
...to catch this kind of problems early).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--Well, I'd say the BUG_ON()s are over the top in here (why to crash the system even if it wouldn't have crashed otherwise?). WARN_ON_ONCE() should be enough. Thanks, Rafael --
small note: please always use WARN_ON_ONCE() in such cases. That way people's boxes dont blow up and the problem will be debuggable if the box survives despite the error condition. (and thus it can be relayed to kerneloops.org - all major distros have the package included already and Fedora enables the client by default) Ingo --
Agreed, BUG_ON() was bad idea and I got the condition wrong. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
No, we couldn't, because the other CPUs are off at this point. Thanks, Rafael --
To avoid some mystery - these two warns precede warnings I've initially posted here - the reason why I've not posted also these 2 kvm related warns is that kvm maintainers already know about this problem and said in the past that they will eventually fix them (If I remember correctly) Warnings and bug looked somewhat unrelated - but I could be wrong and the mystery irq could be mangles in kvm.... ACPI: Preparing to enter system sleep state S3 Disabling non-boot CPUs ... kvm: disabling virtualization on CPU1 ------------[ cut here ]------------ WARNING: at arch/x86/kernel/smp_64.c:424 smp_call_function_single+0xea/0x100() Modules linked in: cpufreq_userspace tun i915 drm ipt_MASQUERADE iptable_nat nf_na t nf_conntrack_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge llc nfsd lockd nfs_acl auth _rpcgss exportfs autofs4 sunrpc binfmt_misc dm_mirror dm_log dm_multipath dm_mod uinput kvm_intel kvm snd_hda_intel snd_seq_oss snd _seq_midi_event snd_seq arc4 snd_seq_device snd_pcm_oss ecb snd_mixer_oss snd_pcm crypto_blkcipher video snd_timer iTCO_wdt cryptom gr crypto_algapi rtc_cmos iwl3945 sdhci mmc_core thinkpad_acpi snd mac80211 psmouse evdev iTCO_vendor_support rtc_core i2c_i801 out put rtc_lib soundcore snd_page_alloc e1000e sr_mod intel_agp battery ac backlight nvram serio_raw i2c_core cfg80211 cdrom button uh ci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: microcode] Pid: 18951, comm: kstopmachine Not tainted 2.6.25-rc8 #32 Call Trace: [warn_on_slowpath+95/144] warn_on_slowpath+0x5f/0x90 [vprintk+560/1136] ? vprintk+0x230/0x470 [hrtick_set+57/304] ? hrtick_set+0x39/0x130 [_end+108902941/2109239016] ? :kvm:decache_vcpus_on_cpu+0x65/0xe0 [smp_call_function_single+234/256] smp_call_function_single+0xea/0x100 [_end+109030358/2109239016] :kvm_intel:vmx_vcpu_decache+0x2e/0x40 [_end+108903005/2109239016] :kvm:decache_vcpus_on_cpu+0xa5/0xe0 [_end+108882341/2109239016] :kvm:hardware_disable+0x2d/0x40 [_end+108882503/21092390...
Yes, please try to reproduce without KVM. It could happen that some KVM code wrongly enabled IRQs somewhere. Thanks, -- Jiri Kosina SUSE Labs --
Well I'm unable to simulate this warning message - it happened to me once even with kvm, so for now I have no idea how to easily replicate this problem. But I'll keep an eye on this. Zdenek --
This looks suspiciously. Apparently, kvm_suspend() does something not entirely sane and that may Thanks, Rafael --
You mean single CPUed? Even a single thread could switch to another CPU. -Andi --
Hm, well. I really don't think that ACPI or the core does it, but anyway Zdenek, could you please apply the appended patch and see if it produces any additional warnings? If it doesn't, then probably one of the sysdevs does it in its ->resume() Signle CPUed too. Thanks, Rafael --- --- drivers/acpi/sleep/main.c | 8 ++++++++ 1 file changed, 8 insertions(+) Index: linux-2.6/drivers/acpi/sleep/main.c =================================================================== --- linux-2.6.orig/drivers/acpi/sleep/main.c +++ linux-2.6/drivers/acpi/sleep/main.c @@ -141,9 +141,13 @@ static int acpi_pm_enter(suspend_state_t break; } + WARN_ON(!irqs_disabled()); + /* Reprogram control registers and execute _BFS */ acpi_leave_sleep_state_prep(acpi_state); + WARN_ON(!irqs_disabled()); + /* ACPI 3.0 specs (P62) says that it's the responsibility * of the OSPM to clear the status bit [ implying that the * POWER_BUTTON event should not reach userspace ] @@ -158,6 +162,8 @@ static int acpi_pm_enter(suspend_state_t */ acpi_hw_disable_all_gpes(); + WARN_ON(!irqs_disabled()); + local_irq_restore(flags); printk(KERN_DEBUG "Back to C!\n"); @@ -165,6 +171,8 @@ static int acpi_pm_enter(suspend_state_t if (acpi_state == ACPI_STATE_S3) acpi_restore_state_mem(); + WARN_ON(!irqs_disabled()); + return ACPI_SUCCESS(status) ? 0 : -EFAULT; } --
| Karl Meyer | PROBLEM: 2.6.23-rc "NETDEV WATCHDOG: eth0: transmit timed out" |
| Greg Kroah-Hartman | [PATCH 040/196] kobject: add kobject_add_ng function |
| Steven Rostedt | [RFC PATCH v4] Unified trace buffer |
| Dave Airlie | [git pull] drm patches for 2.6.27 final |
| Krzysztof Halasa | Re: [PATCH v2] Re: WAN: new PPP code for generic HDLC |
| David Miller | Re: [PATCH] Expose netdevice dev_id through sysfs |
| Jay Cliburn | Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected) |
| Evgeniy Polyakov | [resend take 2 0/4] Distributed storage. |
git: | |
| Andrew Morton | Untracked working tree files |
| Miklos Vajna | [rfc] git submodules howto |
| Ben Collins | Re: [kernel.org users] [RFD] On deprecating "git-foo" for builtins |
| Jon Smirl | ! [rejected] master -> master (non-fast forward) |
| rancor | How to copy/pipe console buffert to file? |
| Pieter Verberne | File collision while using pkg_add |
| Greg Thomas | Re: Is it possible to fix a stale NFS hadle without rebooting? |
| Didier Wiroth | win32-codecs, avi and amd64 question |
| Netfilter kernel module | 9 hours ago | Linux kernel |
| serial driver xmit problem | 12 hours ago | Linux kernel |
| Why Windows is better than Linux | 12 hours ago | Linux general |
| How can I see my kernel messages in vt12? | 19 hours ago | Linux kernel |
| Grub | 1 day ago | Linux general |
| vmalloc_fault handling in x86_64 | 1 day ago | Linux kernel |
| epoll_wait()ing on epoll FD | 1 day ago | Linux kernel |
| Framebuffer in x86_64 causes problems to multiseat | 1 day ago | Linux kernel |
| Difference between 2.4 and 2.6 regarding thread creation | 1 day ago | Linux general |
| Compiling gfs2 on kernel 2.6.27 | 2 days ago | Linux kernel |
