In general 2.6.25 is looking quite good on my desktop, but there's one
important issue: the system no longer powers off after shutdown.
This works fine with 2.6.24.
If there are any suggestions about patches to try or commits to revert,
please let me know. If not, I'll run a bisect.
Cheers,
FJP
Base Board Information
Manufacturer: Intel Corporation
Product Name: D945GCZ
Version: AAC99567-502
BIOS Information
Vendor: Intel Corp.
Version: NT94510J.86A.4089.2007.0718.0501
Release Date: 07/18/2007
Processor: Intel(R) Pentium(R) D CPU 3.20GHz
$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation 82945G/GZ/P/PL Memory Controller Hub [8086:2770] (rev 02)
00:02.0 VGA compatible controller [0300]: Intel Corporation 82945G/GZ Integrated Graphics Controller [8086:2772] (rev 02)
00:1b.0 Audio device [0403]: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller [8086:27d8] (rev 01)
00:1c.0 PCI bridge [0604]: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 [8086:27d0] (rev 01)
00:1c.2 PCI bridge [0604]: Intel Corporation 82801G (ICH7 Family) PCI Express Port 3 [8086:27d4] (rev 01)
00:1c.3 PCI bridge [0604]: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 [8086:27d6] (rev 01)
00:1c.4 PCI bridge [0604]: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 5 [8086:27e0] (rev 01)
00:1c.5 PCI bridge [0604]: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 6 [8086:27e2] (rev 01)
00:1d.0 USB Controller [0c03]: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 [8086:27c8] (rev 01)
00:1d.1 USB Controller [0c03]: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 [8086:27c9] (rev 01)
00:1d.2 USB Controller [0c03]: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 [8086:27ca] (rev 01)
00:1d.3 USB Controller [0c03]: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 [8086:27cb] (rev 01)
00:1d.7 USB Controller [0c03]: Intel Corporation ...
(Resending full details as I can't find my previous mail in the archives.)
Don't ask me why, but bisection shows this commit to be the cause of the
failure to power off:
commit c10997f6575f476ff38442fa18fd4a0d80345f9d
Author: Greg Kroah-Hartman <gregkh@suse.de>
Date: Thu Dec 20 08:13:05 2007 -0800
Kobject: convert drivers/* from kobject_unregister() to kobject_put()
Because it seemed somewhat unlikely, I have double-checked this by doing an
extra compilation for this commit and its predecessor.
Cheers,
What is the symptom of not powering off? Can you press SysRq-T and see a task list running and waiting when things should be shut down? Do you happen to have a USB storage stick plugged into the system? thanks, greg k-h --
The symptom is that the system shuts down normally and completely; it just does not power off. Here are the last messages on the console:
Will now halt.
sd 1:0:0:0: [sdb] Synchronizing SCSI cache
sd 1:0:0:0: [sdb] Stopping disk
sd 0:0:0:0: [sda] Synchronizing SCSI cache
sd 0:0:0:0: [sda] Stopping disk
ACPI: PCI interrupt for device 0000:01:00.0 disabled
Nothing. Only USB kbd/mouse.
Note that I've had this issue before with this box: http://bugzilla.kernel.org/show_bug.cgi?id=6879
Somehow it disappeared when I pulled the extra video card that came with the system (no decent driver for it, so no loss). Since then the system has always powered off completely reliably. This time it is a clear and reproducible regression. If we can solve this one, we might get a better handle on #6879 too.
Cheers,
FJP --
I've been struggling with an identically-manifesting regression on one of my test machines for a week. It's due to softlockup changes, and setting CONFIG_DETECT_SOFTLOCKUP=n "fixes" it. It sounds unlikely, but I'd suggest that you see if it's the same on your machine so we're not both chasing the same bug. --
I don't have CONFIG_DETECT_SOFTLOCKUP defined in .config, and there's no option in menuconfig to select it. I don't know whether my problem on a Lenovo X60s is related or not. I can power off on shutdown and suspend-to-RAM, but the screen turns green and the machine doesn't power off on suspend-to-disk. I have to manually press and hold the power switch to switch it off. The system is able to resume later. It was working as recently as last week, but something changed in the past few days. Jeff. --
Unsetting CONFIG_DETECT_SOFTLOCKUP does not help in my case, but thanks for the suggestion. --
I already noticed yesterday that there's one hunk in that commit that's not
a straight replacement:
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 9e102af..5efd555 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1030,8 +1030,6 @@ static int __cpufreq_remove_dev (struct sys_device * sys_dev)
unlock_policy_rwsem_write(cpu);
- kobject_unregister(&data->kobj);
-
kobject_put(&data->kobj);
/* we need to make sure that the underlying kobj is actually
So, just on the off chance, I applied the patch below and bingo, the system
powers off again. I doubt this will be the correct solution, but just in
case it is, here's my signed off. A comment explaining why the double put
is needed would probably be good, though.
Signed-off-by: Frans Pop <elendil@planet.nl>
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 64926aa..9dbaac6 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1058,6 +1058,7 @@ static int __cpufreq_remove_dev (struct sys_device * sys_dev)
unlock_policy_rwsem_write(cpu);
kobject_put(&data->kobj);
+ kobject_put(&data->kobj);
/* we need to make sure that the underlying kobj is actually
* not referenced anymore by anybody before we proceed with
--
There is a bug in the cpufreq kref logic that makes this "double put" necessary. A real fix has already been posted to solve this issue, and I think it should be on its way to Linus for -rc2 already. Please let me know if -rc2 comes out without this needed fix. thanks, greg k-h --
Can you point me to the fix, please? Thanks, Rafael --
I swear someone else sent this in, but my archives don't show it at all.
I think the patch below should solve this, but I need someone to test
it.
thanks,
greg k-h
---
drivers/cpufreq/cpufreq.c | 8 --------
1 file changed, 8 deletions(-)
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1006,14 +1006,6 @@ static int __cpufreq_remove_dev (struct
}
#endif
-
- if (!kobject_get(&data->kobj)) {
- spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
- cpufreq_debug_enable_ratelimit();
- unlock_policy_rwsem_write(cpu);
- return -EFAULT;
- }
-
#ifdef CONFIG_SMP
#ifdef CONFIG_HOTPLUG_CPU
--
I tested it, but it doesn't fix the problem for me. Maybe my problem is
different ... as my X60s just doesn't power-off on suspend-to-disk.
My .config says ...
# CONFIG_CPU_FREQ is not set
# CONFIG_CPU_IDLE is not set
On Wed, Feb 13, 2008 at 3:54 PM, Andrew Morton
Also, I've tried CONFIG_DETECT_SOFTLOCKUP=n, but this doesn't fix it either.
Here's the last dmesg output after suspend-to-disk, where it hangs:
CPU 1 is now offline
SMP alternatives: switching to UP code
PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.00 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
PM: Shrinking memory... done (0 pages freed)
PM: Freed 0 kbytes in 0.10 seconds (0.00 MB/s)
ACPI: Preparing to enter system sleep state S4
Suspending console(s)
[ ... it just hangs here ... pressing the power switch does the job, and
the system is able to resume upon powering on ]
Thanks,
Jeff.
--
Wait, this is a suspend-to-disk issue. Totally different than the "will not power off" issue. Can you start a new thread on this, and add the suspend people to it? thanks, greg k-h --
OK, great. Do you think that #6879 could be caused by a similar issue elsewhere in the tree? Can you give me some pointers on how I could find out (debugging to
Will do. --
After disabling cpufreq, I got:
ACPI: Preparing to enter system sleep state S5
Disabling non-boot CPUs ...
kvm: disabling virtualization on CPU1
CPU 1 is now offline
CPU1 is down
kvm: disabling virtualization on CPU2
CPU 2 is now offline
================> hang here.
But x86.git/mm was able to get through taking down all the CPUs... interesting...
YH --
I suspect some kobject-related race, and I have the feeling this is all
timing dependent.
Andrew started seeing reboot hangs roughly around the time when the
kobject changes went upstream. Given that x86.git had flux in that
timeframe too, I couldn't be sure what caused them.
I have the fixlet below in x86.git, but it didn't solve Andrew's problem,
so it's now parked at the end of the queue, with no clear purpose in
life :-) If it solves someone's problem, it might be revitalized.
Note: this does not fix any particular bug I know about; it's just a
hack.
Ingo
------------------------------>
Subject: x86: highprio shutdown hack
From: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/reboot.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
Index: linux-x86.q/arch/x86/kernel/reboot.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/reboot.c
+++ linux-x86.q/arch/x86/kernel/reboot.c
@@ -396,8 +396,20 @@ static void native_machine_shutdown(void
if (!cpu_isset(reboot_cpu_id, cpu_online_map))
reboot_cpu_id = smp_processor_id();
- /* Make certain I only run on the appropriate processor */
- set_cpus_allowed(current, cpumask_of_cpu(reboot_cpu_id));
+ /*
+ * Make certain we only run on the appropriate processor,
+ * and with sufficient priority:
+ */
+ {
+ struct sched_param schedparm;
+ int ret;
+
+ schedparm.sched_priority = 99;
+ ret = sched_setscheduler(current, SCHED_RR, &schedparm);
+ WARN_ON_ONCE(1);
+
+ set_cpus_allowed(current, cpumask_of_cpu(reboot_cpu_id));
+ }
/* O.K Now that I'm on the appropriate processor,
* stop all of the others.
--
So I got:
------------[ cut here ]------------
WARNING: at arch/x86/kernel/reboot.c:409 native_machine_shutdown+0x5f/0xb4()
Modules linked in:
Pid: 7173, comm: reboot Not tainted 2.6.25-rc1-smp-00168-g458504f-dirty #33
Call Trace:
[<ffffffff802521d8>] warn_on_slowpath+0x64/0x8e
[<ffffffff802464c8>] enqueue_task+0x5c/0x7e
[<ffffffff8027765c>] rt_mutex_adjust_pi+0x28/0x94
[<ffffffff8024c075>] sched_setscheduler+0x304/0x33c
[<ffffffff80237feb>] native_machine_shutdown+0x5f/0xb4
[<ffffffff80237f6e>] native_machine_restart+0x2e/0x4c
[<ffffffff8026234b>] sys_reboot+0x140/0x1b2
[<ffffffff8029bbd2>] handle_mm_fault+0x380/0x705
[<ffffffff802cc311>] d_kill+0x50/0x7c
[<ffffffff8097d000>] do_page_fault+0x3bd/0x7c9
[<ffffffff804d58ae>] __up_read+0x27/0xb5
[<ffffffff8022432b>] system_call_after_swapgs+0x7b/0x80
---[ end trace eb0e49090acb42b5 ]---
--
It seems to happen only when:
1. the first hang occurs with cpufreq enabled;
2. rebooting into a kernel with cpufreq disabled then also has the problem.
I wonder if the CPU frequencies get out of sync, and the next kernel after the reboot doesn't have cpufreq, so it .... -- with a warm reset doesn't do the right job to sync the frequency again.
Greg, where is the patch to fix the cpufreq problem?
YH --
ACPI: Preparing to enter system sleep state S5
Disabling non-boot CPUs ...
kvm: disabling virtualization on CPU1
CPU 1 is now offline
1 2 3 4 5
CPU1 is down
kvm: disabling virtualization on CPU2
CPU 2 is now offline
1 2 3 4 5
CPU2 is down
kvm: disabling virtualization on CPU3
CPU 3 is now offline
========> some time later
Clocksource tsc unstable (delta = 515397918052 ns)
Time: hpet clocksource has been installed.
It hangs in raw_notifier_call_chain(&cpu_chain, CPU_DEAD | mode, hcpu) == NOTIFY_BAD);
There are several notifier blocks; I'm not sure which one causes the hang:
8 hrtimer.c hrtimers_init 1505 register_cpu_notifier(&hrtimers_nb);
9 rcuclassic.c __rcu_init 570 register_cpu_notifier(&rcu_nb);
a rcupreempt.c __rcu_init 892 register_cpu_notifier(&rcu_nb); ===> not used
b sched.c migration_init 5951 register_cpu_notifier(&migration_notifier);
c softirq.c spawn_ksoftirqd 645 register_cpu_notifier(&cpu_nfb);
d softlockup.c spawn_softlockup_task 310 register_cpu_notifier(&cpu_nfb);
e timer.c init_timers 1367 register_cpu_notifier(&timers_nb);
f page-writeback.c page_writeback_init 775 register_cpu_notifier(&ratelimit_nb);
g page_alloc.c setup_per_cpu_pageset 2744 register_cpu_notifier(&pageset_notifier);
h slab.c kmem_cache_init 1638 register_cpu_notifier(&cpucache_notifier); ==> not used
i slub.c kmem_cache_init 3036 register_cpu_notifier(&slab_notifier);
j vmstat.c setup_vmstat 855 register_cpu_notifier(&vmstat_notifier);
k kvm_main.c kvm_init 1328 r = register_cpu_notifier(&kvm_cpu_notifier);
Maybe the one in softlockup.c?
YH --
Ugh, sorry, I was mistaken, it's not a cpufreq issue, it's a CONFIG_DETECT_SOFTLOCKUP issue. Or that is what I was told before. But the fact that you fixed the problem with an extra kobject_put() makes me worry. There might be a reference issue still there. I'll look into it. thanks, greg k-h --
could be two issues: one in cpufreq, and one in detect softlockup... YH --
I swear someone sent this patch in before. Can you try this one below,
there seems to be an imbalance with kobject_get and _put.
thanks,
greg k-h
---
drivers/cpufreq/cpufreq.c | 8 --------
1 file changed, 8 deletions(-)
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1006,14 +1006,6 @@ static int __cpufreq_remove_dev (struct
}
#endif
-
- if (!kobject_get(&data->kobj)) {
- spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
- cpufreq_debug_enable_ratelimit();
- unlock_policy_rwsem_write(cpu);
- return -EFAULT;
- }
-
#ifdef CONFIG_SMP
#ifdef CONFIG_HOTPLUG_CPU
--
I did remember seeing this patch before [1] and can confirm that it does indeed fix the issue: with this patch applied to 2.6.25 git head my system powers off correctly. --
Confirmed. With this patch, I still need to disable CONFIG_DETECT_SOFTLOCKUP; I assume the watchdog thread for the dead CPU cannot be stopped and hangs somewhere. YH --
Ingo, with the patch at http://lkml.org/lkml/2008/2/8/342 and the following patch, it powers off with CONFIG_DETECT_SOFTLOCKUP enabled:
diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index 7c2da88..c16a658 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -282,12 +282,12 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
case CPU_UP_CANCELED_FROZEN:
if (!per_cpu(watchdog_task, hotcpu))
break;
- /* Unbind so it can run. Fall thru. */
- kthread_bind(per_cpu(watchdog_task, hotcpu),
- any_online_cpu(cpu_online_map));
+ /* Fall thru. */
case CPU_DEAD:
case CPU_DEAD_FROZEN:
p = per_cpu(watchdog_task, hotcpu);
+ /* Unbind so it can run. */
+ kthread_bind(p, any_online_cpu(cpu_online_map));
per_cpu(watchdog_task, hotcpu) = NULL;
kthread_stop(p);
break;
But I got a WARN on every CPU:
ACPI: Preparing to enter system sleep state S5
Disabling non-boot CPUs ...
kvm: disabling virtualization on CPU1
CPU 1 is now offline
------------[ cut here ]------------
WARNING: at kernel/kthread.c:176 cpu_callback+0x14f/0x177()
Modules linked in:
Pid: 7224, comm: halt Not tainted 2.6.25-rc1-smp-00266-g4ee29f6-dirty #110
Call Trace:
[<ffffffff80243c61>] warn_on_slowpath+0x51/0x63
[<ffffffff80257e00>] ktime_get_ts+0x3d/0x48
[<ffffffff8023b160>] hrtick_start_fair+0xe1/0x129
[<ffffffff8023a049>] enqueue_task+0x4d/0x58
[<ffffffff8023ceda>] try_to_wake_up+0x1ae/0x1bf
[<ffffffff80848796>] cpu_callback+0x14f/0x177
[<ffffffff802747e4>] writeback_set_ratelimit+0x17/0x5d
[<ffffffff8084dd78>] notifier_call_chain+0x29/0x4c
[<ffffffff8026083c>] _cpu_down+0x18e/0x251
[<ffffffff80260a3d>] disable_nonboot_cpus+0x50/0xd7
[<ffffffff8024feb4>] kernel_power_off+0x21/0x3a
[<ffffffff802500de>] sys_reboot+0xee/0x187
...
Great, thanks for testing and letting us know. greg k-h --
Ah, thanks, for some reason I couldn't find this in my archives. I'll add this to my queue to go to Linus. thanks, greg k-h --
I was wrong :-( I'd not really done any real workkkk under 2.6.25 yet, but now, while running a kernel compile with -j4 (single processor, dual core Pentium D), I see this behavior: the mouse cursor moves a bit jerkily and I sometimes get key presses repeated. While I'm typing this, the load lowers a bit and immediately things become smoother and the key repeats seem to vanish. (The key repeats in the subject and para above are real examples of this, not typos.) The keyboard repeat issue looks like what was reported in [1], but for me this is very definitely a new issue that did not appear with 2.6.24 or earlier. Cheers, FJP [1] http://lkml.org/lkml/2008/2/6/100 --
I see this one, too... x60, too... -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
I have that problem on my Dell Precision WorkStation: as soon as I stress the box a bit, the keyboard goes mad. Gabriel --
Sounds like you may have CONFIG_GROUP_SCHED set? Bisection fingered 6b2d7700266b9402e12824e11e0099ae6a4a6a79 as the source here. -Mike --
I can't confirm that it is that commit, because I cannot revert it cleanly, but I can confirm there is something wrong with CONFIG_GROUP_SCHED (maybe Peter or Ingo knows =) ). Turning CONFIG_GROUP_SCHED off on this box fixes the mouse and keyboard problems. Gabriel --
Yeah, looks like it is the same issue. I'd suggest that folks who are hitting this disable CONFIG_GROUP_SCHED, and flog other parts of Funky Weasel's anatomy while it's being sorted. -Mike --
