Gitweb: http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a26389... Commit: a263898f628dd21e59210b547986c154788f628e Parent: 57a04513cb35086d54bcb2cb92e6627fc8fa0fae Author: Ingo Molnar <mingo@elte.hu> AuthorDate: Sun Dec 30 11:58:17 2007 +0100 Committer: Linus Torvalds <torvalds@woody.linux-foundation.org> CommitDate: Sun Jan 6 12:39:42 2008 -0800 CPU hotplug: fix cpu_is_offline() on !CONFIG_HOTPLUG_CPU make randconfig bootup testing found that the cpufreq code crashes on bootup, if the powernow-k8 driver is enabled and if maxcpus=1 passed on the boot line to a !CONFIG_HOTPLUG_CPU kernel. First lockdep found out that there's an inconsistent unlock sequence: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- swapper/1 is trying to release lock (&per_cpu(cpu_policy_rwsem, cpu)) at: [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42 but there are no more locks to release! Call Trace: [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42 [<ffffffff80251c29>] print_unlock_inbalance_bug+0x104/0x12c [<ffffffff80252f3a>] mark_held_locks+0x56/0x94 [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42 [<ffffffff807008b6>] cpufreq_add_dev+0x2a8/0x5c4 ... then shortly afterwards the cpufreq code crashed on an assert: ------------[ cut here ]------------ kernel BUG at drivers/cpufreq/cpufreq.c:1068! invalid opcode: 0000 [1] SMP [...] Call Trace: [<ffffffff805145d6>] sysdev_driver_unregister+0x5b/0x91 [<ffffffff806ff520>] cpufreq_register_driver+0x15d/0x1a2 [<ffffffff80cc0596>] powernowk8_init+0x86/0x94 [...] ---[ end trace 1e9219be2b4431de ]--- the bug was caused by maxcpus=1 bootup, which brought up the secondary core as !cpu_online() but !cpu_is_offline() either, which on on !CONFIG_HOTPLUG_CPU is always 0 (include/linux/cpu.h): /* CPUs don't go offline once they're online w/o CONFIG_HOTPLUG_CPU */ static inline int cpu_is_offline(int cpu) { return 0; } but the cpufreq code uses cpu_online() and cpu_is_offline() in a mixed way - the low-level drivers use cpu_online(), while the cpufreq core uses cpu_is_offline(). This opened up the possibility to add the non-initialized sysdev device of the secondary core: cpufreq-core: trying to register driver powernow-k8 cpufreq-core: adding CPU 0 powernow-k8: BIOS error - no PSB or ACPI _PSS objects cpufreq-core: initialization failed cpufreq-core: adding CPU 1 cpufreq-core: initialization failed which then blew up. The fix is to make cpu_is_offline() always the negation of cpu_online(). With that fix applied the kernel boots up fine without crashing: Calling initcall 0xffffffff80cc0510: powernowk8_init+0x0/0x94() powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ processors (1 cpu cores) (version 2.20.00) powernow-k8: BIOS error - no PSB or ACPI _PSS objects initcall 0xffffffff80cc0510: powernowk8_init+0x0/0x94() returned -19. initcall 0xffffffff80cc0510 ran for 19 msecs: powernowk8_init+0x0/0x94() Calling initcall 0xffffffff80cc328f: init_lapic_nmi_sysfs+0x0/0x39() We could fix this by making CPU enumeration aware of max_cpus, but that would be more fragile IMO, and the cpu_online(cpu) != cpu_is_offline(cpu) possibility was quite confusing and a continuous source of bugs too. Most distributions have kernels with CPU hotplug enabled, so this bug remained hidden for a long time. Bug forensics: The broken cpu_is_offline() API variant was introduced via: commit a59d2e4e6977e7b94e003c96a41f07e96cddc340 Author: Rusty Russell <rusty@rustcorp.com.au> Date: Mon Mar 8 06:06:03 2004 -0800 [PATCH] minor cleanups for hotplug CPUs ( this predates linux-2.6.git, this commit is available from Thomas's historic git tree. ) Then 1.5 years later the cpufreq code made use of it: commit c32b6b8e524d2c337767d312814484d9289550cf Author: Ashok Raj <ashok.raj@intel.com> Date: Sun Oct 30 14:59:54 2005 -0800 [PATCH] create and destroy cpufreq sysfs entries based on cpu notifiers + if (cpu_is_offline(cpu)) + return 0; which is a correct use of the subtly broken new API. v2.6.15 then shipped with this bug included. then it took two more years for random-kernel qa to hit it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --- include/linux/cpu.h | 4 ---- include/linux/cpumask.h | 2 ++ 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/include/linux/cpu.h b/include/linux/cpu.h index b79c575..92f2029 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -107,7 +107,6 @@ extern void unlock_cpu_hotplug(void); #define register_hotcpu_notifier(nb) register_cpu_notifier(nb) #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb) int cpu_down(unsigned int cpu); -#define cpu_is_offline(cpu) unlikely(!cpu_online(cpu)) #else /* CONFIG_HOTPLUG_CPU */ @@ -122,9 +121,6 @@ static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex) /* These aren't inline functions due to a GCC bug. */ #define register_hotcpu_notifier(nb) ({ (void)(nb); 0; }) #define unregister_hotcpu_notifier(nb) ({ (void)(nb); }) - -/* CPUs don't go offline once they're online w/o CONFIG_HOTPLUG_CPU */ -static inline int cpu_is_offline(int cpu) { return 0; } #endif /* CONFIG_HOTPLUG_CPU */ #ifdef CONFIG_PM_SLEEP_SMP diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 23f5514..85bd790 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -397,6 +397,8 @@ extern cpumask_t cpu_present_map; #define cpu_present(cpu) ((cpu) == 0) #endif +#define cpu_is_offline(cpu) unlikely(!cpu_online(cpu)) + #ifdef CONFIG_SMP extern int nr_cpu_ids; #define any_online_cpu(mask) __any_online_cpu(&(mask)) - To unsubscribe from this list: send the line "unsubscribe git-commits-head" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
| David Miller | [GIT]: Networking |
| Thomas Gleixner | Re: Regression in 2.6.27 caused by commit bfc0f59 |
| Rafael J. Wysocki | [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected |
| jmerkey | [ANNOUNCE] mdb: Merkey's Linux Kernel Debugger 2.6.27-rc4 released |
git: | |
| Mike | I don't want the .git directory next to my code. |
| Kevin Ballard | Re: git on MacOSX and files with decomposed utf-8 file names |
| Karl | Re: git-svn should default to --repack |
| Ken Pratt | pack operation is thrashing my server |
| carlopmart | About Xen: maybe a reiterative question but .. |
| NetOne - Doichin Dokov | OpenBSD as Xen domU |
| Nick Guenther | Re: Real men don't attack straw men |
| Paul Barbeau | RAID/Intel Installation Problem |
| Wang Chen | [V#2 PATCH 0/18] netdevice: Fix directly reference of netdev->priv |
| Indan Zupancic | Re: Realtek 8111C transmit timed out |
| Alexey Kuznetsov | Re: [TCP]: TCP_DEFER_ACCEPT causes leak sockets |
| Alexey Dobriyan | Re: [GIT]: Networking |
