From: Chuck Ebbert <cebbert@redhat.com> x86: allow number of additional hotplug CPUs to be set at compile time The default number of additional CPU IDs for hotplugging is determined by asking ACPI or mptables how many "disabled" CPUs there are in the system, but many systems get this wrong so that e.g. a uniprocessor machine gets an extra CPU allocated and never switches to single CPU mode. And sometimes CPU hotplugging is enabled only for suspend/hibernate anyway, so the additional CPU IDs are not wanted. Allow the number to be set to zero at compile time. Also, force the number of extra CPUs to zero if hotplugging is disabled which allows removing some conditional code. Tested on uniprocessor x86_64 that ACPI claims has a disabled processor, with CPU hotplugging configured. ("After" has the number of additional CPUs set to 0) Before: NR_CPUS: 512, nr_cpu_ids: 2, nr_node_ids 1 After: NR_CPUS: 512, nr_cpu_ids: 1, nr_node_ids 1 Signed-off-by: Chuck Ebbert <cebbert@redhat.com> --- Index: linux-2.6.26.noarch/arch/x86/Kconfig =================================================================== --- linux-2.6.26.noarch.orig/arch/x86/Kconfig +++ linux-2.6.26.noarch/arch/x86/Kconfig @@ -1366,6 +1366,24 @@ config HOTPLUG_CPU Say N if you want to disable CPU hotplug and don't need to suspend. +config HOTPLUG_DEFAULT_ADDITIONAL_CPUS + def_bool y + prompt "Allocate extra CPUs for hotplugging after boot" if HOTPLUG_CPU + ---help--- + Say yes here to use the default, which allows as many CPUs as are marked + "disabled" by ACPI or MPTABLES to be hotplugged after bootup. + + Say no if you do not want to allow CPUs to be added after booting, for + example if you only need CPU hotplugging enabled for suspend/resume. + + This value may be overridden at boot time with the "additional_cpus" + kernel parameter, if CPU_HOTPLUG is enabled. + +config HOTPLUG_ADDITIONAL_CPUS + int + default 0 if !HOTPLUG_CPU || !HOTPLUG_DEFAULT_ADDITIONAL_CPUS + default ...
hm, wouldnt this option kill 'real' hot-plug CPUs (how rare they might be) which are properly enumerated in the BIOS tables? i dont mind having a facility to disable real CPU hotplug, but the CONFIG_HOTPLUG_DEFAULT_ADDITIONAL_CPUS does not spell that out clearly IMO. Something like CONFIG_HOTPLUG_RESTRICT_TO_BOOTUP_CPUS=y would be more appropriately named i think? Ingo --
From: Chuck Ebbert <cebbert@redhat.com> x86: allow number of additional hotplug CPUs to be set at compile time, V2 The default number of additional CPU IDs for hotplugging is determined by asking ACPI or mptables how many "disabled" CPUs there are in the system, but many systems get this wrong so that e.g. a uniprocessor machine gets an extra CPU allocated and never switches to single CPU mode. And sometimes CPU hotplugging is enabled only for suspend/hibernate anyway, so the additional CPU IDs are not wanted. Allow the number to be set to zero at compile time. Also, force the number of extra CPUs to zero if hotplugging is disabled which allows removing some conditional code. Tested on uniprocessor x86_64 that ACPI claims has a disabled processor, with CPU hotplugging configured. ("After" has the number of additional CPUs set to 0) Before: NR_CPUS: 512, nr_cpu_ids: 2, nr_node_ids 1 After: NR_CPUS: 512, nr_cpu_ids: 1, nr_node_ids 1 [Changed the name of the option and the prompt according to Ingo's suggestion.] --- Index: linux-2.6.26.noarch/arch/x86/Kconfig =================================================================== --- linux-2.6.26.noarch.orig/arch/x86/Kconfig +++ linux-2.6.26.noarch/arch/x86/Kconfig @@ -1366,6 +1366,24 @@ config HOTPLUG_CPU Say N if you want to disable CPU hotplug and don't need to suspend. +config HOTPLUG_RESTRICT_TO_BOOTUP_CPUS + def_bool n + prompt "Restrict CPU hotplugging to processors found during boot" if HOTPLUG_CPU + ---help--- + Say no here to use the default, which allows as many CPUs as are marked + "disabled" by ACPI or MPTABLES to be hotplugged after bootup. + + Say yes if you do not want to allow CPUs to be added after booting, for + example if you only need CPU hotplugging enabled for suspend/resume. + + If CPU_HOTPLUG is enabled this value may be overridden at boot time + with the "additional_cpus" kernel parameter. + +config HOTPLUG_ADDITIONAL_CPUS + int + default 0 if !HOTPLUG_CPU ...
ok, that description and naming makes the purpose much clearer, and it's default-disabled as well. Applied to tip/x86/core, thanks Chuck! was already in -tip, by virtue of: | commit 2bd455dbfebfd632a8dcf1d3d1612737986fde0a | Author: Li Zefan <lizf@cn.fujitsu.com> | Date: Mon Aug 4 11:26:38 2008 +0800 | | x86: remove nesting CONFIG_HOTPLUG_CPU please double-check latest tip/master nevertheless: http://people.redhat.com/mingo/tip.git/README to make sure i merged your patch correctly and that it plays well with other changes. Thanks, Ingo --
Wouldn't this be better to have a runtime option? -hpa --
yeah - and we already have the additional_cpus=x boot option, but a boot option is not generally useful to a distribution. Ingo --
You can set this with additional_cpus=... at boot time. I don't think each runtime option needs a CONFIG option too. -Andi --
On Thu, 02 Oct 2008 11:12:51 +0200 Well not all of them, but this one is a good candidate. Either that or we should just change the default to zero. --
What do you mean with single cpu mode? e.g. the lock prefix rewriting is only determined by online CPUs, not possible CPUs. And this only affects the possible ones. I'm not aware of any other special mode with num_possible_cpus() == 1, An extra possible CPU is not that costly. A 64bit kernel with my old defconfig has about 40k of per CPU data which would be duplicated. And having real hotplug always require that option would be nasty. What you could probably do if you really worry about the 40k is to do some special casing, as if you know there is only a single socket (can be determined from the APIC IDs in the ACPI tables) and the CPU is only single core then don't allocate it. For HT it is harder, there is no real way to distingush P4 Celerons with HT fused off. But frankly having such special code just for 40k savings (or likely much less on 32bit) seems a little overkill to me. -Andi -- ak@linux.intel.com --
On Thu, 2 Oct 2008 21:44:09 +0200 The prefix rewriting doesn't happen unless I boot with additional_cpus=0, maxcpus=1, or with this patch applied and the config option set. I think the rules for when/if the rewriting happens changed a while ago to avoid multiple switches and now it's not happening at all on this machine by default. Oh, and with NR_CPUS=512 I am seeing 1.6MB per-cpu data (I'll have to check that, but I remember being surprised at how big the number was.) --
Well then something is broken, but the fix is not to lower num_possible_cpus(),
but to fix the root cause.
How did you measure? And you mean total right? If it's total
then it's ~32KB/CPU which is roughly similar to my 40k number
(probably depends on CONFIG options)
The standard way is __per_cpu_start - __per_cpu_end (on 64bit;
or the other way round on 32bit). That segment is duplicated
per CPU. Ok there are some dynamic data structures that scale too,
so it's possible a little more.
Single per cpu data should not scale with NR_CPUS, but be constant.
-Andi
---
Take disabled cpus into account in alternative.c
Otherwise an UP system with one hotplug CPU will not
go into UP mode.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Index: linux-2.6.27-rc4-misc/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6.27-rc4-misc.orig/arch/x86/kernel/alternative.c
+++ linux-2.6.27-rc4-misc/arch/x86/kernel/alternative.c
@@ -454,7 +454,7 @@ void __init alternative_instructions(voi
_text, _etext);
/* Only switch to UP mode if we don't immediately boot others */
- if (num_possible_cpus() == 1 || setup_max_cpus <= 1)
+ if (num_possible_cpus() - additional_cpus == 1 || setup_max_cpus <= 1)
alternatives_smp_switch(0);
}
#endif
Index: linux-2.6.27-rc4-misc/arch/x86/kernel/smpboot.c
===================================================================
--- linux-2.6.27-rc4-misc.orig/arch/x86/kernel/smpboot.c
+++ linux-2.6.27-rc4-misc/arch/x86/kernel/smpboot.c
@@ -1276,7 +1276,7 @@ static void remove_siblinginfo(int cpu)
cpu_clear(cpu, cpu_sibling_setup_map);
}
-static int additional_cpus __initdata = -1;
+int additional_cpus = -1;
static __init int setup_additional_cpus(char *s)
{
Index: linux-2.6.27-rc4-misc/include/asm-x86/smp.h
===================================================================
--- linux-2.6.27-rc4-misc.orig/include/asm-x86/smp.h
+++ ...Ping? Can you please test the patch? I think that's the correct fix. I see Ingo unfortunately merged your initial broken hack, but it's wrong and when actually used on a distribution will break real CPU hotplug there. Please don't enable that CONFIG option in Fedora. Ideally drop the CONFIG patch completely because it cannot do much good. It should be replaced with the appended patch, which should go into 2.6.27 after it is confirmed to fix the problem. Thanks. -- ak@linux.intel.com --
On Sat, 04 Oct 2008 18:52:06 +0200 Yes, it works and I don't see how it could cause any problems. Ingo, can we get this in 2.6.27? You can drop my original patch. --
looks good, applied to tip/x86/core, thanks! it's too late for v2.6.27 (and not a serious regression anyway), but i marked it for .27.1 backporting. Your other patch still makes sense - as additional_cpus is an existing boot parameter it can be specified at .config time as well. You should not enable it in a distro kernel though, if you intend to support real hotplug CPU hardware. (which is extremely rare btw.) Ingo --
No, this patch is horrible. The correct check is num_present_cpus(). There is no need to make the weird additional_cpus hackery globally available. Btw, additional_cpus has interesting properties. Providing a negative number < -1 on the kernel command line - happened due to a typo - explodes in early boot, which is not really surprising, but should be sanity checked. Thanks, tglx ----------------> Subject: x86: make UP alternatives switch depend on present cpus From: Thomas Gleixner <tglx@linutronix.de> Date: Sun, 05 Oct 2008 16:45:22 +0200 num_possible_cpus() can be > 1 when disabled CPUs have been accounted. Disabled CPUs are not in the cpu_present_map, so we can use num_present_cpus() as a safe indicator to switch to UP alternatives. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index fb04e49..a84ac7b 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -444,7 +444,7 @@ void __init alternative_instructions(void) _text, _etext); /* Only switch to UP mode if we don't immediately boot others */ - if (num_possible_cpus() == 1 || setup_max_cpus <= 1) + if (num_present_cpus() == 1 || setup_max_cpus <= 1) alternatives_smp_switch(0); } #endif --
ah, indeed! indeed, and that mess was introduced, interestingly, by this commit, three years ago, by Andi: | From 420f8f68c9c5148dddf946bebdbc7eacde2172cb Mon Sep 17 00:00:00 2001 | From: Andi Kleen <ak@suse.de> | Date: Sat, 5 Nov 2005 17:25:54 +0100 | Subject: [PATCH] [PATCH] x86_64: New heuristics to find out hotpluggable CPUs. so to clean up the mess i've removed the additional_cpus= boot parameter and the Kconfig entry as well - see the patch in x86/core below. thanks Thomas for decoding this ... and no way can any of this go into v2.6.27: this is fragile code with a lot of historic baggage and the original error is non-fatal to begin with. It can easily be backported to .27.1 if testing shows that it has no other adverse side-effects. Ingo ------------------> From a3f493ab543d300b3a01ad18bf12f73746ae5c9f Mon Sep 17 00:00:00 2001 From: Ingo Molnar <mingo@elte.hu> Date: Sun, 5 Oct 2008 17:12:36 +0200 Subject: [PATCH] x86: remove additional_cpus configurability additional_cpus=<x> parameter is dangerous and broken: for example if we boot additional_cpus=-2 on a stock dual-core system it will crash the box on bootup. So reduce the maze of code a bit by removingthe user-configurability angle. Signed-off-by: Ingo Molnar <mingo@elte.hu> --- arch/x86/Kconfig | 18 ------------------ arch/x86/kernel/smpboot.c | 8 +------- 2 files changed, 1 insertions(+), 25 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7dff081..e2c060e 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1365,24 +1365,6 @@ config HOTPLUG_CPU Say N if you want to disable CPU hotplug and don't need to suspend. -config HOTPLUG_RESTRICT_TO_BOOTUP_CPUS - def_bool n - prompt "Restrict CPU hotplugging to processors found during boot" if HOTPLUG_CPU - ---help--- - Say no here to use the default, which allows as many CPUs as are marked - "disabled" by ACPI or MPTABLES to be hotplugged after bootup. - - Say yes if you do ...
Please lets get rid of all this.
Thanks,
tglx
---------------->
From 344707c1f43dd0d080828497aacb60c0cc0a8c13 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx@linutronix.de>
Date: Sun, 5 Oct 2008 17:27:56 +0200
Subject: [PATCH] x86, smpboot: remove additional_cpus
remove remainder of additional_cpus logic. We now just listen to the
disabled_cpus value like we did for years. disabled_cpus is always >=
0 so no need for an extra check.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/smpboot.c | 14 ++------------
1 files changed, 2 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3868018..d6a4d95 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1260,8 +1260,6 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
check_nmi_watchdog();
}
-static int additional_cpus = -1;
-
/*
* cpu_possible_map should be static, it cannot change as cpu's
* are onlined, or offlined. The reason is per-cpu data-structures
@@ -1281,21 +1279,13 @@ static int additional_cpus = -1;
*/
__init void prefill_possible_map(void)
{
- int i;
- int possible;
+ int i, possible;
/* no processor from mptable or madt */
if (!num_processors)
num_processors = 1;
- if (additional_cpus == -1) {
- if (disabled_cpus > 0)
- additional_cpus = disabled_cpus;
- else
- additional_cpus = 0;
- }
-
- possible = num_processors + additional_cpus;
+ possible = num_processors + disabled_cpus;
if (possible > NR_CPUS)
possible = NR_CPUS;
--
applied to tip/x86/core - thanks Thomas! Ingo --
The result is that there's no way to override a BIOS now that doesn't declare the number of hotplug CPUs with the Linux hotplug extension. Completely trusting the BIOS seems like a bad idea. So I think you still need the command line parameter back, otherwise there's no way to override the BIOS. -Andi (who wished you guys would check with the original author before removing code you clearly don't understand) --
Sigh, could you please start thinking first before you insult me ?
Chucks problem is that the BIOS advertises
1 present CPU and 1 disabled CPU
Now the current code does not switch to UP alternatives because the
check in alternatives.c is:
if (num_possible_cpus() == 1 ...
which evaluates to false, due to:
num_possible_cpus = num_processors + additional_cpus
where additional_cpus = disabled_cpus when no command line
parameter is given, i.e. the default behaviour of the kernel.
and where num_processors = num_present_cpus()
so in Chucks case num_possible_cpus() == 2
Your proposed solution is:
exporting additional_cpus and change the check to:
if ((num_possible_cpus() - additional_cpus) == 1 ...
This is simply stupid. We have the information already in
num_present_cpus() and that's the correct check in the alternatives
code.
if (num_present_cpus() == 1 ...
This is not a question of trusting the BIOS:
If we can not trust present_cpu_map then additional_cpus does not help
at all.
additional_cpus is useless ballast.
Thanks,
tglx - who refrains from adding a nasty insulting comment
--
Yes that is correct. Using present_cpus is cleaner than my version patch (although it wasn't quite at a "mess" level imho). No argument on that. But that is not what my emails were about. They were about the removal of the additional_cpus=... parameter which was unfortunately done too. additional_cpus is used for more than just alternatives processing. It is also used to size the possible map and declare additional hotplug CPUs. Please read the Documentation cpu-hotplug-spec I referred to earlier for background. Or Documentation/cpu-hotplug.txt which has it too. additional_cpus is used to tell Linux that there are more (or less) CPUs hotpluggable than the BIOS declares at boot time. This is needed because the way the BIOS is declaring it is a Linux extension, not standard. But even if it was standard it would be good policy to be able to override it because as we all know BIOS can be wrong, as in declare far too many. For the alternatives heuristics yes. For hotplug no. Only when you assume that all hotpluggable BIOS implement the "cpu-hotplug-spec" Linux extension and always declare Feel free to speak your mind if the facts are correct. -Andi --
What mess do you mean? That not all parameters are 100% sanity checked? "Unix gives you enough rope to shot yourself into the foot." Normally people who set kernel parameters are expected to know what they are doing. That said the parameter should probably use some unsigned form of get_option, but unfortunately there isn't any. But if you describe any kernel parameter that isn't 100% sanity checked as mess I'm afraid there won't be many non messy parameters This also breaks real CPU hotplug on systems that do not follow the Documentation/x86/x86_64/cpu-hotplug-spec specification. This extension is not part of the standard ACPI spec (I invented it), so I don't think everyone requiring to follow such a Linux specific extension is a good idea. I tried to lobby a few vendors to follow this extension, but I doubt I was 100% successfull. So please readd the boot parameter. Or if you don't chose it at least modify the document clearly stating that all CPU hotplug requires a non ACPI standard extension now and that the users is screwed if their vendor doesn't follow it. In my experience CPU discovery in ACPI/mptables isn't particularly fragile. That is true, it's just a performance issue especially on systems with slow LOCK prefix (like P4s) which commonly have the bogus extra CPU in the BIOS tables when they don't support HT directly. -Andi --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST |
