Re: [patch x86/core] x86: allow number of additional hotplug CPUs to be set at compile time, V2

Previous thread: [RFC patch 5/5] genirq: make irq threading robust by Thomas Gleixner on Wednesday, October 1, 2008 - 4:02 pm. (3 messages)

Next thread: patch x86-fix-smp-alternatives-use-mutex-instead-of-spinlock-text_poke-is-sleepable.patch added to 2.6.26-stable tree by gregkh on Wednesday, October 1, 2008 - 4:59 pm. (1 message)
From: Chuck Ebbert
Date: Wednesday, October 1, 2008 - 4:19 pm

From: Chuck Ebbert <cebbert@redhat.com>

x86: allow number of additional hotplug CPUs to be set at compile time

The default number of additional CPU IDs for hotplugging is determined
by asking ACPI or mptables how many "disabled" CPUs there are in the
system, but many systems get this wrong so that e.g. a uniprocessor
machine gets an extra CPU allocated and never switches to single CPU
mode.

And sometimes CPU hotplugging is enabled only for suspend/hibernate
anyway, so the additional CPU IDs are not wanted. Allow the number
to be set to zero at compile time.

Also, force the number of extra CPUs to zero if hotplugging is disabled
which allows removing some conditional code.

Tested on uniprocessor x86_64 that ACPI claims has a disabled processor,
with CPU hotplugging configured.

("After" has the number of additional CPUs set to 0)
Before: NR_CPUS: 512, nr_cpu_ids: 2, nr_node_ids 1
After: NR_CPUS: 512, nr_cpu_ids: 1, nr_node_ids 1

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>

---

Index: linux-2.6.26.noarch/arch/x86/Kconfig
===================================================================
--- linux-2.6.26.noarch.orig/arch/x86/Kconfig
+++ linux-2.6.26.noarch/arch/x86/Kconfig
@@ -1366,6 +1366,24 @@ config HOTPLUG_CPU
 	  Say N if you want to disable CPU hotplug and don't need to
 	  suspend.
 
+config HOTPLUG_DEFAULT_ADDITIONAL_CPUS
+	def_bool y
+	prompt "Allocate extra CPUs for hotplugging after boot" if HOTPLUG_CPU
+	---help---
+	  Say yes here to use the default, which allows as many CPUs as are marked
+	  "disabled" by ACPI or MPTABLES to be hotplugged after bootup.
+
+	  Say no if you do not want to allow CPUs to be added after booting, for
+	  example if you only need CPU hotplugging enabled for suspend/resume.
+
+	  This value may be overridden at boot time with the "additional_cpus"
+	  kernel parameter, if CPU_HOTPLUG is enabled.
+
+config HOTPLUG_ADDITIONAL_CPUS
+	int
+	default  0 if !HOTPLUG_CPU || !HOTPLUG_DEFAULT_ADDITIONAL_CPUS
+	default ...
From: Ingo Molnar
Date: Thursday, October 2, 2008 - 1:12 am

hm, wouldnt this option kill 'real' hot-plug CPUs (how rare they might 
be) which are properly enumerated in the BIOS tables?

i dont mind having a facility to disable real CPU hotplug, but the 
CONFIG_HOTPLUG_DEFAULT_ADDITIONAL_CPUS does not spell that out clearly 
IMO. Something like CONFIG_HOTPLUG_RESTRICT_TO_BOOTUP_CPUS=y would be 
more appropriately named i think?

	Ingo
--

From: Chuck Ebbert
Date: Thursday, October 2, 2008 - 12:30 pm

From: Chuck Ebbert <cebbert@redhat.com>

x86: allow number of additional hotplug CPUs to be set at compile time, V2

The default number of additional CPU IDs for hotplugging is determined
by asking ACPI or mptables how many "disabled" CPUs there are in the
system, but many systems get this wrong so that e.g. a uniprocessor
machine gets an extra CPU allocated and never switches to single CPU
mode.

And sometimes CPU hotplugging is enabled only for suspend/hibernate
anyway, so the additional CPU IDs are not wanted. Allow the number
to be set to zero at compile time.

Also, force the number of extra CPUs to zero if hotplugging is disabled
which allows removing some conditional code.

Tested on uniprocessor x86_64 that ACPI claims has a disabled processor,
with CPU hotplugging configured.

("After" has the number of additional CPUs set to 0)
Before: NR_CPUS: 512, nr_cpu_ids: 2, nr_node_ids 1
After: NR_CPUS: 512, nr_cpu_ids: 1, nr_node_ids 1

[Changed the name of the option and the prompt according to Ingo's
 suggestion.]

---

Index: linux-2.6.26.noarch/arch/x86/Kconfig
===================================================================
--- linux-2.6.26.noarch.orig/arch/x86/Kconfig
+++ linux-2.6.26.noarch/arch/x86/Kconfig
@@ -1366,6 +1366,24 @@ config HOTPLUG_CPU
 	  Say N if you want to disable CPU hotplug and don't need to
 	  suspend.
 
+config HOTPLUG_RESTRICT_TO_BOOTUP_CPUS
+	def_bool n
+	prompt "Restrict CPU hotplugging to processors found during boot" if HOTPLUG_CPU
+	---help---
+	  Say no here to use the default, which allows as many CPUs as are marked
+	  "disabled" by ACPI or MPTABLES to be hotplugged after bootup.
+
+	  Say yes if you do not want to allow CPUs to be added after booting, for
+	  example if you only need CPU hotplugging enabled for suspend/resume.
+
+	  If CPU_HOTPLUG is enabled this value may be overridden at boot time
+	  with the "additional_cpus" kernel parameter.
+
+config HOTPLUG_ADDITIONAL_CPUS
+	int
+	default  0 if !HOTPLUG_CPU ...
From: Ingo Molnar
Date: Thursday, October 2, 2008 - 12:42 pm

ok, that description and naming makes the purpose much clearer, and it's 
default-disabled as well.

Applied to tip/x86/core, thanks Chuck!


was already in -tip, by virtue of:

| commit 2bd455dbfebfd632a8dcf1d3d1612737986fde0a
| Author: Li Zefan <lizf@cn.fujitsu.com>
| Date:   Mon Aug 4 11:26:38 2008 +0800
|
|     x86: remove nesting CONFIG_HOTPLUG_CPU

please double-check latest tip/master nevertheless:

  http://people.redhat.com/mingo/tip.git/README

to make sure i merged your patch correctly and that it plays well with 
other changes.

Thanks,

	Ingo
--

From: H. Peter Anvin
Date: Thursday, October 2, 2008 - 12:48 pm

Wouldn't this be better to have a runtime option?

	-hpa
--

From: Ingo Molnar
Date: Thursday, October 2, 2008 - 12:50 pm

yeah - and we already have the additional_cpus=x boot option, but a boot 
option is not generally useful to a distribution.

	Ingo
--

From: Andi Kleen
Date: Thursday, October 2, 2008 - 2:12 am

You can set this with additional_cpus=... at boot time.
I don't think each runtime option needs a CONFIG option too.

-Andi
--

From: Chuck Ebbert
Date: Thursday, October 2, 2008 - 12:25 pm

On Thu, 02 Oct 2008 11:12:51 +0200

Well not all of them, but this one is a good candidate.  Either that or
we should just change the default to zero.
--

From: Andi Kleen
Date: Thursday, October 2, 2008 - 12:44 pm

What do you mean with single cpu mode?

e.g. the lock prefix rewriting is only determined by online CPUs,
not possible CPUs. And this only affects the possible ones.

I'm not aware of any other special mode with num_possible_cpus() == 1,

An extra possible CPU is not that costly.  A 64bit kernel with my old
defconfig has about 40k of per CPU data which would be duplicated.
And having real hotplug always require that option would be nasty.

What you could probably do if you really worry about the 40k is to do
some special casing, as if you know there is only a single socket 
(can be determined from the APIC IDs in the ACPI tables) and the CPU
is only single core then don't allocate it.
For HT it is harder, there is no real way to distingush P4 Celerons
with HT fused off. 

But frankly having such special code just for 40k savings (or 
likely much less on 32bit) seems a little overkill to me.

-Andi
-- 
ak@linux.intel.com
--

From: Chuck Ebbert
Date: Thursday, October 2, 2008 - 1:09 pm

On Thu, 2 Oct 2008 21:44:09 +0200


The prefix rewriting doesn't happen unless I boot with additional_cpus=0,
maxcpus=1, or with this patch applied and the config option set. I think
the rules for when/if the rewriting happens changed a while ago to avoid
multiple switches and now it's not happening at all on this machine by
default.

Oh, and with NR_CPUS=512 I am seeing 1.6MB per-cpu data (I'll have to
check that, but I remember being surprised at how big the number was.)
--

From: Andi Kleen
Date: Thursday, October 2, 2008 - 1:40 pm

Well then something is broken, but the fix is not to lower num_possible_cpus(),
but to fix the root cause.


How did you measure? And you mean total right? If it's total
then it's ~32KB/CPU which is roughly similar to my 40k number
(probably depends on CONFIG options) 

The standard way is  __per_cpu_start - __per_cpu_end (on 64bit;
or the other way round on 32bit). That segment is duplicated
per CPU. Ok there are some dynamic data structures that scale too,
so it's possible a little more.

Single per cpu data should not scale with NR_CPUS, but be constant.

-Andi

---

Take disabled cpus into account in alternative.c

Otherwise an UP system with one hotplug CPU will not 
go into UP mode.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

Index: linux-2.6.27-rc4-misc/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6.27-rc4-misc.orig/arch/x86/kernel/alternative.c
+++ linux-2.6.27-rc4-misc/arch/x86/kernel/alternative.c
@@ -454,7 +454,7 @@ void __init alternative_instructions(voi
 					    _text, _etext);
 
 		/* Only switch to UP mode if we don't immediately boot others */
-		if (num_possible_cpus() == 1 || setup_max_cpus <= 1)
+		if (num_possible_cpus() - additional_cpus == 1 || setup_max_cpus <= 1)
 			alternatives_smp_switch(0);
 	}
 #endif
Index: linux-2.6.27-rc4-misc/arch/x86/kernel/smpboot.c
===================================================================
--- linux-2.6.27-rc4-misc.orig/arch/x86/kernel/smpboot.c
+++ linux-2.6.27-rc4-misc/arch/x86/kernel/smpboot.c
@@ -1276,7 +1276,7 @@ static void remove_siblinginfo(int cpu)
 	cpu_clear(cpu, cpu_sibling_setup_map);
 }
 
-static int additional_cpus __initdata = -1;
+int additional_cpus = -1;
 
 static __init int setup_additional_cpus(char *s)
 {
Index: linux-2.6.27-rc4-misc/include/asm-x86/smp.h
===================================================================
--- linux-2.6.27-rc4-misc.orig/include/asm-x86/smp.h
+++ ...
From: Andi Kleen
Date: Saturday, October 4, 2008 - 9:52 am

Ping? Can you please test the patch?  I think that's the correct fix.
 
I see Ingo unfortunately merged your initial broken hack, but it's wrong
and when actually used on a distribution will break real CPU hotplug there.
Please don't enable that CONFIG option in Fedora. Ideally drop
the CONFIG patch completely because it cannot do much good.

It should be replaced with the appended patch, which should go into 2.6.27
after it is confirmed to fix the problem.

Thanks.


-- 
ak@linux.intel.com
--

From: Chuck Ebbert
Date: Saturday, October 4, 2008 - 3:30 pm

On Sat, 04 Oct 2008 18:52:06 +0200

Yes, it works and I don't see how it could cause any problems.

Ingo, can we get this in 2.6.27? You can drop my original patch.

--


looks good, applied to tip/x86/core, thanks!

it's too late for v2.6.27 (and not a serious regression anyway), but i 
marked it for .27.1 backporting.

Your other patch still makes sense - as additional_cpus is an existing 
boot parameter it can be specified at .config time as well. You should 
not enable it in a distro kernel though, if you intend to support real 
hotplug CPU hardware. (which is extremely rare btw.)

	Ingo
--

From: Thomas Gleixner
Date: Sunday, October 5, 2008 - 7:52 am

No, this patch is horrible. 

The correct check is num_present_cpus(). There is no need to make the
weird additional_cpus hackery globally available.

Btw, additional_cpus has interesting properties. Providing a negative
number < -1 on the kernel command line - happened due to a typo -
explodes in early boot, which is not really surprising, but should be
sanity checked.

Thanks,

	tglx
---------------->
Subject: x86: make UP alternatives switch depend on present cpus
From: Thomas Gleixner <tglx@linutronix.de>
Date: Sun, 05 Oct 2008 16:45:22 +0200

num_possible_cpus() can be > 1 when disabled CPUs have been accounted.

Disabled CPUs are not in the cpu_present_map, so we can use
num_present_cpus() as a safe indicator to switch to UP alternatives.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index fb04e49..a84ac7b 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -444,7 +444,7 @@ void __init alternative_instructions(void)
 					    _text, _etext);
 
 		/* Only switch to UP mode if we don't immediately boot others */
-		if (num_possible_cpus() == 1 || setup_max_cpus <= 1)
+		if (num_present_cpus() == 1 || setup_max_cpus <= 1)
 			alternatives_smp_switch(0);
 	}
 #endif
--


ah, indeed!


indeed, and that mess was introduced, interestingly, by this commit, 
three years ago, by Andi:

| From 420f8f68c9c5148dddf946bebdbc7eacde2172cb Mon Sep 17 00:00:00 2001
| From: Andi Kleen <ak@suse.de>
| Date: Sat, 5 Nov 2005 17:25:54 +0100
| Subject: [PATCH] [PATCH] x86_64: New heuristics to find out hotpluggable CPUs.

so to clean up the mess i've removed the additional_cpus= boot parameter 
and the Kconfig entry as well - see the patch in x86/core below.

thanks Thomas for decoding this ...

and no way can any of this go into v2.6.27: this is fragile code with a 
lot of historic baggage and the original error is non-fatal to begin 
with. It can easily be backported to .27.1 if testing shows that it has 
no other adverse side-effects.

	Ingo

------------------>
From a3f493ab543d300b3a01ad18bf12f73746ae5c9f Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Sun, 5 Oct 2008 17:12:36 +0200
Subject: [PATCH] x86: remove additional_cpus configurability

additional_cpus=<x> parameter is dangerous and broken: for example
if we boot additional_cpus=-2 on a stock dual-core system it will
crash the box on bootup.

So reduce the maze of code a bit by removingthe user-configurability
angle.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/Kconfig          |   18 ------------------
 arch/x86/kernel/smpboot.c |    8 +-------
 2 files changed, 1 insertions(+), 25 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7dff081..e2c060e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1365,24 +1365,6 @@ config HOTPLUG_CPU
 	  Say N if you want to disable CPU hotplug and don't need to
 	  suspend.
 
-config HOTPLUG_RESTRICT_TO_BOOTUP_CPUS
-	def_bool n
-	prompt "Restrict CPU hotplugging to processors found during boot" if HOTPLUG_CPU
-	---help---
-	  Say no here to use the default, which allows as many CPUs as are marked
-	  "disabled" by ACPI or MPTABLES to be hotplugged after bootup.
-
-	  Say yes if you do ...
From: Thomas Gleixner
Date: Sunday, October 5, 2008 - 8:51 am

Please lets get rid of all this.

Thanks,

	tglx
---------------->
From 344707c1f43dd0d080828497aacb60c0cc0a8c13 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx@linutronix.de>
Date: Sun, 5 Oct 2008 17:27:56 +0200
Subject: [PATCH] x86, smpboot: remove additional_cpus

remove remainder of additional_cpus logic. We now just listen to the
disabled_cpus value like we did for years. disabled_cpus is always >=
0 so no need for an extra check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/smpboot.c |   14 ++------------
 1 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3868018..d6a4d95 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1260,8 +1260,6 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 	check_nmi_watchdog();
 }
 
-static int additional_cpus = -1;
-
 /*
  * cpu_possible_map should be static, it cannot change as cpu's
  * are onlined, or offlined. The reason is per-cpu data-structures
@@ -1281,21 +1279,13 @@ static int additional_cpus = -1;
  */
 __init void prefill_possible_map(void)
 {
-	int i;
-	int possible;
+	int i, possible;
 
 	/* no processor from mptable or madt */
 	if (!num_processors)
 		num_processors = 1;
 
-	if (additional_cpus == -1) {
-		if (disabled_cpus > 0)
-			additional_cpus = disabled_cpus;
-		else
-			additional_cpus = 0;
-	}
-
-	possible = num_processors + additional_cpus;
+	possible = num_processors + disabled_cpus;
 	if (possible > NR_CPUS)
 		possible = NR_CPUS;
 
--


applied to tip/x86/core - thanks Thomas!

	Ingo
--


The result is that there's no way to override a BIOS now that
doesn't declare the number of hotplug CPUs with the Linux hotplug
extension.

Completely trusting the BIOS seems like a bad idea.

So I think you still need the command line parameter back, otherwise there's no
way to override the BIOS.

-Andi (who wished you guys would check with the original author
before removing code you clearly don't understand)

--

From: Thomas Gleixner
Date: Sunday, October 5, 2008 - 2:49 pm

Sigh, could you please start thinking first before you insult me ?

Chucks problem is that the BIOS advertises

1 present CPU and 1 disabled CPU

Now the current code does not switch to UP alternatives because the
check in alternatives.c is:

      if (num_possible_cpus() == 1 ...

      which evaluates to false, due to:

      num_possible_cpus = num_processors + additional_cpus

      where additional_cpus = disabled_cpus when no command line
      parameter is given, i.e. the default behaviour of the kernel.

      and where num_processors = num_present_cpus()

      so in Chucks case num_possible_cpus() == 2

Your proposed solution is:

     exporting additional_cpus  and change the check to:

     if ((num_possible_cpus() - additional_cpus) == 1 ...

This is simply stupid. We have the information already in
num_present_cpus() and that's the correct check in the alternatives
code.

     if (num_present_cpus() == 1 ...

This is not a question of trusting the BIOS: 

If we can not trust present_cpu_map then additional_cpus does not help
at all.

additional_cpus is useless ballast.

Thanks,

	tglx - who refrains from adding a nasty insulting comment
--


Yes that is correct.  Using present_cpus is cleaner than
my version patch  (although it wasn't quite at a "mess" level imho).
No argument on that. But that  is not what my emails were about.

They were about the removal of the additional_cpus=... parameter
which was unfortunately done too.

additional_cpus is used for more than just alternatives processing.
It is also used to size the possible map and declare additional
hotplug CPUs.

Please read the Documentation cpu-hotplug-spec I referred to earlier
for background. Or Documentation/cpu-hotplug.txt which has it too.

additional_cpus is used to tell Linux that there are more (or less)
CPUs hotpluggable than the BIOS declares at boot time. This is needed
because the way the BIOS is declaring it is a Linux extension, not standard.
But even if it was standard it would be good policy to be able
to override it because as we all know BIOS can be wrong, as in
declare far too many.


For the alternatives heuristics yes. For hotplug no.

Only when you assume that all hotpluggable BIOS implement
the "cpu-hotplug-spec" Linux extension and always declare

Feel free to speak your mind if the facts are correct.

-Andi

--


What mess do you mean? That not all parameters are 100% sanity checked?
"Unix gives you enough rope to shot yourself into the foot."
Normally people who set kernel parameters are expected to know what
they are doing. That said the parameter should probably use some
unsigned form of get_option, but unfortunately there isn't any.

But if you describe any kernel parameter that isn't 100% sanity
checked as mess I'm afraid there won't be many non messy parameters

This also breaks real CPU hotplug on systems that do not follow
the Documentation/x86/x86_64/cpu-hotplug-spec specification.

This extension is not part of the standard ACPI spec (I invented it),
so I don't think everyone requiring to follow such a Linux specific extension
is a good idea. I tried to lobby a few vendors to follow this
extension, but I doubt I was 100% successfull.

So please readd the boot parameter.

Or if you don't chose it at least modify the document clearly
stating that all CPU hotplug requires a non ACPI standard extension now
and that the users is screwed if their vendor doesn't follow it.


In my experience CPU discovery in ACPI/mptables isn't particularly fragile.

That is true, it's just a performance issue especially on systems
with slow LOCK prefix (like P4s) which commonly have the bogus extra
CPU in the BIOS tables when they don't support HT directly.

-Andi
--

Previous thread: [RFC patch 5/5] genirq: make irq threading robust by Thomas Gleixner on Wednesday, October 1, 2008 - 4:02 pm. (3 messages)

Next thread: patch x86-fix-smp-alternatives-use-mutex-instead-of-spinlock-text_poke-is-sleepable.patch added to 2.6.26-stable tree by gregkh on Wednesday, October 1, 2008 - 4:59 pm. (1 message)