[PATCH 03/14] [ALPHA] Populate cpu_enabled_map

Previous thread: DMA timeout on PIIX4 by gshan on Monday, July 14, 2008 - 7:17 pm. (1 message)

Next thread: [PATCH 10/14] [SPARC] Populate cpu_enabled_map by Alex Chiang on Monday, July 14, 2008 - 7:55 pm. (1 message)
From: Alex Chiang
Date: Monday, July 14, 2008 - 7:33 pm

The following series presents cpu_enabled_map, which captures
the concept of a physically present and enabled CPU.

The complement of this map gives us present but disabled CPUs.
Perhaps the CPU was disabled by firmware for whatever reason.

For the most part, cpu_enabled_map will follow cpu_present_map.
On x86 and ia64, ACPI can tell us if a CPU is present, enabled,
or both. On those two archs, cpu_enabled_map is a subset of
cpu_present_map. All CPUs described in the MADT are now added
to cpu_present_map, but only ones with ACPI_MADT_ENABLED are
added to cpu_enabled_map.

Other archs may wish to do something similar, so some minimum
levels of support are included with this patch series, enabling
others to fill things in as desired. I certainly acknowledge
that a better job could be done.

Patch 01 introduces the actual map and interface.

Patches 02 -- 12 modify arch code in an attempt to populate the
map correctly.

Patch 12 is the most interesting one, in that we actually
demonstrate how an arch [ia64] would actually use cpu_enabled_map
to control calling cpu_up() on a given CPU.

Patch 13 fixes a potential overflow bug in ia64 ACPI code.

Patch 14 is the money patch. It demonstrates why we might
want to go through all these gyrations. Now that ia64 presents
*all* physically present CPUs in sysfs, even if they have been
disabled by firmware, we give userspace a way to poke at those
CPUs.

This is possible because present CPUs are in the ACPI namespace.
Even though we might not have a valid per_cpu pointer for a
disabled CPU (because we never called cpu_up() on it), we can
still obtain a valid ACPI handle for that CPU by walking the
namespace and executing methods underneath it.

The big picture implication is that we can allow userspace
to interact with disabled CPUs. In this particular example,
we provide a knob that lets a sysadmin schedule any present
CPU for firmware deconfiguration or enablement.

I have compile tested on:

	- arm, sparc, powerpc, alpha, ...
From: Alex Chiang
Date: Monday, July 14, 2008 - 7:33 pm

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

	/sys/devices/system/cpu/

It should be populated correctly.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
---

 arch/m32r/kernel/smpboot.c |    1 +
 include/asm-m32r/smp.h     |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/m32r/kernel/smpboot.c b/arch/m32r/kernel/smpboot.c
index 2c03ac1..4f2bbde 100644
--- a/arch/m32r/kernel/smpboot.c
+++ b/arch/m32r/kernel/smpboot.c
@@ -184,6 +184,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 		physid_set(phys_id, phys_cpu_present_map);
 #ifndef CONFIG_HOTPLUG_CPU
 	cpu_present_map = cpu_possible_map;
+	cpu_enabled_map = cpu_possible_map;
 #endif
 
 	show_mp_info(nr_cpu);
diff --git a/include/asm-m32r/smp.h b/include/asm-m32r/smp.h
index 078e1a5..4ea1845 100644
--- a/include/asm-m32r/smp.h
+++ b/include/asm-m32r/smp.h
@@ -65,6 +65,7 @@ extern volatile int cpu_2_physid[NR_CPUS];
 extern cpumask_t cpu_callout_map;
 extern cpumask_t cpu_possible_map;
 extern cpumask_t cpu_present_map;
+extern cpumask_t cpu_enabled_map;
 
 static __inline__ int hard_smp_processor_id(void)
 {

--

From: Alex Chiang
Date: Monday, July 14, 2008 - 7:33 pm

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

	/sys/devices/system/cpu/

It should be populated correctly.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Richard Henderson <rth@twiddle.net>
---

 arch/alpha/kernel/process.c |    2 ++
 arch/alpha/kernel/smp.c     |    2 ++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 96ed82f..7a261e2 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -94,6 +94,7 @@ common_shutdown_1(void *generic_ptr)
 		flags |= 0x00040000UL; /* "remain halted" */
 		*pflags = flags;
 		cpu_clear(cpuid, cpu_present_map);
+		cpu_clear(cpuid, cpu_enabled_map);
 		halt();
 	}
 #endif
@@ -120,6 +121,7 @@ common_shutdown_1(void *generic_ptr)
 #ifdef CONFIG_SMP
 	/* Wait for the secondaries to halt. */
 	cpu_clear(boot_cpuid, cpu_present_map);
+	cpu_clear(boot_cpuid, cpu_enabled_map);
 	while (cpus_weight(cpu_present_map))
 		barrier();
 #endif
diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index 2525692..ec53061 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -436,6 +436,7 @@ setup_smp(void)
 			if ((cpu->flags & 0x1cc) == 0x1cc) {
 				smp_num_probed++;
 				cpu_set(i, cpu_present_map);
+				cpu_set(i, cpu_enabled_map);
 				cpu->pal_revision = boot_cpu_palrev;
 			}
 
@@ -469,6 +470,7 @@ smp_prepare_cpus(unsigned int max_cpus)
 	/* Nothing to do on a UP box, or when told not to.  */
 	if (smp_num_probed == 1 || max_cpus == 0) {
 		cpu_present_map = cpumask_of_cpu(boot_cpuid);
+		cpu_enabled_map = cpumask_of_cpu(boot_cpuid);
 		printk(KERN_INFO "SMP mode deactivated.\n");
 		return;
 	}

--

From: Alex Chiang
Date: Monday, July 14, 2008 - 7:34 pm

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

	/sys/devices/system/cpu/

It should be populated correctly.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <rmk@dyn-67.arm.linux.org.uk>
---

 arch/arm/mach-realview/platsmp.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/arm/mach-realview/platsmp.c b/arch/arm/mach-realview/platsmp.c
index 8e813ed..c33f298 100644
--- a/arch/arm/mach-realview/platsmp.c
+++ b/arch/arm/mach-realview/platsmp.c
@@ -241,8 +241,10 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 	 * Initialise the present map, which describes the set of CPUs
 	 * actually populated at the present time.
 	 */
-	for (i = 0; i < max_cpus; i++)
+	for (i = 0; i < max_cpus; i++) {
 		cpu_set(i, cpu_present_map);
+		cpu_set(i, cpu_enabled_map);
+	}
 
 	/*
 	 * Initialise the SCU if there are more than one CPU and let

--

From: Alex Chiang
Date: Monday, July 14, 2008 - 7:34 pm

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

	/sys/devices/system/cpu/

It should be populated correctly.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
---

 arch/mips/kernel/smp.c  |    1 +
 arch/mips/kernel/smtc.c |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index cdf87a9..8f7f742 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -304,6 +304,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 	set_cpu_sibling_map(0);
 #ifndef CONFIG_HOTPLUG_CPU
 	cpu_present_map = cpu_possible_map;
+	cpu_enabled_map = cpu_possible_map;
 #endif
 }
 
diff --git a/arch/mips/kernel/smtc.c b/arch/mips/kernel/smtc.c
index 3e86318..b0c16e0 100644
--- a/arch/mips/kernel/smtc.c
+++ b/arch/mips/kernel/smtc.c
@@ -498,6 +498,7 @@ void mipsmt_prepare_cpus(void)
 	while (tc < (((val & MVPCONF0_PTC) >> MVPCONF0_PTC_SHIFT) + 1)) {
 		cpu_clear(tc, phys_cpu_present_map);
 		cpu_clear(tc, cpu_present_map);
+		cpu_clear(tc, cpu_enabled_map);
 		tc++;
 	}
 

--

From: Alex Chiang
Date: Monday, July 14, 2008 - 7:34 pm

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

	/sys/devices/system/cpu/

It should be populated correctly.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
---

 arch/parisc/kernel/processor.c |    1 +
 arch/parisc/kernel/smp.c       |    3 +++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/parisc/kernel/processor.c b/arch/parisc/kernel/processor.c
index 370086f..fb81222 100644
--- a/arch/parisc/kernel/processor.c
+++ b/arch/parisc/kernel/processor.c
@@ -201,6 +201,7 @@ static int __cpuinit processor_probe(struct parisc_device *dev)
 #ifdef CONFIG_SMP
 	if (cpuid) {
 		cpu_set(cpuid, cpu_present_map);
+		cpu_set(cpuid, cpu_enabled_map);
 		cpu_up(cpuid);
 	}
 #endif
diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c
index 85fc775..a31b0ad 100644
--- a/arch/parisc/kernel/smp.c
+++ b/arch/parisc/kernel/smp.c
@@ -535,6 +535,7 @@ void __devinit smp_prepare_boot_cpu(void)
 
 	cpu_set(bootstrap_processor, cpu_online_map);
 	cpu_set(bootstrap_processor, cpu_present_map);
+	cpu_set(bootstrap_processor, cpu_enabled_map);
 }
 
 
@@ -546,7 +547,9 @@ void __devinit smp_prepare_boot_cpu(void)
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
 	cpus_clear(cpu_present_map);
+	cpus_clear(cpu_enabled_map);
 	cpu_set(0, cpu_present_map);
+	cpu_set(0, cpu_enabled_map);
 
 	parisc_max_cpus = max_cpus;
 	if (!max_cpus)

--

From: Alex Chiang
Date: Monday, July 14, 2008 - 7:34 pm

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

	/sys/devices/system/cpu/

It should be populated correctly.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
---

 arch/s390/kernel/smp.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 5d4fa4b..f7ae20b 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -458,6 +458,7 @@ static int smp_rescan_cpus_sigp(cpumask_t avail)
 		if (!cpu_stopped(logical_cpu))
 			continue;
 		cpu_set(logical_cpu, cpu_present_map);
+		cpu_set(logical_cpu, cpu_enabled_map);
 		smp_cpu_state[logical_cpu] = CPU_STATE_CONFIGURED;
 		logical_cpu = next_cpu(logical_cpu, avail);
 		if (logical_cpu == NR_CPUS)
@@ -490,6 +491,7 @@ static int smp_rescan_cpus_sclp(cpumask_t avail)
 		__cpu_logical_map[logical_cpu] = cpu_id;
 		smp_cpu_polarization[logical_cpu] = POLARIZATION_UNKNWN;
 		cpu_set(logical_cpu, cpu_present_map);
+		cpu_set(logical_cpu, cpu_enabled_map);
 		if (cpu >= info->configured)
 			smp_cpu_state[logical_cpu] = CPU_STATE_STANDBY;
 		else
@@ -844,6 +846,7 @@ void __init smp_prepare_boot_cpu(void)
 
 	current_thread_info()->cpu = 0;
 	cpu_set(0, cpu_present_map);
+	cpu_set(0, cpu_enabled_map);
 	cpu_set(0, cpu_online_map);
 	S390_lowcore.percpu_offset = __per_cpu_offset[0];
 	current_set[0] = current;
@@ -1104,8 +1107,10 @@ int __ref smp_rescan_cpus(void)
 	cpus_andnot(newcpus, cpu_present_map, newcpus);
 	for_each_cpu_mask(cpu, newcpus) {
 		rc = smp_add_present_cpu(cpu);
-		if (rc)
+		if (rc) {
 			cpu_clear(cpu, cpu_present_map);
+			cpu_clear(cpu, cpu_enabled_map);
+		}
 	}
 	rc = 0;
 out:

--

From: Alex Chiang
Date: Monday, July 14, 2008 - 7:34 pm

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

	/sys/devices/system/cpu/

It should be populated correctly.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Paul Mundt <lethal@linux-sh.org>
---

 arch/sh/kernel/smp.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
index 5d039d1..b91fdfa 100644
--- a/arch/sh/kernel/smp.c
+++ b/arch/sh/kernel/smp.c
@@ -60,6 +60,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 
 #ifndef CONFIG_HOTPLUG_CPU
 	cpu_present_map = cpu_possible_map;
+	cpu_enabled_map = cpu_possible_map;
 #endif
 }
 

--

From: Alex Chiang
Date: Monday, July 14, 2008 - 7:34 pm

Modify MADT local SAPIC parsing such that:

	- all present CPUs added to cpu_present_map
	- all enabled CPUs added to cpu_enabled_map

This change allows us to check during __cpu_up() if we should
actually bring up the CPU. That is, if a CPU is present, but not
enabled by firmware, we should not bring it up.

Contrariwise, by creating a sysfs interface for a disabled CPU,
we provide a hook that allows a user to interact with the CPU.

The most visible user interface change with this patch is that more
sysfs entries will appear in /sys/devices/system/cpu/cpuN/ on
multi-threaded systems with threads turned off.

The actual directories will be empty. A later patch in this series
will provide an example of using this new hook to provide a user
interface for disabled CPUs.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Tony Luck <tony.luck@intel.com>
---

 arch/ia64/kernel/acpi.c    |   16 +++++++++-------
 arch/ia64/kernel/smpboot.c |    7 +++++++
 include/asm-ia64/smp.h     |    1 +
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index 43687cc..1b4b338 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -212,13 +212,14 @@ acpi_parse_lsapic(struct acpi_subtable_header * header, const unsigned long end)
 
 	/*Skip BAD_MADT_ENTRY check, as lsapic size could vary */
 
-	if (lsapic->lapic_flags & ACPI_MADT_ENABLED) {
 #ifdef CONFIG_SMP
-		smp_boot_data.cpu_phys_id[available_cpus] =
-		    (lsapic->id << 8) | lsapic->eid;
+	smp_boot_data.cpu_phys_id[available_cpus] =
+	    (lsapic->id << 8) | lsapic->eid;
+
+	smp_boot_data.cpu_enabled[available_cpus] =
+		lsapic->lapic_flags & ACPI_MADT_ENABLED;
 #endif
-		++available_cpus;
-	}
+	++available_cpus;
 
 	total_cpus++;
 	return 0;
@@ -872,8 +873,7 @@ int acpi_map_lsapic(acpi_handle handle, int *pcpu)
 
 	lsapic = (struct acpi_madt_local_sapic *)obj->buffer.pointer;
 
-	if ((lsapic->header.type != ACPI_MADT_TYPE_LOCAL_SAPIC) ||
-	 ...
From: Alex Chiang
Date: Monday, July 14, 2008 - 7:34 pm

acpi_map_lsapic tries to stuff a long into ia64_cpu_to_sapicid[],
which can only hold ints, so let's fix that.

We need to update the signature of acpi_map_cpu2node() too.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Tony Luck <tony.luck@intel.com>
---

 arch/ia64/kernel/acpi.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index 1b4b338..a928b94 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -775,7 +775,7 @@ int acpi_gsi_to_irq(u32 gsi, unsigned int *irq)
  */
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 static
-int acpi_map_cpu2node(acpi_handle handle, int cpu, long physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
 	int pxm_id;
@@ -855,8 +855,7 @@ int acpi_map_lsapic(acpi_handle handle, int *pcpu)
 	union acpi_object *obj;
 	struct acpi_madt_local_sapic *lsapic;
 	cpumask_t tmp_map;
-	long physid;
-	int cpu;
+	int cpu, physid;
 
 	if (ACPI_FAILURE(acpi_evaluate_object(handle, "_MAT", NULL, &buffer)))
 		return -EINVAL;

--

From: Alex Chiang
Date: Monday, July 14, 2008 - 7:33 pm

Currently, the following cpu maps exist:

	cpu_possible_map - map of populatable CPUs
	cpu_present_map  - map of populated CPUs
	cpu_online_map   - map of schedulable CPUs

These maps do not provide the concept of populated, but disabled CPUs.

That is, a system may contain CPU modules that are physically plugged
in, but disabled by system firmware. The existence of this class of
CPUs breaks the following assumption in smp_init():

        for_each_present_cpu(cpu) {

		.../...

                if (!cpu_online(cpu))
                        cpu_up(cpu);
        }

The assumption is that the kernel should attempt cpu_up() on every
physically populated CPU, which may not be desirable for present but
disabled CPUs.

By providing cpu_enabled_map, we can keep the above [simplifying]
assumption in smp_init(), and push the knowledge of disabled CPUs
and the decision to bring them up, down into arch specific code.

Signed-off-by: Alex Chiang <achiang@hp.com>
---

 drivers/base/cpu.c      |    9 +++++++--
 include/linux/cpumask.h |   31 ++++++++++++++++++++++++-------
 init/main.c             |    1 +
 kernel/sched.c          |   16 ++++++++++++----
 4 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e38dfed..bc300df 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -57,7 +57,10 @@ static SYSDEV_ATTR(online, 0644, show_online, store_online);
 
 static void __cpuinit register_cpu_control(struct cpu *cpu)
 {
-	sysdev_create_file(&cpu->sysdev, &attr_online);
+	int logical_cpu = cpu->sysdev.id;
+
+	if (cpu_isset(logical_cpu, cpu_enabled_map))
+		sysdev_create_file(&cpu->sysdev, &attr_online);
 }
 void unregister_cpu(struct cpu *cpu)
 {
@@ -125,11 +128,13 @@ struct sysdev_class_attribute attr_##type##_map = 			\
 print_cpus_func(online);
 print_cpus_func(possible);
 print_cpus_func(present);
+print_cpus_func(enabled);
 
 struct sysdev_class_attribute *cpu_state_attr[] = {
 	&attr_online_map,
 ...
From: Matthew Wilcox
Date: Monday, July 14, 2008 - 8:15 pm

I don't understand why we want to know about these CPUs.  Surely they
should be 'possible', but not 'present'?  What useful thing can Linux do
with them?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Andi Kleen
Date: Tuesday, July 15, 2008 - 3:03 am

He explained it in the intro, near the end (I nearly complained about
this too when I hadn't finished reading it completely :):

|The big picture implication is that we can allow userspace
|to interact with disabled CPUs. In this particular example,
|we provide a knob that lets a sysadmin schedule any present
|CPU for firmware deconfiguration or enablement.

The reason sounds pretty exotic, but ok.

-Andi
--

From: Russell King
Date: Tuesday, July 15, 2008 - 3:21 am

I don't see why this needs to be cross architecture then - shouldn't
the generic kernel only be concerning itself with things that are
possible, present and/or online?

If you have an interface which allows you to change the machines
configuration in a machine specific way, shouldn't that be something
for that machine to support and forced upon the entire kernel?

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
--

From: Alex Chiang
Date: Tuesday, July 15, 2008 - 10:57 am

I suppose that's a fair statement. Touching all the archs for
something 'exotic' like this does seem to be a bit of an
overkill.

My thought was that big SMP systems like ia64, possibly sparc and
ppc, and increasingly, x86, might find something like this
useful, as systems get larger and larger, and vendors are going
to want to do RAS-ish features, like the ability to keep CPUs in
firmware across reboots until told otherwise by the sysadmin.

Right now, a 'present' CPU strongly implies 'online' as well,
since we're calling cpu_up() for all 'present' CPUs in
smp_init(). But this hurts if:

	- you don't actually want to bring up all 'present' CPUs
	- you still want to interact with these weirdo zombie
	  CPUs that are 'present' but not 'online'

That second item refers to creating a sysfs interface for each
'present' CPU in topology_init().

This feature puts a tax on smaller archs like arm, but maybe I
could be smarter about it by using a 

#define cpu_enabled_mask cpu_online_mask


I think that the generic kernel would be the appropriate place to
create a place for these zombie CPUs, and give the vendor specific
stuff a way to hook in.

I'd be interested in learning if any of the other 'big' archs
would have a use for something like this.

Thanks.

/ac

--

From: Matthew Wilcox
Date: Tuesday, July 15, 2008 - 11:16 am

Have you considered simply failing __cpu_up() for CPUs that are
deconfigured by firmware?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Russell King
Date: Tuesday, July 15, 2008 - 11:48 am

But what if you want to have a system boot with, say, 4 CPUs and
then decide at run time to bring up another 4 CPUs when required?

How about having smp_init() call into arch code to query whether
it should bring up a not-already-online CPU?  Architectures that
want to do something special can then make the decision there and
everyone else can define the test completely away.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
--

From: Alex Chiang
Date: Tuesday, July 15, 2008 - 12:15 pm

So this is exactly what I'm doing. The ia64 patch has this hunk:

@@ -820,6 +824,9 @@ __cpu_up (unsigned int cpu)
        if (cpu_isset(cpu, cpu_callin_map))
                return -EINVAL;

+       if (!cpu_isset(cpu, cpu_enabled_map))
+               return -EINVAL;
+
        per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
        /* Processor goes to start_secondary(), sets online flag */
        ret = do_boot_cpu(sapicid, cpu);

That was the easiest, most-straightforward solution I could think
of. If you have an idea for a version with lower taxes (doesn't
touch all the archs or can be #define'd out), I'm happy to hear
it.

Thanks.

/ac

--

From: Russell King
Date: Friday, July 18, 2008 - 2:44 pm

I think I did make a suggestion in the bit you quote from me above.

Let me be more explicit:

static void __init smp_init(void)
{
        unsigned int cpu;

        /* FIXME: This should be done in userspace --RR */
        for_each_present_cpu(cpu) {
                if (num_online_cpus() >= setup_max_cpus)
                        break;
-		if (!cpu_online(cpu))
+		if (smp_cpu_enabled(cpu) && !cpu_online(cpu))
                        cpu_up(cpu);
        }

        /* Any cleanup work */
        printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus());
        smp_cpus_done(setup_max_cpus);
}

and have architectures provide 'smp_cpu_enabled(cpu)' which can either
be a function, inline function or a macro (and therefore possible to be
completely eliminated.)

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
--

From: Alex Chiang
Date: Friday, July 18, 2008 - 4:08 pm

Yup, this is nicer. I'll try this.

/ac

--

From: Alex Chiang
Date: Tuesday, July 15, 2008 - 6:11 pm

I experimented today with an ia64-only solution, keeping track of
'present' vs 'enabled' vs 'online' all in arch-specific code.

The arch-specific stuff turns out to be more or less a wash; that
is, it's not too hard to keep it all in ia64.

However, the problem is, I would still need a generic
'enabled_map' to control whether 'online' and 'crash_notes'
entries get created for /sys/devices/system/cpu/cpuN/.

So if other archs are at least neutral on this class of CPUs, I
can work on another patchset that lowers the tax to a simple
#define for archs that don't care.

But if people hate this idea of a new map, I'd like to know so
that I'm not wasting my time and can work on a different solution
(what that would be, I have no idea at the moment).

Thanks.

/ac

--

From: Alex Chiang
Date: Monday, July 14, 2008 - 7:34 pm

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

	/sys/devices/system/cpu/

It should be populated correctly.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

 arch/powerpc/kernel/setup-common.c           |    1 +
 arch/powerpc/platforms/powermac/smp.c        |    1 +
 arch/powerpc/platforms/pseries/hotplug-cpu.c |    2 ++
 3 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index db540ea..a4c894a 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -413,6 +413,7 @@ void __init smp_setup_cpu_maps(void)
 			DBG("    thread %d -> cpu %d (hard id %d)\n",
 			    j, cpu, intserv[j]);
 			cpu_set(cpu, cpu_present_map);
+			cpu_set(cpu, cpu_enabled_map);
 			set_hard_smp_processor_id(cpu, intserv[j]);
 			cpu_set(cpu, cpu_possible_map);
 			cpu++;
diff --git a/arch/powerpc/platforms/powermac/smp.c b/arch/powerpc/platforms/powermac/smp.c
index cb2d894..a74dada 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -317,6 +317,7 @@ static int __init smp_psurge_probe(void)
 		ncpus = NR_CPUS;
 	for (i = 1; i < ncpus ; ++i) {
 		cpu_set(i, cpu_present_map);
+		cpu_set(i, cpu_enabled_map);
 		set_hard_smp_processor_id(i, i);
 	}
 
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 1f03248..e738b07 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -186,6 +186,7 @@ static int pseries_add_processor(struct device_node *np)
 	for_each_cpu_mask(cpu, tmp) {
 		BUG_ON(cpu_isset(cpu, cpu_present_map));
 		cpu_set(cpu, cpu_present_map);
+		cpu_set(cpu, cpu_enabled_map);
 ...
From: Benjamin Herrenschmidt
Date: Monday, July 14, 2008 - 10:51 pm

Care to educate me on the difference between online_map and
enabled_map ?

Cheers,

--

From: Alex Chiang
Date: Tuesday, July 15, 2008 - 6:04 pm

enabled_map is closer conceptually to present_map than
online_map.

present_map are CPUs that are actually plugged in

online_map are CPUs that have had cpu_up() called on them; ie.
schedulable

enabled_map is somewhere inbetween -- the CPUs are plugged in,
but we don't want to cpu_up() them. On hp ia64 systems, these
CPUs are disabled by system firmware.

Currently, a user can only configure/deconfigure the CPUs from
the system firmware interface. By providing a sysfs interface for
these CPUs, we can allow the user to configure/deconfigure them
from userspace. More realistically, higher level managability
software now has an OS-level interface to interact with these
CPUs.

Might this be useful for ppc and your hypervisor based
architecture? I could imagine your hypervisor telling the kernel
about all the physically present CPUs, but then you would be able
to have finer grained control using the enabled_map.

I haven't studied your code in depth, so maybe you can just do
everything with pure online/offline, but at least on my
platforms, there are use-cases where we might want something
in-between.

Thanks.

/ac

--

From: Alex Chiang
Date: Monday, July 14, 2008 - 7:34 pm

Provide a new sysfs interface for CPU deconfiguration.

Since no vendors can agree on terminology for related but slightly
different features, provide a method for a platform to implement
its own version of what it thinks 'deconfiguring' a CPU might be.

Provide an HP-specific CPU deconfiguration implementation.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Andi Kleen <ak@linux.intel.com>
---

 drivers/acpi/Kconfig                 |   18 ++
 drivers/acpi/Makefile                |    4 
 drivers/acpi/processor_core.c        |    8 +
 drivers/acpi/processor_deconfigure.c |  275 ++++++++++++++++++++++++++++++++++
 include/acpi/processor.h             |    6 +
 5 files changed, 311 insertions(+), 0 deletions(-)
 create mode 100644 drivers/acpi/processor_deconfigure.c

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index c52fca8..36ad177 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -188,6 +188,24 @@ config ACPI_HOTPLUG_CPU
 	select ACPI_CONTAINER
 	default y
 
+config ACPI_DECONFIGURE_CPU
+	bool "Processor deconfiguration"
+	depends on ACPI_PROCESSOR
+	default n
+	help
+	  This processor driver submodule allows a user to mark a CPU
+	  for firmware disabling/enabling. It will create the following
+	  sysfs file:
+	  
+	  /sys/devices/system/cpu/cpuN/deconfigure
+	  
+	  Behavior of this interface is highly vendor-dependent and
+	  requires firmware support.
+
+	  This option is NOT required for CPU hotplug support.
+
+	  If unsure, say N.
+
 config ACPI_THERMAL
 	tristate "Thermal Zone"
 	depends on ACPI_PROCESSOR
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 40b0fca..92a5037 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -35,6 +35,10 @@ ifdef CONFIG_CPU_FREQ
 processor-objs	+= processor_perflib.o
 endif
 
+ifdef CONFIG_ACPI_DECONFIGURE_CPU
+processor-objs	+= processor_deconfigure.o
+endif
+
 obj-y				+= sleep/
 obj-y				+= bus.o glue.o
 obj-y				+= scan.o
diff --git ...
From: Andi Kleen
Date: Tuesday, July 15, 2008 - 10:06 am

Why are you ccing this to linux-arch? Dropped.

What is the standard status of these new SCFG and ECFG tables? Have they
been submitted for possible inclusion in ACPI? And is there a spec
available? I can't say I'm really thrilled with having HP specific
support in there.

It would be better at least if you could reserve the table names and 
then drop the HP DMI check. This is needed anyways, otherwise the 

Now that seems like weird semantics for a public fixed API. What happens
when some other vendor adds hot deconfiguration?

My feeling is that this seems to be overly specific to your BIOS
and might better belong into some separate management tool. At least
until we can define a nice general API for this with clear semantics. 
For what systems is this anyways?

-Andi

--

From: Alex Chiang
Date: Tuesday, July 15, 2008 - 11:40 am

Hm, sorry.

I thought it would have been weird to send patches 1-13 / 14 to
linux-arch, but not send 14 / 14. Perhaps I should set a Reply-to:

These are not new tables -- they are methods that live underneath
processor objects in the namespace.

Yes, they are specific to HP, but because they are methods, there
shouldn't be any collision with other vendors defining methods

Yeah, I'm not totally happy with it either. But I'd like to
clarify -- are you concerned with the name of the interface
(deconfigure) or the "nothing happens until next boot" behavior?

Would it help if I renamed it to "enabled" and had something
like:

	echo 0 > enabled
	echo 1 > enabled
	cat enabled

And then that would map to vendor-specific behavior?

Or is it really the "nothing happens until next boot" thing that

Well, life would be a lot easier if we had a generic way to poke
at ACPI methods, but dev_acpi has been rejected multiple times.
;)

On a more serious note, my fear is that an interface like this is
not going to have agreement / clear semantics for a long time
because one vendor is going to want to call it 'deconfigured' and
another one might want to call it 'disabled' and a third might
want to call it 'puppy_dogs' and they'll all be kinda related but
not exactly the same.

That's why I was going down the road of creating at least one
generic interface, but with vendor-specific semantics.


We have HP ia64 systems shipping today that support this.

Thanks.

/ac

--

From: Alex Chiang
Date: Monday, July 14, 2008 - 7:56 pm

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

	/sys/devices/system/cpu/

It should be populated correctly.

There will be a user-visible change under the above directory.
cpuN/ entries for firmware-disabled CPUs will now appear, whereas
before, they did not due to a check against ACPI_MADT_ENABLED.

The cpuN/ entries will be empty, and the online file in the
above directory will reflect which CPUs are actually schedulable.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
---

 arch/x86/kernel/acpi/boot.c         |    7 +++++--
 arch/x86/kernel/apic_32.c           |    1 +
 arch/x86/kernel/apic_64.c           |    1 +
 arch/x86/kernel/smpboot.c           |    2 ++
 arch/x86/mach-voyager/voyager_smp.c |    2 ++
 arch/x86/xen/smp.c                  |    1 +
 6 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 33c5216..c6dc5da 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -559,8 +559,7 @@ static int __cpuinit _acpi_map_lsapic(acpi_handle handle, int *pcpu)
 
 	lapic = (struct acpi_madt_local_apic *)obj->buffer.pointer;
 
-	if (lapic->header.type != ACPI_MADT_TYPE_LOCAL_APIC ||
-	    !(lapic->lapic_flags & ACPI_MADT_ENABLED)) {
+	if (lapic->header.type != ACPI_MADT_TYPE_LOCAL_APIC) {
 		kfree(buffer.pointer);
 		return -EINVAL;
 	}
@@ -584,6 +583,9 @@ static int __cpuinit _acpi_map_lsapic(acpi_handle handle, int *pcpu)
 		return -EINVAL;
 	}
 
+	if (lapic->lapic_flags & ACPI_MADT_ENABLED)
+		cpu_set(cpu, cpu_enabled_map);
+
 	cpu = first_cpu(new_map);
 
 	*pcpu = cpu;
@@ -601,6 +603,7 @@ int acpi_unmap_lsapic(int cpu)
 {
 	per_cpu(x86_cpu_to_apicid, cpu) = -1;
 	cpu_clear(cpu, cpu_present_map);
+	cpu_clear(cpu, ...
From: H. Peter Anvin
Date: Friday, July 18, 2008 - 1:00 pm

From an x86 standpoint this patchset seems reasonable to me.

Acked-by: H. Peter Anvin <hpa@zytor.com>

Since it is a panarch patchset it presumably should go via -mm rather 
than in the arch trees, so I'm not going to add it to -tip.

Obviously, if the sematics of the operations don't make sense for other 
architectures -- which I will leave up to the affected maintainers -- 
then that should be carefully considered if the generic operations can 
be done better.

	-hpa
--

From: Alex Chiang
Date: Friday, July 18, 2008 - 4:06 pm

Thanks Peter. Let me try and rework the patchset according to
Russell's suggestion here:

	http://lkml.org/lkml/2008/7/18/467


Russell's solution avoids the issue with the ability to #define
the check away for archs that don't care.

cheers,

/ac

--

From: Luck, Tony
Date: Tuesday, July 15, 2008 - 1:10 pm

PiBQYXRjaCAxNCBpcyB0aGUgbW9uZXkgcGF0Y2guIEl0IGRlbW9uc3RyYXRlcyB3aHkgd2UgbWln
aHQNCj4gd2FudCB0byBnbyB0aHJvdWdoIGFsbCB0aGVzZSBneXJhdGlvbnMuIE5vdyB0aGF0IGlh
NjQgcHJlc2VudHMNCj4gKmFsbCogcGh5c2ljYWxseSBwcmVzZW50IENQVXMgaW4gc3lzZnMsIGV2
ZW4gaWYgdGhleSBoYXZlIGJlZW4NCj4gZGlzYWJsZWQgYnkgZmlybXdhcmUsIHdlIGdpdmUgdXNl
cnNwYWNlIGEgd2F5IHRvIHBva2UgYXQgdGhvc2UNCj4gQ1BVcy4NCg0KVGhlcmUncyBvbmx5IHRo
ZSBvbmUgYml0IGZvciAiZGlzYWJsZWQgYnkgZmlybXdhcmUiIC4uLiBubyBleHRyYQ0Kc3BhY2Ug
Zm9yIGFueSBleHRyYSBpbmZvcm1hdGlvbi4gIEhvdyB3b3VsZCB1c2Vyc3BhY2Uga25vdyB0aGF0
DQppdCB3YXMgc2FmZSB0byBwb2tlIGF0IGEgZGlzYWJsZWQgY3B1PyAgUGVyaGFwcyBmaXJtd2Fy
ZSBkaXNhYmxlZA0KaXQgZm9yIHNvbWUgdmVyeSBnb29kIHJlYXNvbiwgYW5kIHBva2luZyBhdCBp
dCBjb3VsZCBjYXVzZSBzeXN0ZW0NCmluc3RhYmlsaXR5Lg0KDQotVG9ueQ0K
--

From: Alex Chiang
Date: Tuesday, July 15, 2008 - 4:54 pm

I didn't include linux-ia64 originally. Sorry about that.

Here is the 00/14 cover email describing the patch series:

	http://lkml.org/lkml/2008/7/14/468

Here is the 12/14 ia64 specific bit:

	http://lkml.org/lkml/2008/7/14/478

Here is the 14/14 patch that Tony is referring to:

	http://lkml.org/lkml/2008/7/14/482


My thought here was that it would be a vendor-specific thing. In
patch 14/14 I created:

	/sys/devices/system/cpu/cpuN/deconfigure

(although /sys/device/system/cpu/cpuN/enabled would probably be
better)

I set up 'deconfigure' to have different implementations based on
a DMI, so it is very much an opt-in (especially since it's a
Kconfig option).

It would be the responsibility of the vendor to provide something
safe to poke at. In the sample implementation I gave, nothing
happens to the system until the *next* reboot, so it shouldn't
cause the current boot any distress.

A different implementation of deconfigure/enabled could return
an error to userspace if an operation was unsafe.

/ac

--

Previous thread: DMA timeout on PIIX4 by gshan on Monday, July 14, 2008 - 7:17 pm. (1 message)

Next thread: [PATCH 10/14] [SPARC] Populate cpu_enabled_map by Alex Chiang on Monday, July 14, 2008 - 7:55 pm. (1 message)