The following series presents cpu_enabled_map, which captures the concept of a physically present and enabled CPU. The complement of this map gives us present but disabled CPUs. Perhaps the CPU was disabled by firmware for whatever reason. For the most part, cpu_enabled_map will follow cpu_present_map. On x86 and ia64, ACPI can tell us if a CPU is present, enabled, or both. On those two archs, cpu_enabled_map is a subset of cpu_present_map. All CPUs described in the MADT are now added to cpu_present_map, but only ones with ACPI_MADT_ENABLED are added to cpu_enabled_map. Other archs may wish to do something similar, so some minimum levels of support are included with this patch series, enabling others to fill things in as desired. I certainly acknowledge that a better job could be done. Patch 01 introduces the actual map and interface. Patches 02 -- 12 modify arch code in an attempt to populate the map correctly. Patch 12 is the most interesting one, in that we actually demonstrate how an arch [ia64] would actually use cpu_enabled_map to control calling cpu_up() on a given CPU. Patch 13 fixes a potential overflow bug in ia64 ACPI code. Patch 14 is the money patch. It demonstrates why we might want to go through all these gyrations. Now that ia64 presents *all* physically present CPUs in sysfs, even if they have been disabled by firmware, we give userspace a way to poke at those CPUs. This is possible because present CPUs are in the ACPI namespace. Even though we might not have a valid per_cpu pointer for a disabled CPU (because we never called cpu_up() on it), we can still obtain a valid ACPI handle for that CPU by walking the namespace and executing methods underneath it. The big picture implication is that we can allow userspace to interact with disabled CPUs. In this particular example, we provide a knob that lets a sysadmin schedule any present CPU for firmware deconfiguration or enablement. I have compile tested on: - arm, sparc, powerpc, alpha, ...
Populate the cpu_enabled_map correctly.
Note that this patch does not actually make any decisions based
on the contents of the map.
However, as the map is presented via sysfs in:
/sys/devices/system/cpu/
It should be populated correctly.
Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
---
arch/m32r/kernel/smpboot.c | 1 +
include/asm-m32r/smp.h | 1 +
2 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/arch/m32r/kernel/smpboot.c b/arch/m32r/kernel/smpboot.c
index 2c03ac1..4f2bbde 100644
--- a/arch/m32r/kernel/smpboot.c
+++ b/arch/m32r/kernel/smpboot.c
@@ -184,6 +184,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
physid_set(phys_id, phys_cpu_present_map);
#ifndef CONFIG_HOTPLUG_CPU
cpu_present_map = cpu_possible_map;
+ cpu_enabled_map = cpu_possible_map;
#endif
show_mp_info(nr_cpu);
diff --git a/include/asm-m32r/smp.h b/include/asm-m32r/smp.h
index 078e1a5..4ea1845 100644
--- a/include/asm-m32r/smp.h
+++ b/include/asm-m32r/smp.h
@@ -65,6 +65,7 @@ extern volatile int cpu_2_physid[NR_CPUS];
extern cpumask_t cpu_callout_map;
extern cpumask_t cpu_possible_map;
extern cpumask_t cpu_present_map;
+extern cpumask_t cpu_enabled_map;
static __inline__ int hard_smp_processor_id(void)
{
--
Populate the cpu_enabled_map correctly.
Note that this patch does not actually make any decisions based
on the contents of the map.
However, as the map is presented via sysfs in:
/sys/devices/system/cpu/
It should be populated correctly.
Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Richard Henderson <rth@twiddle.net>
---
arch/alpha/kernel/process.c | 2 ++
arch/alpha/kernel/smp.c | 2 ++
2 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 96ed82f..7a261e2 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -94,6 +94,7 @@ common_shutdown_1(void *generic_ptr)
flags |= 0x00040000UL; /* "remain halted" */
*pflags = flags;
cpu_clear(cpuid, cpu_present_map);
+ cpu_clear(cpuid, cpu_enabled_map);
halt();
}
#endif
@@ -120,6 +121,7 @@ common_shutdown_1(void *generic_ptr)
#ifdef CONFIG_SMP
/* Wait for the secondaries to halt. */
cpu_clear(boot_cpuid, cpu_present_map);
+ cpu_clear(boot_cpuid, cpu_enabled_map);
while (cpus_weight(cpu_present_map))
barrier();
#endif
diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index 2525692..ec53061 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -436,6 +436,7 @@ setup_smp(void)
if ((cpu->flags & 0x1cc) == 0x1cc) {
smp_num_probed++;
cpu_set(i, cpu_present_map);
+ cpu_set(i, cpu_enabled_map);
cpu->pal_revision = boot_cpu_palrev;
}
@@ -469,6 +470,7 @@ smp_prepare_cpus(unsigned int max_cpus)
/* Nothing to do on a UP box, or when told not to. */
if (smp_num_probed == 1 || max_cpus == 0) {
cpu_present_map = cpumask_of_cpu(boot_cpuid);
+ cpu_enabled_map = cpumask_of_cpu(boot_cpuid);
printk(KERN_INFO "SMP mode deactivated.\n");
return;
}
--
Populate the cpu_enabled_map correctly.
Note that this patch does not actually make any decisions based
on the contents of the map.
However, as the map is presented via sysfs in:
/sys/devices/system/cpu/
It should be populated correctly.
Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <rmk@dyn-67.arm.linux.org.uk>
---
arch/arm/mach-realview/platsmp.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/arch/arm/mach-realview/platsmp.c b/arch/arm/mach-realview/platsmp.c
index 8e813ed..c33f298 100644
--- a/arch/arm/mach-realview/platsmp.c
+++ b/arch/arm/mach-realview/platsmp.c
@@ -241,8 +241,10 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
* Initialise the present map, which describes the set of CPUs
* actually populated at the present time.
*/
- for (i = 0; i < max_cpus; i++)
+ for (i = 0; i < max_cpus; i++) {
cpu_set(i, cpu_present_map);
+ cpu_set(i, cpu_enabled_map);
+ }
/*
* Initialise the SCU if there are more than one CPU and let
--
Populate the cpu_enabled_map correctly.
Note that this patch does not actually make any decisions based
on the contents of the map.
However, as the map is presented via sysfs in:
/sys/devices/system/cpu/
It should be populated correctly.
Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
---
arch/mips/kernel/smp.c | 1 +
arch/mips/kernel/smtc.c | 1 +
2 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index cdf87a9..8f7f742 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -304,6 +304,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
set_cpu_sibling_map(0);
#ifndef CONFIG_HOTPLUG_CPU
cpu_present_map = cpu_possible_map;
+ cpu_enabled_map = cpu_possible_map;
#endif
}
diff --git a/arch/mips/kernel/smtc.c b/arch/mips/kernel/smtc.c
index 3e86318..b0c16e0 100644
--- a/arch/mips/kernel/smtc.c
+++ b/arch/mips/kernel/smtc.c
@@ -498,6 +498,7 @@ void mipsmt_prepare_cpus(void)
while (tc < (((val & MVPCONF0_PTC) >> MVPCONF0_PTC_SHIFT) + 1)) {
cpu_clear(tc, phys_cpu_present_map);
cpu_clear(tc, cpu_present_map);
+ cpu_clear(tc, cpu_enabled_map);
tc++;
}
--
Populate the cpu_enabled_map correctly.
Note that this patch does not actually make any decisions based
on the contents of the map.
However, as the map is presented via sysfs in:
/sys/devices/system/cpu/
It should be populated correctly.
Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
---
arch/parisc/kernel/processor.c | 1 +
arch/parisc/kernel/smp.c | 3 +++
2 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/parisc/kernel/processor.c b/arch/parisc/kernel/processor.c
index 370086f..fb81222 100644
--- a/arch/parisc/kernel/processor.c
+++ b/arch/parisc/kernel/processor.c
@@ -201,6 +201,7 @@ static int __cpuinit processor_probe(struct parisc_device *dev)
#ifdef CONFIG_SMP
if (cpuid) {
cpu_set(cpuid, cpu_present_map);
+ cpu_set(cpuid, cpu_enabled_map);
cpu_up(cpuid);
}
#endif
diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c
index 85fc775..a31b0ad 100644
--- a/arch/parisc/kernel/smp.c
+++ b/arch/parisc/kernel/smp.c
@@ -535,6 +535,7 @@ void __devinit smp_prepare_boot_cpu(void)
cpu_set(bootstrap_processor, cpu_online_map);
cpu_set(bootstrap_processor, cpu_present_map);
+ cpu_set(bootstrap_processor, cpu_enabled_map);
}
@@ -546,7 +547,9 @@ void __devinit smp_prepare_boot_cpu(void)
void __init smp_prepare_cpus(unsigned int max_cpus)
{
cpus_clear(cpu_present_map);
+ cpus_clear(cpu_enabled_map);
cpu_set(0, cpu_present_map);
+ cpu_set(0, cpu_enabled_map);
parisc_max_cpus = max_cpus;
if (!max_cpus)
--
Populate the cpu_enabled_map correctly.
Note that this patch does not actually make any decisions based
on the contents of the map.
However, as the map is presented via sysfs in:
/sys/devices/system/cpu/
It should be populated correctly.
Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
---
arch/s390/kernel/smp.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 5d4fa4b..f7ae20b 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -458,6 +458,7 @@ static int smp_rescan_cpus_sigp(cpumask_t avail)
if (!cpu_stopped(logical_cpu))
continue;
cpu_set(logical_cpu, cpu_present_map);
+ cpu_set(logical_cpu, cpu_enabled_map);
smp_cpu_state[logical_cpu] = CPU_STATE_CONFIGURED;
logical_cpu = next_cpu(logical_cpu, avail);
if (logical_cpu == NR_CPUS)
@@ -490,6 +491,7 @@ static int smp_rescan_cpus_sclp(cpumask_t avail)
__cpu_logical_map[logical_cpu] = cpu_id;
smp_cpu_polarization[logical_cpu] = POLARIZATION_UNKNWN;
cpu_set(logical_cpu, cpu_present_map);
+ cpu_set(logical_cpu, cpu_enabled_map);
if (cpu >= info->configured)
smp_cpu_state[logical_cpu] = CPU_STATE_STANDBY;
else
@@ -844,6 +846,7 @@ void __init smp_prepare_boot_cpu(void)
current_thread_info()->cpu = 0;
cpu_set(0, cpu_present_map);
+ cpu_set(0, cpu_enabled_map);
cpu_set(0, cpu_online_map);
S390_lowcore.percpu_offset = __per_cpu_offset[0];
current_set[0] = current;
@@ -1104,8 +1107,10 @@ int __ref smp_rescan_cpus(void)
cpus_andnot(newcpus, cpu_present_map, newcpus);
for_each_cpu_mask(cpu, newcpus) {
rc = smp_add_present_cpu(cpu);
- if (rc)
+ if (rc) {
cpu_clear(cpu, cpu_present_map);
+ cpu_clear(cpu, cpu_enabled_map);
+ }
}
rc = 0;
out:
--
Populate the cpu_enabled_map correctly. Note that this patch does not actually make any decisions based on the contents of the map. However, as the map is presented via sysfs in: /sys/devices/system/cpu/ It should be populated correctly. Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Paul Mundt <lethal@linux-sh.org> --- arch/sh/kernel/smp.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c index 5d039d1..b91fdfa 100644 --- a/arch/sh/kernel/smp.c +++ b/arch/sh/kernel/smp.c @@ -60,6 +60,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus) #ifndef CONFIG_HOTPLUG_CPU cpu_present_map = cpu_possible_map; + cpu_enabled_map = cpu_possible_map; #endif } --
Modify MADT local SAPIC parsing such that:
- all present CPUs added to cpu_present_map
- all enabled CPUs added to cpu_enabled_map
This change allows us to check during __cpu_up() if we should
actually bring up the CPU. That is, if a CPU is present, but not
enabled by firmware, we should not bring it up.
Contrariwise, by creating a sysfs interface for a disabled CPU,
we provide a hook that allows a user to interact with the CPU.
The most visible user interface change with this patch is that more
sysfs entries will appear in /sys/devices/system/cpu/cpuN/ on
multi-threaded systems with threads turned off.
The actual directories will be empty. A later patch in this series
will provide an example of using this new hook to provide a user
interface for disabled CPUs.
Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Tony Luck <tony.luck@intel.com>
---
arch/ia64/kernel/acpi.c | 16 +++++++++-------
arch/ia64/kernel/smpboot.c | 7 +++++++
include/asm-ia64/smp.h | 1 +
3 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index 43687cc..1b4b338 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -212,13 +212,14 @@ acpi_parse_lsapic(struct acpi_subtable_header * header, const unsigned long end)
/*Skip BAD_MADT_ENTRY check, as lsapic size could vary */
- if (lsapic->lapic_flags & ACPI_MADT_ENABLED) {
#ifdef CONFIG_SMP
- smp_boot_data.cpu_phys_id[available_cpus] =
- (lsapic->id << 8) | lsapic->eid;
+ smp_boot_data.cpu_phys_id[available_cpus] =
+ (lsapic->id << 8) | lsapic->eid;
+
+ smp_boot_data.cpu_enabled[available_cpus] =
+ lsapic->lapic_flags & ACPI_MADT_ENABLED;
#endif
- ++available_cpus;
- }
+ ++available_cpus;
total_cpus++;
return 0;
@@ -872,8 +873,7 @@ int acpi_map_lsapic(acpi_handle handle, int *pcpu)
lsapic = (struct acpi_madt_local_sapic *)obj->buffer.pointer;
- if ((lsapic->header.type != ACPI_MADT_TYPE_LOCAL_SAPIC) ||
- ...acpi_map_lsapic tries to stuff a long into ia64_cpu_to_sapicid[],
which can only hold ints, so let's fix that.
We need to update the signature of acpi_map_cpu2node() too.
Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Tony Luck <tony.luck@intel.com>
---
arch/ia64/kernel/acpi.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index 1b4b338..a928b94 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -775,7 +775,7 @@ int acpi_gsi_to_irq(u32 gsi, unsigned int *irq)
*/
#ifdef CONFIG_ACPI_HOTPLUG_CPU
static
-int acpi_map_cpu2node(acpi_handle handle, int cpu, long physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
{
#ifdef CONFIG_ACPI_NUMA
int pxm_id;
@@ -855,8 +855,7 @@ int acpi_map_lsapic(acpi_handle handle, int *pcpu)
union acpi_object *obj;
struct acpi_madt_local_sapic *lsapic;
cpumask_t tmp_map;
- long physid;
- int cpu;
+ int cpu, physid;
if (ACPI_FAILURE(acpi_evaluate_object(handle, "_MAT", NULL, &buffer)))
return -EINVAL;
--
Currently, the following cpu maps exist:
cpu_possible_map - map of populatable CPUs
cpu_present_map - map of populated CPUs
cpu_online_map - map of schedulable CPUs
These maps do not provide the concept of populated, but disabled CPUs.
That is, a system may contain CPU modules that are physically plugged
in, but disabled by system firmware. The existence of this class of
CPUs breaks the following assumption in smp_init():
for_each_present_cpu(cpu) {
.../...
if (!cpu_online(cpu))
cpu_up(cpu);
}
The assumption is that the kernel should attempt cpu_up() on every
physically populated CPU, which may not be desirable for present but
disabled CPUs.
By providing cpu_enabled_map, we can keep the above [simplifying]
assumption in smp_init(), and push the knowledge of disabled CPUs
and the decision to bring them up, down into arch specific code.
Signed-off-by: Alex Chiang <achiang@hp.com>
---
drivers/base/cpu.c | 9 +++++++--
include/linux/cpumask.h | 31 ++++++++++++++++++++++++-------
init/main.c | 1 +
kernel/sched.c | 16 ++++++++++++----
4 files changed, 44 insertions(+), 13 deletions(-)
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e38dfed..bc300df 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -57,7 +57,10 @@ static SYSDEV_ATTR(online, 0644, show_online, store_online);
static void __cpuinit register_cpu_control(struct cpu *cpu)
{
- sysdev_create_file(&cpu->sysdev, &attr_online);
+ int logical_cpu = cpu->sysdev.id;
+
+ if (cpu_isset(logical_cpu, cpu_enabled_map))
+ sysdev_create_file(&cpu->sysdev, &attr_online);
}
void unregister_cpu(struct cpu *cpu)
{
@@ -125,11 +128,13 @@ struct sysdev_class_attribute attr_##type##_map = \
print_cpus_func(online);
print_cpus_func(possible);
print_cpus_func(present);
+print_cpus_func(enabled);
struct sysdev_class_attribute *cpu_state_attr[] = {
&attr_online_map,
...I don't understand why we want to know about these CPUs. Surely they should be 'possible', but not 'present'? What useful thing can Linux do with them? -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
He explained it in the intro, near the end (I nearly complained about this too when I hadn't finished reading it completely :): |The big picture implication is that we can allow userspace |to interact with disabled CPUs. In this particular example, |we provide a knob that lets a sysadmin schedule any present |CPU for firmware deconfiguration or enablement. The reason sounds pretty exotic, but ok. -Andi --
I don't see why this needs to be cross architecture then - shouldn't the generic kernel only be concerning itself with things that are possible, present and/or online? If you have an interface which allows you to change the machines configuration in a machine specific way, shouldn't that be something for that machine to support and forced upon the entire kernel? -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: --
I suppose that's a fair statement. Touching all the archs for something 'exotic' like this does seem to be a bit of an overkill. My thought was that big SMP systems like ia64, possibly sparc and ppc, and increasingly, x86, might find something like this useful, as systems get larger and larger, and vendors are going to want to do RAS-ish features, like the ability to keep CPUs in firmware across reboots until told otherwise by the sysadmin. Right now, a 'present' CPU strongly implies 'online' as well, since we're calling cpu_up() for all 'present' CPUs in smp_init(). But this hurts if: - you don't actually want to bring up all 'present' CPUs - you still want to interact with these weirdo zombie CPUs that are 'present' but not 'online' That second item refers to creating a sysfs interface for each 'present' CPU in topology_init(). This feature puts a tax on smaller archs like arm, but maybe I could be smarter about it by using a #define cpu_enabled_mask cpu_online_mask I think that the generic kernel would be the appropriate place to create a place for these zombie CPUs, and give the vendor specific stuff a way to hook in. I'd be interested in learning if any of the other 'big' archs would have a use for something like this. Thanks. /ac --
Have you considered simply failing __cpu_up() for CPUs that are deconfigured by firmware? -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
But what if you want to have a system boot with, say, 4 CPUs and then decide at run time to bring up another 4 CPUs when required? How about having smp_init() call into arch code to query whether it should bring up a not-already-online CPU? Architectures that want to do something special can then make the decision there and everyone else can define the test completely away. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: --
So this is exactly what I'm doing. The ia64 patch has this hunk:
@@ -820,6 +824,9 @@ __cpu_up (unsigned int cpu)
if (cpu_isset(cpu, cpu_callin_map))
return -EINVAL;
+ if (!cpu_isset(cpu, cpu_enabled_map))
+ return -EINVAL;
+
per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
/* Processor goes to start_secondary(), sets online flag */
ret = do_boot_cpu(sapicid, cpu);
That was the easiest, most-straightforward solution I could think
of. If you have an idea for a version with lower taxes (doesn't
touch all the archs or can be #define'd out), I'm happy to hear
it.
Thanks.
/ac
--
I think I did make a suggestion in the bit you quote from me above.
Let me be more explicit:
static void __init smp_init(void)
{
unsigned int cpu;
/* FIXME: This should be done in userspace --RR */
for_each_present_cpu(cpu) {
if (num_online_cpus() >= setup_max_cpus)
break;
- if (!cpu_online(cpu))
+ if (smp_cpu_enabled(cpu) && !cpu_online(cpu))
cpu_up(cpu);
}
/* Any cleanup work */
printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus());
smp_cpus_done(setup_max_cpus);
}
and have architectures provide 'smp_cpu_enabled(cpu)' which can either
be a function, inline function or a macro (and therefore possible to be
completely eliminated.)
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
--
I experimented today with an ia64-only solution, keeping track of 'present' vs 'enabled' vs 'online' all in arch-specific code. The arch-specific stuff turns out to be more or less a wash; that is, it's not too hard to keep it all in ia64. However, the problem is, I would still need a generic 'enabled_map' to control whether 'online' and 'crash_notes' entries get created for /sys/devices/system/cpu/cpuN/. So if other archs are at least neutral on this class of CPUs, I can work on another patchset that lowers the tax to a simple #define for archs that don't care. But if people hate this idea of a new map, I'd like to know so that I'm not wasting my time and can work on a different solution (what that would be, I have no idea at the moment). Thanks. /ac --
Populate the cpu_enabled_map correctly.
Note that this patch does not actually make any decisions based
on the contents of the map.
However, as the map is presented via sysfs in:
/sys/devices/system/cpu/
It should be populated correctly.
Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
arch/powerpc/kernel/setup-common.c | 1 +
arch/powerpc/platforms/powermac/smp.c | 1 +
arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 ++
3 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index db540ea..a4c894a 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -413,6 +413,7 @@ void __init smp_setup_cpu_maps(void)
DBG(" thread %d -> cpu %d (hard id %d)\n",
j, cpu, intserv[j]);
cpu_set(cpu, cpu_present_map);
+ cpu_set(cpu, cpu_enabled_map);
set_hard_smp_processor_id(cpu, intserv[j]);
cpu_set(cpu, cpu_possible_map);
cpu++;
diff --git a/arch/powerpc/platforms/powermac/smp.c b/arch/powerpc/platforms/powermac/smp.c
index cb2d894..a74dada 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -317,6 +317,7 @@ static int __init smp_psurge_probe(void)
ncpus = NR_CPUS;
for (i = 1; i < ncpus ; ++i) {
cpu_set(i, cpu_present_map);
+ cpu_set(i, cpu_enabled_map);
set_hard_smp_processor_id(i, i);
}
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 1f03248..e738b07 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -186,6 +186,7 @@ static int pseries_add_processor(struct device_node *np)
for_each_cpu_mask(cpu, tmp) {
BUG_ON(cpu_isset(cpu, cpu_present_map));
cpu_set(cpu, cpu_present_map);
+ cpu_set(cpu, cpu_enabled_map);
...Care to educate me on the difference between online_map and enabled_map ? Cheers, --
enabled_map is closer conceptually to present_map than online_map. present_map are CPUs that are actually plugged in online_map are CPUs that have had cpu_up() called on them; ie. schedulable enabled_map is somewhere inbetween -- the CPUs are plugged in, but we don't want to cpu_up() them. On hp ia64 systems, these CPUs are disabled by system firmware. Currently, a user can only configure/deconfigure the CPUs from the system firmware interface. By providing a sysfs interface for these CPUs, we can allow the user to configure/deconfigure them from userspace. More realistically, higher level managability software now has an OS-level interface to interact with these CPUs. Might this be useful for ppc and your hypervisor based architecture? I could imagine your hypervisor telling the kernel about all the physically present CPUs, but then you would be able to have finer grained control using the enabled_map. I haven't studied your code in depth, so maybe you can just do everything with pure online/offline, but at least on my platforms, there are use-cases where we might want something in-between. Thanks. /ac --
Provide a new sysfs interface for CPU deconfiguration. Since no vendors can agree on terminology for related but slightly different features, provide a method for a platform to implement its own version of what it thinks 'deconfiguring' a CPU might be. Provide an HP-specific CPU deconfiguration implementation. Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Andi Kleen <ak@linux.intel.com> --- drivers/acpi/Kconfig | 18 ++ drivers/acpi/Makefile | 4 drivers/acpi/processor_core.c | 8 + drivers/acpi/processor_deconfigure.c | 275 ++++++++++++++++++++++++++++++++++ include/acpi/processor.h | 6 + 5 files changed, 311 insertions(+), 0 deletions(-) create mode 100644 drivers/acpi/processor_deconfigure.c diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index c52fca8..36ad177 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -188,6 +188,24 @@ config ACPI_HOTPLUG_CPU select ACPI_CONTAINER default y +config ACPI_DECONFIGURE_CPU + bool "Processor deconfiguration" + depends on ACPI_PROCESSOR + default n + help + This processor driver submodule allows a user to mark a CPU + for firmware disabling/enabling. It will create the following + sysfs file: + + /sys/devices/system/cpu/cpuN/deconfigure + + Behavior of this interface is highly vendor-dependent and + requires firmware support. + + This option is NOT required for CPU hotplug support. + + If unsure, say N. + config ACPI_THERMAL tristate "Thermal Zone" depends on ACPI_PROCESSOR diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index 40b0fca..92a5037 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -35,6 +35,10 @@ ifdef CONFIG_CPU_FREQ processor-objs += processor_perflib.o endif +ifdef CONFIG_ACPI_DECONFIGURE_CPU +processor-objs += processor_deconfigure.o +endif + obj-y += sleep/ obj-y += bus.o glue.o obj-y += scan.o diff --git ...
Why are you ccing this to linux-arch? Dropped. What is the standard status of these new SCFG and ECFG tables? Have they been submitted for possible inclusion in ACPI? And is there a spec available? I can't say I'm really thrilled with having HP specific support in there. It would be better at least if you could reserve the table names and then drop the HP DMI check. This is needed anyways, otherwise the Now that seems like weird semantics for a public fixed API. What happens when some other vendor adds hot deconfiguration? My feeling is that this seems to be overly specific to your BIOS and might better belong into some separate management tool. At least until we can define a nice general API for this with clear semantics. For what systems is this anyways? -Andi --
Hm, sorry. I thought it would have been weird to send patches 1-13 / 14 to linux-arch, but not send 14 / 14. Perhaps I should set a Reply-to: These are not new tables -- they are methods that live underneath processor objects in the namespace. Yes, they are specific to HP, but because they are methods, there shouldn't be any collision with other vendors defining methods Yeah, I'm not totally happy with it either. But I'd like to clarify -- are you concerned with the name of the interface (deconfigure) or the "nothing happens until next boot" behavior? Would it help if I renamed it to "enabled" and had something like: echo 0 > enabled echo 1 > enabled cat enabled And then that would map to vendor-specific behavior? Or is it really the "nothing happens until next boot" thing that Well, life would be a lot easier if we had a generic way to poke at ACPI methods, but dev_acpi has been rejected multiple times. ;) On a more serious note, my fear is that an interface like this is not going to have agreement / clear semantics for a long time because one vendor is going to want to call it 'deconfigured' and another one might want to call it 'disabled' and a third might want to call it 'puppy_dogs' and they'll all be kinda related but not exactly the same. That's why I was going down the road of creating at least one generic interface, but with vendor-specific semantics. We have HP ia64 systems shipping today that support this. Thanks. /ac --
Populate the cpu_enabled_map correctly.
Note that this patch does not actually make any decisions based
on the contents of the map.
However, as the map is presented via sysfs in:
/sys/devices/system/cpu/
It should be populated correctly.
There will be a user-visible change under the above directory.
cpuN/ entries for firmware-disabled CPUs will now appear, whereas
before, they did not due to a check against ACPI_MADT_ENABLED.
The cpuN/ entries will be empty, and the online file in the
above directory will reflect which CPUs are actually schedulable.
Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
---
arch/x86/kernel/acpi/boot.c | 7 +++++--
arch/x86/kernel/apic_32.c | 1 +
arch/x86/kernel/apic_64.c | 1 +
arch/x86/kernel/smpboot.c | 2 ++
arch/x86/mach-voyager/voyager_smp.c | 2 ++
arch/x86/xen/smp.c | 1 +
6 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 33c5216..c6dc5da 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -559,8 +559,7 @@ static int __cpuinit _acpi_map_lsapic(acpi_handle handle, int *pcpu)
lapic = (struct acpi_madt_local_apic *)obj->buffer.pointer;
- if (lapic->header.type != ACPI_MADT_TYPE_LOCAL_APIC ||
- !(lapic->lapic_flags & ACPI_MADT_ENABLED)) {
+ if (lapic->header.type != ACPI_MADT_TYPE_LOCAL_APIC) {
kfree(buffer.pointer);
return -EINVAL;
}
@@ -584,6 +583,9 @@ static int __cpuinit _acpi_map_lsapic(acpi_handle handle, int *pcpu)
return -EINVAL;
}
+ if (lapic->lapic_flags & ACPI_MADT_ENABLED)
+ cpu_set(cpu, cpu_enabled_map);
+
cpu = first_cpu(new_map);
*pcpu = cpu;
@@ -601,6 +603,7 @@ int acpi_unmap_lsapic(int cpu)
{
per_cpu(x86_cpu_to_apicid, cpu) = -1;
cpu_clear(cpu, cpu_present_map);
+ cpu_clear(cpu, ...From an x86 standpoint this patchset seems reasonable to me. Acked-by: H. Peter Anvin <hpa@zytor.com> Since it is a panarch patchset it presumably should go via -mm rather than in the arch trees, so I'm not going to add it to -tip. Obviously, if the sematics of the operations don't make sense for other architectures -- which I will leave up to the affected maintainers -- then that should be carefully considered if the generic operations can be done better. -hpa --
Thanks Peter. Let me try and rework the patchset according to Russell's suggestion here: http://lkml.org/lkml/2008/7/18/467 Russell's solution avoids the issue with the ability to #define the check away for archs that don't care. cheers, /ac --
PiBQYXRjaCAxNCBpcyB0aGUgbW9uZXkgcGF0Y2guIEl0IGRlbW9uc3RyYXRlcyB3aHkgd2UgbWln aHQNCj4gd2FudCB0byBnbyB0aHJvdWdoIGFsbCB0aGVzZSBneXJhdGlvbnMuIE5vdyB0aGF0IGlh NjQgcHJlc2VudHMNCj4gKmFsbCogcGh5c2ljYWxseSBwcmVzZW50IENQVXMgaW4gc3lzZnMsIGV2 ZW4gaWYgdGhleSBoYXZlIGJlZW4NCj4gZGlzYWJsZWQgYnkgZmlybXdhcmUsIHdlIGdpdmUgdXNl cnNwYWNlIGEgd2F5IHRvIHBva2UgYXQgdGhvc2UNCj4gQ1BVcy4NCg0KVGhlcmUncyBvbmx5IHRo ZSBvbmUgYml0IGZvciAiZGlzYWJsZWQgYnkgZmlybXdhcmUiIC4uLiBubyBleHRyYQ0Kc3BhY2Ug Zm9yIGFueSBleHRyYSBpbmZvcm1hdGlvbi4gIEhvdyB3b3VsZCB1c2Vyc3BhY2Uga25vdyB0aGF0 DQppdCB3YXMgc2FmZSB0byBwb2tlIGF0IGEgZGlzYWJsZWQgY3B1PyAgUGVyaGFwcyBmaXJtd2Fy ZSBkaXNhYmxlZA0KaXQgZm9yIHNvbWUgdmVyeSBnb29kIHJlYXNvbiwgYW5kIHBva2luZyBhdCBp dCBjb3VsZCBjYXVzZSBzeXN0ZW0NCmluc3RhYmlsaXR5Lg0KDQotVG9ueQ0K --
I didn't include linux-ia64 originally. Sorry about that. Here is the 00/14 cover email describing the patch series: http://lkml.org/lkml/2008/7/14/468 Here is the 12/14 ia64 specific bit: http://lkml.org/lkml/2008/7/14/478 Here is the 14/14 patch that Tony is referring to: http://lkml.org/lkml/2008/7/14/482 My thought here was that it would be a vendor-specific thing. In patch 14/14 I created: /sys/devices/system/cpu/cpuN/deconfigure (although /sys/device/system/cpu/cpuN/enabled would probably be better) I set up 'deconfigure' to have different implementations based on a DMI, so it is very much an opt-in (especially since it's a Kconfig option). It would be the responsibility of the vendor to provide something safe to poke at. In the sample implementation I gave, nothing happens to the system until the *next* reboot, so it shouldn't cause the current boot any distress. A different implementation of deconfigure/enabled could return an error to userspace if an operation was unsafe. /ac --
