One of my co-workers noticed that the powernow-k8 driver no longer restarts when a CPU core is hot-disabled and then hot-enabled on AMD quad-core systems. The following comands work fine on 2.6.26 and fail on 2.6.27-rc1: echo 0 > /sys/devices/system/cpu/cpu3/online echo 1 > /sys/devices/system/cpu/cpu3/online find /sys -name cpufreq For 2.6.26, the find will return a cpufreq directory for each processor. In 2.6.27-rc1, the cpu3 directory is missing. After digging through the code, the following logic is failing when the core is hot-enabled at runtime. The code works during the boot sequence. cpumask_t = current->cpus_allowed; set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu)); if (smp_processor_id() != cpu) return -ENODEV; The objective is to move to the specific CPU and test it's MSRs to see if it is supported. For some reason, it isn't working. Any suggestions on how to fix this are appreciated. -Mark Langdsorf Operating System Research Center AMD --
if it gets called from any of the cpu-hotplug handlers, it won't work now (x86-microcode is another victim). Please give a try to the following patch: http://lkml.org/lkml/2008/7/30/171 does it help? (the explanation is also available in this thread). well, provided we may guarantee that load-balancing has been initialized (it's ok in our case) by the moment CPU_ONLINE gets called, this approach is not that bad perhaps... (and it looks like there is plenty of code that relies on set_cpus_allowed_ptr() being workable in cpu-hotplug-handlers). Although, I personally don't like that much this particular use-case of set_cpus_allowed_ptr() (I posted patches for x86-microcode). btw., last time I briefly looked at various places, there seemed to be a few where old_mask = p->cpus_allowed; set_cpus_allowed_ptr(p, target_cpu); // do something set_cpus_allowed_ptr(p, old_mask); is used just wrongly. e.g. it may race with sched_setaffinity() and -- Best regards, Dmitry Adamushko --
i've queued up the fix below in tip/sched/urgent. Ingo ------------------------> From 3bc8fb8f85b79aa2d6341adb5090799b93d64bc9 Mon Sep 17 00:00:00 2001 From: Dmitry Adamushko <dmitry.adamushko@gmail.com> Date: Wed, 30 Jul 2008 12:34:04 +0200 Subject: [PATCH] sched, cpu hotplug: fix set_cpus_allowed() use in hotplug callbacks a similar problem also affects the microcode driver. So set the CPU active before calling the CPU_ONLINE notifier chain, there are a handful of notifiers that use set_cpus_allowed(). Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/cpu.c | 5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index e202a68..c977c33 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -349,6 +349,8 @@ static int __cpuinit _cpu_up(unsigned int cpu, int tasks_frozen) goto out_notify; BUG_ON(!cpu_online(cpu)); + cpu_set(cpu, cpu_active_map); + /* Now call notifier in preparation. */ raw_notifier_call_chain(&cpu_chain, CPU_ONLINE | mod, hcpu); @@ -383,9 +385,6 @@ int __cpuinit cpu_up(unsigned int cpu) err = _cpu_up(cpu, 0); - if (cpu_online(cpu)) - cpu_set(cpu, cpu_active_map); - out: cpu_maps_update_done(); return err; --
Sorry for the delay in response - we had some building power issues this weekend. Yes, that fixed the issue. -Mark Langsdorf Operating System Research Center AMD --
thanks Mark - i've added your Tested-by line to the commit. Ingo --
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> This fix also solves the problem with x86-microcode. I've sent alternative patches for microcode, but as this "rely on set_cpus_allowed_ptr() being workable in cpu-hotplug(CPU_ONLINE, ...)" thing seems to be more broad than what we thought, perhaps this fix should be applied. With this patch we define that by the moment CPU_ONLINE is being sent, -- Best regards, Dmitry Adamushko --
Ok, not only does that fix the bug, but it simplifies the code and looks obviously ok. However, I don't have it in my tree yet, and I'd like to do an -rc3 that has this fixes (so that along with the PCI MSI thing, we hopefully have most of the suspend/resume regressions fixed). And I was hoping to do -rc3 today. Can I please have pull-requests for the appropriate urgent scheduler/x86 fixes? Or should I just take these as patches? Linus --
i'll send pull requests for all pending patches. I wanted to send them tomorrow originally but will do them now. Ingo --
I actually thought it's somewhat against the original idea. It seems that we'd be setting 'active' be a little too early. ie before all the hotplug handlers had a chance to realize that cpu is now online. It'd be nice if -rc3 included my cpuset patch so that we could put circular locking issues in the cpu hotplug path to the rest. Ingo, I'm talking about this: [PATCH] cpuset: Rework sched domains and CPU hotplug handling (take 4) Max Max --
the latest (-v4) version of the patch was submitted just half an hour ago and it's rather large/complex, with a few unrelated changes (whitespace, etc.) mixed in as well. I'd like to wait for Paul's final ack for -v4 (he has already agreed with the approach in general), and wanted to have it tested myself as well, at least minimally. Ingo --
Fair enough. btw Whitespace and other cosmetic changes were requested by reviewers. Max --
yeah - it just makes it a tiny bit harder decision whether to queue up a patch in the urgent path. It's better to keep cleanups separate - that way any typos and unintended bugs in cleanups are more obvious as well. (because later on a person debugging a breakage does not have to wonder about whether a change's side-effects were intended or not.) But your patch certainly looks OK standalone as well, just IMO not as a very-last-minute patch. (No strong feelings though, your patch should not break anything in the normal !CPUSETS or the CPUSETS+no-cpuset-used usecases.) Ingo --
