From: Max Krasnyansky <maxk@qualcomm.com> For some reason we had redundant parsers registered for maxcpus=. One in init/main.c and another in arch/x86/smpboot.c So I nuked the one in arch/x86. Also 64-bit kernels used to handle maxcpus= as documented in Documentation/cpu-hotplug.txt. CPUs with 'id > maxcpus' are initialized but not booted. 32-bit version for some reason ignored them even though all the infrastructure for booting them later is there. In the current mainline both 64 and 32 bit versions are broken. I'm too lazy to look through git history but I'm guessing it happened as part of the i386 and x86_64 unification. This patch restores the correct behaviour. I've tested x86_64 version on 4- and 8- way Core2 and 2-way Opteron based machines. Various config combinations SMP, !SMP, CPU_HOTPLUG, !CPU_HOTPLUG. Booted with maxcpus=1 and maxcpus=4, etc. Everything is working as expected. I cannot test 32-bit version (no 32-bit machines here). Signed-off-by: Max Krasnyansky <maxk@qualcomm.com> Cc: lizf@cn.fujitsu.com Cc: jeff.chua.linux@gmail.com --- arch/x86/kernel/apic_32.c | 8 -------- arch/x86/kernel/apic_64.c | 7 ------- arch/x86/kernel/smpboot.c | 14 -------------- 3 files changed, 0 insertions(+), 29 deletions(-) diff --git a/arch/x86/kernel/apic_32.c b/arch/x86/kernel/apic_32.c index d6c8983..71d49cc 100644 --- a/arch/x86/kernel/apic_32.c +++ b/arch/x86/kernel/apic_32.c @@ -1454,8 +1454,6 @@ void disconnect_bsp_APIC(int virt_wire_setup) } } -unsigned int __cpuinitdata maxcpus = NR_CPUS; - void __cpuinit generic_processor_info(int apicid, int version) { int cpu; @@ -1482,12 +1480,6 @@ void __cpuinit generic_processor_info(int apicid, int version) return; } - if (num_processors >= maxcpus) { - printk(KERN_WARNING "WARNING: maxcpus limit of %i reached." - " Processor ignored.\n", maxcpus); - return; - } - num_processors++; cpus_complement(tmp_map, cpu_present_map); cpu = first_cpu(tmp_map); diff --git ...
I just realized that I managed to screw up my From: email address in git-send-email command line. Please reply to maxk@qualcomm.com. Max --
I booted my 2-core x86_32 box with maxcpus=1, and saw cpu1 was offline, and then I got softlockup BUG immediately when I onlined cpu1: SMP alternatives: switching to SMP code CPU 1 irqstacks, hard=c078c000 soft=c076c000 Booting processor 1/1 ip 6000 Initializing CPU#1 Calibrating delay using timer specific routine.. 5600.37 BogoMIPS (lpj=2800188) CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 1024K CPU: Physical Processor ID: 0 CPU: Processor Core ID: 1 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#1. CPU1: Intel P4/Xeon Extended MCE MSRs (24) available CPU1: Thermal monitoring enabled CPU1: Intel(R) Pentium(R) D CPU 2.80GHz stepping 04 checking TSC synchronization [CPU#0 -> CPU#1]: passed. Switched to high resolution mode on CPU 1 BUG: soft lockup - CPU#1 stuck for 216s! [events/0:0] Modules linked in: bridge stp llc autofs4 dm_mirror dm_log dm_mod snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_timer snd soundcore r8169 snd_page_alloc sg button sata_sis pata_sis ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan] irq event stamp: 156 hardirqs last enabled at (155): [<c044407f>] trace_hardirqs_on+0xb/0xd hardirqs last disabled at (156): [<c04eee88>] trace_hardirqs_off_thunk+0xc/0x10 softirqs last enabled at (152): [<c042c2f3>] __do_softirq+0xe3/0xe9 softirqs last disabled at (95): [<c04058eb>] do_softirq+0x65/0xb4 Pid: 0, comm: events/0 Not tainted (2.6.27-rc1 #224) EIP: 0060:[<c04088ba>] EFLAGS: 00000246 CPU: 1 EIP is at mwait_idle+0x3c/0x4a EAX: 00000000 EBX: e3e48008 ECX: 00000000 EDX: 00000000 ESI: 00000000 EDI: 00000000 EBP: e3e48f9c ESP: e3e48f98 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: 00000000 CR3: 00768000 CR4: 000006d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 [<c0402591>] cpu_idle+0xbf/0xdf ...
Test on Lenovo X60s (CoreDuo) 2 CPUs with maxcpus=1, and Dell R900 (Xeon QuadCore x4) 16 CPUs with maxcpus=8. All working. I can switch online and offline up to 2CPUs on X60s, and I'm using latest linux kernel. Thanks, Jeff. --
Excellent. Andrew, Ingo, can one of you guys please push this patch to mainline. Thanx Max --
This is unrelated to the patch that I sent. In fact looks like the patch actually worked for you. In the sense that it did the right thing, initialized cpus but did not boot them. As far as the soft-lockup goes you might want to try different configs. ie Disable features you do not need. For example cpusets hotplug path in the current mainline is unsafe (the patch is in review). Also for me if ftrace is enabled onlining a cpu causes immediate reboot. So I'd say start disabling features and see which one cases the problem. Max --
Yes, the patch works for me, and the soft-lockup is another different issue. Thx for the explanation. :) --
yes in essence. 32-bit always had maxcpus as a hard restriction in the
number of CPUs. This got extended to 64-bit as well, via commit
89b08200ad:
x86: make x86_64 accept the max_cpus parameter
in v2.6.25. Two major kernel releases and nobody noticed - it's a rarely
This will need some test time on 32-bit as that is where this represents
a material change. ( albeit what matters most is the maxcpus=1
distinction - and for that nosmp can be used as well to turn off
multi-cpu support altogether. So we could do this in v2.6.27 as well. )
Also, a quick grep shows that your patch will very likely break the
visws build:
arch/x86/kernel/visws_quirks.c:extern unsigned int __cpuinitdata maxcpus;
arch/x86/kernel/visws_quirks.c: if (ncpus > maxcpus)
arch/x86/kernel/visws_quirks.c: ncpus = maxcpus;
could you please fix that?
Ingo
--
So far we got a couple of reports that it works as expected on 32 (both laptop Oh. I missed that one. Will fix. Max --
Yes, but the usecase i'm worried about is when say maxcpus=1 was used to _prevent_ an SMP bootup - because the system would not work otherwise. i guess we want to tickle those systems anyway as that case is not supposed to happen (and it can always be totally disabled via nosmp or noapic). So i'm not against your fix/change per se, i just wanted to highlight that it has some impact on existing uses of maxcpus that is outside of your cpu-hotplug usecase. Ingo --
I see what you mean. I think it's fairly safe though since we do not actually do much for the cpus that are not going to be brought online. Mainly just setting cpu_*_map and initializing per cpu areas. If something is broken in there we'd probably want to fix that asap anyway. And like you said nosmp does the job too. Max --
btw I think it's rarely used because many people do not realize it's there. There are at least a couple of use cases that came up recently. - Busted cpu. You can boot the machine with maxcpus=1 and then bring up cpus one by one to see which one is busted. - Recently reported regression that 16cpu box booted fine with NRCPUS=8 but failed with NRCPUS=16. Again we can boot with maxcpus=8 and bring other cpus later to see when/where we fail. Things like that. Max --
