On Wed, Aug 6, 2008 at 5:42 PM, Jeff Chua <jeff.chua.linux@gmail.com>

It works. Booted with 16 CPUs, 32GB RAM.

CPU0 L7345 1.86GHz 0C
CPU1 L7345 1.86GHz 0C
CPU2 L7345 1.86GHz 0C
CPU3 L7345 1.86GHz 0C
CPU4 L7345 1.86GHz 0C
CPU5 L7345 1.86GHz 0C
CPU6 L7345 1.86GHz 0C
CPU7 L7345 1.86GHz 0C
CPU8 L7345 1.86GHz 0C
CPU9 L7345 1.86GHz 0C
CPU10 L7345 1.86GHz 0C
CPU11 L7345 1.86GHz 0C
CPU12 L7345 1.86GHz 0C
CPU13 L7345 1.86GHz 0C
CPU14 L7345 1.86GHz 0C
CPU15 L7345 1.86GHz 0C

So it works, but setting the config is not obvious.

And should CONFIG_X86_PC be considered as well as CONFIG_X86_GENERICARCH?
With CONFIG_X86_PC, I can set CONFIG_SPARSEMEM=y.
With CONFIG_X86_GENERICARCH, CONFIG_SPARSEMEM depends on CONFIG_NUMA.

I'm using the patch below to enable sparsemem instead of flatmem, but
I don't know what impact it has. The system booted and is running.

It would be nice to automatically default CONFIG_X86_BIGSMP with CPUs > 8,
but I don't know how to do that.

Thanks,
Jeff.

--- linux/arch/x86/Kconfig.org	2008-08-06 18:41:08 +0800
+++ linux/arch/x86/Kconfig	2008-08-06 18:48:13 +0800
@@ -1035,7 +1035,7 @@
 config ARCH_FLATMEM_ENABLE
 	def_bool y
-	depends on X86_32 && ARCH_SELECT_MEMORY_MODEL && X86_PC && !NUMA
+	depends on X86_32 && ARCH_SELECT_MEMORY_MODEL && !NUMA
 
 config ARCH_DISCONTIGMEM_ENABLE
 	def_bool y
@@ -1051,7 +1051,7 @@
 config ARCH_SPARSEMEM_ENABLE
 	def_bool y
-	depends on X86_64 || NUMA || (EXPERIMENTAL && X86_PC)
+	depends on X86_64 || NUMA || (EXPERIMENTAL && X86_PC) || X86_GENERICARCH
 	select SPARSEMEM_STATIC if X86_32
 	select SPARSEMEM_VMEMMAP_ENABLE if X86_64
--
actually x86_pc is one mode of genericarch. genericarch can already
detect pc, bigsmp, numaq, es7000, visws...

hope later we can change mach_default to default. but the embedded guys
may want to keep it as a separate one.

in the dmesg when booting x86_pc only, we already have a warning telling
you to set bigsmp if you have more than 8 cpus.

YH
--
It seems that to get "sparse mem", NUMA must be set first, but this is not

With more than 8 CPUs, the boot hangs and Shift+PgUp does not work, so
it's not possible to view console messages except those on the current
page. I guess that's why I missed that hint.

Jeff.
--
thanks, applied.

i'm wondering: with that patch applied, does a working 2.6.26 .config,
put through 'make oldconfig', boot fine on your box now?

Any make oldconfig breakage is a regression we want to fix. We want
upgrades between kernel versions to be seamless and complete.

	Ingo
--
btw., could you please check that v2.6.27-rc3 (or later) kernels boot
fine (with about 8 cpus) even if you have genericarch/bigsmp disabled,
and do not silently hang as happened on your box before?

	Ingo
--
With 16 CPUs, it still hangs, but now the console is showing the errors
as intended.

... but is it supposed to hang?

More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
Booting processor 8/1 ip 6000
Initializing CPU#8
Calibrating delay using timer specific routine.. 3723.88 BogoMIPS (lpj=7447763)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU8: Intel(R) Xeon(R) CPU L7345 @ 1.86GHz stepping 0b
checking TSC synchronization [CPU#0 -> CPU#8]: passed.

*** HANGS HERE ***

Thanks,
Jeff.
--
I tried with just CONFIG_NR_CPUS=8 and this time it booted, but the
strange thing is I only see 2 CPUs! To be more precise, this is without
both CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.

And when I tried to enable the CPUs, it complained:

# cat cpu6/online
0
# echo 1 > cpu6/online
More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
-bash: echo: write error: Input/output error

Prior to the patch, the system booted with all 8 CPUs.

Again, if I enable both CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP, I
get all 16 CPUs.

Thanks,
Jeff.
--
Yinghai, could the APIC ID enumeration be nonsequential, so that we
already skip CPUs starting at the third one? I think we should accept
all CPUs that are within our supported range.

	Ingo
--
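To illustrate the question above, here is a minimal userspace sketch (not kernel code) that contrasts two ways of enforcing an 8-CPU limit: rejecting CPUs by APIC ID value versus accepting CPUs by count. The names MAX_CPUS_NO_BIGSMP, accept_by_apicid() and accept_by_count() are hypothetical and introduced only for this example; the limit of 8 is taken from the warning message in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant: the 8-CPU limit named in the warning message. */
#define MAX_CPUS_NO_BIGSMP 8

/*
 * Policy A (hypothetical): reject every CPU whose APIC ID is at or above
 * the limit. Returns how many of the n enumerated CPUs are accepted.
 */
size_t accept_by_apicid(const unsigned *apicids, size_t n)
{
	size_t accepted = 0;

	assert(apicids != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (apicids[i] < MAX_CPUS_NO_BIGSMP)
			accepted++;
	return accepted;
}

/*
 * Policy B (hypothetical): accept CPUs in enumeration order until the
 * limit is reached, whatever their APIC ID values are.
 */
size_t accept_by_count(const unsigned *apicids, size_t n)
{
	assert(apicids != NULL || n == 0);
	return n < MAX_CPUS_NO_BIGSMP ? n : MAX_CPUS_NO_BIGSMP;
}
```

With a made-up nonsequential enumeration such as {0, 1, 8, 9, 16, 17, 24, 25}, policy A accepts only the first two CPUs while policy B accepts all eight. Whether Jeff's box actually enumerates its APIC IDs like this is an assumption, not something the thread confirms.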
will try to clear those bits on smp_sanity_check...

YH
--
jeff,

please check the attached patch. it should fix the new regression and
will not hang.

YH
OK, it booted up and did not hang, but the messages below don't show up
anywhere. I've tested with CONFIG_NR_CPUS=16 and 8 as well. I just got 8
CPUs.
More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
# cat /sys/devices/system/cpu/possible
0-7
CONFIG_X86_32=y
CONFIG_X86_PC=y
Looks like it's not going into this condition:

+	if (def_to_bigsmp && nr_cpu_ids > 8) {

Shall this be put back so that it'll show the message?

-	if (def_to_bigsmp && apicid > 8) {
-		printk(KERN_WARNING
-			"More than 8 CPUs detected - skipping them.\n"
-			"Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.\n");
-	}
Thanks,
Jeff.
--
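The check of /sys/devices/system/cpu/possible above could be scripted along the following lines. This is a minimal sketch that counts the CPUs in a sysfs-style CPU list string such as "0-7"; count_cpulist() is a hypothetical helper written for this example, not an existing kernel or libc function, and it assumes the input is a comma-separated list of decimal numbers and ranges.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Count the CPUs named in a CPU list such as "0-7" or "0-1,4-7".
 * A trailing newline, as read from sysfs, is tolerated.
 * Returns -1 on malformed input.
 */
int count_cpulist(const char *s)
{
	int total = 0;

	assert(s != NULL);
	while (*s != '\0' && *s != '\n') {
		char *end;
		long lo, hi;

		/* Parse the first (or only) number of this entry. */
		lo = strtol(s, &end, 10);
		if (end == s || lo < 0)
			return -1;
		hi = lo;
		s = end;

		/* Optional "-hi" part turns the entry into a range. */
		if (*s == '-') {
			s++;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo)
				return -1;
			s = end;
		}
		total += (int)(hi - lo + 1);

		/* Entries are separated by commas. */
		if (*s == ',')
			s++;
		else if (*s != '\0' && *s != '\n')
			return -1;
	}
	return total;
}
```

Feeding it the "0-7" string reported above gives 8, which matches the "Just got 8 cpus" observation; a box booted with bigsmp and all 16 CPUs would presumably report a list that counts to 16.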
double checked on a 16-core system and got:

CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU 0(4) -> Core 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
using C1E aware idle routine
Checking 'hlt' instruction... OK.
ACPI: Core revision 20080609
Parsing all Control Methods:
Table [DSDT](id 0001) - 1289 Objects with 114 Devices 462 Methods 26 Regions
Parsing all Control Methods:
Table [SSDT](id 0002) - 80 Objects with 0 Devices 0 Methods 0 Regions
 tbxface-0596 [00] tb_load_namespace : ACPI Tables successfully acquired
evxfevnt-0091 [00] enable : Transition to ACPI mode successful
More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
enabled ExtINT on CPU#0

YH
--
could you post the full dmesg? And the modified patch that you've
tested, the one that both brings up 8 CPUs without bigsmp and shows the
printk?

	Ingo
--
