Re: [PATCH 13/16] x86: Unify cpu/apicid <-> NUMA node mapping between 32 and 64bit

Previous thread: [RFC][PATCH v2 0/6] eCryptfs: added support for the encrypted key type by Roberto Sassu on Tuesday, December 28, 2010 - 3:48 am. (6 messages)

Next thread: Drift when measuring time using add_timer by Mohan V on Tuesday, December 28, 2010 - 5:05 am. (5 messages)
From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

Hello,

This is a repost of 3rd take of unify-x86_32-and-64-NUMA-init-paths
patchset which was initially posted on 11st Dec but didn't get
archived anywhere because vger had its disk full at the time.  Nothing
has changed since the first posting.

Most changes from the last take[L] are to reflect comments from tglx.

* Rebased on top of tip/x86/apic-cleanup.

* &quot;x86: apic: Cleanup and simplify setup_local_APIC()&quot; is already
  merged into x86/apic-cleanup and dropped from this series.

* Instead of repurposing apic-&gt;cpu_to_logical_apicid(), replace it
  with a new 32bit specific callback
  apic-&gt;x86_32_early_logical_apicid().

* For consistency, in 0012, apic-&gt;apicid_to_node() is renamed to
  x86_32_numa_cpu_node() and put inside CONFIG_X86_32.

* NULL assignments for unimplemented apic ops dropped.

* 0013 now explicitly describes why amd srat_detect_node() accesses
  __apicid_to_node[] directly.  Also, add a comment in the function
  noting the ugliness.

* Patch description updated in 0014.

* Other misc updates.

This patchset started as an attempt to fix percpu setup problem on
x86_32 NUMA configurations reported by Eric Dumazet.  On x86_32, NUMA
configuration is initialized during SMP bringup which happens way
later than percpu setup.  As percpu setup code doesn't have access to
NUMA configuration, it gets set up as if the system is UMA.

The NUMA init paths differ subtly yet significantly between x86_32 and
64 making it quite difficult to follow.  Custom apic configurations on
x86_32 worsens the problem.

There is no fundamental reason why x86_32 can't be initialized in the
same steps as x86_64.  They just were written differently and just
weren't unified during x86_32/64 code merge.  This patchset unifies
them and in the process fixes percpu setup problem on 32bit NUMA.

This patchset contains the following sixteen patches.

 0001-x86-Kill-unused-static-boot_cpu_logical_apicid-in-sm.patch
 0002-x86-Rename-x86_32-MAX_APICID-to-MAX_LOCAL_APIC.patch
 ...
From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Pekka Enberg &lt;penberg@kernel.org&gt;
---
 arch/x86/kernel/smpboot.c |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 083e99d..4de6a00 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -165,8 +165,6 @@ static void unmap_cpu_to_node(int cpu)
 #endif
 
 #ifdef CONFIG_X86_32
-static int boot_cpu_logical_apicid;
-
 u8 cpu_2_logical_apicid[NR_CPUS] __read_mostly =
 					{ [0 ... NR_CPUS-1] = BAD_APICID };
 
@@ -1101,9 +1099,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	 * Setup boot CPU information
 	 */
 	smp_store_cpu_info(0); /* Final full version of the data */
-#ifdef CONFIG_X86_32
-	boot_cpu_logical_apicid = logical_smp_processor_id();
-#endif
+
 	current_thread_info()-&gt;cpu = 0;  /* needed? */
 	for_each_possible_cpu(i) {
 		zalloc_cpumask_var(&amp;per_cpu(cpu_sibling_map, i), GFP_KERNEL);
-- 
1.7.1

--

From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

Unlike x86_64, on x86_32, the mapping from cpu to logical apicid may
vary depending on apic in use.  cpu_2_logical_apicid[] array is used
for this mapping.  Replace it with early percpu variable
x86_cpu_to_logical_apicid to make it better aligned with other
mappings.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
---
 arch/x86/include/asm/apic.h      |    4 ----
 arch/x86/include/asm/smp.h       |    3 +++
 arch/x86/kernel/apic/apic.c      |   11 +++++++++++
 arch/x86/kernel/apic/es7000_32.c |    2 +-
 arch/x86/kernel/apic/numaq_32.c  |    2 +-
 arch/x86/kernel/apic/summit_32.c |    4 ++--
 arch/x86/kernel/setup_percpu.c   |    7 +++++++
 arch/x86/kernel/smpboot.c        |    7 ++-----
 8 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 5e3969c..eb139ec 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -595,8 +595,4 @@ extern int default_check_phys_apicid_present(int phys_apicid);
 
 #endif /* CONFIG_X86_LOCAL_APIC */
 
-#ifdef CONFIG_X86_32
-extern u8 cpu_2_logical_apicid[NR_CPUS];
-#endif
-
 #endif /* _ASM_X86_APIC_H */
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 4c2f63c..dc7c46a 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -38,6 +38,9 @@ static inline struct cpumask *cpu_core_mask(int cpu)
 
 DECLARE_EARLY_PER_CPU(u16, x86_cpu_to_apicid);
 DECLARE_EARLY_PER_CPU(u16, x86_bios_cpu_apicid);
+#if defined(CONFIG_SMP) &amp;&amp; defined(CONFIG_X86_32)
+DECLARE_EARLY_PER_CPU(int, x86_cpu_to_logical_apicid);
+#endif
 
 /* Static state in head.S used to set up a CPU */
 extern struct {
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index c0f6426..ba78b1e 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -79,6 +79,17 @@ EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_apicid);
 EXPORT_EARLY_PER_CPU_SYMBOL(x86_bios_cpu_apicid);
 
 #ifdef CONFIG_X86_32
+
+#ifdef ...
From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

Currently, cpu -&gt; logical apic id translation is done by
apic-&gt;cpu_to_logical_apicid() callback which may or may not use
x86_cpu_to_logical_apicid.  This is unnecessary as it should always
equal logical_smp_processor_id() which is known early during CPU bring
up.

Initialize x86_cpu_to_logical_apicid after apic-&gt;init_apic_ldr() in
setup_local_APIC() and always use x86_cpu_to_logical_apicid for cpu -&gt;
logical apic id mapping.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
---
 arch/x86/kernel/apic/apic.c |    8 ++++++++
 arch/x86/kernel/apic/ipi.c  |    8 ++++----
 arch/x86/kernel/smpboot.c   |    7 +++----
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index ba78b1e..8ad231c 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1252,6 +1252,14 @@ void __cpuinit setup_local_APIC(void)
 	 */
 	apic-&gt;init_apic_ldr();
 
+#ifdef CONFIG_X86_32
+	/*
+	 * APIC LDR is initialized.  Fetch and store logical_apic_id.
+	 */
+	early_per_cpu(x86_cpu_to_logical_apicid, cpu) =
+		logical_smp_processor_id();
+#endif
+
 	/*
 	 * Set Task Priority to 'accept all'. We never change this
 	 * later on.
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 5037736..cce91bf 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -73,8 +73,8 @@ void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
 	local_irq_save(flags);
 	for_each_cpu(query_cpu, mask)
 		__default_send_IPI_dest_field(
-			apic-&gt;cpu_to_logical_apicid(query_cpu), vector,
-			apic-&gt;dest_logical);
+			early_per_cpu(x86_cpu_to_logical_apicid, query_cpu),
+			vector, apic-&gt;dest_logical);
 	local_irq_restore(flags);
 }
 
@@ -92,8 +92,8 @@ void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
 		if (query_cpu == this_cpu)
 			continue;
 		__default_send_IPI_dest_field(
-			apic-&gt;cpu_to_logical_apicid(query_cpu), ...
From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

Implement x86_32_early_logical_apicid() for the default apic flat
routing.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
---
 arch/x86/kernel/apic/probe_32.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index 40be7c3..0f9a9ab 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -77,6 +77,11 @@ void __init default_setup_apic_routing(void)
 		apic-&gt;setup_apic_routing();
 }
 
+static int default_x86_32_early_logical_apicid(int cpu)
+{
+	return 1 &lt;&lt; cpu;
+}
+
 static void setup_apic_flat_routing(void)
 {
 #ifdef CONFIG_X86_IO_APIC
@@ -167,7 +172,7 @@ struct apic apic_default = {
 	.wait_icr_idle			= native_apic_wait_icr_idle,
 	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
 
-	.x86_32_early_logical_apicid	= noop_x86_32_early_logical_apicid,
+	.x86_32_early_logical_apicid	= default_x86_32_early_logical_apicid,
 };
 
 extern struct apic apic_numaq;
-- 
1.7.1

--

From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

On x86_32, the mapping between cpu and logical apic ID differs
depending on the specific apic implementation in use.  The mapping is
initialized while bringing up CPUs; however, this makes early inits
ignore memory topology.

Add a x86_32 specific apic-&gt;x86_32_early_logical_apicid() which is
called early during boot to query the mapping.  The mapping is later
verified against the result of init_apic_ldr().  The method is allowed
to return BAD_APICID if it can't be determined early.

noop variant which always returns BAD_APICID is implemented and added
to all x86_32 apic implementations.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
---
 arch/x86/include/asm/apic.h      |   19 +++++++++++++++++++
 arch/x86/kernel/apic/apic.c      |   12 ++++++++++--
 arch/x86/kernel/apic/apic_noop.c |    4 ++++
 arch/x86/kernel/apic/bigsmp_32.c |    2 ++
 arch/x86/kernel/apic/es7000_32.c |    4 ++++
 arch/x86/kernel/apic/numaq_32.c  |    2 ++
 arch/x86/kernel/apic/probe_32.c  |    2 ++
 arch/x86/kernel/apic/summit_32.c |    2 ++
 8 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index d1aa0c3..efb073b 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -354,6 +354,20 @@ struct apic {
 	void (*icr_write)(u32 low, u32 high);
 	void (*wait_icr_idle)(void);
 	u32 (*safe_wait_icr_idle)(void);
+
+#ifdef CONFIG_X86_32
+	/*
+	 * Called very early during boot from get_smp_config().  It should
+	 * return the logical apicid.  x86_[bios]_cpu_to_apicid is
+	 * initialized before this function is called.
+	 *
+	 * If logical apicid can't be determined that early, the function
+	 * may return BAD_APICID.  Logical apicid will be configured after
+	 * init_apic_ldr() while bringing up CPUs.  Note that NUMA affinity
+	 * won't be applied properly during early boot in this case.
+	 */
+	int (*x86_32_early_logical_apicid)(int cpu);
+#endif
 };
 
 /*
@@ -501,6 +515,11 @@ extern struct apic apic_noop;
 
 ...
From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

The mapping between cpu/apicid and node is done via apicid_to_node[]
on 64bit and apicid_2_node[] + apic-&gt;x86_32_numa_cpu_node() on 32bit.
This difference makes it difficult to further unify 32 and 64bit NUMA
handling.

This patch unifies it by replacing both apicid_to_node[] and
apicid_2_node[] with __apicid_to_node[] array, which is accessed by
two accessors - set_apicid_to_node() and numa_cpu_node().  On 64bit,
numa_cpu_node() always consults __apicid_to_node[] directly while
32bit goes through apic-&gt;numa_cpu_node() method to allow apic
implementations to override it.

srat_detect_node() for amd cpus contains workaround for broken NUMA
configuration which assumes relationship between APIC ID, HT node ID
and NUMA topology.  Leave it to access __apicid_to_node[] directly as
mapping through CPU might result in undesirable behavior change.  The
comment is reformatted and updated to note the ugliness.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Pekka Enberg &lt;penberg@kernel.org&gt;
---
 arch/x86/include/asm/mpspec.h  |    1 -
 arch/x86/include/asm/numa.h    |   28 +++++++++++++++++++++++
 arch/x86/include/asm/numa_32.h |    6 +++++
 arch/x86/include/asm/numa_64.h |    5 +--
 arch/x86/kernel/acpi/boot.c    |    3 +-
 arch/x86/kernel/apic/apic.c    |    2 +-
 arch/x86/kernel/cpu/amd.c      |   47 +++++++++++++++++++++++++--------------
 arch/x86/kernel/cpu/intel.c    |    3 +-
 arch/x86/kernel/smpboot.c      |    6 +----
 arch/x86/mm/amdtopology_64.c   |    2 +-
 arch/x86/mm/numa.c             |    6 ++++-
 arch/x86/mm/numa_32.c          |    6 +++++
 arch/x86/mm/numa_64.c          |   18 +++++++-------
 arch/x86/mm/srat_32.c          |    2 +-
 arch/x86/mm/srat_64.c          |   10 ++++----
 15 files changed, 97 insertions(+), 48 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index 018ffc1..ae78732 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -24,7 +24,6 @@ extern int pic_mode;
 ...
From: David Rientjes
Date: Tuesday, December 28, 2010 - 1:35 pm

This is going to conflict with a387e95a (&quot;&quot;) in x86/numa, so you'll need 
the following hunk for acpi_fake_nodes().  I'm not sure why this patchset 
is being based on x86/apic-cleanup rather than x86/numa?

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -511,7 +511,7 @@ void __init acpi_fake_nodes(const struct bootnode *fake_nodes, int num_nodes
                 * node, it must now point to the fake node ID.
                 */
                for (j = 0; j &lt; MAX_LOCAL_APIC; j++)
-                       if (apicid_to_node[j] == nid &amp;&amp;
+                       if (__apicid_to_node[j] == nid &amp;&amp;
                            fake_apicid_to_node[j] == NUMA_NO_NODE)
                                fake_apicid_to_node[j] = i;
        }
@@ -522,13 +522,13 @@ void __init acpi_fake_nodes(const struct bootnode *fake_nodes, int num_nod
         * value.
         */
        for (i = 0; i &lt; MAX_LOCAL_APIC; i++)
-               if (apicid_to_node[i] != NUMA_NO_NODE &amp;&amp;
+               if (__apicid_to_node[i] != NUMA_NO_NODE &amp;&amp;
                    fake_apicid_to_node[i] == NUMA_NO_NODE)
                        fake_apicid_to_node[i] = 0;
 
        for (i = 0; i &lt; num_nodes; i++)
                __acpi_map_pxm_to_node(fake_node_to_pxm_map[i], i);
-       memcpy(apicid_to_node, fake_apicid_to_node, sizeof(apicid_to_node));
+       memcpy(__apicid_to_node, fake_apicid_to_node, sizeof(__apicid_to_node));
 
        nodes_clear(nodes_parsed);
        for (i = 0; i &lt; num_nodes; i++)
--

From: Tejun Heo
Date: Wednesday, December 29, 2010 - 3:52 am

Because several patches from the patchset have already been committed
into x86/apic-cleanup.  Thomas, Peter, if this needs to be rebased
somewhere, let me know.

Thanks.

-- 
tejun
--

From: H. Peter Anvin
Date: Wednesday, December 29, 2010 - 12:36 pm

x86/numa is already a dependent branch (on x86/amd-nb), so I'm fine
merging x86/apic-cleanup into that branch and then if you could rebase
it on top of the new x86/numa then I'll push it out tomorrow.

OK?

	-hpa
--

From: H. Peter Anvin
Date: Wednesday, December 29, 2010 - 3:05 pm

I have pushed out this merge onto x86/numa, so waiting for your rebase.
 If you don't have time, then I'll rebase.

	-hpa
--

From: Tejun Heo
Date: Thursday, December 30, 2010 - 4:33 am

Will rebase &amp; repost soon.

Thanks.

-- 
tejun
--

From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

Unlike 64bit, 32bit has been using its own cpu_to_node_map[] for CPU
-&gt; NUMA node mapping.  Replace it with early_percpu variable
x86_cpu_to_node_map and share the mapping code with 64bit.

* USE_PERCPU_NUMA_NODE_ID is now enabled for 32bit too.

* x86_cpu_to_node_map and numa_set/clear_node() are moved from numa_64
  to numa.  For now, on 32bit, x86_cpu_to_node_map is initialized with
  0 instead of NUMA_NO_NODE.  This is to avoid introducing unexpected
  behavior change and will be updated once init path is unified.

* srat_detect_node() is now enabled for x86_32 too.  It calls
  numa_set_node() and initializes the mapping making explicit
  cpu_to_node_map[] updates from map/unmap_cpu_to_node() unnecessary.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
---
 arch/x86/Kconfig                |    2 +-
 arch/x86/include/asm/numa.h     |    8 ++++
 arch/x86/include/asm/numa_64.h  |    4 --
 arch/x86/include/asm/topology.h |   17 ---------
 arch/x86/kernel/acpi/boot.c     |    5 ---
 arch/x86/kernel/cpu/amd.c       |    4 +-
 arch/x86/kernel/cpu/intel.c     |    2 +-
 arch/x86/kernel/setup_percpu.c  |    4 +-
 arch/x86/kernel/smpboot.c       |    6 ---
 arch/x86/mm/numa.c              |   72 ++++++++++++++++++++++++++++++++++++++-
 arch/x86/mm/numa_64.c           |   65 -----------------------------------
 11 files changed, 85 insertions(+), 104 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 58ecad5..8d9dd6c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1707,7 +1707,7 @@ config HAVE_ARCH_EARLY_PFN_TO_NID
 	depends on NUMA
 
 config USE_PERCPU_NUMA_NODE_ID
-	def_bool X86_64
+	def_bool y
 	depends on NUMA
 
 menu &quot;Power management and ACPI options&quot;
diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index 5e01c76..2b21fff 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -30,4 +30,12 @@ static inline void set_apicid_to_node(int apicid, s16 node)
 # include &quot;numa_64.h&quot;
 #endif
 
+#ifdef ...
From: David Rientjes
Date: Tuesday, December 28, 2010 - 1:39 pm

This is also going to conflict with c1c3443c (&quot;x86, numa: Fake 
node-to-cpumask for NUMA emulation&quot;) in x86/numa, but it's trival to merge 
because there's no actual implementation change.
--

From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

Now that everything else is unified, NUMA initialization can be
unified too.

* numa_init_array() and init_cpu_to_node() are moved from numa_64 to
  numa.

* numa_32::initmem_init() is updated to call numa_init_array() and
  setup_arch() to call init_cpu_to_node() on 32bit too.

* x86_cpu_to_node_map is now initialized to NUMA_NO_NODE on 32bit too.
  This is safe now as numa_init_array() will initialize it early
  during boot.

This makes NUMA mapping fully initialized before setup_per_cpu_areas()
on 32bit too and thus makes the first percpu chunk which contains all
the static variables and some of dynamic area allocated with NUMA
affinity correctly considered.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Reviewed-by: Pekka Enberg &lt;penberg@kernel.org&gt;
---
 arch/x86/include/asm/numa.h    |    4 ++
 arch/x86/include/asm/numa_64.h |    3 --
 arch/x86/kernel/setup.c        |    2 -
 arch/x86/mm/numa.c             |   76 +++++++++++++++++++++++++++++++++++++--
 arch/x86/mm/numa_32.c          |    1 +
 arch/x86/mm/numa_64.c          |   75 ---------------------------------------
 6 files changed, 77 insertions(+), 84 deletions(-)

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index 46f3e0d..927e12f 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -34,6 +34,8 @@ static inline void set_apicid_to_node(int apicid, s16 node)
 #ifdef CONFIG_NUMA
 extern void __cpuinit numa_set_node(int cpu, int node);
 extern void __cpuinit numa_clear_node(int cpu);
+extern void __init numa_init_array(void);
+extern void __init init_cpu_to_node(void);
 
 # ifdef CONFIG_DEBUG_PER_CPU_MAPS
 extern void __cpuinit numa_add_cpu(int cpu);
@@ -52,6 +54,8 @@ static inline void __cpuinit numa_remove_cpu(int cpu)
 #else	/* CONFIG_NUMA */
 static inline void numa_set_node(int cpu, int node)	{ }
 static inline void numa_clear_node(int cpu)		{ }
+static inline void numa_init_array(void)		{ }
+static ...
From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

x86_32 has been managing node_to_cpumask_map explicitly from
map_cpu_to_node() and friends in a rather ugly way.  With previous
changes, it's now possible to simply share the code with 64bit.

* numa_add/remove_cpu() are moved from numa_64 to numa.

* identify_cpu() now calls numa_add_cpu() for 32bit too.  This makes
  the explicit mask management from map_cpu_to_node() unnecessary.

* The whole x86_32 specific map_cpu_to_node() chunk is no longer
  necessary.  Dropped.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Pekka Enberg &lt;penberg@kernel.org&gt;
---
 arch/x86/include/asm/numa.h    |   18 +++++++++++++
 arch/x86/include/asm/numa_32.h |    1 -
 arch/x86/include/asm/numa_64.h |    4 ---
 arch/x86/kernel/cpu/common.c   |    2 +-
 arch/x86/kernel/smpboot.c      |   47 ----------------------------------
 arch/x86/mm/numa.c             |   33 ++++++++++++++++++++++++
 arch/x86/mm/numa_64.c          |   55 ----------------------------------------
 7 files changed, 52 insertions(+), 108 deletions(-)

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index 2b21fff..46f3e0d 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_X86_NUMA_H
 #define _ASM_X86_NUMA_H
 
+#include &lt;asm/topology.h&gt;
 #include &lt;asm/apicdef.h&gt;
 
 #ifdef CONFIG_NUMA
@@ -33,9 +34,26 @@ static inline void set_apicid_to_node(int apicid, s16 node)
 #ifdef CONFIG_NUMA
 extern void __cpuinit numa_set_node(int cpu, int node);
 extern void __cpuinit numa_clear_node(int cpu);
+
+# ifdef CONFIG_DEBUG_PER_CPU_MAPS
+extern void __cpuinit numa_add_cpu(int cpu);
+extern void __cpuinit numa_remove_cpu(int cpu);
+# else	/* CONFIG_DEBUG_PER_CPU_MAPS */
+static inline void __cpuinit numa_add_cpu(int cpu)
+{
+	cpumask_set_cpu(cpu, node_to_cpumask_map[early_cpu_to_node(cpu)]);
+}
+static inline void __cpuinit numa_remove_cpu(int cpu)
+{
+	cpumask_clear_cpu(cpu, node_to_cpumask_map[early_cpu_to_node(cpu)]);
+}
+# endif	/* ...
From: David Rientjes
Date: Tuesday, December 28, 2010 - 1:43 pm

This has been almost entirely rewritten in x86/numa by c1c3443c (&quot;x86, 
numa: Fake node-to-cpumask for NUMA emulation&quot;) and the pending fix 
http://marc.info/?l=linux-kernel&amp;m=129340072128297 (&quot;x86, numa: Fix 
CONFIG_DEBUG_PER_CPU_MAPS without NUMA&quot;).  Since this is only moving those 
functions to numa.c, it should be trivial to fix once based on x86/numa.
--

From: Tejun Heo
Date: Thursday, December 30, 2010 - 5:48 am

Sigh, the new NUMA emulation thing is 64bit only. :-( I'll see if it
can be applied to 32bit too.  BTW, how does this interact with the
Shaohui's patchset.  Isn't that about NUMA emulation too?  Are these
the same patches?

Thanks.

-- 
tejun
--

From: Tejun Heo
Date: Thursday, December 30, 2010 - 7:58 am

Hello, again.


Okay, after going through the changes, here are some thoughts FWIW.

I'm doubtful the direction was the correct one.  NUMA emulation,
almost by definition, doesn't vary too much depending on the
underlying hardware and how it's configured.  All that's necessary is
the physical node map an apicid to nid mapping, which can and probably
should be represented in common form regardless of emulation and the
emulation code should simply manipulate the parsed information.

The original implementation was 64bit apic specific.  If it needed to
be extended to cover amd topology, the right way would be updating the
apic and amd numa code so that they generate the information in a
uniform way and NUMA emulation would follow naturally.  Instead, it
looks like what happened was simply adding amd specific code with more
gluing.  The unnecessary differences between acpi and amd paths sure
aren't your fault but I really don't think we should be moving toward
that direction when things can (and definitely should) be unified with
a bit more effort.  NUMA code already is unnecessary hairy with
32/64bit and different subarchs somewhat unified but subtly differing
in many places without clear indications.

That said, Shaohui is already working on unifying the emulation code,
so I'll just do another quick dirty #ifdef'ing for the affected part.

Thank you.

-- 
tejun
--

From: David Rientjes
Date: Thursday, December 30, 2010 - 11:40 am

Shaohui's patchset is for node hotplug emulation, not for NUMA emulation 

And pxm mappings since node_distance() must reflect the physical topology 
on the emulated environment.  We have specialized callbacks in the amd and 
ACPI code to ascertain the physical topology because its discovery is 
handled differently, either with an implied set of possible nodes in the 
amd case and the statically-allocated nodes_parsed in the ACPI case, prior 

You could export an apicid-to-node and pxm-to-node mapping from both amd 
and ACPI code to do the actual remap, but it would still require seperate 
functions, amd_fake_nodes() and acpi_fake_nodes(), to determine what the 
mapping should be.  Instead of adding an additional layer of indirection, 
it's currently done in a single function depending on whether amd or acpi 

I agree that unification is in our best interest but the patches as they 
sit in x86/numa right now actually fix real bugs when using numa=fake on 
the command line and adding i386 NUMA emulation is an additional feature 
(and the unification that can be done in the meantime would only be the 
actual apicid-to-node and pxm-to-node mappings after it is exported from 
amd and acpi specific functions), so I'd prefer to handle cleanups as 
incremental patches on top of those.
--

From: Tejun Heo
Date: Friday, December 31, 2010 - 6:20 am

Hello,


I see.  Yeah, I should have actually read the patchset before talking

I'm not really arguing against you and as said before the patches were
fine given the current state of the code, but I still want to point
out that it's through these mostly innocent series of changes which
build upon existing complications which eventually lead to
unsustainable pile of mess.  No one is particularly wrong but then

The problem is that those &quot;incremental&quot; patches often don't happen
once the itch is gone while the immediate fixes/improvements aggravate
existing complications.  If you were/are planning to clean it up, I
have nothing to whine about.  :-)

Thanks.

-- 
tejun
--

From: David Rientjes
Date: Friday, December 31, 2010 - 4:09 pm

We can certainly implement i386 NUMA emulation after the unification is 
done (only to avoid additional conflicts on your set :), but I'm not sure 
if there's an audience of testers and debuggers who want to use it for 
i386 NUMA; there hasn't been one in the past when people like Magnus Damm 
proposed it about four years ago.  Agreed that it's probably going to be 
simpler after your unification patchset is merged, but it seems like we're 
in the unique position where we're extending a feature for an architecture 
because it's simple and makes things cleaner in the code rather than being 
actually useful.
--

From: David Rientjes
Date: Thursday, December 30, 2010 - 11:43 am

There's nothing new about NUMA emulation on x86_64, it's existed for well 
over four years (we've been using it at Google since 2.6.18, see 
Documentation/x86/x86_64/fake-numa-for-cpusets).  Shaohui's patches are 
completely seperate and add hotplug emulation.  You _could_ boot with a 
very minimal amount of memory and then hotplug all the emulated nodes like 
numa=fake does, but you may find it difficult to parse the e820 and 
interleave the nodes appropriately over the set of physical nodes in 
userspace.
--

From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

apic-&gt;apicid_to_node() is 32bit specific apic operation which
determines NUMA node for a CPU.  Depending on the APIC implementation,
it can be easier to determine NUMA node from either physical or
logical apicid.  Currently, -&gt;apicid_to_node() takes @logical_apicid
and calls hard_smp_processor_id() if the physical apicid is needed.

This prevents NUMA mapping from being queried from a different CPU,
which in turn makes it impossible to initialize NUMA mapping before
SMP bringup.

This patch replaces apic-&gt;apicid_to_node() with
-&gt;x86_32_numa_cpu_node() which takes @cpu, from which both logical and
physical apicids can easily be determined.  While at it, drop
duplicate implementations from bigsmp_32 and summit_32, and use the
default one.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Pekka Enberg &lt;penberg@kernel.org&gt;
---
 arch/x86/include/asm/apic.h           |    6 ++++--
 arch/x86/kernel/apic/apic.c           |   10 +++++++---
 arch/x86/kernel/apic/apic_flat_64.c   |    2 --
 arch/x86/kernel/apic/apic_noop.c      |   16 +++++++++-------
 arch/x86/kernel/apic/bigsmp_32.c      |    7 +------
 arch/x86/kernel/apic/es7000_32.c      |    7 +++----
 arch/x86/kernel/apic/numaq_32.c       |   11 ++++++++++-
 arch/x86/kernel/apic/probe_32.c       |    2 +-
 arch/x86/kernel/apic/summit_32.c      |   11 +----------
 arch/x86/kernel/apic/x2apic_cluster.c |    1 -
 arch/x86/kernel/apic/x2apic_phys.c    |    1 -
 arch/x86/kernel/apic/x2apic_uv_x.c    |    1 -
 arch/x86/kernel/smpboot.c             |    3 +--
 13 files changed, 37 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index efb073b..ad30ca4 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -306,7 +306,6 @@ struct apic {
 
 	void (*setup_apic_routing)(void);
 	int (*multi_timer_check)(int apic, int irq);
-	int (*apicid_to_node)(int logical_apicid);
 	int (*cpu_present_to_apicid)(int mps_cpu);
 	void (*apicid_to_cpu_present)(int ...
From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
---
 arch/x86/kernel/apic/es7000_32.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/es7000_32.c b/arch/x86/kernel/apic/es7000_32.c
index 0ffc1ec..5c53d05 100644
--- a/arch/x86/kernel/apic/es7000_32.c
+++ b/arch/x86/kernel/apic/es7000_32.c
@@ -460,6 +460,12 @@ static unsigned long es7000_check_apicid_present(int bit)
 	return physid_isset(bit, phys_cpu_present_map);
 }
 
+static int es7000_early_logical_apicid(int cpu)
+{
+	/* on es7000, logical apicid is the same as physical */
+	return early_per_cpu(x86_bios_cpu_apicid, cpu);
+}
+
 static unsigned long calculate_ldr(int cpu)
 {
 	unsigned long id = per_cpu(x86_bios_cpu_apicid, cpu);
@@ -683,7 +689,7 @@ struct apic __refdata apic_es7000_cluster = {
 	.wait_icr_idle			= native_apic_wait_icr_idle,
 	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
 
-	.x86_32_early_logical_apicid	= noop_x86_32_early_logical_apicid,
+	.x86_32_early_logical_apicid	= es7000_early_logical_apicid,
 };
 
 struct apic __refdata apic_es7000 = {
@@ -747,5 +753,5 @@ struct apic __refdata apic_es7000 = {
 	.wait_icr_idle			= native_apic_wait_icr_idle,
 	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
 
-	.x86_32_early_logical_apicid	= noop_x86_32_early_logical_apicid,
+	.x86_32_early_logical_apicid	= es7000_early_logical_apicid,
 };
-- 
1.7.1

--

From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

Factor out logical apic id calculation from summit_init_apic_ldr() and
use it for the x86_32_early_logical_apicid() callback.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
---
 arch/x86/kernel/apic/summit_32.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/apic/summit_32.c b/arch/x86/kernel/apic/summit_32.c
index 172c498..8c91473 100644
--- a/arch/x86/kernel/apic/summit_32.c
+++ b/arch/x86/kernel/apic/summit_32.c
@@ -194,11 +194,10 @@ static unsigned long summit_check_apicid_present(int bit)
 	return 1;
 }
 
-static void summit_init_apic_ldr(void)
+static int summit_early_logical_apicid(int cpu)
 {
-	unsigned long val, id;
 	int count = 0;
-	u8 my_id = (u8)hard_smp_processor_id();
+	u8 my_id = early_per_cpu(x86_cpu_to_apicid, cpu);
 	u8 my_cluster = APIC_CLUSTER(my_id);
 #ifdef CONFIG_SMP
 	u8 lid;
@@ -214,7 +213,15 @@ static void summit_init_apic_ldr(void)
 	/* We only have a 4 wide bitmap in cluster mode.  If a deranged
 	 * BIOS puts 5 CPUs in one APIC cluster, we're hosed. */
 	BUG_ON(count &gt;= XAPIC_DEST_CPUS_SHIFT);
-	id = my_cluster | (1UL &lt;&lt; count);
+	return my_cluster | (1UL &lt;&lt; count);
+}
+
+static void summit_init_apic_ldr(void)
+{
+	int cpu = smp_processor_id();
+	unsigned long id = early_per_cpu(x86_cpu_to_logical_apicid, cpu);
+	unsigned long val;
+
 	apic_write(APIC_DFR, SUMMIT_APIC_DFR_VALUE);
 	val = apic_read(APIC_LDR) &amp; ~APIC_LDR_MASK;
 	val |= SET_APIC_LOGICAL_ID(id);
@@ -553,5 +560,5 @@ struct apic apic_summit = {
 	.wait_icr_idle			= native_apic_wait_icr_idle,
 	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
 
-	.x86_32_early_logical_apicid	= noop_x86_32_early_logical_apicid,
+	.x86_32_early_logical_apicid	= summit_early_logical_apicid,
 };
-- 
1.7.1

--

From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
---
 arch/x86/kernel/apic/bigsmp_32.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/apic/bigsmp_32.c b/arch/x86/kernel/apic/bigsmp_32.c
index dd32a9b..bc7ed04 100644
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -45,6 +45,12 @@ static unsigned long bigsmp_check_apicid_present(int bit)
 	return 1;
 }
 
+static int bigsmp_early_logical_apicid(int cpu)
+{
+	/* on bigsmp, logical apicid is the same as physical */
+	return early_per_cpu(x86_cpu_to_apicid, cpu);
+}
+
 static inline unsigned long calculate_ldr(int cpu)
 {
 	unsigned long val, id;
@@ -252,5 +258,5 @@ struct apic apic_bigsmp = {
 	.wait_icr_idle			= native_apic_wait_icr_idle,
 	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
 
-	.x86_32_early_logical_apicid	= noop_x86_32_early_logical_apicid,
+	.x86_32_early_logical_apicid	= bigsmp_early_logical_apicid,
 };
-- 
1.7.1

--

From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

After the previous patch, apic-&gt;cpu_to_logical_apicid() is no longer
used.  Kill it.

For apic types with custom cpu_to_logical_apicid() which is also used
for other purposes, remove the function and modify its users to do the
mapping directly.

#ifdef's on CONFIG_SMP in es7000_32 and summit_32 are ignored during
conversion as they are not used for UP kernels.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
---
 arch/x86/include/asm/apic.h           |    7 -------
 arch/x86/kernel/apic/apic_flat_64.c   |    2 --
 arch/x86/kernel/apic/apic_noop.c      |    6 ------
 arch/x86/kernel/apic/bigsmp_32.c      |   19 +++++++------------
 arch/x86/kernel/apic/es7000_32.c      |   18 ++----------------
 arch/x86/kernel/apic/numaq_32.c       |    8 --------
 arch/x86/kernel/apic/probe_32.c       |    1 -
 arch/x86/kernel/apic/summit_32.c      |   17 ++---------------
 arch/x86/kernel/apic/x2apic_cluster.c |    1 -
 arch/x86/kernel/apic/x2apic_phys.c    |    1 -
 arch/x86/kernel/apic/x2apic_uv_x.c    |    1 -
 11 files changed, 11 insertions(+), 70 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index eb139ec..d1aa0c3 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -307,7 +307,6 @@ struct apic {
 	void (*setup_apic_routing)(void);
 	int (*multi_timer_check)(int apic, int irq);
 	int (*apicid_to_node)(int logical_apicid);
-	int (*cpu_to_logical_apicid)(int cpu);
 	int (*cpu_present_to_apicid)(int mps_cpu);
 	void (*apicid_to_cpu_present)(int phys_apicid, physid_mask_t *retmap);
 	void (*setup_portio_remap)(void);
@@ -557,12 +556,6 @@ static inline void default_ioapic_phys_id_map(physid_mask_t *phys_map, physid_ma
 	*retmap = *phys_map;
 }
 
-/* Mapping from cpu number to logical apicid */
-static inline int default_cpu_to_logical_apicid(int cpu)
-{
-	return 1 &lt;&lt; cpu;
-}
-
 static inline int __default_cpu_present_to_apicid(int mps_cpu)
 {
 	if (mps_cpu &lt; nr_cpu_ids &amp;&amp; cpu_present(mps_cpu))
diff --git ...
From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

Both functions are used only in 32bit.  Put them inside CONFIG_X86_32.
This is to prepare for logical apicid handling update.

- Cyrill Gorcunov spotted that I forgot to move declarations in ipi.h
  under CONFIG_X86_32.  Fixed.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Pekka Enberg &lt;penberg@kernel.org&gt;
Reviewed-by: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
---
 arch/x86/include/asm/ipi.h |    8 ++++----
 arch/x86/kernel/apic/ipi.c |    4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/ipi.h b/arch/x86/include/asm/ipi.h
index 0b72282..615fa90 100644
--- a/arch/x86/include/asm/ipi.h
+++ b/arch/x86/include/asm/ipi.h
@@ -123,10 +123,6 @@ extern void default_send_IPI_mask_sequence_phys(const struct cpumask *mask,
 						 int vector);
 extern void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
 							 int vector);
-extern void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
-							 int vector);
-extern void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
-							 int vector);
 
 /* Avoid include hell */
 #define NMI_VECTOR 0x02
@@ -150,6 +146,10 @@ static inline void __default_local_send_IPI_all(int vector)
 }
 
 #ifdef CONFIG_X86_32
+extern void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
+							 int vector);
+extern void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
+							 int vector);
 extern void default_send_IPI_mask_logical(const struct cpumask *mask,
 						 int vector);
 extern void default_send_IPI_allbutself(int vector);
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 08385e0..5037736 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -56,6 +56,8 @@ void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
 	local_irq_restore(flags);
 }
 
+#ifdef CONFIG_X86_32
+
 void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
 ...
From: Tejun Heo
Date: Tuesday, December 28, 2010 - 4:48 am

Replace x86_32 MAX_APICID in include/asm/mpspec.h with MAX_LOCAL_APIC
in include/asm/apicdef.h to make it consistent with x86_64.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Pekka Enberg &lt;penberg@kernel.org&gt;
---
 arch/x86/include/asm/apicdef.h |    1 +
 arch/x86/include/asm/mpspec.h  |    2 --
 arch/x86/kernel/smpboot.c      |    2 +-
 arch/x86/mm/srat_32.c          |    4 ++--
 4 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index a859ca4..47a30ff 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -145,6 +145,7 @@
 
 #ifdef CONFIG_X86_32
 # define MAX_IO_APICS 64
+# define MAX_LOCAL_APIC 256
 #else
 # define MAX_IO_APICS 128
 # define MAX_LOCAL_APIC 32768
diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index c82868e..018ffc1 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -32,8 +32,6 @@ extern int mp_bus_id_to_local[MAX_MP_BUSSES];
 extern int quad_local_to_mp_bus_id [NR_CPUS/4][4];
 #endif
 
-#define MAX_APICID		256
-
 #else /* CONFIG_X86_64: */
 
 #define MAX_MP_BUSSES		256
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 4de6a00..0b32f17 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -72,7 +72,7 @@
 #include &lt;asm/i8259.h&gt;
 
 #ifdef CONFIG_X86_32
-u8 apicid_2_node[MAX_APICID];
+u8 apicid_2_node[MAX_LOCAL_APIC];
 #endif
 
 /* State of each CPU */
diff --git a/arch/x86/mm/srat_32.c b/arch/x86/mm/srat_32.c
index a17dffd..e55e748 100644
--- a/arch/x86/mm/srat_32.c
+++ b/arch/x86/mm/srat_32.c
@@ -57,7 +57,7 @@ struct node_memory_chunk_s {
 static struct node_memory_chunk_s __initdata node_memory_chunk[MAXCHUNKS];
 
 static int __initdata num_memory_chunks; /* total number of memory chunks */
-static u8 __initdata apicid_to_pxm[MAX_APICID];
+static u8 __initdata apicid_to_pxm[MAX_LOCAL_APIC];
 
 int numa_off __initdata;
 ...
Previous thread: [RFC][PATCH v2 0/6] eCryptfs: added support for the encrypted key type by Roberto Sassu on Tuesday, December 28, 2010 - 3:48 am. (6 messages)

Next thread: Drift when measuring time using add_timer by Mohan V on Tuesday, December 28, 2010 - 5:05 am. (5 messages)