[patch] x86: fix fake apicid to node mapping for numa emulation

From: David Rientjes
Date: Tuesday, May 4, 2010 - 5:00 pm

apicids must be mapped to the lowest node ids to maintain generic kernel
use of functions such as cpu_to_node() that determine device affinity.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 Third resend of the same patch.

 arch/x86/mm/srat_64.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -461,7 +461,8 @@ void __init acpi_fake_nodes(const struct bootnode *fake_nodes, int num_nodes)
 		 * node, it must now point to the fake node ID.
 		 */
 		for (j = 0; j < MAX_LOCAL_APIC; j++)
-			if (apicid_to_node[j] == nid)
+			if (apicid_to_node[j] == nid &&
+			    fake_apicid_to_node[j] == NUMA_NO_NODE)
 				fake_apicid_to_node[j] = i;
 	}
 	for (i = 0; i < num_nodes; i++)
--

From: Ingo Molnar
Date: Thursday, May 6, 2010 - 12:07 am

There's no info in the changelog about what negative effects this bug had when
it was found, on what hardware it occurred, and what the general urgency of the
patch is.

Thanks,

	Ingo
--

From: David Rientjes
Date: Thursday, May 6, 2010 - 2:24 am

Ah, true.  Given the relative obscurity of using NUMA emulation to begin
with, it would probably benefit from being even more verbose as well.
I'll rewrite the changelog and reply to this message with it.

Thanks.
--

From: David Rientjes
Date: Thursday, May 6, 2010 - 2:24 am

With NUMA emulation, it's possible for a single cpu to be bound to
multiple nodes since more than one may have affinity if allocated on a
physical node that is local to the cpu.

APIC ids must therefore be mapped to the lowest node ids to maintain
generic kernel use of functions such as cpu_to_node() that determine
device affinity.  For example, if a device has proximity to physical node
1 and a cpu happens to be mapped to a higher emulated node
id 8, the proximity may not be correctly determined by comparison in
generic code even though the cpu may be truly local and allocated on
physical node 1.  When this happens, the true topology of the machine
isn't accurately represented in the emulated environment; although this
isn't critical to the system's uptime, any generic code that is NUMA
aware benefits from the physical topology being accurately represented.

This can affect any system that maps multiple APIC ids to a single node
and is booted with numa=fake=N where N is greater than the number of
physical nodes.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 arch/x86/mm/srat_64.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -461,7 +461,8 @@ void __init acpi_fake_nodes(const struct bootnode *fake_nodes, int num_nodes)
 		 * node, it must now point to the fake node ID.
 		 */
 		for (j = 0; j < MAX_LOCAL_APIC; j++)
-			if (apicid_to_node[j] == nid)
+			if (apicid_to_node[j] == nid &&
+			    fake_apicid_to_node[j] == NUMA_NO_NODE)
 				fake_apicid_to_node[j] = i;
 	}
 	for (i = 0; i < num_nodes; i++)
--
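
The effect of the added NUMA_NO_NODE check can be seen in a small user-space
sketch of the inner loop. This is not the kernel code: the topology (four
APIC ids on two physical nodes), the layout of eight fake nodes (0-3 carved
from physical node 0, 4-7 from physical node 1), and the names NR_APICIDS,
NR_FAKE_NODES, fake_to_phys, map_fake_nodes() and print_mappings() are all
assumptions made up for illustration.

```c
#include <stdio.h>

#define NUMA_NO_NODE   (-1)
#define NR_APICIDS     4	/* hypothetical: stands in for MAX_LOCAL_APIC */
#define NR_FAKE_NODES  8	/* hypothetical: as if booted with numa=fake=8 */

/* Assumed physical topology: APIC ids 0-1 on node 0, APIC ids 2-3 on node 1. */
static const int apicid_to_node[NR_APICIDS] = { 0, 0, 1, 1 };

/* Assumed emulation layout: which physical node each fake node was carved from. */
static const int fake_to_phys[NR_FAKE_NODES] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/*
 * Fill fake_apicid_to_node[] the way the loop in the patch does.
 * keep_lowest == 0 models the old condition (later fake nodes overwrite
 * earlier ones); keep_lowest != 0 models the patched condition (an entry
 * is only written while it is still NUMA_NO_NODE).
 */
static void map_fake_nodes(int fake_apicid_to_node[NR_APICIDS], int keep_lowest)
{
	int i, j;

	for (j = 0; j < NR_APICIDS; j++)
		fake_apicid_to_node[j] = NUMA_NO_NODE;

	for (i = 0; i < NR_FAKE_NODES; i++) {
		int nid = fake_to_phys[i];

		for (j = 0; j < NR_APICIDS; j++) {
			if (apicid_to_node[j] != nid)
				continue;
			if (keep_lowest && fake_apicid_to_node[j] != NUMA_NO_NODE)
				continue;
			fake_apicid_to_node[j] = i;
		}
	}
}

/* Print both mappings side by side for comparison. */
static void print_mappings(void)
{
	int old_map[NR_APICIDS], new_map[NR_APICIDS];
	int j;

	map_fake_nodes(old_map, 0);
	map_fake_nodes(new_map, 1);
	for (j = 0; j < NR_APICIDS; j++)
		printf("apicid %d: unpatched -> node %d, patched -> node %d\n",
		       j, old_map[j], new_map[j]);
}
```

Under these assumptions, the unpatched loop leaves APIC ids 0-1 on fake node 3
and APIC ids 2-3 on fake node 7 (the last fake node visited on each physical
node), while the patched loop keeps them on fake nodes 0 and 4, the lowest
fake node ids on each physical node.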

From: tip-bot for David Rientjes
Date: Thursday, May 6, 2010 - 3:07 am

Commit-ID:  b0c4d952a158a6a2547672cf4fc9d55e415410de
Gitweb:     http://git.kernel.org/tip/b0c4d952a158a6a2547672cf4fc9d55e415410de
Author:     David Rientjes <rientjes@google.com>
AuthorDate: Thu, 6 May 2010 02:24:34 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 6 May 2010 12:02:05 +0200

x86: Fix fake apicid to node mapping for numa emulation

With NUMA emulation, it's possible for a single cpu to be bound
to multiple nodes since more than one may have affinity if
allocated on a physical node that is local to the cpu.

APIC ids must therefore be mapped to the lowest node ids to
maintain generic kernel use of functions such as cpu_to_node()
that determine device affinity.  For example, if a device has
proximity to physical node 1 and a cpu happens to
be mapped to a higher emulated node id 8, the proximity may not
be correctly determined by comparison in generic code even
though the cpu may be truly local and allocated on physical node 1.

When this happens, the true topology of the machine isn't
accurately represented in the emulated environment; although
this isn't critical to the system's uptime, any generic code
that is NUMA aware benefits from the physical topology being
accurately represented.

This can affect any system that maps multiple APIC ids to a
single node and is booted with numa=fake=N where N is greater
than the number of physical nodes.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <alpine.DEB.2.00.1005060224140.19473@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/mm/srat_64.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 28c6876..38512d0 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -461,7 +461,8 @@ void __init acpi_fake_nodes(const struct bootnode *fake_nodes, int num_nodes)
 ...
--

From: Ingo Molnar
Date: Thursday, May 6, 2010 - 3:01 am

Applied to tip:x86/urgent, thanks David!

	Ingo
--
