Extends the numa=fake x86_64 command-line option to allow for configurable
node sizes. These nodes can be used in conjunction with cpusets for
coarse memory resource management.

The old command-line option is still supported:
  numa=fake=32  gives 32 fake NUMA nodes, ignoring the NUMA setup of the
                actual machine.

But now you may configure your system for the node sizes of your choice:
  numa=fake=2*512,1024,2*256
                gives two 512M nodes, one 1024M node, two 256M nodes, and
                the rest of system memory to a sixth node.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@cs.washington.edu>
---
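For readers following the option format, the size list can be illustrated with a small userspace sketch. This is not the code in arch/x86_64/mm/numa.c: parse_fake_spec(), read_number(), MAX_FAKE_NODES, and the choice to work in whole megabytes are assumptions made for illustration. Neither the plain-number form (numa=fake=32) nor the trailing-comma rule is modelled here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_FAKE_NODES 64	/* hypothetical cap, not taken from the patch */

/* Read an unsigned decimal number at *p and advance *p past it. */
int read_number(const char **p, unsigned long *out)
{
	char *end;

	if (!isdigit((unsigned char)**p))
		return -1;
	*out = strtoul(*p, &end, 10);
	*p = end;
	return 0;
}

/*
 * Parse a list such as "2*512,1024,2*256" into per-node sizes in MB.
 * Memory not covered by the list is given to one final node.
 * Returns the number of nodes, or -1 if the list is malformed or
 * asks for more memory or nodes than are available.
 */
int parse_fake_spec(const char *spec, uint64_t total_mb,
		    uint64_t sizes_mb[MAX_FAKE_NODES])
{
	uint64_t used = 0;
	int n = 0;

	assert(spec != NULL && sizes_mb != NULL);
	while (*spec) {
		unsigned long coeff = 1, size;

		if (read_number(&spec, &size))
			return -1;
		if (*spec == '*') {		/* "coeff*size" form */
			coeff = size;
			spec++;
			if (read_number(&spec, &size))
				return -1;
		}
		if (size == 0)
			return -1;
		for (unsigned long i = 0; i < coeff; i++) {
			if (n == MAX_FAKE_NODES || size > total_mb - used)
				return -1;
			sizes_mb[n++] = size;
			used += size;
		}
		if (*spec == ',')
			spec++;
		else if (*spec != '\0')
			return -1;
	}
	/* Remainder of memory goes to a final node, if any is left. */
	if (used < total_mb && n < MAX_FAKE_NODES)
		sizes_mb[n++] = total_mb - used;
	return n;
}
```

Under these assumptions, "2*512,1024,2*256" on a machine with 4096M would yield six nodes: 512, 512, 1024, 256, 256, and a 1536M remainder node.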
Documentation/x86_64/boot-options.txt | 9 +-
arch/x86_64/mm/numa.c | 249 +++++++++++++++++++--------------
include/asm-x86_64/mmzone.h | 2 +-
3 files changed, 150 insertions(+), 110 deletions(-)

diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt
index 625a21d..6ccdb5e 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -149,7 +149,14 @@ NUMA
 numa=noacpi Don't parse the SRAT table for NUMA setup
- numa=fake=X Fake X nodes and ignore NUMA setup of the actual machine.
+ numa=fake=CMDLINE
+ If a number, fakes CMDLINE nodes and ignores NUMA setup of the
+ actual machine. Otherwise, system memory is configured
+ depending on the sizes and coefficients listed. For example:
+ numa=fake=2*512,1024,4*256
+ gives two 512M nodes, a 1024M node, and four 256M nodes. If
+ the last character of CMDLINE is a comma, the remaining system
+ memory is not allocated to an additional node.
 numa=hotadd=percent
Only allow hotadd memory to preallocate page structures upto
diff --git a/arch/x86_64/mm/numa.c b/arch/x86_64/mm/numa.c
index 9ff3141..0417921 100644
--- a/arch/x86_64/mm/numa.c
+++ b/arch/x86_64/mm/numa.c
@@ -276,125 +276,160 @@ void __init numa_init_array(void)
 #ifdef CONFIG_NUMA_EMU
/* Numa emulation */
-...
Extends the numa=fake x86_64 command-line option to split the remaining
system memory into equal-sized nodes.

For example:
  numa=fake=2*512,4*  gives two 512M nodes and the remaining system
                      memory is split into four approximately equal
                      chunks.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@cs.washington.edu>
---
Documentation/x86_64/boot-options.txt | 4 +++-
arch/x86_64/mm/numa.c | 24 +++++++++++++++++++-----
2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt
index 6ccdb5e..0721416 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -155,7 +155,9 @@ NUMA
depending on the sizes and coefficients listed. For example:
numa=fake=2*512,1024,4*256
gives two 512M nodes, a 1024M node, and four 256M nodes. If
- the last character of CMDLINE is a comma, the remaining system
+ the last character of CMDLINE is a *, the remaining system
+ memory is divided up equally among its previous coefficient.
+ If the last character is a comma, the remaining system
 memory is not allocated to an additional node.
 numa=hotadd=percent
diff --git a/arch/x86_64/mm/numa.c b/arch/x86_64/mm/numa.c
index 0417921..3344d60 100644
--- a/arch/x86_64/mm/numa.c
+++ b/arch/x86_64/mm/numa.c
@@ -415,11 +415,25 @@ static int __init numa_emulation(unsigned long start_pfn, unsigned long end_pfn)
done:
if (!num_nodes)
return -1;
- /* Fill remainder of system RAM with a final node, if appropriate. */
- if (addr < max_addr && *(cmdline - 1) != ',') {
- setup_node_range(num_nodes, nodes, &addr, max_addr - addr,
- max_addr);
- num_nodes++;
+ /* Fill remainder of system RAM, if appropriate. */
+ if (addr < max_addr) {
+ switch (*(cmdline - 1)) {
+ case '*':
+ /* Split remaining nodes into coeff chunks */
+ if (coeff <= 0)
+ break;
+ num_nodes += split_n...
Extends the numa=fake x86_64 command-line option to split the remaining
system memory into nodes of fixed size. Any leftover memory is allocated
to a final node unless the command-line ends with a comma.

For example:
  numa=fake=2*512,*128  gives two 512M nodes and the remaining system
                        memory is split into nodes of 128M each.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@cs.washington.edu>
---
Documentation/x86_64/boot-options.txt | 15 ++++++----
arch/x86_64/mm/numa.c | 47 ++++++++++++++++++++++++++-------
2 files changed, 46 insertions(+), 16 deletions(-)

diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt
index 0721416..9917b9f 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -153,12 +153,15 @@ NUMA
If a number, fakes CMDLINE nodes and ignores NUMA setup of the
actual machine. Otherwise, system memory is configured
depending on the sizes and coefficients listed. For example:
- numa=fake=2*512,1024,4*256
- gives two 512M nodes, a 1024M node, and four 256M nodes. If
- the last character of CMDLINE is a *, the remaining system
- memory is divided up equally among its previous coefficient.
- If the last character is a comma, the remaining system
- memory is not allocated to an additional node.
+ numa=fake=2*512,1024,4*256,*128
+ gives two 512M nodes, a 1024M node, four 256M nodes, and the
+ rest split into 128M chunks. If the last character of CMDLINE
+ is a *, the remaining memory is divided up equally among its
+ coefficient:
+ numa=fake=2*512,2*
+ gives two 512M nodes and the rest split into two nodes. If
+ the last character is a comma, the remaining system memory is
+ not allocated to an additional node.
 numa=hotadd=percent
Only allow hotadd memory to preallocate page structures upto
diff --git a/arch/x86_64/mm/numa.c b/arch/x86_64/mm/numa.c
index 3344d60..2ee228b 100644
--- a/arc...
That sounds like syntactical vinegar and a nasty trap. Remember
that Venus probe that got lost because of a wrong comma.
Can you find some nicer syntax for that please?

Also it's pretty complex. Are there use cases for all of this?

-Andi
-
The only other appropriate syntax that comes to mind is perhaps a
command-line that ends with a 0. For example, numa=fake=2*512,0 would

There are. Configurable node sizes (e.g. 'numa=fake=512,4*128') are
the major concept and help to avoid the overhead associated with something
like 64 nodes of 64M each on a 4G machine. We've seen some inefficiencies
with scanning through so many zone lists in page_alloc when we encounter a
full node. Additional support such as 'numa=fake=2*512,*128' is used
more for machines where you're unsure of their total system RAM in the
first place but want to make sure you have the node sizes you need.

David
-
I agree it's not a good idea to prevent the remaining RAM from being
allocated to an additional node. It was helpful in testing and the
gathering of benchmarks for the purpose of memory management, but not for
real-world cases. It's been removed.

David
-
Mark the new numa=fake x86_64 helper functions, setup_node_range(),
split_nodes_equally(), and split_nodes_by_size(), as __init.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@cs.washington.edu>
---
arch/x86_64/mm/numa.c | 13 +++++++------
1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86_64/mm/numa.c b/arch/x86_64/mm/numa.c
index 2ee228b..5d8fee6 100644
--- a/arch/x86_64/mm/numa.c
+++ b/arch/x86_64/mm/numa.c
@@ -287,8 +287,8 @@ char *cmdline __initdata;
* if there is additional memory left for allocation past addr and -1 otherwise.
* addr is adjusted to be at the end of the node.
*/
-static int setup_node_range(int nid, struct bootnode *nodes, u64 *addr,
- u64 size, u64 max_addr)
+static int __init setup_node_range(int nid, struct bootnode *nodes, u64 *addr,
+ u64 size, u64 max_addr)
{
int ret = 0;
nodes[nid].start = *addr;
@@ -310,8 +310,9 @@ static int setup_node_range(int nid, struct bootnode *nodes, u64 *addr,
* is the number of nodes split up and addr is adjusted to be at the end of the
* last node allocated.
*/
-static int split_nodes_equally(struct bootnode *nodes, u64 *addr, u64 max_addr,
- int node_start, int num_nodes)
+static int __init split_nodes_equally(struct bootnode *nodes, u64 *addr,
+ u64 max_addr, int node_start,
+ int num_nodes)
{
unsigned int big;
u64 size;
@@ -363,8 +364,8 @@ static int split_nodes_equally(struct bootnode *nodes, u64 *addr, u64 max_addr,
* always assigned to a final node and can be asymmetric. Returns the number of
* nodes split.
*/
-static int split_nodes_by_size(struct bootnode *nodes, u64 *addr, u64 max_addr,
- int node_start, u64 size)
+static int __init split_nodes_by_size(struct bootnode *nodes, u64 *addr,
+ u64 max_addr, int node_start, u64 size)
{
int i = node_start;
size = (size << 20) & FAKE_NODE_MIN_HASH_MASK;
-
