On Sat, 2008-03-08 at 14:14 -0800, David Rientjes wrote:
Good.
I tested this patch atop David's other patches on 2-6.25-rc3-mm1 and I
agree that it fixes the regression with local allocations. It also
preserves existing behavior for MPOL_DEFAULT.
However, [and I'm going to sound like the "Nodemask Nazi" again, sorry]
it does change the behavior of MPOL_PREFERRED when one specifies a
non-empty nodemask with preferred/all node[s] outside of the
mems_allowed mask and neither MPOL_F_{STATIC|RELATIVE}_NODES are
specified.
Existing behavior is to return EINVAL in this case. I had tried to
preserve this behavior with my
"silently-restrict-nodemask-to-allowed-nodes" patch with those ugly
"was_empty" and "is_empty" variables to capture the state of the
nodemask before and after we remove !allowed nodes. I also tried to
preserve this behavior in the fix I had sent to the local alloc
regression, by passing a NULL nodemask pointer to the create op for
localalloc.
With this patch, we silently fall back to local allocation for
MPOL_PREFERRED. This occurs because by the time mpol_new_preferred()
sees the nodemask, we may have masked off all nodes with the
current_mems_allowed mask. I have a proposed patch below to address
this [atop this one].
Another comment in-line below.
Lee
So, the only time we get a NULL 'nodes' pointer to mpol_new() is when we
pass MPOL_DEFAULT to mpol_shared_policy_init(). The syscall entries
always pass a non-null nodemask pointer to an on-stack nodemask_t. I've
added a VM_BUG_ON() to assert this in the patch below, as the rest of
mpol_new() assumes a non-NULL 'nodes'. Maybe should be just 'BUG_ON()'?
Here's a patch, atop the subject patch, that addresses the issues
discussed above. Tested with the numactl regression "suite", my little
MPOL_DEFAULT error return test and a couple of ad hoc memtoy sessions in
and out of restricted cpusets. All passed--once I remembered to exit
the restricted cpuset that didn't include nodes 0 & 1, used by the
regression test...
David: if you agree with this, I can generate a combined patch with
your signoff and mine to replace the previous regression fix which I
believe Andrew has already added to -mm. Or,
Andrew: if you prefer, I can generate a patch atop the one you've
already added to the tree to accomplish the same thing.
Please advise...
Lee
-----------------------------------------------------------------------
Against: 2.6.25-rc3-mm1 atop "mempolicy: disallow static or relative
flags for local preferred allocation - v2"
Restores error checking for MPOL_PREFERRED that specifies only
dis-allowed nodes. Skips cpuset related mempolicy setup for
local allocation.
Also, enforce mpol_new() assumption that 'nodes' arg is non-NULL
except for MPOL_DEFAULT.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
mm/mempolicy.c | 47 +++++++++++++++++++++++++++++++----------------
1 file changed, 31 insertions(+), 16 deletions(-)
Index: linux-2.6.25-rc3-mm1/mm/mempolicy.c
===================================================================
--- linux-2.6.25-rc3-mm1.orig/mm/mempolicy.c 2008-03-10 10:25:58.000000000 -0400
+++ linux-2.6.25-rc3-mm1/mm/mempolicy.c 2008-03-10 12:12:19.000000000 -0400
@@ -158,7 +158,9 @@ static int mpol_new_interleave(struct me
static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes)
{
- pol->v.preferred_node = !nodes_empty(*nodes) ? first_node(*nodes) : -1;
+ if (nodes && nodes_empty(*nodes))
+ return -EINVAL;
+ pol->v.preferred_node = nodes ? first_node(*nodes) : -1;
return 0;
}
@@ -176,6 +178,7 @@ static struct mempolicy *mpol_new(unsign
{
struct mempolicy *policy;
nodemask_t cpuset_context_nmask;
+ int localalloc = 0;
int ret;
pr_debug("setting mode %d flags %d nodes[0] %lx\n",
@@ -186,36 +189,48 @@ static struct mempolicy *mpol_new(unsign
return ERR_PTR(-EINVAL);
return NULL;
}
+ VM_BUG_ON(!nodes);
+
/*
* MPOL_PREFERRED cannot be used with MPOL_F_STATIC_NODES or
* MPOL_F_RELATIVE_NODES if the nodemask is empty (local allocation).
* All other modes require a valid pointer to a non-empty nodemask.
*/
if (mode == MPOL_PREFERRED) {
- if (nodes_empty(*nodes) && ((flags & MPOL_F_STATIC_NODES) ||
- (flags & MPOL_F_RELATIVE_NODES)))
- return ERR_PTR(-EINVAL);
+ if (nodes_empty(*nodes)) {
+ if (((flags & MPOL_F_STATIC_NODES) ||
+ (flags & MPOL_F_RELATIVE_NODES)))
+ return ERR_PTR(-EINVAL);
+ localalloc++;
+ }
} else if (nodes_empty(*nodes))
return ERR_PTR(-EINVAL);
policy = kmem_cache_alloc(policy_cache, GFP_KERNEL);
if (!policy)
return ERR_PTR(-ENOMEM);
atomic_set(&policy->refcnt, 1);
- cpuset_update_task_memory_state();
- if (flags & MPOL_F_RELATIVE_NODES)
- mpol_relative_nodemask(&cpuset_context_nmask, nodes,
- &cpuset_current_mems_allowed);
- else
- nodes_and(cpuset_context_nmask, *nodes,
- cpuset_current_mems_allowed);
policy->policy = mode;
policy->flags = flags;
- if (mpol_store_user_nodemask(policy))
- policy->w.user_nodemask = *nodes;
- else
- policy->w.cpuset_mems_allowed = cpuset_mems_allowed(current);
+ if (!localalloc) {
+ /*
+ * cpuset related setup doesn't apply to local allocation
+ */
+ cpuset_update_task_memory_state();
+ if (flags & MPOL_F_RELATIVE_NODES)
+ mpol_relative_nodemask(&cpuset_context_nmask, nodes,
+ &cpuset_current_mems_allowed);
+ else
+ nodes_and(cpuset_context_nmask, *nodes,
+ cpuset_current_mems_allowed);
+ if (mpol_store_user_nodemask(policy))
+ policy->w.user_nodemask = *nodes;
+ else
+ policy->w.cpuset_mems_allowed =
+ cpuset_mems_allowed(current);
+ }
- ret = mpol_ops[mode].create(policy, &cpuset_context_nmask);
+ ret = mpol_ops[mode].create(policy,
+ localalloc ? NULL : &cpuset_context_nmask);
if (ret < 0) {
kmem_cache_free(policy_cache, policy);
return ERR_PTR(ret);
--