MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES don't mean anything for MPOL_PREFERRED policies that were created with an empty nodemask (for purely local allocations). They'll never be invalidated because the allowed mems of a task changes or need to be rebound relative to a cpuset's placement. Also fixes a bug identified by Lee Schermerhorn that disallowed empty nodemasks to be passed to MPOL_PREFERRED to specify local allocations. Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <ak@suse.de> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David Rientjes <rientjes@google.com> --- Documentation/vm/numa_memory_policy.txt | 16 ++++++++++++++-- mm/mempolicy.c | 17 ++++++++++++----- 2 files changed, 26 insertions(+), 7 deletions(-) diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt --- a/Documentation/vm/numa_memory_policy.txt +++ b/Documentation/vm/numa_memory_policy.txt @@ -210,6 +210,12 @@ Components of Memory Policies local allocation for a specific range of addresses--i.e. for VMA policies. + It is possible for the user to specify that local allocation is + always preferred by passing an empty nodemask with this mode. + If an empty nodemask is passed, the policy cannot use the + MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES flags described + below. + MPOL_INTERLEAVED: This mode specifies that page allocations be interleaved, on a page granularity, across the nodes specified in the policy. This mode also behaves slightly differently, based on @@ -259,7 +265,10 @@ Components of Memory Policies occurs over that node. If no nodes from the user's nodemask are now allowed, the Default behavior is used. - MPOL_F_STATIC_NODES cannot be used with MPOL_F_RELATIVE_NODES. + MPOL_F_STATIC_NODES cannot be combined with the + ...
default_policy is defined in mm/mempolicy.c as the default system policy:
struct mempolicy default_policy = {
.refcnt = ATOMIC_INIT(1),
.policy = MPOL_DEFAULT,
};
So instead of checking for comparisons against a mempolicy's mode to
MPOL_DEFAULT or falling back stricly to MPOL_DEFAULT throughout the code,
we should use the mode that is defined in this struct.
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
---
mm/mempolicy.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -185,7 +185,7 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags,
return ERR_PTR(-EINVAL);
} else if (nodes && nodes_empty(*nodes))
return ERR_PTR(-EINVAL);
- if (mode == MPOL_DEFAULT)
+ if (mode == default_policy.policy)
return NULL;
policy = kmem_cache_alloc(policy_cache, GFP_KERNEL);
if (!policy)
@@ -1232,7 +1232,7 @@ static struct mempolicy * get_vma_policy(struct task_struct *task,
pol = vma->vm_ops->get_policy(vma, addr);
shared_pol = 1; /* if pol non-NULL, add ref below */
} else if (vma->vm_policy &&
- vma->vm_policy->policy != MPOL_DEFAULT)
+ vma->vm_policy->policy != default_policy.policy)
pol = vma->vm_policy;
}
if (!pol)
@@ -1588,7 +1588,7 @@ void __mpol_free(struct mempolicy *p)
{
if (!atomic_dec_and_test(&p->refcnt))
return;
- p->policy = MPOL_DEFAULT;
+ p->policy = default_policy.policy;
kmem_cache_free(policy_cache, p);
}
@@ -1752,10 +1752,10 @@ void mpol_shared_policy_init(struct shared_policy *info, unsigned short policy,
info->root = RB_ROOT;
spin_lock_init(&info->lock);
- if (policy != MPOL_DEFAULT) {
+ if (policy != default_policy.policy) {
struct mempolicy *newpol;
- /* Falls back to MPOL_DEFAULT on any ...Is this just a stylistic choice? That is, is the same machine
code produced either way?
If that's the case, could you briefly explain why you prefer
one style (default_policy.policy) over the other (MPOL_DEFAULT)?
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.940.382.4214
--
Very fast response! Yes, the same code is generated since default_policy.policy is statically declared as MPOL_DEFAULT. I chose this because checks in mpol_new() to return NULL if the mode is MPOL_DEFAULT is purely based on the fact that, as the default system policy, policy task or VMA pointers are set to &default_policy. --
You're just as fast!
Reasonable answer - thanks.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.940.382.4214
--
And where is the patch to set the system (or cpuset....) default policy? ;-) Note that the system default policy changes during bootup. Could that be done with the default policy? We had some issues a while back with processes spawned at boot inheriting the interleave policy. If they could refer to default_policy instead then a change of default_policy would switch all spawned processes to use MPOL_DEFAULT? --
The system default policy is inherently defined in get_vma_policy() and With the patch, get_vma_policy() would return the VMA's policy if it has its own get_policy() function that returns a valid policy or differs from the default, otherwise it would refer to whatever default_policy is statically allocated to represent. For task mempolicies, alloc_pages_current() has always used default_policy if current->mempolicy is NULL. So you can change default_policy.policy to be anything you want (and its v.nodes or v.preferred_node members) and that will be the effected policy for any task that doesn't have a valid ->mempolicy. --
Do I understand that you're saying that when mode is specified as MPOL_DEFAULT, the current task policy [in the case of set_mempolicy()] or the indicated vma policy [in the case of mbind()] is set to &default_policy? If that's not what you're saying, please disregard the rest of this message. If that is what you're saying: This is not the case. When mode is specified as MPOL_DEFAULT, mpol_new() returns NULL and do_set_mempolicy() or do_mbind() [via it's helper functions] replaces the existing task or vma policy with the NULL, freeing [releasing a reference on] the existing policy, if any. This results in the target policy--task or vma--"falling back" to the surrounding policy scope. This seems to have been the behavior as far back as I have looked. In fact, the only place that MPOL_DEFAULT occurs in the policy [mode] member of struct mempolicy is in the system default_policy. [As Christoph noted, this is not the case during system initialization.] In this context, it means "local allocation". This requires us to have the MPOL_DEFAULT case in the switches throughout mempolicy.c. I have a patch queued up, waiting for things to settle down in mempolicy.c, to replace the policy/mode in default_policy with MPOL_PREFERRED with preferred_node = -1. Then, we can remove all of the MPOL_DEFAULT cases out of the switches in the allocation paths and "clean up" the documentation, including man pages. MPOL_DEFAULT becomes simply an API mechanism to request fall back to the surrounding policy scope which, to my mind, is what "default policy" means. I don't have a problem with this patch in the context of the current implementation, altho' it does seem a bit pointless to me, unless you have something further that you're trying to to accomplish. I'll move the aforementioned patch to the head of my queue and send it out early this coming week for review atop whatever mempolicy patches Andrew has added to the tree at that time. Later, Lee --
Ok, I'll await your patch that switches default_policy.policy to MPOL_PREFERRED. Using MPOL_DEFAULT purely for falling back to the task or system-wide policy, however, seems confusing. The semantics seem to indicate that MPOL_DEFAULT represents the system-wide default policy without any preferred node or set of nodes to bind or interleave. So if a VMA has a policy of MPOL_DEFAULT then, to me, it seems like that indicates the absence of a specific policy, not a mandate to fallback to the task policy. --
I designed MPOL_DEFAULT on vma originally to be a fallback to the task policy. Absence of specific policy would be MPOL_PREFERRED with -1 node. -Andi --
Not sure what you mean here, Andi. "MPOL_PREFERRED with -1 node" == "local allocation", right? Whereas, in the task mempolicy or vma policy or shared policy, the lack of a specific policy--i.e., a null mempolicy pointer, or no policy at that offset in a shared policy rbtree--means fall back to surrounding context, right? As far back as I've looked, mempolicy.c implemented MPOL_DEFAULT, passed to set_mempolicy() or mbind(), by deleting the target policy, resulting in "fall back". The only place that MPOL_DEFAULT actually occurs in a struct mempolicy is in the system default policy. I think we can simplify the code and documentation--not have to explain the context dependent meaning of MPOL_DEFAULT--by making it simply an API request to remove the target policy and establish "default behavior" for that context--i.e., fallback. --
On Fri, 7 Mar 2008 17:24:15 -0800 (PST) I get a significant-looking reject from this. Can you please redo and resend? I put my current rollup (against -rc5) at http://userweb.kernel.org/~akpm/dr.gz and the broken-out tree is, as always at http://userweb.kernel.org/~akpm/mmotm/ It would be better for you to get set up for using mmotm - it is my usual way of publishing the -mm queue between releases. --
