Re: [patch -mm 2/2] mempolicy: use default_policy mode instead of MPOL_DEFAULT

Previous thread: none

Next thread: Question about pdflush's two ratio? by 内核追踪 on Friday, March 7, 2008 - 6:53 pm. (1 message)
From: David Rientjes
Date: Friday, March 7, 2008 - 6:24 pm

MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES don't mean anything for
MPOL_PREFERRED policies that were created with an empty nodemask (for
purely local allocations).  They'll never be invalidated because the
allowed mems of a task changes or need to be rebound relative to a
cpuset's placement.

Also fixes a bug identified by Lee Schermerhorn that disallowed empty
nodemasks to be passed to MPOL_PREFERRED to specify local allocations.

Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 Documentation/vm/numa_memory_policy.txt |   16 ++++++++++++++--
 mm/mempolicy.c                          |   17 ++++++++++++-----
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt
--- a/Documentation/vm/numa_memory_policy.txt
+++ b/Documentation/vm/numa_memory_policy.txt
@@ -210,6 +210,12 @@ Components of Memory Policies
 	    local allocation for a specific range of addresses--i.e. for
 	    VMA policies.
 
+	    It is possible for the user to specify that local allocation is
+	    always preferred by passing an empty nodemask with this mode.
+	    If an empty nodemask is passed, the policy cannot use the
+	    MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES flags described
+	    below.
+
 	MPOL_INTERLEAVED:  This mode specifies that page allocations be
 	interleaved, on a page granularity, across the nodes specified in
 	the policy.  This mode also behaves slightly differently, based on
@@ -259,7 +265,10 @@ Components of Memory Policies
 	    occurs over that node.  If no nodes from the user's nodemask are
 	    now allowed, the Default behavior is used.
 
-	    MPOL_F_STATIC_NODES cannot be used with MPOL_F_RELATIVE_NODES.
+	    MPOL_F_STATIC_NODES cannot be combined with the
+	    ...
From: David Rientjes
Date: Friday, March 7, 2008 - 6:24 pm

default_policy is defined in mm/mempolicy.c as the default system policy:

	struct mempolicy default_policy = {
		.refcnt = ATOMIC_INIT(1),
		.policy = MPOL_DEFAULT,
	};

So instead of checking for comparisons against a mempolicy's mode to
MPOL_DEFAULT or falling back stricly to MPOL_DEFAULT throughout the code,
we should use the mode that is defined in this struct.

Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/mempolicy.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -185,7 +185,7 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags,
 			return ERR_PTR(-EINVAL);
 	} else if (nodes && nodes_empty(*nodes))
 		return ERR_PTR(-EINVAL);
-	if (mode == MPOL_DEFAULT)
+	if (mode == default_policy.policy)
 		return NULL;
 	policy = kmem_cache_alloc(policy_cache, GFP_KERNEL);
 	if (!policy)
@@ -1232,7 +1232,7 @@ static struct mempolicy * get_vma_policy(struct task_struct *task,
 			pol = vma->vm_ops->get_policy(vma, addr);
 			shared_pol = 1;	/* if pol non-NULL, add ref below */
 		} else if (vma->vm_policy &&
-				vma->vm_policy->policy != MPOL_DEFAULT)
+			   vma->vm_policy->policy != default_policy.policy)
 			pol = vma->vm_policy;
 	}
 	if (!pol)
@@ -1588,7 +1588,7 @@ void __mpol_free(struct mempolicy *p)
 {
 	if (!atomic_dec_and_test(&p->refcnt))
 		return;
-	p->policy = MPOL_DEFAULT;
+	p->policy = default_policy.policy;
 	kmem_cache_free(policy_cache, p);
 }
 
@@ -1752,10 +1752,10 @@ void mpol_shared_policy_init(struct shared_policy *info, unsigned short policy,
 	info->root = RB_ROOT;
 	spin_lock_init(&info->lock);
 
-	if (policy != MPOL_DEFAULT) {
+	if (policy != default_policy.policy) {
 		struct mempolicy *newpol;
 
-		/* Falls back to MPOL_DEFAULT on any ...
From: Paul Jackson
Date: Friday, March 7, 2008 - 6:28 pm

Is this just a stylistic choice?  That is, is the same machine
code produced either way?

If that's the case, could you briefly explain why you prefer
one style (default_policy.policy) over the other (MPOL_DEFAULT)?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.940.382.4214
--

From: David Rientjes
Date: Friday, March 7, 2008 - 6:33 pm

Very fast response!

Yes, the same code is generated since default_policy.policy is statically 
declared as MPOL_DEFAULT.

I chose this because checks in mpol_new() to return NULL if the mode is 
MPOL_DEFAULT is purely based on the fact that, as the default system 
policy, policy task or VMA pointers are set to &default_policy.
--

From: Paul Jackson
Date: Friday, March 7, 2008 - 6:35 pm

You're just as fast!

Reasonable answer - thanks.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.940.382.4214
--

From: Christoph Lameter
Date: Friday, March 7, 2008 - 6:59 pm

And where is the patch to set the system (or cpuset....) default policy? 
;-)

Note that the system default policy changes during bootup. Could that be 
done with the default policy? We had some issues a while back with 
processes spawned at boot inheriting the interleave policy. If they could 
refer to default_policy instead then a change of default_policy would 
switch all spawned processes to use MPOL_DEFAULT?

--

From: David Rientjes
Date: Friday, March 7, 2008 - 7:14 pm

The system default policy is inherently defined in get_vma_policy() and 

With the patch, get_vma_policy() would return the VMA's policy if it has 
its own get_policy() function that returns a valid policy or differs from 
the default, otherwise it would refer to whatever default_policy is 
statically allocated to represent.

For task mempolicies, alloc_pages_current() has always used default_policy 
if current->mempolicy is NULL.  So you can change default_policy.policy to 
be anything you want (and its v.nodes or v.preferred_node members) and 
that will be the effected policy for any task that doesn't have a valid 
->mempolicy.
--

From: Lee Schermerhorn
Date: Saturday, March 8, 2008 - 12:13 pm

Do I understand that you're saying that when mode is specified as
MPOL_DEFAULT, the current task policy [in the case of set_mempolicy()]
or the indicated vma policy [in the case of mbind()] is set to
&default_policy?  If that's not what you're saying, please disregard the
rest of this message.

If that is what you're saying:   This is not the case.  When mode is
specified as MPOL_DEFAULT,  mpol_new() returns NULL and
do_set_mempolicy() or do_mbind() [via it's helper functions] replaces
the existing task or vma policy with the NULL, freeing [releasing a
reference on] the existing policy, if any.  This results in the target
policy--task or vma--"falling back" to the surrounding policy scope.
This seems to have been the behavior as far back as I have looked.

In fact, the only place that MPOL_DEFAULT occurs in the policy [mode]
member of struct mempolicy is in the system default_policy.  [As
Christoph noted, this is not the case during system initialization.]  In
this context, it means "local allocation".  This requires us to have the
MPOL_DEFAULT case in the switches throughout mempolicy.c.

I have a patch queued up, waiting for things to settle down in
mempolicy.c, to replace the policy/mode in default_policy with
MPOL_PREFERRED with preferred_node = -1.  Then, we can remove all of the
MPOL_DEFAULT cases out of the switches in the allocation paths and
"clean up" the documentation, including man pages.  MPOL_DEFAULT becomes
simply an API mechanism to request fall back to the surrounding policy
scope which, to my mind, is what "default policy" means.

I don't have a problem with this patch in the context of the current
implementation, altho' it does seem a bit pointless to me, unless you
have something further that you're trying to to accomplish.

I'll move the aforementioned patch to the head of my queue and send it
out early this coming week for review atop whatever mempolicy patches
Andrew has added to the tree at that time.

Later,
Lee


--

From: David Rientjes
Date: Saturday, March 8, 2008 - 3:20 pm

Ok, I'll await your patch that switches default_policy.policy to 
MPOL_PREFERRED.

Using MPOL_DEFAULT purely for falling back to the task or system-wide 
policy, however, seems confusing.  The semantics seem to indicate that 
MPOL_DEFAULT represents the system-wide default policy without any 
preferred node or set of nodes to bind or interleave.  So if a VMA has a 
policy of MPOL_DEFAULT then, to me, it seems like that indicates the 
absence of a specific policy, not a mandate to fallback to the task 
policy.
--

From: Andi Kleen
Date: Saturday, March 8, 2008 - 4:19 pm

I designed MPOL_DEFAULT on vma originally to be a fallback to the task policy.

Absence of specific policy would be MPOL_PREFERRED with -1 node.

-Andi



--

From: Lee Schermerhorn
Date: Monday, March 10, 2008 - 6:48 am

Not sure what you mean here, Andi.   

"MPOL_PREFERRED with -1 node" == "local allocation", right?  

Whereas, in the task mempolicy or vma policy or shared policy, the lack
of a specific policy--i.e., a null mempolicy pointer, or no policy at
that offset in a shared policy rbtree--means fall back to surrounding
context, right?  As far back as I've looked, mempolicy.c implemented
MPOL_DEFAULT, passed to set_mempolicy() or mbind(), by deleting the
target policy, resulting in "fall back".  

The only place that MPOL_DEFAULT actually occurs in a struct mempolicy
is in the system default policy.  I think we can simplify the code and
documentation--not have to explain the context dependent meaning of
MPOL_DEFAULT--by making it simply an API request to remove the target
policy and establish "default behavior" for that context--i.e.,
fallback.  


--

From: Andrew Morton
Date: Monday, March 10, 2008 - 12:09 pm

On Fri, 7 Mar 2008 17:24:15 -0800 (PST)

I get a significant-looking reject from this.  Can you please redo and
resend?


I put my current rollup (against -rc5) at
http://userweb.kernel.org/~akpm/dr.gz and the broken-out tree is, as always
at http://userweb.kernel.org/~akpm/mmotm/

It would be better for you to get set up for using mmotm - it is my usual
way of publishing the -mm queue between releases.
--

Previous thread: none

Next thread: Question about pdflush's two ratio? by 内核追踪 on Friday, March 7, 2008 - 6:53 pm. (1 message)