Re: [2.6.24-rc8-mm1][regression?] numactl --interleave=all doesn't works on memoryless node.

To: KOSAKI Motohiro <kosaki.motohiro@...>
Cc: Andi Kleen <andi@...>, <linux-mm@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Christoph Lameter <clameter@...>, Paul Jackson <pj@...>, David Rientjes <rientjes@...>, Mel Gorman <mel@...>
Date: Monday, February 4, 2008 - 2:20 pm

On Sat, 2008-02-02 at 18:37 +0900, KOSAKI Motohiro wrote:

The memoryless nodes patch series changed a lot of things, so just
reverting this one area [mpol_check_policy()] probably won't restore the
prior behavior.  A fully populated node mask is not necessarily a subset
of node_online_map, and contextualize_policy() also requires the mask to
be a subset of mems_allowed, which also defaults to nodes with memory.
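
To illustrate why a fully populated mask fails these checks, here is a
minimal user-space sketch.  It is not the kernel code: nodemask64,
node_bit(), mask_is_subset() and strict_check() are hypothetical names,
and a 64-bit integer stands in for the kernel's nodemask type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel nodemask: bit n represents node n. */
typedef uint64_t nodemask64;

/* Mask with only node n set. */
static nodemask64 node_bit(int n)
{
    assert(n >= 0 && n < 64);
    return UINT64_C(1) << n;
}

/* True if every node set in 'mask' is also set in 'allowed'. */
static bool mask_is_subset(nodemask64 mask, nodemask64 allowed)
{
    return (mask & ~allowed) == 0;
}

/*
 * Strict validation in the spirit of the checks described above: the
 * requested mask must be a subset of both the online map and
 * mems_allowed, otherwise the request is rejected.
 */
static bool strict_check(nodemask64 requested, nodemask64 online,
                         nodemask64 mems_allowed)
{
    return mask_is_subset(requested, online) &&
           mask_is_subset(requested, mems_allowed);
}
```

On a four-node system where node 2 is memoryless, an all-ones mask
fails strict_check() even though the caller only meant "everything
available".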

I don't know how Mel Gorman's "two zonelist" series, which is still
awaiting a window into the -mm tree, affects this behavior.  Those
patches will certainly be affected by whatever we decide here.

I don't know the current state of Paul's rework of cpusets and
mems_allowed.  That probably resolves this issue, if he still plans on
allowing a fully populated mask to indicate interleaving over all
allowed nodes.

I have a patch that takes a different approach to "interleave=all" that
doesn't solve Paul's and David's requirements.  I also have patches to
libnuma and numactl that work with my patches, but I saw no sense in
posting them unless my kernel patches got some traction.  If interested,
you can find them at:

 http://free.linux.hp.com/~lts/Patches/Numactl/



 

In addition to Andi's answer about simplicity, libnuma and numactl
predate the sysfs node masks.  There was no way to query what the valid
set of nodes would be, but the kernel allowed a fully populated map.  We
broke that with the memoryless nodes rework.
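
For reference, a sketch of how a user-space tool could turn one of the
sysfs node masks into a bitmask.  It assumes the file contents are a
range list such as "0-1,3" and that node numbers are below 64;
parse_node_list() is a hypothetical helper, not a libnuma API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a node list such as "0-1,3" (optionally newline-terminated)
 * into a bitmask where bit n represents node n.
 * Returns 0 on success, -1 on malformed input or a node number >= 64.
 */
static int parse_node_list(const char *s, uint64_t *mask)
{
    uint64_t result = 0;

    assert(s != NULL && mask != NULL);

    while (*s && *s != '\n') {
        char *end;
        unsigned long lo, hi;

        if (!isdigit((unsigned char)*s))
            return -1;
        lo = hi = strtoul(s, &end, 10);
        s = end;

        if (*s == '-') {                 /* range "lo-hi" */
            s++;
            if (!isdigit((unsigned char)*s))
                return -1;
            hi = strtoul(s, &end, 10);
            s = end;
        }
        if (lo > hi || hi >= 64)
            return -1;

        for (unsigned long n = lo; n <= hi; n++)
            result |= UINT64_C(1) << n;

        if (*s == ',')
            s++;                         /* next item */
        else if (*s && *s != '\n')
            return -1;                   /* unexpected character */
    }

    *mask = result;
    return 0;
}
```

A caller would read a line from a mask file under
/sys/devices/system/node/ (for example has_normal_memory, where the
kernel provides it) and pass it to parse_node_list().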



Regarding the patch itself:  If others have no problems with displaying
a "has_high_memory" node mask on systems without CONFIG_HIGHMEM
configured, I can live with it.

The current upstream kernel [2.6.24] supports an MPOL_F_MEMS_ALLOWED
flag to get_mempolicy() that returns the nodes allowed in the caller's
cpuset.  My numactl patches, mentioned above, support this.

However, as Andi says, we really can't break application behavior.  Not
all applications that use mempolicy use the libnuma APIs.  So, a fully
populated interleave node mask should be allowed and should probably
mean "all allowed nodes with memory".

I think we'd still need to reduce the interleave policy mask to nodes
with memory when it's installed or find another way to skip memoryless
nodes when interleaving, else we don't get even distribution of
interleaved pages over the nodes that do have memory.  This is one of
the memoryless nodes fixes.  I THINK this is one of the areas that Paul
and David are investigating.
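
A user-space sketch of the idea, not the kernel implementation: reduce
the requested mask to allowed nodes with memory when the policy is
installed, then round-robin over what is left.  The function names and
the 64-bit mask are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Silently restrict a requested interleave mask to nodes that are both
 * allowed for the caller and have memory.  An all-ones request then
 * means "all allowed nodes with memory".
 */
static uint64_t effective_interleave_mask(uint64_t requested,
                                          uint64_t mems_allowed,
                                          uint64_t has_memory)
{
    return requested & mems_allowed & has_memory;
}

/*
 * Return the next node after 'prev' in 'mask', wrapping around.
 * Pass prev = -1 to get the first node.  Returns -1 for an empty mask.
 */
static int next_interleave_node(uint64_t mask, int prev)
{
    assert(prev >= -1 && prev < 64);

    if (mask == 0)
        return -1;

    for (int i = 1; i <= 64; i++) {
        int n = (prev + i) % 64;

        if (mask & (UINT64_C(1) << n))
            return n;
    }
    return -1; /* not reached for a non-empty mask */
}
```

Because memoryless nodes are removed from the mask up front, each
remaining node receives every k-th page, where k is the number of nodes
left in the mask.  If a memoryless node stayed in the mask, its share
would fall back to some other node and skew the distribution.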

Christoph, Mel, Paul:  any suggestions for a [relatively quick] fix that
doesn't break the memoryless nodes work and doesn't violate cpuset
constraints?




Messages in current thread:
Re: [2.6.24-rc8-mm1][regression?] numactl --interleave=all d..., Lee Schermerhorn, (Mon Feb 4, 2:20 pm)
Re: [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodema..., Lee Schermerhorn, (Mon Feb 11, 12:47 pm)