Re: [patch 2/2] cpusets: add interleave_over_allowed option

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: David Rientjes <rientjes@...>
Cc: <clameter@...>, <Lee.Schermerhorn@...>, <akpm@...>, <ak@...>, <linux-kernel@...>
Date: Monday, October 29, 2007 - 12:27 am

> Nobody can show an example of an application that would be broken because 

Well, neither you nor I have shown an example.  That's different than
"nobody can."

Since it would affect any task setting memory policies while in a
cpuset holding less than all memory nodes, it seems potentially serious
to me.

Actually, I have one example.  The libcpuset library would have some
breakage with Choice B the only Choice.  But I'm in a position to deal
with that, so it's not a big deal.



Breaking the libnuma-Oracle solution stack is not an option.

And, unless someone in the know tells us otherwise, I have to assume
that this could break them.  Now, the odds are that they simply don't
run that solution stack on any system making active use of cpusets,
so the odds are this would be no problem for them.  But I don't
presently have enough knowledge of their situation to take that risk.



We could get rid of Choice A once libnuma and libcpuset have adapted
to Choice B, and any other uses of Choice A that we've subsequently
identified have had sufficient time to adapt.

But dual support is pretty easy so far as the kernel code is concerned.
It's just a few nodes_remap() calls optionally invoked at a few key
spots in mm/mempolicy.c.  Consequently there won't be a big hurry to
remove Choice A.



There is no "_then_ attach the task to a cpuset."  On systems with
kernels configured with CONFIG_CPUSETS=y, all tasks are in a cpuset
all the time.  Moreover, from a practical point of view, on large
systems managed with cpuset based mechanisms, almost all tasks are in
cpusets that do not include all nodes, for the entire life of the task.

And besides, I can't break existing applications willy-nilly, and
then claim it's their fault, because they should have been coded
differently.  So "correct way" arguments don't hold alot of weight
for already released and deployed product.



David ;)  I make some effort to avoid forcing applications to be
recoded and rebuilt in order to continue functioning.



I had to read that a couple of times to make sense of it.  I take that
it means that the node numbering used in each cpuset's 'mems' file has
to be system-wide.  Yes, agreed.

(Well, actually, the node numbering of each cpusets 'mems' file could
be relative to its parent cpusets 'mem' numbers, but let's not go
there, as this discussion is already sufficiently complicated ;)



That's what Choice B states, yes.  Though to be clear, time for another
example:
  * task is in cpuset with mems: 24-31
  * task wants some memory policy on the first two nodes of its cpuset.
  * by Choice A, it asks for nodes 24 and 25
  * by Choice B, it asks for nodes 0 and 1

The Choice B numbering can be thought of as cpuset relative.  In it,
node N means the N-th node in my current cpuset, modulo whatever is the
node size of that cpuset.

However ...

We need to continue to support Choice A as well, perhaps for some
interim, perhaps forever.  Which doesn't much matter for now.

===

David - how would the following do for you?

Would it meet the need that prompted your initial patch set if we
added Choice B memory policy node numbering, but left Choice A as the
kernel default, with a per-task option (perhaps invokable by a new
option to one of the {get,set}_mempolicy() calls) to choose Choice B?

This lets us get Choice B out there, and lets the two main libraries,
libnuma and libcpuset, dynamically adapt to whichever Choice is active
for the current task.

Unchanged applications and existing binaries would simply continue with
Choice A.  With one additional line of code, a user application could
get Choice B, with its ability for example to request MPOL_INTERLEAVE
over all cpuset allowed nodes, where the kernel automatically adapts
that to changing cpuset changes from larger 'mems' to smaller 'mems'
and back to larger 'mems' again.

It would mean you would have to make a change to your applications
to get this improved interleaving.  But I trust from all you've been
advocating that such a code change and rebuild would not be any problem
for whatever situation you're concerned with.

We could recommend that new code probe to see if Choice B is available
and prefer it if it is.   At some future time, we might deprecate and
eventually remove Choice A.

I appreciate that you don't want to leave in place the complications
of dual Choices, but I lack the experience, knowledge or clarity I
need to support fully changing over to Choice B at this time.

Getting Choice B out there will go a long way toward providing us
with the feedback we will need to guide future decisions.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401
-
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[patch 2/2] cpusets: add interleave_over_allowed option, David Rientjes, (Thu Oct 25, 6:54 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Christoph Lameter, (Thu Oct 25, 7:37 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Christoph Lameter, (Thu Oct 25, 8:28 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Fri Oct 26, 11:18 am)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Mon Oct 29, 12:23 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Tue Oct 30, 4:20 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Christoph Lameter, (Mon Oct 29, 4:36 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Christoph Lameter, (Fri Oct 26, 1:36 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, David Rientjes, (Thu Oct 25, 10:11 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Fri Oct 26, 11:30 am)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Fri Oct 26, 4:43 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Fri Oct 26, 5:13 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Fri Oct 26, 5:31 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Mon Oct 29, 11:10 am)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Mon Oct 29, 3:01 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Christoph Lameter, (Fri Oct 26, 9:26 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Mon Oct 29, 12:54 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Paul Jackson, (Mon Oct 29, 12:27 am)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, David Rientjes, (Mon Oct 29, 12:47 am)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Christoph Lameter, (Fri Oct 26, 10:50 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Christoph Lameter, (Sat Oct 27, 2:07 am)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Christoph Lameter, (Sat Oct 27, 1:47 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Christoph Lameter, (Fri Oct 26, 5:17 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Fri Oct 26, 5:26 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Christoph Lameter, (Fri Oct 26, 5:37 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Mon Oct 29, 11:00 am)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Christoph Lameter, (Mon Oct 29, 4:35 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Mon Oct 29, 1:46 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Christoph Lameter, (Fri Oct 26, 5:05 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Christoph Lameter, (Fri Oct 26, 5:12 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, David Rientjes, (Thu Oct 25, 10:45 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, David Rientjes, (Thu Oct 25, 11:58 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Fri Oct 26, 11:37 am)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Michael Kerrisk, (Fri Oct 26, 4:21 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Michael Kerrisk, (Fri Oct 26, 4:33 pm)
Re: [patch 2/2] cpusets: add interleave_over_allowed option, Lee Schermerhorn, (Fri Oct 26, 1:28 pm)