> Nobody can show an example of an application that would be broken because

Well, neither you nor I have shown an example. That's different than "nobody can." Since it would affect any task setting memory policies while in a cpuset holding fewer than all memory nodes, it seems potentially serious to me.

Actually, I have one example. The libcpuset library would have some breakage with Choice B as the only Choice. But I'm in a position to deal with that, so it's not a big deal.

Breaking the libnuma-Oracle solution stack is not an option. And, unless someone in the know tells us otherwise, I have to assume that this could break them. Now, the odds are that they simply don't run that solution stack on any system making active use of cpusets, so the odds are this would be no problem for them. But I don't presently have enough knowledge of their situation to take that risk.

We could get rid of Choice A once libnuma and libcpuset have adapted to Choice B, and any other uses of Choice A that we've subsequently identified have had sufficient time to adapt.

But dual support is pretty easy so far as the kernel code is concerned. It's just a few nodes_remap() calls optionally invoked at a few key spots in mm/mempolicy.c. Consequently there won't be a big hurry to remove Choice A.

There is no "_then_ attach the task to a cpuset." On systems with kernels configured with CONFIG_CPUSETS=y, all tasks are in a cpuset all the time. Moreover, from a practical point of view, on large systems managed with cpuset based mechanisms, almost all tasks are in cpusets that do not include all nodes, for the entire life of the task.

And besides, I can't break existing applications willy-nilly, and then claim it's their fault, because they should have been coded differently. So "correct way" arguments don't hold a lot of weight for already released and deployed product.

David ;)

I make some effort to avoid forcing applications to be recoded and rebuilt in order to continue functioning.
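As a rough illustration of the kind of remapping that Choice B implies, here is a minimal user-space sketch. It is not the kernel's nodes_remap() code from mm/mempolicy.c; the function name and the array representation of a cpuset's 'mems' are made up for this example. It assumes that cpuset-relative node N means the N-th allowed node of the task's cpuset, wrapping modulo the number of nodes in that cpuset.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only; this is not kernel code.
 * Translate a cpuset-relative node number (Choice B) into a system-wide
 * node number (Choice A).
 *
 *   rel   - cpuset-relative node number requested by the task
 *   mems  - the cpuset's allowed nodes, in ascending order
 *   nmems - number of entries in mems
 */
int relative_to_system_node(unsigned int rel, const int *mems, size_t nmems)
{
    /* This sketch assumes a non-empty list of allowed nodes. */
    assert(mems != NULL && nmems > 0);

    /* Node N is the N-th node of the cpuset, wrapping modulo its size. */
    return mems[rel % nmems];
}
```

With a cpuset whose 'mems' is 24-31, relative nodes 0 and 1 come out as system nodes 24 and 25. If the cpuset later shrinks or grows, only the `mems` array changes; the relative numbers the task asked for stay the same.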
I had to read that a couple of times to make sense of it. I take it to mean that the node numbering used in each cpuset's 'mems' file has to be system-wide.

Yes, agreed.

(Well, actually, the node numbering of each cpuset's 'mems' file could be relative to its parent cpuset's 'mems' numbers, but let's not go there, as this discussion is already sufficiently complicated ;)

That's what Choice B states, yes. Though to be clear, time for another example:

 * task is in cpuset with mems: 24-31
 * task wants some memory policy on the first two nodes of its cpuset.
 * by Choice A, it asks for nodes 24 and 25
 * by Choice B, it asks for nodes 0 and 1

The Choice B numbering can be thought of as cpuset relative. In it, node N means the N-th node in my current cpuset, modulo the number of nodes in that cpuset.

However ...

We need to continue to support Choice A as well, perhaps for some interim, perhaps forever. Which doesn't much matter for now.

===

David - how would the following do for you?

Would it meet the need that prompted your initial patch set if we added Choice B memory policy node numbering, but left Choice A as the kernel default, with a per-task option (perhaps invokable by a new option to one of the {get,set}_mempolicy() calls) to choose Choice B?

This lets us get Choice B out there, and lets the two main libraries, libnuma and libcpuset, dynamically adapt to whichever Choice is active for the current task.

Unchanged applications and existing binaries would simply continue with Choice A.

With one additional line of code, a user application could get Choice B, with its ability for example to request MPOL_INTERLEAVE over all cpuset allowed nodes, where the kernel automatically adapts that as the cpuset changes from larger 'mems' to smaller 'mems' and back to larger 'mems' again.

It would mean you would have to make a change to your applications to get this improved interleaving. But I trust from all you've been advocating that such a code change and rebuild would not be any problem for whatever situation you're concerned with.

We could recommend that new code probe to see if Choice B is available and prefer it if it is. At some future time, we might deprecate and eventually remove Choice A.

I appreciate that you don't want to leave in place the complications of dual Choices, but I lack the experience, knowledge or clarity I need to support fully changing over to Choice B at this time. Getting Choice B out there will go a long way toward providing us with the feedback we will need to guide future decisions.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401
