On Sun, 28 Oct 2007, Paul Jackson wrote:

Nobody can show an example of an application that would be broken because of this and, given the scenario and sequence of events required to break one when Choice B is the default, I don't think it's as much of an issue as you believe.

So all applications that use the libnuma interface and numactl will have different default behavior than those that simply issue {get,set}_mempolicy() calls. libnuma is a collection of higher level functions that should be built upon {get,set}_mempolicy(), as they currently are, and should not introduce new subtleties like changing the semantics of a preferred node argument. This is going to quickly become a documentation nightmare and, in my opinion, isn't worth the time or effort to support because we haven't even identified any real-world examples.

Maybe Andi Kleen should weigh in on this topic because, if we go with what you're suggesting, we'll never get rid of the two differing behaviors and we'll be introducing different semantics to arguments of libnuma functions than the kernel API they are built upon.

True, but the ordering of that scenario is troublesome. The correct way to implement it is to use set_mempolicy() or a higher level libnuma function with the same semantics and _then_ attach the task to a cpuset. Then the nodes_remap() takes care of the rest. The scenario you describe above has a problem because it requires the task to know the mems of the cpuset to which it is attached when, for portability, it should have been written so that it is robust to any range of nodes you happen to assign it to.

No, because nodes_remap() takes care of the instances you describe above when the task sets its memory policy (usually done when it is started) and is then attached to a cpuset.

Supporting two different behaviors is going to be more problematic than simply selecting one and going with it and its associated documentation in future versions of the kernel.
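To make the remapping argument concrete, here is a minimal user-space sketch of the kind of translation nodes_remap() performs when a task's allowed mems change. It is an illustration under stated assumptions, not the kernel code: the names remap_nodes_sketch(), mask_weight() and nth_set_bit() are hypothetical, and node masks are modeled as a single 64-bit word rather than a nodemask_t. The rule modeled is that the n-th node of the old mems maps to the n-th node of the new mems (wrapping if the new set is smaller), and nodes outside the old mems map to themselves.

```c
#include <stdint.h>

/* Number of set bits (nodes) in mask. */
static int mask_weight(uint64_t mask)
{
    int w = 0;
    for (; mask != 0; mask >>= 1)
        w += (int)(mask & 1);
    return w;
}

/* Position of the n-th (0-based) set bit in mask, or -1 if there is none. */
static int nth_set_bit(uint64_t mask, int n)
{
    for (int pos = 0; pos < 64; pos++) {
        if ((mask >> pos) & 1) {
            if (n == 0)
                return pos;
            n--;
        }
    }
    return -1;
}

/*
 * Hypothetical model of remapping a policy's node mask when the
 * cpuset's mems change from old_mems to new_mems.
 */
uint64_t remap_nodes_sketch(uint64_t policy, uint64_t old_mems,
                            uint64_t new_mems)
{
    int w = mask_weight(new_mems);
    uint64_t result = 0;

    for (int pos = 0; pos < 64; pos++) {
        uint64_t bit = UINT64_C(1) << pos;

        if (!(policy & bit))
            continue;
        if (!(old_mems & bit) || w == 0) {
            /* Node not in the old mems, or new mems empty: keep as-is. */
            result |= bit;
        } else {
            /* Ordinal of this node within old_mems, wrapped to new size. */
            int n = mask_weight(old_mems & (bit - 1));
            result |= UINT64_C(1) << nth_set_bit(new_mems, n % w);
        }
    }
    return result;
}
```

Under this model, a task that preferred node 1 while its mems were nodes 0-3 ends up preferring node 5 once its mems become nodes 4-7: remap_nodes_sketch(0x2, 0xF, 0xF0) returns 0x20. The task never needed to know which physical nodes it was given.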
Paul, the changes required for an application that currently sets up its memory policy with {get,set}_mempolicy() calls, or with the higher level functions in libnuma, to use Choice B as a default instead of Choice A are so easy that it would be ridiculous to support configuring it on a per-system or per-cpuset basis.

Choosing only one behavior for the kernel (Choice B) is by far the superior selection because then any task can share a cpuset with any other task and implement its memory policy preferences in terms of low level system calls, numactl, or libnuma. That's the power that we should be giving users, not the addition of hacks or more configuration knobs that are going to clutter and confuse anybody who wants to simply pick a preferred node.

Yet the 'mems' file would still be system-wide; otherwise it would be impossible to expand the memory your cpuset has access to. Everything else would be relative to 'mems'.

David
