On Sat, 2007-10-27 at 12:16 -0700, David Rientjes wrote:

David: as we discussed when you contacted me off-list about this, the libnuma API and the system call interface are two quite different APIs. For example, numa_set_interleave_mask(&numa_no_nodes) does not pass MPOL_INTERLEAVE with an empty mask to set_mempolicy(). Rather, it "installs" an MPOL_DEFAULT policy, which internally just deletes the task's mempolicy, allowing fallback to the system default policy. I would not propose to change this behavior, nor break libnuma in any way.

For others who weren't involved in the off-list exchange, here's an excerpt from my response to David:

[
At the libnuma level, I think we need an explicit "numa_set_interleave_allowed()"--analogous to "numa_set_localalloc()". The current "numa_alloc_interleaved()" should, I think, allocate on all *allowed* nodes, rather than all nodes. It can do this using the sys call interface as defined.

Independent of cpuset-independent interleave, an application needs to pass a valid subset of the current mems allowed to "numa_alloc_interleaved_subset()". An application can now obtain the mems_allowed using the MPOL_F_MEMS_ALLOWED flag that I added, but we need a libnuma wrapper for this as well. [Yeah, this info can change at any time, but that's always been the case....]

"numa_interleave_memory()" is essentially mbind(), I think [not looking at the libnuma source code at this moment]. Maybe provide "numa_interleave_memory_allowed(void *mem, size_t size)" ???

Finally, I think we need to add a query function: "nodemask_t numa_get_mems_allowed()" to return the mask of valid nodes in the current context [cpuset]. This would just be a wrapper around get_mempolicy() with the MPOL_F_MEMS_ALLOWED flag.
]

Couple of comments on the above:

1. "the sys call interface as defined" in the 2nd paragraph of the excerpt refers to my patch that uses a null/empty nodemask to indicate "all allowed".

2. As this thread progresses, you've discussed relaxing the requirement that applications pass a valid subset of mems_allowed. I.e., something that was illegal becomes legal. An API change, I think. But a backward compatible one, so that's OK, right? :-)

3. If we do change the semantics of the mempolicy system calls to allow nodes outside of the cpuset, then maybe we don't need to query the mems allowed. I still find it useful, but not absolutely necessary--e.g., to construct a nodemask that will be acceptable in the current cpuset.

4. I looked at the libnuma source. numa_interleave_memory() does use mbind() which, again, does not complain about nodemasks that include non-allowed nodes.

Another thing occurs to me: perhaps numactl would need an additional 'nodes' specifier such as 'allowed'. Alternatively, 'all' could be redefined to mean 'all allowed'. This is independent of how you specify 'all allowed' to the system call.

Regards,
Lee
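
For illustration only, here is a minimal sketch of the query function proposed in the excerpt, plus the "construct a nodemask that will be acceptable in the current cpuset" step from comment 3. It assumes a Linux kernel that supports the MPOL_F_MEMS_ALLOWED flag and calls get_mempolicy() through syscall() so that no libnuma headers are needed. The names sketch_nodemask, sketch_get_mems_allowed(), sketch_node_set(), sketch_node_isset(), and sketch_restrict_to_allowed(), and the fixed 1024-node mask size, are all hypothetical and are not the libnuma API.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value from <linux/mempolicy.h>; repeated here so the sketch builds
 * without the kernel or libnuma headers installed. */
#ifndef MPOL_F_MEMS_ALLOWED
#define MPOL_F_MEMS_ALLOWED (1 << 2)
#endif

/* Assumption: 1024 node bits is large enough for the running kernel. */
#define SKETCH_MAX_NODES  1024
#define SKETCH_WORD_BITS  (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MASK_WORDS (SKETCH_MAX_NODES / SKETCH_WORD_BITS)

/* Hypothetical stand-in for libnuma's nodemask_t. */
typedef struct {
    unsigned long bits[SKETCH_MASK_WORDS];
} sketch_nodemask;

void sketch_node_set(sketch_nodemask *mask, unsigned int node)
{
    assert(mask != NULL && node < SKETCH_MAX_NODES);
    mask->bits[node / SKETCH_WORD_BITS] |= 1UL << (node % SKETCH_WORD_BITS);
}

int sketch_node_isset(const sketch_nodemask *mask, unsigned int node)
{
    assert(mask != NULL && node < SKETCH_MAX_NODES);
    return (mask->bits[node / SKETCH_WORD_BITS] >>
            (node % SKETCH_WORD_BITS)) & 1UL;
}

/* Fill *allowed with the calling task's mems_allowed.
 * Returns 0 on success, -1 with errno set on failure. */
int sketch_get_mems_allowed(sketch_nodemask *allowed)
{
    assert(allowed != NULL);
    memset(allowed, 0, sizeof(*allowed));
    /* With MPOL_F_MEMS_ALLOWED the mode and addr arguments are unused. */
    return (int)syscall(SYS_get_mempolicy, NULL, allowed->bits,
                        (unsigned long)SKETCH_MAX_NODES, NULL,
                        (unsigned long)MPOL_F_MEMS_ALLOWED);
}

/* Clear every node in *requested that is not set in *allowed.
 * Returns the number of nodes left in *requested. */
int sketch_restrict_to_allowed(sketch_nodemask *requested,
                               const sketch_nodemask *allowed)
{
    int remaining = 0;

    assert(requested != NULL && allowed != NULL);
    for (size_t w = 0; w < SKETCH_MASK_WORDS; w++) {
        unsigned long word = requested->bits[w] & allowed->bits[w];

        requested->bits[w] = word;
        for (; word != 0; word &= word - 1)   /* count the set bits */
            remaining++;
    }
    return remaining;
}
```

A caller would fetch the allowed mask with sketch_get_mems_allowed(), trim its own request with sketch_restrict_to_allowed(), and hand the result to set_mempolicy() or mbind() only if at least one node remains. As noted in the excerpt, mems_allowed can change at any time, so the trimmed mask is only a snapshot.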
