This patch removes the mem_cgroup member from mm_struct and instead adds
an owner. This approach was suggested by Paul Menage. The advantage of
this approach is that, once the mm->owner is known, using the subsystem
id, the cgroup can be determined. It also allows several control groups
that are virtually grouped by mm_struct, to exist independent of the memory
controller i.e., without adding mem_cgroup's for each controller,
to mm_struct.The code initially assigns mm->owner to the task and then after the
thread group leader is identified. The mm->owner is changed to the thread
group leader of the task later at the end of copy_process.Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---include/linux/memcontrol.h | 14 +++++++++++++-
include/linux/mm_types.h | 5 ++++-
kernel/fork.c | 4 ++++
mm/memcontrol.c | 42 ++++++++++++++++++++++++++++++++++--------
4 files changed, 55 insertions(+), 10 deletions(-)diff -puN include/linux/mm_types.h~memory-controller-add-mm-owner include/linux/mm_types.h
--- linux-2.6.25-rc5/include/linux/mm_types.h~memory-controller-add-mm-owner 2008-03-20 13:35:09.000000000 +0530
+++ linux-2.6.25-rc5-balbir/include/linux/mm_types.h 2008-03-20 15:11:05.000000000 +0530
@@ -228,7 +228,10 @@ struct mm_struct {
rwlock_t ioctx_list_lock;
struct kioctx *ioctx_list;
#ifdef CONFIG_CGROUP_MEM_RES_CTLR
- struct mem_cgroup *mem_cgroup;
+ struct task_struct *owner; /* The thread group leader that */
+ /* owns the mm_struct. This */
+ /* might be useful even outside */
+ /* of the config option */
#endif#ifdef CONFIG_PROC_FS
diff -puN kernel/fork.c~memory-controller-add-mm-owner kernel/fork.c
--- linux-2.6.25-rc5/kernel/fork.c~memory-controller-add-mm-owner 2008-03-20 13:35:09.000000000 +0530
+++ linux-2.6.25-rc5-balbir/kernel/fork.c 2008-03-24 18:49:29.000000000 +0530
@@ -1357,6 +1357,10 @@ static struct task_struct *copy_process(
write_unlock_irq(&ta...
Where is this function used? I don't see the corresponding one
--
I kept that as a template for freeing up code. I'll remove that since
OK
Thanks for the review
Balbir
--
This should probably be controlled by something like a CONFIG_MM_OWNER
that's selected by any Kconfig option (mem cgroup, etc) that needsIt seems to me that the code to setup/maintain mm->owner should be
independent of the control groups, but should be part of the generic
fork/exit code.Also, if mm->owner exits but mm is still alive (unlikely, but could
happen with weird custom threading libraries?) then we need to
reassign mm->owner to one of the other users of the mm (by looking
first in the thread group, then among the parents/siblings/children,I think we still need the rcu_read_lock(), since mm->owner can move
We shouldn't need reference counting on this pointer, since the
cgroups framework won't allow a subsystem to be freed while it has any
tasks in it.Paul
--
The comment in __exit_signal states that
"The group leader stays around as a zombie as long
as there are other threads. When it gets reaped,
the exit.c code will add its counts into these totals."Given that the thread group leader stays around, do we need to reassign
mm->owner? Do you do anything special in cgroups like cleanup theOK, so cgroup task movement is protected by RCU, right? I'll check for all
This reference earlier indicated that there were active mm->mem_cgroup users of
the cgroup. With mm->owner changes, we might not require this. Let me double
confirm that.Thanks for the review.
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
OK, so we don't need to handle this for NPTL apps - but for anything
still using LinuxThreads or manually constructed clone() calls that
use CLONE_VM without CLONE_PID, this could still be an issue. (Also I
guess there's the case of someone holding a reference to the mm via aYes - cgroup_attach() uses synchronize_rcu() before release the cgroup
mutex. So although you can't guarantee that the cgroup set won't
change if you're just using RCU, you can't guarantee that you're
addressing a still-valid non-destroyed (and of course non-freed)
cgroup set.Paul
--
CLONE_PID?? Do you mean CLONE_THREAD?
For the case you mentioned, mm->owner is a moving target and we don't want to
spend time finding the successor, that can be expensive when threads start
exiting one-by-one quickly and when the number of threads are high. I wonder if
there is an efficient way to find mm->owner in that case.Yes, but in that case we'll not be charging/uncharging anything to that mm or
Yes, I understand that part of RCU.
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
On Mon, Mar 24, 2008 at 10:33 AM, Balbir Singh
But:
- running a high-threadcount LinuxThreads process is by definition
inefficient and expensive (hence the move to NPTL)- any potential performance hit is only paid at exit time
- in the normal case, any of your children or one of your siblings
will be a suitable alternate owner- in the worst case, it's not going to be worse than doing a
for_each_thread() loopso I don't think this would be a major problem
Paul
--
I've been looking at zap_threads, I suspect we'll end up implementing a similar
loop, which makes me very uncomfortable. Adding code for the least possible
scenario. It will not get invoked for CLONE_THREAD, but will get invoked for the
case when CLONE_VM is set without CLONE_THREAD.I'll try and experiment a bit more and see what I come up with
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
This will have to be the common case, since you never know what combination of
clone calls did CLONE_VM and what did CLONE_THREAD. At exit time, we need to pay
a for_each_process() overhead. Although very unlikely, an application can call
pthread_* functions (NPTL) and then do a clone with CLONE_VM, thus forcing
threads in a thread group and another process to share the mm_struct. This makesI am yet to benchmark the cost of doing for_each_process() on every exit. I
suspect, we'll see a big drop in performance. I am not sure anymore if mm->owner
is worth the overhead.--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
I'm not convinced of this. All we have to do is find some other
process p where p->mm == current->mm and make it the new owner.
Exactly what sequence of clone() calls was used to cause the sharing
isn't really relevant. I really think that a suitable candidate will
be found amongst your children or your first sibling in 99.9% of those
cases where more than one process is using an mm.The actual sequence would have to go something like:
static inline bool need_new_owner(struct mm_struct *mm) {
return (mm && mm->owner == current && atomic_read(&mm->users) > 1);
}
static inline void try_give_mm_ownership(
struct task_struct *task,
struct mm_struct *mm) {
if (task->mm != mm) return;
task_lock(task);
if (task->mm == mm) {
mm->owner = task;
}
task_unlock(task);
}struct mm_struct *mm = current->mm;
task_lock(current);
current->mm = NULL;
task_unlock(current);/* First try my children */
if (need_new_owner(mm)) {
for_each_child(current, c) {
try_give_mm_ownership(c);
if (!need_new_owner(mm)) break;
}
}/* Then try my siblings */
if (need_new_owner(mm)) {
for_each_child(current->real_parent, c) {
try_give_mm_ownership(c);
if (!need_new_owner(mm)) break;
}
}if (need_new_owner(mm)) {
/* We'll almost never get here */
for_each_process(p) {
try_give_mm_ownership(p);
if (!need_new_owner(mm)) break;
}
}Paul
--
Hmmm.. the 99.9% of the time is just guess work (not measured, could be possibly
true). I see and understand your code below. But before I try and implement
something like that, I was wondering why zap_threads() does not have that
heuristic. That should explain my inhibition.Can anyone elaborate on zap_threads further?
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
zap_threads() has to find *all* the other users, whereas in this case
we only have to find one other user.Paul
--
| Greg KH | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Greg Kroah-Hartman | [PATCH 006/196] Chinese: add translation of oops-tracing.txt |
| Luciano Rocha | usb hdd problems with 2.6.27.2 |
| Roland Dreier | Re: Integration of SCST in the mainstream Linux kernel |
git: | |
| Gerrit Renker | [PATCH 15/37] dccp: Set per-connection CCIDs via socket options |
| David Miller | [GIT]: Networking |
| Natalie Protasevich | [BUG] New Kernel Bugs |
