Re: [RFC][-mm] Memory controller add mm->owner

Previous thread: MMC/SD: data CRC errors and unknown SCR version on 2.6.14 by Martin B. Andersen on Monday, March 24, 2008 - 7:39 am. (2 messages)

Next thread: [PATCH] x86_64: resize NR_IRQS for large machines by Alan Mayer on Monday, March 24, 2008 - 10:31 am. (8 messages)
To: <linux-mm@...>
Cc: Hugh Dickins <hugh@...>, Sudhir Kumar <skumar@...>, YAMAMOTO Takashi <yamamoto@...>, Paul Menage <menage@...>, <lizf@...>, <linux-kernel@...>, <taka@...>, David Rientjes <rientjes@...>, Pavel Emelianov <xemul@...>, Balbir Singh <balbir@...>, Andrew Morton <akpm@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Monday, March 24, 2008 - 10:01 am

This patch removes the mem_cgroup member from mm_struct and instead adds
an owner. This approach was suggested by Paul Menage. The advantage of
this approach is that, once the mm->owner is known, using the subsystem
id, the cgroup can be determined. It also allows several control groups
that are virtually grouped by mm_struct, to exist independent of the memory
controller i.e., without adding mem_cgroup's for each controller,
to mm_struct.

The code initially assigns mm->owner to the task and then after the
thread group leader is identified. The mm->owner is changed to the thread
group leader of the task later at the end of copy_process.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

include/linux/memcontrol.h | 14 +++++++++++++-
include/linux/mm_types.h | 5 ++++-
kernel/fork.c | 4 ++++
mm/memcontrol.c | 42 ++++++++++++++++++++++++++++++++++--------
4 files changed, 55 insertions(+), 10 deletions(-)

diff -puN include/linux/mm_types.h~memory-controller-add-mm-owner include/linux/mm_types.h
--- linux-2.6.25-rc5/include/linux/mm_types.h~memory-controller-add-mm-owner 2008-03-20 13:35:09.000000000 +0530
+++ linux-2.6.25-rc5-balbir/include/linux/mm_types.h 2008-03-20 15:11:05.000000000 +0530
@@ -228,7 +228,10 @@ struct mm_struct {
rwlock_t ioctx_list_lock;
struct kioctx *ioctx_list;
#ifdef CONFIG_CGROUP_MEM_RES_CTLR
- struct mem_cgroup *mem_cgroup;
+ struct task_struct *owner; /* The thread group leader that */
+ /* owns the mm_struct. This */
+ /* might be useful even outside */
+ /* of the config option */
#endif

#ifdef CONFIG_PROC_FS
diff -puN kernel/fork.c~memory-controller-add-mm-owner kernel/fork.c
--- linux-2.6.25-rc5/kernel/fork.c~memory-controller-add-mm-owner 2008-03-20 13:35:09.000000000 +0530
+++ linux-2.6.25-rc5-balbir/kernel/fork.c 2008-03-24 18:49:29.000000000 +0530
@@ -1357,6 +1357,10 @@ static struct task_struct *copy_process(
write_unlock_irq(&ta...

To: Balbir Singh <balbir@...>
Cc: <linux-mm@...>, Hugh Dickins <hugh@...>, Sudhir Kumar <skumar@...>, YAMAMOTO Takashi <yamamoto@...>, Paul Menage <menage@...>, <linux-kernel@...>, <taka@...>, David Rientjes <rientjes@...>, Pavel Emelianov <xemul@...>, Andrew Morton <akpm@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Monday, March 24, 2008 - 9:26 pm

Where is this function used? I don't see the corresponding one

--

To: Li Zefan <lizf@...>
Cc: <linux-mm@...>, Hugh Dickins <hugh@...>, Sudhir Kumar <skumar@...>, YAMAMOTO Takashi <yamamoto@...>, Paul Menage <menage@...>, <linux-kernel@...>, <taka@...>, David Rientjes <rientjes@...>, Pavel Emelianov <xemul@...>, Andrew Morton <akpm@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Tuesday, March 25, 2008 - 11:48 am

I kept that as a template for freeing up code. I'll remove that since

OK

Thanks for the review
Balbir
--

To: Balbir Singh <balbir@...>
Cc: <linux-mm@...>, Hugh Dickins <hugh@...>, Sudhir Kumar <skumar@...>, YAMAMOTO Takashi <yamamoto@...>, <lizf@...>, <linux-kernel@...>, <taka@...>, David Rientjes <rientjes@...>, Pavel Emelianov <xemul@...>, Andrew Morton <akpm@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Monday, March 24, 2008 - 11:03 am

This should probably be controlled by something like a CONFIG_MM_OWNER
that's selected by any Kconfig option (mem cgroup, etc) that needs

It seems to me that the code to setup/maintain mm->owner should be
independent of the control groups, but should be part of the generic
fork/exit code.

Also, if mm->owner exits but mm is still alive (unlikely, but could
happen with weird custom threading libraries?) then we need to
reassign mm->owner to one of the other users of the mm (by looking
first in the thread group, then among the parents/siblings/children,

I think we still need the rcu_read_lock(), since mm->owner can move

We shouldn't need reference counting on this pointer, since the
cgroups framework won't allow a subsystem to be freed while it has any
tasks in it.

Paul
--

To: Paul Menage <menage@...>
Cc: <linux-mm@...>, Hugh Dickins <hugh@...>, Sudhir Kumar <skumar@...>, YAMAMOTO Takashi <yamamoto@...>, <lizf@...>, <linux-kernel@...>, <taka@...>, David Rientjes <rientjes@...>, Pavel Emelianov <xemul@...>, Andrew Morton <akpm@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Monday, March 24, 2008 - 12:21 pm

The comment in __exit_signal states that

"The group leader stays around as a zombie as long
as there are other threads. When it gets reaped,
the exit.c code will add its counts into these totals."

Given that the thread group leader stays around, do we need to reassign
mm->owner? Do you do anything special in cgroups like cleanup the

OK, so cgroup task movement is protected by RCU, right? I'll check for all

This reference earlier indicated that there were active mm->mem_cgroup users of
the cgroup. With mm->owner changes, we might not require this. Let me double
confirm that.

Thanks for the review.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--

To: <balbir@...>
Cc: <linux-mm@...>, Hugh Dickins <hugh@...>, Sudhir Kumar <skumar@...>, YAMAMOTO Takashi <yamamoto@...>, <lizf@...>, <linux-kernel@...>, <taka@...>, David Rientjes <rientjes@...>, Pavel Emelianov <xemul@...>, Andrew Morton <akpm@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Monday, March 24, 2008 - 12:34 pm

OK, so we don't need to handle this for NPTL apps - but for anything
still using LinuxThreads or manually constructed clone() calls that
use CLONE_VM without CLONE_PID, this could still be an issue. (Also I
guess there's the case of someone holding a reference to the mm via a

Yes - cgroup_attach() uses synchronize_rcu() before release the cgroup
mutex. So although you can't guarantee that the cgroup set won't
change if you're just using RCU, you can't guarantee that you're
addressing a still-valid non-destroyed (and of course non-freed)
cgroup set.

Paul
--

To: Paul Menage <menage@...>
Cc: <linux-mm@...>, Hugh Dickins <hugh@...>, Sudhir Kumar <skumar@...>, YAMAMOTO Takashi <yamamoto@...>, <lizf@...>, <linux-kernel@...>, <taka@...>, David Rientjes <rientjes@...>, Pavel Emelianov <xemul@...>, Andrew Morton <akpm@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Monday, March 24, 2008 - 1:33 pm

CLONE_PID?? Do you mean CLONE_THREAD?

For the case you mentioned, mm->owner is a moving target and we don't want to
spend time finding the successor, that can be expensive when threads start
exiting one-by-one quickly and when the number of threads are high. I wonder if
there is an efficient way to find mm->owner in that case.

Yes, but in that case we'll not be charging/uncharging anything to that mm or

Yes, I understand that part of RCU.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--

To: <balbir@...>
Cc: <linux-mm@...>, Hugh Dickins <hugh@...>, Sudhir Kumar <skumar@...>, YAMAMOTO Takashi <yamamoto@...>, <lizf@...>, <linux-kernel@...>, <taka@...>, David Rientjes <rientjes@...>, Pavel Emelianov <xemul@...>, Andrew Morton <akpm@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Monday, March 24, 2008 - 1:46 pm

On Mon, Mar 24, 2008 at 10:33 AM, Balbir Singh

But:

- running a high-threadcount LinuxThreads process is by definition
inefficient and expensive (hence the move to NPTL)

- any potential performance hit is only paid at exit time

- in the normal case, any of your children or one of your siblings
will be a suitable alternate owner

- in the worst case, it's not going to be worse than doing a
for_each_thread() loop

so I don't think this would be a major problem

Paul
--

To: Paul Menage <menage@...>
Cc: <linux-mm@...>, Hugh Dickins <hugh@...>, Sudhir Kumar <skumar@...>, YAMAMOTO Takashi <yamamoto@...>, <lizf@...>, <linux-kernel@...>, <taka@...>, David Rientjes <rientjes@...>, Pavel Emelianov <xemul@...>, Andrew Morton <akpm@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Tuesday, March 25, 2008 - 7:41 am

I've been looking at zap_threads, I suspect we'll end up implementing a similar
loop, which makes me very uncomfortable. Adding code for the least possible
scenario. It will not get invoked for CLONE_THREAD, but will get invoked for the
case when CLONE_VM is set without CLONE_THREAD.

I'll try and experiment a bit more and see what I come up with

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--

To: Paul Menage <menage@...>
Cc: <balbir@...>, <linux-mm@...>, Hugh Dickins <hugh@...>, Sudhir Kumar <skumar@...>, YAMAMOTO Takashi <yamamoto@...>, <lizf@...>, <linux-kernel@...>, <taka@...>, David Rientjes <rientjes@...>, Pavel Emelianov <xemul@...>, Andrew Morton <akpm@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Wednesday, March 26, 2008 - 6:29 am

This will have to be the common case, since you never know what combination of
clone calls did CLONE_VM and what did CLONE_THREAD. At exit time, we need to pay
a for_each_process() overhead. Although very unlikely, an application can call
pthread_* functions (NPTL) and then do a clone with CLONE_VM, thus forcing
threads in a thread group and another process to share the mm_struct. This makes

I am yet to benchmark the cost of doing for_each_process() on every exit. I
suspect, we'll see a big drop in performance. I am not sure anymore if mm->owner
is worth the overhead.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--

To: <balbir@...>
Cc: <linux-mm@...>, Hugh Dickins <hugh@...>, Sudhir Kumar <skumar@...>, YAMAMOTO Takashi <yamamoto@...>, <lizf@...>, <linux-kernel@...>, <taka@...>, David Rientjes <rientjes@...>, Pavel Emelianov <xemul@...>, Andrew Morton <akpm@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Wednesday, March 26, 2008 - 7:20 am

I'm not convinced of this. All we have to do is find some other
process p where p->mm == current->mm and make it the new owner.
Exactly what sequence of clone() calls was used to cause the sharing
isn't really relevant. I really think that a suitable candidate will
be found amongst your children or your first sibling in 99.9% of those
cases where more than one process is using an mm.

The actual sequence would have to go something like:

static inline bool need_new_owner(struct mm_struct *mm) {
return (mm && mm->owner == current && atomic_read(&mm->users) > 1);
}
static inline void try_give_mm_ownership(
struct task_struct *task,
struct mm_struct *mm) {
if (task->mm != mm) return;
task_lock(task);
if (task->mm == mm) {
mm->owner = task;
}
task_unlock(task);
}

struct mm_struct *mm = current->mm;
task_lock(current);
current->mm = NULL;
task_unlock(current);

/* First try my children */
if (need_new_owner(mm)) {
for_each_child(current, c) {
try_give_mm_ownership(c);
if (!need_new_owner(mm)) break;
}
}

/* Then try my siblings */
if (need_new_owner(mm)) {
for_each_child(current->real_parent, c) {
try_give_mm_ownership(c);
if (!need_new_owner(mm)) break;
}
}

if (need_new_owner(mm)) {
/* We'll almost never get here */
for_each_process(p) {
try_give_mm_ownership(p);
if (!need_new_owner(mm)) break;
}
}

Paul
--

To: Paul Menage <menage@...>
Cc: <linux-mm@...>, Hugh Dickins <hugh@...>, Sudhir Kumar <skumar@...>, YAMAMOTO Takashi <yamamoto@...>, <lizf@...>, <linux-kernel@...>, <taka@...>, David Rientjes <rientjes@...>, Pavel Emelianov <xemul@...>, Andrew Morton <akpm@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Wednesday, March 26, 2008 - 7:41 am

Hmmm.. the 99.9% of the time is just guess work (not measured, could be possibly
true). I see and understand your code below. But before I try and implement
something like that, I was wondering why zap_threads() does not have that
heuristic. That should explain my inhibition.

Can anyone elaborate on zap_threads further?

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--

To: <balbir@...>
Cc: <linux-mm@...>, Hugh Dickins <hugh@...>, Sudhir Kumar <skumar@...>, YAMAMOTO Takashi <yamamoto@...>, <lizf@...>, <linux-kernel@...>, <taka@...>, David Rientjes <rientjes@...>, Pavel Emelianov <xemul@...>, Andrew Morton <akpm@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Wednesday, March 26, 2008 - 11:21 am

zap_threads() has to find *all* the other users, whereas in this case
we only have to find one other user.

Paul
--

Previous thread: MMC/SD: data CRC errors and unknown SCR version on 2.6.14 by Martin B. Andersen on Monday, March 24, 2008 - 7:39 am. (2 messages)

Next thread: [PATCH] x86_64: resize NR_IRQS for large machines by Alan Mayer on Monday, March 24, 2008 - 10:31 am. (8 messages)