This is the fourth version of the address space control patches. These patches are against 2.6.26-rc2-mm1 and have been tested using KVM in SMP mode, both with and without the config enabled, on a powerpc box and using UML. The patches have also been compile tested with the config disabled on a powerpc box. The goal of this patch is to implement a virtual address space controller using cgroups. The documentation describes the controller, it's goal and usage in further details. The first patch adds the user interface. The second patch fixes the cgroup mm_owner_changed callback to pass the task struct, so that accounting can be adjusted on owner changes. The thrid patch adds accounting and control. The fourth patch updates documentation. An earlier post of the patchset can be found at http://lwn.net/Articles/275143/ This patch is built on top of the mm owner patches and utilizes that feature to virtually group tasks by mm_struct. Reviews, Comments? Changelog 1. Add documentation upfront 2. Refactor the code (error handling and changes to improvde code) 3. Protect reading of total_vm with mmap_sem 4. Port to 2.6.26-rc2 5. Changed the name from rlimit to memrlimit Series memrlimit-controller-add-documentation.patch memrlimit-controller-setup.patch cgroup-add-task-to-mm-owner-callbacks.patch memrlimit-controller-address-space-accounting-and-control.patch -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL --
Documentation patch - describes the goals and usage of the memrlimit controller. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> --- Documentation/controllers/memrlimit.txt | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff -puN /dev/null Documentation/controllers/memrlimit.txt --- /dev/null 2008-05-14 04:27:30.032276540 +0530 +++ linux-2.6.26-rc2-balbir/Documentation/controllers/memrlimit.txt 2008-05-14 18:35:55.000000000 +0530 @@ -0,0 +1,29 @@ +This controller is enabled by the CONFIG_CGROUP_MEMRLIMIT_CTLR option. Prior +to reading this documentation please read Documentation/cgroups.txt and +Documentation/controllers/memory.txt. Several of the principles of this +controller are similar to the memory resource controller. + +This controller framework is designed to be extensible to control any +memory resource limit with little effort. + +This new controller, controls the address space expansion of the tasks +belonging to a cgroup. Address space control is provided along the same lines as +RLIMIT_AS control, which is available via getrlimit(2)/setrlimit(2). +The interface for controlling address space is provided through +"rlimit.limit_in_bytes". The file is similar to "limit_in_bytes" w.r.t. the user +interface. Please see section 3 of the memory resource controller documentation +for more details on how to use the user interface to get and set values. + +The "memrlimit.usage_in_bytes" file provides information about the total address +space usage of the tasks in the cgroup, in bytes. + +Advantages of providing this feature + +1. Control over virtual address space allows for a cgroup to fail gracefully + i.e., via a malloc or mmap failure as compared to OOM kill when no + pages can be reclaimed. +2. It provides better control over how many pages can be swapped out when + the cgroup goes over its limit. A badly setup cgroup can cause excessive + swapping. Providing control over the address space allocations ensures + ...
Do you mean by this, limiting the number of pagetable pages (that are pinned in memory), this preventing oom by a cgroup that instantiates many pagetables? -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. --
This is not for page tables (that is in the long term TODO list). This is more for user space calls to mmap(), malloc() or anything that causes the total virtual memory of the process to go up (in our case cgroups). The motivation is similar to the motivations of RLIMIT_AS. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL --
This patch sets up the rlimit cgroup controller. It adds the basic create, destroy and populate functionality. The user interface provided is very similar to the memory resource controller. The rlimit controller can be enhanced easily in the future to control mlocked pages. Changelog v3->v4 1. Use PAGE_ALIGN() 2. Rename rlimit to memrlimit Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> --- include/linux/cgroup_subsys.h | 4 + include/linux/memrlimitcgroup.h | 19 +++++ init/Kconfig | 10 ++ mm/Makefile | 1 mm/memrlimitcgroup.c | 144 ++++++++++++++++++++++++++++++++++++++++ 5 files changed, 178 insertions(+) diff -puN include/linux/cgroup_subsys.h~memrlimit-controller-setup include/linux/cgroup_subsys.h --- linux-2.6.26-rc2/include/linux/cgroup_subsys.h~memrlimit-controller-setup 2008-05-14 18:36:36.000000000 +0530 +++ linux-2.6.26-rc2-balbir/include/linux/cgroup_subsys.h 2008-05-14 18:36:36.000000000 +0530 @@ -47,4 +47,8 @@ SUBSYS(mem_cgroup) SUBSYS(devices) #endif +#ifdef CONFIG_CGROUP_MEMRLIMIT_CTLR +SUBSYS(memrlimit_cgroup) +#endif + /* */ diff -puN /dev/null include/linux/memrlimitcgroup.h --- /dev/null 2008-05-14 04:27:30.032276540 +0530 +++ linux-2.6.26-rc2-balbir/include/linux/memrlimitcgroup.h 2008-05-14 18:36:36.000000000 +0530 @@ -0,0 +1,19 @@ +/* + * Copyright
Acked-by: Pavel Emelyanov <xemul@openvz.org> --
This patch adds an additional field to the mm_owner callbacks. This field
is required to get to the mm that changed.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
include/linux/cgroup.h | 3 ++-
kernel/cgroup.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff -puN include/linux/cgroup.h~cgroup-add-task-to-mm-owner-callbacks include/linux/cgroup.h
--- linux-2.6.26-rc2/include/linux/cgroup.h~cgroup-add-task-to-mm-owner-callbacks 2008-05-14 18:36:59.000000000 +0530
+++ linux-2.6.26-rc2-balbir/include/linux/cgroup.h 2008-05-14 18:36:59.000000000 +0530
@@ -310,7 +310,8 @@ struct cgroup_subsys {
*/
void (*mm_owner_changed)(struct cgroup_subsys *ss,
struct cgroup *old,
- struct cgroup *new);
+ struct cgroup *new,
+ struct task_struct *p);
int subsys_id;
int active;
int disabled;
diff -puN kernel/cgroup.c~cgroup-add-task-to-mm-owner-callbacks kernel/cgroup.c
--- linux-2.6.26-rc2/kernel/cgroup.c~cgroup-add-task-to-mm-owner-callbacks 2008-05-14 18:36:59.000000000 +0530
+++ linux-2.6.26-rc2-balbir/kernel/cgroup.c 2008-05-14 18:36:59.000000000 +0530
@@ -2772,7 +2772,7 @@ void cgroup_mm_owner_callbacks(struct ta
if (oldcgrp == newcgrp)
continue;
if (ss->mm_owner_changed)
- ss->mm_owner_changed(ss, oldcgrp, newcgrp);
+ ss->mm_owner_changed(ss, oldcgrp, newcgrp, new);
}
}
}
_
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
This patch adds support for accounting and control of virtual address space
limits. The accounting is done via the rlimit_cgroup_(un)charge_as functions.
The core of the accounting takes place during fork time in copy_process(),
may_expand_vm(), remove_vma_list() and exit_mmap(). There are some special
cases that are handled here as well (arch/ia64/kernel/perform.c,
arch/x86/kernel/ptrace.c, insert_special_mapping())
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
arch/ia64/kernel/perfmon.c | 6 ++
arch/x86/kernel/ptrace.c | 17 +++++--
fs/exec.c | 5 ++
include/linux/memrlimitcgroup.h | 21 ++++++++
kernel/fork.c | 8 +++
mm/memrlimitcgroup.c | 94 ++++++++++++++++++++++++++++++++++++++++
mm/mmap.c | 11 ++++
7 files changed, 157 insertions(+), 5 deletions(-)
diff -puN arch/ia64/kernel/perfmon.c~memrlimit-controller-address-space-accounting-and-control arch/ia64/kernel/perfmon.c
--- linux-2.6.26-rc2/arch/ia64/kernel/perfmon.c~memrlimit-controller-address-space-accounting-and-control 2008-05-14 18:09:32.000000000 +0530
+++ linux-2.6.26-rc2-balbir/arch/ia64/kernel/perfmon.c 2008-05-14 18:09:32.000000000 +0530
@@ -40,6 +40,7 @@
#include <linux/capability.h>
#include <linux/rcupdate.h>
#include <linux/completion.h>
+#include <linux/memrlimitcgroup.h>
#include <asm/errno.h>
#include <asm/intrinsics.h>
@@ -2294,6 +2295,9 @@ pfm_smpl_buffer_alloc(struct task_struct
DPRINT(("sampling buffer rsize=%lu size=%lu bytes\n", rsize, size));
+ if (memrlimit_cgroup_charge_as(mm, size >> PAGE_SHIFT))
+ return -ENOMEM;
+
/*
* check requested size to avoid Denial-of-service attacks
* XXX: may have to refine this test
@@ -2313,6 +2317,7 @@ pfm_smpl_buffer_alloc(struct task_struct
smpl_buf = pfm_rvmalloc(size);
if (smpl_buf == NULL) {
DPRINT(("Can't allocate sampling ...* Balbir Singh <balbir@linux.vnet.ibm.com> [2008-05-14 18:39:51]: Here's a better version of the patch. ptrace.c changes have a bug, where uncharge is called unconditionally. We now check for ret < 0. The two signed-off-by's by the same person did not look good either :) Description This patch adds support for accounting and control of virtual address space limits. The accounting is done via the rlimit_cgroup_(un)charge_as functions. The core of the accounting takes place during fork time in copy_process(), may_expand_vm(), remove_vma_list() and exit_mmap(). There are some special cases that are handled here as well (arch/ia64/kernel/perform.c, arch/x86/kernel/ptrace.c, insert_special_mapping()) Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> --- arch/ia64/kernel/perfmon.c | 6 ++ arch/x86/kernel/ptrace.c | 18 +++++-- fs/exec.c | 5 ++ include/linux/memrlimitcgroup.h | 21 ++++++++ kernel/fork.c | 8 +++ mm/memrlimitcgroup.c | 94 ++++++++++++++++++++++++++++++++++++++++ mm/mmap.c | 11 ++++ 7 files changed, 158 insertions(+), 5 deletions(-) diff -puN arch/ia64/kernel/perfmon.c~memrlimit-controller-address-space-accounting-and-control arch/ia64/kernel/perfmon.c --- linux-2.6.26-rc2/arch/ia64/kernel/perfmon.c~memrlimit-controller-address-space-accounting-and-control 2008-05-14 18:37:11.000000000 +0530 +++ linux-2.6.26-rc2-balbir/arch/ia64/kernel/perfmon.c 2008-05-14 18:37:11.000000000 +0530 @@ -40,6 +40,7 @@ #include <linux/capability.h> #include <linux/rcupdate.h> #include <linux/completion.h> +#include <linux/memrlimitcgroup.h> #include <asm/errno.h> #include <asm/intrinsics.h> @@ -2294,6 +2295,9 @@ pfm_smpl_buffer_alloc(struct task_struct DPRINT(("sampling buffer rsize=%lu size=%lu bytes\n", rsize, size)); + if (memrlimit_cgroup_charge_as(mm, size >> PAGE_SHIFT)) + return -ENOMEM; + /* * check requested size to avoid ...
Assuming that we're holding a write lock on mm->mmap_sem here, and we
additionally hold mmap_sem for the whole of mm_update_next_owner(),
then maybe we don't need any extra synchronization here? Something
like simply:
int memrlimit_cgroup_charge_as(struct mm_struct *mm, unsigned long nr_pages)
{
struct memrlimit_cgroup *memrcg = memrlimit_cgroup_from_task(mm->owner);
return res_counter_charge(&memrcg->as_res, (nr_pages << PAGE_SHIFT));
}
Seems good to minimize additional synchronization on the fast path.
The only thing that's still broken is that the task_struct.cgroups
pointer gets updated only under the synchronization of task_lock(), so
we've still got the race of:
A: attach_task() updates B->cgroups
B: memrlimit_cgroup_charge_as() charges the new res counter and
updates mm->total_vm
A: memrlimit_cgroup_move_task() moves mm->total_vm from the old
counter to the new counter
Here's one way I see to fix this:
We change attach_task() so that rather than updating the
task_struct.cgroups pointer once from the original css_set to the
final css_set, it goes through a series of intermediate css_set
structures, one for each subsystem in the hierarchy, transitioning
from the old set to the final set. Then for each subsystem ss, it
would do:
next_css = <old css with pointer for ss updated>
if (ss->attach) {
ss->attach(ss, p, next_css);
} else {
task_lock(p);
rcu_assign_ptr(p->cgroups, next_css);
task_unlock(p);
}
i.e. the subsystem would be free to implement any synchronization it
desired in the attach() code. The attach() method's responsibility
would be to ensure that p->cgroups was updated to point to next_css
before returning. This should make it much simpler for a subsystem to
handle potential races between attach() and accounting. The current
semantics of can_attach()/update/attach() are sufficient for cpusets,
but probably not for systems with more complex accounting. I'd still
need to figure out a nice way to get the kind of ...The charge_as routine is not always called with mmap_sem held, since
the undo path gets more complicated under the lock. We already have
our own locking mechanism for the counters. We're not really accessing
any member of the mm here except the owner. Do we need to be called
A transaction manager would be great. We do the
mm_update_owner_changes under the task_lock(), may be the attach
Sure, I will. One of things we want to ensure is that mm->owner does
Taking the mmap_sem here would mean, we would need to document
(something I should have done earlier), that mmap_sem nests under
cgroup_mutex
Looking at the mm->owner patches, I am thinking of writing down a race
scenario card
Race conditions
R1: mm->owner can change dynamically under task_lock
R2: mm->owner's cgroup can change under cgroup_mutex
Read Write
mm->owner Prevent hold task_lock
cgroup from
changing
Prevent owner
from changing
Scenarios requiring protection/consistency
R1: causes no problem, since we expect to make appropriate
adjustment in mm_owner_changed
R2: Is handled by the attach() callback
Which leaves us with the following conclusion
We don't have move_task(), mm_owner_changed() and charge/uncharge()
running in parallel at the same time.
If we agree with the assertion/conclusion above, then a simple lock
might be able to protect us, assuming that it does not create a
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
On Wed, May 14, 2008 at 11:17 PM, Balbir Singh I'm not worried about the counters themselves being inconsistent - I'm worried about the case where charge_as() is called in the middle of the attach operation, and we account the charge X to the new cgroup's res_counter and update mm->total_vm, and then when we do the move, we charge the whole of mm->total_mm to the new cgroup even though the last charge was already accounted to the new res_counter, not the old one. That's what I'm hoping to address with the idea of splitting the attach into one update per subsystem, and letting the subsystems Not necessarily mmap_sem, but there needs to be something to ensure that the update to mm->total_vm and the charge/uncharge against the res_counter are an atomic pair with respect to the code that shifts an mm between two cgroups, either due to mm->owner change or due to an attach_task(). Since mmap_sem is held for write on almost all the fast path calls to the rlimit_as charge/uncharge functions, using that for the synchronization avoids the need for any additional synchronization in the fast path. Can you say more about the complications of holding a write lock on Yes - but new_css_set will be slightly different for each callback. Specifically, it will differ from the existing set pointed to by p->cgroups in the pointer for this particular subsystem. So the task will move over in a staggered fashion, and each subsystem will get to Right - and if we can make that lock be the mmap_sem of the mm in question, we avoid introducing a new lock into the fast path. Paul --
I want to focus on this conclusion/assertion, since it takes care of most of the locking related discussion above, unless I missed something. My concern with using mmap_sem, is that 1. It's highly contended (every page fault, vma change, etc) 2. It's going to make the locking hierarchy deeper and complex 3. It's not appropriate to call all the accounting callbacks with the mmap_sem() held, since the undo operations _can get_ complicated at the caller. I would prefer introducing a new lock, so that other subsystems are not affected. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL --
On Thu, May 15, 2008 at 12:03 AM, Balbir Singh But the only *new* cases of taking the mmap_sem that this would introduce would be: - on a failed vm limit charge - when a task exit/exec causes an mm ownership change - when a task moves between two cgroups in the memrlimit hierarchy. All of these should be rare events, so I don't think the additional Yes, potentially. But if the upside of that is that we eliminate a lock/unlock on a shared lock on every mmap/munmap call, it might well For getting the first cut of the memrlimit controller working this may well make sense. But it would be nice to avoid it longer-term. Paul --
Why a failed charge? Aren't we talking of moving all charge/uncharge Yes, this would nest cgroup_mutex and mmap_sem. Not sure if that would We do make several of all charge calls under the mmap_sem, but not Some paths of the uncharge are not under mmap_sem. Undoing the OK, so here's what I am going to try and do Refactor the code to try and use mmap_sem and see what I come up with. Basically use mmap_sem for all charge/uncharge operations as well use mmap_sem in read_mode in the move_task() and mm_owner_changed() callbacks. That should take care of the race conditions discussed, unless I missed something. Try and instrument insert_vm_struct() for charge/uncharge -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL --
Sorry, I worded that wrongly - I meant "cleaning up a successful charge after an expansion fails for other reasons" I thought that all the charges and most of the uncharges were already under mmap_sem, and it would just be a few of the cleanup paths that I think it's already nested that way - e.g. the cpusets code can call various migration functions (which take mmap_sem) while holding Sounds good. Thanks, Paul --
-- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL --
I've revamped the last two patches. Please review
This patch adds an additional field to the mm_owner callbacks. This field
is required to get to the mm that changed. Hold mmap_sem in write mode
before calling the mm_owner_changed callback
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
include/linux/cgroup.h | 3 ++-
kernel/cgroup.c | 4 +++-
kernel/exit.c | 3 +++
3 files changed, 8 insertions(+), 2 deletions(-)
diff -puN include/linux/cgroup.h~cgroup-add-task-to-mm-owner-callbacks include/linux/cgroup.h
--- linux-2.6.26-rc2/include/linux/cgroup.h~cgroup-add-task-to-mm-owner-callbacks 2008-05-14 18:36:59.000000000 +0530
+++ linux-2.6.26-rc2-balbir/include/linux/cgroup.h 2008-05-14 18:36:59.000000000 +0530
@@ -310,7 +310,8 @@ struct cgroup_subsys {
*/
void (*mm_owner_changed)(struct cgroup_subsys *ss,
struct cgroup *old,
- struct cgroup *new);
+ struct cgroup *new,
+ struct task_struct *p);
int subsys_id;
int active;
int disabled;
diff -puN kernel/cgroup.c~cgroup-add-task-to-mm-owner-callbacks kernel/cgroup.c
--- linux-2.6.26-rc2/kernel/cgroup.c~cgroup-add-task-to-mm-owner-callbacks 2008-05-14 18:36:59.000000000 +0530
+++ linux-2.6.26-rc2-balbir/kernel/cgroup.c 2008-05-17 22:09:57.000000000 +0530
@@ -2758,6 +2758,8 @@ void cgroup_fork_callbacks(struct task_s
* Called on every change to mm->owner. mm_init_owner() does not
* invoke this routine, since it assigns the mm->owner the first time
* and does not change it.
+ *
+ * The callbacks are invoked with mmap_sem held in read mode.
*/
void cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
{
@@ -2772,7 +2774,7 @@ void cgroup_mm_owner_callbacks(struct ta
if (oldcgrp == newcgrp)
continue;
if (ss->mm_owner_changed)
- ss->mm_owner_changed(ss, oldcgrp, newcgrp);
+ ss->mm_owner_changed(ss, oldcgrp, newcgrp, new);
}
}
}
diff -puN kernel/exit.c~cgroup-add-task-to-mm-owner-callbacks ...Here's the last patch for review
This patch adds support for accounting and control of virtual address space
limits. The accounting is done via the rlimit_cgroup_(un)charge_as functions.
The core of the accounting takes place during fork time in copy_process(),
may_expand_vm(), remove_vma_list() and exit_mmap().
Changelog v5->v4
Move specific hooks in code to insert_vm_struct
Use mmap_sem to protect mm->owner from changing and mm->owner from
changing cgroups.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
arch/x86/kernel/ptrace.c | 18 +++++--
include/linux/memrlimitcgroup.h | 21 +++++++++
kernel/fork.c | 8 +++
mm/memrlimitcgroup.c | 92 ++++++++++++++++++++++++++++++++++++++++
mm/mmap.c | 17 ++++++-
5 files changed, 149 insertions(+), 7 deletions(-)
diff -puN arch/ia64/kernel/perfmon.c~memrlimit-controller-address-space-accounting-and-control arch/ia64/kernel/perfmon.c
diff -puN arch/x86/kernel/ds.c~memrlimit-controller-address-space-accounting-and-control arch/x86/kernel/ds.c
diff -puN fs/exec.c~memrlimit-controller-address-space-accounting-and-control fs/exec.c
diff -puN include/linux/memrlimitcgroup.h~memrlimit-controller-address-space-accounting-and-control include/linux/memrlimitcgroup.h
--- linux-2.6.26-rc2/include/linux/memrlimitcgroup.h~memrlimit-controller-address-space-accounting-and-control 2008-05-17 23:14:53.000000000 +0530
+++ linux-2.6.26-rc2-balbir/include/linux/memrlimitcgroup.h 2008-05-17 23:14:53.000000000 +0530
@@ -16,4 +16,25 @@
#ifndef LINUX_MEMRLIMITCGROUP_H
#define LINUX_MEMRLIMITCGROUP_H
+#ifdef CONFIG_CGROUP_MEMRLIMIT_CTLR
+
+int memrlimit_cgroup_charge_as(struct mm_struct *mm, unsigned long nr_pages);
+void memrlimit_cgroup_uncharge_as(struct mm_struct *mm, unsigned long nr_pages);
+
+#else /* !CONFIG_CGROUP_RLIMIT_CTLR */
+
+static inline int
+memrlimit_cgroup_charge_as(struct mm_struct *mm, unsigned long nr_pages)
+{
+ return ...AFAIS you didn't cover all the cases when VM expands. At least all the arch/ia64/ia32/binfmt_elf32.c is missed. I'd insert this charge into insert_vm_struct. This would a) cover all of the missed cases and b) reduce the amount of places to patch. [snip the rest of the patch] --
I thought I have those covered. insert_vm_struct() is called from places that we have covered in this patch. As far as arch/ia64/ia32/binfmt_elf32.c is concerned, it inserts a GDT, LDT and does not change total_vm. Having said that, I am not against putting the hooks in insert_vm_struct(). -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL --
On Wed, 14 May 2008 18:39:51 +0530
Large changes in linux-next's arch/x86/kernel/ptrace.c caused damage to
the memrlimit patches.
I decided to retain the patches because it looks repairable. The
problem is this reject from
memrlimit-add-memrlimit-controller-accounting-and-control.patch:
***************
*** 808,828 ****
current->mm->total_vm -= old_size;
current->mm->locked_vm -= old_size;
if (size == 0)
goto out;
rlim = current->signal->rlim[RLIMIT_AS].rlim_cur >> PAGE_SHIFT;
vm = current->mm->total_vm + size;
if (rlim < vm) {
ret = -ENOMEM;
if (!reduce_size)
- goto out;
size = rlim - current->mm->total_vm;
if (size <= 0)
- goto out;
}
rlim = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur >> PAGE_SHIFT;
--- 809,833 ----
current->mm->total_vm -= old_size;
current->mm->locked_vm -= old_size;
+ memrlimit_cgroup_uncharge_as(mm, old_size);
if (size == 0)
goto out;
+ if (memrlimit_cgroup_charge_as(current->mm, size))
+ goto out;
+
rlim = current->signal->rlim[RLIMIT_AS].rlim_cur >> PAGE_SHIFT;
vm = current->mm->total_vm + size;
if (rlim < vm) {
ret = -ENOMEM;
if (!reduce_size)
+ goto out_uncharge;
size = rlim - current->mm->total_vm;
if (size <= 0)
+ goto out_uncharge;
}
rlim = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur >> PAGE_SHIFT;
***************
*** 831,851 ****
ret = -ENOMEM;
if (!reduce_size)
- goto out;
size = rlim - current->mm->locked_vm;
if (size <= 0)
- goto out;
}
ret = ds_allocate((void **)&child->thread.ds_area_msr,
size << PAGE_SHIFT);
if (ret < 0)
- goto out;
current->mm->total_vm += size;
current->mm->locked_vm += size;
out:
if (child->thread.ds_area_msr)
set_tsk_thread_flag(child, TIF_DS_AREA_MSR);
--- 836,859 ----
ret = -ENOMEM;
if (!reduce_size)
+ goto out_uncharge;
...I'll take a look tonight and see what needs to be done -- Balbir --
Andrew, I could not apply mmotm to linux-next (both downloaded right now). I applied the patches one-by-one resolving differences starting from #mm in the series file. Here is my fixed version of the patch, I compiled the patch, but could not run it, since I could not create the full series of applied patches. I compiled arch/x86/kernel/ds.o and ptrace.o. I've included the patch below, please let me know if the code looks OK (via review) and the patch applies. I'll test it once I can resonably resolve all conflicts between linux-next and mmotm. From: Balbir Singh <balbir@linux.vnet.ibm.com> This patch adds support for accounting and control of virtual address space limits. The accounting is done via the rlimit_cgroup_(un)charge_as functions. The core of the accounting takes place during fork time in copy_process(), may_expand_vm(), remove_vma_list() and exit_mmap(). Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/x86/kernel/ptrace.c | 1 include/linux/memrlimitcgroup.h | 21 ++++++ kernel/fork.c | 14 ++++ mm/memrlimitcgroup.c | 92 ++++++++++++++++++++++++++++++ mm/mmap.c | 17 ++++- 5 files changed, 142 insertions(+), 3 deletions(-) Index: linux-next/include/linux/memrlimitcgroup.h =================================================================== --- linux-next.orig/include/linux/memrlimitcgroup.h 2008-09-18 23:19:26.000000000 -0700 +++ linux-next/include/linux/memrlimitcgroup.h 2008-09-18 ...
On Thu, 18 Sep 2008 23:38:23 -0700 mmotm includes linux-next.patch. mmotm is based upon the most recent 2.6.x-rcy. This is the only way to do it - I often have to change linux-next.patch due to rejects and it's unreasonable to expect people to base off the OK, we'll give it a shot, thanks. --
Thanks for the info Thanks! -- Balbir --
