Re: [PATCH 1/1] mm_owner: fix cgroup null dereference

Previous thread: [2.6.27-rc3] XFS logbsize/logbufs option problem... by Daniel J Blueman on Thursday, August 14, 2008 - 12:55 pm. (2 messages)

Next thread: [PATCH] perfctr: don't use CCCR_OVF_PMI1 on Pentium 4Ds by Aristeu Rozanski on Thursday, August 14, 2008 - 1:32 pm. (6 messages)
From: Jiri Slaby
Date: Thursday, August 14, 2008 - 1:16 pm

Hi,

found this in mmotm, a fix for
mm-owner-fix-race-between-swap-and-exit.patch

--

mm->owner is set to NULL prior to calling cgroup_mm_owner_callbacks,
but it should be set after that to not pass NULL pointer as the old
owner which otherwise results in an oops (shortened):

BUG: unable to handle kernel NULL pointer dereference at 0000000000000580
Oops: 0000 [1] SMP
Pid: 3396, comm: nscd Tainted: G        W 2.6.27-rc3-mm1_64 #439
RIP: 0010:[<ffffffff8027035a>]  [<ffffffff8027035a>] cgroup_mm_owner_callbacks+0x3a/0x90
RAX: 0000000000000000 RBX: ffffffff80589720 RCX: ffff880079f503e8
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff806c36e0
RBP: ffff880078291bd8 R08: ffff880078290000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff8800787f8e20
Call Trace:
 [<ffffffff8023d42a>] mm_update_next_owner+0x1ca/0x240
 [<ffffffff8023d5aa>] exit_mm+0x10a/0x150
 [<ffffffff8023f1fc>] do_exit+0x1dc/0x940
Code: 89 fe 41 55 49 89 f5 41 54 53 74 68 48 c7 c3 20 97 58 80 45 31 e4 0f 1f 00 48 8b 3b 4d 85 ed 48 63 47 58 48 8d 14 c5 00 00 00 00 <49> 8b 86 80 05 00 00 48 8d 44 10 38 48 8b 00 48 8b 30 74 12 49

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
---
 kernel/exit.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 3f47470..3a2a42a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -637,8 +637,8 @@ retry:
 	 * the callback and take action
 	 */
 	down_write(&mm->mmap_sem);
-	mm->owner = NULL;
 	cgroup_mm_owner_callbacks(mm->owner, NULL);
+	mm->owner = NULL;
 	up_write(&mm->mmap_sem);
 	return;
 
-- 
1.5.6.5

--

From: Balbir Singh
Date: Thursday, August 14, 2008 - 1:58 pm

Thanks for catching this

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
	Balbir
--

From: Jiri Slaby
Date: Monday, August 18, 2008 - 2:22 pm

Hmm, unfortunately there is one more issue with this patch.
memrlimit_cgroup_subsys doesn't expect NULL new cgroup and NULL task.
memrlimit_cgroup_mm_owner_changed blows up (called from
cgroup_mm_owner_callbacks as ss->mm_owner_changed).

I'm out for few days, please solve it somehow. Full oops message follows.

BUG: unable to handle kernel NULL pointer dereference at 00000000000004ac
IP: [<ffffffff8056c469>] _spin_lock+0x9/0x20
PGD 3ef84067 PUD 0
Oops: 0002 [2] SMP
last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
CPU 0
Modules linked in: usblp ath5k mac80211 cfg80211 arc4 ecb cryptomgr aead
crypto_blkcipher crypto_algapi usbhid ohci1394 hid led_class ieee1394 evdev
rtc_cmos floppy ff_memless [last unloaded: cfg80211]
Pid: 27360, comm: automount Tainted: G      D W 2.6.27-rc3-mm1_64 #441
RIP: 0010:[<ffffffff8056c469>]  [<ffffffff8056c469>] _spin_lock+0x9/0x20
RSP: 0018:ffff88007839fd98  EFLAGS: 00010282
RAX: 0000000000000100 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff807d5f28 RDI: 00000000000004ac
RBP: ffff88007839fd98 R08: 00000000cc62b515 R09: 00000000f8f36e74
R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000004ac
R13: ffffffff807d5f28 R14: ffff880047e2e500 R15: ffff880047e2e680
FS:  00007fca50fa66f0(0000) GS:ffffffff8070a480(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000004ac CR3: 000000004510f000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process automount (pid: 27360, threadinfo ffff88007839e000, task ffff880047e2e500)
Stack:  ffff88007839fdb8 ffffffff80238ab3 ffffffff80589750 0000000000000000
 ffff88007839fde8 ffffffff802b3f4b ffff8800010205b0 ffffffff80589750
 0000000000000000 0000000000000000 ffff88007839fe18 ffffffff80270446
Call Trace:
 [<ffffffff80238ab3>] get_task_mm+0x23/0x60
 [<ffffffff802b3f4b>] memrlimit_cgroup_mm_owner_changed+0x1b/0x80
 ...
From: Balbir Singh
Date: Monday, August 18, 2008 - 8:37 pm

Could you please help me with the steps to reproduce the problem.  I don't seem
to be hitting the mm->owner changed callback. I did have a test case for it when


-- 
	Balbir
--

From: Jiri Slaby
Date: Tuesday, August 19, 2008 - 2:49 am

I have no idea. My config is at:
http://decibel.fi.muni.cz/~xslaby/config-memrlimit-oops
I don't play with cgroups or anything, I just work on the system. Do you need a
test case, it's obvious from the code as far as I can see?
--

From: Balbir Singh
Date: Tuesday, August 19, 2008 - 3:36 am

Yes, the problem is obvious, but I usually use small test cases or test
scenarios to verify that the problem is fixed.

-- 
	Balbir
--

From: Balbir Singh
Date: Tuesday, August 19, 2008 - 7:13 am

Does the patch below fix your problem, it's against mmotm 19th August
2008.

 
Reported-by: jirislaby@gmail.com

Jiri reported a problem and saw an oops when the memrlimit-fix-race-with-swap
patch is applied. He sent his patch on top to fix the problem, but ran into
another issue. The root cause of the problem is that we are not suppose
to call task_cgroup on NULL tasks. This patch reverts Jiri's patch and
does not call task_cgroup if the passed task_struct (old) is NULL.


Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 kernel/cgroup.c |    5 +++--
 kernel/exit.c   |    2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff -puN kernel/exit.c~memrlimit-fix-race-with-swap-oops kernel/exit.c
--- linux-2.6.27-rc3/kernel/exit.c~memrlimit-fix-race-with-swap-oops	2008-08-19 18:50:39.000000000 +0530
+++ linux-2.6.27-rc3-balbir/kernel/exit.c	2008-08-19 18:51:05.000000000 +0530
@@ -641,8 +641,8 @@ retry:
 	 * the callback and take action
 	 */
 	down_write(&mm->mmap_sem);
-	cgroup_mm_owner_callbacks(mm->owner, NULL);
 	mm->owner = NULL;
+	cgroup_mm_owner_callbacks(mm->owner, NULL);
 	up_write(&mm->mmap_sem);
 	return;
 
diff -puN kernel/cgroup.c~memrlimit-fix-race-with-swap-oops kernel/cgroup.c
--- linux-2.6.27-rc3/kernel/cgroup.c~memrlimit-fix-race-with-swap-oops	2008-08-19 18:50:39.000000000 +0530
+++ linux-2.6.27-rc3-balbir/kernel/cgroup.c	2008-08-19 18:55:38.000000000 +0530
@@ -2743,13 +2743,14 @@ void cgroup_fork_callbacks(struct task_s
  */
 void cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
 {
-	struct cgroup *oldcgrp, *newcgrp = NULL;
+	struct cgroup *oldcgrp = NULL, *newcgrp = NULL;
 
 	if (need_mm_owner_callback) {
 		int i;
 		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
-			oldcgrp = task_cgroup(old, ss->subsys_id);
+			if (old)
+				oldcgrp = task_cgroup(old, ss->subsys_id);
 			if (new)
 				newcgrp = task_cgroup(new, ss->subsys_id);
 			if (oldcgrp == ...
From: Li Zefan
Date: Tuesday, August 19, 2008 - 10:20 pm

I just triggered this bug. I think you also need the following change:

make memrlimit_cgroup_mm_owner_changed() aware that old_cgrp can be NULL,
and note we can't call memrlimit_cgroup_from_cgrp() with NULL argument.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

diff --git a/mm/memrlimitcgroup.c.orig b/mm/memrlimitcgroup.c
index f7536dc..6559470 100644
--- a/mm/memrlimitcgroup.c.orig
+++ b/mm/memrlimitcgroup.c
@@ -249,17 +249,23 @@ static void memrlimit_cgroup_mm_owner_changed(struct cgroup_subsys *ss,
 	struct mm_struct *mm = get_task_mm(p);
 
 	BUG_ON(!mm);
-	memrcg = memrlimit_cgroup_from_cgrp(cgrp);
-	old_memrcg = memrlimit_cgroup_from_cgrp(old_cgrp);
 
 	/*
 	 * If we don't have a new cgroup, we just uncharge from the old one.
 	 * It means that the task is going away
 	 */
-	if (memrcg &&
-	    res_counter_charge(&memrcg->as_res, (mm->total_vm << PAGE_SHIFT)))
-		goto out;
-	res_counter_uncharge(&old_memrcg->as_res, (mm->total_vm << PAGE_SHIFT));
+	if (cgrp) {
+		memrcg = memrlimit_cgroup_from_cgrp(cgrp);
+		if (res_counter_charge(&memrcg->as_res,
+				       mm->total_vm << PAGE_SHIFT))
+			goto out;
+	}
+
+	if (old_cgrp) {
+		old_memrcg = memrlimit_cgroup_from_cgrp(old_cgrp);
+		res_counter_uncharge(&old_memrcg->as_res,
+				     mm->total_vm <<PAGE_SHIFT);
+	}
 out:
 	mmput(mm);

--

From: KAMEZAWA Hiroyuki
Date: Tuesday, August 19, 2008 - 11:59 pm

On Wed, 20 Aug 2008 13:20:40 +0800

Thank you, it seems you fixed a problem which I met in my test.

(*) When I migrate task under cpuset repeatedly, I can see Oops easily.


--

Previous thread: [2.6.27-rc3] XFS logbsize/logbufs option problem... by Daniel J Blueman on Thursday, August 14, 2008 - 12:55 pm. (2 mes