Re: [PATCH 1/4] INIT_TASK() should initialize ->thread_group list

Previous thread: [PATCH] Staging: wlags49_h2: fixed C99 comments style issues in wl_profile.c by Prashant P. Shah on Sunday, May 9, 2010 - 10:12 am. (3 messages)

Next thread: [PATCH] thinkpad_acpi: add support for thinkpad x100e by Andrej Gelenberg on Sunday, May 9, 2010 - 12:05 pm. (2 messages)
From: Oleg Nesterov
Date: Sunday, May 9, 2010 - 11:45 am

sorry for delay, vacation.


sorry again, I'll try to comment this later...


Thanks Mathias.

I think this should be fixed anyway. Could you try the patch below?

In any case swapper should be immune to signals, and its ->thread_group

Looks like, you kernel is old. Any chance you can also test the recent

setpgid(0,0) just moves the caller's pgrp from PGID 0, that is why it


Thanks.

I didn't try it, but it looks overcomplicated to trigger this bug, or
I missed something? Afaics, init could be just

	int main(void)
	{
		kill(0, SIGGKILL);
	}

No?

Oleg.

We should also change INIT_SIGHAND, but _hopefully_ this is enough
to fix the crash.

--- x/include/linux/init_task.h
+++ x/include/linux/init_task.h
@@ -172,6 +172,7 @@ extern struct cred init_cred;
 		[PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID),		\
 		[PIDTYPE_SID]  = INIT_PID_LINK(PIDTYPE_SID),		\
 	},								\
+	.thread_group	= LIST_HEAD_INIT(tsk.thread_group),		\
 	.dirties = INIT_PROP_LOCAL_SINGLE(dirties),			\
 	INIT_IDS							\
 	INIT_PERF_EVENTS(tsk)						\


--

From: Oleg Nesterov
Date: Sunday, May 9, 2010 - 12:06 pm

forgot to mention...


Also. I strongly believe we should change INIT_STRUCT_PID.tasks, it
should be hlist_empty(). But this needs another discussion.

Oleg.

--

From: Mathias Krause
Date: Monday, May 10, 2010 - 12:20 am

Hello  Oleg,




It's old because it's the result of bisecting the cause of the problem.
It's actually some 2.6.24 kernel but I could reproduce the bug with


Yes, sure. Killing the process group, while having a PGID of 0 are the
only prerequisites to trigger this bug. In my example I forked a child
and let it do the call to kill to not have init  (PID 1) beeing killed,
too. The kernel doesn't like that. :)

This works for me. Thanks.

Tested-by: Mathias Krause <mathias.krause@secunet.com>
--

From: Oleg Nesterov
Date: Monday, May 10, 2010 - 12:49 pm

Hello,

Mathias Krause reports that a buggy (or special) /sbin/init can
crash the kernel if it sends a signal to its pgrp/sid before it
changes its initial (0,0) pids. See the changelog for 1/4.

git-bisect blames "start the global /sbin/init with 0,0 special pids"
commit 430c623121ea88ca80595c99fdc63b7f8a803ae5, but in fact the
problem was caused by another change, see 2/4.

The patches do not depend on each other, 3/4 fixes another problem,
4/4 is purely cosmetic.

Oleg.

--

From: Oleg Nesterov
Date: Monday, May 10, 2010 - 12:49 pm

The trivial /sbin/init doing

	int main(void)
	{
		kill(0, SIGKILL)
	}

crashes the kernel.

This happens because __kill_pgrp_info(init_struct_pid) also sends SIGKILL
to the swapper process which runs with the uninitialized ->thread_group.

Change INIT_TASK() to initialize ->thread_group properly.

Note: the real problem is that the swapper process must not be visible to
signals, see the next patch. But this change is right anyway and fixes
the crash.

Reported-and-tested-by: Mathias Krause <mathias.krause@secunet.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 include/linux/init_task.h |    1 +
 1 file changed, 1 insertion(+)

--- 34-rc1/include/linux/init_task.h~1_INIT_TASK_THREAD_GROUP	2010-05-10 19:44:19.000000000 +0200
+++ 34-rc1/include/linux/init_task.h	2010-05-10 19:45:27.000000000 +0200
@@ -172,6 +172,7 @@ extern struct cred init_cred;
 		[PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID),		\
 		[PIDTYPE_SID]  = INIT_PID_LINK(PIDTYPE_SID),		\
 	},								\
+	.thread_group	= LIST_HEAD_INIT(tsk.thread_group),		\
 	.dirties = INIT_PROP_LOCAL_SINGLE(dirties),			\
 	INIT_IDS							\
 	INIT_PERF_EVENTS(tsk)						\

--

From: Serge E. Hallyn
Date: Tuesday, May 11, 2010 - 12:52 am

From: Sukadev Bhattiprolu
Date: Tuesday, May 11, 2010 - 7:15 pm

Oleg Nesterov [oleg@redhat.com] wrote:
| The trivial /sbin/init doing
| 
| 	int main(void)
| 	{
| 		kill(0, SIGKILL)
| 	}
| 
| crashes the kernel.

Really subtle. Good catch.

So, now init is not part of any process group until it calls setsid().
So the above SIGKILL is lost right ? - i.e it does not kill even init
itself.

In my quick test, the following init process lives on inspite of the
SIGKILL.

	main()
	{
		kill(0, SIGKILL);

		while(1)
			sleep(1); 
	}

I don't have a better solution. Maybe a hung init is better than a
crashed kernel. the patches look good.

Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
--

From: Oleg Nesterov
Date: Wednesday, May 12, 2010 - 8:54 am

No, no. swapper != init. With or without these patches (more precisely,

Yes, /sbin/init is not killable, that is why it survies.


Yes. if /sbin/init exits the kernel panics. The real test-case shouldn't

Agreed!!! I sent the patch a long ago. But security people do not
like it, they use exit() from init to provoke the crash intentionally.

Thanks!

Oleg.

--

From: Oleg Nesterov
Date: Monday, May 10, 2010 - 12:50 pm

"statically initialize struct pid for swapper" commit 820e45db says:

	Statically initialize a struct pid for the swapper process (pid_t == 0)
	and attach it to init_task.  This is needed so task_pid(), task_pgrp()
	and task_session() interfaces work on the swapper process also.

OK, but:

	- it doesn't make sense to add init_task.pids[].node into
	  init_struct_pid.tasks[], and in fact this just wrong.

	  idle threads are special, they shouldn't be visible on any
	  global list. In particular do_each_pid_task(init_struct_pid)
	  shouldn't see swapper.

	  This is the actual reason why kill(0, SIGKILL) from /sbin/init
	  (which starts with 0,0 special pids) crashes the kernel. The
	  signal sent to pgid/sid == 0 must never see idle threads, even
	  if the previous patch fixed the crash itself.

	- we have other idle threads running on the non-boot CPUs, see
	  the next patch.

Change INIT_STRUCT_PID/INIT_PID_LINK to create the empty/unhashed
hlist_head/hlist_node. Like any other idle thread swapper can never exit,
so detach_pid()->__hlist_del() is not possible, but we could change
INIT_PID_LINK() to set pprev = &next if needed.

All we need is the valid swapper->pids[].pid == &init_struct_pid.

Reported-by: Mathias Krause <mathias.krause@secunet.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 include/linux/init_task.h |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- 34-rc1/include/linux/init_task.h~2_INIT_PID_DONT_LINK_SWAPPER	2010-05-10 19:45:27.000000000 +0200
+++ 34-rc1/include/linux/init_task.h	2010-05-10 20:26:08.000000000 +0200
@@ -53,9 +53,9 @@ extern struct group_info init_groups;
 #define INIT_STRUCT_PID {						\
 	.count 		= ATOMIC_INIT(1),				\
 	.tasks		= {						\
-		{ .first = &init_task.pids[PIDTYPE_PID].node },		\
-		{ .first = &init_task.pids[PIDTYPE_PGID].node },	\
-		{ .first = &init_task.pids[PIDTYPE_SID].node },		\
+		{ .first = NULL },					\
+		{ .first = NULL },					\
+		{ .first = NULL },					\
 ...
From: Serge E. Hallyn
Date: Tuesday, May 11, 2010 - 2:54 am

Crimey, trying to find some way this could get dereferenced, finding
myself impressed with the likes of set_ftrace_swapper().


Acked-by: Serge E. Hallyn <serue@us.ibm.com>

thanks,
-serge
--

From: Oleg Nesterov
Date: Wednesday, May 12, 2010 - 9:03 am

Yes, I was worried too. But afaics we should never use this hlist_node.

Thanks for review!

Oleg.

--

From: Oleg Nesterov
Date: Monday, May 10, 2010 - 12:50 pm

copy_process(pid => &init_struct_pid) doesn't do attach_pid/etc.

It shouldn't, but this means that the idle threads run with the wrong
pids copied from the caller's task_struct. In x86 case the caller is
either kernel_init() thread or keventd.

In particular, this means that after the series of cpu_up/cpu_down an
idle thread (which never exits) can run with .pid pointing to nowhere.

Change fork_idle() to initialize idle->pids[] correctly. We only set
.pid = &init_struct_pid but do not add .node to list, INIT_TASK() does
the same for the boot-cpu idle thread (swapper).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 kernel/fork.c |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

--- 34-rc1/kernel/fork.c~3_FORK_IDLE_SET_PIDS	2010-03-24 18:07:03.000000000 +0100
+++ 34-rc1/kernel/fork.c	2010-05-10 20:45:33.000000000 +0200
@@ -1339,6 +1339,16 @@ noinline struct pt_regs * __cpuinit __at
 	return regs;
 }
 
+static inline void init_idle_pids(struct pid_link *links)
+{
+	enum pid_type type;
+
+	for (type = PIDTYPE_PID; type < PIDTYPE_MAX; ++type) {
+		INIT_HLIST_NODE(&links[type].node); /* not really needed */
+		links[type].pid = &init_struct_pid;
+	}
+}
+
 struct task_struct * __cpuinit fork_idle(int cpu)
 {
 	struct task_struct *task;
@@ -1346,8 +1356,10 @@ struct task_struct * __cpuinit fork_idle
 
 	task = copy_process(CLONE_VM, 0, idle_regs(&regs), 0, NULL,
 			    &init_struct_pid, 0);
-	if (!IS_ERR(task))
+	if (!IS_ERR(task)) {
+		init_idle_pids(task->pids);
 		init_idle(task, cpu);
+	}
 
 	return task;
 }

--

From: Serge E. Hallyn
Date: Tuesday, May 11, 2010 - 1:54 am

From: Oleg Nesterov
Date: Monday, May 10, 2010 - 12:51 pm

Cosmetic, no changes in the compiled code. Just s/NULL/SIG_DFL/ to make
it more readable and grep-friendly.

Note: probably SIG_IGN makes more sense, we could kill ignore_signals().
But then kernel_init() should do flush_signal_handlers() before exec().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 include/linux/init_task.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- 34-rc1/include/linux/init_task.h~4_INIT_SIGHAND_USE_SIG_DFL	2010-05-10 20:26:08.000000000 +0200
+++ 34-rc1/include/linux/init_task.h	2010-05-10 21:06:03.000000000 +0200
@@ -43,7 +43,7 @@ extern struct nsproxy init_nsproxy;
 
 #define INIT_SIGHAND(sighand) {						\
 	.count		= ATOMIC_INIT(1), 				\
-	.action		= { { { .sa_handler = NULL, } }, },		\
+	.action		= { { { .sa_handler = SIG_DFL, } }, },		\
 	.siglock	= __SPIN_LOCK_UNLOCKED(sighand.siglock),	\
 	.signalfd_wqh	= __WAIT_QUEUE_HEAD_INITIALIZER(sighand.signalfd_wqh),	\
 }

--

From: Serge E. Hallyn
Date: Tuesday, May 11, 2010 - 1:54 am

From: Andrew Morton
Date: Monday, May 10, 2010 - 2:08 pm

On Mon, 10 May 2010 21:49:17 +0200

Do you see a need to merge these into 2.6.34?  (I don't)
--

From: Oleg Nesterov
Date: Monday, May 10, 2010 - 2:41 pm

No, the problem is minor, it is not possible to exploit it unless
/sbin/init does "bad things".

And the long CC asks for review. Although 1/4 is "obviously good" in
any case and I strongly believe 2/4 is right at least in general.

Oleg.

--

From: Roland McGrath
Date: Monday, May 10, 2010 - 4:55 pm

All 4
Acked-by: Roland McGrath <roland@redhat.com>
--

Previous thread: [PATCH] Staging: wlags49_h2: fixed C99 comments style issues in wl_profile.c by Prashant P. Shah on Sunday, May 9, 2010 - 10:12 am. (3 messages)

Next thread: [PATCH] thinkpad_acpi: add support for thinkpad x100e by Andrej Gelenberg on Sunday, May 9, 2010 - 12:05 pm. (2 messages)