[PATCH 2/4] ptrace children revamp

Previous thread: ALSA vs. non coherent DMA by Benjamin Herrenschmidt on Monday, May 5, 2008 - 5:08 pm. (6 messages)

Next thread: Re: Linux 2.6.26-rc1 - pgtable_32.c:178 pmd_bad by Jeff Chua on Monday, May 5, 2008 - 6:06 pm. (44 messages)
From: Roland McGrath
Date: Monday, May 5, 2008 - 5:32 pm

This breaks out the guts of do_wait into three subfunctions.
The control flow is less nonobvious without so much goto.
do_wait_thread and ptrace_do_wait contain the main work of the outer loop.
wait_consider_task contains the main work of the inner loop.

Signed-off-by: Roland McGrath <roland@redhat.com>
---
 kernel/exit.c |  212 +++++++++++++++++++++++++++++++++++---------------------
 1 files changed, 132 insertions(+), 80 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 1510f78..934371a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1232,7 +1232,7 @@ static int wait_noreap_copyout(struct task_struct *p, pid_t pid, uid_t uid,
  * the lock and this task is uninteresting.  If we return nonzero, we have
  * released the lock and the system call should return.
  */
-static int wait_task_zombie(struct task_struct *p, int noreap,
+static int wait_task_zombie(struct task_struct *p, int options,
 			    struct siginfo __user *infop,
 			    int __user *stat_addr, struct rusage __user *ru)
 {
@@ -1240,7 +1240,10 @@ static int wait_task_zombie(struct task_struct *p, int noreap,
 	int retval, status, traced;
 	pid_t pid = task_pid_vnr(p);
 
-	if (unlikely(noreap)) {
+	if (!likely(options & WEXITED))
+		return 0;
+
+	if (unlikely(options & WNOWAIT)) {
 		uid_t uid = p->uid;
 		int exit_code = p->exit_code;
 		int why, status;
@@ -1391,13 +1394,16 @@ static int wait_task_zombie(struct task_struct *p, int noreap,
  * released the lock and the system call should return.
  */
 static int wait_task_stopped(struct task_struct *p,
-			     int noreap, struct siginfo __user *infop,
+			     int options, struct siginfo __user *infop,
 			     int __user *stat_addr, struct rusage __user *ru)
 {
 	int retval, exit_code, why;
 	uid_t uid = 0; /* unneeded, required by compiler */
 	pid_t pid;
 
+	if (!(p->ptrace & PT_PTRACED) && !(options & WUNTRACED))
+		return 0;
+
 	exit_code = 0;
 	spin_lock_irq(&p->sighand->siglock);
 
@@ -1415,7 +1421,7 @@ static int ...
From: Roland McGrath
Date: Monday, May 5, 2008 - 5:33 pm

ptrace no longer fiddles with the children/sibling links, and the
old ptrace_children list is gone.  Now ptrace, whether of one's own
children or another's via PTRACE_ATTACH, just uses the new ptraced
list instead.

There should be no user-visible difference that matters.  The only
change is the order in which do_wait() sees multiple stopped
children and stopped ptrace attachees.  Since wait_task_stopped()
was changed earlier so it no longer reorders the children list, we
already know this won't cause any new problems.

Signed-off-by: Roland McGrath <roland@redhat.com>
---
 include/linux/init_task.h |    4 +-
 include/linux/sched.h     |   26 +++---
 kernel/exit.c             |  224 ++++++++++++++++++++++++---------------------
 kernel/fork.c             |    6 +-
 kernel/ptrace.c           |   37 +++++---
 5 files changed, 159 insertions(+), 138 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index b24c287..2a53494 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -161,8 +161,8 @@ extern struct group_info init_groups;
 		.nr_cpus_allowed = NR_CPUS,				\
 	},								\
 	.tasks		= LIST_HEAD_INIT(tsk.tasks),			\
-	.ptrace_children= LIST_HEAD_INIT(tsk.ptrace_children),		\
-	.ptrace_list	= LIST_HEAD_INIT(tsk.ptrace_list),		\
+	.ptraced	= LIST_HEAD_INIT(tsk.ptraced),			\
+	.ptrace_entry	= LIST_HEAD_INIT(tsk.ptrace_entry),		\
 	.real_parent	= &tsk,						\
 	.parent		= &tsk,						\
 	.children	= LIST_HEAD_INIT(tsk.children),			\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 03c2380..3be5c96 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1074,12 +1074,6 @@ struct task_struct {
 #endif
 
 	struct list_head tasks;
-	/*
-	 * ptrace_list/ptrace_children forms the list of my children
-	 * that were stolen by a ptracer.
-	 */
-	struct list_head ptrace_children;
-	struct list_head ptrace_list;
 
 	struct mm_struct *mm, *active_mm;
 
@@ -1101,18 +1095,25 @@ struct task_struct {
 ...
From: Roland McGrath
Date: Monday, May 5, 2008 - 5:33 pm

This reverts the effect of commit f2cc3eb133baa2e9dc8efd40f417106b2ee520f3
"do_wait: fix security checks".  That change reverted the effect of commit
73243284463a761e04d69d22c7516b2be7de096c.  The rationale for the original
commit still stands.  The inconsistent treatment of children hidden by
ptrace was an unintended omission in the original change and in no way
invalidates its purpose.

This makes do_wait return the error returned by security_task_wait()
(usually -EACCES) in place of -ECHILD when there are some children the
caller would be able to wait for if not for the permission failure.  A
permission error will give the user a clue to look for security policy
problems, rather than for mysterious wait bugs.

Signed-off-by: Roland McGrath <roland@redhat.com>
---
 kernel/exit.c |   30 ++++++++++++++++++++----------
 1 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index c95765f..42980e2 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1193,14 +1193,10 @@ static int eligible_child(enum pid_type type, struct pid *pid, int options,
 		return 0;
 
 	err = security_task_wait(p);
-	if (likely(!err))
-		return 1;
+	if (err)
+		return err;
 
-	if (type != PIDTYPE_PID)
-		return 0;
-	/* This child was explicitly requested, abort */
-	read_unlock(&tasklist_lock);
-	return err;
+	return 1;
 }
 
 static int wait_noreap_copyout(struct task_struct *p, pid_t pid, uid_t uid,
@@ -1530,7 +1526,8 @@ static int wait_task_continued(struct task_struct *p, int options,
  * -ECHILD should be in *@retval before the first call.
  * Returns nonzero for a final return, when we have unlocked tasklist_lock.
  * Returns zero if the search for a child should continue; then *@retval is
- * 0 if @p is an eligible child, or still -ECHILD.
+ * 0 if @p is an eligible child, or another error from security_task_wait(),
+ * or still -ECHILD.
  */
 static int wait_consider_task(struct task_struct *parent, int ptrace,
 			      struct task_struct *p, ...
From: Roland McGrath
Date: Monday, May 5, 2008 - 5:34 pm

This fixes an arcane bug that we think was a regression introduced
by commit b2b2cbc4b2a2f389442549399a993a8306420baf.  When a parent
ignores SIGCHLD (or uses SA_NOCLDWAIT), its children would self-reap
but they don't if it's using ptrace on them.  When the parent thread
later exits and ceases to ptrace a child but leaves other live
threads in the parent's thread group, any zombie children are left
dangling.  The fix makes them self-reap then, as they would have
done earlier if ptrace had not been in use.

Signed-off-by: Roland McGrath <roland@redhat.com>
---
 kernel/exit.c |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 42980e2..afed083 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -697,6 +697,22 @@ static void exit_mm(struct task_struct * tsk)
 }
 
 /*
+ * Return nonzero if @parent's children should reap themselves.
+ *
+ * Called with write_lock_irq(&tasklist_lock) held.
+ */
+static int ignoring_children(struct task_struct *parent)
+{
+	int ret;
+	struct sighand_struct *psig = parent->sighand;
+	spin_lock(&psig->siglock);
+	ret = (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||
+	       (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT));
+	spin_unlock(&psig->siglock);
+	return ret;
+}
+
+/*
  * Detach all tasks we were using ptrace on.
  * Any that need to be release_task'd are put on the @dead list.
  *
@@ -705,6 +721,7 @@ static void exit_mm(struct task_struct * tsk)
 static void ptrace_exit(struct task_struct *parent, struct list_head *dead)
 {
 	struct task_struct *p, *n;
+	int ign = -1;
 
 	list_for_each_entry_safe(p, n, &parent->ptraced, ptrace_entry) {
 		__ptrace_unlink(p);
@@ -720,10 +737,18 @@ static void ptrace_exit(struct task_struct *parent, struct list_head *dead)
 		 * release_task() here because we already hold tasklist_lock.
 		 *
 		 * If it's our own child, there is no notification to do.
+		 * But if our normal children self-reap, then ...
From: Linus Torvalds
Date: Monday, May 5, 2008 - 5:54 pm

I dunno.

I absolutely detest your propensity for multiple return values. You have 
"int ret" for the return value, and then you *also* have a "int *retval" 
for the return value.

Which is what, and why?

I really don't think this helps in the "nonobvious" department. 

Yes, I can see the comments, so I know what it's supposed to be about, but 
it still disturbs me.

I'm pretty sure it should be possible to return a positive value for 
"eligible but not available" and make do with just one return value, but 
if that is just not possible or too complicated, at least don't call it 
"retval" and have totally different semantics from the return value we 
return?

So for example, maybe it could just be count of eligible children, and we 
call could it "int *eligible", and then rather than initialize to -ECHILD, 
initialize to zero, and make the logic be

	if (!eligible)
		return -ECHILD;
	.. otherwise see if we can wait, return -EINTR or whatever it we 
	can't ..

which then looks like a fairly sane thing to do in all contexts (both the 
caller and the callee).

			Linus
--

From: Roland McGrath
Date: Monday, May 5, 2008 - 6:42 pm

The "int ret" (actual return value of wait_consider_task and wait_task_*)
is the nonzero return value if this task produced the final result.  The
"int *retval" is the return value of last resort if there turns out to be

It's not possible and would be too complicated.  (A positive value
already means the pid for a successful result, and the task_struct has been

A better name would be "notask_error".  It is the error to return when there
is no task with status to return.  The error is zero if there were matches,

There is no use for a count, but there is a use for the error value.  In
patch 3/3, we restore the old behavior of keeping track of the error value
other than -ECHILD if there was one (like -EACCES due to bad selinux policy).

I agree the name of the parameter and variable should not be "retval".
A change beyond that just for the sake of making it not be something
like looks like it might the syscall's return value seems gratuitous and
obfuscating, and will complicate the code (since it really is a value
we're keeping around that might be the return value for the syscall).


Thanks,
Roland
--

Previous thread: ALSA vs. non coherent DMA by Benjamin Herrenschmidt on Monday, May 5, 2008 - 5:08 pm. (6 messages)

Next thread: Re: Linux 2.6.26-rc1 - pgtable_32.c:178 pmd_bad by Jeff Chua on Monday, May 5, 2008 - 6:06 pm. (44 messages)