Re: [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED

Previous thread: [PATCH 11/16] signal: prepare for CLD_* notification changes by Tejun Heo on Monday, December 6, 2010 - 9:56 am. (4 messages)

Next thread: [PATCH 04/16] ptrace: kill tracehook_notify_jctl() by Tejun Heo on Monday, December 6, 2010 - 9:56 am. (3 messages)
From: Tejun Heo
Date: Monday, December 6, 2010 - 9:56 am

Currently, if the task is STOPPED on ptrace attach, it's left alone
and the state is silently changed to TRACED on the next ptrace call.
The behavior breaks the assumption that arch_ptrace_stop() is called
before any task is poked by ptrace and is ugly in that a task
manipulates the state of another task directly.

With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and
TRACED can be made clean.  The tracer can use the flag to tell the
tracee to retry stop on attach and detach.  On retry, the tracee will
enter the desired state in the correct way.  The lower 16bits of
task->group_stop is used to remember the signal number which caused
the last group stop.  This is used while retrying for ptrace attach as
the original group_exit_code could have been consumed with wait(2) by
then.

As the real parent may wait(2) and consume the group_exit_code
anytime, the group_exit_code needs to be saved separately so that it
can be used when switching from regular sleep to ptrace_stop().  This
is recorded in the lower 16bits of task->group_stop.

If a task is already stopped and there's no intervening SIGCONT, a
ptrace request immediately following a successful PTRACE_ATTACH should
always succeed even if the tracer doesn't wait(2) for attach
completion; however, with this change, the tracee might still be
TASK_RUNNING trying to enter TASK_TRACED which would cause the
following request to fail with -ESRCH.

This intermediate state is hidden from userland by setting
GROUP_STOP_TRAPPING on attach and making ptrace_check_attach() wait
for it to clear.  Completing the transition or any event which clears
the group stop states of the task clears the bit and wakes up the
ptracer if waiting.

Oleg:

* Spotted a race condition where a task may retry group stop without
  proper bookkeeping.  Fixed by redoing bookkeeping on retry.

* Pointed out the userland visible intermediate state.  Fixed with
  GROUP_STOP_TRAPPING.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov ...
From: Oleg Nesterov
Date: Monday, December 20, 2010 - 8:00 am

Hmm. we could probably use ->wait_chldexit/__wake_up_parent instead,

I am nervous ;) To me, TASK_KILLABLE makes more sense, and it is


Probably this deserves a minor cleanup,

	relock:
		ret = -ESRCH;
		read_lock(&tasklist_lock);
		if (task_is_traced() || kill) {
			ret = 0;
		} else {
		...


OK. So, ptrace_attach() asks a stopped tracee to s/STOPPED/TRACED/.
If it is not stopped, it should call ptrace_stop() eventually.

This doesn't work if ptrace_attach() races with clone(CLONE_STOPPED).
ptrace_check_attach() can return the wrong ESRCH after that. Perhaps

Well. I do not know whether this matters, but "hide the transition from
userland" is not 100% correct. I mean, this change is still visible.

ptrace_check_attach()->wait_on_bit() logic fixes the previous example,
but:

	1. the tracer knows that the tracee is stopped

	2. the tracer does ptrace(ATTACH)

	3. the tracer does do_wait()

In this case do_wait() can see the tracee in TASK_RUNNING state,
this breaks wait_task_stopped(ptrace => true).


This looks racy. Suppose that "current" is ptraced, in this case
it can initiate the new group-stop even if SIGNAL_STOP_STOPPED
is set and we have another TASK_STOPPED thead T.

Suppose that another (or same) debugger ataches to this thread T,
wakes it up and sets GROUP_STOP_TRAPPING.

T resumes, calls ptrace_stop() in TASK_STOPPED, and temporary drops
->siglock.

Now, this task_clear_group_stop(T) confuses ptrace_check_attach(T).

I think ptrace_stop() should be called in TASK_RUNNING state.

Probably this doesn't really matter, but why do we need to
change the GROUP_STOP_SIGMASK part of t->group_stop? If it
is exiting, this is not needed. If it is already stopped, then

It is no longer needed to set ->exit_code here. The only reason
was s/STOPPED/TRACED/ change in ptrace_check_attach(). Now that
we rely on ptrace_stop() which sets ->exit_state, this can be
removed.


Perhaps it would be more clean to clear ->exit_code here, in the
"else" ...
From: Tejun Heo
Date: Tuesday, December 21, 2010 - 10:31 am

Hmmmm, I actually think that would be cleaner.  I just didn't know it



Ah, thanks for spotting it.  I missed that.  We should be able to

I see.  I can move the transition wait logic into PTRACE_ATTACH.
Would that be good enough?

This is also related to how to wait for attach completion for a new
more transparent attach.  Would it be better for such a request to
make sure the operation to complete before returning or is it
preferable to keep using wait(2) for that?  We'll probably be able to
share the transition wait logic with it.  I think it would be better
to return after the attach is actually complete but is there any

I'm feeling a bit too dense to process the above right now.  I'll

The GROUP_STOP_SIGMASK part of t->group_stop is supposed to track the
signr of the latest stop attempt.  Hmmm... but yeah, not changing the


Hmmm... and dropping current->exit_code clearing from the
do_signal_stop(), right?  I'm a bit confused about the use of
current->exit_code tho.  Why aren't we clearing it from ptrace_stop()?

Thanks.

-- 
tejun
--

From: Tejun Heo
Date: Tuesday, December 21, 2010 - 10:32 am

Ah, never mind.  It's used as the signr return from ptrace signal
trap.

Thanks.

-- 
tejun
--

From: Tejun Heo
Date: Wednesday, December 22, 2010 - 3:54 am

Hello,


Ah, right, the lock drop across arch_ptrace_stop().  Yeah, I agree
calling ptrace_stop() with TASK_RUNNING would solve it.  I'll think
about it a bit more.

Thank you.

-- 
tejun
--

From: Oleg Nesterov
Date: Wednesday, December 22, 2010 - 4:39 am

__wake_up_parent() needs tasklist to pin ->parent. But probably in
this particular case we can rely on rcu, or even ->siglock (given

Perhaps... But then we should wakeup the new child. Perhaps we can
just kill that code, CLONE_STOPPED is deprecated and triggers the

Yes, I thought about this too. But ptrace's semantics is really strange,
even if we move wait_on_bit() into ptrace_attach() we still have a
user-visible change.

sys_ptrace() only works for the single thread who did PTRACE_ATTACH,
but do_wait() should work for its sub-threads.

	1. the tracer knows that the tracee is stopped

	2. the tracer does ptrace(ATTACH)

	3. the tracer's sub-thread does do_wait()

Note! Personally I think we can ignore this "problem", I do not

Oh, I do not know. This is the main problem with ptrace. You can
always understand what the code does, but you can never know what
was the supposed behaviour ;)

That is why I am asking Jan and Roland who understand the userland
needs.

Personally, I _think_ it makes sense to keep do_wait() working after
ptrace_attach(), if it is called by the thread which did attach.

OK ;)

But look. Even if the race doesn't exist. ptrace_stop() can drop
->siglock and call arch_ptrace_stop() which can fault/sleep/whatever.
I think this doesn't really matter, but otoh it would be more clean
to do this in TASK_RUNNING state anyway. At least, in anny case

Oh, the right answer is: ptrace shouldn't use ->exit_code at all ;)

ptrace_report_syscall() and ptrace_signal() check ->exit_code after
return from ptrace_stop(), otherwise we ignore the "data" argument
of ptrace_resume/ptrace_detach.

Oleg.

--

From: Tejun Heo
Date: Wednesday, December 22, 2010 - 8:14 am

But if ptrace(ATTACH) doesn't return until the transition is complete
when the task is already stopped, the tracer's sub-thread's do_wait()
will behave exactly the same.  The only difference would be that
ptrace(ATTACH) may now block and/or is failed by a signal delivery.

How would #3 behave differently if STOPPED -> TRACED transition is

Hmmm... I see.  After this fix / cleanup rounds are complete, I'll
just write up something.  It would be much easier to decide which way
to go with a working implementation and switching between wait(2)

On resume, T is in TASK_RUNNING and the lock is only dropped in

So, yes, the temporary lock dropping can definitely confuse

I agree.  I'll update toward that direction.

Thanks.

-- 
tejun
--

From: Oleg Nesterov
Date: Wednesday, December 22, 2010 - 9:00 am

Ahhh, sorry. I meant, two threads can do 2. and 3. at the same time.

But let me repeat, it is not that I think we should worry. I mentioned
this only because I think it is better to discuss everything we can,
even the really minor things.

Oleg.

--

From: Tejun Heo
Date: Wednesday, December 22, 2010 - 9:21 am

Hello,


Ah, okay, now I see, so WNOHANG wait(2) from a different thread may
fail if it races against ptrace(ATTACH), although it would have
succeeded regardless of the timing between #2 and #3 before the

It's definitely worth documenting.  We can close off the condition by
tweaking wait(2) so that it blocks if the target thread is in
transition, but that might be reaching into the realm of unnecessary
over-engineering.  IMHO, it would be wiser to hold off implementing it
until we know it's actually necessary.  Roland, Jan, do you guys think
this can actually cause a problem?

Thank you.

-- 
tejun
--

Previous thread: [PATCH 11/16] signal: prepare for CLD_* notification changes by Tejun Heo on Monday, December 6, 2010 - 9:56 am. (4 messages)

Next thread: [PATCH 04/16] ptrace: kill tracehook_notify_jctl() by Tejun Heo on Monday, December 6, 2010 - 9:56 am. (3 messages)