Re: [PATCH 3/3] posix timers: use SIGQUEUE_CANCELLED when the timer is destroyed

Previous thread: [PATCH 3/3] posix timers: release_posix_timer: kill the bogus put_task_struct(->it_process); by Oleg Nesterov on Saturday, May 17, 2008 - 11:14 am. (1 message)

Next thread: [PATCH 1/3] signals: sigqueue_free: don't free sigqueue if it is queued by Oleg Nesterov on Saturday, May 17, 2008 - 11:14 am. (13 messages)
To: Andrew Morton <akpm@...>
Cc: Austin Clements <amdragon+kernelbugzilla@...>, Ingo Molnar <mingo@...>, john stultz <johnstul@...>, Linus Torvalds <torvalds@...>, Michael Kerrisk <mtk.manpages@...>, Roland McGrath <roland@...>, Thomas Gleixner <tglx@...>, <linux-kernel@...>
Date: Saturday, May 17, 2008 - 11:14 am

Suggested by Linus Torvalds.

Fixes the problem pointed out by Austin Clements. Currently, when the task
execs it can be killed by a signal that a posix timer queued before the exec,
because exec resets the signal handlers to their defaults.
See http://bugzilla.kernel.org/show_bug.cgi?id=10460

This is a user visible change. With this patch sys_timer_delete() discards
the pending signal which was generated by the timer.

This change goes as a separate patch for documentation purposes. We have many
options for how to set SIGQUEUE_CANCELLED while destroying the timer. We could
set this flag in release_posix_timer() before calling sigqueue_free(), or add a
new "int cancel" argument to sigqueue_free(), but since sigqueue_free() plays
with q->flags anyway and nobody else uses this function, this patch changes
sigqueue_free() to set SIGQUEUE_CANCELLED unconditionally.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- 25/kernel/signal.c~6_USE_CANCELLED 2008-05-17 17:40:09.000000000 +0400
+++ 25/kernel/signal.c 2008-05-17 18:07:10.000000000 +0400
@@ -1246,7 +1246,7 @@ void sigqueue_free(struct sigqueue *q)
 	 * __exit_signal()->flush_sigqueue().
 	 */
 	spin_lock_irqsave(lock, flags);
-	q->flags &= ~SIGQUEUE_PREALLOC;
+	q->flags = SIGQUEUE_CANCELLED;	/* clears SIGQUEUE_PREALLOC */
 	/*
 	 * If it is queued it will be freed when dequeued,
 	 * like the "regular" sigqueue.

--

To: Andrew Morton <akpm@...>
Cc: Austin Clements <amdragon+kernelbugzilla@...>, Ingo Molnar <mingo@...>, john stultz <johnstul@...>, Linus Torvalds <torvalds@...>, Michael Kerrisk <mtk.manpages@...>, Roland McGrath <roland@...>, Thomas Gleixner <tglx@...>, <linux-kernel@...>
Date: Saturday, May 17, 2008 - 11:31 am

If this change is undesirable, we can (for example) do

--- kernel/posix-timers.c
+++ kernel/posix-timers.c
@@ -885,6 +885,7 @@ itimer_delete(struct k_itimer *timer)
timer->it_process = NULL;

unlock_timer(timer, flags);
+ timer->sigq->flags |= SIGQUEUE_CANCELLED;
release_posix_timer(timer, IT_ID_SET);
}

instead, and still fix the "BUG 10460".

Oleg.

--

To: Oleg Nesterov <oleg@...>
Cc: Andrew Morton <akpm@...>, Austin Clements <amdragon+kernelbugzilla@...>, Ingo Molnar <mingo@...>, john stultz <johnstul@...>, Michael Kerrisk <mtk.manpages@...>, Roland McGrath <roland@...>, Thomas Gleixner <tglx@...>, <linux-kernel@...>
Date: Saturday, May 17, 2008 - 1:11 pm

The only reason I like that better is that it makes me nervous when one
re-initializes the whole flags field. So your original 3/3 patch

- q->flags &= ~SIGQUEUE_PREALLOC;
+ q->flags = SIGQUEUE_CANCELLED; /* clears SIGQUEUE_PREALLOC */

just makes me go "Hmm, what about all the other flag bits?"

Now, admittedly, there are currently (with your patch) just two
SIGQUEUE_xyz bits, so by just doing that single assignment, you really
only modify the two bits you want to modify. But maybe that will change.
So I'd prefer to either write it as

q->flags &= ~SIGQUEUE_PREALLOC;
q->flags |= SIGQUEUE_CANCELLED;

or to use bitfields, or to do something else to make it safe in the
presence of multiple bits.

Your alternate patch obviously doesn't have that issue, since it just sets
the single bit.

But apart from that issue, I have absolutely no preferences either way.
You're effectively the maintainer in this area, you get to choose.

Linus
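
To illustrate the concern above, here is a minimal sketch of how the plain
assignment differs from the clear-then-set form once a third flag exists.
The flag values, the SIGQUEUE_FUTURE bit and struct sigqueue_sketch are
hypothetical and only stand in for the real definitions.

```c
/* Illustrative values only; SIGQUEUE_FUTURE is an invented third flag. */
#define SIGQUEUE_PREALLOC  0x1
#define SIGQUEUE_CANCELLED 0x2
#define SIGQUEUE_FUTURE    0x4

/* Hypothetical stand-in for struct sigqueue, reduced to its flags word. */
struct sigqueue_sketch {
	int flags;
};

/* Plain assignment: every bit other than CANCELLED is dropped. */
void cancel_by_assignment(struct sigqueue_sketch *q)
{
	q->flags = SIGQUEUE_CANCELLED;
}

/* Clear one bit, set another: unrelated bits keep their values. */
void cancel_by_bit_ops(struct sigqueue_sketch *q)
{
	q->flags &= ~SIGQUEUE_PREALLOC;
	q->flags |= SIGQUEUE_CANCELLED;
}
```

With only the two real flags both functions leave the same result; they
diverge only when some other bit, such as the invented one here, is set.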
--

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Austin Clements <amdragon+kernelbugzilla@...>, Ingo Molnar <mingo@...>, john stultz <johnstul@...>, Michael Kerrisk <mtk.manpages@...>, Roland McGrath <roland@...>, Thomas Gleixner <tglx@...>, <linux-kernel@...>
Date: Sunday, May 18, 2008 - 1:15 pm

Initially, I did

q->flags |= SIGQUEUE_CANCELLED;
spin_lock_irqsave(lock, flags);
q->flags &= ~SIGQUEUE_PREALLOC;

to document the fact that SIGQUEUE_CANCELLED can be set lockless, but
then "optimized" the code, couldn't help myself... Besides, the code
above looks really confusing without the fat comment.

Another reason: we are losing control over "q" here, so I don't think
we can have other flags which should be preserved once we set _CANCELLED
and clear _PREALLOC.

But yes, I agree, this is not good practice. I'd leave the patch as is
at least for now, but I don't mind redoing and resending it. Fortunately,
this falls into the "cleanup" category.

Oleg.

--

To: Oleg Nesterov <oleg@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Austin Clements <amdragon+kernelbugzilla@...>, Ingo Molnar <mingo@...>, john stultz <johnstul@...>, Michael Kerrisk <mtk.manpages@...>, Thomas Gleixner <tglx@...>, <linux-kernel@...>
Date: Tuesday, May 20, 2008 - 10:27 pm

Just make it:

spin_lock_irqsave(lock, flags);
q->flags |= SIGQUEUE_CANCELLED;
q->flags &= ~SIGQUEUE_PREALLOC;

and we needn't wax philosophical about the meaning of locking rules. That
patch would have my ACK, but I concur with Linus about the undesirability
of the plain = version.

Thanks,
Roland
--

To: Oleg Nesterov <oleg@...>
Cc: Andrew Morton <akpm@...>, Austin Clements <amdragon+kernelbugzilla@...>, Ingo Molnar <mingo@...>, john stultz <johnstul@...>, Michael Kerrisk <mtk.manpages@...>, Roland McGrath <roland@...>, Thomas Gleixner <tglx@...>, <linux-kernel@...>
Date: Sunday, May 18, 2008 - 1:24 pm

Oh, and the above is *wrong*.

Why?

Because if SIGQUEUE_PREALLOC setting needs the lock, then setting any
*other* bit in that word will also need the lock!

That's because

q->flags |= SIGQUEUE_CANCELLED;

writes those other bits too - admittedly with the value they were read
just before, but if it races with something setting SIGQUEUE_PREALLOC that
doesn't matter - the newly written version will simply be wrong.

So the rule is that if one bit of a word needs locking, then they *all*
do.

(On alpha, this is true even for whole bytes or shortwords - because a
byte/shortword write is actually a "read word, update byte/short, write
word" sequence on older CPUs. So you cannot do atomic byte updates, and
need to use locks).

Linus
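
The lost update described above can be replayed deterministically. The
sketch below is not real concurrent code: it spells out, in a single thread,
one possible interleaving of two unlocked updaters, "A" doing
`flags |= SIGQUEUE_CANCELLED` and "B" setting SIGQUEUE_PREALLOC. The flag
values and the function name are assumptions for illustration.

```c
/* Illustrative values only. */
#define SIGQUEUE_PREALLOC  0x1
#define SIGQUEUE_CANCELLED 0x2

/*
 * Replays one interleaving of two unlocked read-modify-write updates
 * and returns the final value of the shared word.
 */
int replay_lost_update(void)
{
	int flags = 0;                          /* the shared word */

	int seen_by_a = flags;                  /* A: load half of "|=" */
	flags |= SIGQUEUE_PREALLOC;             /* B: whole update lands in between */
	flags = seen_by_a | SIGQUEUE_CANCELLED; /* A: store half, overwriting B */

	return flags;
}
```

The returned word has SIGQUEUE_CANCELLED set but not SIGQUEUE_PREALLOC: A's
store rewrote the other bit with the stale value it had read earlier.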
--

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Austin Clements <amdragon+kernelbugzilla@...>, Ingo Molnar <mingo@...>, john stultz <johnstul@...>, Michael Kerrisk <mtk.manpages@...>, Roland McGrath <roland@...>, Thomas Gleixner <tglx@...>, <linux-kernel@...>
Date: Sunday, May 18, 2008 - 1:46 pm

Ah. I wasn't clear.

Clearing of SIGQUEUE_PREALLOC needs ->siglock, yes. But not because anybody
else can write to q->flags. Nobody can, we (the timer) "own" this sigqueue.

Once we clear SIGQUEUE_PREALLOC, "q" can be freed by the receiver (it doesn't
write to q->flags, it only reads ->flags). After that we can't trust the
list_empty() check, we just can't dereference this "struct sigqueue *".

Taking ->siglock before "&= ~SIGQUEUE_PREALLOC" ensures that "q" can't be
freed if it is queued, nothing more.

Oleg.
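
The lifetime rule described above can be sketched in user space. This is an
assumption-laden model, not the kernel code: a pthread mutex stands in for
->siglock, a `queued` boolean stands in for the !list_empty() check, and
struct sigqueue_sketch, receiver_dequeue() and owner_release() are
hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define SIGQUEUE_PREALLOC 0x1	/* illustrative value */

/* Hypothetical stand-in for struct sigqueue. */
struct sigqueue_sketch {
	int flags;
	bool queued;	/* models !list_empty(&q->list) */
};

/* Models ->siglock. */
static pthread_mutex_t siglock = PTHREAD_MUTEX_INITIALIZER;

/* Receiver: dequeues under the lock, only reads flags, frees unless PREALLOC. */
bool receiver_dequeue(struct sigqueue_sketch *q)
{
	bool freed = false;

	pthread_mutex_lock(&siglock);
	q->queued = false;
	if (!(q->flags & SIGQUEUE_PREALLOC)) {
		free(q);
		freed = true;
	}
	pthread_mutex_unlock(&siglock);
	return freed;
}

/* Owner (the timer): gives up q. Returns true if it freed q itself. */
bool owner_release(struct sigqueue_sketch *q)
{
	bool queued;

	pthread_mutex_lock(&siglock);
	q->flags &= ~SIGQUEUE_PREALLOC;
	/* Safe to look at q: the receiver cannot free it while we hold the lock. */
	queued = q->queued;
	pthread_mutex_unlock(&siglock);

	if (queued)
		return false;	/* the receiver frees it when it dequeues */
	free(q);
	return true;
}
```

In this model the lock does not protect q->flags from concurrent writers,
since only the owner writes it. It only guarantees that the queued check
happens before the receiver has a chance to free the object.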

--

To: Oleg Nesterov <oleg@...>
Cc: Andrew Morton <akpm@...>, Austin Clements <amdragon+kernelbugzilla@...>, Ingo Molnar <mingo@...>, john stultz <johnstul@...>, Michael Kerrisk <mtk.manpages@...>, Roland McGrath <roland@...>, Thomas Gleixner <tglx@...>, <linux-kernel@...>
Date: Sunday, May 18, 2008 - 1:40 pm

Side note: the alternative, of course, is to just use the atomic bit
operations. They aren't generally much (if at all) faster than locking +
doing the operation + unlocking, but they can avoid lock contention, so
if you do a lot of bit ops that need no other locking than the setting and
clearing (possibly with testing), then they are the right choice.

For signals, we obviously need other locking, so the atomic bit ops are a
waste of time (doing *both* locking for other reasons *and* atomic bitops
is obviously much slower than either).

Linus
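
For comparison, here is what the atomic alternative looks like in portable
user-space C11. This is an analogue, not the kernel interface (the kernel
has its own set_bit()/clear_bit() helpers); the flag values and the
flags_set()/flags_clear() wrappers are assumptions for illustration.

```c
#include <stdatomic.h>

/* Illustrative values only. */
#define SIGQUEUE_PREALLOC  0x1u
#define SIGQUEUE_CANCELLED 0x2u

/* Atomically sets the given bits; returns the previous value of the word. */
unsigned int flags_set(atomic_uint *flags, unsigned int bits)
{
	return atomic_fetch_or(flags, bits);
}

/* Atomically clears the given bits; returns the previous value of the word. */
unsigned int flags_clear(atomic_uint *flags, unsigned int bits)
{
	return atomic_fetch_and(flags, ~bits);
}
```

Each call is a single indivisible read-modify-write, so concurrent updates
to different bits of the same word cannot overwrite each other. As noted
above, this buys nothing when a lock must be held for other reasons anyway.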
--
