This change goes as a separate patch for documentation purposes. Suggested by Linus Torvalds. Fixes the problem pointed out by Austin Clements. Currently, when the task execs it could be killed by the fatal signal sent by the posix timer, because exec flushes the signal handlers. See http://bugzilla.kernel.org/show_bug.cgi?id=10460 This is a user visible change. With this patch sys_timer_delete() discards the pending signal which was generated by the timer. This change goes as a separate patch for documentation purposes. We have many options how to set SIGQUEUE_CANCELLED while detroying the timer. We could set this flag in release_posix_timer() before calling sigqueue_free(), or add the new "int cancel" argument to sigqueue_free(), but since sigqueue_free() plays with q->flags anyway and nobody else uses this function, this patch changes sigqueue_free() to set SIGQUEUE_CANCELLED unconditionally. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> --- 25/kernel/signal.c~6_USE_CANCELLED 2008-05-17 17:40:09.000000000 +0400 +++ 25/kernel/signal.c 2008-05-17 18:07:10.000000000 +0400 @@ -1246,7 +1246,7 @@ void sigqueue_free(struct sigqueue *q) * __exit_signal()->flush_sigqueue(). */ spin_lock_irqsave(lock, flags); - q->flags &= ~SIGQUEUE_PREALLOC; + q->flags = SIGQUEUE_CANCELLED; /* clears SIGQUEUE_PREALLOC */ /* * If it is queued it will be freed when dequeued, * like the "regular" sigqueue. --
If this change is undesirable, we can (for example) do --- kernel/posix-timers.c +++ kernel/posix-timers.c @@ -885,6 +885,7 @@ itimer_delete(struct k_itimer *timer) timer->it_process = NULL; unlock_timer(timer, flags); + tmr->sigq->flags |= SIGQUEUE_CANCELLED; release_posix_timer(timer, IT_ID_SET); } instead, and still fix the "BUG 10460". Oleg. --
The only reason I like that better is that it makes me nervous when one re-initializes the whole flags field. So your original 3/3 patch - q->flags &= ~SIGQUEUE_PREALLOC; + q->flags = SIGQUEUE_CANCELLED; /* clears SIGQUEUE_PREALLOC */ just makes me go "Hmm, what about all the other flag bits?" Now, admittedly, there are currently (with your patch) just two SIGQUEUE_xyz bits, so by just doing that single assignment, you really only modify the two bits you want to modify. But maybe that will change. So I'd prefer to either write it as q->flags &= ~SIGQUEUE_PREALLOC; q->flags |= SIGQUEUE_CANCELLED; or to use bitfields, or to do something else to make it safe in the presense of multiple bits. Your alternate patch obviously doesn't have that issue, since it just sets the single bit. But apart from that issue, I have absolutely no preferences either way. You're effectively the maintainer in this area, you get to choose. Linus --
Initially, I did q->flags |= SIGQUEUE_CANCELLED; spin_lock_irqsave(lock, flags); q->flags &= ~SIGQUEUE_PREALLOC; to document the fact that SIGQUEUE_CANCELLED can be set lockless, but then "optimized" the code, couldn't help myself... Besides, the code above looks really confusing without the fat comment. Another reason. We are losing the control over "q" here, I don't think we can have other flags which should be preserved once we set _CANCELLED and cleared _PREALLOC. But yes, I agree, this is not a good practice. I'd leave the patch as is at least for now but I don't mind to redo and resend. Fortunately, this falls to the "cleanup" category. Oleg. --
Oh, and the above is *wrong*. Why? Becayse if SIGQUEUE_PREALLOC setting needs the lock, then setting any *other* bit in that word will also need the lock! That's because q->flags |= SIGQUEUE_CANCELLED; writes those other bits too - admittedly with the value they were read just before, but if it races with something setting SIGQUEUE_PREALLOC that doesn't matter - the newly written version will simply be wrong. So the rule is that if one bit of a word needs locking, then they *all* do. (On alpha, this is true even for whole bytes or shortwords - because a byte/shortword write is actually "read word, update byte/short, write word" sequence on older CPU's. So you cannot do atomic byte updates, and need to use locks). Linus --
Side note: the alternative, of course, is to just use the atomic bit operations. They aren't generally much (if at all) faster than locking + doing the operation + unlocking, but they can avoid lock contention, so if you do a lot of bit ops that need no other locking than the setting and clearing (possibly with testing), then they are the right choice. For signals, we obviously need other locking, so the atomic bit ops are a waste of time (doing *both* locking for other reasons *and* atomic bitops is obviously much slower than either). Linus --
Ah. I wasn't clear. Clearing of SIGQUEUE_PREALLOC needs ->siglock, yes. But not because anybody else can write to q->flags. Nobody can, we (the timer) "own" this sigqueue. Once we clear SIGQUEUE_PREALLOC, "q" can be freed by the receiver (it doesn't writes to q->flags, it only reads ->flags). After that we can't trust the list_empty() check, we just can't dereference this "struct sigqueue *". Taking ->siglock before "&= ~SIGQUEUE_PREALLOC" ensures that "q" can't be be freed if it is queued, nothing more. Oleg. --
Just make it: spin_lock_irqsave(lock, flags); q->flags |= SIGQUEUE_CANCELLED; q->flags &= ~SIGQUEUE_PREALLOC; and we needn't wax philosophical about the meaning of locking rules. That patch would have my ACK, but I concur with Linus about the undesireability of the plain = version. Thanks, Roland --
