The main loop in zap_threads() must skip kthreads which may use the same mm.
Otherwise we "kill" this thread erroneously (for example, it can not fork or
exec after that), and the coredumping task stucks in the TASK_UNINTERRUPTIBLE
state forever because of the wrong ->core_waiters count.Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
--- 26-rc2/fs/exec.c~3_CD_FIX_RACE_USE_MM 2008-05-31 20:05:21.000000000 +0400
+++ 26-rc2/fs/exec.c 2008-06-01 19:04:39.000000000 +0400
@@ -1568,11 +1568,13 @@ static inline int zap_threads(struct tas
for_each_process(g) {
if (g == tsk->group_leader)
continue;
+ if (g->flags & PF_KTHREAD)
+ continue;p = g;
do {
if (p->mm) {
- if (p->mm == mm) {
+ if (unlikely(p->mm == mm)) {
lock_task_sighand(p, &flags);
zap_process(p);
unlock_task_sighand(p, &flags);--
On Sun, 1 Jun 2008 19:30:45 +0400
This is a bugfix, yes?
How does it get triggered?
Do you think the bug is sufficiently serious to fix it in 2.6.26? In
2.6.25.x? If so, it would be better if this patch were not dependent
upon the preceding ones, which do not appear to be 2.6.26 or -stable
material.--
Yes, I think it fixes a bug. The trigger would be an aio request doing
some work (inside aio_kick_handler) simultaneous with some thread in theIt has probably never been seen for real, but might be possible to produce
with an exploit that works hard to hit the race. I'm not sure off hand
what all the bad effects would be, mainly those of SIGKILL'ing the
workqueue thread (keventd I guess). The core-dumping threads will be stuck
in uninterruptible waits and never be killable.Oleg's cleanups make the fix much nicer because there is an easy persistent
flag to check without races. Probably the most isolated fix for this is
something like the bit below (wholly untested). This is hairy enough that
I think Oleg's 1/3 + 2/3 would be preferable even for -stable.Thanks,
Rolanddiff --git a/fs/exec.c b/fs/exec.c
index 9448f1b..0000000 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1545,8 +1545,23 @@ static inline int zap_threads(struct tasp = g;
do {
- if (p->mm) {
- if (p->mm == mm) {
+ struct mm_struct *pmm = p->mm;
+ if (pmm) {
+ /*
+ * We must ignore a kernel thread (aio)
+ * using PF_BORROWED_MM. But we need
+ * task_lock() to avoid races with use_mm()
+ * or unuse_mm().
+ */
+ if (pmm == mm) {
+ task_lock(p);
+ if (p->flags & PF_BORROWED_MM)
+ pmm = NULL;
+ else
+ pmm = p->mm;
+ task_unlock(p);
+ }
+ if (pmm == mm) {
/*
* p->sighand can't disappear, but
* may be changed by de_thread()
--
OK, thanks.
I'll tentatively queue these three for 2.6.26 and will leave 2.6.25.x
alone. The bug seems sufficiently obscure?(This required a bit of massaging of
coredump-zap_threads-must-skip-kernel-threads.patch in fs/exec.c due, I
assume, to dependencies on other things which we have queued for
2.6.27).--
| Tarkan Erimer | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Greg Kroah-Hartman | [PATCH 001/196] Chinese: Add the known_regression URI to the HOWTO |
| Benjamin Herrenschmidt | Re: [PATCH] Remove process freezer from suspend to RAM pathway |
| Bart Van Assche | Re: Integration of SCST in the mainstream Linux kernel |
git: | |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Arjan van de Ven | Re: [GIT]: Networking |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| Natalie Protasevich | [BUG] New Kernel Bugs |
