Hello Everybody, This is another attempt towards process-freezer based cpu-hotplug. This patchset covers just about everything that was discussed on the LKML with respect to the freezer-based cpu-hotplug. Following are new features from the version I last posted: - Enhancements to the freezer interface so that it can be used for events other than Software suspend. - A reentrant process freezer. Software suspend needs since it uses cpu-hotplug. - All non-singlethreaded workqueues freezeable by default. And yes, this time around, I have some numbers to give an idea of the performance. Test m/c configuration: 4-way i386 Xeon Box with 2.5G Memory. Test case: 'make -jN' running parallely with cpu-offline/online operations in a tight loop. With N=256. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Kernel: linux-2.6.21-rc5-mm3 vanilla --------------------------------------------------------- Maximum time taken for a hotplug operation : 1m5.357s Minimum time taken for a hotplug operation : 0m0.288s Average number of attempts to offline AND Online cpu1, cpu2 and cpu3 : 6 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Kernel: linux-2.6.21-rc5-mm3 + the patchset. --------------------------------------------------------- Maximum time taken for a hotplug operation : 0m1s.261s Minimum time taken for a hotplug operation : 0m0s.808s Average number of attempts to offline and online cpu1, cpu2 and cpu3 : 14 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The numbers look ok. However, on an average it took around 14 attempts to offline AND online cpu1,cpu2, cpu3 when otherwise it should have taken only 6. The failures were due to the freezer failing to freeze all the tasks within the timeout period (20sec). As N increased (512, 1024, 2048 and 8192), even the average number of attempts to online AND offline the 3 cpus increased. At this point of time, I worked towards ensuring that cpu-hotplug works reliably ...
This patch provides an interface to extend the use of the process freezer beyond Suspend. The tasks can selectively mark themselves to be exempted from specific freeze events like SUSPEND /KPROBES/CPU_HOTPLUG. This patch however, *does not* sort non freezable threads into different categories based on the freeze events. Thus all tasks which were previously marked PF_NOFREEZE are now exempted from freezer using freezer_exempt(FE_ALL); which means exempt from all kinds of freezes. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> -- arch/i386/kernel/apm.c | 2 - drivers/block/loop.c | 2 - drivers/char/apm-emulation.c | 6 ++-- drivers/ieee1394/ieee1394_core.c | 2 - drivers/md/md.c | 2 - drivers/mmc/card/queue.c | 3 +- drivers/mtd/mtd_blkdevs.c | 3 +- drivers/scsi/libsas/sas_scsi_host.c | 2 - drivers/scsi/scsi_error.c | 2 - drivers/usb/storage/usb.c | 2 - include/linux/freezer.h | 52 ++++++++++++++++++++++++++++++++---- include/linux/sched.h | 15 +++++++++- init/do_mounts_initrd.c | 2 - kernel/fork.c | 2 - kernel/kprobes.c | 4 +- kernel/power/disk.c | 4 +- kernel/power/main.c | 6 ++-- kernel/power/process.c | 29 +++++++++++--------- kernel/power/user.c | 8 ++--- kernel/rcutorture.c | 4 +- kernel/sched.c | 2 - kernel/softirq.c | 2 - kernel/softlockup.c | 2 - kernel/workqueue.c | 2 - 24 files changed, 110 insertions(+), 50 deletions(-) Index: ...
Hmmm, I do not really like softlockup watchdog running during suspend. Can we make this freezeable and make watchdog shut itself off while Same here. Does this mean we have userland /linuxrc running with PF_NOFREEZE? Eh? Why does kprobes code depend on config_pm? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Why do you think so? I think the freezer should be compiled automatically Generally, I agree, but this patch only replaces the existing instances of PF_NOFREEZE with the new mechanism. The changes you're talking about require a separate patch series (or at least one separate patch), I think, and No, actually it is _required_ for the userland resume to work. Well, perhaps I should place a comment in there so that I don't have to explain this again Because it uses the freezer? ;-) Greetings, Rafael -
Is that why?! Then I guess we can remove it. Because the freezer is thanks gautham. -- Gautham R Shenoy Linux Technology Center IBM India. "Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless!" -
(Actually it would be nice to say /* This thread should not be frozen for suspend, becuase it is needed Kconfig can do that. ("select statement"). If we have one such ifdef, Can you put big bold comment there? That is no longer true after this patch... Ugly ifdef above makes sure freezer is there for kprobes. I'm trying to say that #if above is now broken. Actually it was probably always broken, but it just became more so. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
No, it doesn't, but this task spawns linuxrc and then calls sys_wait4() in a loop. Well, actually, I'll try to plant try_to_freeze() in this loop and see if that Agreed. Greetings, Rafael -
Hi.
It works. I've had:
while (pid != sys_wait4(-1, NULL, 0, NULL)) {
yield();
try_to_freeze();
}
there for ages for Suspend2.
Regards,
Nigel
-
OK, thanks. Is there any particular reason to place try_to_freeze() after yield()? Greetings, Rafael -
Hi. Not that I remember. I haven't touched that for years :) Nigel -
I have already removed it from in my version 3. Thanks and Regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. "Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless!" -
So, a kernel_thread should call freezer_exempt(FE_XXX) somewhere at the beginning if it doesn't want to be considered as freezeable. But what if the freezing is already in progress? In that case freezer_exempt() should somehow clear TIF_FREEZE (if exempt_freeze_event doesn't match freeze_event parameter of freeze_processes()), otherwise we may hit a nasty bug, much worse than a freezing failure (which could be restarted). try_to_freeze_tasks() succeeds because the task is !freezeable(), the task goes to refrigerator (while it should not), thaw_tasks() ignores process and it stays frozen. Alternatively, we can do a re-check in refrigerator() to fix this race. This is a real nitpick, but it was hard to me to understand this change. Because it looks as if we have a subtle reason to set TASK_INTERRUPTIBLE before freezer_exempt(). Unless I missed something, I'd suggest to move freezer_exempt() up, before set_current_state(). The same for apm_mainloop(). Oleg. -
That's a nice catch! The problem still exists in the current -mm with In the patch 2/8, I am maintaining a global system_freeze_event_mask. The reads and writes to this mask are serialized by freezer_mutex. But the mutex is held only in the freeze_processes() and thaw_processes() path. So the check in refrigerator() which is not in the freeze_processes()/ thaw_processes() path might need some more thought. Will cook up Ok, no subtle reasons for freezer_exempt()ing after set_current_state(). So no problems changing the order. But (just curious), is there any specific Thanks for the review. Regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. "Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless!" -
No, no, it was just a nitpick :) May be this is just me, but when I see the code like set_current_state(TASK_XXX); something_which_doesnt_need_TASK_XXX(); , I can't read the code further, trying to understand where I was wrong and why do we need to change task->state here. Oleg. -
This patch adds provision to make the process freezer reentrant
for different kinds of freeze events.
Credit to Rafael Wysocki for the system_freeze_event_mask idea.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
--
kernel/power/process.c | 57 ++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 50 insertions(+), 7 deletions(-)
Index: linux-2.6.21-rc3/kernel/power/process.c
===================================================================
--- linux-2.6.21-rc3.orig/kernel/power/process.c
+++ linux-2.6.21-rc3/kernel/power/process.c
@@ -23,6 +23,13 @@
#define FREEZER_KERNEL_THREADS 0
#define FREEZER_USER_SPACE 1
+/*
+ * Global mask of the events (think suspend, cpu-hotplug, etc) for
+ * which the services of the processes freezer have been
+ * currently employed.
+ */
+static unsigned long system_freeze_event_mask;
+static DEFINE_MUTEX(freezer_mutex);
static inline int freezeable(struct task_struct * p, unsigned long freeze_event)
{
@@ -33,6 +40,25 @@ static inline int freezeable(struct task
return 1;
}
+/* Returns the mask of the events for which p is currently frozen */
+static inline int process_frozen_event_mask(struct task_struct *p)
+{
+ return freezeable_event_mask(p) & system_freeze_event_mask;
+}
+
+static inline int thawable(struct task_struct *p, unsigned long freeze_event)
+{
+ if (!freezeable(p, freeze_event))
+ return 0;
+
+ /* p can be thawed iff the process p has been
+ * frozen for freeze_event alone.
+ */
+ if (process_frozen_event_mask(p) & ~freeze_event)
+ return 0;
+
+ return 1;
+}
/* Refrigerator is place where frozen processes are stored :-). */
void refrigerator(void)
{
@@ -183,20 +209,30 @@ static unsigned int try_to_freeze_tasks(
int freeze_processes(unsigned long freeze_event)
{
unsigned int nr_unfrozen;
-
+ int ret = 0;
+ mutex_lock(&freezer_mutex);
+ if (system_freeze_event_mask & freeze_event)
+ goto ...I am not sure this is correct. Suppose that system_freeze_event_mask == FE_SUSPEND, now freeze_processes(FE_ALL) returns success. Shouldn't we return an error? Oleg. -
Hmm, may be we are getting confused with the word 'reentrant' here. The idea behind this API was to freeze the system for one event only at any given time. However, we can have nested freeze_processes() call for different events. Something like freeze_processes(FE_SUSPEND); /* Do something */ freeze_processes(FE_HOTPLUG_CPU); /* hotplug cpus */ thaw_processes(FE_HOTPLUG_CPU); /* Do something more */ thaw_processes(FE_SUSPEND); So ideally no one is supposed to make a call like freeze_processes(FE_SUSPEND | FE_HOTPLUG_CPU); OR freeze_processes(FE_ALL); Thanks gautham. -- Gautham R Shenoy Linux Technology Center IBM India. "Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless!" -
This patch implements process_freezer based cpu-hotplug core. The sailent features are: o No more (un)lock_cpu_hotplug. o No more CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE. Hence no per-subsystem hotcpu mutexes. o Calls freeze_process/thaw_processes at the beginning/end of the hotplug operation. Lessons Learnt: o CPU_UP_PREPARE has to be called with the processes frozen. If implemented otherwise, the kernel threads which we create during UP_PREPARE will never be frozen during freeze_processes since we them up only during CPU_ONLIN. Points to ponder: o Can the SYSTEM_RUNNING hack in _cpu_up be avoided by some cleaner means. Signed-off-by : Srivatsa Vaddagiri <vatsa@in.ibm.com> Signed-off-by : Gautham R Shenoy <ego@in.ibm.com> Signed-off-by : Rafael J. Wysocki <rjw@sisk.pl> -- include/linux/freezer.h | 10 ++++-- include/linux/notifier.h | 2 - include/linux/sched.h | 5 ++- kernel/cpu.c | 70 ++++++++++++++++++----------------------------- kernel/sched.c | 20 +------------ kernel/softirq.c | 13 ++++---- kernel/softlockup.c | 1 kernel/workqueue.c | 33 ++++++++++------------ mm/slab.c | 6 ---- 9 files changed, 63 insertions(+), 97 deletions(-) Index: linux-2.6.21-rc5/include/linux/sched.h =================================================================== --- linux-2.6.21-rc5.orig/include/linux/sched.h +++ linux-2.6.21-rc5/include/linux/sched.h @@ -1195,10 +1195,13 @@ static inline void put_task_struct(struc #define PF_FE_KPROBES 0x00000010 /* This thread should not be frozen * for Kprobes */ +#define PF_FE_HOTPLUG_CPU 0x00000020 /* Thread should not be frozen for + * cpu hotplug. + */ #define PF_FROZEN 0x00010000 /* frozen for system suspend */ -#define PF_FE_ALL (PF_FE_SUSPEND | PF_FE_KPROBES) +#define PF_FE_ALL (PF_FE_SUSPEND | PF_FE_KPROBES | PF_FE_HOTPLUG_CPU) /* Exempt from all kinds freeze chills*/ #define ...
Off-topic. This is a common pattern. Perhaps it makes sense to Why this change? We are doing kthread_stop() + set_cpus_allowed() on return. Imho, if (IS_ERR(p)) goto out_allowed; goto out_thread; As it was already discussed, this is racy. As Srivatsa (imho rightly) suggested, kthread_stop(p) should thaw process itself. This also allows us to kill at least some of wait_for_die loops. However, the change in kthread_stop(p) in not enough to close the race. We can check kthread_should_stop() in refrigerator(), this looks like a most simple approach for now. Alternatively, Srivatsa suggests to introduce a new task_lock() protected task_struct->freezer_state (so we can reliably set FE_ALL). Surely this is more poweful, but needs more changes. I am not sure. Perhaps we can do this later. In any case, imho "try3" should add thaw_process() to kthread_stop(). Gautham, Srivatsa, do you agree? Oleg. -
Yes, that looks feasible and nice. But I remember making this change for Well, in this case this is not racy. Remember, we're doing a thaw_process(p) in CPU_DEAD where p *is* frozen for cpu hotplug. So the where we might call a freeze_process(p) after we do a thaw_process doesn't seem to be feasible. Why the check kthread_should_stop() refrigerator() ? As vatsa mentioned, we would be doing task_lock(p); freezer_exempt(p, FE_ALL); /* Doesn't exist as of now, but we can work it out */ thaw_process(p); task_unlock(p); wait_for_completion(); This needs an extra field! We're supposed to be miserly when it comes to -- Gautham R Shenoy Linux Technology Center IBM India. "Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless!" -
Please look at http://marc.info/?l=linux-kernel&m=117562018530190, we can't Great! Oleg. -
If I'm understanding correctly, this will cause # echo 0 > /sys/devices/system/cpu/cpuX/online to sometimes fail, and userspace is expected to try again? This will break existing applications. Perhaps drivers/base/cpu.c:store_online should retry as long as cpu_up/down return -EBUSY. That would avoid a userspace-visible interface change. -
yeah. I'd even suggest a freeze_processes_nofail() API instead, that does this internally, without burdening the callsites. (and once the freezer becomes complete then freeze_processes_nofail() == freeze_processes()) Ingo -
Yeah, I just realized that an implementation of my proposal would busy loop in the kernel forever if a silly admin tried to offline the last cpu (we're already using -EBUSY for that case), so freeze_processes_nofail is a better idea :-) -
Hi. If there's only one online cpu, shouldn't it return -EINVAL? Regards, Nigel -
Not sure if we _can_ do freeze_processes_nofail(). If something is wrong (process in D state forever because of driver bug?), it looks better to return error to userspace than looping forever. You may want to pass higher timeout than 20sec. But if you can't freeze everything in 1hour, it is unlikely to ever succeed. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
This patch rips out lock_cpu_hotplug from the kernel.
Good Riddance!! (hopefully :) )
Signed-off-by : Gautham R Shenoy <ego@in.ibm.com>
--
arch/i386/kernel/cpu/mtrr/main.c | 6 ------
arch/i386/kernel/microcode.c | 8 --------
arch/mips/kernel/mips-mt.c | 5 -----
arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 -----
arch/powerpc/platforms/pseries/rtasd.c | 4 ----
include/linux/cpu.h | 20 --------------------
kernel/cpu.c | 17 -----------------
kernel/rcutorture.c | 3 ---
kernel/stop_machine.c | 3 ---
net/core/flow.c | 2 --
10 files changed, 73 deletions(-)
Index: linux-2.6.21-rc4/include/linux/cpu.h
===================================================================
--- linux-2.6.21-rc4.orig/include/linux/cpu.h
+++ linux-2.6.21-rc4/include/linux/cpu.h
@@ -89,18 +89,6 @@ extern struct sysdev_class cpu_sysdev_cl
#ifdef CONFIG_HOTPLUG_CPU
/* Stop CPUs going up and down. */
-static inline void cpuhotplug_mutex_lock(struct mutex *cpu_hp_mutex)
-{
- mutex_lock(cpu_hp_mutex);
-}
-
-static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex)
-{
- mutex_unlock(cpu_hp_mutex);
-}
-
-extern void lock_cpu_hotplug(void);
-extern void unlock_cpu_hotplug(void);
#define hotcpu_notifier(fn, pri) { \
static struct notifier_block fn##_nb = \
{ .notifier_call = fn, .priority = pri }; \
@@ -113,14 +101,6 @@ int cpu_down(unsigned int cpu);
#else /* CONFIG_HOTPLUG_CPU */
-static inline void cpuhotplug_mutex_lock(struct mutex *cpu_hp_mutex)
-{ }
-static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex)
-{ }
-
-#define lock_cpu_hotplug() do { } while (0)
-#define unlock_cpu_hotplug() do { } while (0)
-#define lock_cpu_hotplug_interruptible() 0
#define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0)
#define ...Currently i386 and x86_64 __cpu_up uses the services of the kevents
workqueue to bring the cpu up. Change this and use kthread workqueue
instead which is single_threaded and won't be frozen during
CPU_HOTPLUG.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
--
arch/i386/kernel/smpboot.c | 3 ++-
arch/x86_64/kernel/smpboot.c | 5 +++--
include/linux/kthread.h | 5 +++--
kernel/kthread.c | 16 +++++++++++++---
4 files changed, 21 insertions(+), 8 deletions(-)
Index: linux-2.6.21-rc5/include/linux/kthread.h
===================================================================
--- linux-2.6.21-rc5.orig/include/linux/kthread.h
+++ linux-2.6.21-rc5/include/linux/kthread.h
@@ -3,7 +3,7 @@
/* Simple interface for creating and stopping kernel threads without mess. */
#include <linux/err.h>
#include <linux/sched.h>
-
+#include <linux/workqueue.h>
struct task_struct *kthread_create(int (*threadfn)(void *data),
void *data,
const char namefmt[], ...);
@@ -29,5 +29,6 @@ struct task_struct *kthread_create(int (
void kthread_bind(struct task_struct *k, unsigned int cpu);
int kthread_stop(struct task_struct *k);
int kthread_should_stop(void);
-
+extern void kthreadwq_queue_work(struct work_struct *w);
+extern int kthreadwq_up(void);
#endif /* _LINUX_KTHREAD_H */
Index: linux-2.6.21-rc5/kernel/kthread.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/kthread.c
+++ linux-2.6.21-rc5/kernel/kthread.c
@@ -112,7 +112,7 @@ static int kthread(void *_create)
return 0;
}
-/* We are keventd: create a thread. */
+/* We are keventd: create a thread. Hmm, Are we?? */
static void keventd_create_kthread(struct work_struct *work)
{
struct kthread_create_info *create =
@@ -132,6 +132,16 @@ static void keventd_create_kthread(struc
complete(&create->done);
}
+void kthreadwq_queue_work(struct work_struct ...Off-topic question: can't kernel/kmod.c use this workqueue too instead of its own khelper_wq? Oleg. -
This patch
o Makes all non-singlethreaded workqueues freezeable by default.
o Introduces a new API for creating freeze_exempted workqueues.
o Uses the combination of cancel_delayed_work and cancel_work_sync
in Slab during DOWN_PREPARE instead of cancel_rearming_delayed work,
which tries to flush_workqueue in vain, in a frozen context.
Note:
o The singlethreaded workqueues will not be frozen.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Johannes Berg <johannes@sipsolutions.net>
--
include/linux/workqueue.h | 12 +++++-------
kernel/workqueue.c | 35 ++++++++++++++++++++++++++---------
mm/slab.c | 10 +++-------
3 files changed, 34 insertions(+), 23 deletions(-)
Index: linux-2.6.21-rc5/include/linux/workqueue.h
===================================================================
--- linux-2.6.21-rc5.orig/include/linux/workqueue.h
+++ linux-2.6.21-rc5/include/linux/workqueue.h
@@ -9,7 +9,6 @@
#include <linux/linkage.h>
#include <linux/bitops.h>
#include <asm/atomic.h>
-
struct workqueue_struct;
struct work_struct;
@@ -112,12 +111,11 @@ struct execute_work {
clear_bit(WORK_STRUCT_PENDING, work_data_bits(work))
-extern struct workqueue_struct *__create_workqueue(const char *name,
- int singlethread,
- int freezeable);
-#define create_workqueue(name) __create_workqueue((name), 0, 0)
-#define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1)
-#define create_singlethread_workqueue(name) __create_workqueue((name), 1, 0)
+extern struct workqueue_struct *create_workqueue(const char *name);
+
+extern struct workqueue_struct *create_freeze_exempted_workqueue(const char *name, int freeze_exempt_events);
+
+extern struct workqueue_struct *create_singlethread_workqueue(const char *name);
extern void destroy_workqueue(struct workqueue_struct *wq);
Index: ...EXPORT_SYMBOL() has a cost, and these functions are simple wrappers. How about static inline? Oleg. -
On Thu, 5 Apr 2007 15:57:14 +0400 something. -
Clean up workqueue.c from the perspective of freezer-based cpu-hotplug.
This patch
o Removes cpu_populated_map as cpu_online_map is safe to use.
o removes cwq_should_stop and uses kthread_should_stop instead.
o Reintroduces take_over_work from the worker_thread of a downed
cpu. This means that all non-singlethreaded workqueues *have* to
be frozen to avoid any races.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
--
kernel/workqueue.c | 118 ++++++++++++++++++++++-------------------------------
1 files changed, 50 insertions(+), 68 deletions(-)
Index: linux-2.6.21-rc4/kernel/workqueue.c
===================================================================
--- linux-2.6.21-rc4.orig/kernel/workqueue.c
+++ linux-2.6.21-rc4/kernel/workqueue.c
@@ -47,10 +47,6 @@ struct cpu_workqueue_struct {
struct workqueue_struct *wq;
struct task_struct *thread;
- int status;
- #define CWQ_RUNNING 0
- #define CWQ_SHOULD_STOP 1
- #define CWQ_STOPPED 2
int run_depth; /* Detect run_workqueue() recursion depth */
} ____cacheline_aligned;
@@ -75,7 +71,6 @@ static LIST_HEAD(workqueues);
static int singlethread_cpu __read_mostly;
static cpumask_t cpu_singlethread_map __read_mostly;
/* optimization, we could use cpu_possible_map */
-static cpumask_t cpu_populated_map __read_mostly;
/* If it's single threaded, it isn't in the list of workqueues. */
static inline int is_single_threaded(struct workqueue_struct *wq)
@@ -86,7 +81,7 @@ static inline int is_single_threaded(str
static const cpumask_t *wq_cpu_map(struct workqueue_struct *wq)
{
return is_single_threaded(wq)
- ? &cpu_singlethread_map : &cpu_populated_map;
+ ? &cpu_singlethread_map : &cpu_online_map;
}
static
@@ -270,32 +265,16 @@ static void run_workqueue(struct cpu_wor
spin_unlock_irq(&cwq->lock);
}
-/*
- * NOTE: the caller must not touch *cwq if this func returns true
- */
-static int ...We can't freeze workqueue if some work_struct continuously re-queues itself.
Perhaps this is not a problem (I don't see a good reason for such a behaviour),
I still think that wait_to_die + bind_cpu is unneeded complication.
Why can't we do the following:
static int worker_thread(void *__cwq)
{
...
for (;;) {
try_to_freeze();
prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
if (!kthread_should_stop() && list_empty(&cwq->worklist))
schedule();
finish_wait(&cwq->more_work, &wait);
if (kthread_should_stop(cwq))
break;
run_workqueue(cwq);
}
return 0;
}
Well, this is a matter of taste, but I don't like this change. Now we
should add kthread_bind/wake_up_process calls to __create_workqueue()
This is wrong. CPU_UP_PREPARE doesn't call init_cpu_workqueue().
Easy to fix, but I personally think is is better to initialize
Deadlockable. Suppose that the freezing is in progress, cwq->thread is not
frozen yet. cleanup_workqueue_thread() calls thaw_process(cwq->thread),
This means that the work_struct on single_threaded wq can't use any of
__create_workqueue()
destroy_workqueue()
flush_workqueue()
cancel_work_sync()
, they are all racy wrt workqueue_cpu_callback(), and we don't freeze
single_threaded workqueues. This is bad.
Probaly we should:
- freeze all workqueues, even the single_threaded ones.
- helper_init() explicitely does __create_workqueue(FE_ALL).
this means that we should never use the functions above
with this workqueue.
Oleg.
-
cleanup_workqueue_thread (in Gautham's patches) does this:
thaw_process()
kthread_stop()
There is a chance that after thaw_process() [but before we have posted
the kthread_stop], worker thread can come out of the refrigerator and start
running run_workqueue() - that will simply prolong the subsequent
kthread_stop() and the system freeze time.
We could do what you are suggesting if the thaw_process() part was
integrated into kthread_stop() code [basically thaw_process after
Good catch! Can cleanup_workqueue_thread take some mutex to serialize
with freezer here (say freezer_mutex)?
Or better, since this seems to be a general problem for anyone who wants to do a
kthread_stop, how abt modifying kthread_stop like below:
kthread_stop(p)
{
int old_exempt_flags;
task_lock(p);
old_exempt_flags = p->flags;
p->flags |= PFE_ALL; /* Exempt 'p' from being frozen? */
task_unlock(p);
kthread_stop_info.k = p;
thaw_process(p);
wait_for_completion();
}
Marking 'p' as exempt shouldn't be a problem because freezer would wait
on the thread doing kthread_stop() anyway before declaring system as
The workqueue_mutex() should serialize these with workqueue_cpu_callback() to
Yes I agree, we should target freezing everybody here. It feels much
safer that way!
The only dependency I have seen is stop_machine() called after processes
are frozen. It needs the services of a workqueue to create kthreads. We
need to atleast exempt that worker thread from being frozen. Hopefully
the list of such non-freezable singlethreaded workqueues will be tiny
Ok you seem to have covered what I went out to say above!
Thx for your review so far ..
--
Regards,
vatsa
-
I think it would be nice to do. I believe we can cleanup ksoftirqd() and migration_thread() as well (kill wait_to_die: loop). Probably it is better to introduce a new helper for that, kthread_thaw_stop() or What if is_single_threaded(wq) == true? In that case we should call flush_cpu_workqueue(cpu) only if cpu == singlethread_cpu, otherwise this is unneeded and wrong, because per_cpu_ptr(wq->cpu_wq, cpu) was I agree, we should mark this thread as non-freezable, but we can't modify p->flags, this is racy. "current" owns its ->flags and it is not atomic. Note that thaw_process() checks frozen(p) when it clears PF_FROZEN. Actually, we should do this before destroy_workqueue() calls flush_workqueue(). Otherwise flush_cpu_workqueue() can hang forever in a similar manner. No, no, workqueue_mutex can't help. Just for example: CPU_UP_PREPARE completes and drops workqueue_mutex. __create_workqueue(wq) doesn't see the new cpu, it Good! Oleg. -
I doubt whether we can kill it in migration_thread, since that is another thread which is unfrozen for hotplug (stop_machine relies on its I suspected that we cannot modify p->flags just like that. How abt moving freezer exemption bits to a separate field, which is protected by Yep. I guess these are a class of freezer deadlocks very similar to vfork parent waiting on child case. I get a feeling these should become common outside of kthread too (A waits on B for something, B gets frozen, which means A won't freeze causing freezer to fail). Can freezer detect this Ok ..sure. -- Regards, vatsa -
I changed my mind :) The problem is general, I am starting to believe
Probably yes... In that case it makes sense to move PF_FREEZER_SKIP/PF_FROZEN
to the new field as well.
Perhaps we can ignore this problem for now. Freezer is not 100% reliable
anyway. For example,
worker_thread:
for (;;) {
try_to_freeze();
prepare_to_wait();
if (...)
schedule();
finish_wait();
}
This is racy, we can miss freeze_process()->signal_wake_up() if it happens
between try_to_freeze() and prepare_to_wait(). We have to check TIF_FREEZE
before entering schedule() if we want to fix this race.
Should we? I don't know. This will uglify the code, and the probability
of this race is very low.
Oleg.
-
yes i agree. Although is some cases like destroy_workqueue, we need to mark the target thread non-freezable way before we call kthread_stop (as I wonder if there are some reserved fields in task_struct which we can Would be nice to fix IMO. Atleast serves to show "how to make your code freezer friendly". -- Regards, vatsa -
This is funny. I "noticed" this race a long ago, when the ->freezeable flag was introduced. However, looking at 2.6.20 I see that the patch was correct, and this race was in fact introduced by me in [PATCH 1/1] workqueue: don't migrate pending works from the dead CPU http://marc.info/?l=linux-kernel&m=117062192709871 I'll send a fix on weekend. Oleg. -
I wonder if there is some value in "enforcing" an order in which processes get frozen i.e freeze A first before B. That may solve the deadlocks we have been discussing wrt kthread_stop and flush_workqueue as well. The idea is similar to how deadlock wrt multiple locks are solved - where a ordering is enforced. Take Lock A first before Lock B. If process A waits on B (like in kthread_stop or flush_workqueue), then if we: 1. Insert A and B in a list (freeze_me_first_list) 2. Have freezer scan freeze_me_first_list before the master task-list, so that it: 2a. "freezes A and waits for A to get frozen" first 2b. "freezes B and waits for B to get frozen" next then we would avoid the nastiness of "B getting frozen first and A doesnt freeze because of that" with lesser code changes? -- Regards, vatsa -
In that case, A should insert the dependency into the freeze_me_first_list as B is unaware of the dependency yet. What if by the time A has inserted the dependency B is already frozen? Can very well happen right? A: B: freezer ------------------------------------------------------------------------------- *Check the list. don't find A/B *Mark A freezeable. *Mark B freezeable. *try_to_freeze(); *insert A, B . into the . list. . . . * wait_for_ . completion(done); . /* Freezer fails at this . * point . */ . . *complete(done); *try_to_freeze(); Example: A = a thread doing flush_workqueue. B = worker thread. Of course, we can always use the freezer_skip around this wait_for_completion as well as long as the thread A is not marked PF_NOFREEZE. But with multiple freeze events, it won't be as simple as Thanks and Regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. "Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless!" -
Perhaps we can add "atomic_t xxx" to task_struct.
int freezing(struct task_struct *p)
{
return test_tsk_thread_flag(p, TIF_FREEZE)
&& atomic_read(&p->xxx) == 0;
}
void xxx_start(struct task_struct *p)
{
atomic_inc(p->xxx);
thaw_process(p);
}
xxx_end(struct task_struct *p)
{
atomic_dec(p->xxx);
}
Now,
xxx_start(p);
... wait for something which depends on p...
xxx_end(p);
Of course we need other changes, freeze_process() should check ->xxx, etc.
I am not sure this makes sense.
Oleg.
-
I think this is racy.
Say, if the task which does xxx_start(p) [let's call it task q] is not freezeable, and task p is
already frozen when q calls xxx_start, then we might be in a situation
where
- Freezer has declared the whole system to be frozen. Hence the thread
issuing the 'freeze'(suspend/hotplug) can go ahead and do whatever it wants to.
- Task 'p' which was supposed to be frozen , is now running and
*surprise* We have a thread running on a cpu which ain't there any more!
I feel we need some kind of a global rwlock.
DEFINE_RWLOCK(freezer_status_lock);
int xxx_start(struct task_struct *p)
{
int ret = 0;
ret = read_trylock(&freezer_status_lock);
if(ret)
/* We've succeeded. So lets thaw the chap */
thaw_process(p);
/* If we've failed to acquire trylock, that means freezer doesn't
* depend on us.
* So lets simply wait without thawing p
*/
return ret;
}
void xxx_end(int state)
{
if(state)
read_unlock(&freezer_status_lock);
}
int try_to_freeze_tasks()
{
do {
/* The regular freezer code */
if (!todo && !write_trylock(&freezer_status_lock));
continue;
/* When the freezer goes back, it will find task 'p'
* woken up and hence wait for it to get frozen
*/
}while(todo);
}
void try_to_thaw_tasks()
{
.
.
.
write_unlock(&freezer_status_lock);
}
int state = xxx_start(p);
... wait for something which depends on p...
xxx_end(state);
However, now that I look at it again, I see that it will fail in our case
where we might need to nest the try_to_freeze_tasks call.
Hmm, we don't have a rwlock variant that allows multiple writers, now do
Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
-
This patch makes all the kernel_threads (except the migration thread)
freezeable for cpu hotplug.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
--
arch/i386/kernel/apm.c | 2 +-
drivers/block/loop.c | 2 +-
drivers/char/apm-emulation.c | 6 +++---
drivers/ieee1394/ieee1394_core.c | 2 +-
drivers/md/md.c | 2 +-
drivers/mmc/card/queue.c | 2 +-
drivers/mtd/mtd_blkdevs.c | 2 +-
drivers/scsi/libsas/sas_scsi_host.c | 2 +-
drivers/scsi/scsi_error.c | 2 +-
drivers/usb/storage/usb.c | 2 +-
10 files changed, 12 insertions(+), 12 deletions(-)
Index: linux-2.6.21-rc5/arch/i386/kernel/apm.c
===================================================================
--- linux-2.6.21-rc5.orig/arch/i386/kernel/apm.c
+++ linux-2.6.21-rc5/arch/i386/kernel/apm.c
@@ -1395,7 +1395,7 @@ static void apm_mainloop(void)
add_wait_queue(&apm_waitqueue, &wait);
set_current_state(TASK_INTERRUPTIBLE);
- freezer_exempt(FE_ALL);
+ freezer_exempt(FE_SUSPEND_KPROBES);
for (;;) {
try_to_freeze();
schedule_timeout(APM_CHECK_TIMEOUT);
Index: linux-2.6.21-rc5/drivers/block/loop.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/block/loop.c
+++ linux-2.6.21-rc5/drivers/block/loop.c
@@ -587,7 +587,7 @@ static int loop_thread(void *data)
* hence, it mustn't be stopped at all
* because it could be indirectly used during suspension
*/
- freezer_exempt(FE_ALL);
+ freezer_exempt(FE_SUSPEND_KPROBES);
set_user_nice(current, -20);
Index: linux-2.6.21-rc5/drivers/char/apm-emulation.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/char/apm-emulation.c
+++ linux-2.6.21-rc5/drivers/char/apm-emulation.c
@@ -336,7 +336,7 @@ apm_ioctl(struct inode * inode, struct f
* threads.
*/
flags = ...hm, shouldnt the make be frozen immediately? doesnt the 'please freeze ASAP' flag get propagated to all tasks, immediately? After that point any cloning activity should duplicate that we could definitely do that - but i think it should be unnecessary: if we mark all tasks as PF_FREEZING atomically, that should result in _every_ task immediately dropping dead (once they get back from TASK_UNINTERRUPTIBLE). No excuses. If there's some longer delay then that can only be explained by some new cloned task/thread slipping through the net somehow. (i.e. the PF_FREEZING flag not being duplicated across fork?) i'm wondering about how TASK_UNINTERRUPTIBLE tasks are handled by the freezer: are they assumed frozen immediately, or do we wait until they notice their PF_FREEZING and go into try_to_freeze()? I'd expect TASK_UNINTERRUPTIBLE to be the largest source of latency. (and hence be the primary source for freezing 'failures') Ingo -
afaics, setting the 'please freeze asap' flag is racy wrt dup_task_struct (where the child's tsk->thread_info->flags are copied from its parent?). Secondly, from what I understand, it takes a 'flag to be set + signal marked pending' for the child task to be frozen. If that is the case, then copy_process may not propogae the signal to the child, which could mean mean that we can be in a catch-up game in freeze_processes, trying to freeze processes we didnt see in earlier passes. I think copy_process() can check for something like this: write_lock_irq(&tasklist_lock); ... if (freezing(current)) freeze_process(p); /* function exported by freezer */ ... write_unlock_irq(&tasklist_lock); -- Regards, vatsa -
yeah. (is that safe with tasklist_lock held?) i'm wondering whether we could do even better than the signal approach. I _think_ the best approach would be to only wait for tasks that are _on the runqueue_. I.e. any task that has scheduled away with TASK_UNINTERRUPTIBLE (and might not be able to process signal events for a long time) is still freezable because it scheduled away. the only freeze-unsafe task is one that is on the runqueue, executing some unknown kernel code. But the number of those is typically pretty low, even with very large make -j task-counts. now, the current approach approximates that set of tasks, but not completely: in particular TASK_UNINTERRUPTIBLE sleeping threads can introduce arbitrary long delays (and hence freezing failures). [in addition to any fork-related 'leaks' of freeze-notification] Ingo -
I am slightly uncomfortable with "not waiting for tasks inside the kernel to get out" part, even if it that is done only for TASK_UNINTERRUPTIBLE tasks. For ex: consider this: flush_workqueue() <- One of biggest offenders of lock_cpu_hotplug() to date for_each_online_cpu(cpu) flush_cpu_workqueue TASK_UNINTERRUPTIBLE sleep If we don't wait for this thread from being frozen "voluntarily" (because it is in TASK_UNINTERRUPTIBLE sleep), then flush_workqueue is clearly racy wrt cpu hotplug. I would imagine other situations like this are possible where "not waiting for everyone to /voluntarily/ quiece" can break cpu hotplug. In fact, the biggest reason why we are moving to freezer based hotplug is the fact that it quiesces everyone, leading to (hopefully) zero race conditions. -- Regards, vatsa -
I quickly coded this up and ran my tests again. Unfortunately, the results are negative. I printk'd the state of the unfrozen task and this is how the serial console output looks like: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stopping user space processes timed out after 20 seconds (7 tasks refusing to freeze) make:TASK_UNINTERRUPTIBLE make:TASK_UNINTERRUPTIBLE make:TASK_UNINTERRUPTIBLE make:TASK_UNINTERRUPTIBLE make:TASK_UNINTERRUPTIBLE make:TASK_UNINTERRUPTIBLE make:TASK_UNINTERRUPTIBLE Restarting tasks ... done. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ I couldn't see any reduction in the number of freeze-failures. Thus Ingo right. These failures are due to the TASK_UNINTERRUPTIBLE tasks which fail to freeze within the The other option would be to have another state equivalent to TASK_UNINTERRUPTIBLE_NOFREEZE, so that Ingo's solution can be applied for regular TASK_UNINTERRUPTIBLE tasks. Thus TASK_UNINTERRUPTBLE tasks from a for_each_online_cpu context should now do a TASK_UNINTERRUPTIBLE_NOFREEZE instead. Thanks and Regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. "Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless!" -
ok. But the only real problem would be for_each_online_cpu() loops that
might sleep, correct? I did a quick audit and those seem to be in the
minority by a factor of 1:10.
So ... to make the audit obviously safe, how about mechanically
converting 100% of the for_each_online_cpu() loops to something like:
mask = get_each_online_cpu_mask();
for_each_cpu_mask(mask) {
...
}
put_each_online_cpu_mask(mask);
where get_each_online_cpu_mask() also does a preempt_disable()
implicitly, and put_each_online_cpu_mask() does a preempt_enable().
(Note that no locking is needed - only preemption-disabling.)
the 10% loops that _can_ schedule would trigger the __might_sleep()
atomicity test in schedule()), and those would have to be converted a
bit more cleverly, on a case by case basis. (for example a number of
them might not even have to sleep on the for_each_online_cpu() loop)
hm?
Ingo
-
I would shy from saying that "that's the only problem". It could also be for_each_cpu_mask(some_mask) do_something_which_sleeps(); where some_mask is suppsoed to represent a subset of online cpus. For ex: policy->cpus is a mask maintained by cpufreq, which it iterates thr' whenever changing cpu frequency. The cpus in that mask are supposed to be a subset of online cpus (with CPU_ONLINE/DEAD handlers in cpufreq.c adjusting the mask upon hotplug). We could adopt a similar trick (get/put_each_cpu_mask) that you describe The real question is how do we convert over those sleeping for_each_cpu_mask users (for ex: flush_workqueue) such that they don't block freezer/hotplug for long periods? One option is to probably rewrite them to understand that online[or a derived]_map could have changed everytime they come out of a (un)interruptible sleep and deal with arising races appropriately. That would mean a bit of maintenance headache unfortunately (I was hoping freezer will lead to zero maintenance headache :) Besides, how problematic is this in practise (that threads sleep for extended durations in TASK_INTERRUPTIBLE state breaking freezer/suspend/hotplug)? Should we ignore this for the timebeing and take up later as and when users report problems? -- Regards, vatsa -
In my case, the problem of freezer failing was due to the vfork freezer race in do_fork. The parent task was waiting_for_completion() while the child was frozen. Rafael has already sent out the fix for that, but for some reason I don't see it in the -mm. With that fix, freezer and hence hotplug succeeds even when I am running Thanks and Regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. "Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless!" -
Good to know that! So Ingo/Rafael, should we ignore the problem of "TASK_UNINTERRUPTIBLE sleepers can break freezer" for the timebeing? Mainly because its not trivial to solve and we need to tackle it case by case basis as and when users report specific problems. -- Regards, vatsa -
yeah, i think you are right - and the TASK_UNINTERRUPTIBLE thing is not only quite complex, it also breaks the symmetry of freezer use (sw-suspend obviously cannot freeze uninterruptible tasks). We should watch whether the current latency of freezing is good enough in practice. Ingo -
Ok. Do you have any specific tests in mind which I can run and post the numbers? As of now, I've been stressing the system with (kernbench + ondemand governor) Thanks and Regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. "Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless!" -
i suspect a fork-intensive application like kernbench should be close to the worst-case already. A more IO-intensive workload would maximize the uninterruptible-sleep latencies perhaps? Ingo -
Ok. I hope the NFS read/write in a tight loop should do the trick. Thanks and Regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. "Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless!" -
I will try again Vatsa's suggestion of having a if (freezing(current)) freeze_process(p); in copy processes() and check if we can do away with the fork race. the TASK_UNINTERRUPTIBLE state for more than the timeout period. The kernel threads have to call try_to_freeze() explicitly and for the userspace tasks, try_to_freeze() is called in get_signal_to_deliver(). The system is considered frozen only when *all* the freezeable tasks call try_to_freeze() one way or the other. This is unlikely in case of a TASK_UNINTERRUPTIBLE task. Question is can we have some task in TASK_UNINTERRUPTIBLE state for such Thanks and Regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. "Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless!" -
yes, easily so - just have a really long disk queue. Or really heavy mutex contention. i really think we should add a freezing hook to schedule too (no need to change anything else - just add a PF_FREEZE check into the schedule() function) - and add a wakeup method that moves TASK_UNINTERRUPTIBLE tasks to the runqueue but does not touch their task->state. ( the copy_process() handling is still needed, so that no new tasks without PF_FREEZE get created that could slip out of control. ) Such a wakeup will cause them to execute again but without disturbing their task->state value, at which point a second hook in schedule() could catch and freeze them. (and can restart the sleep afterwards, if the task is still TASK_UNINTERRUPTIBLE) i.e. two easy hooks in schedule() plus a try_to_wake_up() variant that does not touch p->state. In fact the second hook could be used instead of the first one so one might be enough. (I can code up the scheduler bits for you if that would be helpful.) Ingo -
Yes, something like this would be necessary to handle TASK_UNINTERRUPTIBLE tasks. _Still_, currently we just fail the freezing if there are uninterruptible tasks, because they can hold locks and may potentially deadlock with device drivers' .suspend() or .resume() routines (please remember that the freezer is used for suspending in the first place :-)). I think we can freeze uninterruptible tasks for the CPU hotplug, but we should We can catch them while we're freezing kernel threads (provided that the kernel threads themselves aren't forking like mad, but this doesn't seem to happen in practice). Greetings, Rafael -
They have to wait. We need all tasks frozen _with no locks held_. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Ok, are some luck. I panic()ed on freezer fail and checked the stacktrace of the unfrozen tasks. The stacktrace of each one looks like: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ PID: 7697 TASK: cc354a70 CPU: 7 COMMAND: "make" #0 [cc37fe50] schedule at c0431752 #1 [cc37fec4] wait_for_completion at c04318d0 #2 [cc37ff24] do_fork at c01249a6 #3 [cc37ff94] sys_vfork at c0103c1f #4 [cc37ffb4] system_call at c0104d8d ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rafael had sent out a patch to fix the vfork race, which can be found at http://lkml.org/lkml/2007/3/1/212 However, the following hunk seems to be missing from the latest -mm. @@ -1393,7 +1394,9 @@ long do_fork(unsigned long clone_flags, tracehook_report_clone_complete(clone_flags, nr, p); if (clone_flags & CLONE_VFORK) { + freezer_do_not_count(); wait_for_completion(&vfork); + freezer_count(); tracehook_report_vfork_done(p, nr); } } else { Rafael / Andrew, Any reasons for leaving this hunk out? I reran my tests with this hunk applied, and it work just fine. Thanks and Regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. "Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless!" -
Ok, we might be in some luck. I panic()ed on freezer fail and checked the stacktrace of the unfrozen tasks. The stacktrace of each one looks like: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ PID: 7697 TASK: cc354a70 CPU: 7 COMMAND: "make" #0 [cc37fe50] schedule at c0431752 #1 [cc37fec4] wait_for_completion at c04318d0 #2 [cc37ff24] do_fork at c01249a6 #3 [cc37ff94] sys_vfork at c0103c1f #4 [cc37ffb4] system_call at c0104d8d ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rafael had sent out a patch to fix the vfork race, which can be found at http://lkml.org/lkml/2007/3/1/212 However, the hunk @@ -1393,7 +1394,9 @@ long do_fork(unsigned long clone_flags, tracehook_report_clone_complete(clone_flags, nr, p); if (clone_flags & CLONE_VFORK) { + freezer_do_not_count(); wait_for_completion(&vfork); + freezer_count(); tracehook_report_vfork_done(p, nr); } } else { Seems to be missing in the latest -mm's. Rafael / Andrew, Any reasons for leaving this hunk out? Thanks and Regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. "Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless!" -
No, absolutely not. It's needed. Moreover, freezer-fix-vfork-problem.patch from the broken-out -rc5-mm3 contains it, so some other patch must have reverted this change. [looks] Ah, it's utrace-prep-2.patch . Andrew? -
On Tue, 3 Apr 2007 21:34:28 +0200
urgh, I screwed up, sorry.
utrace-prep-2 reverts a bit of the underlying tree so that the utrace
patches (which are against mainline) don't throw a tremendous reject which
has to be fixed each time I pull Roland's tree. I'm supposed to reapply
that change after the utrace patches but forgot.
--- a/kernel/fork.c~undo-utrace-prep-2
+++ a/kernel/fork.c
@@ -1387,7 +1387,9 @@ long do_fork(unsigned long clone_flags,
tracehook_report_clone_complete(clone_flags, nr, p);
if (clone_flags & CLONE_VFORK) {
+ freezer_do_not_count();
wait_for_completion(&vfork);
+ freezer_count();
if (likely(is_user))
tracehook_report_vfork_done(p, nr);
}
_
-
btw., how about pushing utrace to v2.6.22, is anything big holding that up? (other than Roland's modesty, which we should gently but firmly ignore in this case =B-) Ingo -
