Re: [PATCH] stopmachine: add stopmachine_timeout v3
From: Peter Zijlstra <peterz@...>
To: Hidetoshi Seto <seto.hidetoshi@...>
Cc: Max Krasnyansky <maxk@...>, <linux-kernel@...>, Rusty Russell <rusty@...>, Heiko Carstens <heiko.carstens@...>, Jeremy Fitzhardinge <jeremy@...>, Christian Borntraeger <borntraeger@...>, <virtualization@...>, Zachary Amsden <zach@...>
Subject: Re: [PATCH] stopmachine: add stopmachine_timeout v3
Date: Wednesday, July 16, 2008 - 3:33 am
On Wed, 2008-07-16 at 15:51 +0900, Hidetoshi Seto wrote:
> If stop_machine() invoked while one of onlined cpu is locked up
> by some reason, stop_machine cannot finish its work because the
> locked cpu cannot stop. This means all other healthy cpus
> will be blocked infinitely by one dead cpu.
>
> This patch allows stop_machine to return -EBUSY with some printk
> messages if any of stop_machine's threads cannot start running on
> its target cpu in time. You can enable this timeout via sysctl.
>
> v3:
>  - set stopmachine_timeout default to 0 (= never timeout)
>
> v2:
>  - remove fix for warning since it will be fixed upcoming typesafe
>    patches
>  - make stopmachine_timeout from secs to msecs
>  - allow disabling timeout by setting the stopmachine_timeout to 0
>
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
I really don't like this, it means the system is really screwed up and doesn't deserve to continue.
> ---
>  kernel/stop_machine.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++--
>  kernel/sysctl.c       |   15 +++++++++++++
>  2 files changed, 66 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 5b72c2b..77b7944 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -35,15 +35,18 @@ struct stop_machine_data {
>  };
>
>  /* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
> -static unsigned int num_threads;
> +static atomic_t num_threads;
>  static atomic_t thread_ack;
> +static cpumask_t prepared_cpus;
>  static struct completion finished;
>  static DEFINE_MUTEX(lock);
>
> +unsigned long stopmachine_timeout = 0; /* msecs, 0 = "never timeout" */
> +
>  static void set_state(enum stopmachine_state newstate)
>  {
>  	/* Reset ack counter. */
> -	atomic_set(&thread_ack, num_threads);
> +	atomic_set(&thread_ack, atomic_read(&num_threads));
>  	smp_wmb();
>  	state = newstate;
>  }
> @@ -67,6 +70,8 @@ static int stop_cpu(struct stop_machine_data *smdata)
>  	enum stopmachine_state curstate = STOPMACHINE_NONE;
>  	int uninitialized_var(ret);
>
> +	cpu_set(smp_processor_id(), prepared_cpus);
> +
>  	/* Simple state machine */
>  	do {
>  		/* Chill out and ensure we re-read stopmachine_state. */
> @@ -90,6 +95,7 @@ static int stop_cpu(struct stop_machine_data *smdata)
>  		}
>  	} while (curstate != STOPMACHINE_EXIT);
>
> +	atomic_dec(&num_threads);
>  	local_irq_enable();
>  	do_exit(0);
>  }
> @@ -105,6 +111,15 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
>  	int i, err;
>  	struct stop_machine_data active, idle;
>  	struct task_struct **threads;
> +	unsigned long limit;
> +
> +	if (atomic_read(&num_threads)) {
> +		/*
> +		 * previous stop_machine was timeout, and still there are some
> +		 * unfinished thread (dangling stucked CPU?).
> +		 */
> +		return -EBUSY;
> +	}
>
>  	active.fn = fn;
>  	active.data = data;
> @@ -120,7 +135,7 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
>  	/* Set up initial state. */
>  	mutex_lock(&lock);
>  	init_completion(&finished);
> -	num_threads = num_online_cpus();
> +	atomic_set(&num_threads, num_online_cpus());
>  	set_state(STOPMACHINE_PREPARE);
>
>  	for_each_online_cpu(i) {
> @@ -152,10 +167,21 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
>
>  	/* We've created all the threads. Wake them all: hold this CPU so one
>  	 * doesn't hit this CPU until we're ready. */
> +	cpus_clear(prepared_cpus);
>  	get_cpu();
>  	for_each_online_cpu(i)
>  		wake_up_process(threads[i]);
>
> +	/* Wait all others come to life */
> +	if (stopmachine_timeout) {
> +		limit = jiffies + msecs_to_jiffies(stopmachine_timeout);
> +		while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
> +			if (time_is_before_jiffies(limit))
> +				goto timeout;
> +			cpu_relax();
> +		}
> +	}
> +
>  	/* This will release the thread on our CPU. */
>  	put_cpu();
>  	wait_for_completion(&finished);
> @@ -169,10 +195,32 @@ kill_threads:
>  	for_each_online_cpu(i)
>  		if (threads[i])
>  			kthread_stop(threads[i]);
> +	atomic_set(&num_threads, 0);
>  	mutex_unlock(&lock);
>
>  	kfree(threads);
>  	return err;
> +
> +timeout:
> +	printk(KERN_CRIT "stopmachine: Failed to stop machine in time(%lds).\n",
> +			stopmachine_timeout);
> +	for_each_online_cpu(i) {
> +		if (!cpu_isset(i, prepared_cpus) && i != smp_processor_id())
> +			printk(KERN_CRIT "stopmachine: cpu#%d seems to be "
> +					"stuck.\n", i);
> +		/* Unbind threads */
> +		set_cpus_allowed(threads[i], cpu_online_map);
> +	}
> +
> +	/* Let threads go exit */
> +	set_state(STOPMACHINE_EXIT);
> +
> +	put_cpu();
> +	/* no wait for completion */
> +	mutex_unlock(&lock);
> +	kfree(threads);
> +
> +	return -EBUSY; /* canceled */
>  }
>
>  int stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 2911665..3c7ca98 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -146,6 +146,10 @@ extern int no_unaligned_warning;
>  extern int max_lock_depth;
>  #endif
>
> +#ifdef CONFIG_STOP_MACHINE
> +extern unsigned long stopmachine_timeout;
> +#endif
> +
>  #ifdef CONFIG_PROC_SYSCTL
>  static int proc_do_cad_pid(struct ctl_table *table, int write, struct file *filp,
>  		void __user *buffer, size_t *lenp, loff_t *ppos);
> @@ -813,6 +817,17 @@ static struct ctl_table kern_table[] = {
>  		.child = key_sysctls,
>  	},
>  #endif
> +#ifdef CONFIG_STOP_MACHINE
> +	{
> +		.ctl_name = CTL_UNNUMBERED,
> +		.procname = "stopmachine_timeout",
> +		.data = &stopmachine_timeout,
> +		.maxlen = sizeof(unsigned long),
> +		.mode = 0644,
> +		.proc_handler = &proc_doulongvec_minmax,
> +		.strategy = &sysctl_intvec,
> +	},
> +#endif
>  /*
>   * NOTE: do not add new entries to this table unless you have read
>   * Documentation/sysctl/ctl_unnumbered.txt