The following patches are against 2.6.21.rc6-mm1. Hopefully that is enough to catch most of the recent development activity. I am aiming to remove all kernel threads that handle signals from user space, to remove all calls to daemonize and kernel_thread from non-core kernel code. kernel thrreads handling signals from user space is a problem because it makes the kernel thread part of the user/kernel API which make changing things difficult and it breaks as soon as you are inside of a pid namespace because you won't be able to see your kernel thread. Calling kernel_thread has problems because it returns a pid_t value which once we get to the pid namespace is context depending so it cannot be used to globally identify a process. kernel_thread is also a problem because it traps user space state and requires us to call daemonize to free that state. daemonize is a maintenance problem because every time you play with user space state and limiting things you need to remember to update daemonize. Occasionally it has taken years like in the case of the mount namespace before someone realizes they need to update it. With the kthread api we no longer need daemonize. In addition we don't want kernel threads visible in anything but the initial pid namespace or they will hold a reference to a child pid namespace. However calling kernel_thread from a non-kernel parent in a child pid namespace will give the thread a pid in the child pid namespace, and there is nothing daemonize can do about it. So daemonize appears impossible to support going forward, and I choose to remove all of it's callers rather than attempt to support it. Eric -
Looks like you were missing at least the pcie hotplug driver. Another
one of the horrible thread abuses in drivers/pci/hotpug.
- full conversion to kthread infrastructure
- use wake_up_process to wake the thread up
Like most pci hotplug drivers it still uses very race non-atomic variable
assignment to communicated with the thread, but that's something the
maintainers should look into.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: linux-2.6/drivers/pci/hotplug/pciehp_ctrl.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_ctrl.c 2007-04-22 11:36:58.000000000 +0200
+++ linux-2.6/drivers/pci/hotplug/pciehp_ctrl.c 2007-04-22 11:42:56.000000000 +0200
@@ -32,14 +32,13 @@
#include <linux/types.h>
#include <linux/smp_lock.h>
#include <linux/pci.h>
+#include <linux/kthread.h>
#include "../pci.h"
#include "pciehp.h"
static void interrupt_event_handler(struct controller *ctrl);
-static struct semaphore event_semaphore; /* mutex for process loop (up if something to process) */
-static struct semaphore event_exit; /* guard ensure thread has exited before calling it quits */
-static int event_finished;
+static struct task_struct *pciehpd_event_thread;
static unsigned long pushbutton_pending; /* = 0 */
static unsigned long surprise_rm_pending; /* = 0 */
@@ -93,8 +92,9 @@ u8 pciehp_handle_attention_button(u8 hp_
info("Button ignore on Slot(%s)\n", slot_name(p_slot));
}
+ /* signal event thread that new event is posted */
if (rc)
- up(&event_semaphore); /* signal event thread that new event is posted */
+ wake_up_process(pciehpd_event_thread);
return 0;
@@ -135,8 +135,9 @@ u8 pciehp_handle_switch_change(u8 hp_slo
taskInfo->event_type = INT_SWITCH_CLOSE;
}
+ /* signal event thread that new event is posted */
if (rc)
- up(&event_semaphore); /* signal event thread that new event is posted */
+ wake_up_process(pciehpd_eve...From: Eric W. Biederman <ebiederm@xmission.com> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- fs/nfs/nfs4state.c | 2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 5fffbdf..d16393f 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -775,8 +775,6 @@ static int reclaimer(void *ptr) struct rpc_cred *cred; int status = 0; - allow_signal(SIGKILL); - /* Ensure exclusive access to NFSv4 state */ lock_kernel(); down_write(&clp->cl_sem); -- 1.5.0.g53756 -
From: Eric W. Biederman <ebiederm@xmission.com>
Modify startup of ipvs sync threads to use kthread_run
instead of a weird combination of calling kernel_thread
to start a fork_sync_thread whose hole purpose in life was
to call kernel_thread again starting the actually sync thread
which called daemonize.
To use kthread_run I had to move the name calcuation from
sync_thread into start_sync_thread resulting in a small
amount of code motion.
The result is simpler and more maintainable piece of code.
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
net/ipv4/ipvs/ip_vs_sync.c | 49 ++++++++++---------------------------------
1 files changed, 12 insertions(+), 37 deletions(-)
diff --git a/net/ipv4/ipvs/ip_vs_sync.c b/net/ipv4/ipvs/ip_vs_sync.c
index 7ea2d98..c4be9dc 100644
--- a/net/ipv4/ipvs/ip_vs_sync.c
+++ b/net/ipv4/ipvs/ip_vs_sync.c
@@ -29,6 +29,7 @@
#include <linux/in.h>
#include <linux/igmp.h> /* for ip_mc_join_group */
#include <linux/udp.h>
+#include <linux/kthread.h>
#include <net/ip.h>
#include <net/sock.h>
@@ -750,34 +751,23 @@ static int sync_thread(void *startup)
DECLARE_WAITQUEUE(wait, current);
mm_segment_t oldmm;
int state;
- const char *name;
/* increase the module use count */
ip_vs_use_count_inc();
- if (ip_vs_sync_state & IP_VS_STATE_MASTER && !sync_master_pid) {
+ if (ip_vs_sync_state & IP_VS_STATE_MASTER && !sync_master_pid)
state = IP_VS_STATE_MASTER;
- name = "ipvs_syncmaster";
- } else if (ip_vs_sync_state & IP_VS_STATE_BACKUP && !sync_backup_pid) {
+ else if (ip_vs_sync_state & IP_VS_STATE_BACKUP && !sync_backup_pid)
state = IP_VS_STATE_BACKUP;
- name = "ipvs_syncbackup";
- } else {
+ else {
IP_VS_BUG();
ip_vs_use_count_dec();
return -EINVAL;
...Thanks Eric, I'll review this and get back to you shortly. -- Horms H: http://www.vergenet.net/~horms/ W: http://www.valinux.co.jp/en/ -
On Thu, 19 Apr 2007 18:04:36 +0900 There still seems to be quite a lot of complexity in this driver's thread handling which could be removed if we did a full conversion to the kthread API. It all looks.... surprisingly complex in there. -
It is. There quite a few interesting oddities in this code: - creation of a forker thread. This is superflous when using the kthread infrastructure as a thread created by kthread_create always comes from our dedicated forker thread. - the infinite retry on failure looks very bogus, the system doesn't recover very well if you try to fork forever in a loop :) - a lot of very overlapping state variables. My reading of the code suggests that both a 'master' and 'backup' thread can run at the same time. I think the code would benefit a lot from totally separating these codepathes. - start_sync_thread and stop_sync_thread are called with unchecked user supplied arguments and bug if they don't match the expected values. While all this is under capable(CAP_NET_ADMIN) it still sounds like something to fix. - and the usual removal of semaphores and completions for startup/shutdown would benefit the code a lot, as for most thread users. -
From: Eric W. Biederman <ebiederm@xmission.com>
Currently md_thread calls allow_signal so it can receive a
SIGKILL but then does nothing with it except flush the
sigkill so that it not can use an interruptible sleep.
This whole dance is silly so remove the unnecessary
and broken signal handling logic.
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
drivers/md/md.c | 6 ------
1 files changed, 0 insertions(+), 6 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1299c23..dfd0cb9 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4542,17 +4542,11 @@ static int md_thread(void * arg)
*/
current->flags |= PF_NOFREEZE;
- allow_signal(SIGKILL);
while (!kthread_should_stop()) {
/* We need to wait INTERRUPTIBLE so that
* we don't add to the load-average.
- * That means we need to be sure no signals are
- * pending
*/
- if (signal_pending(current))
- flush_signals(current);
-
wait_event_interruptible_timeout
(thread->wqueue,
test_bit(THREAD_WAKEUP, &thread->flags)
--
1.5.0.g53756
-From: Eric W. Biederman <ebiederm@xmission.com>
Start the reclaimer thread using kthread_run instead
of a combination of kernel_thread and daemonize.
The small amount of signal handling code is also removed
as it makes no sense and is a maintenance problem to handle
signals in kernel threads.
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
fs/lockd/clntlock.c | 8 ++------
1 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c
index f4d45d4..83591f6 100644
--- a/fs/lockd/clntlock.c
+++ b/fs/lockd/clntlock.c
@@ -9,6 +9,7 @@
#include <linux/module.h>
#include <linux/types.h>
#include <linux/time.h>
+#include <linux/kthread.h>
#include <linux/nfs_fs.h>
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/svc.h>
@@ -153,7 +154,7 @@ nlmclnt_recovery(struct nlm_host *host)
if (!host->h_reclaiming++) {
nlm_get_host(host);
__module_get(THIS_MODULE);
- if (kernel_thread(reclaimer, host, CLONE_KERNEL) < 0)
+ if (IS_ERR(kthread_run(reclaimer, host, "%s-reclaim", host->h_name)))
module_put(THIS_MODULE);
}
}
@@ -166,9 +167,6 @@ reclaimer(void *ptr)
struct file_lock *fl, *next;
u32 nsmstate;
- daemonize("%s-reclaim", host->h_name);
- allow_signal(SIGKILL);
-
down_write(&host->h_rwsem);
/* This one ensures that our parent doesn't terminate while the
@@ -193,8 +191,6 @@ restart:
list_del_init(&fl->fl_u.nfs_fl.list);
/* Why are we leaking memory here? --okir */
- if (signalled())
- continue;
if (nlmclnt_reclaim(host, fl) != 0)
continue;
list_add_tail(&fl->fl_u.nfs_fl.list, &host->h_granted);
--
1.5.0.g53756
-Vetoed. Removing stuff just because it doesn't make sense to you is not acceptable. Signal handling in reclaimer threads is there in order to allow administrators to deal with the case where the server never comes up again. Trond -
Doesn't unmount handle that? Regardless kernel threads should be an implementation detail not a part of the user interface. If kernel threads are part of the user interface it makes them very hard to change. So it isn't that it doesn't make sense to me it is that it looks fundamentally broken and like a maintenance nightmare. I would rather kill kernel threads then try and simulate them when the kernel implementation has changed and kernel threads are not visible. If I could be convinced that signal handling in kernel threads is not something that will impede code modifications and refactoring I would have less of a problem, and might not care. With pid namespaces all kernel threads will disappear so how do we cope with the problem when the sysadmin can not see the kernel threads? Eric -
Then you have a usability problem. How does the sysadmin reboot the system if there is no way to shut down the processes that are hanging on an unresponsive filesystem? Trond -
On Thu, 19 Apr 2007 17:19:24 -0400 Using signals to communicate with kernel threads is fairly unpleasant, IMO. We have much simpler, faster and more idiomatic ways of communicating between threads in-kernel and there are better ways in which userspace can communicate with the kernel - system calls, for example... So I think generally any move which gets us away from using signals in Where's the hang? A user process is stuck on h_rwsem? If so, would it be appropriate to convert the user process to use down_foo_interruptible(), so that the operator can just kill the user process as expected, rather than having to futz around killing kernel threads? -
I have yet to see a proposal which did. Eric's patch was eliminating signals in kernel threads that used them without proposing any replacement mechanism or showing that he had plans to do so. That is a If an NFS server reboots, then the locks held by user processes on the client need to be re-established by when it comes up again. Otherwise, the processes that thought they were holding locks will suddenly fail. This recovery job is currently the done by a kernel thread. The question is then what to do if the server crashes again while the kernel thread is re-establishing the locks. Particularly if it never comes back again. Currently, the administrator can intervene by killing anything that has open files on that volume and kill the recovery kernel thread. You'll also note that lockd_down(), nfsd_down() etc all use signals to inform lockd(), nfsd() etc that they should be shutting down. Since the reclaimer thread is started by the lockd() thread using CLONE_SIGHAND, this means that we also automatically kill any lingering recovery threads whenever we shutdown lockd(). These mechanisms need to be replaced _before_ we start shooting down sigallow() etc in the kernel. Trond -
Possibly I just hadn't looked close enough. The signals looked like Maybe I'm missing something but I think you are referring to the semantics of do_group_exit in the presence of CLONE_THREAD. All sharing a sighand should do is cause the sharing of the signal handler. Causing allow_signal and disallow_signal to act on a group of threads instead of a single thread. I don't recall clone_sighand having any Reasonable if these mechanisms are not redundant. Thinking it through because everything having to do with nfs mounting and unmounting is behind the privileged mount operation this is not going to become an issue until we start allowing unprivileged nfs mounts. Because we cannot delegate control of nfs mount and unmount operations until then. Since signals do not pose a immediate barrier to forward progress like daemonize and kernel_thread we can leave things as is until we can sort this out. Eric -
Do they actually always disappear, or do we keep them in the init_pid_namespace? -- Dave -
In the init pid namespace but not in any of it's children. Eric -
From: Eric W. Biederman <ebiederm@xmission.com>
kthread_run replaces the kernel_thread and daemonize calls
during thread startup.
Calls to signal_pending were also removed as it is currently
impossible for the cpci_hotplug thread to receive signals.
CC: Scott Murray <scottm@somanetworks.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
drivers/pci/hotplug/cpci_hotplug_core.c | 22 +++++++---------------
1 files changed, 7 insertions(+), 15 deletions(-)
diff --git a/drivers/pci/hotplug/cpci_hotplug_core.c b/drivers/pci/hotplug/cpci_hotplug_core.c
index 6845515..c620c7e 100644
--- a/drivers/pci/hotplug/cpci_hotplug_core.c
+++ b/drivers/pci/hotplug/cpci_hotplug_core.c
@@ -33,6 +33,7 @@
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/smp_lock.h>
+#include <linux/kthread.h>
#include <asm/atomic.h>
#include <linux/delay.h>
#include "cpci_hotplug.h"
@@ -521,17 +522,13 @@ event_thread(void *data)
{
int rc;
- lock_kernel();
- daemonize("cpci_hp_eventd");
- unlock_kernel();
-
dbg("%s - event thread started", __FUNCTION__);
while (1) {
dbg("event thread sleeping");
down_interruptible(&event_semaphore);
dbg("event thread woken, thread_finished = %d",
thread_finished);
- if (thread_finished || signal_pending(current))
+ if (thread_finished)
break;
do {
rc = check_slots();
@@ -562,12 +559,8 @@ poll_thread(void *data)
{
int rc;
- lock_kernel();
- daemonize("cpci_hp_polld");
- unlock_kernel();
-
while (1) {
- if (thread_finished || signal_pending(current))
+ if (thread_finished)
break;
if (controller->ops->query_enum()) {
do {
@@ -592,7 +585,7 @@ poll_thread(void *data)
static int
cpci_start_thread(void)
{
- int pid;
+ struct task_struct *task;
/* initialize our semaphores */
init_MUTEX_LOCKED(&event_semaphore);
@@ -600,14 +593,13 @@ cpci_start_thread(void)
thread_finished = 0...From: Eric W. Biederman <ebiederm@xmission.com>
Start the g4fand using kthread_run not a combination
of kernel_thread and deamonize. This makes the code
a little simpler and more maintainable.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
drivers/macintosh/therm_windtunnel.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/macintosh/therm_windtunnel.c b/drivers/macintosh/therm_windtunnel.c
index a1d3a98..5d888e7 100644
--- a/drivers/macintosh/therm_windtunnel.c
+++ b/drivers/macintosh/therm_windtunnel.c
@@ -36,6 +36,7 @@
#include <linux/i2c.h>
#include <linux/slab.h>
#include <linux/init.h>
+#include <linux/kthread.h>
#include <asm/prom.h>
#include <asm/machdep.h>
@@ -62,7 +63,6 @@ I2C_CLIENT_INSMOD;
static struct {
volatile int running;
struct completion completion;
- pid_t poll_task;
struct semaphore lock;
struct of_device *of_dev;
@@ -285,7 +285,6 @@ restore_regs( void )
static int
control_loop( void *dummy )
{
- daemonize("g4fand");
down( &x.lock );
setup_hardware();
@@ -323,7 +322,7 @@ do_attach( struct i2c_adapter *adapter )
if( x.thermostat && x.fan ) {
x.running = 1;
init_completion( &x.completion );
- x.poll_task = kernel_thread( control_loop, NULL, SIGCHLD | CLONE_KERNEL );
+ kthread_run( control_loop, NULL, "g4fand");
}
}
return ret;
--
1.5.0.g53756
-On Thu, 19 Apr 2007 01:58:48 -0600 I had a bit of trouble reviewing this one because I was laughing so hard at the attempted coding-style in that driver. Oh well. I continue creeping into Christoph's camp - there's quite a bit of open-coded gunk which would go away if we were to teach this driver about kthread_should_stop() and kthread_stop(), and the conversion looks awfully easy to do. It's a shame to stop here. Oh well, I guess at least this is some forward progress. -
My main problem with touching that driver is that I don't have the hardware to test. I'll try to find a user to play the ginea pig. Ben. -
From: Eric W. Biederman <ebiederm@xmission.com>
To start the nfsv4-delegreturn thread this patch uses
kthread_run instead of a combination of kernel_thread
and daemonize.
In addition allow_signal(SIGKILL) is removed from
the expire delegations thread.
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
fs/nfs/delegation.c | 11 ++++-------
1 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 841c99a..7b9b88c 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -232,7 +232,6 @@ int nfs_do_expire_all_delegations(void *ptr)
struct nfs_delegation *delegation;
struct inode *inode;
- allow_signal(SIGKILL);
restart:
spin_lock(&clp->cl_lock);
if (test_bit(NFS4CLNT_STATE_RECOVER, &clp->cl_state) != 0)
@@ -310,8 +309,6 @@ static int recall_thread(void *data)
struct nfs_inode *nfsi = NFS_I(inode);
struct nfs_delegation *delegation;
- daemonize("nfsv4-delegreturn");
-
nfs_msync_inode(inode);
down_read(&clp->cl_sem);
down_write(&nfsi->rwsem);
@@ -350,18 +347,18 @@ int nfs_async_inode_return_delegation(struct inode *inode, const nfs4_stateid *s
.inode = inode,
.stateid = stateid,
};
- int status;
+ struct task_struct *task;
init_completion(&data.started);
__module_get(THIS_MODULE);
- status = kernel_thread(recall_thread, &data, CLONE_KERNEL);
- if (status < 0)
+ task = kthread_run(recall_thread, &data, "nfsv4-delegreturn");
+ if (IS_ERR(task))
goto out_module_put;
wait_for_completion(&data.started);
return data.result;
out_module_put:
module_put(THIS_MODULE);
- return status;
+ return PTR_ERR(task);
}
/*
--
1.5.0.g53756
-Again vetoed, for the same reason. Trond -
-
From: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
fs/smbfs/smbiod.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/fs/smbfs/smbiod.c b/fs/smbfs/smbiod.c
index 3e61b44..67176af 100644
--- a/fs/smbfs/smbiod.c
+++ b/fs/smbfs/smbiod.c
@@ -298,8 +298,6 @@ out:
*/
static int smbiod(void *unused)
{
- allow_signal(SIGKILL);
-
VERBOSE("SMB Kernel thread starting (%d) ...\n", current->pid);
for (;;) {
--
1.5.0.g53756
-On Thu, 19 Apr 2007 01:59:03 -0600 Why is it unnecessary? afaict we can presently terminate smbiod with a SIGKILL, and this change will alter (ie: break) that behaviour? -
From: Eric W. Biederman <ebiederm@xmission.com>
This patch modifies the startup of kecardd to use
kthread_run not a kernel_thread combination of kernel_thread
and daemonize. Making the code slightly simpler and more
maintainable.
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
arch/arm/kernel/ecard.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/arm/kernel/ecard.c b/arch/arm/kernel/ecard.c
index f1c0fb9..6c15f5f 100644
--- a/arch/arm/kernel/ecard.c
+++ b/arch/arm/kernel/ecard.c
@@ -40,6 +40,7 @@
#include <linux/device.h>
#include <linux/init.h>
#include <linux/mutex.h>
+#include <linux/kthread.h>
#include <asm/dma.h>
#include <asm/ecard.h>
@@ -263,8 +264,6 @@ static int ecard_init_mm(void)
static int
ecard_task(void * unused)
{
- daemonize("kecardd");
-
/*
* Allocate a mm. We're not a lazy-TLB kernel task since we need
* to set page table entries where the user space would be. Note
@@ -1058,13 +1057,14 @@ ecard_probe(int slot, card_type_t type)
*/
static int __init ecard_init(void)
{
- int slot, irqhw, ret;
+ struct task_struct *task;
+ int slot, irqhw;
- ret = kernel_thread(ecard_task, NULL, CLONE_KERNEL);
- if (ret < 0) {
+ task = kthread_run(ecard_task, NULL, "kecardd");
+ if (IS_ERR(task)) {
printk(KERN_ERR "Ecard: unable to create kernel thread: %d\n",
- ret);
- return ret;
+ PTR_ERR(task));
+ return PTR_ERR(task);
}
printk("Probing expansion cards\n");
--
1.5.0.g53756
-Looks good. Given that this is non-modular and there's no exit function there is no need for further action. -
From: Eric W. Biederman <ebiederm@xmission.com>
This patch starts kbenpd using kthread_run replacing
a combination of kernel_thread and daemonize. Making
the code a little simpler and more maintainable.
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
net/bluetooth/bnep/core.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/bluetooth/bnep/core.c b/net/bluetooth/bnep/core.c
index a9f1e88..de3caed 100644
--- a/net/bluetooth/bnep/core.c
+++ b/net/bluetooth/bnep/core.c
@@ -32,6 +32,7 @@
#include <linux/module.h>
#include <linux/kernel.h>
+#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/signal.h>
#include <linux/init.h>
@@ -473,7 +474,6 @@ static int bnep_session(void *arg)
BT_DBG("");
- daemonize("kbnepd %s", dev->name);
set_user_nice(current, -15);
init_waitqueue_entry(&wait, current);
@@ -539,6 +539,7 @@ static struct device *bnep_get_device(struct bnep_session *session)
int bnep_add_connection(struct bnep_connadd_req *req, struct socket *sock)
{
+ struct task_struct *task;
struct net_device *dev;
struct bnep_session *s, *ss;
u8 dst[ETH_ALEN], src[ETH_ALEN];
@@ -598,9 +599,10 @@ int bnep_add_connection(struct bnep_connadd_req *req, struct socket *sock)
__bnep_link_session(s);
- err = kernel_thread(bnep_session, s, CLONE_KERNEL);
- if (err < 0) {
+ task = kthread_run(bnep_session, s, "kbnepd %s", dev->name);
+ if (IS_ERR(task)) {
/* Session thread start failed, gotta cleanup. */
+ err = PTR_ERR(task);
unregister_netdev(dev);
__bnep_unlink_session(s);
goto failed;
--
1.5.0.g53756
-On Thu, 19 Apr 2007 01:58:51 -0600
while (!atomic_read(&s->killed)) {
It's unusual to have a kernel thread which has a space in its name. That
could trip up infufficient-defensive userspace tools.
-Note that this also stands against a full kthread conversion. Marcel put my old patches for a full kthread conversion in, but they didn't deal properly with some of the premaure exit cases, and causes OOPSes. I don't remember what the problems where, but the case of a thread terminating earlier and possibly asynchronously is one of the cases we'll probably have to add to the kthread infrastructure before all uses of kernel_thread in drivers can be converted. -
This patch implements the kthread helper functions kthread_start
and kthread_end which make it simple to support a kernel thread
that may decided to exit on it's own before we request it to.
It is still assumed that eventually we will get around to requesting
that the kernel thread stop.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
include/linux/kthread.h | 23 +++++++++++++++++++++++
kernel/kthread.c | 18 ++++++++++++++++++
2 files changed, 41 insertions(+), 0 deletions(-)
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index a8ea31d..4f1eff1 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -28,6 +28,29 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
void kthread_bind(struct task_struct *k, unsigned int cpu);
int kthread_stop(struct task_struct *k);
+/**
+ * kthread_start - create and wake a thread.
+ * @threadfn: the function to run until kthread_should_stop().
+ * @data: data ptr for @threadfn.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: Convenient wrapper for kthread_create() followed by
+ * get_task_struct() and wake_up_process. kthread_start should be paired
+ * with kthread_end() so we don't leak task structs.
+ *
+ * Returns the kthread or ERR_PTR(-ENOMEM).
+ */
+#define kthread_start(threadfn, data, namefmt, ...) \
+({ \
+ struct task_struct *__k \
+ = kthread_create(threadfn, data, namefmt, ## __VA_ARGS__); \
+ if (!IS_ERR(__k)) { \
+ get_task_struct(__k); \
+ wake_up_process(__k); \
+ } \
+ __k; \
+})
+int kthread_end(struct task_struct *k);
static inline int __kthread_should_stop(struct task_struct *tsk)
{
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 9b3c19f..d6d63c6 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -179,6 +179,24 @@ int kthread_stop(struct task_struct *tsk)
}
EXPORT_SYMBOL(kthread_stop);
+/**
+ * kthread_end - signal ...I don't think having to parallel APIs is a good idea, people will get utterly confused which one to use. Better always grab a reference in kthread_create and drop it in kthread_stop. For normal thread no change in behaviour and only slightly more code in the slowpath. Of course it will need an audit for half-assed kthread conversion first to avoid task_struct reference count leaks. In addition to that kthrad_end implementation look wrong. When the kthread has exited prematurely no one will call complete on kthread_stop_info.done before it's been setup. Interestingly the comment there indicates someone thought about threads exiting early, but it became defunkt during all the rewrites of the -
I *am* already confused... a driver of mine does:
static __init int thkd_init(void)
{
touch_task = kthread_run(touch_thread, Device, "thkd");
...
}
and
static __exit void thkd_exit(void)
{
kthread_stop(touch_task);
/* I bet something is missing */
}
now what good would kthread_run do me?
Jan
--
-Please read the kerneldoc documentation for kthread_create -
In that case it is better to grab a reference in kthread(). This also close the race when a new thread is woken (freezer) and exits before This is not true anymore, see another patch from Eric kthread-enhance-kthread_stop-to-abort-interruptible-sleeps.patch Oleg. -
Ok. Thinking about it I agree with Christoph that parallel API's can be a problem. However we do still need to support kernel threads where kthread_stop will never be called. There appear to be a few legitimate cases where someone wants to fire off a thread and have it do some work but don't care at all for stopping it before it is done. So I propose we add a kthread_orphan as a basic primitive to decrement the count on the task_struct if we want a kthread to simply exit after it has done some work. And as a helper function we can have a kthread_run_orphan. I think having a kthread_orphan will document what we are doing better and make it easy to find kernel threads that don't use kthread_stop. The pain is that this requires an audit of all kernel kthread creators so that we call kthread_orphan on the right ones, or else we will have a task_struct leak. At least that isn't a fatal condition. Eric -
Speaking about helpers, could we also add kthread_start(), which should be used instead of direct wake_up_process() ? Not that it is terribly important, but still. Note that "kthread_create() pins the task_struct" allows us to cleanup the code. Look at this ugly "wait_to_die:" label in migration_thread(). Is is because migration_thread() can't exit until CPU_DEAD reaps it. Other reasons were already solved by kthread-enhance-kthread_stop-to-abort-interruptible-sleeps.patch Oleg. -
There's two cases where it's valid that kthread_stop is not called:
a) the user is always builtin and the thread runs until the kernel halts.
examples: voyager, arm ecard
b) the thread is normally started/stopped, e.g. at module_init/module_exit
but there is some reason why it could terminate earlier.
examples: the various bluetooth threads, nfs-related threads that
can be killed using signals
c) we have some kind of asynchronous helper thread.
examples: various s390 drivers, usbatm, therm_pm72
d) a driver has threadpools were we need to start/stop threads on demand.
examples: nfsd, xpc
case a)
is trivial, we can just ignore the refcounting issue.
case b)
is what refcounting the task struct and proper handling in
kthread_stop will deal with.
case c)
should get a new kthread_create_async api which starts a thread
without blocking, so we can get rid of the workqueues in the
s390 drivers. it should probably also be safe to be called from
irq context. What makes this a bit complicated is the need to
make sure no more thread is running in case the caller terminates
(shutdown of the structure it's associated with or module removal)
case d)
should be deal with with a kthread_pool api
-yes. we need something like :
- while (!atomic_read(&s->killed)) {
+ while (1) {
try_to_freeze();
set_current_state(TASK_INTERRUPTIBLE);
+ if (atomic_read(&s->killed))
+ break;
+
but we can't just change it, can we ? i could be used by a user space tool
to check if the thread is running.
C.
-here's the refreshed version not taking into account the space in its
kernel thread name.
C.
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
---
net/bluetooth/bnep/core.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
Index: 2.6.21-rc6-mm1/net/bluetooth/bnep/core.c
===================================================================
--- 2.6.21-rc6-mm1.orig/net/bluetooth/bnep/core.c
+++ 2.6.21-rc6-mm1/net/bluetooth/bnep/core.c
@@ -47,6 +47,7 @@
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/skbuff.h>
+#include <linux/kthread.h>
#include <asm/unaligned.h>
@@ -473,16 +474,18 @@ static int bnep_session(void *arg)
BT_DBG("");
- daemonize("kbnepd %s", dev->name);
set_user_nice(current, -15);
init_waitqueue_entry(&wait, current);
add_wait_queue(sk->sk_sleep, &wait);
- while (!atomic_read(&s->killed)) {
+ while (1) {
try_to_freeze();
set_current_state(TASK_INTERRUPTIBLE);
+ if (atomic_read(&s->killed))
+ break;
+
// RX
while ((skb = skb_dequeue(&sk->sk_receive_queue))) {
skb_orphan(skb);
@@ -539,6 +542,7 @@ static struct device *bnep_get_device(st
int bnep_add_connection(struct bnep_connadd_req *req, struct socket *sock)
{
+ struct task_struct *task;
struct net_device *dev;
struct bnep_session *s, *ss;
u8 dst[ETH_ALEN], src[ETH_ALEN];
@@ -598,9 +602,10 @@ int bnep_add_connection(struct bnep_conn
__bnep_link_session(s);
- err = kernel_thread(bnep_session, s, CLONE_KERNEL);
- if (err < 0) {
- /* Session thread start failed, gotta cleanup. */
+ task = kthread_run(bnep_session, s, "kbnepd %s", dev->name);
+ if (IS_ERR(task)) {
+ /* Session thread start failed, gotta cleanup. */
+ err = PTR_ERR(task);
unregister_netdev(dev);
__bnep_unlink_session(s);
goto failed;
-Hello, But all kernel threads are supposed to be only *in-kernel* implementation details. Isn't a userspace tool whose behaviour relies on the existence (or even the knowledge of the existence) of any Yes, so although userspace shouldn't be bothering with kernel threads in the first place, that does not mean that such tools do not exist. So we'll have to live with this (unfortunate) naming for some time, till we can get rid of it later. Which is similar to the habit of some kernel threads in there that actually *do* want to export the knowledge of their existence (and even a signals-based interface!) to userspace. Eric did receive some nacks on his patches that tried to remove the signals business from kernel threads on this account, but perhaps that too is something that we could get rid of later (hopefully by that time those using signals in kernel threads would have realized their folly and shifted to something else :-) Satyam -
From: Eric W. Biederman <ebiederm@xmission.com>
This patch modifies the startup of the media_bay_task
to use kthread_run and not a combination of kernel_thread,
deamonize and sigfillset.
In addition since we now always want to ignore signals
the MB_IGNORE_SIGNALS define is removed along with the
test for signal_pending.
The result is slightly simpler code that is more
maintainable.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
drivers/macintosh/mediabay.c | 11 ++---------
1 files changed, 2 insertions(+), 9 deletions(-)
diff --git a/drivers/macintosh/mediabay.c b/drivers/macintosh/mediabay.c
index c803d2b..90c853e 100644
--- a/drivers/macintosh/mediabay.c
+++ b/drivers/macintosh/mediabay.c
@@ -20,6 +20,7 @@
#include <linux/stddef.h>
#include <linux/init.h>
#include <linux/ide.h>
+#include <linux/kthread.h>
#include <asm/prom.h>
#include <asm/pgtable.h>
#include <asm/io.h>
@@ -35,7 +36,6 @@
#define MB_DEBUG
-#define MB_IGNORE_SIGNALS
#ifdef MB_DEBUG
#define MBDBG(fmt, arg...) printk(KERN_INFO fmt , ## arg)
@@ -622,11 +622,6 @@ static int media_bay_task(void *x)
{
int i;
- strcpy(current->comm, "media-bay");
-#ifdef MB_IGNORE_SIGNALS
- sigfillset(&current->blocked);
-#endif
-
for (;;) {
for (i = 0; i < media_bay_count; ++i) {
down(&media_bays[i].lock);
@@ -636,8 +631,6 @@ static int media_bay_task(void *x)
}
msleep_interruptible(MB_POLL_DELAY);
- if (signal_pending(current))
- return 0;
}
}
@@ -699,7 +692,7 @@ static int __devinit media_bay_attach(struct macio_dev *mdev, const struct of_de
/* Startup kernel thread */
if (i == 0)
- kernel_thread(media_bay_task, NULL, CLONE_KERNEL);
+ kthread_run(media_bay_task, NULL, "media-bay");
return 0;
--
1.5.0.g53756
-On Thu, 19 Apr 2007 01:58:50 -0600 Looks OK - there's no way of stopping the kernel thread anyway. It appears that nobody has tried to use this driver at the same time as software-suspend. At least, not successfully. A strategic try_to_freeze() should fix it. This will become (a little) more serious when cpu hotplug is switched to use the process freezer, and perhaps it breaks kprobes already. -
I'll dig a box with that hardware and do some tests, but it looks nice. Thanks Eric ! There should be no problem with cpu hotplug, the only machines using the media bay driver are old Apple laptops with only one CPU and no HW threads. Ben. -
From: Eric W. Biederman <ebiederm@xmission.com>
This patch modifies the startup of rtasd to use kthread_run instaed of
a combination of kernel_thread and daemonize. Making the code a little
simpler and more maintainble.
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
arch/powerpc/platforms/pseries/rtasd.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/rtasd.c b/arch/powerpc/platforms/pseries/rtasd.c
index 77d0937..919a374 100644
--- a/arch/powerpc/platforms/pseries/rtasd.c
+++ b/arch/powerpc/platforms/pseries/rtasd.c
@@ -20,6 +20,7 @@
#include <linux/spinlock.h>
#include <linux/cpu.h>
#include <linux/delay.h>
+#include <linux/kthread.h>
#include <asm/uaccess.h>
#include <asm/io.h>
@@ -429,8 +430,6 @@ static int rtasd(void *unused)
int event_scan = rtas_token("event-scan");
int rc;
- daemonize("rtasd");
-
if (event_scan == RTAS_UNKNOWN_SERVICE || get_eventscan_parms() == -1)
goto error;
@@ -497,7 +496,7 @@ static int __init rtas_init(void)
else
printk(KERN_ERR "Failed to create error_log proc entry\n");
- if (kernel_thread(rtasd, NULL, CLONE_FS) < 0)
+ if (IS_ERR(kthread_run(rtasd, NULL, "rtasd")))
printk(KERN_ERR "Failed to start RTAS daemon\n");
return 0;
--
1.5.0.g53756
-Looks okay, but I have some questions about the original code. Why does the driver only check if it really needs to run in the thread and calls vmalloc from it? If we did all these in the initialization function we could actually properly unwind. -
From: Eric W. Biederman <ebiederm@xmission.com>
This patch modifies the kcmptd_ctr_%d daemon using kthread_run
instead of a combination of kernel_thread and daemonize making
the code a little simpler and more maintainable.
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
net/bluetooth/cmtp/core.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c
index e1b9db9..993303f 100644
--- a/net/bluetooth/cmtp/core.c
+++ b/net/bluetooth/cmtp/core.c
@@ -35,6 +35,7 @@
#include <linux/file.h>
#include <linux/init.h>
#include <linux/freezer.h>
+#include <linux/kthread.h>
#include <net/sock.h>
#include <linux/isdn/capilli.h>
@@ -286,7 +287,6 @@ static int cmtp_session(void *arg)
BT_DBG("session %p", session);
- daemonize("kcmtpd_ctr_%d", session->num);
set_user_nice(current, -15);
init_waitqueue_entry(&wait, current);
@@ -329,6 +329,7 @@ static int cmtp_session(void *arg)
int cmtp_add_connection(struct cmtp_connadd_req *req, struct socket *sock)
{
struct cmtp_session *session, *s;
+ struct task_struct *task;
bdaddr_t src, dst;
int i, err;
@@ -375,8 +376,9 @@ int cmtp_add_connection(struct cmtp_connadd_req *req, struct socket *sock)
__cmtp_link_session(session);
- err = kernel_thread(cmtp_session, session, CLONE_KERNEL);
- if (err < 0)
+ task = kthread_run(cmtp_session, session, "kcmtpd_ctr_%d", session->num);
+ err = PTR_ERR(task);
+ if (IS_ERR(task))
goto unlink;
if (!(session->flags & (1 << CMTP_LOOPBACK))) {
--
1.5.0.g53756
-From: Eric W. Biederman <ebiederm@xmission.com>
This patch modifies the startup of kafscmd, kafsasyncd, and kafstimod
to use kthread_run instead of a combination of kernel_thread and
daemonize making the code slightly simpler and more maintainable.
In addition since by default all signals are ignored when delivered
to a kernel thread the code to flush signals has been removed.
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
fs/afs/cmservice.c | 10 +++++-----
fs/afs/internal.h | 11 -----------
fs/afs/kafsasyncd.c | 17 ++++++-----------
fs/afs/kafstimod.c | 16 ++++++----------
4 files changed, 17 insertions(+), 37 deletions(-)
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 3d097fd..f7e2355 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -13,6 +13,7 @@
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/completion.h>
+#include <linux/kthread.h>
#include "server.h"
#include "cell.h"
#include "transport.h"
@@ -120,8 +121,6 @@ static int kafscmd(void *arg)
printk(KERN_INFO "kAFS: Started kafscmd %d\n", current->pid);
- daemonize("kafscmd");
-
complete(&kafscmd_alive);
/* loop around looking for things to attend to */
@@ -133,7 +132,6 @@ static int kafscmd(void *arg)
for (;;) {
set_current_state(TASK_INTERRUPTIBLE);
if (!list_empty(&kafscmd_attention_list) ||
- signal_pending(current) ||
kafscmd_die)
break;
@@ -297,8 +295,10 @@ int afscm_start(void)
down_write(&afscm_sem);
if (!afscm_usage) {
- ret = kernel_thread(kafscmd, NULL, 0);
- if (ret < 0)
+ struct task_struct *task;
+ task = kthread_run(kafscmd, NULL, "kafscmd");
+ ret = PTR_ERR(task);
+ if (IS_ERR(task))
goto out;
wait_for_completion(&kafscmd_alive);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 5151d5d..2d667b7 100644
--- a/fs/afs/internal....Please drop this patch for the moment as I have my own patches to convert them to keventd-type threads, in addition to implementing a host of other changes. David -
From: Eric W. Biederman <ebiederm@xmission.com>
This patch modifies the startup of krxtimod, krxiod, and krxsecd
to use kthread_run instead of a combination of kernel_thread
and daemonize making the code slightly simpler and more maintainable.
In addition since by default all signals are ignored when delivered
to a kernel thread the code to flush signals has been removed.
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
net/rxrpc/internal.h | 11 -----------
net/rxrpc/krxiod.c | 16 ++++++++--------
net/rxrpc/krxsecd.c | 16 ++++++++--------
net/rxrpc/krxtimod.c | 15 ++++++---------
4 files changed, 22 insertions(+), 36 deletions(-)
diff --git a/net/rxrpc/internal.h b/net/rxrpc/internal.h
index cc0c579..1dd69aa 100644
--- a/net/rxrpc/internal.h
+++ b/net/rxrpc/internal.h
@@ -49,17 +49,6 @@ __RXACCT_DECL(extern atomic_t rxrpc_message_count);
#define _net(FMT, a...) do { if (rxrpc_knet) knet (FMT , ##a); } while(0)
#endif
-static inline void rxrpc_discard_my_signals(void)
-{
- while (signal_pending(current)) {
- siginfo_t sinfo;
-
- spin_lock_irq(&current->sighand->siglock);
- dequeue_signal(current, &current->blocked, &sinfo);
- spin_unlock_irq(&current->sighand->siglock);
- }
-}
-
/*
* call.c
*/
diff --git a/net/rxrpc/krxiod.c b/net/rxrpc/krxiod.c
index bbbcd6c..c590ccd 100644
--- a/net/rxrpc/krxiod.c
+++ b/net/rxrpc/krxiod.c
@@ -14,6 +14,7 @@
#include <linux/spinlock.h>
#include <linux/init.h>
#include <linux/freezer.h>
+#include <linux/kthread.h>
#include <rxrpc/krxiod.h>
#include <rxrpc/transport.h>
#include <rxrpc/peer.h>
@@ -43,8 +44,6 @@ static int rxrpc_krxiod(void *arg)
printk("Started krxiod %d\n",current->pid);
- daemonize("krxiod");
-
/* loop around waiting for work to do */
do {
/* wait for work or to be told to exit */
@@ -57,8 +56,7 @@ static i...Again, please drop in favour of my RxRPC patches. David -
On Thu, 19 Apr 2007 10:32:38 +0100 Do those patches convert all this code over to full use of the kthread API? Because it seems that a conversion would be straightforward, and is needed. -
No. They delete all that code entirely and use workqueues instead. So, I suppose merging Eric's patches first should be a simple matter of just deleting his revised code instead. David -
What is the ETA on your patches? Eric -
That depends on Dave Miller now, I think. I'm assuming they need to go through the network GIT tree to get to Linus. Certainly Andrew Morton seems to think so. David -
From: David Howells <dhowells@redhat.com> I applied already the patches I thought were appropriate, you had some crypto layer changes that you need to work out with Herbert Xu before the rest can be applied. Let me know what the status is of that stuff. Either way, I don't think it should block Eric's work here. If his stuff is more ready first, it should go in and then you have a little bit of patch merging to do that's all. -
Should the rest of it go via Andrew's tree then? David -
