POSIX and SUS say that setpriority(PRIO_PROCESS) should affect
all threads in the process:
SYNOPSIS
#include <sys/resource.h>
int getpriority(int which, id_t who);
int setpriority(int which, id_t who, int value);
DESCRIPTION
...
The nice value set with setpriority() shall be applied to
the process. If the process is multi-threaded, the nice value
shall affect all system scope threads in the process.
Currently, this is not the case. While PRIO_PGRP and PRIO_USER
do set priority to the selected group of processes, PRIO_PROCESS
sets priority only for the thread with tid == who.
This mostly goes unnoticed because single-threaded processes have
tid == pid.
However, in multi-threaded processes
setpriority(PRIO_PROCESS, getpid(), value)
sets new priority only for the thread with tid == pid, leaving
all other threads unaffected. This is wrong.
Attached patch changes setpriority(PRIO_PROCESS) to set priority
for all threads with selected pid. getpriority is changed accordingly,
to return the (numerical) max of all threads' priority.
In order to allow priority of individual threads to be manipulated,
patch adds PRIO_THREAD which acts on single thread, always.
Since there may be programs which use the fact that
setpriority(PRIO_PROCESS, tid, value)
prior to this patch was setting priority for selected thread,
this behavior is retained in case when tid != pid.
IOW: with PRIO_PROCESS, if pid specifies a thread group leader,
all threads' prios are set. Otherwise, only selected thread's priority
is set. (Alternative can be to just fail with ESRCH).
getpriority() is acting similarly.
Patch is run tested. I will post test program etc as a reply.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
--
vda
diff --git a/include/linux/resource.h b/include/linux/resource.h
index aaa423a..f292690 100644
--- a/include/linux/resource.h
+++ b/include/linux/resource.h
@@ -51,6 +51,7 @@ struct rlimit {
...Test program is attached. Build with "gcc -o prio_vda3 prio_vda3.c -lpthread". Output: start test program CHILD1: TID:3269 getpriority:0 CHILD1: PID:3269 getpriority:0 setpriority(PRIO_PROCESS, 3269, -10) THREAD CHILD: TID:3270 getpriority:-2 THREAD CHILD: PID:3269 getpriority:-10 THREAD CHILD: TID:3271 getpriority:-4 THREAD CHILD: PID:3269 getpriority:-10 THREAD CHILD: TID:3272 getpriority:-6 THREAD CHILD: PID:3269 getpriority:-10 THREAD CHILD: TID:3273 getpriority:-8 THREAD CHILD: PID:3269 getpriority:-10 CHILD2: TID:3269 getpriority:-10 CHILD2: PID:3269 getpriority:-10 Joined! Joined! Joined! Joined! log of instrumented kernel: (no threads created yet, only one tid exists) getpriority: PRIO_PROCESS: pid 3269 tgid 3269: niceval=20 PRIO_PROCESS: returning retval=20 PRIO_PROCESS: pid 3269 tgid 3269: niceval=20 PRIO_PROCESS: returning retval=20 (5 threads are created) setpriority(PRIO_PROCESS, pid, -10): setting prio of pid 3269 tgid 3269 to -10 setting prio of pid 3270 tgid 3269 to -10 setting prio of pid 3271 tgid 3269 to -10 setting prio of pid 3272 tgid 3269 to -10 setting prio of pid 3273 tgid 3269 to -10 setpriority(3 /*PRIO_THREAD*/, syscall(SYS_gettid), -1); setting prio of pid 3270 tgid 3269 to -1 setpriority(PRIO_PROCESS, syscall(SYS_gettid), -2); setting prio of pid 3270 tgid 3269 to -2 Above you see how setpriority(PRIO_PROCESS) can set prio for all threads or only for one. prio = getpriority(PRIO_PROCESS, tid); PRIO_PROCESS: pid 3270 tgid 3269: niceval=22 PRIO_PROCESS: returning retval=22 prio = getpriority(PRIO_PROCESS, pid); PRIO_PROCESS: pid 3269 tgid 3269: niceval=30 PRIO_PROCESS: pid 3270 tgid 3269: niceval=22 PRIO_PROCESS: pid 3271 tgid 3269: niceval=30 PRIO_PROCESS: pid 3272 tgid 3269: niceval=30 PRIO_PROCESS: pid 3273 tgid 3269: niceval=30 PRIO_PROCESS: returning retval=30 Same for getpriority. The rest is analogous: setting prio of pid 3271 tgid 3269 to -3 setting prio of pid 3271 tgid 3269 to ...
Looks like Evolution word-wrapped the patch. Let me try again.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
--
vda
diff --git a/include/linux/resource.h b/include/linux/resource.h
index aaa423a..f292690 100644
--- a/include/linux/resource.h
+++ b/include/linux/resource.h
@@ -51,6 +51,7 @@ struct rlimit {
#define PRIO_PROCESS 0
#define PRIO_PGRP 1
#define PRIO_USER 2
+#define PRIO_THREAD 3
/*
* Limit the stack by to some sane default: root can always
diff --git a/kernel/sys.c b/kernel/sys.c
index 038a7bc..d339c1a 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -142,9 +142,9 @@ asmlinkage long sys_setpriority(int which, int who, int niceval)
struct task_struct *g, *p;
struct user_struct *user;
int error = -EINVAL;
- struct pid *pgrp;
+ struct pid *pgrp, *pid;
- if (which > PRIO_USER || which < PRIO_PROCESS)
+ if (which > PRIO_THREAD || which < PRIO_PROCESS)
goto out;
/* normalize: avoid signed division (rounding problems) */
@@ -156,7 +156,7 @@ asmlinkage long sys_setpriority(int which, int who, int niceval)
read_lock(&tasklist_lock);
switch (which) {
- case PRIO_PROCESS:
+ case PRIO_THREAD:
if (who)
p = find_task_by_vpid(who);
else
@@ -164,6 +164,19 @@ asmlinkage long sys_setpriority(int which, int who, int niceval)
if (p)
error = set_one_prio(p, niceval, error);
break;
+ case PRIO_PROCESS:
+ if (who)
+ pid = find_vpid(who);
+ else {
+ pid = task_pid(current);
+ who = current->pid;
+ }
+ do_each_pid_thread(pid, PIDTYPE_PID, p) {
+ if (who == p->pid || who == p->tgid) {
+ error = set_one_prio(p, niceval, error);
+ }
+ } while_each_pid_thread(pid, PIDTYPE_PID, p);
+ break;
case PRIO_PGRP:
if (who)
pgrp = find_vpid(who);
@@ -206,14 +219,14 @@ asmlinkage long sys_getpriority(int which, int who)
struct task_struct *g, *p;
struct user_struct *user;
long niceval, retval = -ESRCH;
- struct pid *pgrp;
+ struct pid *pgrp, *pid;
...Patch looks simple enough, although a few comments below. Also, I guess the glibc people (Ulrich added to CC) might have an I worry about destroying the return value here, support one thread --
Ok - got fooled by this funny set_one_prio() function. It passes the old error value and maintains it if no new error occurs (except for -ESRCH, but I guess people know wth they're doing). So I'll retract my concern. --
Hmm. I think we should fail only if they all failed. Yes. This is analogous to what happens with PRIO_USER etc, no surprises here. -- vda --
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Looks good. I just wish we could get something similar for setuid and friends. Perhaps after David cred patches? - -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAki8LVUACgkQ2ijCOnn/RHTWYQCePQKadvZWkC8ZoH+DOHd/N+et KwAAoKxASCCigXbO67/rSwggRIkemPKw =MixL -----END PGP SIGNATURE----- --
Denys, Tested-by: Michael Kerrisk <mtk.manpages@gmail.com> Please do CC me on API changes, so that they might get documented in the man pages. Thanks for this work. (I'm not sure whether or not it's a response to my bug report, http://bugzilla.kernel.org/show_bug.cgi?id=6258 ) I tested your patch. Most things seem to work as I would expect, but there is one strangeness. I would expect setpriority(PRIO_PROCESS, getpid()) and setpriority(PRIO_PROCESS, 0) to have the same affect (because: which == PRIO_PRCESS, who == 0 conventionally means "the calling process"). But they do not: the latter call only changes the priority of the calling thread. Is this intended? My test program is at the end of this mail. Here are two tests demonstrating the behavior in each of the above cases: $ sudo ./thread_nice_test p p root's password: main(): TID = 6574, PID = 6574 main(): initial nice value is 0 Thread 1: TID = 6578 Thread 2: TID = 6579 Loop 0 process nice value is 0 main(): nice value is 0 Thread 1: nice value is 0 Thread 2: nice value is 0 Thread 1: setpriority() succeeded (0, 6574) Loop 1 process nice value is -10 main(): nice value is -10 Thread 1: nice value is -10 Thread 2: nice value is -10 Loop 2 process nice value is -10 main(): nice value is -10 Thread 1: nice value is -10 Thread 2: nice value is -10 $ sudo ./thread_nice_test p 0 main(): TID = 6590, PID = 6590 main(): initial nice value is 0 Thread 1: TID = 6594 Thread 2: TID = 6595 Loop 0 process nice value is 0 main(): nice value is 0 Thread 1: nice value is 0 Thread 2: nice value is 0 Thread 1: setpriority() succeeded (0, 0) Loop 1 process nice value is -10 main(): nice value is 0 Thread 1: nice value is -10 Thread 2: nice value is 0 Loop 2 process nice value is -10 main(): nice value is 0 Thread 1: nice value is -10 Thread 2: nice value is 0 ===== /* ...
The patch interprets 0 as the current pid rather than the current tgid. It's up for discussion whether we should preserve old behaviour when specifying 0, or use a new and arguably more logical behaviour but possibly break old apps. Chris --
Chris, AFAICS, interpreting 0 as the TGID would be more consistent with those parts of the interface that are specified by POSIX.1. -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html --
I worked on it because of this bug report:
No, it was not intended. I think this error is here:
+ case PRIO_PROCESS:
+ if (who)
+ pid = find_vpid(who);
+ else {
+ pid = task_pid(current);
+ who = current->pid;
+ }
I was confused. ->pid is TID, had to use ->tgid to get PID.
The fix: replace
who = current->pid;
with
who = current->tgid;
There are two places where you need to do it.
Updated patch is below.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
--
vda
diff --git a/include/linux/resource.h b/include/linux/resource.h
index aaa423a..f292690 100644
--- a/include/linux/resource.h
+++ b/include/linux/resource.h
@@ -51,6 +51,7 @@ struct rlimit {
#define PRIO_PROCESS 0
#define PRIO_PGRP 1
#define PRIO_USER 2
+#define PRIO_THREAD 3
/*
* Limit the stack by to some sane default: root can always
diff --git a/kernel/sys.c b/kernel/sys.c
index 038a7bc..d339c1a 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -142,9 +142,9 @@ asmlinkage long sys_setpriority(int which, int who,
int niceval)
struct task_struct *g, *p;
struct user_struct *user;
int error = -EINVAL;
- struct pid *pgrp;
+ struct pid *pgrp, *pid;
- if (which > PRIO_USER || which < PRIO_PROCESS)
+ if (which > PRIO_THREAD || which < PRIO_PROCESS)
goto out;
/* normalize: avoid signed division (rounding problems) */
@@ -156,7 +156,7 @@ asmlinkage long sys_setpriority(int which, int who,
int niceval)
read_lock(&tasklist_lock);
switch (which) {
- case PRIO_PROCESS:
+ case PRIO_THREAD:
if (who)
p = find_task_by_vpid(who);
else
@@ -164,6 +164,19 @@ asmlinkage long sys_setpriority(int which, int who,
int niceval)
if (p)
error = set_one_prio(p, niceval, error);
break;
+ case PRIO_PROCESS:
+ if (who)
+ pid = find_vpid(who);
+ else {
+ pid = ...Denys, (That report is inaccessible, although I have some kind of bugzilla (Indeed as I read Chris Freiesen's mail I almost tripped up with the same confusion. The source code names of these fields can easily lead Tested-by: Michael Kerrisk <mtk.manpages@gmail.com> With your suggested change, things now work as I would expect. Thanks! Cheers, -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html --
Bad idea, you silently change the existing interface, and programs that used to work around the old Linux behvaiour silently break. Just keep PRIO_PROCESS as it was and add a new PRIO_TGROUP that does the Posix functionality for the whole thread group. Glibc can then implement the library-PRIO_PROCESS as PRIO_TGROUP for newly linked applications without breaking existing ones. --
