Re: [PATCH] make setpriority POSIX compliant; introduce PRIO_THREAD extension

Previous thread: IRQ routing / assignment issue by Radu Rendec on Monday, September 1, 2008 - 5:59 am. (1 message)

Next thread: mtrr madness by Lukas Hejtmanek on Monday, September 1, 2008 - 7:25 am. (1 message)
From: Denys Vlasenko
Date: Monday, September 1, 2008 - 7:12 am

POSIX and SUS say that setpriority(PRIO_PROCESS) should affect
all threads in the process:

SYNOPSIS
       #include <sys/resource.h>
       int getpriority(int which, id_t who);
       int setpriority(int which, id_t who, int value);
DESCRIPTION
       ...
       The  nice value set with setpriority() shall be applied to
       the process. If the process is multi-threaded, the nice value
       shall affect all system scope threads in the process.

Currently, this is not the case. While PRIO_PGRP and PRIO_USER
do set priority to the selected group of processes, PRIO_PROCESS
sets priority only for the thread with tid == who.

This mostly goes unnoticed because single-threaded processes have
tid == pid.

However, in multi-threaded processes

setpriority(PRIO_PROCESS, getpid(), value)

sets new priority only for the thread with tid == pid, leaving
all other threads unaffected. This is wrong.

Attached patch changes setpriority(PRIO_PROCESS) to set priority
for all threads with selected pid. getpriority is changed accordingly,
to return the (numerical) max of all threads' priority.

In order to allow priority of individual threads to be manipulated,
patch adds PRIO_THREAD which acts on single thread, always.

Since there may be programs which use the fact that

setpriority(PRIO_PROCESS, tid, value)

prior to this patch was setting priority for selected thread,
this behavior is retained in case when tid != pid.

IOW: with PRIO_PROCESS, if pid specifies a thread group leader,
all threads' prios are set. Otherwise, only selected thread's priority
is set. (Alternative can be to just fail with ESRCH).

getpriority() is acting similarly.

Patch is run tested. I will post test program etc as a reply.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
--
vda



diff --git a/include/linux/resource.h b/include/linux/resource.h
index aaa423a..f292690 100644
--- a/include/linux/resource.h
+++ b/include/linux/resource.h
@@ -51,6 +51,7 @@ struct rlimit {
 ...
From: Denys Vlasenko
Date: Monday, September 1, 2008 - 7:17 am

Test program is attached.
Build with "gcc -o prio_vda3 prio_vda3.c -lpthread".
Output:

start test program
CHILD1: TID:3269 getpriority:0
CHILD1: PID:3269 getpriority:0
setpriority(PRIO_PROCESS, 3269, -10)
THREAD CHILD: TID:3270 getpriority:-2
THREAD CHILD: PID:3269 getpriority:-10
THREAD CHILD: TID:3271 getpriority:-4
THREAD CHILD: PID:3269 getpriority:-10
THREAD CHILD: TID:3272 getpriority:-6
THREAD CHILD: PID:3269 getpriority:-10
THREAD CHILD: TID:3273 getpriority:-8
THREAD CHILD: PID:3269 getpriority:-10
CHILD2: TID:3269 getpriority:-10
CHILD2: PID:3269 getpriority:-10
Joined!
Joined!
Joined!
Joined!

log of instrumented kernel:

 (no threads created yet, only one tid exists)
 getpriority:
PRIO_PROCESS: pid 3269 tgid 3269: niceval=20
PRIO_PROCESS: returning retval=20
PRIO_PROCESS: pid 3269 tgid 3269: niceval=20
PRIO_PROCESS: returning retval=20
 (5 threads are created)
 setpriority(PRIO_PROCESS, pid, -10):
setting prio of pid 3269 tgid 3269 to -10
setting prio of pid 3270 tgid 3269 to -10
setting prio of pid 3271 tgid 3269 to -10
setting prio of pid 3272 tgid 3269 to -10
setting prio of pid 3273 tgid 3269 to -10
 setpriority(3 /*PRIO_THREAD*/, syscall(SYS_gettid), -1);
setting prio of pid 3270 tgid 3269 to -1
 setpriority(PRIO_PROCESS, syscall(SYS_gettid), -2);
setting prio of pid 3270 tgid 3269 to -2

Above you see how setpriority(PRIO_PROCESS) can set prio for all threads
or only for one.

  prio = getpriority(PRIO_PROCESS, tid);
PRIO_PROCESS: pid 3270 tgid 3269: niceval=22
PRIO_PROCESS: returning retval=22
  prio = getpriority(PRIO_PROCESS, pid);
PRIO_PROCESS: pid 3269 tgid 3269: niceval=30
PRIO_PROCESS: pid 3270 tgid 3269: niceval=22
PRIO_PROCESS: pid 3271 tgid 3269: niceval=30
PRIO_PROCESS: pid 3272 tgid 3269: niceval=30
PRIO_PROCESS: pid 3273 tgid 3269: niceval=30
PRIO_PROCESS: returning retval=30

Same for getpriority.

The rest is analogous:

setting prio of pid 3271 tgid 3269 to -3
setting prio of pid 3271 tgid 3269 to ...
From: Denys Vlasenko
Date: Monday, September 1, 2008 - 7:42 am

Looks like Evolution word-wrapped the patch. Let me try again.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
--
vda


diff --git a/include/linux/resource.h b/include/linux/resource.h
index aaa423a..f292690 100644
--- a/include/linux/resource.h
+++ b/include/linux/resource.h
@@ -51,6 +51,7 @@ struct rlimit {
 #define	PRIO_PROCESS	0
 #define	PRIO_PGRP	1
 #define	PRIO_USER	2
+#define	PRIO_THREAD	3
 
 /*
  * Limit the stack by to some sane default: root can always
diff --git a/kernel/sys.c b/kernel/sys.c
index 038a7bc..d339c1a 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -142,9 +142,9 @@ asmlinkage long sys_setpriority(int which, int who, int niceval)
 	struct task_struct *g, *p;
 	struct user_struct *user;
 	int error = -EINVAL;
-	struct pid *pgrp;
+	struct pid *pgrp, *pid;
 
-	if (which > PRIO_USER || which < PRIO_PROCESS)
+	if (which > PRIO_THREAD || which < PRIO_PROCESS)
 		goto out;
 
 	/* normalize: avoid signed division (rounding problems) */
@@ -156,7 +156,7 @@ asmlinkage long sys_setpriority(int which, int who, int niceval)
 
 	read_lock(&tasklist_lock);
 	switch (which) {
-		case PRIO_PROCESS:
+		case PRIO_THREAD:
 			if (who)
 				p = find_task_by_vpid(who);
 			else
@@ -164,6 +164,19 @@ asmlinkage long sys_setpriority(int which, int who, int niceval)
 			if (p)
 				error = set_one_prio(p, niceval, error);
 			break;
+		case PRIO_PROCESS:
+			if (who)
+				pid = find_vpid(who);
+			else {
+				pid = task_pid(current);
+				who = current->pid;
+			}
+			do_each_pid_thread(pid, PIDTYPE_PID, p) {
+				if (who == p->pid || who == p->tgid) {
+					error = set_one_prio(p, niceval, error);
+				}
+			} while_each_pid_thread(pid, PIDTYPE_PID, p);
+			break;
 		case PRIO_PGRP:
 			if (who)
 				pgrp = find_vpid(who);
@@ -206,14 +219,14 @@ asmlinkage long sys_getpriority(int which, int who)
 	struct task_struct *g, *p;
 	struct user_struct *user;
 	long niceval, retval = -ESRCH;
-	struct pid *pgrp;
+	struct pid *pgrp, *pid;
 ...
From: Peter Zijlstra
Date: Monday, September 1, 2008 - 8:08 am

Patch looks simple enough, although a few comments below.
Also, I guess the glibc people (Ulrich added to CC) might have an

I worry about destroying the return value here, support one thread


--

From: Peter Zijlstra
Date: Monday, September 1, 2008 - 8:19 am

Ok - got fooled by this funny set_one_prio() function. It passes the old
error value and maintains it if no new error occurs (except for -ESRCH,
but I guess people know wth they're doing).

So I'll retract my concern.

--

From: Denys Vlasenko
Date: Monday, September 1, 2008 - 8:20 am

Hmm. I think we should fail only if they all failed.

Yes. This is analogous to what happens with PRIO_USER etc,
no surprises here.
--
vda


--

From: Ulrich Drepper
Date: Monday, September 1, 2008 - 10:58 am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Looks good.

I just wish we could get something similar for setuid and friends.
Perhaps after David cred patches?

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAki8LVUACgkQ2ijCOnn/RHTWYQCePQKadvZWkC8ZoH+DOHd/N+et
KwAAoKxASCCigXbO67/rSwggRIkemPKw
=MixL
-----END PGP SIGNATURE-----
--

From: Michael Kerrisk
Date: Tuesday, September 9, 2008 - 8:45 am

Denys,


Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>

Please do CC me on API changes, so that they might get documented in
the man pages.

Thanks for this work.  (I'm not sure whether or not it's a response to
my bug report, http://bugzilla.kernel.org/show_bug.cgi?id=6258 )

I tested your patch.  Most things seem to work as I would expect, but
there is one strangeness.

I would expect
setpriority(PRIO_PROCESS, getpid())
and
setpriority(PRIO_PROCESS, 0)
to have the same affect (because: which == PRIO_PRCESS, who == 0
conventionally means "the calling process").

But they do not: the latter call only changes the priority of the
calling thread.  Is this intended?

My test program is at the end of this mail.  Here are two tests
demonstrating the behavior in each of the above cases:

$ sudo ./thread_nice_test p p
root's password:
main(): TID = 6574, PID = 6574
main(): initial nice value is 0

Thread 1: TID = 6578
Thread 2: TID = 6579
Loop 0
    process nice value is 0
    main(): nice value is 0
    Thread 1: nice value is 0
    Thread 2: nice value is 0
Thread 1: setpriority() succeeded (0, 6574)
Loop 1
    process nice value is -10
    main(): nice value is -10
    Thread 1: nice value is -10
    Thread 2: nice value is -10
Loop 2
    process nice value is -10
    main(): nice value is -10
    Thread 1: nice value is -10
    Thread 2: nice value is -10


$ sudo ./thread_nice_test p 0
main(): TID = 6590, PID = 6590
main(): initial nice value is 0

Thread 1: TID = 6594
Thread 2: TID = 6595
Loop 0
    process nice value is 0
    main(): nice value is 0
    Thread 1: nice value is 0
    Thread 2: nice value is 0
Thread 1: setpriority() succeeded (0, 0)
Loop 1
    process nice value is -10
    main(): nice value is 0
    Thread 1: nice value is -10
    Thread 2: nice value is 0
Loop 2
    process nice value is -10
    main(): nice value is 0
    Thread 1: nice value is -10
    Thread 2: nice value is 0

=====

/* ...
From: Chris Friesen
Date: Tuesday, September 9, 2008 - 9:42 am

The patch interprets 0 as the current pid rather than the current tgid.

It's up for discussion whether we should preserve old behaviour when 
specifying 0, or use a new and arguably more logical behaviour but 
possibly break old apps.

Chris
--

From: Michael Kerrisk
Date: Tuesday, September 9, 2008 - 11:45 am

Chris,


AFAICS, interpreting 0 as the TGID would be more consistent with those
parts of the interface that are specified by POSIX.1.


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html
--

From: Denys Vlasenko
Date: Wednesday, September 10, 2008 - 2:28 am

I worked on it because of this bug report:


No, it was not intended. I think this error is here:

+               case PRIO_PROCESS:
+                       if (who)
+                               pid = find_vpid(who);
+                       else {
+                               pid = task_pid(current);
+                               who = current->pid;
+                       }

I was confused. ->pid is TID, had to use ->tgid to get PID.
The fix: replace
who = current->pid;
with
who = current->tgid;
There are two places where you need to do it.

Updated patch is below.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
--
vda



diff --git a/include/linux/resource.h b/include/linux/resource.h
index aaa423a..f292690 100644
--- a/include/linux/resource.h
+++ b/include/linux/resource.h
@@ -51,6 +51,7 @@ struct rlimit {
 #define	PRIO_PROCESS	0
 #define	PRIO_PGRP	1
 #define	PRIO_USER	2
+#define	PRIO_THREAD	3
 
 /*
  * Limit the stack by to some sane default: root can always
diff --git a/kernel/sys.c b/kernel/sys.c
index 038a7bc..d339c1a 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -142,9 +142,9 @@ asmlinkage long sys_setpriority(int which, int who,
int niceval)
 	struct task_struct *g, *p;
 	struct user_struct *user;
 	int error = -EINVAL;
-	struct pid *pgrp;
+	struct pid *pgrp, *pid;
 
-	if (which > PRIO_USER || which < PRIO_PROCESS)
+	if (which > PRIO_THREAD || which < PRIO_PROCESS)
 		goto out;
 
 	/* normalize: avoid signed division (rounding problems) */
@@ -156,7 +156,7 @@ asmlinkage long sys_setpriority(int which, int who,
int niceval)
 
 	read_lock(&tasklist_lock);
 	switch (which) {
-		case PRIO_PROCESS:
+		case PRIO_THREAD:
 			if (who)
 				p = find_task_by_vpid(who);
 			else
@@ -164,6 +164,19 @@ asmlinkage long sys_setpriority(int which, int who,
int niceval)
 			if (p)
 				error = set_one_prio(p, niceval, error);
 			break;
+		case PRIO_PROCESS:
+			if (who)
+				pid = find_vpid(who);
+			else {
+				pid = ...
From: Michael Kerrisk
Date: Wednesday, September 10, 2008 - 4:57 am

Denys,


(That report is inaccessible, although I have some kind of bugzilla

(Indeed as I read Chris Freiesen's mail I almost tripped up with the
same confusion.  The source code names of these fields can easily lead

Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>

With your suggested change, things now work as I would expect.  Thanks!

Cheers,



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html
--

From: Christoph Hellwig
Date: Wednesday, September 10, 2008 - 5:08 am

Bad idea, you silently change the existing interface, and programs that
used to work around the old Linux behvaiour silently break.  Just keep
PRIO_PROCESS as it was and add a new PRIO_TGROUP that does the Posix
functionality for the whole thread group.  Glibc can then implement
the library-PRIO_PROCESS as PRIO_TGROUP for newly linked applications
without breaking existing ones.

--

Previous thread: IRQ routing / assignment issue by Radu Rendec on Monday, September 1, 2008 - 5:59 am. (1 message)

Next thread: mtrr madness by Lukas Hejtmanek on Monday, September 1, 2008 - 7:25 am. (1 message)