Re: [RFC/PATCH -v2] Add sysfs control to modify a user's cpu share

Previous thread: [ANNOUNCE] The Linux Test Project has been Released for SEPTEMBER 2007 by Subrata Modak on Monday, October 1, 2007 - 6:57 am. (1 message)

Next thread: [PATCH] Sysace: Labels in C code should not be indented. by Grant Likely on Monday, October 1, 2007 - 7:19 am. (1 message)
From: Dhaval Giani
Date: Monday, October 1, 2007 - 7:04 am

Hi Ingo,

Adds tunables in sysfs to modify a user's cpu share.

A directory is created in sysfs for each new user in the system.

	/sys/kernel/uids/<uid>/cpu_share

Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an 
administrator is allowed to modify a user's cpu share.

Ex: 
	# cd /sys/kernel/uids/
	# cat 512/cpu_share
	1024
	# echo 2048 > 512/cpu_share
	# cat 512/cpu_share
	2048
	#

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>

---
 include/linux/sched.h |   11 +++++
 kernel/ksysfs.c       |    4 +
 kernel/sched.c        |    5 ++
 kernel/user.c         |  110 +++++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 129 insertions(+), 1 deletion(-)

Index: linux-2.6.23-rc8-sched-devel/include/linux/sched.h
===================================================================
--- linux-2.6.23-rc8-sched-devel.orig/include/linux/sched.h
+++ linux-2.6.23-rc8-sched-devel/include/linux/sched.h
@@ -86,6 +86,7 @@ struct sched_param {
 #include <linux/timer.h>
 #include <linux/hrtimer.h>
 #include <linux/task_io_accounting.h>
+#include <linux/kobject.h>
 
 #include <asm/processor.h>
 
@@ -598,9 +599,18 @@ struct user_struct {
 
 #ifdef CONFIG_FAIR_USER_SCHED
 	struct task_group *tg;
+ 	struct kset kset;
+ 	struct subsys_attribute user_attr;
+ 	struct work_struct work;
 #endif
 };
 
+#ifdef CONFIG_FAIR_USER_SCHED
+extern int uids_kobject_init(void);
+#else
+static inline int uids_kobject_init(void) { return 0; }
+#endif
+
 extern struct user_struct *find_user(uid_t);
 
 extern struct user_struct root_user;
@@ -1846,6 +1856,7 @@ extern struct task_group *sched_create_g
 extern void sched_destroy_group(struct task_group *tg);
 extern void sched_move_task(struct task_struct *tsk);
 extern int sched_group_set_shares(struct task_group *tg, unsigned long shares);
+extern unsigned long ...
From: Ingo Molnar
Date: Monday, October 1, 2007 - 7:44 am

hi Dhaval,


looks good to me! I think this API is pretty straightforward. I've put 
this into my tree and have updated the sched-devel git tree:

 git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

but there seems to be a bug in it, i get this:

 kobject_add failed for 500 with -EEXIST, don't try to register things with the same name in the same directory.
  [<c01e1730>] kobject_shadow_add+0x13b/0x169
  [<c01e1795>] kobject_set_name+0x28/0x91
  [<c01299d5>] user_kobject_create+0x6a/0x90
  [<c0129d3c>] alloc_uid+0x141/0x181
  [<c012d05c>] set_user+0x1c/0x91
  [<c012e4c7>] sys_setresuid+0xd9/0x17d
  [<c0103cf6>] sysenter_past_esp+0x5f/0x85
  =======================

	Ingo
-

From: Srivatsa Vaddagiri
Date: Monday, October 1, 2007 - 8:32 am

Looks like a remove_dir/create_dir race ..

	free_user()			alloc_uid()

	   uid_hash_remove(up);

					uid_hash_insert(new, hashent);

	   schedule_work()
					user_kobject_create();
					kobject_add();	-> Boom!

	   remove_user_sysfs_dir();

/me looks up sysfs programming examples ..

-- 
Regards,
vatsa
-

From: Eric St-Laurent
Date: Tuesday, October 2, 2007 - 3:12 pm

While a sysfs interface is OK and somewhat orthogonal to the interface
proposed the containers patches, I think maybe a new syscall should be
considered.

Since we now have a fair share cpu scheduler, maybe an interface to
specify the cpu share directly (alternatively to priority) make sense.

For processes, it may become more intuitive (and precise) to set the
processing share directly than setting a priority which is converted to
a share.

Maybe something similar to ioprio_set() and ioprio_get() syscalls:

- per user cpu share
- per user group cpu share
- per process cpu share
- per process group cpu share


Best regards,

- Eric


-

From: Srivatsa Vaddagiri
Date: Tuesday, October 2, 2007 - 9:09 pm

We had discussed syscall vs filesystem based interface for resource
management [1] and there was a heavy bias favoring filesystem based interface,
based on which the container (now "cgroup") filesystem evolved.

Where we already have one interface defined, I would be against adding 
an equivalent syscall interface.

Note that this "fair-user" scheduling can in theory be accomplished
using the same cgroup based interface, but requires some extra setup in
userspace (either to run a daemon which moves tasks to appropriate
control groups/containers upon their uid change OR to modify initrd to mount 
cgroup filesystem at early bootup time). I expect most distros to enable
CONFIG_FAIR_CGROUP_SCHED (control group based fair group scheduler) and not 
CONFIG_FAIR_USER_SHCED (user id based fair group scheduler). The only
reason why we are providing CONFIG_FAIR_USER_SCHED and the associated
sysfs interface is to help test group scheduler w/o requiring knowledge
of cgroup filesystem.

Reference:

1. http://marc.info/?l=linux-kernel&m=116231242201300&w=2

-- 
Regards,
vatsa
-

From: Dhaval Giani
Date: Wednesday, October 3, 2007 - 10:10 am

Hi Ingo,

Can you please drop commit b1add858a10cece3a68b2d8cb9e7350843700a58 (last
version of this patch) and try this instead?
---
Adds tunables in sysfs to modify a user's cpu share.

A directory is created in sysfs for each new user in the system.

	/sys/kernel/uids/<uid>/cpu_share

Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an
administrator is allowed to modify a user's cpu share.

Ex:
	# cd /sys/kernel/uids/
	# cat 512/cpu_share
	1024
	# echo 2048 > 512/cpu_share
	# cat 512/cpu_share
	2048
	#

Changelog since v1:
1. Added a mutex to serialize directory creation/destruction for a user in
   sysfs
2. Added a spinlock in the task_group structure to serialize writes to
   tg->shares.
3. Removed /proc/root_user_cpu_shares.
4. Added Documentation about the group scheduler.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>

---
 Documentation/sched-design-CFS.txt |   67 ++++++++++
 include/linux/sched.h              |   11 +
 kernel/ksysfs.c                    |    8 +
 kernel/sched.c                     |   14 +-
 kernel/sched_debug.c               |   48 -------
 kernel/user.c                      |  242 ++++++++++++++++++++++++++++++++-----
 6 files changed, 310 insertions(+), 80 deletions(-)

Index: linux-2.6.23-rc8-sched-devel/Documentation/sched-design-CFS.txt
===================================================================
--- linux-2.6.23-rc8-sched-devel.orig/Documentation/sched-design-CFS.txt
+++ linux-2.6.23-rc8-sched-devel/Documentation/sched-design-CFS.txt
@@ -117,3 +117,70 @@ Some implementation details:
    iterators of the scheduling modules are used. The balancing code got
    quite a bit simpler as a result.
 
+
+Group scheduler extension to CFS
+================================
+
+Normally the scheduler operates on individual tasks and strives to provide
+fair CPU time to each task. ...
From: Ingo Molnar
Date: Thursday, October 4, 2007 - 12:57 am

thanks - I have added this to the queue.

i'm wondering about the following: could not (yet) existing UIDs be made 
configurable too? I.e. if i do this in a bootup script:

  echo 2048 > /sys/kernel/uids/500/cpu_share

this should just work too, regardless of there not being any UID 500 
tasks yet. Likewise, once configured, the /sys/kernel/uids/* directories 
(with the settings in them) should probably not go away either.

	Ingo
-

From: Heiko Carstens
Date: Thursday, October 4, 2007 - 1:54 am

Shouldn't that be done via uevents? E.g. UID x gets added to the sysfs tree,
generates a uevent and a script then figures out the cpu_share and sets it.
That way you also don't need to keep the directories. No?
-

From: Bill Davidsen
Date: Thursday, October 4, 2007 - 9:02 am

That sounds complex administratively. It means that instead of setting a 
higher or lower than default once and having it persist until reboot I 
have to create a script, which *will* in some cases be left in place 
even after the need has gone.

I agree with Ingo, once it's done it should be persistent.

And as another administrative convenience I can look at that set of 
values and see what shares are being used, even when the user is not 
currently active.

Final question, how do setuid processes map into this implementation?

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-

From: Srivatsa Vaddagiri
Date: Thursday, October 4, 2007 - 10:20 am

Although the need seems very real, I am thinking about the implementation 
aspect of this in the kernel i.e how will this be implementable?

1. The current patch proposes a sysfs based interface, where a new
   directory is created for every new user created who logs into the 
   system. To meet the requirement Ingo suggested, it would require the
   ability to create directories in sysfs in advance of (user_struct) objects 
   that aren't yet there - which is not possible to implement in sysfs
   afaik

2. configfs seems to allow creation of directories (i.e kernel objects) from 
   userland. Every new directory created should translate to a
   user_struct object being created in the kernel (and inserted in
   uid_hash table). Would this be acceptable?

Also, IMHO, CONFIG_FAIR_USER_SCHED is there only as a toy, to test fair group 
scheduling and I expect distros to support CONFIG_FAIR_CGROUP_SCHED instead 
which allows "control group" (or process containers) based fair group 
scheduling.

Using CONFIG_FAIR_CGROUP_SCHED it is still possible to provide user-id based 
fair group scheduling, in two ways:

	1. Have a daemon which listens for UID change events
	   (PROC_EVENT_UID) and move the task to appropriate "control
	   groups" and set the "control group" shares
	2. Implement a "user" subsystem registered with "cgroup" core,
  	   which automatically creates new "control groups" whenever
 	   a new user is being added to the system. This is very similar
	   to "ns" subsystem (kernel/ns_cgroup.c in -mm tree).
 	   Thus in order to provide fair user scheduling with this option,
	   distro needs to modify initrd to:

		# mkdir /dev/usercpucontrol
		# mount -t cgroup -ouser,cpu none /dev/usercpucontrol

Using a combination of these two options and a /etc configuration file
which specifies the cpu shares to be given to a user, it should be

We seem to be going by the real uid of a task (which is what tsk->user
points at) to decide its CPU bandwidth. Is that a cause of ...
From: Valdis.Kletnieks
Date: Thursday, October 4, 2007 - 2:32 pm

That would tend to be a tad racy - a site may want to set limits in the
hypothetical /sys/kernel/uids/NNN before the program has a chance to fork-bomb
or otherwise make it difficult to set a limitfrom within another userspace
process.  It's similar to why you want a process to be launched with all its
ulimit's set, rather than set them after the fork/exec happens...

From: Srivatsa Vaddagiri
Date: Friday, October 5, 2007 - 12:01 am

Note that there is a default limit enforced on every new user (1024
value for the user's cpu share). This limit should contain any fork-bomb that





-- 
Regards,
vatsa
-

From: Srivatsa Vaddagiri
Date: Tuesday, October 9, 2007 - 8:12 am

Heiko,
	Thanks for the hint. Here's a patch to enable generation of
uevents for user creation/deletion. These uevents can be handled in
userspace to configure a new user's cpu share.

Note : After bootup I could test that new user's cpu share is configured
as per a configuration file (/etc/user_cpu_share.conf). However this
mechanism didnt work for root user. Perhaps uevent for root user is
generated way too early?

A HOWTO text file is also attached explaining how to make use of these
uevents in userspace.

Ingo,
	This patch applies on top of latest sched-devel tree. Pls review
and apply ..


---

Generate uevents when a user is being created/destroyed. These events
could be used to configure cpu share of a new user.

Signed-off-by : Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by : Dhaval Giani <dhaval@linux.vnet.ibm.com>


---
 kernel/user.c |    4 ++++
 1 files changed, 4 insertions(+)

Index: current/kernel/user.c
===================================================================
--- current.orig/kernel/user.c
+++ current/kernel/user.c
@@ -174,6 +174,8 @@ static int user_kobject_create(struct us
 	if (error)
 		kobject_del(kobj);
 
+	kobject_uevent(kobj, KOBJ_ADD);
+
 done:
 	return error;
 }
@@ -189,6 +191,7 @@ int __init uids_kobject_init(void)
 
 	/* create under /sys/kernel dir */
 	uids_kobject.parent = &kernel_subsys.kobj;
+	uids_kobject.kset = &kernel_subsys;
 	kobject_set_name(&uids_kobject, "uids");
 	kobject_init(&uids_kobject);
 
@@ -228,6 +231,7 @@ static void remove_user_sysfs_dir(struct
 		goto done;
 
 	sysfs_remove_file(kobj, &up->user_attr.attr);
+	kobject_uevent(kobj, KOBJ_REMOVE);
 	kobject_del(kobj);
 
 	sched_destroy_user(up);



-- 
Regards,
vatsa
From: Ingo Molnar
Date: Wednesday, October 10, 2007 - 12:42 am

thanks, applied. This looks reassuringly simple and straightforward!

	Ingo
-

From: Dave Jones
Date: Monday, October 1, 2007 - 9:12 am

On Mon, Oct 01, 2007 at 07:34:54PM +0530, Dhaval Giani wrote:
 > Hi Ingo,
 > 
 > Adds tunables in sysfs to modify a user's cpu share.
 > 
 > A directory is created in sysfs for each new user in the system.
 > 
 > 	/sys/kernel/uids/<uid>/cpu_share
 > 
 > Reading this file returns the cpu shares granted for the user.
 > Writing into this file modifies the cpu share for the user. Only an 
 > administrator is allowed to modify a user's cpu share.
 > 
 > Ex: 
 > 	# cd /sys/kernel/uids/
 > 	# cat 512/cpu_share
 > 	1024
 > 	# echo 2048 > 512/cpu_share
 > 	# cat 512/cpu_share
 > 	2048
 > 	#

Can we start adding stuff to Documentation/ for new files created
in sysfs ?  There's so much undocumented stuff in there now that
it's unfunny.

A great start would be 'wtf is a cpu_share and why would I want to
change it' ?

	Dave

-- 
http://www.codemonkey.org.uk
-

From: Srivatsa Vaddagiri
Date: Monday, October 1, 2007 - 9:37 am

Hi Dave,
	We will in the next version of the patch. At this point, the patch was 
sent out to get an idea of whether sysfs is the right place for this interface 

Quick answer before we release the next version : This has to do with
the CONFIG_FAIR_USER_SCHED feature which allows dividing CPU bandwidth
equally between all users of a system. Before this patch, CPU bandwidth was 
divided equally among all non-root users. After this patch, it will
be possible to control CPU bandwidth allocation to each user separately.

For ex: by setting user "vatsa"'s cpu_share to be twice that of user
"guest", it would be possible for user "vatsa" to get 2x CPU bandwidth that of 
user "guest"


--
Regards,
vatsa
-

Previous thread: [ANNOUNCE] The Linux Test Project has been Released for SEPTEMBER 2007 by Subrata Modak on Monday, October 1, 2007 - 6:57 am. (1 message)

Next thread: [PATCH] Sysace: Labels in C code should not be indented. by Grant Likely on Monday, October 1, 2007 - 7:19 am. (1 message)