Hi Ingo,
Adds tunables in sysfs to modify a user's cpu share.
A directory is created in sysfs for each new user in the system.
/sys/kernel/uids/<uid>/cpu_share
Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an
administrator is allowed to modify a user's cpu share.
Ex:
# cd /sys/kernel/uids/
# cat 512/cpu_share
1024
# echo 2048 > 512/cpu_share
# cat 512/cpu_share
2048
#
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
---
include/linux/sched.h | 11 +++++
kernel/ksysfs.c | 4 +
kernel/sched.c | 5 ++
kernel/user.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 129 insertions(+), 1 deletion(-)
Index: linux-2.6.23-rc8-sched-devel/include/linux/sched.h
===================================================================
--- linux-2.6.23-rc8-sched-devel.orig/include/linux/sched.h
+++ linux-2.6.23-rc8-sched-devel/include/linux/sched.h
@@ -86,6 +86,7 @@ struct sched_param {
#include <linux/timer.h>
#include <linux/hrtimer.h>
#include <linux/task_io_accounting.h>
+#include <linux/kobject.h>
#include <asm/processor.h>
@@ -598,9 +599,18 @@ struct user_struct {
#ifdef CONFIG_FAIR_USER_SCHED
struct task_group *tg;
+ struct kset kset;
+ struct subsys_attribute user_attr;
+ struct work_struct work;
#endif
};
+#ifdef CONFIG_FAIR_USER_SCHED
+extern int uids_kobject_init(void);
+#else
+static inline int uids_kobject_init(void) { return 0; }
+#endif
+
extern struct user_struct *find_user(uid_t);
extern struct user_struct root_user;
@@ -1846,6 +1856,7 @@ extern struct task_group *sched_create_g
extern void sched_destroy_group(struct task_group *tg);
extern void sched_move_task(struct task_struct *tsk);
extern int sched_group_set_shares(struct task_group *tg, unsigned long shares);
+extern unsigned long ...hi Dhaval, looks good to me! I think this API is pretty straightforward. I've put this into my tree and have updated the sched-devel git tree: git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git but there seems to be a bug in it, i get this: kobject_add failed for 500 with -EEXIST, don't try to register things with the same name in the same directory. [<c01e1730>] kobject_shadow_add+0x13b/0x169 [<c01e1795>] kobject_set_name+0x28/0x91 [<c01299d5>] user_kobject_create+0x6a/0x90 [<c0129d3c>] alloc_uid+0x141/0x181 [<c012d05c>] set_user+0x1c/0x91 [<c012e4c7>] sys_setresuid+0xd9/0x17d [<c0103cf6>] sysenter_past_esp+0x5f/0x85 ======================= Ingo -
Looks like a remove_dir/create_dir race .. free_user() alloc_uid() uid_hash_remove(up); uid_hash_insert(new, hashent); schedule_work() user_kobject_create(); kobject_add(); -> Boom! remove_user_sysfs_dir(); /me looks up sysfs programming examples .. -- Regards, vatsa -
While a sysfs interface is OK and somewhat orthogonal to the interface proposed the containers patches, I think maybe a new syscall should be considered. Since we now have a fair share cpu scheduler, maybe an interface to specify the cpu share directly (alternatively to priority) make sense. For processes, it may become more intuitive (and precise) to set the processing share directly than setting a priority which is converted to a share. Maybe something similar to ioprio_set() and ioprio_get() syscalls: - per user cpu share - per user group cpu share - per process cpu share - per process group cpu share Best regards, - Eric -
We had discussed syscall vs filesystem based interface for resource management [1] and there was a heavy bias favoring filesystem based interface, based on which the container (now "cgroup") filesystem evolved. Where we already have one interface defined, I would be against adding an equivalent syscall interface. Note that this "fair-user" scheduling can in theory be accomplished using the same cgroup based interface, but requires some extra setup in userspace (either to run a daemon which moves tasks to appropriate control groups/containers upon their uid change OR to modify initrd to mount cgroup filesystem at early bootup time). I expect most distros to enable CONFIG_FAIR_CGROUP_SCHED (control group based fair group scheduler) and not CONFIG_FAIR_USER_SHCED (user id based fair group scheduler). The only reason why we are providing CONFIG_FAIR_USER_SCHED and the associated sysfs interface is to help test group scheduler w/o requiring knowledge of cgroup filesystem. Reference: 1. http://marc.info/?l=linux-kernel&m=116231242201300&w=2 -- Regards, vatsa -
Hi Ingo,
Can you please drop commit b1add858a10cece3a68b2d8cb9e7350843700a58 (last
version of this patch) and try this instead?
---
Adds tunables in sysfs to modify a user's cpu share.
A directory is created in sysfs for each new user in the system.
/sys/kernel/uids/<uid>/cpu_share
Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an
administrator is allowed to modify a user's cpu share.
Ex:
# cd /sys/kernel/uids/
# cat 512/cpu_share
1024
# echo 2048 > 512/cpu_share
# cat 512/cpu_share
2048
#
Changelog since v1:
1. Added a mutex to serialize directory creation/destruction for a user in
sysfs
2. Added a spinlock in the task_group structure to serialize writes to
tg->shares.
3. Removed /proc/root_user_cpu_shares.
4. Added Documentation about the group scheduler.
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
---
Documentation/sched-design-CFS.txt | 67 ++++++++++
include/linux/sched.h | 11 +
kernel/ksysfs.c | 8 +
kernel/sched.c | 14 +-
kernel/sched_debug.c | 48 -------
kernel/user.c | 242 ++++++++++++++++++++++++++++++++-----
6 files changed, 310 insertions(+), 80 deletions(-)
Index: linux-2.6.23-rc8-sched-devel/Documentation/sched-design-CFS.txt
===================================================================
--- linux-2.6.23-rc8-sched-devel.orig/Documentation/sched-design-CFS.txt
+++ linux-2.6.23-rc8-sched-devel/Documentation/sched-design-CFS.txt
@@ -117,3 +117,70 @@ Some implementation details:
iterators of the scheduling modules are used. The balancing code got
quite a bit simpler as a result.
+
+Group scheduler extension to CFS
+================================
+
+Normally the scheduler operates on individual tasks and strives to provide
+fair CPU time to each task. ...thanks - I have added this to the queue. i'm wondering about the following: could not (yet) existing UIDs be made configurable too? I.e. if i do this in a bootup script: echo 2048 > /sys/kernel/uids/500/cpu_share this should just work too, regardless of there not being any UID 500 tasks yet. Likewise, once configured, the /sys/kernel/uids/* directories (with the settings in them) should probably not go away either. Ingo -
Shouldn't that be done via uevents? E.g. UID x gets added to the sysfs tree, generates a uevent and a script then figures out the cpu_share and sets it. That way you also don't need to keep the directories. No? -
That sounds complex administratively. It means that instead of setting a higher or lower than default once and having it persist until reboot I have to create a script, which *will* in some cases be left in place even after the need has gone. I agree with Ingo, once it's done it should be persistent. And as another administrative convenience I can look at that set of values and see what shares are being used, even when the user is not currently active. Final question, how do setuid processes map into this implementation? -- Bill Davidsen <davidsen@tmr.com> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot -
Although the need seems very real, I am thinking about the implementation aspect of this in the kernel i.e how will this be implementable? 1. The current patch proposes a sysfs based interface, where a new directory is created for every new user created who logs into the system. To meet the requirement Ingo suggested, it would require the ability to create directories in sysfs in advance of (user_struct) objects that aren't yet there - which is not possible to implement in sysfs afaik 2. configfs seems to allow creation of directories (i.e kernel objects) from userland. Every new directory created should translate to a user_struct object being created in the kernel (and inserted in uid_hash table). Would this be acceptable? Also, IMHO, CONFIG_FAIR_USER_SCHED is there only as a toy, to test fair group scheduling and I expect distros to support CONFIG_FAIR_CGROUP_SCHED instead which allows "control group" (or process containers) based fair group scheduling. Using CONFIG_FAIR_CGROUP_SCHED it is still possible to provide user-id based fair group scheduling, in two ways: 1. Have a daemon which listens for UID change events (PROC_EVENT_UID) and move the task to appropriate "control groups" and set the "control group" shares 2. Implement a "user" subsystem registered with "cgroup" core, which automatically creates new "control groups" whenever a new user is being added to the system. This is very similar to "ns" subsystem (kernel/ns_cgroup.c in -mm tree). Thus in order to provide fair user scheduling with this option, distro needs to modify initrd to: # mkdir /dev/usercpucontrol # mount -t cgroup -ouser,cpu none /dev/usercpucontrol Using a combination of these two options and a /etc configuration file which specifies the cpu shares to be given to a user, it should be We seem to be going by the real uid of a task (which is what tsk->user points at) to decide its CPU bandwidth. Is that a cause of ...
That would tend to be a tad racy - a site may want to set limits in the hypothetical /sys/kernel/uids/NNN before the program has a chance to fork-bomb or otherwise make it difficult to set a limitfrom within another userspace process. It's similar to why you want a process to be launched with all its ulimit's set, rather than set them after the fork/exec happens...
Note that there is a default limit enforced on every new user (1024 value for the user's cpu share). This limit should contain any fork-bomb that -- Regards, vatsa -
Heiko, Thanks for the hint. Here's a patch to enable generation of uevents for user creation/deletion. These uevents can be handled in userspace to configure a new user's cpu share. Note : After bootup I could test that new user's cpu share is configured as per a configuration file (/etc/user_cpu_share.conf). However this mechanism didnt work for root user. Perhaps uevent for root user is generated way too early? A HOWTO text file is also attached explaining how to make use of these uevents in userspace. Ingo, This patch applies on top of latest sched-devel tree. Pls review and apply .. --- Generate uevents when a user is being created/destroyed. These events could be used to configure cpu share of a new user. Signed-off-by : Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by : Dhaval Giani <dhaval@linux.vnet.ibm.com> --- kernel/user.c | 4 ++++ 1 files changed, 4 insertions(+) Index: current/kernel/user.c =================================================================== --- current.orig/kernel/user.c +++ current/kernel/user.c @@ -174,6 +174,8 @@ static int user_kobject_create(struct us if (error) kobject_del(kobj); + kobject_uevent(kobj, KOBJ_ADD); + done: return error; } @@ -189,6 +191,7 @@ int __init uids_kobject_init(void) /* create under /sys/kernel dir */ uids_kobject.parent = &kernel_subsys.kobj; + uids_kobject.kset = &kernel_subsys; kobject_set_name(&uids_kobject, "uids"); kobject_init(&uids_kobject); @@ -228,6 +231,7 @@ static void remove_user_sysfs_dir(struct goto done; sysfs_remove_file(kobj, &up->user_attr.attr); + kobject_uevent(kobj, KOBJ_REMOVE); kobject_del(kobj); sched_destroy_user(up); -- Regards, vatsa
thanks, applied. This looks reassuringly simple and straightforward! Ingo -
On Mon, Oct 01, 2007 at 07:34:54PM +0530, Dhaval Giani wrote: > Hi Ingo, > > Adds tunables in sysfs to modify a user's cpu share. > > A directory is created in sysfs for each new user in the system. > > /sys/kernel/uids/<uid>/cpu_share > > Reading this file returns the cpu shares granted for the user. > Writing into this file modifies the cpu share for the user. Only an > administrator is allowed to modify a user's cpu share. > > Ex: > # cd /sys/kernel/uids/ > # cat 512/cpu_share > 1024 > # echo 2048 > 512/cpu_share > # cat 512/cpu_share > 2048 > # Can we start adding stuff to Documentation/ for new files created in sysfs ? There's so much undocumented stuff in there now that it's unfunny. A great start would be 'wtf is a cpu_share and why would I want to change it' ? Dave -- http://www.codemonkey.org.uk -
Hi Dave, We will in the next version of the patch. At this point, the patch was sent out to get an idea of whether sysfs is the right place for this interface Quick answer before we release the next version : This has to do with the CONFIG_FAIR_USER_SCHED feature which allows dividing CPU bandwidth equally between all users of a system. Before this patch, CPU bandwidth was divided equally among all non-root users. After this patch, it will be possible to control CPU bandwidth allocation to each user separately. For ex: by setting user "vatsa"'s cpu_share to be twice that of user "guest", it would be possible for user "vatsa" to get 2x CPU bandwidth that of user "guest" -- Regards, vatsa -
