Re: [PATCH sched-devel] Generate uevents for user creation/destruction

Previous thread: [ANNOUNCE] The Linux Test Project has been Released for SEPTEMBER 2007 by Subrata Modak on Monday, October 1, 2007 - 9:57 am. (1 message)

Next thread: [PATCH] Sysace: Labels in C code should not be indented. by Grant Likely on Monday, October 1, 2007 - 10:19 am. (1 message)
To: Ingo Molnar <mingo@...>
Cc: Srivatsa Vaddagiri <vatsa@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Monday, October 1, 2007 - 10:04 am

Hi Ingo,

Adds tunables in sysfs to modify a user's cpu share.

A directory is created in sysfs for each new user in the system.

/sys/kernel/uids/<uid>/cpu_share

Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an
administrator is allowed to modify a user's cpu share.

Ex:
# cd /sys/kernel/uids/
# cat 512/cpu_share
1024
# echo 2048 > 512/cpu_share
# cat 512/cpu_share
2048
#

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>

---
include/linux/sched.h | 11 +++++
kernel/ksysfs.c | 4 +
kernel/sched.c | 5 ++
kernel/user.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 129 insertions(+), 1 deletion(-)

Index: linux-2.6.23-rc8-sched-devel/include/linux/sched.h
===================================================================
--- linux-2.6.23-rc8-sched-devel.orig/include/linux/sched.h
+++ linux-2.6.23-rc8-sched-devel/include/linux/sched.h
@@ -86,6 +86,7 @@ struct sched_param {
#include <linux/timer.h>
#include <linux/hrtimer.h>
#include <linux/task_io_accounting.h>
+#include <linux/kobject.h>

#include <asm/processor.h>

@@ -598,9 +599,18 @@ struct user_struct {

#ifdef CONFIG_FAIR_USER_SCHED
struct task_group *tg;
+ struct kset kset;
+ struct subsys_attribute user_attr;
+ struct work_struct work;
#endif
};

+#ifdef CONFIG_FAIR_USER_SCHED
+extern int uids_kobject_init(void);
+#else
+static inline int uids_kobject_init(void) { return 0; }
+#endif
+
extern struct user_struct *find_user(uid_t);

extern struct user_struct root_user;
@@ -1846,6 +1856,7 @@ extern struct task_group *sched_create_g
extern void sched_destroy_group(struct task_group *tg);
extern void sched_move_task(struct task_struct *tsk);
extern int sched_group_set_shares(struct task_group *tg, unsigned lon...

To: Dhaval Giani <dhaval@...>
Cc: Ingo Molnar <mingo@...>, Srivatsa Vaddagiri <vatsa@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Monday, October 1, 2007 - 12:12 pm

On Mon, Oct 01, 2007 at 07:34:54PM +0530, Dhaval Giani wrote:
> Hi Ingo,
>
> Adds tunables in sysfs to modify a user's cpu share.
>
> A directory is created in sysfs for each new user in the system.
>
> /sys/kernel/uids/<uid>/cpu_share
>
> Reading this file returns the cpu shares granted for the user.
> Writing into this file modifies the cpu share for the user. Only an
> administrator is allowed to modify a user's cpu share.
>
> Ex:
> # cd /sys/kernel/uids/
> # cat 512/cpu_share
> 1024
> # echo 2048 > 512/cpu_share
> # cat 512/cpu_share
> 2048
> #

Can we start adding stuff to Documentation/ for new files created
in sysfs ? There's so much undocumented stuff in there now that
it's unfunny.

A great start would be 'wtf is a cpu_share and why would I want to
change it' ?

Dave

--
http://www.codemonkey.org.uk
-

To: Dave Jones <davej@...>, Dhaval Giani <dhaval@...>, Ingo Molnar <mingo@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Monday, October 1, 2007 - 12:37 pm

Hi Dave,
We will in the next version of the patch. At this point, the patch was
sent out to get an idea of whether sysfs is the right place for this interface

Quick answer before we release the next version : This has to do with
the CONFIG_FAIR_USER_SCHED feature which allows dividing CPU bandwidth
equally between all users of a system. Before this patch, CPU bandwidth was
divided equally among all non-root users. After this patch, it will
be possible to control CPU bandwidth allocation to each user separately.

For ex: by setting user "vatsa"'s cpu_share to be twice that of user
"guest", it would be possible for user "vatsa" to get 2x CPU bandwidth that of
user "guest"

--
Regards,
vatsa
-

To: Dhaval Giani <dhaval@...>
Cc: Srivatsa Vaddagiri <vatsa@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Monday, October 1, 2007 - 10:44 am

hi Dhaval,

looks good to me! I think this API is pretty straightforward. I've put
this into my tree and have updated the sched-devel git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

but there seems to be a bug in it, i get this:

kobject_add failed for 500 with -EEXIST, don't try to register things with the same name in the same directory.
[<c01e1730>] kobject_shadow_add+0x13b/0x169
[<c01e1795>] kobject_set_name+0x28/0x91
[<c01299d5>] user_kobject_create+0x6a/0x90
[<c0129d3c>] alloc_uid+0x141/0x181
[<c012d05c>] set_user+0x1c/0x91
[<c012e4c7>] sys_setresuid+0xd9/0x17d
[<c0103cf6>] sysenter_past_esp+0x5f/0x85
=======================

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: Srivatsa Vaddagiri <vatsa@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Wednesday, October 3, 2007 - 1:10 pm

Hi Ingo,

Can you please drop commit b1add858a10cece3a68b2d8cb9e7350843700a58 (last
version of this patch) and try this instead?
---
Adds tunables in sysfs to modify a user's cpu share.

A directory is created in sysfs for each new user in the system.

/sys/kernel/uids/<uid>/cpu_share

Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an
administrator is allowed to modify a user's cpu share.

Ex:
# cd /sys/kernel/uids/
# cat 512/cpu_share
1024
# echo 2048 > 512/cpu_share
# cat 512/cpu_share
2048
#

Changelog since v1:
1. Added a mutex to serialize directory creation/destruction for a user in
sysfs
2. Added a spinlock in the task_group structure to serialize writes to
tg->shares.
3. Removed /proc/root_user_cpu_shares.
4. Added Documentation about the group scheduler.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>

---
Documentation/sched-design-CFS.txt | 67 ++++++++++
include/linux/sched.h | 11 +
kernel/ksysfs.c | 8 +
kernel/sched.c | 14 +-
kernel/sched_debug.c | 48 -------
kernel/user.c | 242 ++++++++++++++++++++++++++++++++-----
6 files changed, 310 insertions(+), 80 deletions(-)

Index: linux-2.6.23-rc8-sched-devel/Documentation/sched-design-CFS.txt
===================================================================
--- linux-2.6.23-rc8-sched-devel.orig/Documentation/sched-design-CFS.txt
+++ linux-2.6.23-rc8-sched-devel/Documentation/sched-design-CFS.txt
@@ -117,3 +117,70 @@ Some implementation details:
iterators of the scheduling modules are used. The balancing code got
quite a bit simpler as a result.

+
+Group scheduler extension to CFS
+================================
+
+Normally the scheduler operates on individual tasks and strives to provide
+fair CPU...

To: Dhaval Giani <dhaval@...>
Cc: Srivatsa Vaddagiri <vatsa@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Thursday, October 4, 2007 - 3:57 am

thanks - I have added this to the queue.

i'm wondering about the following: could not (yet) existing UIDs be made
configurable too? I.e. if i do this in a bootup script:

echo 2048 > /sys/kernel/uids/500/cpu_share

this should just work too, regardless of there not being any UID 500
tasks yet. Likewise, once configured, the /sys/kernel/uids/* directories
(with the settings in them) should probably not go away either.

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: Dhaval Giani <dhaval@...>, Srivatsa Vaddagiri <vatsa@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Thursday, October 4, 2007 - 4:54 am

Shouldn't that be done via uevents? E.g. UID x gets added to the sysfs tree,
generates a uevent and a script then figures out the cpu_share and sets it.
That way you also don't need to keep the directories. No?
-

To: Heiko Carstens <heiko.carstens@...>
Cc: Ingo Molnar <mingo@...>, Dhaval Giani <dhaval@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Tuesday, October 9, 2007 - 11:12 am

Heiko,
Thanks for the hint. Here's a patch to enable generation of
uevents for user creation/deletion. These uevents can be handled in
userspace to configure a new user's cpu share.

Note : After bootup I could test that new user's cpu share is configured
as per a configuration file (/etc/user_cpu_share.conf). However this
mechanism didnt work for root user. Perhaps uevent for root user is
generated way too early?

A HOWTO text file is also attached explaining how to make use of these
uevents in userspace.

Ingo,
This patch applies on top of latest sched-devel tree. Pls review
and apply ..

---

Generate uevents when a user is being created/destroyed. These events
could be used to configure cpu share of a new user.

Signed-off-by : Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by : Dhaval Giani <dhaval@linux.vnet.ibm.com>

---
kernel/user.c | 4 ++++
1 files changed, 4 insertions(+)

Index: current/kernel/user.c
===================================================================
--- current.orig/kernel/user.c
+++ current/kernel/user.c
@@ -174,6 +174,8 @@ static int user_kobject_create(struct us
if (error)
kobject_del(kobj);

+ kobject_uevent(kobj, KOBJ_ADD);
+
done:
return error;
}
@@ -189,6 +191,7 @@ int __init uids_kobject_init(void)

/* create under /sys/kernel dir */
uids_kobject.parent = &kernel_subsys.kobj;
+ uids_kobject.kset = &kernel_subsys;
kobject_set_name(&uids_kobject, "uids");
kobject_init(&uids_kobject);

@@ -228,6 +231,7 @@ static void remove_user_sysfs_dir(struct
goto done;

sysfs_remove_file(kobj, &up->user_attr.attr);
+ kobject_uevent(kobj, KOBJ_REMOVE);
kobject_del(kobj);

sched_destroy_user(up);

--
Regards,
vatsa

To: Srivatsa Vaddagiri <vatsa@...>
Cc: Heiko Carstens <heiko.carstens@...>, Dhaval Giani <dhaval@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Wednesday, October 10, 2007 - 3:42 am

thanks, applied. This looks reassuringly simple and straightforward!

Ingo
-

To: Heiko Carstens <heiko.carstens@...>
Cc: Ingo Molnar <mingo@...>, Dhaval Giani <dhaval@...>, Srivatsa Vaddagiri <vatsa@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Thursday, October 4, 2007 - 5:32 pm

That would tend to be a tad racy - a site may want to set limits in the
hypothetical /sys/kernel/uids/NNN before the program has a chance to fork-bomb
or otherwise make it difficult to set a limitfrom within another userspace
process. It's similar to why you want a process to be launched with all its
ulimit's set, rather than set them after the fork/exec happens...

To: <Valdis.Kletnieks@...>
Cc: Heiko Carstens <heiko.carstens@...>, Ingo Molnar <mingo@...>, Dhaval Giani <dhaval@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Friday, October 5, 2007 - 3:01 am

Note that there is a default limit enforced on every new user (1024
value for the user's cpu share). This limit should contain any fork-bomb that

--
Regards,
vatsa
-

To: Heiko Carstens <heiko.carstens@...>
Cc: Ingo Molnar <mingo@...>, Dhaval Giani <dhaval@...>, Srivatsa Vaddagiri <vatsa@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Thursday, October 4, 2007 - 12:02 pm

That sounds complex administratively. It means that instead of setting a
higher or lower than default once and having it persist until reboot I
have to create a script, which *will* in some cases be left in place
even after the need has gone.

I agree with Ingo, once it's done it should be persistent.

And as another administrative convenience I can look at that set of
values and see what shares are being used, even when the user is not
currently active.

Final question, how do setuid processes map into this implementation?

--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
-

To: Bill Davidsen <davidsen@...>
Cc: Heiko Carstens <heiko.carstens@...>, Ingo Molnar <mingo@...>, Dhaval Giani <dhaval@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Thursday, October 4, 2007 - 1:20 pm

Although the need seems very real, I am thinking about the implementation
aspect of this in the kernel i.e how will this be implementable?

1. The current patch proposes a sysfs based interface, where a new
directory is created for every new user created who logs into the
system. To meet the requirement Ingo suggested, it would require the
ability to create directories in sysfs in advance of (user_struct) objects
that aren't yet there - which is not possible to implement in sysfs
afaik

2. configfs seems to allow creation of directories (i.e kernel objects) from
userland. Every new directory created should translate to a
user_struct object being created in the kernel (and inserted in
uid_hash table). Would this be acceptable?

Also, IMHO, CONFIG_FAIR_USER_SCHED is there only as a toy, to test fair group
scheduling and I expect distros to support CONFIG_FAIR_CGROUP_SCHED instead
which allows "control group" (or process containers) based fair group
scheduling.

Using CONFIG_FAIR_CGROUP_SCHED it is still possible to provide user-id based
fair group scheduling, in two ways:

1. Have a daemon which listens for UID change events
(PROC_EVENT_UID) and move the task to appropriate "control
groups" and set the "control group" shares
2. Implement a "user" subsystem registered with "cgroup" core,
which automatically creates new "control groups" whenever
a new user is being added to the system. This is very similar
to "ns" subsystem (kernel/ns_cgroup.c in -mm tree).
Thus in order to provide fair user scheduling with this option,
distro needs to modify initrd to:

# mkdir /dev/usercpucontrol
# mount -t cgroup -ouser,cpu none /dev/usercpucontrol

Using a combination of these two options and a /etc configuration file
which specifies the cpu shares to be given to a user, it should be

We seem to be going by the real uid of a task (which is what tsk->user
points at) to decide its CPU bandwidth. Is that a cause of conc...

To: Ingo Molnar <mingo@...>
Cc: Dhaval Giani <dhaval@...>, Srivatsa Vaddagiri <vatsa@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Tuesday, October 2, 2007 - 6:12 pm

While a sysfs interface is OK and somewhat orthogonal to the interface
proposed the containers patches, I think maybe a new syscall should be
considered.

Since we now have a fair share cpu scheduler, maybe an interface to
specify the cpu share directly (alternatively to priority) make sense.

For processes, it may become more intuitive (and precise) to set the
processing share directly than setting a priority which is converted to
a share.

Maybe something similar to ioprio_set() and ioprio_get() syscalls:

- per user cpu share
- per user group cpu share
- per process cpu share
- per process group cpu share

Best regards,

- Eric

-

To: Eric St-Laurent <ericstl34@...>
Cc: Ingo Molnar <mingo@...>, Dhaval Giani <dhaval@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Wednesday, October 3, 2007 - 12:09 am

We had discussed syscall vs filesystem based interface for resource
management [1] and there was a heavy bias favoring filesystem based interface,
based on which the container (now "cgroup") filesystem evolved.

Where we already have one interface defined, I would be against adding
an equivalent syscall interface.

Note that this "fair-user" scheduling can in theory be accomplished
using the same cgroup based interface, but requires some extra setup in
userspace (either to run a daemon which moves tasks to appropriate
control groups/containers upon their uid change OR to modify initrd to mount
cgroup filesystem at early bootup time). I expect most distros to enable
CONFIG_FAIR_CGROUP_SCHED (control group based fair group scheduler) and not
CONFIG_FAIR_USER_SHCED (user id based fair group scheduler). The only
reason why we are providing CONFIG_FAIR_USER_SCHED and the associated
sysfs interface is to help test group scheduler w/o requiring knowledge
of cgroup filesystem.

Reference:

1. http://marc.info/?l=linux-kernel&m=116231242201300&w=2

--
Regards,
vatsa
-

To: Ingo Molnar <mingo@...>
Cc: Dhaval Giani <dhaval@...>, Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Dmitry Adamushko <dmitry.adamushko@...>, lkml <linux-kernel@...>, <maneesh@...>, Andrew Morton <akpm@...>, Sudhir Kumar <skumar@...>
Date: Monday, October 1, 2007 - 11:32 am

Looks like a remove_dir/create_dir race ..

free_user() alloc_uid()

uid_hash_remove(up);

uid_hash_insert(new, hashent);

schedule_work()
user_kobject_create();
kobject_add(); -> Boom!

remove_user_sysfs_dir();

/me looks up sysfs programming examples ..

--
Regards,
vatsa
-

Previous thread: [ANNOUNCE] The Linux Test Project has been Released for SEPTEMBER 2007 by Subrata Modak on Monday, October 1, 2007 - 9:57 am. (1 message)

Next thread: [PATCH] Sysace: Labels in C code should not be indented. by Grant Likely on Monday, October 1, 2007 - 10:19 am. (1 message)