Re: [RFC] libcg: design and plans

Previous thread: [PATCH] Make Andrew's life easier by Matthew Wilcox on Tuesday, March 4, 2008 - 8:02 am. (1 message)

Next thread: 我最喜歡聊天、唱歌、看電 by 吳孟恩 on Tuesday, March 4, 2008 - 8:43 am. (1 message)
From: Dhaval Giani
Date: Tuesday, March 4, 2008 - 8:23 am

Hi,

We have been working on a library for control groups which would provide
simple APIs for programmers to utilize from userspace and make use of
control groups.

We are still designing the library and the APIs. I've attached the
design (as of now) to get some feedback from the community whether we
are heading in the correct direction and what else should be addressed.

We have a project on sourceforge.net at
https://sourceforge.net/projects/libcg and the mailing list (cc'd here)
can be found at https://lists.sourceforge.net/lists/listinfo/libcg-devel

Thanks,
--

libcg
1. Aims/Requirements
2. Design
3. APIs
4. Configuration Scheme

1. Aims/Requirements

1.1 What are Control Groups

Control Groups provide a mechanism for aggregating/partitioning sets of
tasks, and all their future children, into hierarchical groups with
specialized behaviour [1]. It makes use of a filesystem interface.

1.2 Aims of libcg

libcg aims to provide programmers easily usable APIs to use the control
group file system. It should satisfy the following requirements

1.2.1. Provide a programmable interface for cgroups

This should allow applications to create cgroups using something like
create_cgroup() as opposed to having to go the whole filesystem route.

1.2.2. Provide persistent configuration across reboots

Control Groups have a lifetime of only one boot cycle. The configuration
is lost at reboot. Userspace needs to handle this issue. This is handled
by libcg

1.2.3. Provide a programmable interface for manipulating configurations

This should allow libcg to handle changing application requirements. For
example, while gaming, you might want to reduce the cpu power of other
groups whereas othertimes you would want greater CPU power for those
groups.

2. Design

2.1 Architecture

2.1.1 Global overview

libcg will be consumed in the following fashion

			---------------------------
			|      applications	  |
			---------------------------
			|	  libcg		  ...
From: Xpl++
Date: Tuesday, March 4, 2008 - 10:15 am

Hi,

I was wonder if creating such library makes any sense at all, 
considering the nature of cgroups, the way they work and their possible 
application?
It seems to me that any attempt to create a 'simple' API would actualy 
result in something that will be much harder to use that just making raw 
mkdir/open/read/write/close operations. Another thing is suggested 
config for this lib would be more appropriate for a daemon rather than a 
library.
In general - cgroup is a very flexible subsystem that can be used in a 
wide variety of ways and modes and trying to create a universal simple 
API would more likely result in something hard to manage and work with.
The idea of having multiple container managers (applications that use 
libcg) creates a lots of questions and possible issues. Containers are 
supposed provide a flexible resource management and task grouping 
ability, which somewhat implies that there cannot be two different 
resource managers per node without one knowing well the 
desires/goals/methods of the other and vice versa. And should there be 
only one manager per node - probably it will be easier for it to use 
cgroup subsystem directly rather than using a wrapper library?

Regards,
Peter Litov.

--

From: Balbir Singh
Date: Tuesday, March 4, 2008 - 9:48 pm

The configuration file is important, since it allows us two levels of control.
At one level the system administrator and at the other level applications. Each
application can maintain it's own hierarchy across reboots.

We thought of a daemon, but there were several overheads and disadvantages. For
one, we needed an IPC mechanism to communicate every client request to the
daemon, the client being the application. The daemon also becomes the single
point of failure for all applications. Moreover, we want to provide the ability
to programmatically update the configuration. A daemon, if desired can be built

I agree that cgroups is flexible, but why do you think abstracting common tasks
amongst applications will be hard to manage and work with? We want to provide
API that will allow us to fill in parameters and say -- go create this group or

We don't assume that there cannot be two different resource managers per node.
We don't enforce any policy, we just allow for easy creation and manipulation of
control groups hierarchially.


Could you please elaborate, why is it probably easier?

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
--

From: Dhaval Giani
Date: Tuesday, March 4, 2008 - 10:26 pm

Hi Peter,

Thanks for your comments.


These simple APIs are nothing but raw mkdir/open/read/write/close

I disagree. Allowing multiple resource managers allows more flexibility.
One thing the configuration subsystem aims to do is to allow permissions
to the groups. With this happening, a resource manager does not need to
about the existence of other groups, it can control only the resources
allotted to it. And since the top level is controlled by the
administrator, he is aware of the groups and their needs. (I've given an
example of such a scenario in the document as well).

Hope this helps answer your question.

Thanks,
-- 
regards,
Dhaval
--

From: Xpl++
Date: Wednesday, March 5, 2008 - 4:56 am

Hi Dhaval,

Then why do you have to make the api? Everybody knows how to user these 
syscalls :)
Imagine having a shared/joint household savings account with your wife, 
and taking money from it without your wife knowing and vice versa .. 
then at some point when you thought that you have some $5K to buy the 
new super duper laptop you dreamt about your entire life - surprise - no 
enough resources :)
This is somewhat the equivalent of multiple independent resourse 
managers :) It won't end well :)
Should they be expected to be adequate in doing their job, they cannot 
be independent since they manage a shared resource.

Regards,
Peter.
--

From: Dhaval Giani
Date: Wednesday, March 5, 2008 - 8:53 am

Well, that's just a small subset of the APIs. There are a lot of other

I don't quite agree with your analogy here. The point here is that each
resource manager operates in its own area, and has already been assigned
some resources which it cannot change. Its more like you can take $x at
the most from the account and your wife $y with x+y<=total money.

-- 
regards,
Dhaval
--

From: Xpl++
Date: Wednesday, March 5, 2008 - 12:36 pm

Hi Dhaval,

Ok .. so imagine that your kid got sick and you need more than $x, while 
you wife does not need any money at that particular moment?
Would you:

a) take a loan (buy more hardware/resources, just because we can), 
despite the fact you already have them in you account, and since we're 
talking about your child it's in the interest of both you and your wife 
(tha family, 'the whole' so to speak) to cure the illness

b) notify your wife there is an emergency and you will need some extra 
money which has to come from her $y quota, take the money, make the kid 
happy, and just continue business as usual? (that is - being smart and 
dynamic in resource allocation)

With the proposed libcg, answer a) seems to be the only option .. and my 
company is not like M$ so I don't have extra resources to waste just 
because I was told resources cannot be managed dynamicaly :)
Why would one need to manage node resources dynamicaly: in our real-life 
production system we have to manage resources dynamicaly, because any 
other solution would require at least twice as much hardware, which will 
also inevitably lead to a necessity to hire more qualified admins, which 
is once again not so wise for small/medium business .. and not to 
mention the extra CO2 caused by the few more dozens of servers :)

Regards,
Peter.
--

From: Dave Hansen
Date: Tuesday, March 4, 2008 - 11:05 am

What kinds of things are lacking the the cgroup interface that you want
to make up for in this library?

-- Dave

--

From: Paul Menage
Date: Tuesday, March 4, 2008 - 11:15 pm

Hi Dhaval,


There are a few things that it would be nice to include in such a
library, if you're going to develop one:

- the ability to create abstract groups of processes, and resource
groups, and have the ability to tie these together arbitrarily. E.g
you might create abstract groups A, B and C, and be able to say that A
and B share memory with each other but not with C, and all three
groups are isolated from each other for CPU. Then libcg would mount
different resource types in different cgroup hierarchies (you would
probably tell it ahead of time which combinations of sharing you would
want, in order that it could minimize the number of mounted
hierarchies). When you tell libcg to move a process into abstract
group A, it would move it into the appropriate resource group in each
hierarchy.

- an interface for gathering usage stats from cgroups.

- support for dynamically migrating processes between groups based on
process connector events (i.e. a finished version of the daemon that
you were working on last year)

Paul
--

From: Denis V. Lunev
Date: Wednesday, March 5, 2008 - 12:17 am

There is one more important thing. In addition to the processes you must
unite or provide a way to unite other objects like sockets. This is
needed to create a group-based socket buffer management.

The mapping between socket and a process does not exists right now and,
we can have (virtually), sockets from from different namespaces in one
process.

Regards,
    Den

--

From: Balbir Singh
Date: Wednesday, March 5, 2008 - 4:48 am

Not sure how any of this is related to the library design we are discussing.
Your talking about writing a controller that groups based on sockets, that is a
totally different thing.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
--

From: Dhaval Giani
Date: Wednesday, March 5, 2008 - 3:33 am

I am not very clear about what you are asking for here, so let me try to
rephrase it, and if I have understood it correctly, we can move further
ahead from there.

So there are two different points, /mem and /cpu. /mem has A and C and
/cpu has A, B and C. A and B of /cpu correspond to A of /mem and the C's
are the same. With this is mind, if I say a task should move to B in

Yes, that is a todo. We should get around to it as the functionality

libcg is at a lower level than this. The dynamic migration of processes
can be based on top of libcg, and exploit it (and be more powerful than
the daemon I posted last year) It would be able to utilize the
configuration and other capabilities of libcg.

Thanks,
-- 
regards,
Dhaval
--

From: Paul Menage
Date: Wednesday, March 5, 2008 - 3:41 am

Maybe clearer to say that /mem has two cgroups, AB and C. The
abstraction provided by libcg would be of three groups, A, B and C.
Asking libcg to move a process to abstract group B would result it
moving to /mem/AB and /cpu/B

Paul
--

From: Dhaval Giani
Date: Wednesday, March 5, 2008 - 4:07 am

OK. Hmm, I've not really thought about it. At first thought, it should
not be very difficult. Only thing I am not sure is the arbitrary
grouping of the groups (ok, a bit confusing). If that information is
maintained somewhere, it should be pretty straightforward. (Only thing
is that I am not sure how it will be done, and where the grouping
information should be stored. configuration looks like the logical
place, but I am not sure)

Thanks,
-- 
regards,
Dhaval
--

From: Paul Menage
Date: Wednesday, March 5, 2008 - 4:51 am

I suspect that the main form of composite grouping is going to be
between parents and children. E.g. you might want to say things like:

create_group(A, memory=1G, cpu=100)
create_group(B, parent=A, memory=inherit, cpu=20)
create_group(C, parent=A, memory=inherit, cpu=30)

i.e. both B and C inherit/share their memory limit from their parent,
but have their own CPU groups (child groups of their parent?)

So this would result in a single group A in the memory hierarchy and a
top-level group A and child groups B and C in the cpu hierarchy. libcg
would abstract away the fact that when you moved a process into an
abstract group, it actually had to be moved into multiple real groups.

I think this kind of sharing is fairly easy to specify. Now, there's
no reason that it shouldn't support more complex group sharing as
well, but that might require the user to use lower-level operations,
such as creating resource groups in particular hierarchies, and
associating abstract groups with those resource groups.

The model above (children sharing resource groups with their parents
for some resources) is actually something that I figured could be
supported relatively straightforwardly in the kernel - essentially:

- each subsystem "foo" would have a foo.inherit file provided by
cgroups in each group directory

- setting the foo.inherit flag (i.e. writing 1 to it) would cause
tasks in that cgroup to share the "foo" subsystem state with the
parent cgroup

- from the subsystem's point of view, it would only need to worry
about its own foo_cgroup objects  and which task was associated with
each object; the subsystem wouldn't need to care about which tasks
were part of each cgroup, and which cgroups were sharing state; that
would all be taken care of by the cgroup framework

I'd sketched out a fairly nice design for how it would all work in my
head when I realised that it could actually all be done via multiple
hierarchies in userspace with something like the libcg operations I
suggested ...
From: Balbir Singh
Date: Wednesday, March 5, 2008 - 7:24 am

No, we don't plan on doing that. What we plan on doing is

1. Specify the mount point for each controller
2. In the create group API, specify the name of the group and the various
parameters.

If for example CPU is mounted at /cpu and Memory at /mem

Then a specification for creation of group A would be of the form

create_group(A, cpu=100, memory=100M)

Then,

/cpu/A has shares set to 100 and /mem/A has memory.limit set to 100M

If you want to create subgroups under A, you specify

create_group(A/B, memory=200M, cpu=50)

That would create /cpu/A/B and /mem/A/B

Please note that memory and CPU hierarchy needs work in the kernel. The shares
and hierarchy support is pending. We need to make the res_counters
infrastructure aware of hierarchies.



-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
--

From: Paul Menage
Date: Wednesday, March 5, 2008 - 11:55 am

I think there are two different kinds of sharing going on here:

- A and B each have individual limits, and you additionally want their
total usage to be capped by some parent limit. E.g. A and B each have
a 100MB memory limit, and you want their total combined usage to not
exceed 150MB. This kind of sharing has to be handled by the resource
counter abstraction

- you want A and B to be treated identically for the purposes of some
particular resource, e.g. you want a single CFS group to which all
threads in A and B are equal members, or a single memory cgroup for
all allocations by A or B, or the same device control table for A and
B (but you want A and B to be treated separately for some other
resource type). This can be handled in userspace in the way I outlined
above, and it would be good if libcg could handle the setup required
for this. It could also be done in the kernel with something like the
parent/child subsystem group inheritance that I also mentioned above,
if there was demand. But if so it should be a property of cgroups
rather than any individual resource controller, since it's a feature
that could be useful for all cgroups.

Paul
--

From: Rik van Riel
Date: Thursday, March 20, 2008 - 3:04 pm

On Tue, 4 Mar 2008 20:53:41 +0530

Since somebody (*cough*openvz*cough*) will no doubt add security and
network separation to the control groups at some point in the future,
why not build on libvirt?

That way admins can also use the same tools for monitoring and managing
resource groups that they use for virtual machines.

http://libvirt.org/

-- 
All Rights Reversed
--

Previous thread: [PATCH] Make Andrew's life easier by Matthew Wilcox on Tuesday, March 4, 2008 - 8:02 am. (1 message)

Next thread: 我最喜歡聊天、唱歌、看電 by 吳孟恩 on Tuesday, March 4, 2008 - 8:43 am. (1 message)