Re: [RFC] [PATCH] cgroup: limit network bandwidth

To: Balbir Singh <balbir@...>, Naveen Gupta <ngupta@...>, Paul Menage <menage@...>
Cc: LKML <linux-kernel@...>, David Miller <davem@...>
Date: Wednesday, January 23, 2008 - 5:09 am

Allow limiting the network bandwidth of specific process containers (cgroups) by
imposing additional delays on the sockets' sendmsg()/recvmsg() calls made by
those processes that exceed the limits defined in the control group filesystem.

Example:
# mkdir /dev/cgroup
# mount -t cgroup -onet net /dev/cgroup
# cd /dev/cgroup
# mkdir foo
--> the cgroup foo has been created
# /bin/echo $$ > foo/tasks
# /bin/echo 1024 > foo/net.tcp
# /bin/echo 2048 > foo/net.tot
# sh
--> the subshell 'sh' is running in cgroup "foo" that has a maximum network
bandwidth for TCP traffic of 1MB/s and 2MB/s for total network
activities.

The netlimit approach can be easily extended to support additional network
protocols or different socket families or types (PF_UNIX, PF_BLUETOOTH,
SOCK_SEQPACKET, etc.).
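
The accounting-and-delay idea described above can be illustrated with a small userspace sketch. It is not the code from this patch: the struct, the function names (nl_sketch_acct, nl_sketch_delay_ms) and the single averaging window are assumptions made for illustration, and the KiB/s unit is inferred from the example (1024 giving 1MB/s).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cgroup state for one limit type (not the patch's struct). */
struct nl_sketch {
	uint64_t limit_kbps;      /* limit in KiB/s, 0 means unlimited (assumed) */
	uint64_t window_start_ms; /* start of the current accounting window */
	uint64_t bytes;           /* bytes accounted since window_start_ms */
};

/* Account the bytes moved by one sendmsg()/recvmsg()-like call. */
static void nl_sketch_acct(struct nl_sketch *nl, size_t bytes)
{
	assert(nl != NULL);
	nl->bytes += bytes;
}

/*
 * Return how long (in ms) the caller would have to sleep so that the
 * average rate since window_start_ms drops back to the limit.
 */
static uint64_t nl_sketch_delay_ms(const struct nl_sketch *nl, uint64_t now_ms)
{
	uint64_t elapsed_ms, needed_ms;

	assert(nl != NULL);
	if (nl->limit_kbps == 0)
		return 0;

	elapsed_ms = now_ms - nl->window_start_ms;
	/* Time the accounted bytes should have taken at the limit. */
	needed_ms = nl->bytes * 1000 / (nl->limit_kbps * 1024);

	return needed_ms > elapsed_ms ? needed_ms - elapsed_ms : 0;
}
```

With a 1024 KiB/s limit, a task that has moved 2 MiB only 500 ms into the window would be delayed for 1500 ms; a real implementation would also have to reset the window periodically.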

Signed-off-by: Andrea Righi <a.righi@cineca.it>
---

diff -urpN linux-2.6.24-rc8/include/linux/cgroup_netlimit.h linux-2.6.24-rc8-cgroup-netlimit/include/linux/cgroup_netlimit.h
--- linux-2.6.24-rc8/include/linux/cgroup_netlimit.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.24-rc8-cgroup-netlimit/include/linux/cgroup_netlimit.h 2008-01-22 21:36:15.000000000 +0100
@@ -0,0 +1,29 @@
+#ifndef CGROUP_NETLIMIT_H
+#define CGROUP_NETLIMIT_H
+
+enum {
+ CGROUP_NETLIMIT_TOT,
+ CGROUP_NETLIMIT_TCP,
+ CGROUP_NETLIMIT_UDP,
+ CGROUP_NETLIMIT_RAW,
+ /* This sets the size of the different netlimit types */
+ CGROUP_NETLIMIT_END,
+};
+
+#define CGROUP_NETLIMIT_FILE(_x, _y) \
+ { \
+ .name = _x, \
+ .read = netlimit_read, \
+ .write_uint = netlimit_write_uint, \
+ .private = _y, \
+ }
+
+#ifdef CONFIG_CGROUP_NETLIMIT
+extern void cgroup_nl_acct(int limit_id, size_t bytes);
+extern void cgroup_nl_throttle(int limit_id, int interruptible);
+#else
+static inline void cgroup_nl_acct(int limit_id, size_t bytes) { }
+static inline void cgroup_nl_throttle(int limit_id, int interruptible) { }
+#endif /* CONFIG_CGROUP_NETLIMIT */
+
+#endif
diff -ur...

To: <righiandr@...>
Cc: Balbir Singh <balbir@...>, Naveen Gupta <ngupta@...>, LKML <linux-kernel@...>, David Miller <davem@...>, Ranjit Manomohan <ranjitm@...>
Date: Wednesday, January 23, 2008 - 5:54 am

An approach that we've been experimenting with at Google is much simpler:

- add a "network class id" subsystem that lets you associate an id
with each cgroup

- propagate this id to sockets created by that cgroup, and from there
to packets sent/received on that socket

- add a new traffic filter that can select based on a packet's cgroup class id

This is a very small amount of kernel code, but it then lets userspace
set up whatever queues/filters/classes it wants using the standard
Linux traffic API, rather than creating a new traffic control API
that's much more limited. So you can easily do things like controlling
guarantees and limits, have different behaviour for local and remote
packets, have packet/byte accounting for different flow classes,
filter on ToS bits in order to let the cgroup prioritize its own
traffic, etc.
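
As a rough illustration of the three steps listed above (id on the cgroup, copied to its sockets, copied to their packets, then matched by a filter), here is a minimal userspace sketch. All struct and function names are hypothetical and stand in for the real kernel objects; this is not the Google code, which is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a cgroup, a socket and a packet. */
struct cg_sketch   { uint32_t classid; };
struct sock_sketch { uint32_t classid; };
struct pkt_sketch  { uint32_t classid; size_t len; };

/* A socket created by a task in a cgroup inherits that cgroup's class id. */
static struct sock_sketch sock_create_in(const struct cg_sketch *cg)
{
	struct sock_sketch sk;

	assert(cg != NULL);
	sk.classid = cg->classid;
	return sk;
}

/* A packet sent or received on a socket inherits the socket's class id. */
static struct pkt_sketch pkt_on_sock(const struct sock_sketch *sk, size_t len)
{
	struct pkt_sketch pkt;

	assert(sk != NULL);
	pkt.classid = sk->classid;
	pkt.len = len;
	return pkt;
}

/* The traffic filter only has to compare the packet's class id. */
static bool filter_match(const struct pkt_sketch *pkt, uint32_t classid)
{
	return pkt->classid == classid;
}
```

Everything beyond the match (queues, classes, rates) would stay in the existing traffic control configuration, which is the point of the proposal.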

We also have plans (have had for months in fact, but haven't had time
for it yet) to let the cgroup network id be selected on in iptables
rules, and possibly add a new iptable for events such as listen(),
bind(), and connect(), to allow very easy control over what network
connections a cgroup can access. This would let you use the full power
of the existing packet/connection matching available in the standard
iptables rules without having to add a new complex (but still limited)
API.

Paul

--

To: Andrea Righi <righiandr@...>
Cc: Naveen Gupta <ngupta@...>, Paul Menage <menage@...>, LKML <linux-kernel@...>, David Miller <davem@...>
Date: Wednesday, January 23, 2008 - 5:24 am

Hi, Andrea,

I took a quick look at the patches and it looks like we throttle the
network (by forcing a schedule_timeout()) if we exceed our bandwidth
limit. That is one way of doing it, but it has some disadvantages; it
does not scale to

1. Implementation of soft limits (limit on contention of resource)
gets harder
2. Why not use the existing infrastructure for bandwidth limitation
for implementing the network controller?

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--

To: Andrea Righi <righiandr@...>, Naveen Gupta <ngupta@...>, Paul Menage <menage@...>, LKML <linux-kernel@...>, David Miller <davem@...>
Date: Wednesday, January 23, 2008 - 12:48 pm

Why? Do you mean implementing a grace time when the soft limit is
exceeded? This could be done in cgroup_nl_throttle() by introducing 3
additional attributes to struct netlimit (i.e. hard_limit,
last_time_exceeded, grace_time) and performing something like:
...
if ((current_rate > hard_limit) ||
    time_after(jiffies, last_time_exceeded + grace_time))
	schedule_timeout(sleep);

Yes, the integration with iptables (as Paul said) and traffic shaping
rules would be absolutely the right way(tm) in the long run. I was just
proposing a possible simple API to implement the limiting stuff.

-Andrea
--

To: <righiandr@...>
Cc: Naveen Gupta <ngupta@...>, LKML <linux-kernel@...>, David Miller <davem@...>
Date: Wednesday, January 23, 2008 - 12:59 pm

He's talking about cases where we want the behaviour to be
work-conserving, whilst still offering guarantees in the event of
contention. e.g. cgroups A and B each get a 20% guarantee on the TX
path if they need it, but anyone can use any otherwise-idle bandwidth.
(This is relatively straightforward to set up from userspace with the

But this issue (traffic control for cgroups) is too complex to be
described by a simple API. Any simple API you choose to try to
describe the limiting directly will be insufficient for a good number
of the potential users. Better to just provide a (very simple) API to
hook into the existing (complex) traffic control API and leave the
tricky stuff to userspace, where anyone can construct arbitrarily
complex queueing schemes with a shell script and a few calls to "tc".
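
To make the work-conserving case above concrete (guarantees under contention, idle bandwidth free for anyone), here is a small sketch of one possible allocation rule. It is only an illustration of the concept: the struct, share_link() and the equal split of idle bandwidth are assumptions, not how any particular tc queueing discipline computes its shares.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cgroup view of one link, all values in the same unit. */
struct grp_sketch {
	uint64_t guarantee; /* bandwidth promised under contention */
	uint64_t demand;    /* bandwidth the group currently wants */
	uint64_t alloc;     /* result: bandwidth the group gets */
};

static void share_link(struct grp_sketch *g, size_t n, uint64_t capacity)
{
	uint64_t left = capacity;
	size_t i, hungry;

	/* Pass 1: every group gets its demand, up to its guarantee. */
	for (i = 0; i < n; i++) {
		g[i].alloc = g[i].demand < g[i].guarantee ?
			     g[i].demand : g[i].guarantee;
		assert(g[i].alloc <= left); /* guarantees must fit the link */
		left -= g[i].alloc;
	}

	/* Pass 2: split idle bandwidth equally among groups wanting more. */
	for (;;) {
		uint64_t share;

		hungry = 0;
		for (i = 0; i < n; i++)
			if (g[i].alloc < g[i].demand)
				hungry++;
		if (hungry == 0 || left < hungry)
			break;

		share = left / hungry;
		for (i = 0; i < n; i++) {
			uint64_t want = g[i].demand - g[i].alloc;
			uint64_t give = want < share ? want : share;

			g[i].alloc += give;
			left -= give;
		}
	}
}
```

On a link of 100 units where A and B each have a guarantee of 20, A alone can use 90 while B only wants 10, but if both want the whole link each ends up with 50 and neither drops below its 20.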

Paul
--

To: Paul Menage <menage@...>
Cc: Naveen Gupta <ngupta@...>, LKML <linux-kernel@...>, David Miller <davem@...>
Date: Wednesday, January 23, 2008 - 1:48 pm

OK, thanks for the clarifications.

-Andrea
--
