Allow limiting the network bandwidth of specific process containers (cgroups) by
imposing additional delays on the sockets' sendmsg()/recvmsg() calls made by
those processes that exceed the limits defined in the control group filesystem.

Example:
# mkdir /dev/cgroup
# mount -t cgroup -onet net /dev/cgroup
# cd /dev/cgroup
# mkdir foo
--> the cgroup foo has been created
# /bin/echo $$ > foo/tasks
# /bin/echo 1024 > foo/net.tcp
# /bin/echo 2048 > foo/net.tot
# sh
--> the subshell 'sh' is running in cgroup "foo" that has a maximum network
bandwidth for TCP traffic of 1MB/s and 2MB/s for total network
activities.

The netlimit approach can be easily extended to support additional network
protocols or different socket families or types (PF_UNIX, PF_BLUETOOTH,
SOCK_SEQPACKET, etc.).

Signed-off-by: Andrea Righi <a.righi@cineca.it>
---

diff -urpN linux-2.6.24-rc8/include/linux/cgroup_netlimit.h linux-2.6.24-rc8-cgroup-netlimit/include/linux/cgroup_netlimit.h
--- linux-2.6.24-rc8/include/linux/cgroup_netlimit.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.24-rc8-cgroup-netlimit/include/linux/cgroup_netlimit.h 2008-01-22 21:36:15.000000000 +0100
@@ -0,0 +1,29 @@
+#ifndef CGROUP_NETLIMIT_H
+#define CGROUP_NETLIMIT_H
+
+enum {
+ CGROUP_NETLIMIT_TOT,
+ CGROUP_NETLIMIT_TCP,
+ CGROUP_NETLIMIT_UDP,
+ CGROUP_NETLIMIT_RAW,
+ /* This sets the size of the different netlimit types */
+ CGROUP_NETLIMIT_END,
+};
+
+#define CGROUP_NETLIMIT_FILE(_x, _y) \
+ { \
+ .name = _x, \
+ .read = netlimit_read, \
+ .write_uint = netlimit_write_uint, \
+ .private = _y, \
+ }
+
+#ifdef CONFIG_CGROUP_NETLIMIT
+extern void cgroup_nl_acct(int limit_id, size_t bytes);
+extern void cgroup_nl_throttle(int limit_id, int interruptible);
+#else
+static inline void cgroup_nl_acct(int limit_id, size_t bytes) { }
+static inline void cgroup_nl_throttle(int limit_id, int interruptible) { }
+#endif /* CONFIG_CGROUP_NETLIMIT */
+
+#endif
diff -ur...
An approach that we've been experimenting with at Google is much simpler:

- add a "network class id" subsystem that lets you associate an id
  with each cgroup
- propagate this id to sockets created by that cgroup, and from there
  to packets sent/received on that socket
- add a new traffic filter that can select based on a packet's cgroup class id

This is a very small amount of kernel code, but it then lets userspace
set up whatever queues/filters/classes it wants using the standard
Linux traffic API, rather than creating a new traffic control API
that's much more limited. So you can easily do things like controlling
guarantees and limits, have different behaviour for local and remote
packets, have packet/byte accounting for different flow classes,
filter on ToS bits in order to let the cgroup prioritize its own
traffic, etc.

We also have plans (have had for months in fact, but haven't had time
for it yet) to let the cgroup network id be selected on in iptables
rules, and possibly add a new iptable for events such as listen(),
bind(), and connect(), to allow very easy control over what network
connections a cgroup can access. This would let you use the full power
of the existing packet/connection matching available in the standard
iptables rules without having to add a new complex (but still limited)
API.

Paul
--
Hi, Andrea,
I took a quick look at the patches and it looks like we throttle
the network (by forcing a schedule_timeout()) if we exceed our bandwidth
limit. That is one way of doing it, but it has some disadvantages; it
does not scale to

1. Implementation of soft limits (limit on contention of resource)
   gets harder
2. Why not use the existing infrastructure for bandwidth limitation
   for implementing the network controller?

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
Why? Do you mean implementing a grace time when the soft limit is
exceeded? This could be done in cgroup_nl_throttle() by introducing 3
additional attributes to struct netlimit (i.e. hard_limit,
last_time_exceeded, grace_time) and performing something like:

...
if ((current_rate > hard_limit) ||
    time_after(jiffies, last_time_exceeded + grace_time))
	schedule_timeout(sleep);

Yes, the integration with iptables (as Paul said) and traffic shaping
rules would be absolutely the right way(tm) in the long run. I was just
proposing a possible simple API to implement the limiting stuff.

-Andrea
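
The grace-time idea above can be modelled in userspace to make the decision
logic easier to follow. This is only a sketch under stated assumptions, not
code from the patch: struct netlimit_model, its soft_limit and exceeding
fields, and netlimit_should_throttle() are hypothetical names, and plain tick
counters stand in for jiffies and time_after(). It also assumes the elided
"..." part checks the soft limit first, which the snippet above does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the attributes suggested for struct netlimit. */
struct netlimit_model {
	uint64_t soft_limit;         /* rate that may be exceeded for a while (assumed field) */
	uint64_t hard_limit;         /* rate that is never allowed to be exceeded */
	uint64_t grace_time;         /* ticks the rate may stay above soft_limit */
	uint64_t last_time_exceeded; /* tick at which the rate went above soft_limit */
	bool exceeding;              /* true while the rate is above soft_limit (assumed field) */
};

/*
 * Return true when the caller should be put to sleep, i.e. where the
 * kernel code would call schedule_timeout().
 */
static bool netlimit_should_throttle(struct netlimit_model *nl,
				     uint64_t current_rate, uint64_t now)
{
	assert(nl != NULL);

	/* Above the hard limit: always throttle. */
	if (current_rate > nl->hard_limit)
		return true;

	/* At or below the soft limit: never throttle, and reset the grace period. */
	if (current_rate <= nl->soft_limit) {
		nl->exceeding = false;
		return false;
	}

	/* Between the limits: remember when the excess started. */
	if (!nl->exceeding) {
		nl->exceeding = true;
		nl->last_time_exceeded = now;
	}

	/* Throttle only once the grace period has run out. */
	return now - nl->last_time_exceeded > nl->grace_time;
}
```

With soft_limit = 1000, hard_limit = 2000 and grace_time = 10, a rate of 1500
that starts at tick 5 is tolerated up to and including tick 15 and throttled
from tick 16 on, while a rate of 2500 is throttled immediately.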
--
He's talking about cases where we want the behaviour to be
work-conserving, whilst still offering guarantees in the event of
contention. e.g. cgroups A and B each get a 20% guarantee on the TX
path if they need it, but anyone can use any otherwise-idle bandwidth.
(This is relatively straightforward to set up from userspace with the

But this issue (traffic control for cgroups) is too complex to be
described by a simple API. Any simple API you choose to try to
describe the limiting directly will be insufficient for a good number
of the potential users. Better to just provide a (very simple) API to
hook into the existing (complex) traffic control API and leave the
tricky stuff to userspace, where anyone can construct arbitrarily
complex queueing schemes with a shell script and a few calls to "tc".

Paul
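
To illustrate what "work-conserving with guarantees" means in the example
above (A and B each guaranteed 20%, idle bandwidth open to anyone), here is a
toy userspace model. It is a sketch only: share_link() is a hypothetical
helper, and splitting spare bandwidth in proportion to unmet demand is an
assumption made for illustration. It is not how the kernel's queueing
disciplines divide excess bandwidth. Values are assumed small enough that the
multiplication does not overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of a work-conserving link share (hypothetical helper).
 * guarantee[i] and demand[i] use the same unit as link (e.g. percent).
 * out[i] receives the bandwidth granted to class i.
 */
static void share_link(uint64_t link, const uint64_t *guarantee,
		       const uint64_t *demand, uint64_t *out, size_t n)
{
	uint64_t guaranteed = 0, used = 0, unmet = 0;
	size_t i;

	/* Pass 1: every class gets its demand, up to its guarantee. */
	for (i = 0; i < n; i++) {
		guaranteed += guarantee[i];
		out[i] = demand[i] < guarantee[i] ? demand[i] : guarantee[i];
		used += out[i];
		unmet += demand[i] - out[i];
	}
	/* Guarantees must not oversubscribe the link. */
	assert(guaranteed <= link);

	uint64_t spare = link - used;

	/* No contention: all demand fits, nobody is limited. */
	if (unmet <= spare) {
		for (i = 0; i < n; i++)
			out[i] = demand[i];
		return;
	}

	/* Pass 2: split the idle bandwidth in proportion to unmet demand
	 * (integer division rounds down, so a little may stay unused). */
	for (i = 0; i < n; i++)
		out[i] += spare * (demand[i] - out[i]) / unmet;
}
```

For a link of 100 units, guarantees of 20, 20 and 0, and demands of 50, 5 and
100, this model grants 37, 5 and 57: the first class never drops below its
guarantee, and the bandwidth the second class leaves idle is used by the
others.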
--
OK, thanks for the clarifications.
-Andrea
--
