Incorporated fixes suggested by Li Zefan. Please consider for net-next-2.6.

This patch provides a simple resource controller (cgroup_tc) based on the cgroups infrastructure to manage network traffic. The cgroup_tc resource controller can be used to schedule and shape traffic belonging to the task(s) in a particular cgroup.

The implementation consists of two parts:

1) A resource controller (cgroup_tc) that is used to associate packets from a particular task belonging to a cgroup with a traffic control class id (tc_classid). This tc_classid is propagated to all sockets created by tasks in the cgroup and will be used for classifying packets at the link layer.

2) A new traffic control classifier (cls_cgroup) that can classify packets based on the tc_classid field in the socket to specific destination classes.

An example of the use of this resource controller would be to limit the traffic from all tasks from a file_server cgroup to 100Mbps. We could achieve this by doing:

# make a cgroup of file transfer processes and assign it an arbitrary unique
# classid of 0x1234 - this will be used later to direct packets.
mkdir -p /dev/cgroup
mount -t cgroup tc -otc /dev/cgroup
mkdir /dev/cgroup/file_transfer
echo 0x1234 > /dev/cgroup/file_transfer/tc.classid
echo $PID_OF_FILE_XFER_PROCESS > /dev/cgroup/file_transfer/tasks

# Now create an HTB class that rate limits traffic to 100mbits and attach
# a filter to direct all traffic from cgroup file_transfer to this new class.
tc qdisc add dev eth0 root handle 1: htb
tc class add dev eth0 parent 1: classid 1:10 htb rate 100mbit ceil 100mbit
tc filter add dev eth0 parent 1: handle 800 protocol ip prio 1 cgroup value 0x1234 classid 1:10

Signed-off-by: Ranjit Manomohan <ranjitm@google.com>
---
--
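The per-cgroup steps in the patch description's example generalize to other cgroups by varying the name, classid, device, rate and HTB class minor. The following is a minimal sketch of that. It only prints the commands and runs nothing. `emit_cgroup_limit` is a hypothetical helper, not part of the patch. It assumes the cgroup hierarchy is already mounted at /dev/cgroup and that a root HTB qdisc with handle 1: already exists, as in the example. The `tc.classid` file and the `cgroup value` filter syntax are taken from the patch description and would only exist on a kernel and tc built with this patch.

```shell
#!/bin/sh
# Sketch only: print (do not execute) the per-cgroup commands from the
# patch description, parametrized. emit_cgroup_limit is a hypothetical
# helper introduced for illustration.
emit_cgroup_limit() {
    name=$1      # cgroup directory name, e.g. file_transfer
    classid=$2   # arbitrary unique tc_classid, e.g. 0x1234
    dev=$3       # network device, e.g. eth0
    rate=$4      # HTB rate and ceiling, e.g. 100mbit
    minor=$5     # minor number of the HTB class under 1:, e.g. 10

    # Unquoted heredoc: variables expand, '>' stays literal text.
    cat <<EOF
mkdir /dev/cgroup/$name
echo $classid > /dev/cgroup/$name/tc.classid
tc class add dev $dev parent 1: classid 1:$minor htb rate $rate ceil $rate
tc filter add dev $dev parent 1: handle 800 protocol ip prio 1 cgroup value $classid classid 1:$minor
EOF
}

# Reproduces the file_transfer example from the patch description.
emit_cgroup_limit file_transfer 0x1234 eth0 100mbit 10
```

A second cgroup would need its own classid and its own class minor. Whether each filter also needs a distinct handle is not stated in the patch description, so the handle is left at 800 here.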
I definitely prefer Thomas Graf's work, this stuff is very ugly and way overengineered. So no, I won't consider for net-next-2.6, sorry. --
Could you be more specific? Thomas' work is almost identical to this (except that he does not store the cgroup id in the socket, a trivial change whose downsides I have pointed out). Additionally, this approach makes only minor modifications to the core networking stack. What portions do you consider ugly and overengineered, and what alternative implementations would you prefer? Please see the follow-up I have sent to Thomas' proposal about why we need this design approach to handle the inbound case. I'd be ok if you accepted either change since we just want a standard kernel mechanism to do this. -Thanks, --
WRT the inbound case, after some experiments I decided to dismiss the ingress case altogether and stick to something as simple as possible for egress. The reason is that associating a packet with a task at the classifier level is a very expensive operation. That cost does not add up against the very limited capabilities of ingress shaping. Ingress shaping is best effort at best. It works fairly well with a very limited number of bulk data streams but usually fails miserably in common congestion situations where a cgroup classifier

Agreed. I think your approach is very reasonable but considering the reasons I've given above and in the other thread I found it could be done in a more simple and direct way. --
Could you elaborate on the failure cases? We have found this to be useful in practice to prevent applications from reading large amounts of data off the network so it would be nice if it were supported. -Thanks, --
It works fairly well for a small number of bulk streams when no packets need to be dropped. The results get very inaccurate for a higher number of smaller streams when packets start getting dropped. The problem is simply that currently none of the congestion notification mechanisms work in (internet) practice. Therefore I think that it is not worth the effort. If you think differently, I will be more than glad to review code. So far I haven't seen anything that would work on ingress. --
