> On Thu, Jul 03, 2008 at 10:19:57AM +0900, KAMEZAWA Hiroyuki wrote:
>> On Tue, 1 Jul 2008 15:11:26 -0400
>> Vivek Goyal <vgoyal@redhat.com> wrote:
>>
>>> Hi,
>>>
>>> While development is going on for cgroup and various controllers, we also
>>> need a facility so that an admin/user can specify the group creation and
>>> also specify the rules based on which tasks should be placed in respective
>>> groups. Group creation part will be handled by libcg which is already
>>> under development. We still need to tackle the issue of how to specify
>>> the rules and how these rules are enforced (rules engine).
>>>
>>> I have gathered a few views on how the rule engine could be
>>> implemented, and I am listing them below.
>>>
>>> Proposal 1
>>> ==========
>>> Let a user space daemon handle all that. Daemon will open a netlink socket
>>> and receive the notifications for various kernel events. Daemon will
>>> also parse appropriate admin specified rules config file and place the
>>> processes in right cgroup based on rules as and when events happen.
>>>
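To make the daemon's job in Proposal 1 concrete, here is a minimal
sketch of the "match a rule, then place the task" step. The rule table,
the group names and the helper names (pick_group, place_task) are
hypothetical and are not taken from the prototype; the only real
interface assumed is that writing a pid to a group's "tasks" file moves
the task into that group.

```python
import os

# Hypothetical rule table: (uid, cgroup path relative to the mount point).
# A real daemon would parse this from the admin's config file.
RULES = [(500, "students"), (0, "system")]
DEFAULT_GROUP = "default"


def pick_group(uid, rules=RULES, default=DEFAULT_GROUP):
    """Return the group of the first rule matching uid, else the default."""
    for rule_uid, group in rules:
        if rule_uid == uid:
            return group
    return default


def place_task(mount, group, pid):
    """Move pid into a group by writing it to that group's 'tasks' file."""
    with open(os.path.join(mount, group, "tasks"), "w") as f:
        f.write(str(pid))
```

On a fork/exec/uid-change event the daemon would call something like
place_task(mount, pick_group(uid), pid). Everything that happens between
the event and that write is the window the second objection below is
about.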
>>> I have written a prototype user space program which does that. Program
>>> can be found here. Currently it is in very crude shape.
>>>
>>> http://people.redhat.com/vgoyal/misc/rules-engine-daemon/user-id-based-namespaces.patc...
>>>
>>> Various people have raised two main issues with this approach.
>>>
>>> - netlink is not a reliable protocol.
>>> - Messages can be dropped, so events can be lost. That means a
>>> newly forked process might never go into right group as meant.
>>>
>>> - How to handle delays in rule execution?
>>> - For example, if an "exec" happens and by the time process is moved to
>>> right group, it might have forked off few more processes or might
>>> have done quite some amount of memory allocation which will be
>>> charged to the wrong group. Or, newly exec process might get
>>> killed in existing cgroup because of lack of memory (despite the
>>> fact that destination cgroup has sufficient memory).
>>>
>> Hmm, can't we rework the process event connector to use some reliable
>> fast interface besides netlink ? (I mean an interface like eventpoll.)
>> (Or enhance netlink ? ;)
>
> I see following text in netlink man page.
>
> "However, reliable transmissions from kernel to user are impossible in
> any case. The kernel can’t send a netlink message if the socket buffer
> is full: the message will be dropped and the kernel and the userspace
> process will no longer have the same view of kernel state. It is up to
> the application to detect when this happens (via the ENOBUFS error
> returned by recvmsg(2)) and resynchronize."
>
> So at the end of the day, it looks like unreliability comes from the
> fact that we can not allocate memory currently so we will discard the
> packet.
>
> Are there alternatives to dropping packets?
>
> - Let the sender cache the packet and retry later. So maybe the netlink
> layer can return an error if the packet can not be queued, and the
> connector can cache the event and try sending it later. (Hopefully the
> memory situation will have improved by then because of OOM, or some
> process exiting, or something else...).
>
> This looks like a band-aid that only handles temporary congestion. It
> will not help if the consumer is inherently slow and events are
> generated faster than it can consume them.
>
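A toy model of the cache-and-retry idea shows why it only helps with
short bursts. This is a sketch under assumed numbers, not a model of
the real connector: simulate() and all of its parameters are
hypothetical, and "tick" is an arbitrary unit of time.

```python
def simulate(produce_per_tick, consume_per_tick, ticks, socket_cap, cache_cap):
    """Count events lost when the sender caches what the socket refuses."""
    socket_buf = 0  # events sitting in the receiver's socket buffer
    cache = 0       # events the sender is holding back for a retry
    dropped = 0
    for _ in range(ticks):
        # The consumer drains as much as it can this tick.
        socket_buf -= min(socket_buf, consume_per_tick)
        # The sender retries cached events along with the new ones.
        pending = cache + produce_per_tick
        sent = min(pending, socket_cap - socket_buf)
        socket_buf += sent
        pending -= sent
        # Whatever does not fit in the bounded cache is lost for good.
        cache = min(pending, cache_cap)
        dropped += pending - cache
    return dropped
```

With a consumer that keeps up, e.g. simulate(1, 2, 100, 4, 4), nothing
is lost. With a consumer that is slower than the producer, e.g.
simulate(3, 1, 100, 4, 64), a larger cache only delays the first drop;
it does not prevent it.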
> This probably can be one possible enhancement to connector, but at the end
> of the day, any kind of user space daemon will have to accept the fact
> that packets can be dropped, leading to lost events. Detect that situation
> (using ENOBUFS) and then let admin know about it (logging). I am not sure
> what admin is supposed to do after that.
>
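For the "detect and log" part, a daemon's receive loop might look
roughly like the sketch below. It assumes a socket-like object whose
recv() raises OSError with errno ENOBUFS after an overrun, as the man
page text quoted above describes. receive_loop, handle_event and resync
are hypothetical names; what resync should actually do (for instance,
rescanning existing tasks and re-applying the rules) is an assumption
here and remains the open question.

```python
import errno
import logging

log = logging.getLogger("rules-daemon")  # hypothetical logger name


def receive_loop(sock, handle_event, resync):
    """Read events from sock; on ENOBUFS, log the loss and resynchronize.

    Returns the number of overruns seen. The loop ends when recv()
    returns no data, which keeps the sketch finite.
    """
    overruns = 0
    while True:
        try:
            data = sock.recv(8192)
        except OSError as e:
            if e.errno != errno.ENOBUFS:
                raise  # any other error is not an overrun; pass it up
            overruns += 1
            log.warning("netlink overrun: events were dropped, resyncing")
            resync()
            continue
        if not data:
            break
        handle_event(data)
    return overruns
```

The point of the sketch is only that a dropped message is detectable,
so the daemon can at least notice that its view is stale instead of
silently leaving tasks in the wrong group.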
> I am CCing Thomas Graf. He might have a better idea of netlink limitations
> and whether there is a way to overcome them.
>