Re: cls_cgroup: Store classid in struct sock

Previous thread: [PATCH 08/29] pcmcia: dev_node removal (drivers with updated printk call) by Dominik Brodowski on Tuesday, May 18, 2010 - 11:42 pm. (1 message)

Next thread: [PATCH] net/mpc52xx_phy: Various code cleanups by Wolfram Sang on Wednesday, May 19, 2010 - 3:21 am. (2 messages)
From: Herbert Xu
Date: Wednesday, May 19, 2010 - 12:57 am

Hi:

tun: Use netif_receive_skb instead of netif_rx

First a bit of history as I recall, Dave can correct me where
he recalls differently :)

1) There was netif_rx and everyone had to use that.
2) Everyone had to use that, including drivers/net/tun.c.
3) NAPI brings us netif_receive_skb.
4) About the same time people noticed that tun.c can cause wild
   fluctuations in latency because of its use of netif_rx with IRQs
   enabled.
5) netif_rx_ni was added to address this.

However, netif_rx_ni was really a bit of a roundabout way of
injecting a packet if you think about it.  What ends up happening
is that we always queue the packet into the backlog, and then
immediately process it.  Which is what would happen if we simply
called netif_receive_skb directly.

So this patch just does the obvious thing and makes tun.c call
netif_receive_skb, albeit through the netif_receive_skb_ni wrapper
which does the necessary things for calling it in process context.

Now apart from potential performance gains from eliminating
unnecessary steps in the process, this has the benefit of keeping
the process context for the packet processing.  This is needed
by cgroups to shape network traffic based on the original process.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4326520..0eed49f 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -667,7 +667,7 @@ static __inline__ ssize_t tun_get_user(struct tun_struct *tun,
 		skb_shinfo(skb)->gso_segs = 0;
 	}
 
-	netif_rx_ni(skb);
+	netif_receive_skb_ni(skb);
 
 	tun->dev->stats.rx_packets++;
 	tun->dev->stats.rx_bytes += len;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fa8b476..34bb405 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1562,6 +1562,18 @@ extern int		netif_rx(struct sk_buff *skb);
 extern int		netif_rx_ni(struct sk_buff *skb);
 #define HAVE_NETIF_RECEIVE_SKB 1
 extern ...
From: Eric Dumazet
Date: Wednesday, May 19, 2010 - 1:09 am

6) netif_rx() pro is that packet processing is done while stack usage is
guaranteed to be low (from process_backlog, using a special softirq
stack, instead of current stack)

After your patch, tun will use more stack. Is it safe on all contexts ?

Another concern I have is about RPS.

netif_receive_skb() must be called from process_backlog() context, or
there is no guarantee the IPI will be sent if this skb is enqueued for


--

From: Eric Dumazet
Date: Wednesday, May 19, 2010 - 1:18 am

Hmm, I just checked again, and this is wrong.

In case we enqueue skb on a remote cpu backlog, we also
do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
later.



--

From: Herbert Xu
Date: Wednesday, May 19, 2010 - 1:21 am

OK your concern is only with the stack usage, right?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Neil Horman
Date: Wednesday, May 19, 2010 - 5:05 am

But if this happens, then we loose the connection between the packet being
received and the process doing the reception, so the network cgroup classifier
breaks again.

Performance gains are still a big advantage here of course.
Neil

--

From: Neil Horman
Date: Wednesday, May 19, 2010 - 5:55 am

Scratch what I said here, Herbert corrected me on this, and we're ok, as tun has
no rps map.

I'll test this patch out in just a bit
--

From: Neil Horman
Date: Wednesday, May 19, 2010 - 11:00 am

I'm currently testing this, unfortunately, and its not breaking anything, but it
doesn't allow cgroups to classify frames comming from tun interfaces.  I'm still
investigating, but I think the issue is that, because we call local_bh_disable
with this patch, we wind up raising the count at SOFTIRQ_OFFSET in preempt_count
for the task.  Since the cgroup classifier has this check:

if (softirq_count() != SOFTIRQ_OFFSET))
	return -1;

We still fail to classify the frame.  the cgroup classifier is assuming that any
frame arriving with a softirq count of 1 means we came directly from the
dev_queue_xmit routine and is safe to check current().  Any less than that, and
something is wrong (as we at least need the local_bh_disable in dev_queue_xmit),
and any more implies that we have nested calls to local_bh_disable, meaning
we're really handling a softirq context.

--

From: Neil Horman
Date: Wednesday, May 19, 2010 - 1:24 pm

Just out of curiosity, how unsavory would it be if we were to dedicate the upper
bit in the SOFTIRQ_BITS range to be an indicator of weather we were actually
executing softirqs?  As noted above, we're tripping over the ambiguity here
between running in softirq context and actually just having softirqs disabled.
Would it be against anyones sensibilities if we were to dedicate the upper bit
in the softirq_count range to disambiguate the two conitions (or use a separate
flag for that matter)?

Neil

--

From: Thomas Graf
Date: Wednesday, May 19, 2010 - 1:49 pm

It is a hack but the only method to check for softirq context I found. I
would favor using a flag if there was one.

--

From: Brian Bloniarz
Date: Wednesday, May 19, 2010 - 2:00 pm

Eric probably has some thoughts on this -- his scheduler-batching patch RFC
from last year needed the same bit of info:
http://patchwork.ozlabs.org/patch/24536/
(see the changes to trace_softirq_context).
--

From: David Miller
Date: Wednesday, May 19, 2010 - 7:55 pm

Since it seems now inevitable to me that we'll need to store the
'classid' in struct sk_buff, I've prepared two patches to do that at
zero cost.

Someone please go with this.
--

From: Herbert Xu
Date: Wednesday, May 19, 2010 - 7:57 pm

Actually I'm currently working on a patch to store this in struct
sock.  The reason I prefer struct sock is because we can then
centralise the place where we set the classid rather than having
it spread out through every place that creates an skb.

I'll post it soon.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: David Miller
Date: Wednesday, May 19, 2010 - 8:05 pm

From: Herbert Xu <herbert@gondor.apana.org.au>

Ok, looking forward to it.
--

From: Herbert Xu
Date: Wednesday, May 19, 2010 - 8:34 pm

Here we go:

cls_cgroup: Store classid in struct sock

Up until now cls_cgroup has relied on fetching the classid out of
the current executing thread.  This runs into trouble when a packet
processing is delayed in which case it may execute out of another
thread's context.

Furthermore, even when a packet is not delayed we may fail to
classify it if soft IRQs have been disabled, because this scenario
is indistinguishable from one where a packet unrelated to the
current thread is processed by a real soft IRQ.

As we already have a concept of packet ownership for accounting
purposes in the skb->sk pointer, this is a natural place to store
the classid in a persistent manner.

This patch adds the cls_cgroup classid in struct sock, filling up
an existing hole on 64-bit :)

The value is set at socket creation time.  So all sockets created
via socket(2) automatically gains the ID of the thread creating it.
Now you may argue that this may not be the same as the thread that
is sending the packet.  However, we already have a precedence where
an fd is passed to a different thread, its security property is
inherited.  In this case, inheriting the classid of the thread
creating the socket is also the logical thing to do.

For sockets created on inbound connections through accept(2), we
inherit the classid of the original listening socket through
sk_clone, possibly preceding the actual accept(2) call.

In order to minimise risks, I have not made this the authoritative
classid.  For now it is only used as a backup when we execute
with soft IRQs disabled.  Once we're completely happy with its
semantics we can use it as the sole classid.

Footnote: I have rearranged the error path on cls_group module
creation.  If we didn't do this, then there is a window where
someone could create a tc rule using cls_group before the cgroup
subsystem has been registered.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index ...
From: Herbert Xu
Date: Wednesday, May 19, 2010 - 8:42 pm

Oh please add this bit that I forgot about:

This also addresses the case where TCP transmits a queued packet
in response to an inbound ACK.  Currently this isn't classified
at all as we would be in softirq context.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: David Miller
Date: Wednesday, May 19, 2010 - 8:46 pm

From: Herbert Xu <herbert@gondor.apana.org.au>

This looks great to me.
--

From: Eric Dumazet
Date: Wednesday, May 19, 2010 - 9:54 pm

It means we cache current classid of this process.


It would be more readable to use :

static inline int task_cls_state(struct task_struct *p)
{
	int classid;

	rcu_read_lock();
	classid = container_of(task_subsys_state(p, net_cls_subsys_id),
				struct cgroup_cls_state, css)->classid;
	rcu_read_unlock();
	return classid;
}

...




--

From: Herbert Xu
Date: Wednesday, May 19, 2010 - 10:01 pm

Sure, however, I'm just moving the existing code around, so I
wanted it to be exactly the same.  Clean-ups should be done as

You didn't read my patchset description at all, did you? :)

This patchset is trying to fix the issue at hand with the minimum
amount of changes to current behaviour.  Once we're happy with the
new semantics, we can get rid of the softirq stuff.

That will be done in a later patch to minimise the risks.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Eric Dumazet
Date: Wednesday, May 19, 2010 - 10:15 pm

I find this very biased, sorry.

In fact, fd passing is just fine today, if we consider that we classify
packets using the identity of the process *using* the fd, not the one
that *created* it.

Now your patch changes this, to the reverse, and you justify the caching
effect on socket. Sorry, this must be too convoluted for me.




--

From: Herbert Xu
Date: Wednesday, May 19, 2010 - 10:20 pm

I'm sorry you find this convoluted, but using the sending process's
classid is inherently broken.

Here is why: consider a TCP socket shared by two processes with
different classids both writing data to it.  Now suppose further
that each writes just one byte, which is then coalesced into a
single skb.

Whose classid should we use on the resulting skb?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Eric Dumazet
Date: Wednesday, May 19, 2010 - 10:36 pm

I am ok with any kind of clarification, if its really documented as
such, not as indirect effects of changes.

Right now, I am not sure classification is performed by the current
process/cpu. Our queue handling can process packets queued by other cpus
while we own the queue (__QDISC_STATE_RUNNING,) anyway.



--

From: Herbert Xu
Date: Wednesday, May 19, 2010 - 10:46 pm

But I did document this semantic change, quite verbosely too

Classification should be done when the packet is enqueued.  So
you shouldn't really see other people's traffic.

The original requeueing would have broken this too, but that is
no longer with us :)

In my original submission I forgot to mention the case of TCP
transmission triggered by an incoming ACK, which is really the
same case as this.

So let me take this opportunity to resubmit with an updated
description:

cls_cgroup: Store classid in struct sock

Up until now cls_cgroup has relied on fetching the classid out of
the current executing thread.  This runs into trouble when a packet
processing is delayed in which case it may execute out of another
thread's context.

Furthermore, even when a packet is not delayed we may fail to
classify it if soft IRQs have been disabled, because this scenario
is indistinguishable from one where a packet unrelated to the
current thread is processed by a real soft IRQ.

In fact, the current semantics is inherently broken, as a single
skb may be constructed out of the writes of two different tasks.
A different manifestation of this problem is when the TCP stack
transmits in response of an incoming ACK.  This is currently
unclassified.

As we already have a concept of packet ownership for accounting
purposes in the skb->sk pointer, this is a natural place to store
the classid in a persistent manner.

This patch adds the cls_cgroup classid in struct sock, filling up
an existing hole on 64-bit :)

The value is set at socket creation time.  So all sockets created
via socket(2) automatically gains the ID of the thread creating it.
Now you may argue that this may not be the same as the thread that
is sending the packet.  However, we already have a precedence where
an fd is passed to a different thread, its security property is
inherited.  In this case, inheriting the classid of the thread
creating the socket is also the logical thing to do.

For sockets created on ...
From: Eric Dumazet
Date: Wednesday, May 19, 2010 - 11:03 pm

I still dont understand why you introduce this function and not reuse it
in net/sched/cls_cgroup.c

You told me it needs a separate patch, I dont think so.



--

From: Herbert Xu
Date: Wednesday, May 19, 2010 - 11:11 pm

Because it doesn't have to go through this RCU crap.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Herbert Xu
Date: Wednesday, May 19, 2010 - 11:19 pm

I'm talking about the rcu_dereference on net_cls_subsys_id of
course, not the rest of it.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Herbert Xu
Date: Wednesday, May 19, 2010 - 11:52 pm

OK Dave has convinced me that inheriting the cgroup of the process
creating a socket is not always the right thing to do.  So here's
a new patch that sets the socket classid to the last process with
a valid classid touching it.

cls_cgroup: Store classid in struct sock

Up until now cls_cgroup has relied on fetching the classid out of
the current executing thread.  This runs into trouble when a packet
processing is delayed in which case it may execute out of another
thread's context.

Furthermore, even when a packet is not delayed we may fail to
classify it if soft IRQs have been disabled, because this scenario
is indistinguishable from one where a packet unrelated to the
current thread is processed by a real soft IRQ.

In fact, the current semantics is inherently broken, as a single
skb may be constructed out of the writes of two different tasks.
A different manifestation of this problem is when the TCP stack
transmits in response of an incoming ACK.  This is currently
unclassified.

As we already have a concept of packet ownership for accounting
purposes in the skb->sk pointer, this is a natural place to store
the classid in a persistent manner.

This patch adds the cls_cgroup classid in struct sock, filling up
an existing hole on 64-bit :)

The value is set at socket creation time.  So all sockets created
via socket(2) automatically gains the ID of the thread creating it.
Whenever another process touches the socket by either reading or
writing to it, we will change the socket classid to that of the
process if it has a valid (non-zero) classid.

For sockets created on inbound connections through accept(2), we
inherit the classid of the original listening socket through
sk_clone, possibly preceding the actual accept(2) call.

In order to minimise risks, I have not made this the authoritative
classid.  For now it is only used as a backup when we execute
with soft IRQs disabled.  Once we're completely happy with its
semantics we can use it as the sole ...
From: Thomas Graf
Date: Thursday, May 20, 2010 - 1:10 am

There is a fundamental problem with this. The process needs to be
associated with the cgroup before any sockets get created. Sockets
are often created right after the application starts. This means that
the only viable option is to start each application in a wrapper which
assigns itself to the cgroup and then forks the application as its
child. If a task is associated with a cgroup after it has started it
may lead to unpredictable outcome because only some of the sockets
may end up being classified.

This was the actual reason for the old method.

--

From: Thomas Graf
Date: Thursday, May 20, 2010 - 2:40 am

Disregard this. I didn't read your latest patch correctly.

--

From: David Miller
Date: Sunday, May 23, 2010 - 11:44 pm

From: Herbert Xu <herbert@gondor.apana.org.au>

Applied.
--

From: Neil Horman
Date: Thursday, May 20, 2010 - 10:29 am

So, I'm testing this patch out now, and unfotunately it doesn't seem to be
working.  Every frame seems to be holding a classid of 0.  Trying to figure out
why now.

Neil

--

From: Herbert Xu
Date: Thursday, May 20, 2010 - 4:16 pm

Not very surprising since tun.c doesn't go through the normal
socket interface.  I'll send a additional patch for that.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Neil Horman
Date: Thursday, May 20, 2010 - 5:39 pm

I don't think thats it.  I think its a chicken and egg situation.  I think the
problem is that tasks can't be assigned to cgroups until their created, and in
that time a sock can be created.  Its a natural race.  If you create a socket
before you assign it to a cgroup, that socket retains a classid of zero.  I'm
going to try modify the patch to update sockets owned by tasks when the cgroup
is assigned.

Best
--

From: Herbert Xu
Date: Thursday, May 20, 2010 - 6:02 pm

That's what I meant above.  My patch will make tun.c to the
classid update every time it sends out a packet.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Herbert Xu
Date: Thursday, May 20, 2010 - 6:16 pm

Here it is:

tun: Update classid on packet injection

This patch makes tun update its socket classid every time we
inject a packet into the network stack.  This is so that any
updates made by the admin to the process writing packets to
tun is effected.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4326520..a8a9aa8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -525,6 +525,8 @@ static inline struct sk_buff *tun_alloc_skb(struct tun_struct *tun,
 	struct sk_buff *skb;
 	int err;
 
+	sock_update_classid(sk);
+
 	/* Under a page?  Don't bother with paged skb. */
 	if (prepad + len < PAGE_SIZE || !linear)
 		linear = len;
diff --git a/net/core/sock.c b/net/core/sock.c
index 8f7fdf8..4969bd1 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1055,6 +1055,7 @@ void sock_update_classid(struct sock *sk)
 	if (classid && classid != sk->sk_classid)
 		sk->classid = classid;
 }
+EXPORT_SYMBOL(sock_update_classid);
 #endif
 
 /**

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: David Miller
Date: Sunday, May 23, 2010 - 11:44 pm

From: Herbert Xu <herbert@gondor.apana.org.au>

Applied.
--

From: David Miller
Date: Thursday, May 20, 2010 - 10:49 pm

From: Neil Horman <nhorman@tuxdriver.com>

Neil, you must not be using Herbert's most recent patch.

Either that or you haven't even read it.

Herbert's most recent patch doesn't create this chicken and egg
problem you mention because it explicitly watches for cgroupid changes
at all socket I/O operations including sendmsg() and sendmsg().  And
if it sees a different cgroupid at a socket I/O call, it updates the
cgroupid value in the socket.

So you very much can change the cgroup of the process mid-socket
ownership and it will work.

The only problem is, as Herbert stated, tun.  Because it does it's
networking I/O directly by calling netif_receive_skb() so it won't
hit any of Herbert's cgroup check points.
--

From: Neil Horman
Date: Friday, May 21, 2010 - 3:51 am

I'm not, I was testing the version prior. I wrote my note before he posted the
I read it, we just described the issue in diferent terms.

--

From: Herbert Xu
Date: Friday, May 21, 2010 - 4:08 am

Yeah I wasn't clear enough with my email without an actual patch :)

Thanks for testing Neil!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Neil Horman
Date: Friday, May 21, 2010 - 5:59 am

Thanks for the patch, I'll let you know how it works out in just a bit here!
--

From: Neil Horman
Date: Friday, May 21, 2010 - 9:40 am

So a bit of bad news.  I think you're patch is correct herbert but its still not
working.  When I initially started messing with this it was on 2.6.32.  But
since you're patches I (naturally) started testing them on 2.6.34.  Whereas in
2.6.32 the tasks classid value for the net_cls cgroup subsys was set properly,
in 2.6.34 its not.  It appears since we started using the cgroup subsys
registration calls, we never registered an attach method, so we never set the
tasks classid, so the comparison always fails.

There may also be an issue with the setting of the classid (possible use of the
wrong subsys id value when grabbing our cgroup_subsys_state), but I'm checking
on that now.


Best
--

From: Herbert Xu
Date: Friday, May 21, 2010 - 6:49 pm

Actually, I think it's because my patch mistook CONFIG_CGROUP
for CONFIG_CGROUPS.

Here is a fixed version (also includes a build fix on sk->classid).

cls_cgroup: Store classid in struct sock

Up until now cls_cgroup has relied on fetching the classid out of
the current executing thread.  This runs into trouble when a packet
processing is delayed in which case it may execute out of another
thread's context.

Furthermore, even when a packet is not delayed we may fail to
classify it if soft IRQs have been disabled, because this scenario
is indistinguishable from one where a packet unrelated to the
current thread is processed by a real soft IRQ.

In fact, the current semantics is inherently broken, as a single
skb may be constructed out of the writes of two different tasks.
A different manifestation of this problem is when the TCP stack
transmits in response of an incoming ACK.  This is currently
unclassified.

As we already have a concept of packet ownership for accounting
purposes in the skb->sk pointer, this is a natural place to store
the classid in a persistent manner.

This patch adds the cls_cgroup classid in struct sock, filling up
an existing hole on 64-bit :)

The value is set at socket creation time.  So all sockets created
via socket(2) automatically gains the ID of the thread creating it.
Whenever another process touches the socket by either reading or
writing to it, we will change the socket classid to that of the
process if it has a valid (non-zero) classid.

For sockets created on inbound connections through accept(2), we
inherit the classid of the original listening socket through
sk_clone, possibly preceding the actual accept(2) call.

In order to minimise risks, I have not made this the authoritative
classid.  For now it is only used as a backup when we execute
with soft IRQs disabled.  Once we're completely happy with its
semantics we can use it as the sole classid.

Footnote: I have rearranged the error path on cls_group module
creation.  If ...
From: Neil Horman
Date: Saturday, May 22, 2010 - 5:26 am

Thats definately part of it yes.  But its not all of it, although I think the
rest doesn't have anything to do with your patch.  I fixed that straight away,
but I was still alwasy getting classids of zero in the qemu processes.  Looking
at the cgroup classifier, I'm wondering how this cgroup code ever worked in the
first place.  When we register the cgroup subsystem, we don't register an attach
method, so we never get a chance to assign task_cls_sate(tsk)->classid to any
non-zero value.  I've got a version of the patch that add that, but for some
reason, even though I set task_cls_state(tsk)->classid to the cgroup classid on
attach, task_cls_classid still returns zero.  I'm not sure why that is yet, but
I'm looking into it.

--

From: Herbert Xu
Date: Sunday, May 23, 2010 - 10:42 pm

No I don't think you need an attach method.

The task_struct has a pointer to the cgroups, which in turn has
a pointer to the cgroup_subsys_state that we allocated in the
cls_cgroup module.

Did you try building cls_cgroup into the kernel?

Perhaps there is a bug in how we register it at run-time, or
perhaps the cgroups infrastructure itself is buggy.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: David Miller
Date: Sunday, May 23, 2010 - 11:44 pm

From: Herbert Xu <herbert@gondor.apana.org.au>

BTW I made sure to apply this version.
--

From: David Miller
Date: Sunday, May 23, 2010 - 11:55 pm

From: David Miller <davem@davemloft.net>

Actually I had to revert, doesn't build:

  CC [M]  drivers/net/pppoe.o
In file included from net/socket.c:97:
include/net/cls_cgroup.h:22: error: field ‘css’ has incomplete type
make[1]: *** [net/socket.o] Error 1
make: *** [net] Error 2
make: *** Waiting for unfinished jobs....

Probably you only tested the build with cgroups enabled?
From: Herbert Xu
Date: Monday, May 24, 2010 - 12:07 am

Indeed, that does seem to be the problem here.  It turns out
that their struct is only declared when CONFIG_CGROUPS is defined,
how annoying.

Oh well I guess I'll follow their example :)

cls_cgroup: Store classid in struct sock

Up until now cls_cgroup has relied on fetching the classid out of
the current executing thread.  This runs into trouble when a packet
processing is delayed in which case it may execute out of another
thread's context.

Furthermore, even when a packet is not delayed we may fail to
classify it if soft IRQs have been disabled, because this scenario
is indistinguishable from one where a packet unrelated to the
current thread is processed by a real soft IRQ.

In fact, the current semantics is inherently broken, as a single
skb may be constructed out of the writes of two different tasks.
A different manifestation of this problem is when the TCP stack
transmits in response of an incoming ACK.  This is currently
unclassified.

As we already have a concept of packet ownership for accounting
purposes in the skb->sk pointer, this is a natural place to store
the classid in a persistent manner.

This patch adds the cls_cgroup classid in struct sock, filling up
an existing hole on 64-bit :)

The value is set at socket creation time.  So all sockets created
via socket(2) automatically gains the ID of the thread creating it.
Whenever another process touches the socket by either reading or
writing to it, we will change the socket classid to that of the
process if it has a valid (non-zero) classid.

For sockets created on inbound connections through accept(2), we
inherit the classid of the original listening socket through
sk_clone, possibly preceding the actual accept(2) call.

In order to minimise risks, I have not made this the authoritative
classid.  For now it is only used as a backup when we execute
with soft IRQs disabled.  Once we're completely happy with its
semantics we can use it as the sole classid.

Footnote: I have rearranged the error ...
From: David Miller
Date: Monday, May 24, 2010 - 12:14 am

From: Herbert Xu <herbert@gondor.apana.org.au>

Thanks for fixing this up, applied.
--

From: Neil Horman
Date: Monday, May 24, 2010 - 4:09 am

Excuse all my noise from before, Herberts patch works fine.  Apparently the
problem was mine.  QEMU creates several threads during startup, but adding the
processid to a cgroup doesn't cause its child threads to get added
automatically.  Doing this by hand causes this to work.

Thanks
Neil

--

From: Herbert Xu
Date: Monday, May 24, 2010 - 4:24 am

Thanks a lot for testing Neil!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Eric Dumazet
Date: Wednesday, May 19, 2010 - 7:10 am

RPS is enabled on a per device (or more precisely per subqueue) basis, and disabled
by default, so if cgroup classifier is needed, it should work as is.

Speaking of net/sched/cls_cgroup.c, I am contemplating following
sequence :

rcu_read_lock();
classid = task_cls_state(current)->classid;
rcu_read_unlock();

RCU is definitly so special (should I say magic ?), that we use it even
if not needed. It makes us happy...




--

From: Neil Horman
Date: Wednesday, May 19, 2010 - 7:31 am

Yeah, I'd noticed there was no write side to that, but hadn't quite gotten to
--

From: Herbert Xu
Date: Wednesday, May 19, 2010 - 1:20 am

Dave also raised this but I believe nothing changes with regards
to the stack.  We currently call do_softirq which does not switch
stacks.


Can you explain this a bit more?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Eric Dumazet
Date: Wednesday, May 19, 2010 - 1:27 am

This is a bit wrong, at least here (CONFIG_4KSTACKS=y)

Some people still use 32bits these days ;)

Please check arch/x86/kernel/irq_32.c

asmlinkage void do_softirq(void)
{
        unsigned long flags;
        struct thread_info *curctx;
        union irq_ctx *irqctx;
        u32 *isp;

        if (in_interrupt())
                return;

        local_irq_save(flags);

        if (local_softirq_pending()) {
                curctx = current_thread_info();
                irqctx = __get_cpu_var(softirq_ctx);
                irqctx->tinfo.task = curctx->task;
                irqctx->tinfo.previous_esp = current_stack_pointer;

                /* build the stack frame on the softirq stack */
                isp = (u32 *) ((char *)irqctx + sizeof(*irqctx));

                call_on_stack(__do_softirq, isp);
                /*
                 * Shouldnt happen, we returned above if in_interrupt():
                 */
                WARN_ON_ONCE(softirq_count());
        }

        local_irq_restore(flags);
}



--

From: Herbert Xu
Date: Wednesday, May 19, 2010 - 1:44 am

Right, I was looking at the generic version.

Still, as I'm only changing tun.c where we know the call comes
from a syscall, I don't think the stack is a great issue.

Besides, for TX we already perform everything from the same stack
depth and the TX path isn't that much shallower compared to the
RX path.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: David Miller
Date: Wednesday, May 19, 2010 - 1:14 pm

From: Herbert Xu <herbert@gondor.apana.org.au>

do_softirq() _does_ switch stacks, it's a per-arch function that
does the stack switch and calls __do_softirq() on the softirq
stack.
--

Previous thread: [PATCH 08/29] pcmcia: dev_node removal (drivers with updated printk call) by Dominik Brodowski on Tuesday, May 18, 2010 - 11:42 pm. (1 message)

Next thread: [PATCH] net/mpc52xx_phy: Various code cleanups by Wolfram Sang on Wednesday, May 19, 2010 - 3:21 am. (2 messages)