Re: [RFC] cgroups: implement device whitelist lsm (v2)

Previous thread: please fix 2.6.24.3 unifdef-y += if_addrlabel.h problem by Adam Turk on Wednesday, March 12, 2008 - 8:17 pm. (2 messages)

Next thread: none
From: Serge E. Hallyn
Date: Wednesday, March 12, 2008 - 8:27 pm

Implement a cgroup using the LSM interface to enforce mknod and open
on device files.

This implements a simple device access whitelist.  A whitelist entry
has 4 fields.  'type' is a (all), c (char), or b (block).  'all' means it
applies to all types, all major numbers, and all minor numbers.  Major and
minor are obvious.  Access is a composition of r (read), w (write), and
m (mknod).

The root devcgroup starts with rwm to 'all'.  A child devcg gets a copy
of the parent.  Admins can then add and remove devices to the whitelist.
Once CAP_HOST_ADMIN is introduced it will be needed to add entries as
well or remove entries from another cgroup, though just CAP_SYS_ADMIN
will suffice to remove entries for your own group.

An entry is added by doing "echo <type> <maj> <min> <access>" > devcg.allow,
for instance:

	echo b 7 0 mrw > /cgroups/1/devcg.allow

An entry is removed by doing likewise into devcg.deny.  Since this is a
pure whitelist, not acls, you can only remove entries which exist in the
whitelist.  You must explicitly

	echo a 0 0 mrw > /cgroups/1/devcg.deny

to remove the "allow all" entry which is automatically inherited from
the root cgroup.

While composing this with the ns_cgroup may seem logical, it is not
the right thing to do, because updates to /cg/cg1/devcg.deny are
not reflected in /cg/cg1/cg2/devcg.allow.

A task may only be moved to another devcgroup if it is moving to
a direct descendent of its current devcgroup.

CAP_NS_OVERRIDE is defined as the capability needed to cross namespaces.
A task needs both CAP_NS_OVERRIDE and CAP_SYS_ADMIN to create a new
devcgroup, update a devcgroup's access, or move a task to a new
devcgroup.

CONFIG_COMMONCAP is defined whenever security/commoncap.c should
be compiled, so that the decision of whether to show the option
for FILE_CAPABILITIES can be a bit cleaner.

Pavel, do you have any experience with what sorts of rules a typical
customer would get?  I wasn't sure whether it was worth having some
sort of hash ...
From: James Morris
Date: Thursday, March 13, 2008 - 2:25 am

For lower overall complexity, why not just extend the capability LSM to 
include the devcgroup_ perms if CONFIG_CGROUP_DEV ?



- James
-- 
James Morris
<jmorris@namei.org>
--

From: Serge E. Hallyn
Date: Thursday, March 13, 2008 - 6:18 am

That does make for a simpler implementation at this point, however if
any other such LSMs come along (as Casey seemed to think they would) the
end result could end up being horrible spaghetti code of dependencies
and interrelated configs.

But OTOH we went years with no such changes, so that's probably not a
particularly practical concern unless someone can cite specific upcoming
examples.  So if noone objects I'll try that approach.

thanks,
-serge
--

From: James Morris
Date: Thursday, March 13, 2008 - 6:50 am

That can be addressed as the need arises.  A basic tenet of kernel 

-- 
James Morris
<jmorris@namei.org>
--

From: Serge E. Hallyn
Date: Thursday, March 13, 2008 - 7:38 am

True, but while this change simplifies the code a bit, the semantics
seem more muddled - devcg will be enforcing when CONFIG_CGROUP_DEV=y
and:

	SECURITY=n or
	rootplug is enabled
	capabilities is enabled
	smack is enabled
	selinux+capabilities is enabled

It will not be enforcing when
	dummy is loaded
	only selinux (and not capabilities) is loaded

If that's ok with people then I'm fine with it.  I suppose it should be
explained in the CONFIG_CGROUP_DEV help section, which it isn't in this
version I'm about to set.  Patch hitting the wire in a minute.

thanks,
--

From: James Morris
Date: Thursday, March 13, 2008 - 3:27 pm

Well, this is how real systems are going to be deployed.

It becomes confusing, IMHO, if you have to change which secondary LSM you 
stack with SELinux to enable a cgroup feature.

-- 
James Morris
<jmorris@namei.org>
--

From: Serge E. Hallyn
Date: Thursday, March 13, 2008 - 3:46 pm

So you're saying selinux without capabilities should still be able to
use dev_cgroup?  (Just making sure I understand right)

-serge
--

From: James Morris
Date: Thursday, March 13, 2008 - 4:49 pm

Yes.

All Fedora, RHEL, CentOS etc. ship with SELinux+capabilities.  I can't 

Nope, SELinux always stacks with capabilities, so havng the cgroup hooks 
in capabilities makes sense (rather than having us change the secondary 
stacking LSM just to enable a feature).





-- 
James Morris
<jmorris@namei.org>
--

From: Serge E. Hallyn
Date: Thursday, March 13, 2008 - 6:41 pm

Oh, ok.

Will let the patch stand until Pavel and Greg comment then.

thanks,
-serge
--

From: Greg KH
Date: Thursday, March 13, 2008 - 9:47 pm

My main question was why was that file in the kernel/ directory?
Shouldn't that also be in the security/ directory?

And to be honest, I didn't really look at it at all other than the
diffstat to make sure you weren't messing with the kobj_map stuff
anymore :)

thanks,

greg k-h
--

From: Serge E. Hallyn
Date: Friday, March 14, 2008 - 6:54 am

I'm using cgroups to track the tasks which should have their device
--

From: Pavel Emelyanov
Date: Friday, March 14, 2008 - 6:58 am

From: Serge E. Hallyn
Date: Friday, March 14, 2008 - 8:45 am

Ah.

Guess it could all go under security/.  Should it still go there even if
--

From: Pavel Emelyanov
Date: Friday, March 14, 2008 - 8:54 am

Sure it can - security/ is in obj-y regardless of whether the 

--

From: Stephen Smalley
Date: Friday, March 14, 2008 - 9:57 am

There is the precedent of the security/keys directory (security-related,
but not using LSM - aside from calling LSM hooks for access checks and
labeling of keys).

-- 
Stephen Smalley
National Security Agency

--

From: Pavel Emelyanov
Date: Friday, March 14, 2008 - 2:28 am

Well, I saw your previous patch, that was implemented as just another
LSM module and I liked it except for the LSM dependency.


--

From: Serge E. Hallyn
Date: Friday, March 14, 2008 - 6:58 am

James and Stephen agree with your LSM qualms.  I suppose we could add
cgroups next to the lsm hooks.  I suspect Paul Menage would complain
about that (Paul?), and I do think it's silly as they are security
questions, not group tracking questions, but if it's what people want

In an earlier version I asked whether you had any experience with usual
# rules per container.  Do you have an idea?  Right now the whitelist is
a straight list we search through linearly.  If # rules is generally
tiny then I'm inclined to keep it that way...

thanks,
-serge
--

From: Paul Menage
Date: Friday, March 14, 2008 - 7:12 am

Depends on what you mean by "add cgroups to the LSM hooks". Could you
expand on that?

Paul
--

From: Serge E. Hallyn
Date: Friday, March 14, 2008 - 7:42 am

cgroup hooks next to the lsm hooks.  So in fs/namei.c where there are
security_inode_permission() hooks, there would also be
cgroup_inode_permission() hooks to let the devices cgroup mediate the
access.  Well, in permission(), probably not in exec_permission_lite()
since that's probalby not a device access  :)

So far it looks like everyone likes that, so as long as you don't nack I
guess that'll be the way to go.

thanks,
-serge
--

From: Paul Menage
Date: Saturday, March 15, 2008 - 5:59 pm

This would just be a device cgroup-specific thing, right? Nothing to
do with the generic framework? If so, then that sounds fine (to me, at
least).

Paul
--

From: Pavel Emelyanov
Date: Friday, March 14, 2008 - 7:05 am

The way I see this is: cgroups provide a common way to group tasks
and an API for general configuration - that's the controller "face", 
and it's up to the controller to decide where he turns his "back",
IOW where the hooks are placed. For the memory controller - they are
injected directly into the mm code. For this controller, I think it

The # of rules usually has a linear dependency on the number of containers
(each of then has to have an access to /dev/null,zero,random at least), so
having 100 containers we will have to scan through a 300-entries list. I'd
vote for a hash table or a radix/binary/rb tree for that. Or any other way

--

From: Serge E. Hallyn
Date: Friday, March 14, 2008 - 7:37 am

Oh no, the rules are stored per-container, so it sounds like you're

I'm fine with that, but not for 3 rules  :)

-serge
--

From: Pavel Emelyanov
Date: Friday, March 14, 2008 - 8:07 am

So am I :) Anyway - if someday this will grow up to tens of entries turning

--

From: Casey Schaufler
Date: Thursday, March 13, 2008 - 7:51 pm

That's what I was getting at. When the next feature comes along
are we going to stuff it into capabilities, too? Maybe we'll
cram it into audit or CIPSO instead, but how long can this go on?
Eventually we need a mechanism that allows more or less general
mix-and-match, maybe with a few rules like "don't mix plaids and
stripes" to keep things sane or these lesser facilities have no
chance. Seems like we're still making LSM too hard to use.

Unless I take an aggressive approach to adding them to Smack.
Hmm, might be a way to make a buck or two that way.
(Only kidding)(Well, mostly only kidding)

And yes, I understand that there are still those about who
don't like LSM in any form, much less in a useful one.


Casey Schaufler
casey@schaufler-ca.com
--

From: Paul Menage
Date: Friday, March 14, 2008 - 2:16 am

Maybe you should follow up the tree to ensure that all parent groups
have access to the device too? Or alternatively, cache the results of


But this isn't necessarily crossing namespaces. It could be used for
device control in the same namespace (e.g. allowing a job to access a
raw disk for its data storage rather than going through the
filesystem).

Paul
--

From: Serge E. Hallyn
Date: Friday, March 14, 2008 - 7:05 am

Yes, I considered that.  Alternatively additions to a parent cgroup's
.deny could be propagated to all its descendents (but not additions to
the .allow).


To prevent it escaping to laxer device permissions, which of course only

Yeah it should be renamed.  I want to use the same cap which we would
use for user namespaces though.  CAP_NS_CONT(ainer)?  Even though there
really is no such thing as a 'container'.  But that would tie together
any such privileges for cgroups and namespaces.

thanks,
-serge
--

From: Paul Menage
Date: Friday, March 14, 2008 - 7:15 am

That makes it impossible for a root process to enter a child cgroup,
do something, and then go back to its own cgroup. Why aren't the
existing cgroup security semantics sufficient?

Paul
--

From: Serge E. Hallyn
Date: Friday, March 14, 2008 - 7:35 am

Yes, but it can fire off a child in the child cgroup to do something,

Because the point of this is to provide some restrictions to otherwise
privileged users, and cgroups only provides dac-based permissions.

But that doesn't mean that I'm not doing too much.  I could just add a
CAP_SYS_ADMIN or CAP_CONT_OVERRIDE+CAP_SYS_ADMIN check, and not restrict
which cgroups a task can move to.  Does that sound good?

-serge
--

From: Paul Menage
Date: Saturday, March 15, 2008 - 5:57 pm

Sounds reasonable.

Paul
--

From: Paul Menage
Date: Friday, March 14, 2008 - 2:18 am

In keeping with the naming convention for control groups, "devices"
would be better than "devcg".

Paul
--

From: Serge E. Hallyn
Date: Friday, March 14, 2008 - 7:00 am

Noted, thanks.

-serge
--

Previous thread: please fix 2.6.24.3 unifdef-y += if_addrlabel.h problem by Adam Turk on Wednesday, March 12, 2008 - 8:17 pm. (2 messages)

Next thread: none