Hi Al, Christoph, Trond, Stephen, Casey,
Here's a set of patches that implement a very basic set of COW credentials. It
compiles, links and runs for x86_64 with EXT3, (V)FAT, NFS, AFS, SELinux and
keyrings all enabled. Most other filesystems are disabled, apart from things
like proc. It is not intended to completely cover the kernel at this point.

The cred struct contains the credentials that the kernel needs to act upon
something or to create something. Credentials that govern how a task may be
acted upon remain in the task struct.

Because keyrings and effective capabilities can be installed or changed in one
process by another process, they are shadowed by the cred structure rather than
residing there. Additionally, the session and process keyrings are shared
between all the threads of a process. The shadowing is performed by
update_current_cred() which is invoked on entry to any system call that might
need it.

A thread's cred struct may be read by that thread without any RCU precautions
as only that thread may replace its own cred struct. To change a thread's
credentials, dup_cred() should be called to create a new copy, the copy should
be changed, and then set_current_cred() should be called to make it live. Once
live, it may not be changed as it may then be shared with file descriptors, RPC
calls and other threads. RCU will be used to dispose of the old structure.

The three patches are:

(1) Introduce struct cred and migrate fsuid, fsgid, the groups list and the
    keyrings pointer to it.

(2) Introduce a security pointer into the cred struct and add LSM hooks to
    duplicate the information pointed to thereby and to free it.

    Make SELinux implement the hooks, splitting out some of the task security
    data to be associated with struct cred instead.

(3) Migrate the effective capabilities mask into the cred struct.

I plan on adding a fourth patch that will allow the LSM security contents of a
cred struct to be manipulated by the ker...
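
The copy, modify, publish rule from the cover letter above can be sketched in userspace C. This is only an illustration under stated assumptions: the names dup_cred() and set_current_cred() come from the cover letter, but their signatures, the two-field struct cred, the plain integer refcount, and the helpers get_cred(), put_cred() and cow_demo() are all made up here, and free() stands in for the RCU-deferred disposal the patches use.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-in for struct cred; only two fields are modelled. */
struct cred {
	int usage;              /* plain counter: the sketch is single-threaded */
	unsigned int uid, gid;  /* fsuid and fsgid */
};

/* Stand-in for the task's cred pointer (RCU-governed in the patches). */
static struct cred *current_cred;

static struct cred *get_cred(struct cred *c)
{
	c->usage++;
	return c;
}

static void put_cred(struct cred *c)
{
	assert(c->usage > 0);
	if (--c->usage == 0)
		free(c);        /* the kernel would defer this through RCU */
}

/* Assumed signature: return a private, modifiable copy of a live cred. */
static struct cred *dup_cred(const struct cred *old)
{
	struct cred *new = malloc(sizeof(*new));

	if (!new)
		return NULL;
	*new = *old;
	new->usage = 1;
	return new;
}

/* Assumed signature: publish the copy, drop the task's ref on the old one. */
static void set_current_cred(struct cred *new)
{
	struct cred *old = current_cred;

	current_cred = new;
	if (old)
		put_cred(old);
}

/* Returns 0 if a holder of the old cred is unaffected by the change. */
static int cow_demo(void)
{
	struct cred *init = calloc(1, sizeof(*init));
	struct cred *f_cred, *new;
	int ok = 0;

	if (!init)
		return -1;
	init->usage = 1;
	set_current_cred(init);

	f_cred = get_cred(current_cred);  /* e.g. a file opened with uid 0 */

	new = dup_cred(current_cred);     /* copy ... */
	if (new) {
		new->uid = 1000;              /* ... modify the copy ... */
		set_current_cred(new);        /* ... and make it live */
		ok = f_cred->uid == 0 && current_cred->uid == 1000;
	}

	put_cred(f_cred);
	put_cred(current_cred);
	current_cred = NULL;
	return ok ? 0 : 1;
}
```

The point of the demo is that the reference held as f_cred keeps seeing the credentials that were live when it was taken, which is why a live record must never be modified in place.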
Introduce a copy on write credentials record (struct cred). The fsuid, fsgid,
supplementary groups list move into it (DAC security). The session, process
and thread keyrings are reflected in it, but don't primarily reside there as
they aren't per-thread and occasionally need to be instantiated or replaced by
other threads or processes.

The LSM security information (MAC security) does *not* migrate from task_struct
at this point, but will be addressed by a later patch.

task_struct then gains an RCU-governed pointer to the credentials as a
replacement to the members it lost.

struct file gains a pointer to (f_cred) and a reference on the cred struct that
the opener was using at the time the file was opened. This replaces f_uid and
f_gid.

To alter the credentials record, a copy must be made. This copy may then be
altered and then the pointer in the task_struct redirected to it. From that
point on the new record should be considered immutable.

In addition, the default setting of i_uid and i_gid to fsuid and fsgid has been
moved from the callers of new_inode() into new_inode() itself.

Signed-off-by: David Howells <dhowells@redhat.com>
---

arch/x86_64/kernel/sys_x86_64.c | 4 +
fs/aio.c | 25 +++++-
fs/anon_inodes.c | 2
fs/attr.c | 4 -
fs/compat.c | 65 ++++++++++++++
fs/compat_ioctl.c | 7 +-
fs/dcookies.c | 11 ++
fs/devpts/inode.c | 6 +
fs/dquot.c | 2
fs/eventfd.c | 4 +
fs/eventpoll.c | 16 ++++
fs/exec.c | 37 +++++++-
fs/ext3/balloc.c | 2
fs/ext3/ialloc.c | 4 -
fs/fcntl.c | 11 ++
fs/file_table.c | 3 -
fs/filesystems.c | 7 +-
fs/inode.c | 6 +
fs/inotify_user.c ...
Move into the cred struct the part of the task security data that defines how a
task acts upon an object. The part that defines how something acts upon a task
remains attached to the task.

For SELinux this requires some of task_security_struct to be split off into
cred_security_struct which is then attached to struct cred. Note that the
contents of cred_security_struct may not be changed without the generation of a
new struct cred.

The split is as follows:

(*) create_sid, keycreate_sid and sockcreate_sid just move across.

(*) sid is split into victim_sid - which remains - and action_sid - which
    migrates.

(*) osid, exec_sid and ptrace_sid remain.

victim_sid is the SID used to govern actions upon the task. action_sid is used
to govern actions made by the task.

When accessing the cred_security_struct of another process, RCU read procedures
must be observed.

Signed-off-by: David Howells <dhowells@redhat.com>
---

include/linux/cred.h | 1
include/linux/security.h | 34 +++
kernel/cred.c | 7 +
security/dummy.c | 11 +
security/selinux/exports.c | 6
security/selinux/hooks.c | 497 +++++++++++++++++++++++--------------
security/selinux/include/objsec.h | 16 +
security/selinux/selinuxfs.c | 8 -
security/selinux/xfrm.c | 6
9 files changed, 380 insertions(+), 206 deletions(-)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index 22ae610..6c6feec 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -26,6 +26,7 @@ struct cred {
gid_t gid; /* fsgid as was */
struct rcu_head exterminate; /* cred destroyer */
struct group_info *group_info;
+ void *security;

/* caches for references to the three task keyrings
* - note that key_ref_t isn't typedef'd at this point, hence the odd
diff --git a/include/linux/security.h b/include/linux/security.h
index 1a15526..e5ed2ea 100644
--- a/include/linux/secur...
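
The SID split described in the patch description above can be pictured with a small userspace sketch. The field names follow that description; everything else is an assumption made for illustration: the exact struct layouts in objsec.h are not shown in this excerpt, and struct task, sid_for_acting(), sid_for_being_acted_on() and split_demo() are hypothetical helpers, not part of the patch.

```c
#include <assert.h>

typedef unsigned int u32;

/* Stays with the task: governs what may be done *to* the task. */
struct task_security_struct {
	u32 victim_sid;     /* the half of the old sid that remains */
	u32 osid;
	u32 exec_sid;
	u32 ptrace_sid;
};

/* Moves to struct cred: governs what the task does to other things. */
struct cred_security_struct {
	u32 action_sid;     /* the half of the old sid that migrates */
	u32 create_sid;
	u32 keycreate_sid;
	u32 sockcreate_sid;
};

/* Hypothetical miniature task: own blob plus a pointer via its cred. */
struct task {
	struct task_security_struct tsec;
	const struct cred_security_struct *csec;
};

static u32 sid_for_acting(const struct task *t)
{
	return t->csec->action_sid;
}

static u32 sid_for_being_acted_on(const struct task *t)
{
	return t->tsec.victim_sid;
}

/* Returns 0 if swapping the cred-side blob leaves victim_sid untouched. */
static int split_demo(void)
{
	static const struct cred_security_struct own = { .action_sid = 7 };
	static const struct cred_security_struct service = { .action_sid = 42 };
	struct task t = { .tsec = { .victim_sid = 7 }, .csec = &own };
	u32 victim_before = sid_for_being_acted_on(&t);

	assert(sid_for_acting(&t) == 7);
	t.csec = &service;      /* temporary override of the acting context */

	return (sid_for_acting(&t) == 42 &&
		sid_for_being_acted_on(&t) == victim_before) ? 0 : 1;
}
```

Swapping the cred-side blob changes the SID used when the task acts, while signals or ptrace aimed at the task would still be judged against the unchanged victim_sid.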
My concern is with this victim_sid. Whether the concern is valid
depends on exactly how the other credentials can be used, which isn't
yet entirely clear to me.

So my concern is that while a task is acting with alternate creds,
another task can act upon it based upon victim_sid. So does this
open up the possibility for an unprivileged task to ptrace or kill
a task in the middle of a privileged operation? Is that somehow
safe in the way this is used here?

I guess I need to look more at the actual nfs patches etc.
thanks,
-
I think that's why they want it - so that things like nfsd and
cachefiles can switch the credentials used for permission checking and
object labeling (actor_sid, fscreate_sid) without exposing them to
access by other tasks via signals, ptrace, etc (victim_sid). Similar to
fsuid vs. uid. And then the separate issue of migrating the permission
checking and object labeling state into a separate credential structure
that can have a separate lifecycle from the task.

Precisely when to use one identity vs. the other though isn't always
clear, and the potential for accidental divergence is also a concern.

I think it is a mistake to have selinux_get_task_sid() blindly return
the victim SID btw. I think you likely need to split that interface and
require the caller to indicate what it wants, so that there is no
accidental misuse of the victim SID where the caller wanted the actor
SID. Or at least rename that interface to make it clear that it only
--
Stephen Smalley
National Security Agency
-
What should auditing use in audit_filter_rules() when dealing with
AUDIT_SUBJ_* cases? Should the SUBJ cases use the subjective SID and the
AUDIT_OBJ_* cases use the objective SID? On the other hand AUDIT_OBJ_* cases
don't seem to have anything to do with tasks.

David
-
I believe that you'll need to audit both sets of credentials.
I think that for audit filtering you will need to have the ability
to filter on either. It's no different from the euid/ruid split.

Casey Schaufler
casey@schaufler-ca.com
-
(cc'd linux-audit)
As you say, I don't think AUDIT_OBJ_* has anything to do with tasks,
just object labels (like inode labels).

I think you likely want the actor SID / subject SID or whatever you want
to call it for AUDIT_SUBJ_*.

--
Stephen Smalley
National Security Agency
-
Ah, ok, so the daemon would use this to act under the user's
credentials. I was thinking the user would be using this to act
under the daemon's or kernel's sid.

Between that and David's response, that this is only for the duration of
one syscall (IIUC), and not exported to userspace, it sounds safe
enough at the moment. I do worry about the fact that inevitably someone
will want to 'expand' on that :)

My worry arose from the fact that I don't see
security_cred_kernel_act_as() being called anywhere in this patchset...

thanks,
Think kernel service rather than daemon. NFSd provides its own daemons to
override the security of, whereas cachefiles runs in the process context of

That's more or less correct. You have to add the pagefault handler to that

Look in:

    [PATCH 04/22] CRED: Request a credential record for a kernel service

Which was part of a patchset I sent on the 21st Sept. get_kernel_cred() is in
turn used by:

    [PATCH 13/22] CacheFiles: A cache that backs onto a mounted filesystem

David
-
This seems to me to be an unnatural and inappropriate separation.
Move the whole of the security blob into the cred if you must have
a cred (which I was soooo glad Linux didn't have after having dealt
with it in Solaris) rather than having two blobs to deal with. If an
LSM requires a different treatment between when a task is a subject

So put all these fields into one blob and attach them to the cred.
Actually, if you put all these fields in the task blob maybe you
don't need to do your COW thing at all.

Casey Schaufler
casey@schaufler-ca.com
-
The separation is necessary for a few reasons:

(1) The task victimisation context must *not* be changed by a temporary
    override of the action and creation contexts for purposes such as
    cachefiles.

(2) If the victimisation context is not included in the override cred, then I
    only need one copy of the override cred to do *all* the work for
    cachefiles. I can share that singular override blob across every task
    that wishes to access the cache.

(3) If the victimisation context is moved to the override cred, I have to
    create a new context every time I want to apply the override. This means
    I have to deal with the possibility of OOM at such points. I could cache

Indeed, but I can help it to do so by providing separate security pointers on

Whilst that is true, one of the purposes of this is to make it easier and
cleaner to effect the override. Every field in the cred struct potentially
must be overridden. That's a lot of context to save each time I need to apply
the override and a lot of context to restore each time I want to restore it.

With these patches, all I need to do is to take a ref and swap the cred
pointers with a memory barrier to satisfy the RCU, and then swap them back
again and release the ref. It's much, much simpler.

Furthermore, with respect to LSM and SELinux, I think I can remove the SELinux
specific knowledge currently present in cachefiles by saying to LSM "give me a
cred for kernel service X". With SELinux this can do all the transformations
necessary to give me the appropriate action SID and file creation SID without
me needing to know that these concepts exist. I just apply the cred I'm given
as an override.

With your suggestion, I either have to do a full set of transformations each
time I want to apply the override, or I have to know about SELinux or
whatever's internals. Your objection to my earlier patch was this very point.

David
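
The "take a ref, swap the pointers, swap back, release the ref" sequence described in the message above might look roughly like this in userspace C11. It is a sketch only: override_cred(), revert_cred(), the two-field struct cred and struct task are hypothetical names chosen here, and a sequentially consistent atomic_exchange() stands in for whatever barrier and RCU primitives the real patches use.

```c
#include <assert.h>
#include <stdatomic.h>

struct cred {
	atomic_int usage;
	unsigned int fsuid;
};

struct task {
	_Atomic(struct cred *) cred;    /* RCU-governed in the patches */
};

static struct cred *get_cred(struct cred *c)
{
	atomic_fetch_add(&c->usage, 1);
	return c;
}

static void put_cred(struct cred *c)
{
	int old = atomic_fetch_sub(&c->usage, 1);

	assert(old > 0);    /* freeing at zero is left out of this sketch */
	(void)old;
}

/* Install a shared override cred; returns the cred that was displaced. */
static struct cred *override_cred(struct task *t, struct cred *override)
{
	get_cred(override);
	return atomic_exchange(&t->cred, override);
}

/* Put the displaced cred back and drop the ref taken on the override. */
static void revert_cred(struct task *t, struct cred *saved)
{
	put_cred(atomic_exchange(&t->cred, saved));
}

/* Returns 0 if the override is visible only between the two swaps. */
static int override_demo(void)
{
	struct cred user, cache;
	struct task t;
	struct cred *saved, *seen;
	unsigned int during, after;

	atomic_init(&user.usage, 1);
	user.fsuid = 1000;
	atomic_init(&cache.usage, 1);
	cache.fsuid = 0;
	atomic_init(&t.cred, &user);

	saved = override_cred(&t, &cache);
	seen = atomic_load(&t.cred);
	during = seen->fsuid;           /* work done here uses the cache cred */

	revert_cred(&t, saved);
	seen = atomic_load(&t.cred);
	after = seen->fsuid;

	return (during == 0 && after == 1000 &&
		atomic_load(&cache.usage) == 1) ? 0 : 1;
}
```

Nothing is allocated on the override path, which matches point (2) of the message above: one override cred can be shared by every task that needs it, so there is no OOM case to handle when applying it.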
-
Could you use "object context" instead of "victimisation context"?

That would be for the LSM to decide, not the file system. While I
concede that it is unlikely that you are going to want to use the
same security attributes for your "object" and your "subject" I also
suggest that it is probable that the "object" attributes will want to
change in the case of a filesystem daemon as well, and that the

Assuming that the LSM goes along with the notion you could do that

You're making a big mess (that's my opinion, take it for what it's worth)
throughout the LSMs to deal with a single (or maybe a very few) special

So it sounds like what you'd be happiest with would be a separate task
struct hand crafted to have the right "object" and "subject" attributes.
Couldn't you set up a task to do the overridden operations? Yes, it
would have its own set of ugly, but it would be isolated. I haven't
been through the code of late, but this used to be what nfsd did, that
being nothing except loaning its attributes (ok, it was a cred) to

Yes, but the LSM writer now has to maintain two full security blobs

Yup. And I'm reluctantly withdrawing my objection to exporting secid's
across the LSM interface. I didn't like 'em when I saw 'em in 1988 in
the SecureWare CMW, and I don't like 'em any better now, but I had three
tries at extracting them from use outside the SELinux specific code and
it's clear that there's no way to do it without being Linus. For Smack
I have restructured a couple lists and can deal with secid's now.

I don't see any way to get around the LSM being involved, even with
secid's. Only the LSM can decide what is appropriate for an override
value. Maybe all you need is

    int security_task_godlikesecid(u32 *secid)

which gives you the fully protected, all powerful secid. That doesn't
handle capabilities, of course, but it appears you know how to deal with
those already.

I apologize that I can't offer a complete alternative at this point,
but I do have other fis...
Filesystem? CacheFiles is acting on behalf of a filesystem and must override
the context that that filesystem was using, so that it can access a different

99.99% probable that I will want a different subjective context for talking to

What daemon are you referring to?

You've misunderstood, I think. Consider readpages. CacheFiles really wants to
run in the process context of whoever called readpages. The main reason for
this is one of performance: readpages() is called an awful lot, and we don't

Indeed. I have mentioned that I intend to create a patch to provide an LSM
hook by which a kernel service can ask the LSM module for a new cred struct,
appropriate to that service. That way, all details of SIDs, secids,

Sigh. Only by saving and restoring the complete credential context from/to the

This is actually the simpler and cleaner solution. It has been assumed,
generally, to this point that subjective context == objective context, but we

Actually, I think you were right. I shouldn't be exposing secids like that.

No, all I really need is something like:

    struct cred *security_get_kernel_service_cred(const char *name);

And maybe:

    int security_create_files_as(struct cred *, struct inode *);

So that I can say I want to be able to create files that look the same as that
inode there. The LSM can then check that the subjective context in the cred
struct is allowed to do that.

David
-
Actually, whilst that is sort of feasible for CacheFiles[*], it is not
really feasible for NFSd. NFSd possesses a set of daemons that have a
standard objective context and substitute a lot of different subjective
contexts as they perform VFS operations on behalf of remote clients.

[*] But in practice quite icky from other points of view.

David
-
Move the effective capabilities mask from the task struct into the credentials
record.

Note that the effective capabilities mask in the cred struct shadows that in
the task_struct because a thread can have its capabilities masks changed by
another thread. The shadowing is performed by update_current_cred() which is
invoked on entry to any system call that might need it.

Signed-off-by: David Howells <dhowells@redhat.com>
---

fs/buffer.c | 3 +++
fs/ioprio.c | 3 +++
fs/open.c | 27 +++++++++------------------
fs/proc/array.c | 2 +-
fs/readdir.c | 3 +++
include/linux/cred.h | 2 ++
include/linux/init_task.h | 2 +-
include/linux/sched.h | 2 +-
ipc/msg.c | 3 +++
ipc/sem.c | 3 +++
ipc/shm.c | 3 +++
kernel/acct.c | 3 +++
kernel/capability.c | 3 +++
kernel/compat.c | 3 +++
kernel/cred.c | 30 +++++++++++++++++++++++-------
kernel/exit.c | 2 ++
kernel/fork.c | 6 +++++-
kernel/futex.c | 3 +++
kernel/futex_compat.c | 3 +++
kernel/kexec.c | 3 +++
kernel/module.c | 6 ++++++
kernel/ptrace.c | 3 +++
kernel/sched.c | 9 +++++++++
kernel/signal.c | 6 ++++++
kernel/sys.c | 39 +++++++++++++++++++++++++++++++++++++++
kernel/sysctl.c | 3 +++
kernel/time.c | 9 +++++++++
kernel/uid16.c | 3 +++
mm/mempolicy.c | 6 ++++++
mm/migrate.c | 3 +++
mm/mlock.c | 4 ++++
mm/mmap.c | 3 +++
mm/mremap.c | 3 +++
mm/oom_kill.c | 9 +++++++--
mm/swapfile.c | 6 ++++++
net/compat.c | 6 ++++++
net/socket.c | 45 ++++++++++++++++++++++++++++++++++++++++++...
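
The shadowing described in this patch can be sketched in userspace C as follows. The name update_current_cred() is from the patch description, but its signature here is an assumption (the real function presumably works on the current task), the capability mask is reduced to a plain unsigned int, and struct task, put_cred() and shadow_demo() are hypothetical stand-ins; free() again replaces RCU-deferred disposal.

```c
#include <assert.h>
#include <stdlib.h>

struct cred {
	int usage;
	unsigned int cap_effective;     /* shadow copy held in the cred */
};

struct task {
	unsigned int cap_effective;     /* master copy, changeable by others */
	struct cred *cred;
};

static void put_cred(struct cred *c)
{
	assert(c->usage > 0);
	if (--c->usage == 0)
		free(c);
}

/* Called on syscall entry: resync the shadow if the master copy moved. */
static int update_current_cred(struct task *t)
{
	struct cred *old = t->cred, *new;

	if (old->cap_effective == t->cap_effective)
		return 0;                   /* shadow is up to date */

	new = malloc(sizeof(*new));
	if (!new)
		return -1;
	*new = *old;                    /* never modify the live cred */
	new->usage = 1;
	new->cap_effective = t->cap_effective;
	t->cred = new;
	put_cred(old);
	return 0;
}

/* Returns 0 if the resync replaced the cred instead of editing it. */
static int shadow_demo(void)
{
	struct task t = { .cap_effective = 0x3 };
	struct cred *held;
	int rc, ok;

	t.cred = calloc(1, sizeof(*t.cred));
	if (!t.cred)
		return -1;
	t.cred->usage = 2;              /* one ref for the task, one for 'held' */
	t.cred->cap_effective = 0x3;
	held = t.cred;

	t.cap_effective = 0x1;          /* changed from outside the thread */
	rc = update_current_cred(&t);

	ok = rc == 0 && t.cred != held &&
	     held->cap_effective == 0x3 && t.cred->cap_effective == 0x1;

	put_cred(held);
	put_cred(t.cred);
	return ok ? 0 : 1;
}
```

Because the resync installs a fresh copy, anything that took a reference on the old cred before the change keeps the capability mask that was in force at that time.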
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

OOC If we were to simply drop support for one process changing the
capabilities of another, would we need this patch?

Thanks
Andrew
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFG8fLrQheEq9QabfIRApPOAKCHAoazhTTpY/qSjdmRZxDptqeqiACfd4Q7
mdIPx+xpG19ih9uiVv1NSBU=
=TfZd
-----END PGP SIGNATURE-----
-
Umm... It would become simpler (which is a damn good thing - less PITA
with update_current_cred), but it would be still needed.

FWIW, dropping that support would be a Good Thing(tm), as far as I'm
concerned. _Why_ do we want that, anyway, and how much userland code
is able to cope with that in a sane way?
-
No. This has nothing to do with one process changing some other
process' capabilities. It has to do with being able to pass security
information around the kernel beyond the confines of the task struct.

This is needed in order to deal with asynchronous i/o where security
checks may have to be deferred, and where the task struct may no longer
be available.
One example would be a failover situation when doing deferred writes: if
the first choice of storage medium is unavailable, and the kernel tries
to fail the write over to another storage. On NFS that might involve
having to build up a new RPCSEC_GSS security context for the new server.
Currently, you cannot do this safely because all the security info is
cached in the task struct and much of it cannot be copied.

Trond
-
Ok, what can't be copied, and why can't it be copied?
Casey Schaufler
casey@schaufler-ca.com
-
In practice, no security information can be copied because the checks
are all made on the "current" pointer. There is no mechanism other than
'current' for passing security information around.

Trond
-
Well, the patch could be less, but there's still the possibility of a kernel
service wanting to override the capabilities mask.

David
-
