This is needed for computing pathnames in the AppArmor LSM.
Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: John Johansen <jjohansen@suse.de>
---
fs/namei.c | 2 +-
include/linux/security.h | 9 ++++++---
security/dummy.c | 2 +-
security/selinux/hooks.c | 3 ++-
4 files changed, 10 insertions(+), 6 deletions(-)
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1503,7 +1503,7 @@ int vfs_create(struct inode *dir, struct
return -EACCES; /* shouldn't it be ENOSYS? */
mode &= S_IALLUGO;
mode |= S_IFREG;
- error = security_inode_create(dir, dentry, mode);
+ error = security_inode_create(dir, dentry, nd ? nd->mnt : NULL, mode);
if (error)
return error;
DQUOT_INIT(dir);
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -283,6 +283,7 @@ struct request_sock;
* Check permission to create a regular file.
* @dir contains inode structure of the parent of the new file.
* @dentry contains the dentry structure for the file to be created.
+ * @mnt is the vfsmount corresponding to @dentry (may be NULL).
* @mode contains the file mode of the file to be created.
* Return 0 if permission is granted.
* @inode_link:
@@ -1204,8 +1205,8 @@ struct security_operations {
void (*inode_free_security) (struct inode *inode);
int (*inode_init_security) (struct inode *inode, struct inode *dir,
char **name, void **value, size_t *len);
- int (*inode_create) (struct inode *dir,
- struct dentry *dentry, int mode);
+ int (*inode_create) (struct inode *dir, struct dentry *dentry,
+ struct vfsmount *mnt, int mode);
int (*inode_link) (struct dentry *old_dentry,
struct inode *dir, struct dentry *new_dentry);
int (*inode_unlink) (struct inode *dir, struct dentry *dentry);
@@ -1611,11 +1612,12 @@ static inline int security_inode_init_se
static inline int security_inode_create (struct inode *dir,
struct ...Once again very strong NACK. Every conditional passing of vfsmounts get my veto. As mentioned last time if you really want this send a patch series first that passed the vfsmount consistantly. -
I don't consider it fair to NACK this patch just because some other part of
the kernel uses vfs_create() in the wrong way -- those are really independent
issues. The bug is not that hard to fix though; here is a proposed patch on
top of the other AppArmor patches.
The offenders are nfsd and the mqueue filesystem. Instantiate a struct
nameidata there.
The .dentry and .mnt fields can easily be filled in in nfsd. The mqueue
filesystem is somewhat special: files that mq_open creates are on an internal
mount and the mqueue filesystem is not generally visible (it may or may not
be mounted). When passing a regular vfsmount to vfs_create the files would
appear disconnected while they actually are kernel-internal objects. Use a
NULL vfsmount there to avoid that -- passing the kernel-internal vfsmount
there wouldn't help, anyway.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
---
fs/namei.c | 7 ++++---
fs/nfsd/vfs.c | 23 +++++++++++++++++++----
ipc/mqueue.c | 6 +++++-
3 files changed, 28 insertions(+), 8 deletions(-)
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1503,7 +1503,7 @@ int vfs_create(struct inode *dir, struct
return -EACCES; /* shouldn't it be ENOSYS? */
mode &= S_IALLUGO;
mode |= S_IFREG;
- error = security_inode_create(dir, dentry, nd ? nd->mnt : NULL, mode);
+ error = security_inode_create(dir, dentry, nd->mnt, mode);
if (error)
return error;
DQUOT_INIT(dir);
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1108,6 +1108,18 @@ nfsd_commit(struct svc_rqst *rqstp, stru
}
#endif /* CONFIG_NFSD_V3 */
+static inline int
+nfsd_do_create(struct inode *dir, struct dentry *child, struct vfsmount *mnt,
+ int mode)
+{
+ struct nameidata nd = {
+ .dentry = child,
+ .mnt = mnt,
+ };
+
+ return vfs_create(dir, child, mode, &nd);
+}
+
/*
* Create a file (regular, directory, device, fifo); UNIX sockets
* not yet implemented.
@@ -1192,7 +1204,8 @@ nfsd_create(struct svc_rqst *rqstp, stru
err = 0;
switch ...Note that it's not just this particular hunk, all these kinds of conditional But anyway, creating fake nameidata structures is not really helpful. If there is a nameidata passed people expect it to be complete, and if you pass them to an LSM people will e.g. try to look into lookup intents. But similar to the per-mountpoint r/o work I suspect you're simply operating on the wrong level, but then again this might not be fixable due to the braindead apparmor design. -
I don't actually agree with that: when nfsd creates a file, it still is a file create no matter where it originates from, and so it does make sense to provide the appropriate intent information too. Struct nameidata contains other crap only needed during an actual lookup too --- that's a mess, and the nameidata2 patch shows one possibility how to deal with it. (It's the cleanest approach I could think of right now without cluttering the vfs code with insane amounts of dereferences.) Thanks, Andreas -
You should provide intent information, yes - which your patch didn't :) And yes, I didn't like doing the ugly intent in nameidata hack, it's creating loads of problems for various parties, e.g. the stackable filesystem folks. Now the basic intent in nameidata mistaken has been made even worse by passing back a struct file in it conditionally and doing lots of work in ->lookup that shouldn't be there. (Which btw, I expect to cause quite a few problems for apparmor or other lsms, but I guess so far no one has tried them on NFSv4) -
Well, the information implicitly provided is "no intent": In do_create() in ipc/mqueue.c intents would be pretty pointless because the mqueue filesystem is local. In fs/nfsd/vfs.c, intents would make slightly more sense assuming that the underlying filesystem eported via nfsd is remote. That's an Pathname wise, NFSv4 should look like any other filesystem on the client side. On the Server side, the concept of pathnames doesn't really fly for nfs: if a directory contains more than one link to the same file, there is no way to tell those aliases from each other from the file descriptor. In addition, computing even the somewhat ambiguous pathnames that can be computed would require subtree checking. But trying to confine nfsd is pretty pointless anyway: the deamon is privileged and can do whatever it wants. It makes more sense to export the right directories with the right permissions in the first place. Thanks, Andreas -
Yes, the vfs uses or passes through a NULL nameidata or vfsmount pointer and
no intent information in several places:
(1) vfs_create() is called with a NULL nameidata in two places,
(2) lookup_one_len() and lookup_one_len_kern() pass a NULL nameidata to
permission(), d_op->d_revalidate(), and i_op->lookup(),
(3) file_permission() passes a NULL nameidata to permission().
(4) permission() is directly called with NULL nameidata in some places, and
In general it is incorrect to use NULL nameidata or vfsmounts because then the
vfsmount->mnt_flags cannot be checked, and no intent information is available.
Intents are an optional optimization for remote filesystems; we can simply
ignore them for local filesystems. There are some cases where we are passing
NULL which are real bugs, but there are also other cases in which the
operations are not mount related: for example, filesystems like pipefs have
the MS_NOUSER flag set which indicates that they cannot be mounted at all. On
filesystems like sysfs, files are created and removed whether or not that
filesystem is mounted.
The places where we should pass a proper nameidata / vfsmount but don't should
be fixed. Those are long-standing issues in the vfs though, and at least some
are not easy to correct. AppArmor is affected by some of the NULL vfsmount
passing, and we found a few more problems when looking into this more
closely, but most of that NULL passing doesn't affect AppArmor:
* In some places, vfs functions are called without nameidata / vfsmount,
followed by calling lsm hooks with a vfsmount. In those cases, the vfs
functions cannot check the vfsmount flags, but the lsm hooks will still
do the appropriate checks.
* In the AppArmor security model, directory searching is always allowed, and
the access check are done once the leaf dentry + vfsmount has been looked
up. For example, granting write access to /var/tmp/foo in a profile is
sufficient to allow that access; search access ...Wouldn't it normally result in fewer instructions (on most architectures ... maybe not x86) to keep the same argument order as vfs_create? ie: static inline int nfsd_do_create(struct inode *dir, struct dentry *child, int mode, struct vfsmount *mnt) -
Here is a patch with request for comment. The create, lookup, and permission inode operations are all passed a full nameidata. This is unfortunate because in nfsd and the mqueue filesystem we must instantiate a struct nameidata, but cannot provide all of the same information of a regular lookup. The unused fields take up space on the stack, but more importantly, it is not obvious which fields have meaningful values and which don't, and so things might easily break. This patch introduces struct nameidata2 with only the fields that make sense independent of an actual lookup, and uses that struct in those places where a full nameidata is not needed. The entire patch is a little big so I'm only including the key parts here. The complete version is here: http://forgeftp.novell.com/apparmor/LKML_Submission-April_07/split-up-nameidata.diff Signed-off-by: Andreas Gruenbacher <agruen@suse.de> --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -14,21 +14,39 @@ struct open_intent { enum { MAX_NESTED_LINKS = 8 }; +/** + * Fields shared between nameidata and nameidata2 -- nameidata2 could + * be embedded in nameidata, but then the vfs code would become + * cluttered with dereferences. + */ +#define __NAMEIDATA2 \ + struct dentry *dentry; \ + struct vfsmount *mnt; \ + unsigned int flags; \ + \ + union { \ + struct open_intent open; \ + } intent; + struct nameidata { - struct dentry *dentry; - struct vfsmount *mnt; + __NAMEIDATA2 struct qstr last; - unsigned int flags; int last_type; unsigned depth; char *saved_names[MAX_NESTED_LINKS + 1]; +}; - /* Intent data */ - union { - struct open_intent open; - } intent; +struct nameidata2 { + __NAMEIDATA2 }; +#undef __NAMEIDATA2 + +static inline struct nameidata2 *ND2(struct nameidata *nd) +{ + return container_of(&nd->dentry, struct nameidata2, dentry); +} + struct path { struct vfsmount *mnt; struct dentry *dentry; @@ -76,10 +94,9 @@ extern void ...
Or better just pass argument directly. We really should only pass down the the dentry and the intent to the filesystem. The filesystem has not business looking at mnt in the operations, and the relevant bits of flags (mostly whether it's O_EXCLUSIVE) should be stored in the intent, because that's the only way it should be used. Doing it that way also allows to fix some braindead calling conventions or this one. etc.. -
Please don't use the kernel-doc begin-marker "/**" when the comment block --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** -
Sigh, kernel-doc should improve things, not get in the way ... -
Well, I see it as sort of a language that is embedded in C source code. But what are you suggesting? -- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** -
is a clear sign that interface is wrong. Leaving aside the general idiocy of "we prohibit to do something with file if mounted here, but if there's another mountpoint, well, we just miss", an API of that kind is just plain wrong. Either you can live without seeing vfsmount in that method (in which case you shouldn't pass it at all), or you have a hole. NACK. -
The fundamental model of AppArmor is to perform access checks based on the location of a file in the filesystem namespace, i.e., the pathname, and this can only be done with both the dentry and vfsmount. My understanding was that there was agreement at the last kernel summit that pathname based approaches No. There are callers of vfs_create() that use a NULL nameidata, and that's what causes the problem here. Struct nameidata is pretty big, and so we don't want to allocate temporary nameidata objects all over the place. So to me the above NULL check seems the lesser evil. One way to deal with the nameidata size problem is to split it up into one part for the real path lookups, and a much smaller part that only contains the dentry, vfsmount, and intent flags. This would allow to pass around the smaller nameidata much more consistently. John has posted patches for that on May 14; subject [RFD Patch 0/4]. Feedback appreciated. In several places, the NULL nameidata is wrong because we can't check the vfsmount flags or intent. Not having this information is causing problems for This is backwards from what AppArmor does. The policy defines which paths may be accessed; all paths not explicitly listed are denied. If files are mounted at multiple locations, then the policy may allow access to some locations but not to others. That's not a hole. In fact this is not much different from traditional permissions on parent directories: even if the same files are mounted at several locations, parent directory permissions may allow accessing only some of those locations. Thanks, Andreas -
I don't know what else you'd call it. Would you mind providing some concrete examples of how such a model would be useful? - James -- James Morris <jmorris@namei.org> -
AppArmor doesn't label files; it's a different model from SELinux. Its policy defines which processes may access which paths. Even if for some reson the same files were visible elsewhere, the policy wouldn't cover those other paths, and so accessing them would be denied. So again, that's not a security The model is explained, with examples, in the technical documentation at http://forgeftp.novell.com//apparmor/LKML_Submission-May_07/. Thanks, Andreas -
Hello. I think bind mounts were discussed when shared subtree ( http://lwn.net/Articles/159092/ ) was introduced. For systems that allow users mount their CD/DVDs freely, bind mounts are used and labeling files is a convenient way to deny accessing somebody else's files. But systems that don't allow users mount their CD/DVDs freely, bind mounts needn't to be used and using pathnames is a convenient way to deny accessing somebody else's files. Pathname based access control/auditing system works if the system doesn't use bind mounts. However, there are distributions (e.g. Debian Etch) that always use bind mounts. In such distributions, pathname based access control/auditing system doesn't work. This is not the fault of distributions nor pathname based access control/auditing system. It is possible to solve by passing vfsmount to VFS and LSM functions. SELinux users are having a lot of trouble because pathnames in audit logs are not always complete. AppArmor users are having a lot of trouble because pathnames which a process requested are ambiguous when bind mounts are used. Being able to report pathnames that a process requested is not surprising when considering user friendliness. I beleive passing vfsmount makes both users happy. Thanks. -
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian | [RFC/PATCH] RE: git calls SSH_ASKPASS even if DISPLAY is not set |
| Junio C Hamano | Re: Rss produced by git is not valid xml? |
| Linux Kernel Mailing List | iSeries: fix section mismatch in iseries_veth< |
