These patches add local caching for network filesystems such as NFS and AFS.
The patches can roughly be broken down into a number of sets:
(*) 01-keys-inc-payload.diff
(*) 02-keys-search-keyring.diff
(*) 03-keys-callout-blob.diff
Three patches to the keyring code made to help the CIFS people.
Included because of patches 05-08.
(*) 04-keys-get-label.diff
A patch to allow the security label of a key to be retrieved.
Included because of patches 05-08.
(*) 05-security-current-fsugid.diff
(*) 06-security-separate-task-bits.diff
(*) 07-security-subjective.diff
(*) 08-security-kernel-service.diff
Patches to permit the subjective security of a task to be overridden.
All the security details in task_struct are decanted into a new struct
that task_struct then has two pointers to: one that defines the
objective security of that task (how other tasks may affect it) and one
that defines the subjective security (how it may affect other objects).
Note that I have dropped the idea of struct cred for the moment. With
the amount of stuff that was excluded from it, it wasn't actually any
use to me. However, it can be added later.
Required for cachefiles.
(*) 09-release-page.diff
(*) 10-fscache-page-flags.diff
(*) 11-add_wait_queue_tail.diff
(*) 12-fscache.diff
Patches to provide a local caching facility for network filesystems.
(*) 13-cachefiles-ia64.diff
(*) 14-cachefiles-ext3-f_mapping.diff
(*) 15-cachefiles-write.diff
(*) 16-cachefiles-monitor.diff
(*) 17-cachefiles-export.diff
(*) 18-cachefiles.diff
Patches to provide a local cache in a directory of an already mounted
filesystem.
(*) 19-fscache-nfs.diff
(*) 20-fscache-nfs-mount.diff
(*) 21-fscache-nfs-display.diff
Patches to provide NFS with local caching.
(*) 22-fcrypt-bit-annotate.diff
A fix for AFS.
(*) 23-afs-testsetpageerror.diff
...
Increase the size of a payload that can be used to instantiate a key in
add_key() and keyctl_instantiate_key(). This permits huge CIFS SPNEGO blobs to
be passed around. The limit is raised to 1MB. If kmalloc() can't allocate a
buffer of sufficient size, vmalloc() will be tried instead.
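The fallback described above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: model_kmalloc(), model_vmalloc(), MODEL_PAGE_SIZE and the model_fail_contiguous switch are hypothetical stand-ins built on malloc(), not kernel interfaces. It shows the order of the checks and why the caller has to remember which allocator succeeded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE   4096u                 /* stand-in for PAGE_SIZE */
#define MODEL_MAX_PAYLOAD (1024u * 1024u - 1u)  /* limit used by the patch */

/* Hypothetical switch that forces the "contiguous" allocator to fail. */
static bool model_fail_contiguous;

/* Stand-in for kmalloc(): physically contiguous, may fail for big sizes. */
static void *model_kmalloc(size_t len)
{
	return model_fail_contiguous ? NULL : malloc(len);
}

/* Stand-in for vmalloc(): only virtually contiguous. */
static void *model_vmalloc(size_t len)
{
	return malloc(len);
}

/*
 * Return a buffer of plen bytes or NULL.  *vm records which allocator
 * succeeded so that the caller can pick the matching free routine.
 */
static void *model_alloc_payload(size_t plen, bool *vm)
{
	void *payload;

	*vm = false;
	if (plen > MODEL_MAX_PAYLOAD)
		return NULL;

	payload = model_kmalloc(plen);
	if (payload)
		return payload;

	/* A request of a page or less gains nothing from the fallback. */
	if (plen <= MODEL_PAGE_SIZE)
		return NULL;

	payload = model_vmalloc(plen);
	if (payload)
		*vm = true;
	return payload;
}

/* The kernel would call kfree() or vfree() here depending on vm. */
static void model_free_payload(void *payload, bool vm)
{
	(void)vm;
	free(payload);
}
```

The vm flag travels with the buffer because the two kernel allocators have different free routines, which is what the error3 path in the diff below deals with.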
Signed-off-by: David Howells <dhowells@redhat.com>
---
security/keys/keyctl.c | 38 ++++++++++++++++++++++++++++++--------
1 files changed, 30 insertions(+), 8 deletions(-)
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index d9ca15c..8ec8432 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -19,6 +19,7 @@
#include <linux/capability.h>
#include <linux/string.h>
#include <linux/err.h>
+#include <linux/vmalloc.h>
#include <asm/uaccess.h>
#include "internal.h"
@@ -62,9 +63,10 @@ asmlinkage long sys_add_key(const char __user *_type,
char type[32], *description;
void *payload;
long ret;
+ bool vm;
ret = -EINVAL;
- if (plen > 32767)
+ if (plen > 1024 * 1024 - 1)
goto error;
/* draw all the data into kernel space */
@@ -81,11 +83,18 @@ asmlinkage long sys_add_key(const char __user *_type,
/* pull the payload in if one was supplied */
payload = NULL;
+ vm = false;
if (_payload) {
ret = -ENOMEM;
payload = kmalloc(plen, GFP_KERNEL);
- if (!payload)
- goto error2;
+ if (!payload) {
+ if (plen <= PAGE_SIZE)
+ goto error2;
+ vm = true;
+ payload = vmalloc(plen);
+ if (!payload)
+ goto error2;
+ }
ret = -EFAULT;
if (copy_from_user(payload, _payload, plen) != 0)
@@ -113,7 +122,10 @@ asmlinkage long sys_add_key(const char __user *_type,
key_ref_put(keyring_ref);
error3:
- kfree(payload);
+ if (!vm)
+ kfree(payload);
+ else
+ vfree(payload);
error2:
kfree(description);
error:
@@ -821,9 +833,10 @@ long keyctl_instantiate_key(key_serial_t id,
key_ref_t keyring_ref;
void *payload;
long ret;
+ bool vm = false;
ret = -EINVAL;
- if (plen > ...
Check the starting keyring as part of the search to (a) see if that is what
we're searching for, and (b) to check it is still valid for searching.
The scenario: User in process A does things that cause things to be
created in its process session keyring. The user then does an su to
another user and starts a new process, B. The two processes now
share the same process session keyring.
Process B does an NFS access which results in an upcall to gssd.
When gssd attempts to instantiate the context key (to be linked
into the process session keyring), it is denied access even though it
has an authorization key.
The order of calls is:
keyctl_instantiate_key()
lookup_user_key() (the default: case)
search_process_keyrings(current)
search_process_keyrings(rka->context) (recursive call)
keyring_search_aux()
keyring_search_aux() verifies the keys and keyrings underneath the
top-level keyring it is given, but that top-level keyring is neither
fully validated nor checked to see if it is the thing being searched for.
This patch changes keyring_search_aux() to:
1) do more validation on the top keyring it is given and
2) check whether that top-level keyring is the thing being searched for
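The two changes listed above can be illustrated with a toy search. This is a sketch only: struct toy_key and toy_search() are hypothetical, the validity test is reduced to a single revoked flag, and the recursion is used here for brevity rather than to mirror how keyring_search_aux() walks its tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy node; not the kernel's struct key. */
struct toy_key {
	const char *description;
	bool revoked;               /* stand-in for the validity checks */
	struct toy_key **children;  /* NULL for keys that are not keyrings */
	size_t nr_children;
};

/*
 * Search top and everything beneath it for a key called wanted.
 * The starting node gets the same treatment as its descendants:
 * it is validated first, then tested as a candidate match.
 */
static struct toy_key *toy_search(struct toy_key *top, const char *wanted)
{
	size_t i;

	/* 1) validate the node we were handed, including the top one */
	if (!top || top->revoked)
		return NULL;

	/* 2) the node we were handed may itself be the target */
	if (strcmp(top->description, wanted) == 0)
		return top;

	for (i = 0; i < top->nr_children; i++) {
		struct toy_key *found = toy_search(top->children[i], wanted);

		if (found)
			return found;
	}
	return NULL;
}
```

Without step 2, a search that starts at the session keyring and is looking for that same keyring would come back empty, which corresponds to the failure described in the scenario.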
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
---
security/keys/keyring.c | 35 +++++++++++++++++++++++++++++++----
1 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index 88292e3..76b89b2 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -292,7 +292,7 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
struct keyring_list *keylist;
struct timespec now;
- unsigned long possessed;
+ unsigned long possessed, kflags;
struct key *keyring, *key;
key_ref_t key_ref;
long err;
@@ -318,6 +318,32 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
now = current_kernel_time();
err = ...
Allow the callout data to be passed as a blob rather than a string for
internal kernel services that call any request_key_*() interface other than
request_key(). request_key() itself still takes a NUL-terminated string.
The functions that change are:
request_key_with_auxdata()
request_key_async()
request_key_async_with_auxdata()
Signed-off-by: David Howells <dhowells@redhat.com>
---
Documentation/keys-request-key.txt | 11 +++++---
Documentation/keys.txt | 14 +++++++---
include/linux/key.h | 9 ++++---
security/keys/internal.h | 9 ++++---
security/keys/keyctl.c | 7 ++++-
security/keys/request_key.c | 49 ++++++++++++++++++++++--------------
security/keys/request_key_auth.c | 12 +++++----
7 files changed, 70 insertions(+), 41 deletions(-)
diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt
index 266955d..09b55e4 100644
--- a/Documentation/keys-request-key.txt
+++ b/Documentation/keys-request-key.txt
@@ -11,26 +11,29 @@ request_key*():
struct key *request_key(const struct key_type *type,
const char *description,
- const char *callout_string);
+ const char *callout_info);
or:
struct key *request_key_with_auxdata(const struct key_type *type,
const char *description,
- const char *callout_string,
+ const char *callout_info,
+ size_t callout_len,
void *aux);
or:
struct key *request_key_async(const struct key_type *type,
const char *description,
- const char *callout_string);
+ const char *callout_info,
+ size_t callout_len);
or:
struct key *request_key_async_with_auxdata(const struct key_type *type,
const char *description,
- const char *callout_string,
+ const char *callout_info,
+ size_t callout_len,
void *aux);
Or by userspace invoking the request_key system ...
Add a keyctl() function to get the security label of a key.
The following is added to Documentation/keys.txt:
(*) Get the LSM security context attached to a key.
long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer,
size_t buflen)
This function returns a string that represents the LSM security context
attached to a key in the buffer provided.
Unless there's an error, it always returns the amount of data it could
produce, even if that's too big for the buffer, but it won't copy more
than requested to userspace. If the buffer pointer is NULL then no copy
will take place.
A NUL character is included at the end of the string if the buffer is
sufficiently big. This is included in the returned count. If no LSM is
in force then an empty string will be returned.
A process must have view permission on the key for this function to be
successful.
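The return-value convention described above can be modelled in a few lines of userspace C. This is a sketch, not the kernel implementation: model_get_security() is a hypothetical name and the context string is simply passed in by the caller.

```c
#include <stddef.h>
#include <string.h>

/*
 * Model of the documented behaviour: always report how much data the
 * full answer needs (including the trailing NUL), but copy no more
 * than buflen bytes, and copy nothing if buffer is NULL.
 */
static long model_get_security(const char *context, char *buffer,
			       size_t buflen)
{
	size_t needed = strlen(context) + 1;  /* the count includes the NUL */

	if (buffer && buflen > 0) {
		size_t n = needed < buflen ? needed : buflen;

		/* the NUL is only copied when the buffer is big enough */
		memcpy(buffer, context, n);
	}
	return (long)needed;
}
```

Under this convention a caller would typically call once with a NULL buffer to learn the size, allocate that many bytes, and call again. A return value larger than buflen tells the caller that the copy was cut short.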
Signed-off-by: David Howells <dhowells@redhat.com>
---
Documentation/keys.txt | 21 +++++++++++++++
include/linux/keyctl.h | 1 +
include/linux/security.h | 20 +++++++++++++-
security/dummy.c | 8 ++++++
security/keys/compat.c | 3 ++
security/keys/keyctl.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++
security/security.c | 5 +++
security/selinux/hooks.c | 21 +++++++++++++--
8 files changed, 141 insertions(+), 4 deletions(-)
diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index b82d38d..be424b0 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -711,6 +711,27 @@ The keyctl syscall functions are:
The assumed authoritative key is inherited across fork and exec.
+ (*) Get the LSM security context attached to a key.
+
+ long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer,
+ size_t buflen)
+
+ This function returns a string that represents the LSM security context
+ attached to a key in the buffer provided.
+
+ Unless there's ...
Change current->fs[ug]id to current_fs[ug]id() so that fsgid and fsuid can be
separated from the task_struct.
Signed-off-by: David Howells <dhowells@redhat.com>
---
arch/ia64/kernel/perfmon.c | 4 ++--
arch/powerpc/platforms/cell/spufs/inode.c | 4 ++--
drivers/isdn/capi/capifs.c | 4 ++--
drivers/usb/core/inode.c | 4 ++--
fs/9p/fid.c | 2 +-
fs/9p/vfs_inode.c | 4 ++--
fs/9p/vfs_super.c | 4 ++--
fs/affs/inode.c | 4 ++--
fs/anon_inodes.c | 4 ++--
fs/attr.c | 4 ++--
fs/bfs/dir.c | 4 ++--
fs/cifs/cifsproto.h | 2 +-
fs/cifs/dir.c | 12 ++++++------
fs/cifs/inode.c | 8 ++++----
fs/cifs/misc.c | 4 ++--
fs/coda/cache.c | 6 +++---
fs/coda/upcall.c | 4 ++--
fs/devpts/inode.c | 4 ++--
fs/dquot.c | 2 +-
fs/exec.c | 4 ++--
fs/ext2/balloc.c | 2 +-
fs/ext2/ialloc.c | 4 ++--
fs/ext2/ioctl.c | 2 +-
fs/ext3/balloc.c | 2 +-
fs/ext3/ialloc.c | 4 ++--
fs/ext4/balloc.c | 2 +-
fs/ext4/ialloc.c | 4 ++--
fs/fuse/dev.c | 4 ++--
fs/gfs2/inode.c | 10 +++++-----
fs/hfs/inode.c | 4 ++--
fs/hfsplus/inode.c | 4 ++--
fs/hpfs/namei.c | 24 ++++++++++++------------
fs/hugetlbfs/inode.c | 16 ...
Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.
Signed-off-by: David Howells <dhowells@redhat.com>
---
drivers/block/loop.c | 5 -
drivers/char/agp/frontend.c | 2
drivers/char/drm/drm_fops.c | 2
drivers/char/tty_audit.c | 2
fs/affs/super.c | 4 -
fs/autofs/inode.c | 4 -
fs/autofs4/inode.c | 4 -
fs/autofs4/waitq.c | 4 -
fs/binfmt_elf.c | 12 +-
fs/cifs/connect.c | 5 -
fs/cifs/ioctl.c | 2
fs/ecryptfs/messaging.c | 15 +-
fs/exec.c | 20 ++-
fs/fat/inode.c | 4 -
fs/fcntl.c | 7 +
fs/file_table.c | 4 -
fs/fuse/dir.c | 12 +-
fs/hfs/super.c | 4 -
fs/hfsplus/options.c | 4 -
fs/hpfs/super.c | 4 -
fs/hugetlbfs/inode.c | 4 -
fs/inotify_user.c | 2
fs/ioprio.c | 12 +-
fs/namei.c | 6 +
fs/ncpfs/ioctl.c | 32 ++---
fs/open.c | 22 ++-
fs/proc/array.c | 14 +-
fs/proc/base.c | 16 +-
fs/proc/proc_sysctl.c | 4 -
fs/quota.c | 4 -
fs/smbfs/dir.c | 4 -
fs/smbfs/inode.c | 2
fs/smbfs/proc.c | 2
include/linux/init_task.h | 23 +++
include/linux/sched.h | 78 +++++++++---
include/net/scm.h | 4 -
ipc/mqueue.c | 4 -
ipc/msg.c | 4 -
ipc/sem.c | 4 -
ipc/shm.c | 16 +-
ipc/util.c | ...
Remove the temporarily embedded task security record from task_struct.
Instead it is made to dangle from the task_struct::sec and
task_struct::act_as pointers with references counted for each.
do_coredump() is made to create a copy of the security record, modify it and
then use that to override the main one for a task. sys_faccessat() is made to
do the same.
The process and session keyrings are moved from signal_struct into a new
thread_group_security struct. This is then refcounted, with pointers coming
from the task_security struct instead of from signal_struct.
The keyring functions then take pointers to task_security structs rather than
task_structs for their security contexts. This is so that request_key() can
proceed asynchronously without having to worry about the initiator task's
act_as pointer changing.
The LSM hooks for dealing with task security are modified to deal with the
task security struct directly rather than going via the task_struct as
appropriate.
This permits the subjective security context of a task to be overridden by
changing its act_as pointer without altering its objective security pointer,
and thus not breaking signalling, ptrace, etc. whilst the override is in
force.
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/exec.c | 15 +-
fs/open.c | 37 ++---
include/linux/init_task.h | 17 --
include/linux/key-ui.h | 10 +
include/linux/key.h | 31 +---
include/linux/sched.h | 40 ++++-
include/linux/security.h | 43 ++++--
kernel/Makefile | 2
kernel/cred.c | 139 ++++++++++++++++++
kernel/exit.c | 1
kernel/fork.c | 40 ++---
kernel/kmod.c | 10 +
kernel/sys.c | 16 +-
kernel/user.c | 2
net/rxrpc/ar-key.c | 4 -
security/dummy.c | 14 ...
Allow kernel services to override LSM settings appropriate to the actions
performed by a task by duplicating a security record, modifying it and then
using task_struct::act_as to point to it when performing operations on behalf
of a task.
This is used, for example, by CacheFiles which has to transparently access the
cache on behalf of a process that thinks it is doing, say, NFS accesses with a
potentially inappropriate (with respect to accessing the cache) set of
security data.
This patch provides two LSM hooks for modifying a task security record:
(*) security_kernel_act_as() which allows modification of the security datum
with which a task acts on other objects (most notably files).
(*) security_create_files_as() which allows modification of the security
datum that is used to initialise the security data on a file that a task
creates.
Signed-off-by: David Howells <dhowells@redhat.com>
---
include/linux/cred.h | 22 ++++++++++++
include/linux/security.h | 35 +++++++++++++++++++
kernel/cred.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++
security/dummy.c | 15 ++++++++
security/security.c | 13 +++++++
security/selinux/hooks.c | 45 ++++++++++++++++++++++++
6 files changed, 216 insertions(+), 0 deletions(-)
diff --git a/include/linux/cred.h b/include/linux/cred.h
new file mode 100644
index 0000000..c9f8906
--- /dev/null
+++ b/include/linux/cred.h
@@ -0,0 +1,22 @@
+/* Credential management
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#ifndef _LINUX_CRED_H
+#define _LINUX_CRED_H
+
+struct task_security;
+struct inode;
+
+extern struct task_security ...-- Stephen Smalley National Security Agency --
Cleared means what? Setting to 0? Or is there some other constant I should use for that? David --
Yes, setting to 0. Otherwise, only other issue I have with this interface is it won't generalize to dealing with nfsd, where we want to set the acting context to a context we obtain from or determine based upon the client. Why can't cachefilesd just push a context into the kernel and pass that into the hook as the acting context, and then nfsd can do likewise using the context provided by the client or obtained locally from exports for ordinary clients? Avoids the transition SID computation altogether within the kernel and makes this more generic. -- Stephen Smalley National Security Agency --
It would get a context from the client or from a local configuration that would map security-unaware clients to a default context, and then want to assume that context for the particular operation. No transition the way in which dbusd imports contexts), or directly as a context returned by a libselinux function. Has to be done that way so that it can be set differently for different policy types (strict, targeted, mls). Naturally, cachefiles (the kernel module) would invoke a security hook It doesn't fit with how other users of security_kernel_act_as() will likely want to work (they will want to just set the context to a specified value, whether one obtained from the client or from some local source), nor with how type transitions normally work (exec, with the program type as the second type field). I think it will just cause confusion and subtle breakage. -- Stephen Smalley National Security Agency --
That sounds too SELinux specific. How do I do it so that it works for any LSM?
Is linking against libselinux a viable option if it's not available under all
LSM models? Is it available under all LSM models? Perhaps Casey can
It's causing me lots of confusion as it is. I have been / am being told by
different people to do different things just in dealing with SELinux, and
various people are raising extra requirements or restrictions beyond that.
There doesn't seem to be a consensus.
It sounds like the best option is just to have the kernel nick the userspace
daemon's security context and use that as is, and junk all the restrictions on
what the daemon can do so that the kernel isn't too restricted.
David
--
You can't. There is no LSM for userspace; LSM specifically disavowed any common userspace API, and that was one of our original Nope, they would all have their own libraries, if they have a library at all. But that isn't your problem - your kernel interface should be generic, and your LSM hooks should be generic, but your userspace isn't required to be. Have a look at how many programs in the distribution Karl isn't a maintainer of the SELinux kernel code. And I had thought Well, you could do that, if that meets your needs, but it doesn't sound very optimal either. Why are you opposed to having userspace determine the context and write it to a cachefiles interface, and just have the kernel authorize it (invoke a hook to check permissions between the daemon's context and the specified label), and make it the acting context when appropriate (invoke a different hook to set it as the acting context)? -- Stephen Smalley National Security Agency --
That's what I remember as well - I suggested the transition idea and then, after discussion, agreed that it wasn't the best approach. And, as Steve points out, it's not my call to make. Karl --
Sigh. Can you tell me then how to do it now? I don't know very much about using SELinux userspace stuff libraries or configuring them. David --
I don't have enough context anymore to help with design, but I can help with the actual usage of the libraries if you have specific questions. Karl --
So, basically, userspace programs (outside of security tools) aren't supposed
It is if I have to maintain a special piece of code for each possible LSM.
One piece for SELinux, one piece for AppArmor, one piece for Smack, one piece
In /usr/bin ldd reports approximately 297 binaries link to libselinux, though
I can't say how many of those linked against it directly rather than picking
it up by contamination through a shared library. Furthermore, I've no idea
Because, from what I gather, that means my userspace program needs to do
something different, depending on the security model that's currently in force
on a system. Furthermore, I would have to have separate code, as far as I
know, for each security model as there's no commonality in userspace.
I can't just link against libselinux. It might not be there. I'm not going
to tie my program to SELinux either.
Furthermore, I worked out "the right way to do this" with some apparent
SELinux person, and you seemed reasonably accepting of it. Now I have to go
and redo all the work, having had to redo the security stuff a couple of times
already because someone objected. *That* is the main obstacle to getting my
code accepted at the moment, I think.
How about I just stick the context in /etc/cachefilesd.conf as a textual
configuration item and have the daemon pass that as a string to the cachefiles
kernel module, which can then ask LSM if it's valid to set this context as an
override, given the daemon's own security context? That seems entirely
reasonable to me.
David
--
I didn't say that. I said that LSM provides no standard way for userspace to access the security infrastructure, at least beyond the raw /proc/self/attr and xattr APIs, and I view that as a defect in LSM. SELinux does have a userspace API, because we think there should be a way for userspace to do that, and it is being used by userspace programs All your code has to do is invoke a function provided by libselinux. If at some later time a liblsm is introduced that provides a common front-end to a libselinux, libsmack, ..., then you can use that. But it doesn't exist today. But it all just becomes a simple function call The point being that userspace is already using the SELinux API directly - cachefilesd certainly won't be the first or most crucial application You can certainly make the linking conditional at compile time based on a build flag, and/or you can dlopen it at runtime and fall back gracefully if not present. It isn't tying your program to libselinux, just using it if present, in this case to obtain an input to supply to your kernel module so that the "policy" of deciding what context to use is completely determined in userspace and not hardcoded at all into the I did express reservations about it, and I really don't think that this is the main obstacle - the determination of the acting task security label and how to set it is really only a small part of your patch set, That mostly works, but it means that an update to policy may require an update to /etc/cachefilesd.conf, or that switching from one policy to another might likewise require changing that file. Versus using a separate policy-provided config file for the label. BTW, as should be obvious, some LSMs aren't label-based at all, so it would need to be optional. -- Stephen Smalley National Security Agency --
That seems like an awful lot of work. I suggest that what you put in /etc/cachefilesd.conf is a line like: security_context:"<whatever>" and have your daemon pass "<whatever>" into the kernel using a cachefile mechanism. The kernel code can call security_secctx_to_secid("<whatever>") to determine if it's valid. No need to invoke LSM specific code in your daemon. You may need to have an application, say cachefileselinuxcontext, that will read the current policy and spit out an appropriate value of "<whatever>", but that can be separate and LSM specific without For LSM's that don't use labels what you will have to pass in won't be a label, it will be something else. But since any LSM that wants to do networking or audit will have to deal with secid's and secctx's the method outlined above ought to fit the bill. Casey Schaufler casey@schaufler-ca.com --
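The daemon-side half of this suggestion amounts to pulling an opaque string out of the config file and handing it to the kernel untouched. A minimal sketch of the parsing step follows, assuming the line format shown in the message above; parse_security_context() is a hypothetical helper, not part of cachefilesd, and the daemon makes no attempt to interpret the value.

```c
#include <stddef.h>
#include <string.h>

/*
 * Extract the quoted value from a line of the form
 *     security_context:"<value>"
 * The value is treated as opaque; judging its validity is left to the
 * kernel and the LSM.  Anything after the closing quote is ignored.
 * Returns 0 on success, -1 if the line does not match, the value is
 * empty, or out is too small.
 */
static int parse_security_context(const char *line, char *out, size_t outlen)
{
	static const char key[] = "security_context:\"";
	const char *start, *end;
	size_t len;

	if (strncmp(line, key, sizeof(key) - 1) != 0)
		return -1;

	start = line + sizeof(key) - 1;
	end = strchr(start, '"');
	if (!end)
		return -1;

	len = (size_t)(end - start);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

Keeping the value opaque is what lets the daemon stay free of LSM-specific code: the same parser would serve a label-based LSM or one whose "context" is something else entirely.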
That sounds workable, although I think he will want a more specific hook than security_secctx_to_secid(), or possibly a second hook call, that would not only validate the context but authorize the use of it by the cachefilesd process. And then the security_task_kernel_act_as() hook just takes the secid as input rather than the task struct of the daemon, and applies it. At that point, nfsd can use the same mechanism for setting the acting SID based on the client process after doing its own -- Stephen Smalley National Security Agency --
I thought using secids was verboten as it made things too specific.
Have you example code for the security hook you mention? I'm not sure I
understand why security_secctx_to_secid() is not sufficient. Or is it that I
need something that takes a secctx, converts it to a secid and authorises its
use all in one go?
If it's this, why can't that be rolled into security_task_kernel_act_as()?
That sets up a task_security struct which is then switched in and out without
consultation of the LSM.
David
--
Well, that has been Casey's objection in the past to it, but he seems to have
accepted their use now for certain purposes, and they are already
security_secctx_to_secid() would just validate and map a context string to a
secid. It wouldn't perform any permission check, as the caller might be a
kernel-internal user that is just mapping back and forth like current users of
security_secid_to_secctx, or it might be something that ultimately originated
from userspace but the hook has no way of knowing why or what set of checks
would be appropriate.
You'd need a more specific hook for the authorization, one that would perform
a permission check, e.g. an avc_has_perm() call. Which likely requires
defining a
I was under the impression that security_task_kernel_act_as() was being used
to switch the current task to an acting context, not to initially set up a
struct for later use. If you go with the latter approach, then what is the
lifecycle on that struct?
BTW, it gets a little confusing with your use of task_security for the full
task security state vs our existing use of task_security_struct within SELinux
for the task's LSM security blob. I suppose ours could be renamed to
task_selinux.
--
Stephen Smalley
National Security Agency
--
Hmmm... This is sounding very not-simple. I don't know how to do this. I
can probably guess the kernel side by looking at how SECCLASS_KEY is done, but
it sounds like it involves changes to the userspace policy processing tools
too.
It also sounds a bit like overkill, but if it's the right way then I guess it
has to be done.
What does the security class represent in this case? And can it be generalised
Definitely the latter. I guess I wasn't very clear in the patch description.
(1) You create a new task_security struct
(2) You fill in the fsuid, fsgid, etc.
(3) You request that the LSM security pointer in it be set to point to the
context you want (at the moment this is done by attempting a transition
from the daemon's context):
ret = security_transition_sid(dtsec->sid, SECINITSID_KERNEL,
SECCLASS_PROCESS, &ksid);
and, in the current code, it returns an error if you're not allowed to do
that. But instead you'd ask it to set a specific context, and it'd set
that if you're permitted to do that, and give you an error if you're not.
(4) You then use the task_security at will to override the task->act_as
pointer in whatever task(s) you're operating on behalf of at the moment.
(5) When you cease operating on the behalf of a task, you revert its act_as
pointer and drop a reference to your task_security struct.
(6) When the last ref to the task_security struct goes away, the LSM data
I know. I thought about it quite a bit, but the problem is there's so much
overloading of various words (eg: security, context), and I wanted to avoid
Better still, perhaps, would be to prefix things with selinux_ to make it
namespace clean.
David
--
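The lifecycle in steps (1) to (6) above can be sketched as a small userspace model. Everything here is an assumption made for illustration: the struct layouts, the model_ names, the integer lsm_label standing in for the LSM security pointer, and the choice to take one reference per override. It is not thread-safe and is not the patch's implementation.

```c
#include <stdlib.h>

/* Hypothetical model of a refcounted security record. */
struct model_task_security {
	int usage;            /* reference count */
	unsigned fsuid;
	unsigned fsgid;
	unsigned lsm_label;   /* stand-in for the LSM security pointer */
};

struct model_task {
	struct model_task_security *sec;     /* objective context */
	struct model_task_security *act_as;  /* subjective context */
};

static int model_freed;  /* counts released records, for demonstration */

/* Steps (1) to (3): create a record and fill it in. */
static struct model_task_security *model_sec_alloc(unsigned fsuid,
						   unsigned fsgid,
						   unsigned label)
{
	struct model_task_security *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->usage = 1;
	s->fsuid = fsuid;
	s->fsgid = fsgid;
	s->lsm_label = label;
	return s;
}

/* Step (6): the record goes away with its last reference. */
static void model_sec_put(struct model_task_security *s)
{
	if (s && --s->usage == 0) {
		model_freed++;
		free(s);
	}
}

/* Step (4): point act_as at the override; sec is left alone. */
static struct model_task_security *
model_override(struct model_task *t, struct model_task_security *s)
{
	struct model_task_security *old = t->act_as;

	s->usage++;
	t->act_as = s;
	return old;  /* the caller keeps this to restore it later */
}

/* Step (5): restore the saved pointer and drop the override's reference. */
static void model_revert(struct model_task *t,
			 struct model_task_security *old)
{
	struct model_task_security *cur = t->act_as;

	t->act_as = old;
	model_sec_put(cur);
}
```

Because only act_as changes, anything that consults sec (signalling and ptrace in the patch descriptions) sees the same task throughout the override.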
Not the tools, just the policy definitions. Dan can help with that. You add
the definitions to the policy, then there is a script to regenerate the
SELinux kernel headers that #define the SECCLASS_ and
It is just a way of carving up the permission space, typically based on object
type, but it can essentially be arbitrary. The check in this case seems
specific to cachefiles since it is controlling an operation on the
/dev/cachefiles interface that only applies to cachefiles internal operations,
so making a cachefiles class seems reasonable. If this was instead just a
generic "set my acting context to <value>" operation, then it could be a
generic /proc/self/attr/active interface with a generic implementation and
permission check, but here we aren't setting the active context of the
cachefilesd daemon but rather of the cachefiles kernel module.
The other approach that I suggested a long time ago is to exempt the
cachefiles kernel module internal operations from SELinux permission checks
altogether by setting some task flag when performing those operations and
checking for that flag either on entry to the security_ static inlines,
similar to the inode private flag, or within SELinux itself. Rationale being
that the cachefiles kernel module can already do what it wants and the SELinux
permission checks are really about controlling what userspace can do. Then we
don't have to invent a context for the kernel module at all or worry about
subtle breakage when
--
Stephen Smalley
National Security Agency
--
Can you specify what sort of permissions you're thinking of providing for tasks to operate on this class? Can an object of this class 'operate' on other objects, or can only process-class objects do that? How does an object of this class acquire a label? What is an object of this class? Is it a "cache"? Or were you thinking of a "module"? David --
They would correspond with the operations provided by the /dev/cachefiles interface, at the granularity you want to support distinctions to be made. Could just be a single 'setcontext' permission if that is all you want to control distinctly, or could be a permission In this case, I wouldn't expect a cachefiles object to act on anything else. Some objects are also used as subjects, especially in the I was thinking the latter since the only goal was to control what contexts could be set by a given task, but you could support per-cache "objects" with their own labels (in which case the label would likely be determined from the creating task). If the latter, you don't really need a label for the object, and can just use the supplied context/secid as the object of the permission check, ala: rc = avc_has_perm(tsec->sid, secid, SECCLASS_CACHEFILES, CACHEFILES__SETCONTEXT); If the former, then you'd need more than one check, as you then have to check whether the task can act on the cache in question, and then check whether it can set the context for that cache to the specified value. -- Stephen Smalley National Security Agency --
Can this be made simpler by the fact that /dev/cachefiles has its own unique
There is only one operation that makes sense to have a permission: "set
context and begin caching".
All the other operations on a file descriptor attached to /dev/cachefiles are
necessary for there to be a managed cache at all, and given that you've
Ummm. I was under the impression that the target SID had to be a member of
target class.
David
--
That lets you control who can use the interface at all, but not what Do any of the interfaces allow a task to act on a cache other than one it has created? How does the task identify the desired cache? What if Not necessarily. secid is being applied as the acting context for the cachefiles kernel module, so the above makes sense, even though there isn't really any "object" in view here. Abstractly, the question we are asking above is: Can this task set the context of the cachefiles kernel module to this value? -- Stephen Smalley National Security Agency --
Each file descriptor opened creates one separate cache instance. Any commands
sent over that file descriptor affect only the cache instance it is attached
to; similarly, any status data you read only refers to that one cache
instance.
Closing the file descriptor makes the cache go away as far as the kernel is
concerned. The cachefiles daemon retains its cache dev file descriptor for
As far as the cache daemon is concerned, the file descriptor is its handle to
So the following (taken from cachefilesd.te):
allow cachefilesd_t cachefiles_var_t : file { getattr rename unlink };
says, for example, allow:
avc_has_perm("cachefilesd_t",
"cachefiles_var_t",
SECCLASS_FILE,
FILE__RENAME,
...);
David
--
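The correspondence between the allow rule and the permission check in the message above can be shown with a toy rule table. This is a deliberately simplified model with hypothetical names: the real avc_has_perm() works on numeric SIDs rather than type names, and SELinux policy involves far more than a flat list of rules.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical object class and permission bits for the toy model. */
enum toy_class { TOY_CLASS_FILE };

#define TOY_FILE__GETATTR 0x1u
#define TOY_FILE__RENAME  0x2u
#define TOY_FILE__UNLINK  0x4u
#define TOY_FILE__WRITE   0x8u

struct toy_allow_rule {
	const char *source;   /* acting type */
	const char *target;   /* type of the object acted upon */
	enum toy_class tclass;
	unsigned perms;       /* permissions granted */
};

/* Mirrors: allow cachefilesd_t cachefiles_var_t : file
 *          { getattr rename unlink }; */
static const struct toy_allow_rule toy_policy[] = {
	{ "cachefilesd_t", "cachefiles_var_t", TOY_CLASS_FILE,
	  TOY_FILE__GETATTR | TOY_FILE__RENAME | TOY_FILE__UNLINK },
};

/* Return 0 if every requested permission is granted, else -1. */
static int toy_has_perm(const char *source, const char *target,
			enum toy_class tclass, unsigned requested)
{
	unsigned granted = 0;
	size_t i;

	for (i = 0; i < sizeof(toy_policy) / sizeof(toy_policy[0]); i++) {
		const struct toy_allow_rule *r = &toy_policy[i];

		if (r->tclass == tclass &&
		    strcmp(r->source, source) == 0 &&
		    strcmp(r->target, target) == 0)
			granted |= r->perms;
	}
	return (requested & ~granted) ? -1 : 0;
}
```

In this model anything not named by a rule is denied, so a request for a permission outside the braces of the allow rule fails even though source, target and class all match.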
I have argued that in the past. I'm reasonably convinced that I have lost
that argument at least for the immediate future as audit, usb, and networking
are all dependent on them. I can't imagine an LSM that manages to avoid them,
at least for the short term. If secids are ever expunged from the kernel
cachefiles will require reeducation,
It would seem to me that security_secctx_to_secid() ought to suffice if the
application code was written correctly.
Be aware that factors outside the LSM may be important, too. As Stephen
points out elsewhere, Smack will require you have particular capabilities
(CAP_MAC_OVERRIDE, CAP_MAC_ADMIN) while a DAC LSM may require
CAP_DAC_OVERRIDE. SELinux is likely to be the odd duck in this pond in that
it does not use the capability mechanism in the way Nature intends it to be,
opting to treat "privilege" with a completely different model.
Casey Schaufler
casey@schaufler-ca.com
--
What sort of authorization are you thinking of? I would expect that to have been done by cachefileselinuxcontext (or cachefilesspiffylsmcontext) up in userspace. If you're going to rely on userspace applications for policy enforcement they need good points all, in spite of my personal distaste for secids. Casey Schaufler casey@schaufler-ca.com --
In Smack, I'd expect that you'd want to apply a CAP_MAC_OVERRIDE check. In SELinux, we'd apply a permission check between the task's security context and the specified security context so that we can control the pairwise relationship between them via allow rules and constraints. The kernel has no way of knowing whether the context was determined by cachefileselinuxcontext or not; it only knows that some task is trying to write some value to /cachefiles/context or whatever the kernel interface is, and it needs to apply some authorization check there, -- Stephen Smalley National Security Agency --
Yes, but I would expect that interface to be protected (owned by root, mode 0400). If /dev/cachefiles has to be publicly accessible make it
Yes, but forgive me being slow, I don't see the problem. Casey Schaufler casey@schaufler-ca.com --
-- Stephen Smalley National Security Agency --
Yes, but we're talking about writing the configuration information to the kernel, not actually making any access checks with it. I think. What I think we're talking about (and please correct me David if I've stepped into the wrong theatre) is getting the magic secctx that cachefiles will use instead of the secctx that the task would have otherwise. I don't think we're talking about recomputing it on every access, I think David is looking for the blunderbuss secctx that he can use any time he needs one. And it would be CAP_MAC_ADMIN, since you're changing the MAC characteristics of the system, not doing an access check. Casey Schaufler casey@schaufler-ca.com --
Indeed.
The way I do it is:
(1) The daemon opens /dev/cachefiles to begin an instance of a cache.
(2) The daemon negotiates a security context for the module to use.
(3) The security context is placed in a task_security structure.
(4) This task_security struct is attached temporarily to task->act_as each
time any task attempts to access the cache through the module.
(5) The task_security struct is discarded when the file descriptor that was
created in (1) is closed and the cache is withdrawn at the same time.
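Steps (3) and (4) can be pictured with a small userspace model. This is only a sketch under stated assumptions: the struct layouts and the cache_begin_override()/cache_end_override() helpers are hypothetical stand-ins, not the interfaces in the patches, and only the act_as pointer name is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the security state carried by a task. */
struct task_security {
    unsigned int secid;
};

/* Hypothetical stand-in for task_struct: own state plus the state it acts as. */
struct task {
    struct task_security own;
    struct task_security *act_as;
};

/* Step (4): point the task at the cache's context; return the old pointer. */
static struct task_security *cache_begin_override(struct task *t,
                                                  struct task_security *cache_sec)
{
    struct task_security *saved = t->act_as;

    assert(cache_sec != NULL);
    t->act_as = cache_sec;
    return saved;
}

/* Restore whatever context the task was acting as before the cache access. */
static void cache_end_override(struct task *t, struct task_security *saved)
{
    t->act_as = saved;
}

/* Simulate one cache access; report the secid that was in force during it. */
static unsigned int cache_access(struct task *t, struct task_security *cache_sec)
{
    struct task_security *saved = cache_begin_override(t, cache_sec);
    unsigned int used = t->act_as->secid;

    cache_end_override(t, saved);
    return used;
}
```

The override lasts only for the duration of the access, so the task's own security state is untouched once the call returns.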
David
--
What would I do with such a thing? How would it get run? Spat out to where? David --
Put it in /etc/init.d/cachefiles and run it at boot time. Put the result into /etc/cachefiles.conf. Have cachefilesd read it and pass it downward. Casey Schaufler casey@schaufler-ca.com --
Ewww. Runtime mangling of the configuration. I suppose it doesn't have to be in that file with the rest of the config. David --
More likely, run it at build time in your .spec file to generate cachefiles.conf, then run it again maybe upon a policy update or if the user selects a different policy. -- Stephen Smalley National Security Agency --
I don't think sticking it in cachefiles.conf is a good idea necessarily. That has to be an administrator modifiable file. Is there a program I could make cachefiles run directly and capture the output of that could give me the How do I do that? David --
Yes, we could easily make a simple program that just invokes a libselinux function that in turn grabs the proper context from some context configuration file under /etc/selinux/$SELINUXTYPE/contexts/ and -- Stephen Smalley National Security Agency --
That sounds nicely genericisable, perhaps even for any LSM.
/usr/bin/lsm-get-context cachefiles
It does have to be able to come up with different contexts for different caches, but that can be controlled by changing the name supplied to it. David --
--- David Howells <dhowells@redhat.com> wrote: Works for Smack. I can't say definitively, but I think it will work for SELinux. Beyond that and we're into the fuzzy bit of the LSM. Casey Schaufler casey@schaufler-ca.com --
Linking against libselinux is not now, nor will it ever be, a viable option. There's just too much sophistication contained in libselinux
That would be consistent with the (perhaps archaic now) behavior of nfsd on Unix, which did nothing but "lend its credential" to the underlying kernel code. I think it's a rational approach, although I expect that it may have trouble under SELinux. Casey Schaufler casey@schaufler-ca.com --
nfsd needs to be able to set the acting label to a value determined based on the client so that file operations performed on behalf of the client are subjected to the right set of permission checks and new files are labeled properly, just as it already does for uid and gid (via fsuid and fsgid). So merely inheriting the label from the nfsd daemon doesn't help with that purpose. Both nfsd and cachefiles need a way to set the acting label, so having a common hook for both to do that makes sense. The authorization of that label will differ, so splitting the authorization into a separate hook also makes sense. -- Stephen Smalley National Security Agency --
I would expect that the operation would be more sophisticated than that. You certainly aren't going to use what comes from the other side without any processing, and I expect you'll have some sort of operation on anything you pull from a config file
Unless you've got an LSM other than SELinux, of course. If cachefilesd is going to be responsible for maintaining this magic context there needs to be an LSM interface for it, not
I think that I agree with Stephen, although I could be merely confused. That happens to me when interfaces are described in SELinux terms. I still don't care much for multiple contexts, and I don't have a good grasp of how you'll deal with Smack, or any LSM other than SELinux. Just as Stephen mentions, I also don't see the generality that a change of this magnitude really ought to provide. Casey Schaufler casey@schaufler-ca.com --
Yes, that's true - the contexts would be subjected to a permission check. But that's separable from the act of setting it as the task's acting security state (and needs to be separated, as the precise check will vary depending on the situation - cachefiles is going to apply a LSM is an in-kernel interface. Here we are talking about a userspace interface for obtaining the right security label to use. There is no equivalent to LSM in userspace as of yet. Feel free to invent one, but -- Stephen Smalley National Security Agency --
I am much more concerned with the interfaces used to pass the information into the kernel. I would expect that to be LSM independent, not a call into libselinux that resolves into a selinuxfs operation, or its netlink equivalent. It would be unfortunate if the userland/kernel interface became an obstacle Casey Schaufler casey@schaufler-ca.com --
That wasn't the issue. The interface to the cachefiles module would just consist of cachefilesd writing a string label to some pseudo file to tell cachefiles what label to apply as the acting label for operations performed by cachefiles. Which isn't SELinux-specific at all. David was asking though how cachefilesd (the userspace agent) would obtain such a label to use. And that may very well be LSM-specific, and as there is no LSM userspace API, it makes sense for him to invoke a libselinux function at present. If a liblsm is later created and provides a common front-end API (internally dlopen'ing the right shared library based on some configuration, whether libselinux or libsmack or whatever), then cachefilesd can instead call the liblsm interface, but that doesn't exist today. -- Stephen Smalley National Security Agency --
I am certainly not in favor of adding such complexity. I suggest that cachefilesd get the context it wants using whatever scheme works. I think there should be a cachefiles specific (LSM independent) scheme for getting it into the kernel, and an LSM hook for setting it, if that's what he really wants to do. Casey Schaufler casey@schaufler-ca.com --
Me neither. I understand SELinux somewhat, though it's got a lot of wibbly bits, and WinNT's security system, but I have no experience of the other Perhaps it should be a specific interface, solely for cachefiles's use then. David --
That would help focus things, to be sure. I don't know if that focus will speed things up or slow them down, but I think that attempting to accommodate SELinux/NFS, with the state that effort is in, will only lead to tears. Casey Schaufler casey@schaufler-ca.com --
Okay. I can:
(1) Have cachefilesd (the daemon) pass a security context string to the
cachefiles kernel module, which can then convert it to a secID. It'll
require a security_secctx_to_secid() function, but I'm fairly certain I
have a patch to add such kicking around somewhere.
(2) Make security_task_kernel_act_as() take a task_security struct and a
secID and just assign the latter to the former. I'm not sure it makes
sense to do any checks here, other than checking that under SELinux the
secID is of SECCLASS_PROCESS class.
However, I need to write a check that the cachefilesd daemon is permitted to
nominate the secID it did. Can someone tell me how to do this? The obvious
way to do this is to add another PROCESS__xxx security permit specifically for
cachefiles, but that seems like a waste of a bit when there are only two spare
bits.
avc_has_perm(daemon_tsec->sid, nominated_sid,
SECCLASS_PROCESS, PROCESS__CACHEFILES_USE, NULL);
Now, I recall the addition of another security class being mentioned, which
presumably would give something like:
avc_has_perm(daemon_tsec->sid, nominated_sid,
SECCLASS_CACHE, CACHE__USE_AS_OVERRIDE, NULL);
And I assume this doesn't care if one, the other or both of the two SIDs
mentioned are of SECCLASS_PROCESS rather than of SECCLASS_CACHE.
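The string-to-secID conversion in (1) can be modelled in userspace as a table lookup. This is a sketch only: demo_secctx_to_secid() and the label table are hypothetical, the second label string is invented for illustration, and a real LSM would resolve the string against its loaded policy rather than a fixed array.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical label table; a real LSM keeps this mapping internally. */
struct demo_label {
    const char *secctx;
    unsigned int secid;
};

static const struct demo_label demo_labels[] = {
    { "system_u:system_r:cachefilesd_t:s0", 101 },
    { "system_u:system_r:cachefiles_kernel_t:s0", 102 }, /* invented label */
};

/* Convert a context string to a secID; 0 on success, -1 if unknown. */
static int demo_secctx_to_secid(const char *secctx, unsigned int *secid)
{
    size_t i;

    assert(secctx != NULL && secid != NULL);
    for (i = 0; i < sizeof(demo_labels) / sizeof(demo_labels[0]); i++) {
        if (strcmp(demo_labels[i].secctx, secctx) == 0) {
            *secid = demo_labels[i].secid;
            return 0;
        }
    }
    return -1; /* leave *secid untouched for unknown labels */
}
```

An unknown label is rejected before any permission check is attempted, so the nomination check only ever sees secIDs the LSM already knows about.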
David
--
Hmmmm... I can't see how to add a new security class. I can see that security classes are defined in various autogenerated header files, but autogenerated from what? The "This file is automatically generated. Do not edit." message at the top of these files seems to belie the fact they're actually checked in to GIT as is. David --
Already planned for 2.6.25, see: http://marc.info/?l=selinux&m=119973017423487&w=2
Right, the latter is reasonable. Requires adding the class and permission definition to policy/flask/security_classes and policy/flask/access_vectors and then regenerating the kernel headers from those files, ala:
svn co http://oss.tresys.com/repos/refpolicy/trunk refpolicy
cd refpolicy/policy/flask
vi security_classes access_vectors
<add new class to end>
make
make LINUX_D=/path/to/linux-2.6 tokern
Dan knows how to do that.
-- Stephen Smalley National Security Agency --
Policy ultimately has to be updated in order to start writing allow rules based on the new class/perm. libselinux et al doesn't have to change. If you have a "SELinux: policy loaded with handle_unknown=allow" message in your /var/log/messages, then new classes/perms that are not yet known to the policy will be allowed by default, so the operation will be permitted by the kernel. -- Stephen Smalley National Security Agency --
I don't. How do I set it? David --
Okay... It looks like I want four security operations/hooks for cachefiles:
(1) Check that a daemon can nominate a secid for use by the kernel to override
the process subjective secid.
(2) Set the secid mentioned in (1).
(3) Check that the kernel may create files as a particular secid (this could
be specified indirectly by specifying an inode, which would hide the secid
inside the LSM).
(4) Set the fscreate secid mentioned in (3).
Now, it's possible to condense (1) and (2) into a single op, and condense (3)
and (4) into a single op. That, however, might make the ops unusable by nfsd,
which may well want to bypass the checks or do them elsewhere.
Any thoughts?
David
--
FYI, I added the following vectors:
# kernel services that need to override task security
class kernel_service
{
use_as_override
create_files_as
}
The first allows:
avc_has_perm(daemon_tsec->sid, nominated_sid,
SECCLASS_KERNEL_SERVICE,
KERNEL_SERVICE__USE_AS_OVERRIDE,
NULL);
And the second something like:
avc_has_perm(tsec->sid, inode->sid,
SECCLASS_KERNEL_SERVICE,
KERNEL_SERVICE__CREATE_FILES_AS,
NULL);
Rather than specifically dedicating them to the cache, I made them general.
David
--
Make sure that you or Dan submits a policy patch to register these classes and permissions in the policy when the kernel patch is queued for merge. -- Stephen Smalley National Security Agency --
Do I just send the attached patch to <selinux@tycho.nsa.gov>? Or do I need to
make a diff from a point in the tree nearer the root? Is there anything else
I need to alter whilst I'm at it?
David
---
Index: policy/flask/security_classes
===================================================================
--- policy/flask/security_classes (revision 2573)
+++ policy/flask/security_classes (working copy)
@@ -109,4 +109,7 @@
# network peer labels
class peer
+# kernel services that need to override task security
+class kernel_service
+
# FLASK
Index: policy/flask/access_vectors
===================================================================
--- policy/flask/access_vectors (revision 2573)
+++ policy/flask/access_vectors (working copy)
@@ -736,3 +736,10 @@
{
recv
}
+
+# kernel services that need to override task security
+class kernel_service
+{
+ use_as_override
+ create_files_as
+}
--
-- James Morris <jmorris@namei.org> --
Yes, and I would recommend doing so to avoid permission races. You're going to have to deal with the case where step (2) fails even if you have step (1), so the "test and set" mindset seems Again, I don't think you're doing yourself any favors with a separate test operation. On (4) are you suggesting a third attribute value? There's the secid of the task originally, the secid you're going to use to do the access Let me see if I understand your current scheme. You want a (object) secid that is used to access the task. You want a (subject) secid that the task uses to accesses objects. You want a (newobject) secid that an object gets on creation. And you want them all to be distinct and settable. Did I get that right? Thank you. Casey Schaufler casey@schaufler-ca.com --
Looking at SELinux, that doesn't get rid of the permission race because there's
no locking. This may be different for other models.
I was thinking of having steps (2) and (4) not do any checking, but rather
assume that the caller has done the checks before calling the set routines,
possibly by calling the hooks mentioned in (1) and (3).
My main problem is that I don't know how NFSd wants to do things. I suppose
That's correct. Let me summarise:
(1) The daemon has an active process security ID (say A). When the daemon
nominates an override process security ID (say B) to be used by the
kernel, the cachefiles module asks the LSM to check that A is allowed to
nominate B for this purpose.
(2) The cachefiles module is given a path under which its cache exists. The
directory at the base of this path has its own security ID (say C).
cachefiles wants to create new files in the cache with the same security
ID as that directory (ie. C).
However, when cachefiles is creating files in the cache, the security of
whatever process is doing the access will be overridden with B, so
cachefiles asks the LSM to check that B is allowed to create files as C.
Note that this is an instantaneous check in the cache startup stage. This
allows caching to be aborted early if the security policy does not permit
B to create Cs. Technically this check is superfluous as it's re-checked
That depends on what you mean. cachefilesd (the daemon) will be run with a
security label because there's a security model in place.
I don't actually need to access the daemon, but the daemon does need to do
Correct. This is used as an override by any task that accesses the cache
indirectly through the cachefiles module.
The cachefilesd daemon has its own secid with which it accesses the cache
directly. The sets of permissions that must be granted by the module's
override subjective secid and by the daemon's subjective secid aren't
File ...
I don't think this check is on the kernel per se but rather the ability of the daemon to nominate a secid for use on files created later by the
I think it is fine to combine them. -- Stephen Smalley National Security Agency --
Hmmm... At the moment the cachefiles module works out for itself what the file label should be by looking at the root directory it was given and assuming the label on that is what it's going to be using. Are you suggesting this should be specified directly instead by the daemon? David --
No, just that however the secid is determined (whether indirectly via specification of a directory or directly via specification of a secid), the ability of the daemon to control what secid gets used ought to be controlled. Or, alternatively, the ability of the daemon to enable caching in a given directory ought to be controlled. -- Stephen Smalley National Security Agency --
Oh my. While there will be cases where the label of the file will match the label of the containing directory, and in fact for most label based LSMs that will usually be the case, you certainly can't count on it. The only place that you can find the correct label for a file with any confidence is the xattr (assuming the LSM uses xattrs) on the file itself. I can imagine an LSM for which it would make sense to derive the label from the root directory, but I know Smack isn't one of them, and I don't think that SELinux is either, although I would defer a definitive answer on that to Stephen. Casey Schaufler casey@schaufler-ca.com --
The cache files are created by the cachefiles kernel module, not by the userspace daemon, and the userspace daemon doesn't need to directly read/write them at all (but I think it does need to be able to unlink them?). The userspace daemon merely identifies the directory where the cache should live as part of configuring the cache when enabling it. Hence, it is fine to use a fixed label for the cache files (systemhigh in a MLS world), and to let the directory's label serve as the basis for it. Only the cachefiles kernel module directly reads and writes the files. -- Stephen Smalley National Security Agency --
That is what I currently do. SELinux rules are provided to grant the appropriate file accesses to the override label used by the kernel module, so Correct. --
Well, my bad, and thank you for clearing up my misunderstanding. Casey Schaufler casey@schaufler-ca.com --
This is pretty nasty. I would suggest either to have the function return the number of pages that were added to pagecache, or just --
Recruit a couple of page flags to aid in cache management. The following extra
flags are defined:
(1) PG_fscache (PG_owner_priv_2)
The marked page is backed by a local cache and is pinning resources in the
cache driver.
(2) PG_fscache_write (PG_owner_priv_3)
The marked page is being written to the local cache. The page may not be
modified whilst this is in progress.
If PG_fscache is set, then things that checked for PG_private will now also
check for that. This includes things like truncation and page invalidation.
The function page_has_private() has been added to detect this.
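The combined test can be pictured with a small userspace model. This is a sketch under assumptions: the bit numbers and the demo_ names are hypothetical, and the actual page_has_private() is in the truncated part of the diff, so its real form is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit numbers; the real ones live in page-flags.h. */
enum {
    DEMO_PG_private = 0,
    DEMO_PG_fscache = 1,
};

/* Minimal stand-in for struct page. */
struct demo_page {
    unsigned long flags;
};

/* True if either the private bit or the fscache bit is set on the page. */
static bool demo_page_has_private(const struct demo_page *page)
{
    const unsigned long mask = (1UL << DEMO_PG_private) |
                               (1UL << DEMO_PG_fscache);

    assert(page != NULL);
    return (page->flags & mask) != 0;
}
```

Callers that previously tested only the private bit, such as the truncation and invalidation paths mentioned above, would switch to the combined test.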
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/splice.c | 2 +-
include/linux/page-flags.h | 38 ++++++++++++++++++++++++++++++++++++--
include/linux/pagemap.h | 11 +++++++++++
mm/filemap.c | 16 ++++++++++++++++
mm/migrate.c | 2 +-
mm/page_alloc.c | 3 +++
mm/readahead.c | 9 +++++----
mm/swap.c | 4 ++--
mm/swap_state.c | 4 ++--
mm/truncate.c | 10 +++++-----
mm/vmscan.c | 2 +-
11 files changed, 83 insertions(+), 18 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index 6bdcb61..61edad7 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -58,7 +58,7 @@ static int page_cache_pipe_buf_steal(struct pipe_inode_info *pipe,
*/
wait_on_page_writeback(page);
- if (PagePrivate(page))
+ if (page_has_private(page))
try_to_release_page(page, GFP_KERNEL);
/*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 209d3a4..fcc9e23 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -77,25 +77,30 @@
#define PG_active 6
#define PG_slab 7 /* slab debug (Suparna wants this) */
-#define PG_owner_priv_1 8 /* Owner use. If pagecache, fs may use*/
+#define PG_owner_priv_1 8 /* Owner use. fs may use in pagecache */
#define ...
I'd much prefer if you would handle this in the filesystem, and have it set PG_private whenever fscache needs to receive a callback, and DTRT depending on whether PG_fscache etc. is set or not. Also, this wait_on_page_fscache_write / end_page_fscache_write stuff seems like it would belong in your fscache headers rather than generic mm code (ditto for your PG_fscache checks in the page allocator -- you --
Provide an add_wait_queue_tail() function to add a waiter to the back of a
wait queue instead of the front.
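The effect of queueing at the back rather than the front can be shown with a userspace model of the list. This is a sketch only: the demo_ types and functions are hypothetical, there is no locking, and "waking" is reduced to recording the order in which waiters would be visited from the head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter: just an id and a link. */
struct demo_waiter {
    int id;
    struct demo_waiter *next;
};

/* Hypothetical wait queue head with head and tail pointers. */
struct demo_wait_queue {
    struct demo_waiter *head;
    struct demo_waiter *tail;
};

/* Model of adding at the front: this waiter is visited first. */
static void demo_add_head(struct demo_wait_queue *q, struct demo_waiter *w)
{
    assert(q != NULL && w != NULL);
    w->next = q->head;
    q->head = w;
    if (q->tail == NULL)
        q->tail = w;
}

/* Model of adding at the back: this waiter is visited last. */
static void demo_add_tail(struct demo_wait_queue *q, struct demo_waiter *w)
{
    assert(q != NULL && w != NULL);
    w->next = NULL;
    if (q->tail != NULL)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
}

/* Walk from the head, recording ids in wake order; returns the count. */
static size_t demo_wake_all(const struct demo_wait_queue *q, int *out, size_t max)
{
    const struct demo_waiter *w;
    size_t n = 0;

    for (w = q->head; w != NULL && n < max; w = w->next)
        out[n++] = w->id;
    return n;
}
```

A waiter added with demo_add_tail() stays behind every waiter added at the front, which is the "woken up last" behaviour the patch describes.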
Signed-off-by: David Howells <dhowells@redhat.com>
---
include/linux/wait.h | 2 ++
kernel/wait.c | 18 ++++++++++++++++++
2 files changed, 20 insertions(+), 0 deletions(-)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 0e68628..f1038d0 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -118,6 +118,8 @@ static inline int waitqueue_active(wait_queue_head_t *q)
#define is_sync_wait(wait) (!(wait) || ((wait)->private))
extern void FASTCALL(add_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
+extern void FASTCALL(add_wait_queue_tail(wait_queue_head_t *q,
+ wait_queue_t *wait));
extern void FASTCALL(add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t * wait));
extern void FASTCALL(remove_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
diff --git a/kernel/wait.c b/kernel/wait.c
index 444ddbf..7acc9cc 100644
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -29,6 +29,24 @@ void fastcall add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
}
EXPORT_SYMBOL(add_wait_queue);
+/**
+ * add_wait_queue_tail - Add a waiter to the back of a waitqueue
+ * @q: the wait queue to append the waiter to
+ * @wait: the waiter to be queued
+ *
+ * Add a waiter to the back of a waitqueue so that it gets woken up last.
+ */
+void fastcall add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t *wait)
+{
+ unsigned long flags;
+
+ wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+ spin_lock_irqsave(&q->lock, flags);
+ __add_wait_queue_tail(q, wait);
+ spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(add_wait_queue_tail);
+
void fastcall add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
{
unsigned long flags;
--
The attached patch adds a generic intermediary (FS-Cache) by which filesystems
may call on local caching capabilities, and by which local caching backends may
make caches available:
+---------+
| | +--------------+
| NFS |--+ | |
| | | +-->| CacheFS |
+---------+ | +----------+ | | /dev/hda5 |
| | | | +--------------+
+---------+ +-->| | |
| | | |--+
| AFS |----->| FS-Cache |
| | | |--+
+---------+ +-->| | |
| | | | +--------------+
+---------+ | +----------+ | | |
| | | +-->| CacheFiles |
| ISOFS |--+ | /var/cache |
| | +--------------+
+---------+
The patch also documents the netfs interface and the cache backend
interface provided by the facility.
There are a number of reasons why I'm not using i_mapping to do this.
These have been discussed a lot on the LKML and CacheFS mailing lists,
but to summarise the basics:
(1) Most filesystems don't do hole reportage. Holes in files are treated as
blocks of zeros and can't be distinguished otherwise, making it difficult
to distinguish blocks that have been read from the network and cached from
those that haven't.
(2) The backing inode must be fully populated before being exposed to
userspace through the main inode because the VM/VFS goes directly to the
backing inode and does not interrogate the front inode on VM ops.
Therefore:
(a) The backing inode must fit entirely within the cache.
(b) All backed files currently open must fit entirely within the cache at
the same time.
(c) A working set of files in total larger than the cache may not be
cached.
(d) A file may not grow larger than the available ...
This one-line patch fixes the missing export of copy_page introduced by the
cachefile patches. This patch is not yet upstream, but is required for
cachefile on ia64. It will be pushed upstream when cachefile goes upstream.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---
arch/ia64/kernel/ia64_ksyms.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index bd17190..20c3546 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -43,6 +43,7 @@ EXPORT_SYMBOL(__do_clear_user);
EXPORT_SYMBOL(__strlen_user);
EXPORT_SYMBOL(__strncpy_from_user);
EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(copy_page);
/* from arch/ia64/lib */
extern void __divsi3(void);
--
Change all the usages of file->f_mapping in ext3_*write_end() functions to use
the mapping argument directly. This has two consequences:
(*) Consistency. Without this patch sometimes one is used and sometimes the
other is.
(*) A NULL file pointer can be passed. This feature is then made use of by
the generic hook in the next patch, which is used by CacheFiles to write
pages to a file without setting up a file struct.
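Why the change permits a NULL file pointer can be seen in a reduced userspace model. This is a sketch under assumptions: the demo_ structs mirror only the fields involved and demo_write_end() is a hypothetical stand-in for the ext3 functions, not their actual logic.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for inode, address_space and file. */
struct demo_inode {
    long i_size;
};

struct demo_mapping {
    struct demo_inode *host;
};

struct demo_file {
    struct demo_mapping *f_mapping;
};

/*
 * write_end-style helper that reaches the inode through the mapping
 * argument only, so it never dereferences the file pointer.
 */
static long demo_write_end(struct demo_file *file, struct demo_mapping *mapping,
                           long pos, long copied)
{
    struct demo_inode *inode;

    (void)file; /* may legitimately be NULL */
    assert(mapping != NULL && mapping->host != NULL);
    inode = mapping->host;

    /* Extend the recorded size if the write went past the old end. */
    if (pos + copied > inode->i_size)
        inode->i_size = pos + copied;
    return copied;
}
```

Had the helper used file->f_mapping->host instead, the NULL file call made by a kernel-internal writer would have faulted.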
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/ext3/inode.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 9b162cd..bc918d3 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1227,7 +1227,7 @@ static int ext3_generic_write_end(struct file *file,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
{
- struct inode *inode = file->f_mapping->host;
+ struct inode *inode = mapping->host;
copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
@@ -1252,7 +1252,7 @@ static int ext3_ordered_write_end(struct file *file,
struct page *page, void *fsdata)
{
handle_t *handle = ext3_journal_current_handle();
- struct inode *inode = file->f_mapping->host;
+ struct inode *inode = mapping->host;
unsigned from, to;
int ret = 0, ret2;
@@ -1293,7 +1293,7 @@ static int ext3_writeback_write_end(struct file *file,
struct page *page, void *fsdata)
{
handle_t *handle = ext3_journal_current_handle();
- struct inode *inode = file->f_mapping->host;
+ struct inode *inode = mapping->host;
int ret = 0, ret2;
loff_t new_i_size;
--
Add a function to install a monitor on the page lock waitqueue for a particular
page, thus allowing the page being unlocked to be detected.
This is used by CacheFiles to detect read completion on a page in the backing
filesystem so that it can then copy the data to the waiting netfs page.
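The monitoring idea can be modelled in userspace as a callback that runs when the page is unlocked. This is a sketch only: the demo_ types are hypothetical, there is no locking, and the callback merely counts invocations where CacheFiles would schedule the copy to the netfs page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter carrying a callback, like a wait queue entry. */
struct demo_waiter {
    void (*func)(struct demo_waiter *self);
    int fired;
    struct demo_waiter *next;
};

/* Hypothetical page with a lock bit and its own list of waiters. */
struct demo_page {
    int locked;
    struct demo_waiter *waiters;
};

/* Install a monitor on the page's wait queue. */
static void demo_add_page_wait_queue(struct demo_page *page,
                                     struct demo_waiter *waiter)
{
    assert(page != NULL && waiter != NULL && waiter->func != NULL);
    waiter->next = page->waiters;
    page->waiters = waiter;
}

/* Unlock the page and notify every installed monitor. */
static void demo_unlock_page(struct demo_page *page)
{
    struct demo_waiter *w;

    assert(page != NULL);
    page->locked = 0;
    for (w = page->waiters; w != NULL; w = w->next)
        w->func(w);
}

/* Example monitor: note that the backing read has completed. */
static void demo_read_done(struct demo_waiter *self)
{
    self->fired++;
}
```

The monitor does not sleep; it is only notified, which is what allows a read completion on the backing page to be detected without a dedicated waiting thread per page.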
Signed-off-by: David Howells <dhowells@redhat.com>
---
include/linux/pagemap.h | 5 +++++
mm/filemap.c | 18 ++++++++++++++++++
2 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6a1b317..21c35e2 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -223,6 +223,11 @@ static inline void wait_on_page_fscache_write(struct page *page)
extern void end_page_fscache_write(struct page *page);
/*
+ * Add an arbitrary waiter to a page's wait queue
+ */
+extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
+
+/*
* Fault a userspace page into pagetables. Return non-zero on a fault.
*
* This assumes that two userspace pages are always sufficient. That's
diff --git a/mm/filemap.c b/mm/filemap.c
index bea1ba6..6872d1b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -521,6 +521,24 @@ void fastcall wait_on_page_bit(struct page *page, int bit_nr)
EXPORT_SYMBOL(wait_on_page_bit);
/**
+ * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
+ * @page: Page defining the wait queue of interest
+ * @waiter: Waiter to add to the queue
+ *
+ * Add an arbitrary @waiter to the wait queue for the nominated @page.
+ */
+void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
+{
+ wait_queue_head_t *q = page_waitqueue(page);
+ unsigned long flags;
+
+ spin_lock_irqsave(&q->lock, flags);
+ __add_wait_queue(q, waiter);
+ spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL_GPL(add_page_wait_queue);
+
+/**
* unlock_page - unlock a locked page
* @page: the page
*
--
Export a number of functions for CacheFiles's use.
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/super.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index ceaf2e3..cd199ae 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -266,6 +266,7 @@ int fsync_super(struct super_block *sb)
__fsync_super(sb);
return sync_blockdev(sb->s_bdev);
}
+EXPORT_SYMBOL_GPL(fsync_super);
/**
* generic_shutdown_super - common helper for ->kill_sb()
--
Add an FS-Cache cache-backend that permits a mounted filesystem to be used as a
backing store for the cache.
CacheFiles uses a userspace daemon to do some of the cache management - such as
reaping stale nodes and culling. This is called cachefilesd and lives in
/sbin. The source for the daemon can be downloaded from:
http://people.redhat.com/~dhowells/cachefs/cachefilesd.c
And an example configuration from:
http://people.redhat.com/~dhowells/cachefs/cachefilesd.conf
The filesystem and data integrity of the cache are only as good as those of the
filesystem providing the backing services. Note that CacheFiles does not
attempt to journal anything since the journalling interfaces of the various
filesystems are very specific in nature.
CacheFiles creates a proc-file - "/proc/fs/cachefiles" - that is used for
communication with the daemon. Only one thing may have this open at once, and
whilst it is open, a cache is at least partially in existence. The daemon
opens this and sends commands down it to control the cache.
CacheFiles is currently limited to a single cache.
CacheFiles attempts to maintain at least a certain percentage of free space on
the filesystem, shrinking the cache by culling the objects it contains to make
space if necessary - see the "Cache Culling" section. This means it can be
placed on the same medium as a live set of data, and will expand to make use of
spare space and automatically contract when the set of data requires more
space.
============
REQUIREMENTS
============
The use of CacheFiles and its daemon requires the following features to be
available in the system and in the cache filesystem:
- dnotify.
- extended attributes (xattrs).
- openat() and friends.
- bmap() support on files in the filesystem (FIBMAP ioctl).
- The use of bmap() to detect a partial page at the end of the file.
It is strongly recommended that the "dir_index" option is enabled on Ext3
filesystems being used as a ...
The attached patch makes it possible for the NFS filesystem to make use of the
network filesystem local caching service (FS-Cache).
To be able to use this, an updated mount program is required. This can be
obtained from:
http://people.redhat.com/steved/fscache/util-linux/
To mount an NFS filesystem to use caching, add an "fsc" option to the mount:
mount warthog:/ /a -o fsc
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/nfs/Makefile | 1
fs/nfs/client.c | 5 +
fs/nfs/file.c | 37 ++++
fs/nfs/fscache-def.c | 289 +++++++++++++++++++++++++++++++++
fs/nfs/fscache.c | 391 +++++++++++++++++++++++++++++++++++++++++++++
fs/nfs/fscache.h | 148 +++++++++++++++++
fs/nfs/inode.c | 47 +++++
fs/nfs/read.c | 28 +++
fs/nfs/super.c | 3
fs/nfs/sysctl.c | 1
include/linux/nfs_fs.h | 9 +
include/linux/nfs_fs_sb.h | 18 ++
12 files changed, 968 insertions(+), 9 deletions(-)
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index df0f41e..073d04c 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,3 +16,4 @@ nfs-$(CONFIG_NFS_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
nfs4namespace.o
nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
nfs-$(CONFIG_SYSCTL) += sysctl.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-def.o
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 70587f3..acb2179 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -43,6 +43,7 @@
#include "delegation.h"
#include "iostat.h"
#include "internal.h"
+#include "fscache.h"
#define NFSDBG_FACILITY NFSDBG_CLIENT
@@ -139,6 +140,8 @@ static struct nfs_client *nfs_alloc_client(const char *hostname,
clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
#endif
+ nfs_fscache_get_client_cookie(clp);
+
return clp;
error_3:
@@ -170,6 +173,8 @@ static void nfs_free_client(struct nfs_client *clp)
nfs4_shutdown_client(clp);
...
Changes to the kernel configuration definitions and to the NFS mount options to
allow the local caching support added by the previous patch to be enabled.
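As a rough illustration of what carrying the options word through to the
server record makes possible, the following userspace sketch models the flow.
The structs are cut-down stand-ins for nfs_parsed_mount_data and nfs_server,
the value given to NFS_OPTION_FSCACHE is an assumption made for the example,
and parse_mount_option() is a hypothetical helper rather than the kernel's
mount option parser.

```c
#include <assert.h>
#include <string.h>

/* Assumed bit value, for illustration only; not taken from the patch. */
#define NFS_OPTION_FSCACHE 0x00000001u

/* Cut-down stand-ins for the kernel structures touched by this patch. */
struct parsed_mount_data {
	unsigned int options;
};

struct server {
	unsigned int options;
};

/* Hypothetical helper: record the "fsc" mount option as a bit in options. */
static void parse_mount_option(struct parsed_mount_data *data, const char *opt)
{
	assert(data != NULL && opt != NULL);
	if (strcmp(opt, "fsc") == 0)
		data->options |= NFS_OPTION_FSCACHE;
}

/* Mirrors the line added by the patch: server->options = data->options; */
static void init_server(struct server *server,
			const struct parsed_mount_data *data)
{
	server->options = data->options;
}

/* Non-zero if caching was requested for this mount. */
static int server_wants_fscache(const struct server *server)
{
	return (server->options & NFS_OPTION_FSCACHE) != 0;
}
```

Keeping the options in a word separate from server->flags means the new bit
is not filtered through NFS_MOUNT_FLAGMASK.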
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/Kconfig | 8 ++++++++
fs/nfs/client.c | 2 ++
fs/nfs/internal.h | 1 +
fs/nfs/super.c | 14 ++++++++++++++
4 files changed, 25 insertions(+), 0 deletions(-)
diff --git a/fs/Kconfig b/fs/Kconfig
index 215b0d6..83d1227 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1650,6 +1650,14 @@ config NFS_V4
If unsure, say N.
+config NFS_FSCACHE
+ bool "Provide NFS client caching support (EXPERIMENTAL)"
+ depends on EXPERIMENTAL
+ depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+ help
+ Say Y here if you want NFS data to be cached locally on disc through
+ the general filesystem cache manager
+
config NFS_DIRECTIO
bool "Allow direct I/O on NFS files"
depends on NFS_FS
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index acb2179..be38c3c 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -575,6 +575,7 @@ static int nfs_init_server(struct nfs_server *server,
/* Initialise the client representation from the mount data */
server->flags = data->flags & NFS_MOUNT_FLAGMASK;
+ server->options = data->options;
if (data->rsize)
server->rsize = nfs_block_size(data->rsize, NULL);
@@ -931,6 +932,7 @@ static int nfs4_init_server(struct nfs_server *server,
/* Initialise the client representation from the mount data */
server->flags = data->flags & NFS_MOUNT_FLAGMASK;
server->caps |= NFS_CAP_ATOMIC_OPEN;
+ server->options = data->options;
if (data->rsize)
server->rsize = nfs_block_size(data->rsize, NULL);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index f3acf48..ef09e00 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -35,6 +35,7 @@ struct nfs_parsed_mount_data {
int acregmin, acregmax,
acdirmin, acdirmax;
int namlen;
+ unsigned int options;
unsigned ...
Display the local caching state in /proc/fs/nfsfs/volumes.
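To give a feel for the resulting line layout, here is a small userspace
sketch that reuses the format string from this patch. The helper names are
hypothetical, the address bytes are passed in directly instead of going
through NIPQUAD(), and the dev and fsid strings are whatever the caller
supplies; it is only meant to show where the new FSC column lands.

```c
#include <assert.h>
#include <stdio.h>

/* Text for the FSC column; "no " is padded to the same width as "yes". */
static const char *fscache_state(int client_has_cache, int fsc_option_set)
{
	return (client_has_cache && fsc_option_set) ? "yes" : "no ";
}

/*
 * Format one volume line with the format string used in the patch.
 * addr holds the four IPv4 address bytes, most significant first.
 * Returns the value of snprintf().
 */
static int format_volume_line(char *buf, size_t len, int version,
			      const unsigned char addr[4], unsigned short port,
			      const char *dev, const char *fsid,
			      const char *fsc)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n",
			version, addr[0], addr[1], addr[2], addr[3],
			port, dev, fsid, fsc);
}
```

Because the dev and fsid fields are left-justified to fixed widths, the FSC
value stays in the same column for any dev and fsid strings that fit within
those widths.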
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/nfs/client.c | 7 ++++---
fs/nfs/fscache.h | 15 +++++++++++++++
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index be38c3c..91ecea3 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1335,7 +1335,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
/* display header on line 1 */
if (v == &nfs_volume_list) {
- seq_puts(m, "NV SERVER PORT DEV FSID\n");
+ seq_puts(m, "NV SERVER PORT DEV FSID FSC\n");
return 0;
}
/* display one transport per line on subsequent lines */
@@ -1349,12 +1349,13 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
(unsigned long long) server->fsid.major,
(unsigned long long) server->fsid.minor);
- seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n",
+ seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n",
clp->cl_nfsversion,
NIPQUAD(clp->cl_addr.sin_addr),
ntohs(clp->cl_addr.sin_port),
dev,
- fsid);
+ fsid,
+ nfs_server_fscache_state(server));
return 0;
}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 144fb58..9a735fc 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -53,6 +53,17 @@ extern void __nfs_fscache_invalidate_page(struct page *, struct inode *);
extern int nfs_fscache_release_page(struct page *, gfp_t);
/*
+ * indicate the client caching state as readable text
+ */
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+ if (server->nfs_client->fscache &&
+ (server->options & NFS_OPTION_FSCACHE))
+ return "yes";
+ return "no ";
+}
+
+/*
* release the caching state associated with a page if undergoing complete page
* invalidation
*/
@@ -109,6 +120,10 @@ static inline void nfs4_fscache_get_client_cookie(struct nfs_client *clp) {}
static inline void ...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
crypto/fcrypt.c | 88 ++++++++++++++++++++++++++++---------------------------
1 files changed, 44 insertions(+), 44 deletions(-)
diff --git a/crypto/fcrypt.c b/crypto/fcrypt.c
index d161949..a32cb68 100644
--- a/crypto/fcrypt.c
+++ b/crypto/fcrypt.c
@@ -51,7 +51,7 @@
#define ROUNDS 16
struct fcrypt_ctx {
- u32 sched[ROUNDS];
+ __be32 sched[ROUNDS];
};
/* Rotate right two 32 bit numbers as a 56 bit number */
@@ -73,8 +73,8 @@ do { \
* /afs/transarc.com/public/afsps/afs.rel31b.export-src/rxkad/sboxes.h
*/
#undef Z
-#define Z(x) __constant_be32_to_cpu(x << 3)
-static const u32 sbox0[256] = {
+#define Z(x) __constant_cpu_to_be32(x << 3)
+static const __be32 sbox0[256] = {
Z(0xea), Z(0x7f), Z(0xb2), Z(0x64), Z(0x9d), Z(0xb0), Z(0xd9), Z(0x11),
Z(0xcd), Z(0x86), Z(0x86), Z(0x91), Z(0x0a), Z(0xb2), Z(0x93), Z(0x06),
Z(0x0e), Z(0x06), Z(0xd2), Z(0x65), Z(0x73), Z(0xc5), Z(0x28), Z(0x60),
@@ -110,8 +110,8 @@ static const u32 sbox0[256] = {
};
#undef Z
-#define Z(x) __constant_be32_to_cpu((x << 27) | (x >> 5))
-static const u32 sbox1[256] = {
+#define Z(x) __constant_cpu_to_be32((x << 27) | (x >> 5))
+static const __be32 sbox1[256] = {
Z(0x77), Z(0x14), Z(0xa6), Z(0xfe), Z(0xb2), Z(0x5e), Z(0x8c), Z(0x3e),
Z(0x67), Z(0x6c), Z(0xa1), Z(0x0d), Z(0xc2), Z(0xa2), Z(0xc1), Z(0x85),
Z(0x6c), Z(0x7b), Z(0x67), Z(0xc6), Z(0x23), Z(0xe3), Z(0xf2), Z(0x89),
@@ -147,8 +147,8 @@ static const u32 sbox1[256] = {
};
#undef Z
-#define Z(x) __constant_be32_to_cpu(x << 11)
-static const u32 sbox2[256] = {
+#define Z(x) __constant_cpu_to_be32(x << 11)
+static const __be32 sbox2[256] = {
Z(0xf0), Z(0x37), Z(0x24), Z(0x53), Z(0x2a), Z(0x03), Z(0x83), Z(0x86),
Z(0xd1), Z(0xec), Z(0x50), Z(0xf0), Z(0x42), Z(0x78), Z(0x2f), Z(0x6d),
Z(0xbf), Z(0x80), Z(0x87), Z(0x27), Z(0x95), Z(0xe2), Z(0xc5), Z(0x5d),
@@ -184,8 +184,8 @@ static const u32 sbox2[256] = {
};
#undef Z
-#define Z(x) ...
Add a TestSetPageError() macro to the suite of page flag manipulators. This can
be used by AFS to prevent over-excision of rejected writes from the page cache.
Signed-off-by: David Howells <dhowells@redhat.com>
---
include/linux/page-flags.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index fcc9e23..0350c37 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -130,6 +130,7 @@
#define PageError(page) test_bit(PG_error, &(page)->flags)
#define SetPageError(page) set_bit(PG_error, &(page)->flags)
#define ClearPageError(page) clear_bit(PG_error, &(page)->flags)
+#define TestSetPageError(page) test_and_set_bit(PG_error, &(page)->flags)
#define PageReferenced(page) test_bit(PG_referenced, &(page)->flags)
#define SetPageReferenced(page) set_bit(PG_referenced, &(page)->flags)
--
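The value of a test-and-set operation such as TestSetPageError() is that it
reports whether the flag was already set, so only one of several racing
callers sees the transition. The userspace sketch below shows that property
with C11 atomics. The demo_ names and the bit position are assumptions for
illustration, and using the result to pick a single "excising" caller is one
plausible reading of "prevent over-excision", not necessarily what the AFS
code does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed bit position, for illustration; not the kernel's PG_error value. */
#define DEMO_PG_ERROR 1u

struct demo_page {
	atomic_ulong flags;
};

/*
 * Userspace analogue of TestSetPageError(): atomically set the error bit
 * and return non-zero if it was already set beforehand.
 */
static int demo_test_set_page_error(struct demo_page *page)
{
	unsigned long mask = 1ul << DEMO_PG_ERROR;
	unsigned long old;

	assert(page != NULL);
	old = atomic_fetch_or(&page->flags, mask);
	return (old & mask) != 0;
}

/* Returns 1 only for the caller that actually made the 0 -> 1 transition. */
static int demo_should_excise(struct demo_page *page)
{
	return !demo_test_set_page_error(page);
}
```

A plain SetPageError() followed by a separate PageError() test could not make
this distinction, since two callers might both observe the flag as set.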
Add a function - cancel_rejected_write() - to excise a rejected write from the
pagecache. This function is related to the truncation family of routines. It
permits the pages modified by a network filesystem client (such as AFS) to be
excised and discarded from the pagecache if the attempt to write them back to
the server fails.
The dirty and writeback states of the afflicted pages are cancelled and the
pages themselves are detached for recycling. All PTEs referring to those pages
are removed.
Note that the locking is tricky as it's very easy to deadlock against
truncate() and other routines once the pages have been unlocked as part of the
writeback process. To this end, the PG_error flag is set, then the
PG_writeback flag is cleared, and only *then* can lock_page() be called.
Signed-off-by: David Howells <dhowells@redhat.com>
---
include/linux/mm.h | 5 ++-
mm/truncate.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 86 insertions(+), 2 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 520238c..438270f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1005,12 +1005,13 @@ extern int do_munmap(struct mm_struct *, unsigned long, size_t);
extern unsigned long do_brk(unsigned long, unsigned long);
-/* filemap.c */
-extern unsigned long page_unuse(struct page *);
+/* truncate.c */
extern void truncate_inode_pages(struct address_space *, loff_t);
extern void truncate_inode_pages_range(struct address_space *, loff_t lstart, loff_t lend);
+extern void cancel_rejected_write(struct address_space *, pgoff_t, pgoff_t);
+/* filemap.c */
/* generic vm_area_ops exported for stackable file systems */
extern int filemap_fault(struct vm_area_struct *, struct vm_fault *);
diff --git a/mm/truncate.c b/mm/truncate.c
index 5b7d1c5..95fc1a8 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -465,3 +465,86 @@ int invalidate_inode_pages2(struct address_space *mapping)
return ...
Improve the handling of the case of a server rejecting an attempt to write back
a cached write. AFS operates a write-back cache, so the following sequence of
events can theoretically occur:
CLIENT 1                        CLIENT 2
=======================         =======================
cat data >/the/file
(sits in pagecache)
                                fs setacl -dir /the/dir/of/the/file \
                                    -acl system:administrators rlidka
                                (write permission removed for client 1)
sync
(writeback attempt fails)
The way AFS attempts to handle this is:
(1) The affected region will be excised and discarded on the basis that it
can't be written back, yet we don't want it lurking in the page cache
either. The contents of the affected region will be reread from the
server when called for again.
(2) The EOF size will be set to the current server-based file size - usually
that which it was before the affected write was made - assuming no
conflicting write has been appended, and assuming the affected write
extended the file.
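The two steps above can be pictured with a toy userspace model. Everything
here is hypothetical (the demo_ names, the fixed page count, the flat arrays)
and it ignores locking, PTEs and the real pagecache entirely; it only shows
the intended end state after a rejected writeback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NPAGES 8

/* Toy model of a cached file: which pages are held locally, and its EOF. */
struct demo_file {
	bool cached[DEMO_NPAGES];	/* page present in the local cache */
	bool dirty[DEMO_NPAGES];	/* page has unwritten modifications */
	size_t local_size;		/* EOF as seen by the client */
	size_t server_size;		/* EOF as last reported by the server */
};

/*
 * Handle a rejected writeback of pages first..last (inclusive):
 * step (1) discards the affected pages so they get reread on demand,
 * step (2) resets the EOF to the server's view of the file size.
 */
static void demo_reject_write(struct demo_file *f, size_t first, size_t last)
{
	assert(f != NULL && first <= last && last < DEMO_NPAGES);
	for (size_t i = first; i <= last; i++) {
		f->dirty[i] = false;
		f->cached[i] = false;
	}
	f->local_size = f->server_size;
}
```

Pages outside the rejected range are left alone, which is the point of
excising only the affected region rather than invalidating the whole file.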
This patch makes the following changes:
(1) Zero-length short reads don't produce EBADMSG now just because the OpenAFS
server puts a silly value as the size of the returned data. This prevents
excised pages beyond the revised EOF being reinstantiated with a surprise
PG_error.
(2) Writebacks can now be put into a 'rejected' state in which all further
attempts to write them back will result in excision of the affected pages
instead.
(3) Preparing a page for overwriting now reads the whole page instead of just
those parts of it that aren't to be covered by the copy to be made. This
handles the possibility that the copy might fail on EFAULT. Corollary to
this, PG_update can now be set by afs_prepare_page() on behalf of
afs_prepare_write() rather than setting it in afs_commit_write().
(4) In the case of a conflicting write, afs_prepare_write() will attempt to
flush the write to the server, and will then wait for ...
Save the operation ID to be used with a call that we're making for display
through /proc/net/rxrpc_calls. This helps debugging stuck operations as we
then know what they are.
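Note that the diff stores the ID with htonl(), so any comparison against a
constant has to convert the constant as well. The userspace sketch below
shows the pattern. The demo_ names and the operation numbers are placeholders
invented for the example, not the values behind FSFETCHDATA or FSFETCHDATA64.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder operation numbers for illustration; not taken from the patch. */
#define DEMO_OP_FETCHDATA   1u
#define DEMO_OP_FETCHDATA64 2u

struct demo_call {
	uint32_t operation_ID;	/* held in network byte order */
};

/* Record the operation ID converted with htonl(), as the patch does. */
static void demo_set_operation(struct demo_call *call, uint32_t op)
{
	assert(call != NULL);
	call->operation_ID = htonl(op);
}

/* Comparisons must convert the constant too, or they fail on little-endian. */
static int demo_is_operation(const struct demo_call *call, uint32_t op)
{
	return call->operation_ID == htonl(op);
}

/* Recover the host-order value, for example before printing it. */
static uint32_t demo_operation_host_order(const struct demo_call *call)
{
	return ntohl(call->operation_ID);
}
```

This is why the existing test in afs_deliver_fs_fetch_data() changes from a
bare FSFETCHDATA64 to htonl(FSFETCHDATA64) in the same patch.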
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/afs/fsclient.c | 32 +++++++++++++++++++++++---------
fs/afs/rxrpc.c | 1 +
fs/afs/vlclient.c | 2 ++
include/net/af_rxrpc.h | 1 +
net/rxrpc/af_rxrpc.c | 3 +++
net/rxrpc/ar-internal.h | 1 +
net/rxrpc/ar-proc.c | 7 ++++---
7 files changed, 35 insertions(+), 12 deletions(-)
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 04584c0..a468f2d 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -287,6 +287,7 @@ int afs_fs_fetch_file_status(struct afs_server *server,
call->reply2 = volsync;
call->service_id = FS_SERVICE;
call->port = htons(AFS_FS_PORT);
+ call->operation_ID = htonl(FSFETCHSTATUS);
/* marshall the parameters */
bp = call->request;
@@ -316,7 +317,7 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call,
case 0:
call->offset = 0;
call->unmarshall++;
- if (call->operation_ID != FSFETCHDATA64) {
+ if (call->operation_ID != htonl(FSFETCHDATA64)) {
call->unmarshall++;
goto no_msw;
}
@@ -464,7 +465,7 @@ static int afs_fs_fetch_data64(struct afs_server *server,
call->reply3 = buffer;
call->service_id = FS_SERVICE;
call->port = htons(AFS_FS_PORT);
- call->operation_ID = FSFETCHDATA64;
+ call->operation_ID = htonl(FSFETCHDATA64);
/* marshall the parameters */
bp = call->request;
@@ -509,7 +510,7 @@ int afs_fs_fetch_data(struct afs_server *server,
call->reply3 = buffer;
call->service_id = FS_SERVICE;
call->port = htons(AFS_FS_PORT);
- call->operation_ID = FSFETCHDATA;
+ call->operation_ID = htonl(FSFETCHDATA);
/* marshall the parameters */
bp = call->request;
@@ -577,6 +578,7 @@ int afs_fs_give_up_callbacks(struct afs_server *server,
call->service_id = FS_SERVICE;
call->port = ...
The attached patch makes the kAFS filesystem in fs/afs/ use FS-Cache, and
through it any attached caches. The kAFS filesystem will use caching
automatically if it's available.
Signed-Off-By: David Howells <dhowells@redhat.com>
---
fs/Kconfig | 8 +
fs/afs/Makefile | 3
fs/afs/cache.c | 505 ++++++++++++++++++++++++++++++++++------------------
fs/afs/cache.h | 15 --
fs/afs/cell.c | 16 +-
fs/afs/file.c | 212 +++++++++++++---------
fs/afs/inode.c | 26 +--
fs/afs/internal.h | 53 ++---
fs/afs/main.c | 27 +--
fs/afs/mntpt.c | 4
fs/afs/vlocation.c | 23 +-
fs/afs/volume.c | 14 -
fs/afs/write.c | 6 -
13 files changed, 537 insertions(+), 375 deletions(-)
diff --git a/fs/Kconfig b/fs/Kconfig
index 83d1227..7f3278f 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -2120,6 +2120,14 @@ config AFS_DEBUG
If unsure, say N.
+config AFS_FSCACHE
+ bool "Provide AFS client caching support (EXPERIMENTAL)"
+ depends on EXPERIMENTAL
+ depends on AFS_FS=m && FSCACHE || AFS_FS=y && FSCACHE=y
+ help
+ Say Y here if you want AFS data to be cached locally on disk through
+ the generic filesystem cache manager
+
config 9P_FS
tristate "Plan 9 Resource Sharing Support (9P2000) (Experimental)"
depends on INET && NET_9P && EXPERIMENTAL
diff --git a/fs/afs/Makefile b/fs/afs/Makefile
index a666710..4f64b95 100644
--- a/fs/afs/Makefile
+++ b/fs/afs/Makefile
@@ -2,7 +2,10 @@
# Makefile for Red Hat Linux AFS client.
#
+afs-cache-$(CONFIG_AFS_FSCACHE) := cache.o
+
kafs-objs := \
+ $(afs-cache-y) \
callback.o \
cell.o \
cmservice.o \
diff --git a/fs/afs/cache.c b/fs/afs/cache.c
index de0d7de..8e179a9 100644
--- a/fs/afs/cache.c
+++ b/fs/afs/cache.c
@@ -9,248 +9,399 @@
* 2 of the License, or (at your option) any later version.
*/
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_cell_cache_match(void *target,
- const void *entry);
-static void ...
