These patches add local caching for network filesystems such as NFS and AFS.
FS-Cache now runs fully asynchronously as required by Trond Myklebust for NFS.
--
Changes:
[try #3]:
(*) Added missing file to CacheFiles patch.
(*) Made new security functions return errors and pass actual return data via
argument pointer.
(*) Cleaned up NFS patch.
(*) The 'fsc' flag must now be passed to NFS mount by the string options.
(*) Split the NFS patch into three as requested by Trond.
[try #2]:
(*) The CacheFiles module no longer accepts directory fds in its cull and
inuse commands from cachefilesd. Instead it uses the current working
directory of the calling process as the basis for looking up the object.
Corollary to this, fget_light() no longer needs to be exported.
--
A tarball of the patches is available at:
http://people.redhat.com/~dhowells/fscache/patches/nfs+fscache-22.tar.bz2
To use this version of CacheFiles, cachefilesd-0.9 is also required. It
is available as an SRPM:
http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9-1.fc7.src.rpm
Or as individual bits:
http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9.tar.bz2
http://people.redhat.com/~dhowells/fscache/cachefilesd.fc
http://people.redhat.com/~dhowells/fscache/cachefilesd.if
http://people.redhat.com/~dhowells/fscache/cachefilesd.te
http://people.redhat.com/~dhowells/fscache/cachefilesd.spec
The .fc, .if and .te files are for manipulating SELinux.
David
-
The attached patch causes read_cache_pages() to release page-private data on a
page for which add_to_page_cache() fails or the filler function fails. This
permits pages with caching references associated with them to be cleaned up.
The invalidatepage() address space op is called (indirectly) to do the honours.
Signed-Off-By: David Howells <dhowells@redhat.com>
---
mm/readahead.c | 40 ++++++++++++++++++++++++++++++++++++++--
1 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/mm/readahead.c b/mm/readahead.c
index 39bf45d..12d1378 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -15,6 +15,7 @@
#include <linux/backing-dev.h>
#include <linux/task_io_accounting_ops.h>
#include <linux/pagevec.h>
+#include <linux/buffer_head.h>
void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
{
@@ -51,6 +52,41 @@ EXPORT_SYMBOL_GPL(file_ra_state_init);
#define list_to_page(head) (list_entry((head)->prev, struct page, lru))
+/*
+ * see if a page needs releasing upon read_cache_pages() failure
+ * - the caller of read_cache_pages() may have set PG_private before calling,
+ * such as the NFS fs marking pages that are cached locally on disk, thus we
+ * need to give the fs a chance to clean up in the event of an error
+ */
+static void read_cache_pages_invalidate_page(struct address_space *mapping,
+ struct page *page)
+{
+ if (PagePrivate(page)) {
+ if (TestSetPageLocked(page))
+ BUG();
+ page->mapping = mapping;
+ do_invalidatepage(page, 0);
+ page->mapping = NULL;
+ unlock_page(page);
+ }
+ page_cache_release(page);
+}
+
+/*
+ * release a list of pages, invalidating them first if need be
+ */
+static void read_cache_pages_invalidate_pages(struct address_space *mapping,
+ struct list_head *pages)
+{
+ struct page *victim;
+
+ while (!list_empty(pages)) {
+ victim = list_to_page(pages);
+ list_del(&victim->lru);
+ read_cache_pages_invalidate_page(mapping, victim);
+ }
+}
+
/**
 * ...
Recruit a couple of page flags to aid in cache management. The following extra
flags are defined:
(1) PG_fscache (PG_owner_priv_2)
The marked page is backed by a local cache and is pinning resources in the
cache driver.
(2) PG_fscache_write (PG_owner_priv_3)
The marked page is being written to the local cache. The page may not be
modified whilst this is in progress.
If PG_fscache is set, then things that checked for PG_private will now also
check for that. This includes things like truncation and page invalidation.
The function page_has_private() has been added to detect this.
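
As a rough illustration of the check described above, here is a minimal userspace sketch. The bit numbers are taken from the page-flags.h hunk of this patch; struct mock_page and mock_page_has_private() are hypothetical stand-ins, and the assumption that the helper simply tests both bits together is mine, since the real definition is not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit numbers as they appear in the page-flags.h hunk of this patch. */
#define PG_private       11
#define PG_owner_priv_2  13
#define PG_fscache       PG_owner_priv_2

/* Hypothetical stand-in for struct page: only the flags word is modelled. */
struct mock_page {
	unsigned long flags;
};

/* Assumed semantics: true if either PG_private or PG_fscache is set. */
bool mock_page_has_private(const struct mock_page *page)
{
	const unsigned long mask = (1UL << PG_private) | (1UL << PG_fscache);

	assert(page != NULL);
	return (page->flags & mask) != 0;
}
```

With this shape, callers such as truncation or invalidation that used to test only PG_private would pick up cache-backed pages without any further change.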
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/splice.c | 2 +-
include/linux/page-flags.h | 30 +++++++++++++++++++++++++++++-
include/linux/pagemap.h | 11 +++++++++++
mm/filemap.c | 16 ++++++++++++++++
mm/migrate.c | 2 +-
mm/page_alloc.c | 3 +++
mm/readahead.c | 9 +++++----
mm/swap.c | 4 ++--
mm/swap_state.c | 4 ++--
mm/truncate.c | 10 +++++-----
mm/vmscan.c | 2 +-
11 files changed, 76 insertions(+), 17 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index c010a72..ae4f5b7 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -58,7 +58,7 @@ static int page_cache_pipe_buf_steal(struct pipe_inode_info *pipe,
*/
wait_on_page_writeback(page);
- if (PagePrivate(page))
+ if (page_has_private(page))
try_to_release_page(page, GFP_KERNEL);
/*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 209d3a4..eaf9854 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -83,19 +83,24 @@
#define PG_private 11 /* If pagecache, has fs-private data */
#define PG_writeback 12 /* Page is under writeback */
+#define PG_owner_priv_2 13 /* Owner use. If pagecache, fs may use */
#define PG_compound 14 /* Part of a compound page */
...
Provide an add_wait_queue_tail() function to add a waiter to the back of a
wait queue instead of the front.
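
To make the front/back distinction concrete, here is a small userspace model of a wait queue as a circular list with a sentinel head. All mock_* names are hypothetical, locking is omitted entirely, and the model only assumes that wake-ups walk the list from the front.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified waiter: a node in a circular doubly linked list. */
struct mock_waiter {
	int id;
	struct mock_waiter *prev, *next;
};

/* The queue head is a sentinel node; an empty queue points at itself. */
void mock_queue_init(struct mock_waiter *head)
{
	head->id = -1;
	head->prev = head->next = head;
}

/* Insert right after the sentinel: this waiter is reached first. */
void mock_add_front(struct mock_waiter *head, struct mock_waiter *w)
{
	assert(head && w);
	w->next = head->next;
	w->prev = head;
	head->next->prev = w;
	head->next = w;
}

/* Insert right before the sentinel: this waiter is reached last. */
void mock_add_tail(struct mock_waiter *head, struct mock_waiter *w)
{
	assert(head && w);
	w->prev = head->prev;
	w->next = head;
	head->prev->next = w;
	head->prev = w;
}

/* Walk from the front, recording the wake-up order; returns the count. */
size_t mock_wake_order(const struct mock_waiter *head, int *ids, size_t max)
{
	size_t n = 0;
	const struct mock_waiter *w;

	for (w = head->next; w != head && n < max; w = w->next)
		ids[n++] = w->id;
	return n;
}
```

Adding waiters 1 and 2 at the front and waiter 3 at the tail leaves 3 at the end of the walk, which is the "woken up last" property the patch description relies on.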
Signed-off-by: David Howells <dhowells@redhat.com>
---
include/linux/wait.h | 1 +
kernel/wait.c | 18 ++++++++++++++++++
2 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 0e68628..4cae7db 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -118,6 +118,7 @@ static inline int waitqueue_active(wait_queue_head_t *q)
#define is_sync_wait(wait) (!(wait) || ((wait)->private))
extern void FASTCALL(add_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
+extern void FASTCALL(add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t * wait));
extern void FASTCALL(add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t * wait));
extern void FASTCALL(remove_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
diff --git a/kernel/wait.c b/kernel/wait.c
index 444ddbf..7acc9cc 100644
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -29,6 +29,24 @@ void fastcall add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
}
EXPORT_SYMBOL(add_wait_queue);
+/**
+ * add_wait_queue_tail - Add a waiter to the back of a waitqueue
+ * @q: the wait queue to append the waiter to
+ * @wait: the waiter to be queued
+ *
+ * Add a waiter to the back of a waitqueue so that it gets woken up last.
+ */
+void fastcall add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t *wait)
+{
+ unsigned long flags;
+
+ wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+ spin_lock_irqsave(&q->lock, flags);
+ __add_wait_queue_tail(q, wait);
+ spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(add_wait_queue_tail);
+
void fastcall add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
{
unsigned long flags;
-
The attached patch adds a generic intermediary (FS-Cache) by which filesystems
may call on local caching capabilities, and by which local caching backends may
make caches available:
+---------+
| | +--------------+
| NFS |--+ | |
| | | +-->| CacheFS |
+---------+ | +----------+ | | /dev/hda5 |
| | | | +--------------+
+---------+ +-->| | |
| | | |--+
| AFS |----->| FS-Cache |
| | | |--+
+---------+ +-->| | |
| | | | +--------------+
+---------+ | +----------+ | | |
| | | +-->| CacheFiles |
| ISOFS |--+ | /var/cache |
| | +--------------+
+---------+
The patch also documents the netfs interface and the cache backend
interface provided by the facility.
There are a number of reasons why I'm not using i_mapping to do this.
These have been discussed a lot on the LKML and CacheFS mailing lists,
but to summarise the basics:
(1) Most filesystems don't do hole reportage. Holes in files are treated as
blocks of zeros and can't be distinguished otherwise, making it difficult
to distinguish blocks that have been read from the network and cached from
those that haven't.
(2) The backing inode must be fully populated before being exposed to
userspace through the main inode because the VM/VFS goes directly to the
backing inode and does not interrogate the front inode on VM ops.
Therefore:
(a) The backing inode must fit entirely within the cache.
(b) All backed files currently open must fit entirely within the cache at
the same time.
(c) A working set of files in total larger than the cache may not be
cached.
(d) A file may not grow larger than the available ...
This one-line patch fixes the missing export of copy_page introduced by the
cachefile patches. This patch is not yet upstream, but is required for
cachefile on ia64. It will be pushed upstream when cachefile goes upstream.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-Off-By: David Howells <dhowells@redhat.com>
---
arch/ia64/kernel/ia64_ksyms.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index bd17190..20c3546 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -43,6 +43,7 @@ EXPORT_SYMBOL(__do_clear_user);
EXPORT_SYMBOL(__strlen_user);
EXPORT_SYMBOL(__strncpy_from_user);
EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(copy_page);
/* from arch/ia64/lib */
extern void __divsi3(void);
-
Add an address space operation to write one single page of data to an inode at
a page-aligned location (thus permitting the implementation to be highly
optimised).
This is used by CacheFiles to store the contents of netfs pages into their
backing file pages.
Supply a generic implementation for this that uses the prepare_write() and
commit_write() address_space operations to bound a copy directly into the page
cache.
Hook the Ext2 and Ext3 operations to the generic implementation.
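
The shape of the generic implementation described above can be sketched in userspace as follows. This is only a model under stated assumptions: struct mock_page, struct mock_aops and every mock_* function are hypothetical, and the real generic_file_buffered_write_one_page() works on the page cache with locking and error paths not shown here.

```c
#include <assert.h>
#include <string.h>

#define MOCK_PAGE_SIZE 4096

/* Hypothetical stand-ins for a page-cache page and address_space ops. */
struct mock_page {
	unsigned char data[MOCK_PAGE_SIZE];
	int prepared;   /* set between prepare_write and commit_write */
	int dirty;      /* set once the write has been committed */
};

struct mock_aops {
	int (*prepare_write)(struct mock_page *page, unsigned from, unsigned to);
	int (*commit_write)(struct mock_page *page, unsigned from, unsigned to);
};

int mock_prepare_write(struct mock_page *page, unsigned from, unsigned to)
{
	if (from > to || to > MOCK_PAGE_SIZE)
		return -1;
	page->prepared = 1;
	return 0;
}

int mock_commit_write(struct mock_page *page, unsigned from, unsigned to)
{
	(void)from;
	(void)to;
	assert(page->prepared);
	page->prepared = 0;
	page->dirty = 1;
	return 0;
}

/*
 * Copy one whole source page into the destination page, bracketed by the
 * prepare/commit pair.  Returns 0 on success or the first error seen.
 */
int mock_write_one_page(const struct mock_aops *aops,
			struct mock_page *dst, const unsigned char *src)
{
	int ret;

	ret = aops->prepare_write(dst, 0, MOCK_PAGE_SIZE);
	if (ret < 0)
		return ret;
	memcpy(dst->data, src, MOCK_PAGE_SIZE);
	return aops->commit_write(dst, 0, MOCK_PAGE_SIZE);
}
```

Because the write is always a full, page-aligned page, the copy range is fixed at 0..PAGE_SIZE, which is what leaves room for a filesystem to supply a more optimised operation of its own.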
Signed-Off-By: David Howells <dhowells@redhat.com>
---
fs/ext2/inode.c | 2 +
fs/ext3/inode.c | 3 ++
include/linux/fs.h | 7 ++++
mm/filemap.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 107 insertions(+), 0 deletions(-)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 0079b2c..b3e4b50 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -695,6 +695,7 @@ const struct address_space_operations ext2_aops = {
.direct_IO = ext2_direct_IO,
.writepages = ext2_writepages,
.migratepage = buffer_migrate_page,
+ .write_one_page = generic_file_buffered_write_one_page,
};
const struct address_space_operations ext2_aops_xip = {
@@ -713,6 +714,7 @@ const struct address_space_operations ext2_nobh_aops = {
.direct_IO = ext2_direct_IO,
.writepages = ext2_writepages,
.migratepage = buffer_migrate_page,
+ .write_one_page = generic_file_buffered_write_one_page,
};
/*
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index de4e316..93809eb 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1713,6 +1713,7 @@ static const struct address_space_operations ext3_ordered_aops = {
.releasepage = ext3_releasepage,
.direct_IO = ext3_direct_IO,
.migratepage = buffer_migrate_page,
+ .write_one_page = generic_file_buffered_write_one_page,
};
static const struct address_space_operations ext3_writeback_aops = {
@@ -1727,6 +1728,7 @@ static const struct address_space_operations ext3_writeback_aops = {
...
Add a function to install a monitor on the page lock waitqueue for a particular
page, thus allowing the page being unlocked to be detected.
This is used by CacheFiles to detect read completion on a page in the backing
filesystem so that it can then copy the data to the waiting netfs page.
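
The monitoring idea can be modelled in userspace like this. Everything here is a hypothetical simplification: the mock_* types stand in for the page and its waiters, locking is ignored, and the callback merely records that the backing page's read has finished rather than copying any data.

```c
#include <assert.h>
#include <stddef.h>

struct mock_page;

/* Hypothetical waiter: a callback invoked when the page is unlocked. */
struct mock_waiter {
	void (*func)(struct mock_waiter *w, struct mock_page *page);
	struct mock_waiter *next;
	int fired;
};

struct mock_page {
	int locked;
	struct mock_waiter *waiters;   /* singly linked list of monitors */
};

/* Rough analogue of add_page_wait_queue(): attach a monitor to the page. */
void mock_add_page_wait_queue(struct mock_page *page, struct mock_waiter *w)
{
	w->next = page->waiters;
	page->waiters = w;
}

/* Unlocking the page notifies every monitor, then detaches them. */
void mock_unlock_page(struct mock_page *page)
{
	struct mock_waiter *w = page->waiters;

	assert(page->locked);
	page->locked = 0;
	page->waiters = NULL;
	while (w) {
		struct mock_waiter *next = w->next;

		w->func(w, page);
		w = next;
	}
}

/* Example monitor: note that the backing page's read has completed. */
void mock_read_done(struct mock_waiter *w, struct mock_page *page)
{
	(void)page;
	w->fired = 1;
}
```

A cache backend would attach such a monitor to the backing page before starting the read, and do the copy to the netfs page from (or shortly after) the callback.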
Signed-Off-By: David Howells <dhowells@redhat.com>
---
include/linux/pagemap.h | 5 +++++
mm/filemap.c | 19 +++++++++++++++++++
2 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d1049b6..452fdcf 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -220,6 +220,11 @@ static inline void wait_on_page_fscache_write(struct page *page)
extern void end_page_fscache_write(struct page *page);
/*
+ * Add an arbitrary waiter to a page's wait queue
+ */
+extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
+
+/*
* Fault a userspace page into pagetables. Return non-zero on a fault.
*
* This assumes that two userspace pages are always sufficient. That's
diff --git a/mm/filemap.c b/mm/filemap.c
index 5e419a2..c60c24e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -518,6 +518,25 @@ void fastcall wait_on_page_bit(struct page *page, int bit_nr)
EXPORT_SYMBOL(wait_on_page_bit);
/**
+ * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
+ * @page - Page defining the wait queue of interest
+ * @waiter - Waiter to add to the queue
+ *
+ * Add an arbitrary @waiter to the wait queue for the nominated @page.
+ */
+void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
+{
+ wait_queue_head_t *q = page_waitqueue(page);
+ unsigned long flags;
+
+ spin_lock_irqsave(&q->lock, flags);
+ __add_wait_queue(q, waiter);
+ spin_unlock_irqrestore(&q->lock, flags);
+}
+
+EXPORT_SYMBOL_GPL(add_page_wait_queue);
+
+/**
* unlock_page - unlock a locked page
* @page: the page
*
-
Export a number of functions for CacheFiles's use.
Signed-Off-By: David Howells <dhowells@redhat.com>
---
fs/super.c | 2 ++
kernel/auditsc.c | 2 ++
2 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index fc8ebed..c0d99dd 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -270,6 +270,8 @@ int fsync_super(struct super_block *sb)
return sync_blockdev(sb->s_bdev);
}
+EXPORT_SYMBOL_GPL(fsync_super);
+
/**
* generic_shutdown_super - common helper for ->kill_sb()
* @sb: superblock to kill
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 3401293..0112179 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1526,6 +1526,8 @@ add_names:
}
}
+EXPORT_SYMBOL_GPL(__audit_inode_child);
+
/**
* auditsc_get_stamp - get local copies of audit_context values
* @ctx: audit_context for the task
-
Make it possible for a process's file creation SID to be temporarily overridden
by CacheFiles so that files created in the cache have the right label attached.
Without this facility, files created in the cache will be given the current
file creation SID of whatever process happens to have invoked CacheFiles
indirectly by means of opening a netfs file at the time the cache file is
created.
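
The intended usage pattern is save, override, create, restore. A minimal userspace sketch follows, assuming the hook shape shown in this patch (error code returned, displaced SID passed back through a pointer). The global mock_fscreate_sid and all mock_* functions are hypothetical; no real LSM state is touched.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical per-task state: only the file-creation SID is modelled. */
static u32 mock_fscreate_sid;

/*
 * Same shape as the set_fscreate_secid hook in the patch: returns an error
 * code and passes the displaced SID back through the pointer.
 */
int mock_set_fscreate_secid(u32 secid, u32 *oldsecid)
{
	assert(oldsecid);
	*oldsecid = mock_fscreate_sid;
	mock_fscreate_sid = secid;
	return 0;
}

/*
 * Hypothetical stand-in for creating a cache file: the new file simply
 * inherits whatever creation SID is current at the time.
 */
u32 mock_create_file(void)
{
	return mock_fscreate_sid;
}

/* Create a file labelled with cache_sid, then restore the caller's SID. */
int mock_create_cache_file(u32 cache_sid, u32 *file_sid)
{
	u32 saved, ignored;
	int ret;

	ret = mock_set_fscreate_secid(cache_sid, &saved);
	if (ret < 0)
		return ret;
	*file_sid = mock_create_file();
	return mock_set_fscreate_secid(saved, &ignored);
}
```

The point of returning the old SID from the setter is that the caller can restore it without a separate get call between the two steps.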
Signed-Off-By: David Howells <dhowells@redhat.com>
---
include/linux/security.h | 39 +++++++++++++++++++++++++++++++++++++++
security/dummy.c | 14 ++++++++++++++
security/selinux/hooks.c | 20 ++++++++++++++++++++
3 files changed, 73 insertions(+), 0 deletions(-)
diff --git a/include/linux/security.h b/include/linux/security.h
index c11dc8a..edd1677 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1147,6 +1147,15 @@ struct request_sock;
* @secdata contains the security context.
* @seclen contains the length of the security context.
*
+ * @get_fscreate_secid:
+ * Get the current FS security ID.
+ * @secid points the location in which to return the security ID.
+ *
+ * @set_fscreate_secid:
+ * Set the current FS security ID.
+ * @secid contains the security ID to set.
+ * @oldsecid points the location in which to return the old security ID.
+ *
* This is the main security structure.
*/
struct security_operations {
@@ -1330,6 +1339,8 @@ struct security_operations {
int (*setprocattr)(struct task_struct *p, char *name, void *value, size_t size);
int (*secid_to_secctx)(u32 secid, char **secdata, u32 *seclen);
void (*release_secctx)(char *secdata, u32 seclen);
+ int (*get_fscreate_secid)(u32 *secid);
+ int (*set_fscreate_secid)(u32 secid, u32 *oldsecid);
#ifdef CONFIG_SECURITY_NETWORK
int (*unix_stream_connect) (struct socket * sock,
@@ -2127,6 +2138,16 @@ static inline void security_release_secctx(char *secdata, u32 seclen)
return security_ops->release_secctx(secdata, seclen);
}
+static inline int ...
I still object to the use of sids in LSM interfaces. I still owe you a viable
alternative.
Casey Schaufler
casey@schaufler-ca.com
-
Add an act-as SID to task_security_struct that is equivalent to fsuid/fsgid in
task_struct. This permits a task to perform operations as if it is the
overriding SID, without changing its own SID as that might be needed to control
access to the process by ptrace, signals, /proc, etc.
This is useful for CacheFiles in that it allows CacheFiles to access the cache
files and directories using the cache's security context rather than the
security context of the process on whose behalf it is working, and in the
context of which it is running.
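
A userspace sketch of the act-as idea, under stated assumptions: the struct and the field name actor_sid are hypothetical (the name of the field added to task_security_struct is not visible in this excerpt), and the mock functions take the task state explicitly, whereas the hooks in the patch take only the SID arguments.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/*
 * Hypothetical slice of task_security_struct: the owning SID is what other
 * tasks are checked against; the act-as SID is what this task acts with.
 */
struct mock_task_security {
	u32 sid;
	u32 actor_sid;
};

/* Start acting as secid, handing back the SID previously acted as. */
int mock_act_as_secid(struct mock_task_security *tsec, u32 secid, u32 *oldsecid)
{
	assert(tsec && oldsecid);
	*oldsecid = tsec->actor_sid;
	tsec->actor_sid = secid;
	return 0;
}

/* Go back to acting as the owning SID, handing back the displaced one. */
int mock_act_as_self(struct mock_task_security *tsec, u32 *oldsecid)
{
	assert(tsec && oldsecid);
	*oldsecid = tsec->actor_sid;
	tsec->actor_sid = tsec->sid;
	return 0;
}
```

Note that neither function writes to sid, so checks made against the task by others (ptrace, signals, /proc) would see the same identity throughout.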
Signed-Off-By: David Howells <dhowells@redhat.com>
---
include/linux/security.h | 36 ++++++++
security/dummy.c | 14 +++
security/selinux/exports.c | 2
security/selinux/hooks.c | 162 +++++++++++++++++++++++--------------
security/selinux/include/objsec.h | 1
security/selinux/selinuxfs.c | 2
security/selinux/xfrm.c | 6 +
7 files changed, 156 insertions(+), 67 deletions(-)
diff --git a/include/linux/security.h b/include/linux/security.h
index edd1677..194ef49 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1156,6 +1156,18 @@ struct request_sock;
* @secid contains the security ID to set.
* @oldsecid points the location in which to return the old security ID.
*
+ * @act_as_secid:
+ * Set the security ID as which to act, returning the security ID as which
+ * the process was previously acting.
+ * @secid contains the security ID to act as.
+ * @oldsecid points the location in which to return the displaced security ID.
+ *
+ * @act_as_self:
+ * Reset the security ID as which to act to be the same as the process's
+ * owning security ID, and return the security ID as which the process was
+ * previously acting.
+ * @oldsecid points the location in which to return the displaced security ID.
+ *
* This is the main security structure.
*/
struct security_operations {
@@ -1341,6 +1353,8 @@ struct ...
Permit an inode's security ID to be obtained by the CacheFiles module. This is
then used as the SID with which files and directories will be created in the
cache.
Signed-Off-By: David Howells <dhowells@redhat.com>
---
include/linux/security.h | 19 +++++++++++++++++++
security/dummy.c | 7 +++++++
security/selinux/hooks.c | 9 +++++++++
3 files changed, 35 insertions(+), 0 deletions(-)
diff --git a/include/linux/security.h b/include/linux/security.h
index 194ef49..a54958a 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -414,6 +414,11 @@ struct request_sock;
* the size of the buffer required.
* Returns number of bytes used/required on success.
*
+ * @inode_get_secid:
+ * Retrieve the security ID from an inode.
+ * @inode refers to the inode to get the security ID from.
+ * @secid points the location in which to return the security ID.
+ *
* Security hooks for file operations
*
* @file_permission:
@@ -1256,6 +1261,7 @@ struct security_operations {
int (*inode_getsecurity)(const struct inode *inode, const char *name, void *buffer, size_t size, int err);
int (*inode_setsecurity)(struct inode *inode, const char *name, const void *value, size_t size, int flags);
int (*inode_listsecurity)(struct inode *inode, char *buffer, size_t buffer_size);
+ int (*inode_get_secid)(struct inode *inode, u32 *secid);
int (*file_permission) (struct file * file, int mask);
int (*file_alloc_security) (struct file * file);
@@ -1818,6 +1824,13 @@ static inline int security_inode_listsecurity(struct inode *inode, char *buffer,
return security_ops->inode_listsecurity(inode, buffer, buffer_size);
}
+static inline int security_inode_get_secid(struct inode *inode, u32 *secid)
+{
+ if (unlikely(IS_PRIVATE(inode)))
+ return 0;
+ return security_ops->inode_get_secid(inode, secid);
+}
+
static inline int security_file_permission (struct file *file, int mask)
{
return security_ops->file_permission (file, mask);
@@ ...
Get the SID under which the CacheFiles module should operate so that the
SELinux security system can control the accesses it makes.
Signed-Off-By: David Howells <dhowells@redhat.com>
---
include/linux/security.h | 20 ++++++++++++++++++++
security/dummy.c | 7 +++++++
security/selinux/hooks.c | 7 +++++++
3 files changed, 34 insertions(+), 0 deletions(-)
diff --git a/include/linux/security.h b/include/linux/security.h
index a54958a..593a4d0 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1173,6 +1173,14 @@ struct request_sock;
* previously acting.
* @oldsecid points the location in which to return the displaced security ID.
*
+ * @cachefiles_get_secid:
+ * Determine the security ID for the CacheFiles module to use when
+ * accessing the filesystem containing the cache.
+ * @secid contains the security ID under which cachefiles daemon is
+ * running.
+ * @modsecid contains the pointer to where the security ID for the module
+ * is to be stored.
+ *
* This is the main security structure.
*/
struct security_operations {
@@ -1361,6 +1369,7 @@ struct security_operations {
int (*set_fscreate_secid)(u32 secid, u32 *oldsecid);
int (*act_as_secid)(u32 secid, u32 *oldsecid);
int (*act_as_self)(u32 *oldsecid);
+ int (*cachefiles_get_secid)(u32 secid, u32 *modsecid);
#ifdef CONFIG_SECURITY_NETWORK
int (*unix_stream_connect) (struct socket * sock,
@@ -2185,6 +2194,11 @@ static inline int security_act_as_self(u32 *oldsecid)
return security_ops->act_as_self(oldsecid);
}
+static inline int security_cachefiles_get_secid(u32 secid, u32 *modsecid)
+{
+ return security_ops->cachefiles_get_secid(secid, modsecid);
+}
+
/* prototypes */
extern int security_init (void);
extern int register_security (struct security_operations *ops);
@@ -2897,6 +2911,12 @@ static inline u32 security_act_as_self(u32 *oldsecid)
return 0;
}
+static inline int security_cachefiles_get_secid(u32 secid, u32 ...
Add an FS-Cache cache-backend that permits a mounted filesystem to be used as a
backing store for the cache.
CacheFiles uses a userspace daemon to do some of the cache management - such as
reaping stale nodes and culling. This is called cachefilesd and lives in /sbin.
The source for the daemon can be downloaded from:
http://people.redhat.com/~dhowells/cachefs/cachefilesd.c
And an example configuration from:
http://people.redhat.com/~dhowells/cachefs/cachefilesd.conf
The filesystem and data integrity of the cache are only as good as those of the
filesystem providing the backing services. Note that CacheFiles does not
attempt to journal anything since the journalling interfaces of the various
filesystems are very specific in nature.
CacheFiles creates a proc-file - "/proc/fs/cachefiles" - that is used for
communication with the daemon. Only one thing may have this open at once, and
whilst it is open, a cache is at least partially in existence. The daemon
opens this and sends commands down it to control the cache.
CacheFiles is currently limited to a single cache.
CacheFiles attempts to maintain at least a certain percentage of free space on
the filesystem, shrinking the cache by culling the objects it contains to make
space if necessary - see the "Cache Culling" section. This means it can be
placed on the same medium as a live set of data, and will expand to make use of
spare space and automatically contract when the set of data requires more
space.
============
REQUIREMENTS
============
The use of CacheFiles and its daemon requires the following features to be
available in the system and in the cache filesystem:
- dnotify.
- extended attributes (xattrs).
- openat() and friends.
- bmap() support on files in the filesystem (FIBMAP ioctl).
- The use of bmap() to detect a partial page at the end of the file.
It is strongly recommended that the "dir_index" option is enabled on Ext3 filesystems being used as a ...
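
The free-space behaviour described in the CacheFiles patch description above can be illustrated with a small sketch using POSIX statvfs(). This is not how the module or cachefilesd implement it; free_space_percent() and should_cull() are hypothetical helpers, and the single-threshold policy is an assumption made for illustration.

```c
#include <assert.h>
#include <sys/statvfs.h>

/* Percentage of blocks available to unprivileged users, or -1 on error. */
int free_space_percent(const char *path)
{
	struct statvfs st;

	if (statvfs(path, &st) != 0 || st.f_blocks == 0)
		return -1;
	return (int)((unsigned long long)st.f_bavail * 100 / st.f_blocks);
}

/* Hypothetical policy: cull when free space falls below min_percent. */
int should_cull(int free_percent, int min_percent)
{
	assert(min_percent >= 0 && min_percent <= 100);
	return free_percent >= 0 && free_percent < min_percent;
}
```

For example, should_cull(free_space_percent("/var/cache"), 5) would ask whether the filesystem holding the cache has dropped below 5% free; the path and the 5% figure are placeholders only.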
The attached patch makes it possible for the NFS filesystem to make use of the
network filesystem local caching service (FS-Cache).
To be able to use this, an updated mount program is required. This can be
obtained from:
http://people.redhat.com/steved/fscache/util-linux/
To mount an NFS filesystem to use caching, add an "fsc" option to the mount:
mount warthog:/ /a -o fsc
Signed-Off-By: David Howells <dhowells@redhat.com>
---
fs/nfs/Makefile | 1
fs/nfs/client.c | 5 +
fs/nfs/file.c | 51 ++++++
fs/nfs/fscache-def.c | 288 +++++++++++++++++++++++++++++++++++
fs/nfs/fscache.c | 374 +++++++++++++++++++++++++++++++++++++++++++++
fs/nfs/fscache.h | 144 +++++++++++++++++
fs/nfs/inode.c | 48 +++++-
fs/nfs/read.c | 28 +++
fs/nfs/sysctl.c | 44 +++++
include/linux/nfs_fs.h | 8 +
include/linux/nfs_fs_sb.h | 7 +
11 files changed, 988 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index b55cb23..07c9345 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,4 +16,5 @@ nfs-$(CONFIG_NFS_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
nfs4namespace.o
nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
nfs-$(CONFIG_SYSCTL) += sysctl.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-def.o
nfs-objs := $(nfs-y)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a49f9fe..f1783b2 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -41,6 +41,7 @@
#include "delegation.h"
#include "iostat.h"
#include "internal.h"
+#include "fscache.h"
#define NFSDBG_FACILITY NFSDBG_CLIENT
@@ -137,6 +138,8 @@ static struct nfs_client *nfs_alloc_client(const char *hostname,
clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
#endif
+ nfs_fscache_get_client_cookie(clp);
+
return clp;
error_3:
@@ -168,6 +171,8 @@ static void nfs_free_client(struct nfs_client *clp)
nfs4_shutdown_client(clp);
...
Changes to the kernel configuration definitions and to the NFS mount options to
allow the local caching support added by the previous patch to be enabled.
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/Kconfig | 8 ++++++++
fs/nfs/client.c | 14 ++++++++++----
fs/nfs/internal.h | 2 ++
fs/nfs/super.c | 40 ++++++++++++++++++++++++++++++++++------
4 files changed, 54 insertions(+), 10 deletions(-)
diff --git a/fs/Kconfig b/fs/Kconfig
index 7feb4cb..76d5d16 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1600,6 +1600,14 @@ config NFS_V4
If unsure, say N.
+config NFS_FSCACHE
+ bool "Provide NFS client caching support (EXPERIMENTAL)"
+ depends on EXPERIMENTAL
+ depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+ help
+ Say Y here if you want NFS data to be cached locally on disc through
+ the general filesystem cache manager
+
config NFS_DIRECTIO
bool "Allow direct I/O on NFS files"
depends on NFS_FS
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index f1783b2..0de4db4 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -543,7 +543,8 @@ error:
/*
* Create a version 2 or 3 client
*/
-static int nfs_init_server(struct nfs_server *server, const struct nfs_mount_data *data)
+static int nfs_init_server(struct nfs_server *server, const struct nfs_mount_data *data,
+ unsigned int extra_options)
{
struct nfs_client *clp;
int error, nfsvers = 2;
@@ -580,6 +581,7 @@ static int nfs_init_server(struct nfs_server *server, const struct nfs_mount_dat
server->acregmax = data->acregmax * HZ;
server->acdirmin = data->acdirmin * HZ;
server->acdirmax = data->acdirmax * HZ;
+ server->options = extra_options;
/* Start lockd here, before we might error out */
error = nfs_start_lockd(server);
@@ -776,6 +778,7 @@ void nfs_free_server(struct nfs_server *server)
* - keyed on server and FSID
*/
struct nfs_server *nfs_create_server(const struct nfs_mount_data *data,
+ unsigned ...
Display the local caching state in /proc/fs/nfsfs/volumes.
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/nfs/client.c | 7 ++++---
fs/nfs/fscache.h | 12 ++++++++++++
2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 0de4db4..d350668 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1319,7 +1319,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
/* display header on line 1 */
if (v == &nfs_volume_list) {
- seq_puts(m, "NV SERVER PORT DEV FSID\n");
+ seq_puts(m, "NV SERVER PORT DEV FSID FSC\n");
return 0;
}
/* display one transport per line on subsequent lines */
@@ -1333,12 +1333,13 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
(unsigned long long) server->fsid.major,
(unsigned long long) server->fsid.minor);
- seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n",
+ seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n",
clp->cl_nfsversion,
NIPQUAD(clp->cl_addr.sin_addr),
ntohs(clp->cl_addr.sin_port),
dev,
- fsid);
+ fsid,
+ nfs_server_fscache_state(server));
return 0;
}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 44bb0d1..77f3450 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -56,6 +56,17 @@ extern void __nfs_fscache_invalidate_page(struct page *, struct inode *);
extern int nfs_fscache_release_page(struct page *, gfp_t);
/*
+ * indicate the client caching state as readable text
+ */
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+ if (server->nfs_client->fscache &&
+ (server->options & NFS_OPTION_FSCACHE))
+ return "yes";
+ return "no ";
+}
+
+/*
* release the caching state associated with a page if undergoing complete page
* invalidation
*/
@@ -110,6 +121,7 @@ static inline void nfs_fscache_unregister(void) {}
static inline void nfs_fscache_get_client_cookie(struct ...
How would you expect an LSM that is not SELinux to interface with CacheFiles?
You have gone to a great deal of effort to support the requirements of an
SELinux system, and that's good, but you have extended the LSM interface to
expose SELinux data structures (secids) and require them for the operation of
CacheFiles, and that's bad. The data used within an LSM is private to the LSM,
and this applies to SELinux as well as to any other LSM that may come along,
such as the Smack LSM I'm working on. This applies to task data as well as
file data. Further, the behavior of the system in the presence of an LSM
should be controlled by the LSM; it is more than a little scary that CacheFiles
is enforcing SELinux policy based on secids that may be coming from a different
LSM.
I applaud the integration of CacheFiles with SELinux. Unfortunately, you've
done so using the LSM interface in such a way that an LSM other than SELinux is
likely to demonstrate inappropriate behaviors in the presence of CacheFiles
because you have so carefully integrated the SELinux requirements.
If the integration with SELinux is important to you, and I would expect that it
is given the work you've put into it, I suggest that the SELinux-specific
behaviors be identified so that another LSM can provide the behavior
appropriate to the policy it chooses to enforce and put that into SELinux with
an LSM interface. I know that you're looking at a significant effort to do
that, but I wouldn't think that you'd want CacheFiles to behave badly in the
presence of an LSM that doesn't happen to be SELinux.
I also know it's tempting to point out that SELinux is the only upstream LSM.
I hope to change that before too long, and I know there are others with
ambitions as well. I would not like to see CacheFiles have to get excluded in
the presence of other LSMs and I doubt you would either.
Casey Schaufler
casey@schaufler-ca.com
-
You have to understand that I didn't know that much about the LSM interface,
so I asked advice of the Red Hat security people, who, naturally, pointed me
at the SELinux mailing list. I knew my stuff would have to work with SELinux
to be used with RH stuff.
Furthermore, as you pointed out, there aren't any other LSM modules upstream
yet for me to work against. I would like CacheFiles to work with all LSM
modules in general, but I don't know how to do that yet.
I'm open to suggestion as to how to modify things to support any LSM.
Btw, do you understand the problems that CacheFiles has to deal with? If I
set this down clearly, this may help you or someone else suggest a better way
to do things.
(1) Some random process tries to access a file on a network filesystem
(NFS example).
(2) NFS goes to the cache to attempt to read the data from there prior to
going to the network.
(3) The cache driver wants to access the files in the cache, but it's
running in the security context of either the aforementioned random
process, or one of the threads in FS-Cache's thread pool.
This security context, however, doesn't necessarily give it the rights
to access what's in the cache, so the driver has to be permitted to act
as a context appropriate to accessing the cache, without changing the
overall security context of the random process (which would impact
things trying to act on that process - kill() for example).
(4) Assuming the data is found in the cache, all well and good, but if it
isn't, the cache driver will have to create some files in the cache.
Now, if the cache driver just went ahead and created the files, they
could end up with their own security contexts being derived from the
random process's security context, thus potentially making it impossible
for other processes to access the cache.
So the file-creation part of the security context must also be
overridden ...

While neither is upstream you can certainly look at AppArmor and Smack,

It's been a long time since I dealt with file system cacheing, and that
was under Unix, and I don't claim to have a working understanding

I think that this is the point you should attack. Control the security
characteristics of the cache driver properly and you shouldn't need the

Can you run the cache as an independent thread and send it messages
rather than trying to do things in the context of the calling process?

Yes, and the SELinux semantics for what label to give a file don't help
much, either.

The problem with the "act_as" interfaces is that I wouldn't expect them
to be any more reliable than the old access()

Ideally you want to be running in the right context to create the new
file so that no one can use it and then label it "correctly"

The cache driver is a unique case with an unusual function. It's pretty
obvious that the kernel architecture, the VFS architecture, LSM, SELinux,
NFS and pretty much everyone else has given no thought whatever to the
implications of their designs on file system cacheing. For all concerned,
I'll say "sorry 'bout that".
Casey Schaufler
casey@schaufler-ca.com
-
How? The cache driver acts on behalf of someone else. That someone else
has one security context, but the cache itself has to have a different
context so that the cache can be shared.

It introduces more complexity, which I believe you were just arguing
against above... It also incurs more kernel threads - which I really
really want to avoid. I would rank the complexity and resource overhead of
the act-as stuff in LSM (or at least in SELinux) as much less than what
you're suggesting.

As it stands, the FS-Cache layer has a pool of threads that CacheFiles
makes use of, but this can't be bound to the security of a specific cache
because

I suspect that's more by the fact that security wasn't particularly
thought about when these interfaces were first written. As with
everything in the

Meaning you think I should just give up on this?

How about I reduce the interface I'm proposing to two functions:
(1) int security_act_as(struct task_struct *context)
Temporarily make the current process act as the given task, including,
for example, for SELinux, the security ID with which this task acts on
things, and the security ID with which this task creates files.
(2) int security_act_as_self(void);
Restore the context as which we're acting.

This would mean that the task's security context would have to be able to
store acting security IDs for everything, but I don't think that's too
much of a stretch resourcewise.
David
-
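Editorial sketch: the two-function interface proposed above can be modelled in user space to make its intended semantics concrete. This is only an illustration under stated assumptions, not kernel code and not the patch itself. The `model_*` names, the three-field security record and the rule that a NULL context restores the task's own state are all hypothetical; the last of these is one possible answer to what a second act-as call should do, not something the proposal specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one task's security state. */
struct model_sec {
	unsigned int victim_id;	/* checked when others act on this task */
	unsigned int acting_id;	/* used when this task acts on objects */
	unsigned int create_id;	/* given to files this task creates */
};

struct model_task {
	struct model_sec own;			/* the task's real state */
	const struct model_sec *borrowed;	/* act-as override, or NULL */
};

/*
 * Model of security_act_as(): borrow the acting state of another task.
 * Assumption: a NULL context means "go back to acting as ourselves", and
 * a second call replaces the first rather than stacking on top of it.
 */
int model_act_as(struct model_task *cur, const struct model_task *context)
{
	if (cur == NULL)
		return -1;
	cur->borrowed = context ? &context->own : NULL;
	return 0;
}

/* Model of security_act_as_self(): drop any override. */
int model_act_as_self(struct model_task *cur)
{
	return model_act_as(cur, NULL);
}

/* Label used when the task acts on something. */
unsigned int model_acting_id(const struct model_task *t)
{
	assert(t != NULL);
	return t->borrowed ? t->borrowed->acting_id : t->own.acting_id;
}

/* Label applied to files the task creates. */
unsigned int model_create_id(const struct model_task *t)
{
	assert(t != NULL);
	return t->borrowed ? t->borrowed->create_id : t->own.create_id;
}

/* Label others see when they act on the task; never overridden. */
unsigned int model_victim_id(const struct model_task *t)
{
	assert(t != NULL);
	return t->own.victim_id;
}
```

In this model the override changes only what the task acts as and what it creates files as. The label that governs kill(), ptrace() and similar operations against the task is always read from its own state.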
No, sorry, sometimes I sound meaner than I really am. I meant that I
haven't looked into the issues at all and I bet there are plenty, maybe in
audit and places outside of the security realm, but this looks like a
clean approach from the LSM interface standpoint.

Do you want the entire task or just task->security? I could see it either
way, but I suspect the task is your best bet.

If you call security_act_as() twice, then security_act_as_self() do you
pop a stack, or return to the initial state?

How about security_act_as(NULL) returning you to the initial state, and
dropping security_act_as_self()?

Thank you for taking the effort to address the issues I raised. I
appreciate your willingness to accommodate my concerns even after I'd
flamed you.
Casey Schaufler
casey@schaufler-ca.com
-
It would probably have to be the task struct, lest the security information

Good point. I've pondered that. What I have at the moment partly acts
like a stack in that I store some of the shifted-out context on the
machine stack (in struct cachefiles_secctx). The act-as context should
probably be shifted too,

That would be fine.

Actually, to address Stephen Smalley's requirements also, how about
making things a bit more complex. Have the following suite of functions:
(1) int security_get_context(struct sec **_context);
This allocates and gives the caller a blob that describes the current
context of all the LSM module states attached to the current task and
stores a pointer to it in *_context.
(2) int security_push(struct sec *context, struct sec **_old_context)
This causes all the LSM modules on the current task to switch to a new
acting state, passing back the old state. It does not change how other
tasks do things to this one.
(3) int security_pop(struct sec *context)
This causes all the LSM modules on the current task to switch to a new
acting state, deleting the old state. It does not change how other
tasks do things to this one.
(4) int security_delete_context(struct sec *context)
This deletes a context blob.

The context blob could then be structured very simply. Give each loaded
LSM module an integer index as it is registered. Having a limit to the
number of LSM modules would make things simpler. The blob would then be
an array of void pointers, one per LSM module, indexed by the integer
index for each one. If you don't have a limit on the number of LSM
modules, you'd also need a count of slots in the blob.

Any LSM module that wanted to implement the above three functions would
fill in or otherwise use the slot that belongs to it. Otherwise the slot
would just be left NULL. For example:
	context --->+--------+                                    +---------+
	            | SLOT 0 |----------------------------------->| ...
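Editorial sketch: the slot-per-module blob layout described above can be illustrated in user-space C. This is a sketch under assumptions, not the proposed kernel implementation. It takes the "fixed limit" variant, with a hypothetical cap of four modules, and every `model_*` name is invented for the example. Only `struct sec` as the blob type comes from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_LSM 4	/* assumed fixed limit on registered modules */

/* The context blob: one opaque pointer per registered module. */
struct sec {
	void *slot[MODEL_MAX_LSM];
};

static int model_nr_lsm;	/* how many modules have registered so far */

/* Hand out the next free index, or -1 once the limit is reached. */
int model_register_lsm(void)
{
	if (model_nr_lsm >= MODEL_MAX_LSM)
		return -1;
	return model_nr_lsm++;
}

/* Allocate a blob with every slot NULL (module has nothing to say). */
struct sec *model_alloc_context(void)
{
	return calloc(1, sizeof(struct sec));
}

/* A module stores its private state in the slot that belongs to it. */
int model_set_slot(struct sec *ctx, int index, void *state)
{
	if (ctx == NULL || index < 0 || index >= model_nr_lsm)
		return -1;
	ctx->slot[index] = state;
	return 0;
}

/* A module reads back only its own slot; unused slots stay NULL. */
void *model_get_slot(const struct sec *ctx, int index)
{
	if (ctx == NULL || index < 0 || index >= model_nr_lsm)
		return NULL;
	return ctx->slot[index];
}

/* Counterpart of security_delete_context() in this model. */
void model_delete_context(struct sec *ctx)
{
	free(ctx);
}
```

The caller never interprets a slot's contents. It only carries the blob around, which keeps module-private data opaque outside the module that owns the slot.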
Seems like over-design - we don't need to support LSM stacking, and we
don't need to support pushing/popping more than one level of context.

What was the objection again to the original interface, aside from
replacing "u32 secids" with "void* security blobs"?
--
Stephen Smalley
National Security Agency
-
It will, at some point hopefully, be possible for someone to try, say,
NFS exporting a cached ISO9660 mount (CDROM) - in which case, we should
allow for two levels of stack. If we can pass the displaced context to
the caller to restore later then that allows for more or less unlimited
depth.

It occurs to me that the following is almost good enough, but not quite:
(1) int security_get_context(void **_context);
This allocates and gives the caller a blob that describes the current
context of all the LSM module states attached to the current task and
stores a pointer to it in *_context.
(2) int security_push(void *context, struct sec **_old_context)
This causes all the LSM modules on the current task to switch to a new
acting state, passing back the old state. It does not change how other
tasks do things to this one.
(3) int security_pop(void *context)
This causes all the LSM modules on the current task to switch to a new
acting state, deleting the old state. It does not change how other
tasks do things to this one.
(4) int security_delete_context(void *context)

I still need a way to transform the cachefilesd context into the kernel's
context. See patch:
Subject: [Linux-cachefs] [PATCH 12/16] CacheFiles: Get the SID under
which the CacheFiles module should operate [try #3]
However, this seems to add a fairly generic transformation, so that could
be generalised:

I got the impression that Casey thought much of this was tied to SELinux,
but rereading his/her emails, I'm not so certain. Maybe that's
sufficient. Casey?

However, I've realised a problem (as outlined above) with what I've got.
Namely its stack isn't necessarily deep enough.

Alternatively, nfsd perhaps should suppress caching on what it reads.
David
-
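Editorial sketch: the point about depth can be shown with a small user-space model. If the push operation hands the displaced context back to the caller, each caller keeps its own saved pointer and nesting needs no stack inside the security layer. The `model_*` names and the single `acting` pointer are assumptions made for the example. Unlike the security_pop() described above, this model does not delete the state being switched away from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task with one opaque acting context. */
struct model_task {
	void *acting;
};

/*
 * Model of security_push(): switch to a new acting context and pass the
 * displaced one back so that the caller can restore it later.
 */
int model_push(struct model_task *t, void *context, void **old_context)
{
	if (t == NULL || context == NULL || old_context == NULL)
		return -1;
	*old_context = t->acting;
	t->acting = context;
	return 0;
}

/*
 * Model of security_pop(): switch back to the context the caller saved.
 * Freeing of the outgoing context is left out of this sketch.
 */
int model_pop(struct model_task *t, void *context)
{
	if (t == NULL)
		return -1;
	t->acting = context;
	return 0;
}
```

With nfsd exporting a cached filesystem, for instance, nfsd would push once and the cache driver would push again underneath it. Each layer then pops with the pointer it was given, in reverse order.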
I assume that you're talking about the LSM specific data changing, not
the LSM itself.

If you change the task->security information you are definitely going to
change what other tasks can do to the calling task. This is part of the
dark side of label swapping. This is what I was trying to suggest when I
said that if you're going to switch labels you switch to a system-daemon
label, do your work, then change the file label explicitly. Stephen may
have a trick up his sleeve for SELinux, but I don't

I did get the impression that your initial design was focused on SELinux,
and that the implications of alternative LSM modules had not been very
high on your priority list. It's clear from

That's the really nice thing about cans of worms. They come in
six-packs.
Casey Schaufler
casey@schaufler-ca.com
-
I dealt with that in my current act-as patch. Under SELinux a task has
two primary labels. One with which it is labelled and is used to govern
effects upon it, and one that is used to act upon things and follows
changes to the

In CacheFiles case, the cachefilesd daemon's security label into the
label the

Yeah...
David
-
The specification of your push interface that the push operation not
affect how others access the process is OK for SELinux, but not for any
other MAC scheme that I've dealt with, and I think that's most of them.
Nuts. Smack, for example, uses exactly one label on the process for all
purposes.

Are you concerned about accesses other than signals? Signals could be
straightforward to deal with in a pushed situation, but I'd hesitate to
say that the solution would generalize without

I'm not sure I understand what this is doing.
Casey Schaufler
casey@schaufler-ca.com
-
It's a fairly important concept. The victimisation security context on a
process must not change, even if the kernel overrides the security
context that that process acts as so that it can transparently do work on
its behalf.

IMO, the right way to do this is to pass the security context directly to

There's also /proc and ptrace() for example. ps -z must not show the

CacheFiles consists of two parts: the kernel module which creates things
in the cache and does accesses into the cache on behalf of processes that
access cached filesystems, and the userspace daemon that builds cull
tables and deletes things.

The reason there are two security labels is that the daemon's label gives
it just enough rights to be able to do its job. More or less all it can
do is lookup, opendir, readdir, stat, rmdir, unlink and open the chardev
for talking to the kernel module. This means that the daemon can't, for
example, be made to read or modify cache storage objects.

This means, however, that the daemon's label isn't sufficient for the
kernel module to do its job. But since there's no way for the kernel
module to directly get a label (and indeed it doesn't know the label it
needs), a transformation has to be applied that turns the process label
used by the daemon into a process label that the kernel, and only the
kernel, can use. The kernel's label gives it, amongst other things, the
additional rights to do mkdir, creat, open, read, write, setxattr,
getxattr, rename - things the daemon isn't allowed to do.
David
-
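Editorial sketch: the split between the daemon's label and the kernel module's label can be pictured as two permission sets plus a one-way mapping from the first to the second. The model below is hypothetical throughout. The label names, the bitmask encoding and the assumption that the kernel's set also includes the directory and removal operations are illustrative choices, not taken from any real policy or from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Operations named in the message above, one bit each. */
enum model_op {
	MODEL_OP_LOOKUP   = 1 << 0,
	MODEL_OP_OPENDIR  = 1 << 1,
	MODEL_OP_READDIR  = 1 << 2,
	MODEL_OP_STAT     = 1 << 3,
	MODEL_OP_RMDIR    = 1 << 4,
	MODEL_OP_UNLINK   = 1 << 5,
	MODEL_OP_CHARDEV  = 1 << 6,	/* open the chardev to the module */
	MODEL_OP_MKDIR    = 1 << 7,
	MODEL_OP_CREAT    = 1 << 8,
	MODEL_OP_OPEN     = 1 << 9,
	MODEL_OP_READ     = 1 << 10,
	MODEL_OP_WRITE    = 1 << 11,
	MODEL_OP_SETXATTR = 1 << 12,
	MODEL_OP_GETXATTR = 1 << 13,
	MODEL_OP_RENAME   = 1 << 14
};

struct model_label {
	const char *name;	/* illustrative name only */
	unsigned int allowed;	/* bitmask of enum model_op values */
};

/* Just enough for culling: walk the cache and delete things. */
const struct model_label model_daemon_label = {
	"daemon",
	MODEL_OP_LOOKUP | MODEL_OP_OPENDIR | MODEL_OP_READDIR |
	MODEL_OP_STAT | MODEL_OP_RMDIR | MODEL_OP_UNLINK | MODEL_OP_CHARDEV
};

/* Assumed to cover lookup/removal plus the extra rights listed above. */
const struct model_label model_kernel_label = {
	"kernel",
	MODEL_OP_LOOKUP | MODEL_OP_STAT | MODEL_OP_RMDIR | MODEL_OP_UNLINK |
	MODEL_OP_MKDIR | MODEL_OP_CREAT | MODEL_OP_OPEN | MODEL_OP_READ |
	MODEL_OP_WRITE | MODEL_OP_SETXATTR | MODEL_OP_GETXATTR |
	MODEL_OP_RENAME
};

/* Does this label permit the operation? */
int model_may(const struct model_label *label, enum model_op op)
{
	return label != NULL && (label->allowed & (unsigned int)op) != 0;
}

/*
 * The transformation: only the daemon's label maps to the kernel's label.
 * Anything else, including the kernel label itself, maps to nothing.
 */
const struct model_label *model_transform(const struct model_label *from)
{
	return from == &model_daemon_label ? &model_kernel_label : NULL;
}
```

The mapping is deliberately one-way, so the daemon never holds the kernel's rights itself. It only supplies the starting point from which the kernel derives its own label.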
With Smack you can leave the label alone, raise CAP_MAC_OVERRIDE, do your
business of setting the label correctly, and then drop the capability. No
new hooks required.
Casey Schaufler
casey@schaufler-ca.com
-
That sounds like a contradiction. How can you both leave it alone and set
it?
David
-
Whoops, sorry. You leave the process label alone and explicitly set the
file label using the xattr interfaces.
Casey Schaufler
casey@schaufler-ca.com
-
xattr interfaces don't help with the initial labeling of the file when it
is created.
--
Stephen Smalley
National Security Agency
-
That's true. The daemon needs to run with an appropriate label. I don't
believe that this is a situation with a really simple solution
Casey Schaufler
casey@schaufler-ca.com
-
That's the wrong way to do things. There'd then be a window in which
cachefilesd (the userspace daemon) could attempt to view the file when the
file has the wrong label attached.
David
-
Except that CAP_MAC_OVERRIDE doesn't exist upstream, and if it did, it
would represent Smack-specific logic in the core kernel (when you're
complaining about SELinux-specific logic there). So even that would have
to be encapsulated within a hook.
--
Stephen Smalley
National Security Agency
-
LSM stacking has always been contentious and I don't see that it
addresses the issue, which is changing the data used

The objection centers around exposing LSM specific data outside the LSM,
and it applies to either secids or blobs, really. If you need this
information outside the LSM odds are good that what you're using it for
is going to be LSM specific, and hence should be inside the LSM. I admit
to two gray areas, audit and system service tasks such as the two cited
here. I like simplicity and find the single security_act_as() interface
attractive for the latter case.
Casey Schaufler
casey@schaufler-ca.com
-
I don't see how that helps with nfsd assuming the label of a remote
--
Stephen Smalley
National Security Agency
-
Well, assuming that nfsd assuming the label of a remote client is
a good idea ...
	newtask = taskstructdup(current);
	newtask->security = security_of_client;
	security_act_as(newtask);
	... do interesting things ...
	security_act_as_self(); /* security_act_as(NULL); ? */
	cleanup_newtask(...);
... would be the basic flow. For what it's worth, and the whole
issue is being debated with gusto elsewhere, there are enough
problems with nfsd using this approach that it may not be worth
Casey Schaufler
casey@schaufler-ca.com
-
Parts of it are unique, but some of the same issues crop up in nfs - we
will need a way there as well for nfsd to assume the client process' label
for permission checking and new file labeling purposes, and the act_as
hook is not fundamentally different than what nfsd does today with the
fsuid/fsgid, just applied to the security label.
--
Stephen Smalley
National Security Agency
-
