Re: Pull request for FS-Cache, including NFS patches

From: David Howells
Date: Wednesday, December 17, 2008 - 5:30 pm

Hi Stephen,

Can you try pulling the master branch of this tree:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nfs-fscache.git

into linux-next, please?

This tree includes the following:

 (1) The 'next' branch of the security tree, which you already have.

 (2) The 'linux-next' branch of the NFS tree, which you already have.

 (3) My FS-Cache, CacheFiles and AFS patches, and associated enablement
     patches.

 (4) My patches to enable NFS to use FS-Cache.

I've tried merging it into next-20081217: it applied cleanly and the tests
passed on the result.

David
---
The following changes since commit 1bda71282ded6a2e09a2db7c8884542fb46bfd4f:
  Linus Torvalds (1):
        Merge branch 'for-linus' of git://git.kernel.org/.../ieee1394/linux1394-2.6

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nfs-fscache.git master

Al Viro (1):
      Audit: Log TIOCSTI

David Howells (123):
      CRED: Wrap task credential accesses in the IA64 arch
      CRED: Wrap task credential accesses in the MIPS arch
      CRED: Wrap task credential accesses in the PA-RISC arch
      CRED: Wrap task credential accesses in the PowerPC arch
      CRED: Wrap task credential accesses in the S390 arch
      CRED: Wrap task credential accesses in the x86 arch
      CRED: Wrap task credential accesses in the block loopback driver
      CRED: Wrap task credential accesses in the tty driver
      CRED: Wrap task credential accesses in the ISDN drivers
      CRED: Wrap task credential accesses in the network device drivers
      CRED: Wrap task credential accesses in the USB driver
      CRED: Wrap task credential accesses in 9P2000 filesystem
      CRED: Wrap task credential accesses in the AFFS filesystem
      CRED: Wrap task credential accesses in the autofs filesystem
      CRED: Wrap task credential accesses in the autofs4 filesystem
      CRED: Wrap task credential accesses in the BFS filesystem
      CRED: Wrap ...
From: Stephen Rothwell
Date: Thursday, December 18, 2008 - 4:44 am

Hi David,

On Thu, 18 Dec 2008 00:30:21 +0000 David Howells <dhowells@redhat.com> wrote:

Added from today.

Usual spiel: all patches in that branch must have been
	posted to a relevant mailing list
	reviewed
	unit tested
	destined for the next merge window (or the current release)
*before* they are included.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
From: Christoph Hellwig
Date: Thursday, December 18, 2008 - 7:24 am

I don't think we want fscache for .29 yet.  I'd rather let the
credential code settle for one release, and have more time for actually
reviewing it properly and have it 100% ready for .30.

--

From: Andrew Morton
Date: Thursday, December 18, 2008 - 1:36 pm

On Thu, 18 Dec 2008 09:24:20 -0500

I don't believe that it has yet been convincingly demonstrated that we
want to merge it at all.

It's a huuuuuuuuge lump of new code, so it really needs to provide
decent value.  Can we revisit this?  Yet again?  What do we get from
all this?

--

From: Bernd Schubert
Date: Thursday, December 18, 2008 - 4:07 pm

I really don't understand why fs-cache is always rejected. Actually it is the
perfect solution for NFS-booted systems: you have a big cluster of nodes and,
in order to minimize administration overhead, the nodes are booted over NFS
from one common chroot. With unionfs (the preferred solution here is
unionfs-fuse) one then maintains the files that have to differ between clients.

Caching files on the local disk minimizes network access and boosts
performance, so at least for this usage example fs-cache would be great.
(Actually I have been thinking about implementing a caching branch in
unionfs-fuse, but if the kernel can do it on its own, that is fine too.)

In the past David already posted many benchmarks and just a few weeks ago 
again:

http://lkml.indiana.edu/hypermail/linux/kernel/0811.3/00584.html


Cheers,
Bernd


--

From: Andrew Morton
Date: Thursday, December 18, 2008 - 4:26 pm

On Fri, 19 Dec 2008 00:07:33 +0100

It's never been rejected.  For a long time it has been in a state where
we're looking for the data which would allow us to agree that its
benefits are worth its costs.  AFAIK that has never really been
convincingly demonstrated.  Nor has the converse case been


OK, benchmarks are good.   But look:

 303 files changed, 21049 insertions(+), 3726 deletions(-)

it's an enormous hunk of code.  That will be in the kernel for ever and
ever, needing maintenance, adding additional burden to our effort to
evolve the kernel, etc.


Are any distros pushing for this?  Or shipping it?  If so, are they
able to weigh in and help us with this quite difficult decision?

--

From: Stephen Rothwell
Date: Thursday, December 18, 2008 - 5:05 pm

Hi David,

Given the ongoing discussions around FS-Cache, I have removed it from
linux-next.  Please ask me to include it again (if sensible) once some
decision has been reached about its future.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
From: Stephen Rothwell
Date: Sunday, December 28, 2008 - 8:45 pm

Hi David,

On Fri, 19 Dec 2008 11:05:39 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:

What was the result of discussions around FS-Cache?  I ask because it
reappeared in linux-next today via the nfs tree (merged into that on Dec
24 and 25) ...

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
From: Andrew Morton
Date: Sunday, December 28, 2008 - 9:01 pm

There was none.

Dan Muntz's question:

  Solaris has had CacheFS since ~1995, HPUX had a port of it since
  ~1997.  I'd be interested in evidence of even a small fraction of
  Solaris and/or HPUX shops using CacheFS.  I am aware of customers who
  thought it sounded like a good idea, but ended up ditching it for
  various reasons (e.g., CacheFS just adds overhead if you almost
  always hit your local mem cache).

was a very, very good one.

Seems that instead of answering it, we've decided to investigate the

oh.
--

From: Trond Myklebust
Date: Monday, December 29, 2008 - 7:30 am

David has given you plenty of arguments for why it helps scale the
server (including specific workloads), has given you numbers validating
his claim, and has presented claims that Red Hat has customers using
cachefs in RHEL-5.
The arguments I've seen against it have so far been:

     1. Solaris couldn't sell their implementation
     2. It's too big
     3. It's intrusive

Argument (1) has so far appeared to be pure FUD. In order to discuss the
lessons of history, you first need to do the work of analysing and
understanding it. I really don't see how it is relevant to Linux
whether or not the Solaris and HPUX cachefs implementations worked out
unless you can demonstrate that their experience shows some fatal
flaw in the arguments and numbers that David presented, and that his
customers are deluded.
If you want examples of permanent caches that clearly do help servers
scale, then look no further than the on-disk caches used in almost all
http browser implementations. Alternatively, as David mentioned, there
are the on-disk caches used by AFS/DFS/coda.

(2) may be valid, but I have yet to see specifics for where you'd like
to see the cachefs code slimmed down. Did I miss them?

(3) was certainly true 3 years ago, when the code was first presented
for review, and so we did a review and critique then. The NFS specific
changes have improved greatly as a result, and as far as I know, the
security folks are happy too. If you're not happy with the parts that
affect the memory management code then, again, it would be useful to see
specifics of what you want changed.

If there is still controversy concerning this, then I can temporarily
remove cachefs from the nfs linux-next branch, but I'm definitely
keeping it in the linux-mm branch until someone gives me a reason for
why it shouldn't be merged in its current state.

Trond

--

From: Ric Wheeler
Date: Monday, December 29, 2008 - 7:54 am

I can add that our Red Hat customers who tried the cachefs preview did 
find it useful for their workloads (and, by the way, also use the 
Solaris cachefs on solaris boxes if I remember correctly).  They have 
been nagging me and others at Red Hat about getting it into supported 
state for quite a while :-)

As you point out, this is all about getting more clients to be driven by 
a set of NFS servers.

Regards,


--

From: Muntz, Daniel
Date: Monday, December 29, 2008 - 4:05 pm

Before throwing the 'FUD' acronym around, maybe you should re-read the
details.  My point was that there were few users of cachefs even when
the technology had the potential for greater benefit (slower networks,
less powerful servers, smaller memory caches).  Obviously cachefs can
improve performance--it's simply a function of workload and the
assumptions made about server/disk/network bandwidth.  However, I would
expect the real benefits and real beneficiaries to be fewer than in the
past.  HOWEVER^2 I did provide some argument(s) in favor of adding
cachefs, and look forward to extensions to support delayed write,
offline operation, and NFSv4 support with real consistency checking (as
long as I don't have to take the customer calls ;-).  BTW,
animation/video shops were one group that did benefit, and I imagine
they still could today (the one I had in mind did work across Britain,
the US, and Asia and relied on cachefs for overcoming slow network
connections).  Wonder if the same company is a RH customer...

All the comparisons to HTTP browser implementations are, imho, absurd.
It's fine to keep a bunch of http data around on disk because a) it's RO
data, b) correctness is not terribly important, and c) a human is
generally the consumer and can manually request non-cached data if
things look wonky.  It is a trivial case of caching.

As for security, look at what MIT had to do to prevent local disk
caching from breaking the security guarantees of AFS.

Customers (deluded or otherwise) are still customers.  No one is forced
to compile it into their kernel.  Ship it.

  -Dan


From: Trond Myklebust
Date: Tuesday, December 30, 2008 - 11:44 am

I did read your argument. My point is that although the argument sounds
reasonable, it ignores the fact that the customer bases are completely
different. The people asking for cachefs on Linux typically run a
cluster of 2000+ clients all accessing the same read-only data from just
a handful of servers. They're primarily looking to improve the
performance and stability of the _servers_, since those are the single
point of failure of the cluster.

As far as I know, historically there has never been a market for 2000+
HP-UX, or even Solaris based clusters, and unless the HP and Sun product
plans change drastically, then simple economics dictates that nor will
there ever be such a market, whether or not they have cachefs support.

OpenSolaris is a different kettle of fish since it has cachefs, and does
run on COTS hardware, but there are other reasons why that hasn't yet

See above. The majority of people I'm aware of that have been asking for
this are interested mainly in improving read-only workloads for data
that changes infrequently. Correctness tends to be important, but the
requirements are no different from those that apply to the page cache.
You mentioned the animation industry: they are a prime example of an
industry that satisfies (a), (b), and (c). Ditto the oil and gas
exploration industry, as well as pretty much all scientific computing,

See what David has added to the LSM code to provide the same guarantees
for cachefs...

Trond

--

From: Muntz, Daniel
Date: Tuesday, December 30, 2008 - 3:15 pm

Unless it (at least) leverages TPM, the issues I had in mind can't
really be addressed in code.  One requirement is to prevent a local root
user from accessing fs information without appropriate permissions.
This leads to unwieldy requirements such as allowing only one user on a
machine at a time, blowing away the cache on logout, validating (e.g.,
refreshing) the kernel on each boot, etc.  Sure, some applications won't
care, but you're also potentially opening holes that users may not
consider.

  -Dan

From: Trond Myklebust
Date: Tuesday, December 30, 2008 - 3:36 pm

You can't prevent a local root user from accessing cached data: that's
true with or without cachefs. root can typically access the data
using /dev/kmem, swap, intercepting tty traffic, spoofing user creds,...
If root can't be trusted, then find another machine.

The worry is rather that privileged daemons may be tricked into
revealing said data to unprivileged users, or that unprivileged users
may attempt to read data from files to which they have no rights using
the cachefs itself. That is a problem that is addressable by means of
LSM, and is what David has attempted to solve.

  Trond

--

From: Muntz, Daniel
Date: Tuesday, December 30, 2008 - 4:00 pm

Yes, and if you have a single user on the machine at a time (with cache
flushed in between, kernel refreshed), root can read /dev/kmem, swap,
intercept traffic and read cachefs data to its heart's content--hence,
those requirements.

  -Dan

--

From: Trond Myklebust
Date: Tuesday, December 30, 2008 - 4:17 pm

Unless you _are_ root and can check every executable, after presumably
rebooting into your own trusted kernel, then those requirements won't
mean squat. If you're that paranoid, then you will presumably also be
using a cryptfs-encrypted partition for cachefs, which you unmount when
you're not logged in.

That said, most cluster environments will tend to put most of their
security resources into keeping untrusted users out altogether. The
client nodes tend to be a homogeneous lot with presumably only a trusted
few sysadmins...


--

From: David Howells
Date: Wednesday, December 31, 2008 - 4:15 am

Actually...  Cachefiles could fairly trivially add encryption.  It would have
to be simple encryption but you wouldn't have to store any keys locally.

Currently cachefiles _copies_ data between the backingfs and the netfs pages
because the direct-IO code is only usable to/from userspace.  Rather than
copying, encrypt/decrypt could be called.

A key could be constructed at the point a cache file is looked up.  It could
be constructed from the coherency data.  In the case of NFS that would be
mtime, ctime, isize and change_attr.  The coherency data would be encrypted
with this key and then stored on disk, as would the contents of the file.

It might be possible to chuck the cache key (NFS fh) into the encryption key
too and also encrypt the cache key before it is turned into a filename, though
we'd have to be careful to avoid collisions if each filename is encrypted with
a different key.

We'd probably have to be careful about the coherency data decrypting with a
different key showing up as the wrong but valid thing.

The nice thing about this is that the key need not be retained locally since
it's entirely constructed from data fetched from the netfs.
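
As a rough illustration of the key construction described above, the sketch
below hashes the NFS coherency data (mtime, ctime, isize, change_attr),
optionally mixed with the file handle, into a fixed-size key. The function
name, the use of SHA-256 and the field packing are assumptions made purely
for illustration; the thread does not specify a derivation function, and this
is not the CacheFiles implementation. It covers only the key derivation, not
the encryption itself.

```python
import hashlib
import struct


def derive_cache_key(mtime_ns, ctime_ns, isize, change_attr, file_handle=b""):
    """Derive a 32-byte key from coherency data (hypothetical scheme).

    The four attributes are packed as unsigned 64-bit little-endian integers.
    The optional file handle (the cache key) is appended before hashing.
    """
    packed = struct.pack("<QQQQ", mtime_ns, ctime_ns, isize, change_attr)
    return hashlib.sha256(packed + file_handle).digest()


# Example values are made up; a real client would fetch them from the server.
key_v1 = derive_cache_key(1230000000, 1230000000, 4096, 7, b"\x01\x02")
key_v2 = derive_cache_key(1230000000, 1230000000, 4096, 8, b"\x01\x02")
```

Since every input comes from the netfs, nothing has to be stored locally, and
a change in any attribute (here change_attr going from 7 to 8) yields a
different key.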

David
--

From: Muntz, Daniel
Date: Wednesday, December 31, 2008 - 9:11 pm

Sure, trusted kernel and trusted executables,  but it's easier than it
sounds.  If you start with a "clean" system, you don't need to verify
executables _if_ they're coming from the secured file server (by
induction: if you started out secure, the executables on the file server
will remain secure).  You simply can't trust the local disk from one
user to the next.  Following the protocol, a student can log into a
machine, su to do their OS homework, but not compromise the security of
the distributed file system.

If I can su while another user is logged in, or the kernel/cmds are not
validated between users, cryptfs isn't safe either.

If you're following the protocol, it doesn't even matter if a bad guy
("untrusted user"?) gets root on the client--they still can't gain
inappropriate access to the file server.  OTOH, if my security plan is
simply to not allow root access to untrusted users, history says I'm
going to lose.

  -Dan

--

From: Arjan van de Ven
Date: Thursday, January 1, 2009 - 1:09 am

On Wed, 31 Dec 2008 20:11:13 -0800
"Muntz, Daniel" <Dan.Muntz@netapp.com> wrote:


if you have a user, history says you're going to lose.

you can make your system as secure as you want, with physical access
all bets are off.
keyboard sniffer.. easy.
special dimms that mirror data... not even all THAT hard, just takes a
bit of cash.
running the user in a VM without him noticing.. not too hard either.
etc.


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Kyle Moffett
Date: Thursday, January 1, 2009 - 11:40 am

Yeah... this is precisely the reason that the security-test-plan and
system-design-document for any really security sensitive system starts
with:

[  ]  The system is in a locked rack
[  ]  The rack is in a locked server room with detailed access logs
[  ]  The server room is in a locked and secured building with 24-hour
camera surveillance and armed guards

I've spent a little time looking into the security guarantees provided
by DAC and by the FS-Cache LSM hooks, and it is possible to reasonably
guarantee that no *REMOTE* user will be able to compromise the
contents of the cache using a combination of DAC (file permissions,
etc) and MAC (SELinux, etc) controls.  As previously mentioned, local
users (with physical hardware access) are an entirely different story.

As far as performance considerations for the merge... FS-cache on
flash-based storage also has very different performance tradeoffs from
traditional rotating media.  Specifically I have some sample 32GB
SATA-based flash media here with ~230Mbyte/sec sustained read and
~200Mbyte/sec sustained write and with a 75usec read latency.  It
doesn't take much link latency at all to completely dwarf that kind of
access time.
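
To put those numbers in perspective, here is a back-of-the-envelope estimate
of the time needed to read one page from such a device. The 75usec latency
and ~230Mbyte/sec rate are the figures quoted above; treating 1 Mbyte as 10^6
bytes, the 4 KiB page size and the 500usec round trip over a gigabit link are
assumptions chosen only for illustration.

```python
def read_time_us(size_bytes, latency_us, bandwidth_bytes_per_s):
    """Fixed access latency plus transfer time, in microseconds."""
    return latency_us + size_bytes / bandwidth_bytes_per_s * 1e6


PAGE = 4096  # assumed page size in bytes

# Flash figures quoted in the message: 75 usec latency, ~230 Mbyte/sec reads.
flash_us = read_time_us(PAGE, 75.0, 230e6)

# Assumed network path: 500 usec round trip over a gigabit link (125e6 B/s).
net_us = read_time_us(PAGE, 500.0, 125e6)

print(f"flash: {flash_us:.1f} usec, network: {net_us:.1f} usec")
```

For small reads the fixed latency term dominates both results, which is why
even a modest link latency outweighs the flash access time.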

Cheers,
Kyle Moffett
--

From: Arjan van de Ven
Date: Wednesday, December 31, 2008 - 2:49 am

On Tue, 30 Dec 2008 14:15:42 -0800

we're talking about NFS here (but also local CDs and potentially CIFS
etc). The level of security you're talking about is going to be the
same before or after cachefs.... very little against local root.

Frankly, any networking filesystem just trusts that the connection is
authenticated... eg there is SOMEONE on the machine who has the right
credentials. 

Cachefs doesn't change that; it still validates with the server before
giving userspace the data.


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: David Howells
Date: Monday, December 29, 2008 - 8:01 am

I disagree with your assertion that there was no result.  Various people,
besides myself, have weighed in with situations where FS-Cache is or may be
useful.  You've been presented with benchmarks showing that it can make a
difference.

However, *you* are the antagonist, as strictly defined in the dictionary; we
were trying to convince *you*, so a result has to come from *you*.  I feel

And to a large extent irrelevant.  Yes, we know caching adds overhead; I've
never tried to pretend otherwise.  It's an exercise in compromise.  You don't
just go and slap a cache on everything.  There *are* situations in which a
cache will help.  We have customers who know about them and are willing to
live with the overhead.

What I have done is to ensure that, even if caching is compiled in, then the
overhead is minimal if there is _no_ cache present.  That is requirement #1 on
my list.

Assuming I understand what he said correctly, I've avoided the main issue
listed by Dan because I don't do as Solaris does and interpolate the cache
between the user and NFS.  Of course, that probably buys me other issues (FS

Sigh.

The main point is that caching _is_ useful, even with its drawbacks.  Dan may
be aware of customers of Sun/HP who thought caching sounds like a good idea,
but then ended up ditching it.  I can well believe it.  But I am also aware of
customers of Red Hat who are actively using the caching we put in RHEL-5 and
customers who really want caching available in future RHEL and Fedora versions
for various reasons.

To sum up:

 (1) Overhead is minimal if there is no cache.

 (2) Benchmarks show that the cache can be effective.

 (3) People are already using it and finding it useful.

 (4) There are people who want it for various projects.

 (5) The use of a cache does not automatically buy you an improvement in
     performance: it's a matter of compromise.

 (6) The performance improvement may be in the network or the servers, not the
     client that is actually doing the ...
From: Andrew Morton
Date: Sunday, December 28, 2008 - 9:07 pm

And that of course means that many many 2.6.28 patches which I am
maintaining will need significant rework to apply on top of linux-next,
and then they won't apply to mainline.  Or that linux-next will not apply
on top of those patches.  Mainly memory management.

Please drop the NFS tree until after -rc1.

Guys, this: http://lkml.org/lkml/2008/12/27/173
--

From: Stephen Rothwell
Date: Sunday, December 28, 2008 - 10:26 pm

Hi Andrew, Trond,


OK, it is dropped for now (including today's tree).

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
From: David Howells
Date: Monday, December 29, 2008 - 8:04 am

Significant rework to many many patches?  The FS-Cache patches don't have all

Okay, that's a reasonable request.

David
--

From: David Howells
Date: Monday, December 29, 2008 - 7:26 am

That is the result of discussions during the kernel summit in Portland.  The
discussion here is about whether Andrew agrees with adding the patches or not,
as far as I can tell.  There are a number of people/companies who want them;
there is Andrew who does not.

David
--

From: David Howells
Date: Thursday, December 18, 2008 - 7:27 pm

I should tell you to go and reread LKML at this point.

But...  What can FS-Cache do for you?  Well, in your specific case, probably
nothing.  If all your computers are local to your normal desktop box and are
connected by sufficiently fast network and you have sufficiently few of them,
or you don't use any of NFS, AFS, CIFS, Lustre, CRFS, CD-ROMs then it is
likely that it won't gain you anything.

Even if you do use some of those "netfs's", it won't get you anything yet
because I haven't included patches to support anything other than NFS and the
in-kernel AFS client yet.

However, if you do use NFS (or my AFS client), and you are accessing computers
via slow networks, or you have lots of machines spamming your NFS server, then
it might avail you.

It's a compromise: a trade-off between the loading and latencies of your
network vs the loading and latencies of your disk; you sacrifice disk space to
make up for the deficiencies of your network.  The worst bit is that the
latencies are additive under some circumstances (when doing a page read you
may have to check disk and then go to the network).
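
The additive case can be put into a small model. The sketch below assumes
that every read checks the disk cache first and falls through to the network
on a miss; the function names and the example latencies are illustrative
assumptions, not measurements from the posted benchmarks.

```python
def expected_read_latency(hit_rate, disk_latency, net_latency):
    """Average read latency when the disk cache is always checked first."""
    hit_cost = disk_latency                 # served from the local cache
    miss_cost = disk_latency + net_latency  # latencies add up on a miss
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


def break_even_hit_rate(disk_latency, net_latency):
    """Hit rate above which the cache beats going straight to the network.

    Solves disk + (1 - h) * net = net for h, which gives h = disk / net.
    """
    return disk_latency / net_latency


# Made-up example: 5 ms to check the disk, 20 ms to fetch over the network.
threshold = break_even_hit_rate(5.0, 20.0)
at_threshold = expected_read_latency(threshold, 5.0, 20.0)
```

With these made-up numbers the cache only improves latency once more than a
quarter of the reads hit it; below that, the extra disk check makes the
average read slower than going to the network directly.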


So, FS-Cache can do the following for you:

 (1) Allow you to reduce network loading by diverting repeat reads to local
     storage.

 (2) Allow you to reduce the latency of slow network links by diverting repeat
     reads to local storage.

 (3) Allow you to reduce the effect of uncontactable servers by serving data
     out of local storage.

 (4) Allow you to reduce the latency of slow rotating media (such as CDROM
     and CD-changers).

 (5) Allow you to implement disconnected operation, partly by (3), but also by
     caching changes for later syncing.

Now, (1) and (2) are readily demonstrable.  I have posted benchmarks to do
this.  (3) to (5) are not yet implemented; these have to be mostly implemented
in the filesystems that use FS-Cache rather than FS-Cache itself.  FS-Cache
currently has sufficient functionality to do (3) and (4), but needs some extra
bits to ...
From: Andrew Morton
Date: Thursday, December 18, 2008 - 7:44 pm

Was that information captured/maintained somewhere?  It really is important (I

I want to be able to have an answer when someone asks me "why was all that

Of course, their opinions (and supporting explanations) would be valuable.
--

From: Ric Wheeler
Date: Thursday, December 18, 2008 - 8:10 pm

The users that I spoke to from the financial sector that tried it are 
still quite interested. One simple use case for them is a very large 
cluster of NFS clients for read-mostly workloads (say, 1000 or more NFS 
clients for shared system partitions). They like the ability to do 
persistent caching across reboots which allows them to have fewer (and
less beefy) NFS servers for all of those boxes trying to reboot at once.

The other use case was for the large rendering customers, but I don't 
have first hand knowledge of the details...

Regards,

Ric

--

From: Trond Myklebust
Date: Friday, December 19, 2008 - 5:33 am

OK. I do agree that persistent caches can be (sometimes very!) useful
for certain workloads. As far as I'm concerned, it is mainly a tool to
help scale up the number of clients per server.

One interesting use case that I didn't see David mention is for cluster
boot up. In a lot of the HPC clustered set-ups there tend to be a number of
'hot' files that all clients need to access at roughly the same time in
the boot cycle. Pre-loading these files into the persistent cache before
booting the cluster is one way to solve this problem. Server replication
and/or copying the files to local storage on the clients are other
solutions.
The advantage the persistent cache confers over the two other solutions
is that it simplifies the data management: if you do need to change one
or two of those hot files on the server, then the clients will
automatically update their caches (both the page cache and the
persistent cache) without requiring any further action from the cluster
administrator. The disadvantage is that cachefs doesn't yet appear to
have a tool to select those files that are hot and are therefore best
suited to cache (please correct me if I'm wrong, David). The fact that a
file is 'hot' on some given client is not necessarily equivalent to
saying that it is hot on the server and vice versa. Are there any plans
to at some point introduce a tool to manage the persistent cache?

Cheers
  Trond

--

From: David Howells
Date: Friday, December 19, 2008 - 6:32 am

I've come across something similar, where a large company was distributing /usr
to its UNIX/Linux workstations by AFS without persistent local caching.  Cue a
powercut, that took away the power from a large quantity of machines and then

Yes.  I have plans for tools to pin and unpin files, introduce culling
priorities, make space reservations in the cache, and cache readahead.

These aren't, however, immediately necessary to make local caching useful.

To do this, ideally I want a set of ioctl, fcntl or fadvise commands that are
common to all filesystems that will just be ignored if the filesystem isn't
currently doing caching.

Our customers also want to be able to configure this statically, perhaps in
some /etc file.  Something like, on NFS mount X from server Y, fully readahead
all files in or under directory Z.  I have an idea on how to do this, but I
need to thrash it out with Al.

David
--

From: Gabor Gombas
Date: Friday, December 19, 2008 - 9:48 am

Not just boot up. Consider a room full of thin clients using nfsroot and
the lecturer saying "Now everybody open a browser" or "Now everybody
open Openoffice". With just NFS, it takes ages (there is a bottleneck of
a single gigabit link between the clients and the NFS server even though
the server itself has a 10gig card). If we redirect most of /usr/lib to
a small local flash with some LD_LIBRARY_PATH and bind mount trickery we
get an acceptable startup time. The flash is too small to hold even
/usr/lib (flash size: 500M, /usr/lib is: 927M) so it is not possible to
keep everything locally.

It would be really nice if the local caching could be handled
automatically and we would not need so many hacks, so I really look
forward to trying FS-Cache if I have time. I used cachefs on Solaris ages
ago and I had good experiences back then; it would be really nice if
Linux would catch up.

Gabor

-- 
     ---------------------------------------------------------
     MTA SZTAKI Computer and Automation Research Institute
                Hungarian Academy of Sciences
     ---------------------------------------------------------
--

From: David Howells
Date: Friday, December 19, 2008 - 6:03 am

[Empty message]
--

From: Muntz, Daniel
Date: Thursday, December 18, 2008 - 8:45 pm

Local disk cache was great for AFS back around 1992.  Typical networks
were 10 or 100Mbps (slower than disk access at the time), and memories
were small (typical 16MB).  FS-Cache appears to help only with read
traffic--one reason why the web loves caching--and only for reads that
would miss the buffer/page cache (memories are now "large").  Solaris
has had CacheFS since ~1995, HPUX had a port of it since ~1997.  I'd be
interested in evidence of even a small fraction of Solaris and/or HPUX
shops using CacheFS.  I am aware of customers who thought it sounded
like a good idea, but ended up ditching it for various reasons (e.g.,
CacheFS just adds overhead if you almost always hit your local mem
cache).

One argument in favor that I don't see here is that local disk cache is
persistent (I'm assuming it is in your implementation).

Addressing 1 and 2 in your list, I'd be curious how often a miss in core
is a hit on disk.
Number 3 scares me.  How does this play with the expected semantics of
NFS?
Number 5 is hard, if not provably requiring human intervention to do
syncs when writes are involved (see Disconnected AFS work by
UM/CITI/Huston, and work at CMU).
Add persistence as number 6.  This may be the best reason to have it,
imho.

  -Dan

-----Original Message-----
From: David Howells [mailto:dhowells@redhat.com] 
Sent: Thursday, December 18, 2008 6:27 PM
To: Andrew Morton
Cc: sfr@canb.auug.org.au; linux-kernel@vger.kernel.org;
nfsv4@linux-nfs.org; steved@redhat.com; dhowells@redhat.com;
linux-fsdevel@vger.kernel.org; rwheeler@redhat.com
Subject: Re: Pull request for FS-Cache, including NFS patches


I should tell you to go and reread LKML at this point.

But...  What can FS-Cache do for you?  Well, in your specific case,
probably nothing.  If all your computers are local to your normal
desktop box and are connected by sufficiently fast network and you have
sufficiently few of them, or you don't use any of NFS, AFS, CIFS,
Lustre, CRFS, CD-ROMs then it is likely that won't ...
--

From: J. Bruce Fields
Date: Thursday, December 18, 2008 - 9:09 pm

Would a disk cache on SSD make any sense?  Seems like it'd change the

More details on the experiences of RHEL/Fedora users might be
interesting.  My (vague, mostly uninformed) impression is that the group
of people who think they need it is indeed a lot larger than the group
who really do need it--but that the latter group still exists.

--

From: David Howells
Date: Friday, December 19, 2008 - 6:22 am

It could, as could using a battery backed RAM cache.

David
--

From: David Howells
Date: Friday, December 19, 2008 - 6:20 am

Yes.  If you connect an NFS client to an NFS server by GigE and have no other
load on the net, and the NFS server can keep the whole dataset in RAM, then
the CacheFiles backend kills your performance.  I have pointed out that you

FS-Cache itself doesn't care whether you're trying to cache read traffic or
write traffic.  The in-kernel AFS filesystem will cache reads and writes, but
the NFS fs will only cache reads as I have currently implemented it.

The reason I did this is that if you make a write on an AFS file, you can
immediately tell if it overlapped with a write from another client, and you
can then nuke your caches.

On NFS, you can't, at least not without proper change attribute support in
NFSv4.  This is not so much a problem with the NFS protocol as a problem with
the fact that the backing filesystems that NFS uses don't or didn't support
it.

However, the same problem affects the NFS pagecache.  If you do a write to an
NFS server, *that* can then be undetectably out of sync with the server.  If
you don't mind that state persisting on disk, FS-Cache presents no barrier to

I know that Solaris and Windows have it; I don't have any facts about how

It can be.  As I said in my email:

	I've tried to implement just the minimal useful functionality for
	persistent caching.  There is more to be added, but it's not immediately
	necessary.

The CacheFiles backend is persistent, but you could write a backend that
isn't, if you, for example, wanted to make use of the >4G of RAM permitted to
some Pentium motherboards as a huge ramdisk.

Not having fully persistent caching allows you to skimp on what metadata you

It depends on what you're doing.  I'm not sure I can give you a better answer


Yeah, I appreciate that it's, erm, 'yucky'.  You can liken it to doing a GIT
pull or CVS update into your tree and then having to manually fix up the
conflicts.

David
--

From: Muntz, Daniel
Date: Friday, December 19, 2008 - 11:08 am

AFS was designed to support local disk cache, so with callbacks you can
get a consistent system.  I'll ass-u-me that Linux flavors of AFS have
callbacks.  It should be possible to *integrate* NFSv4.x with FS-Cache
similarly (i.e., I don't think you could drop it in as a 'black-box'
without breaking something, unless you explicitly build an independent
proxy cache server).

  -Dan

-----Original Message-----
From: David Howells [mailto:dhowells@redhat.com] 
Sent: Friday, December 19, 2008 5:21 AM
To: Muntz, Daniel
Cc: dhowells@redhat.com; Andrew Morton; sfr@canb.auug.org.au;
linux-kernel@vger.kernel.org; nfsv4@linux-nfs.org; steved@redhat.com;
linux-fsdevel@vger.kernel.org; rwheeler@redhat.com
Subject: Re: Pull request for FS-Cache, including NFS patches

NFS?

I don't know.  Yet it's something you want to do for AFS, I think.


David
--

From: David Howells
Date: Friday, December 19, 2008 - 11:24 am

NFSv4 has equivalents of both the data version number and callbacks.

David
--

From: Bryan Henderson
Date: Friday, December 19, 2008 - 12:53 pm

important.

Maybe for consistency, but for the performance benefits of local disk 
caching, I believe the callbacks are pretty important.  I say that because 
I regularly use an NFS 3 filesystem on the IBM internal network that is 
painfully slow on Linux and fine on AIX with CacheFS.  It was also fine 
when this data was in AFS instead, and if I copy all the files to a local 
disk filesystem.

In this case, the files are _all in local page cache_, so I assume the 
waiting is for the constant stream of transactions the client uses to make 
sure the data in the cache is still current every time I open a file.  AIX 
CacheFS doesn't sweat the consistency, so makes these queries only 
periodically.  AFS had the callbacks, so rather than the client asking 
every time if the data had changed, the server just told it when it did.
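
To make the difference concrete, here is a small sketch that counts
client/server messages for one cached file under the three policies
above.  It is a toy model, not a measurement: the function names are made
up, each validity check is assumed to cost one round trip, and a callback
is assumed to cost one message per change on the server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Revalidate on every open: one round trip per open. */
static unsigned long trips_every_open(size_t n_opens)
{
	return n_opens;
}

/*
 * Revalidate at most once per `period` time units.  open_times must be
 * sorted in ascending order; the first open always triggers a check.
 */
static unsigned long trips_periodic(const unsigned long *open_times,
				    size_t n_opens, unsigned long period)
{
	unsigned long trips = 0, last_check = 0;
	bool checked = false;
	size_t i;

	assert(period > 0);

	for (i = 0; i < n_opens; i++) {
		if (!checked || open_times[i] - last_check >= period) {
			trips++;
			last_check = open_times[i];
			checked = true;
		}
	}
	return trips;
}

/* Callbacks: the server speaks only when the file actually changes. */
static unsigned long trips_callbacks(unsigned long n_changes)
{
	return n_changes;
}
```

For a file that is opened often and changed rarely, the first count grows
with the number of opens while the other two stay small, which matches
the behaviour I see.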

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Storage Systems

--

From: David Howells
Date: Friday, December 19, 2008 - 6:20 pm

Indeed, but he said "so with callbacks you can get a _consistent_ system".

David
--

From: Muntz, Daniel
Date: Friday, December 19, 2008 - 11:05 pm

Callbacks, instead of assuming your data is good for some short period
of time.  It's all well and good that there are mechanisms to know that
data has changed, but without some leasing/callback/(ugh)mandatory
locking, you're just estimating consistency.

  -Dan 

-----Original Message-----
From: David Howells [mailto:dhowells@redhat.com] 
Sent: Friday, December 19, 2008 5:20 PM
To: Bryan Henderson
Cc: dhowells@redhat.com; Andrew Morton; Muntz, Daniel;
linux-fsdevel@vger.kernel.org; linux-kernel@vger.kernel.org;
nfsv4@linux-nfs.org; rwheeler@redhat.com; sfr@canb.auug.org.au;
steved@redhat.com
Subject: Re: Pull request for FS-Cache, including NFS patches


Indeed, but he said "so with callbacks you can get a _consistent_
system".

David
--
