Hi Stephen,
Can you try pulling the master branch of this tree:
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nfs-fscache.git
into linux-next, please?
This tree includes the following:
(1) The 'next' branch of the security tree, which you already have.
(2) The 'linux-next' branch of the NFS tree, which you already have.
(3) My FS-Cache, CacheFiles and AFS patches, and associated enablement
patches.
(4) My patches to enable NFS to use FS-Cache.
I've tried merging into next-20081217, and it just applied and the tests
worked upon it.
David
---
The following changes since commit 1bda71282ded6a2e09a2db7c8884542fb46bfd4f:
Linus Torvalds (1):
Merge branch 'for-linus' of git://git.kernel.org/.../ieee1394/linux1394-2.6
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nfs-fscache.git master
Al Viro (1):
Audit: Log TIOCSTI
David Howells (123):
CRED: Wrap task credential accesses in the IA64 arch
CRED: Wrap task credential accesses in the MIPS arch
CRED: Wrap task credential accesses in the PA-RISC arch
CRED: Wrap task credential accesses in the PowerPC arch
CRED: Wrap task credential accesses in the S390 arch
CRED: Wrap task credential accesses in the x86 arch
CRED: Wrap task credential accesses in the block loopback driver
CRED: Wrap task credential accesses in the tty driver
CRED: Wrap task credential accesses in the ISDN drivers
CRED: Wrap task credential accesses in the network device drivers
CRED: Wrap task credential accesses in the USB driver
CRED: Wrap task credential accesses in 9P2000 filesystem
CRED: Wrap task credential accesses in the AFFS filesystem
CRED: Wrap task credential accesses in the autofs filesystem
CRED: Wrap task credential accesses in the autofs4 filesystem
CRED: Wrap task credential accesses in the BFS filesystem
CRED: Wrap ...

Hi David,

On Thu, 18 Dec 2008 00:30:21 +0000 David Howells <dhowells@redhat.com> wrote:

Added from today.

Usual spiel: all patches in that branch must have been
	posted to a relevant mailing list
	reviewed
	unit tested
	destined for the next merge window (or the current release)
*before* they are included.

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
I don't think we want fscache for .29 yet. I'd rather let the credential code settle for one release, and have more time for actually reviewing it properly and have it 100% ready for .30. --
On Thu, 18 Dec 2008 09:24:20 -0500 I don't believe that it has yet been convincingly demonstrated that we want to merge it at all. It's a huuuuuuuuge lump of new code, so it really needs to provide decent value. Can we revisit this? Yet again? What do we get from all this? --
I really don't understand why fs-cache is always rejected. Actually it is the perfect solution for NFS booted systems - you have a big cluster of nodes and in order to minimize administration overhead the nodes are booted over NFS from one common chroot. With unionfs (preferred solution here is unionfs-fuse) one then maintains the files that need to differ between clients.

Caching files on the local disk minimizes the network access and boosts the performance, so at least for this usage example fs-cache would be great. (Actually I have been thinking about implementing a caching branch in unionfs-fuse, but if the kernel can do it on its own, it is also fine.)

In the past David already posted many benchmarks and just a few weeks ago again: http://lkml.indiana.edu/hypermail/linux/kernel/0811.3/00584.html

Cheers,
Bernd
--
On Fri, 19 Dec 2008 00:07:33 +0100

It's never been rejected. For a long time it has been in a state where we're looking for the data which would allow us to agree that its benefits are worth its costs. AFAIK that has never really been convincingly demonstrated. Nor has the converse case been

OK, benchmarks are good. But look:

 303 files changed, 21049 insertions(+), 3726 deletions(-)

it's an enormous hunk of code. That will be in the kernel for ever and ever, needing maintenance, adding additional burden to our effort to evolve the kernel, etc.

Are any distros pushing for this? Or shipping it? If so, are they able to weigh in and help us with this quite difficult decision?
--
Hi David,

Given the ongoing discussions around FS-Cache, I have removed it from linux-next. Please ask me to include it again (if sensible) once some decision has been reached about its future.

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
Hi David,

On Fri, 19 Dec 2008 11:05:39 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:

What was the result of discussions around FS-Cache? I ask because it reappeared in linux-next today via the nfs tree (merged into that on Dec 24 and 25) ...

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
There was none.

Dan Muntz's question:

	Solaris has had CacheFS since ~1995, HPUX had a port of it since ~1997. I'd be interested in evidence of even a small fraction of Solaris and/or HPUX shops using CacheFS. I am aware of customers who thought it sounded like a good idea, but ended up ditching it for various reasons (e.g., CacheFS just adds overhead if you almost always hit your local mem cache).

was a very very good one.

Seems that instead of answering it, we've decided to investigate the oh.
--
David has given you plenty of arguments for why it helps scale the
server (including specific workloads), has given you numbers validating
his claim, and has presented claims that Red Hat has customers using
cachefs in RHEL-5.
The arguments I've seen against it, have so far been:
1. Solaris couldn't sell their implementation
2. It's too big
3. It's intrusive
Argument (1) has so far appeared to be pure FUD. In order to discuss the
lessons of history, you need to first do the work of analysing and
understanding it. I really don't see how it is relevant to Linux
whether or not the Solaris and HPUX cachefs implementations worked out
unless you can demonstrate that their experience shows some fatal
flaw in the arguments and numbers that David presented, and that his
customers are deluded.
If you want examples of permanent caches that clearly do help servers
scale, then look no further than the on-disk caches used in almost all
http browser implementations. Alternatively, as David mentioned, there
are the on-disk caches used by AFS/DFS/coda.
(2) may be valid, but I have yet to see specifics for where you'd like
to see the cachefs code slimmed down. Did I miss them?
(3) was certainly true 3 years ago, when the code was first presented
for review, and so we did a review and critique then. The NFS specific
changes have improved greatly as a result, and as far as I know, the
security folks are happy too. If you're not happy with the parts that
affect the memory management code then, again, it would be useful to see
specifics on what you want changed.
If there is still controversy concerning this, then I can temporarily
remove cachefs from the nfs linux-next branch, but I'm definitely
keeping it in the linux-mm branch until someone gives me a reason for
why it shouldn't be merged in its current state.
Trond
--
I can add that our Red Hat customers who tried the cachefs preview did find it useful for their workloads (and, by the way, also use the Solaris cachefs on solaris boxes if I remember correctly). They have been nagging me and others at Red Hat about getting it into supported state for quite a while :-) As you point out, this is all about getting more clients to be driven by a set of NFS servers. Regards, --
Before throwing the 'FUD' acronym around, maybe you should re-read the details. My point was that there were few users of cachefs even when the technology had the potential for greater benefit (slower networks, less powerful servers, smaller memory caches). Obviously cachefs can improve performance--it's simply a function of workload and the assumptions made about server/disk/network bandwidth. However, I would expect the real benefits and real beneficiaries to be fewer than in the past. HOWEVER^2 I did provide some argument(s) in favor of adding cachefs, and look forward to extensions to support delayed write, offline operation, and NFSv4 support with real consistency checking (as long as I don't have to take the customer calls ;-). BTW, animation/video shops were one group that did benefit, and I imagine they still could today (the one I had in mind did work across Britain, the US, and Asia and relied on cachefs for overcoming slow network connections). Wonder if the same company is a RH customer... All the comparisons to HTTP browser implementations are, imho, absurd. It's fine to keep a bunch of http data around on disk because a) it's RO data, b) correctness is not terribly important, and c) a human is generally the consumer and can manually request non-cached data if things look wonky. It is a trivial case of caching. As for security, look at what MIT had to do to prevent local disk caching from breaking the security guarantees of AFS. Customers (deluded or otherwise) are still customers. No one is forced to compile it into their kernel. Ship it. -Dan
I did read your argument. My point is that although the argument sounds reasonable, it ignores the fact that the customer bases are completely different. The people asking for cachefs on Linux typically run a cluster of 2000+ clients all accessing the same read-only data from just a handful of servers. They're primarily looking to improve the performance and stability of the _servers_, since those are the single point of failure of the cluster.

As far as I know, historically there has never been a market for 2000+ HP-UX, or even Solaris based clusters, and unless the HP and Sun product plans change drastically, then simple economics dictates that nor will there ever be such a market, whether or not they have cachefs support. OpenSolaris is a different kettle of fish since it has cachefs, and does run on COTS hardware, but there are other reasons why that hasn't yet

See above. The majority of people I'm aware of that have been asking for this are interested mainly in improving read-only workloads for data that changes infrequently. Correctness tends to be important, but the requirements are no different from those that apply to the page cache. You mentioned the animation industry: they are a prime example of an industry that satisfies (a), (b), and (c). Ditto the oil and gas exploration industry, as well as pretty much all scientific computing,

See what David has added to the LSM code to provide the same guarantees for cachefs...

Trond
--
Unless it (at least) leverages TPM, the issues I had in mind can't really be addressed in code. One requirement is to prevent a local root user from accessing fs information without appropriate permissions. This leads to unwieldy requirements such as allowing only one user on a machine at a time, blowing away the cache on logout, validating (e.g., refreshing) the kernel on each boot, etc. Sure, some applications won't care, but you're also potentially opening holes that users may not consider. -Dan
You can't prevent a local root user from accessing cached data: that's true with or without cachefs. root can typically access the data using /dev/kmem, swap, intercepting tty traffic, spoofing user creds,... If root can't be trusted, then find another machine. The worry is rather that privileged daemons may be tricked into revealing said data to unprivileged users, or that unprivileged users may attempt to read data from files to which they have no rights using the cachefs itself. That is a problem that is addressable by means of LSM, and is what David has attempted to solve. Trond --
Yes, and if you have a single user on the machine at a time (with cache flushed inbetween, kernel refreshed), root can read /dev/kmem, swap, intercept traffic and read cachefs data to its heart's content--hence, those requirements. -Dan
Unless you _are_ root and can check every executable, after presumably rebooting into your own trusted kernel, then those requirements won't mean squat. If you're that paranoid, then you will presumably also be using a cryptfs-encrypted partition for cachefs, which you unmount when you're not logged in. That said, most cluster environments will tend to put most of their security resources into keeping untrusted users out altogether. The client nodes tend to be a homogeneous lot with presumably only a trusted few sysadmins... --
Actually... Cachefiles could fairly trivially add encryption. It would have to be simple encryption but you wouldn't have to store any keys locally. Currently cachefiles _copies_ data between the backingfs and the netfs pages because the direct-IO code is only usable to/from userspace. Rather than copying, encrypt/decrypt could be called. A key could be constructed at the point a cache file is looked up. It could be constructed from the coherency data. In the case of NFS that would be mtime, ctime, isize and change_attr. The coherency data would be encrypted with this key and then stored on disk, as would the contents of the file. It might be possible to chuck the cache key (NFS fh) into the encryption key too and also encrypt the cache key before it is turned into a filename, though we'd have to be careful to avoid collisions if each filename is encrypted with a different key. We'd probably have to be careful about the coherency data decrypting with a different key showing up as the wrong but valid thing. The nice thing about this is that the key need not be retained locally since it's entirely constructed from data fetched from the netfs. David --
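The key-construction idea above can be illustrated with a short Python sketch. This is not CacheFiles code and not a proposed kernel interface: the function name, the packing layout and the choice of SHA-256 are all assumptions made for illustration. It only shows the property described, namely that the key is rebuilt purely from data fetched from the netfs (mtime, ctime, isize, change_attr and, optionally, the NFS file handle), so nothing needs to be stored locally.

```python
import hashlib
import struct


def derive_cache_key(mtime_ns, ctime_ns, isize, change_attr, file_handle=b""):
    """Hypothetical helper: derive a 256-bit key from NFS coherency data.

    The four coherency values are packed as fixed-width unsigned 64-bit
    integers, so the optional file handle appended afterwards cannot be
    confused with them.  SHA-256 is an arbitrary choice for this sketch.
    """
    packed = struct.pack(">QQQQ", mtime_ns, ctime_ns, isize, change_attr)
    return hashlib.sha256(packed + file_handle).digest()
```

With a derivation like this, any change in the coherency data on the server yields a different key, so stale on-disk contents would simply fail to decrypt to anything meaningful. The caveat raised above still applies: a real design would need a way to tell "wrong key" apart from "valid data".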
Sure, trusted kernel and trusted executables, but it's easier than it sounds. If you start with a "clean" system, you don't need to verify executables _if_ they're coming from the secured file server (by induction: if you started out secure, the executables on the file server will remain secure). You simply can't trust the local disk from one user to the next. Following the protocol, a student can log into a machine, su to do their OS homework, but not compromise the security of the distributed file system. If I can su while another user is logged in, or the kernel/cmds are not validated between users, cryptfs isn't safe either. If you're following the protocol, it doesn't even matter if a bad guy ("untrusted user"?) gets root on the client--they still can't gain inappropriate access to the file server. OTOH, if my security plan is simply to not allow root access to untrusted users, history says I'm going to lose. -Dan
On Wed, 31 Dec 2008 20:11:13 -0800 "Muntz, Daniel" <Dan.Muntz@netapp.com> wrote: if you have a user, history says you're going to lose. you can make your system as secure as you want, with physical access all bets are off. keyboard sniffer.. easy. special dimms that mirror data... not even all THAT hard, just takes a bit of cash. running the user in a VM without him noticing.. not too hard either. etc. -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
Yeah... this is precisely the reason that the security-test-plan and system-design-document for any really security sensitive system starts with:

[ ] The system is in a locked rack
[ ] The rack is in a locked server room with detailed access logs
[ ] The server room is in a locked and secured building with 24-hour camera surveillance and armed guards

I've spent a little time looking into the security guarantees provided by DAC and by the FS-Cache LSM hooks, and it is possible to reasonably guarantee that no *REMOTE* user will be able to compromise the contents of the cache using a combination of DAC (file permissions, etc) and MAC (SELinux, etc) controls. As previously mentioned, local users (with physical hardware access) are an entirely different story.

As far as performance considerations for the merge... FS-cache on flash-based storage also has very different performance tradeoffs from traditional rotating media. Specifically I have some sample 32GB SATA-based flash media here with ~230Mbyte/sec sustained read and ~200Mbyte/sec sustained write and with a 75usec read latency. It doesn't take much link latency at all to completely dwarf that kind of access time.

Cheers,
Kyle Moffett
--
On Tue, 30 Dec 2008 14:15:42 -0800 we're talking about NFS here (but also local CDs and potentially CIFS etc). The level of security you're talking about is going to be the same before or after cachefs.... very little against local root. Frankly, any networking filesystem just trusts that the connection is authenticated... eg there is SOMEONE on the machine who has the right credentials. Cachefs doesn't change that; it still validates with the server before giving userspace the data. -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
I disagree with your assertion that there was no result. Various people,
beside myself, have weighed in with situations where FS-Cache is or may be
useful. You've been presented with benchmarks showing that it can make a
difference.
However, *you* are the antagonist, as strictly defined in the dictionary; we
were trying to convince *you*, so a result has to come from *you*. I feel
And to a large extent irrelevant. Yes, we know caching adds overhead; I've
never tried to pretend otherwise. It's an exercise in compromise. You don't
just go and slap a cache on everything. There *are* situations in which a
cache will help. We have customers who know about them and are willing to
live with the overhead.
What I have done is to ensure that, even if caching is compiled in, then the
overhead is minimal if there is _no_ cache present. That is requirement #1 on
my list.
Assuming I understand what he said correctly, I've avoided the main issue
listed by Dan because I don't do as Solaris does and interpolate the cache
between the user and NFS. Of course, that probably buys me other issues (FS
Sigh.
The main point is that caching _is_ useful, even with its drawbacks. Dan may
be aware of customers of Sun/HP who thought caching sounds like a good idea,
but then ended up ditching it. I can well believe it. But I am also aware of
customers of Red Hat who are actively using the caching we put in RHEL-5 and
customers who really want caching available in future RHEL and Fedora versions
for various reasons.
To sum up:
(1) Overhead is minimal if there is no cache.
(2) Benchmarks show that the cache can be effective.
(3) People are already using it and finding it useful.
(4) There are people who want it for various projects.
(5) The use of a cache does not automatically buy you an improvement in
performance: it's a matter of compromise.
(6) The performance improvement may be in the network or the servers, not the
client that is actually doing the ...

And that of course means that many many 2.6.28 patches which I am maintaining will need significant rework to apply on top of linux-next, and then they won't apply to mainline. Or that linux-next will not apply on top of those patches. Mainly memory management.

Please drop the NFS tree until after -rc1.

Guys, this: http://lkml.org/lkml/2008/12/27/173
--
Hi Andrew, Trond,

OK, it is dropped for now (including today's tree).

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
Significant rework to many many patches? The FS-Cache patches don't have all

Okay, that's a reasonable request.

David
--
That is the result of discussions during the kernel summit in Portland. The discussion here is about whether Andrew agrees with adding the patches or not, as far as I can tell. There are a number of people/companies who want them; there is Andrew who does not. David --
I should tell you to go and reread LKML at this point.
But... What can FS-Cache do for you? Well, in your specific case, probably
nothing. If all your computers are local to your normal desktop box and are
connected by sufficiently fast network and you have sufficiently few of them,
or you don't use any of NFS, AFS, CIFS, Lustre, CRFS, CD-ROMs then it is
likely that it won't gain you anything.
Even if you do use some of those "netfs's", it won't get you anything yet
because I haven't included patches to support anything other than NFS and the
in-kernel AFS client yet.
However, if you do use NFS (or my AFS client), and you are accessing computers
via slow networks, or you have lots of machines spamming your NFS server, then
it might avail you.
It's a compromise: a trade-off between the loading and latencies of your
network vs the loading and latencies of your disk; you sacrifice disk space to
make up for the deficiencies of your network. The worst bit is that the
latencies are additive under some circumstances (when doing a page read you
may have to check disk and then go to the network).
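The additive-latency point can be made concrete with a small Python sketch. This is a back-of-the-envelope model, not a description of how FS-Cache schedules I/O: the function names are made up for illustration, and it assumes that every read checks the disk cache first and that a miss then pays the full network latency on top. It also looks only at client-side latency and says nothing about the reduction in server or network load.

```python
def expected_read_latency(hit_ratio, disk_latency, net_latency):
    """Mean latency of a read that misses the page cache.

    A disk-cache hit costs one disk access; a miss costs the disk check
    plus the network fetch (the additive case).
    """
    hit_cost = disk_latency
    miss_cost = disk_latency + net_latency
    return hit_ratio * hit_cost + (1.0 - hit_ratio) * miss_cost


def break_even_hit_ratio(disk_latency, net_latency):
    """Hit ratio above which the disk cache lowers mean read latency.

    Solving disk + (1 - h) * net < net for h gives h > disk / net.
    """
    return disk_latency / net_latency
```

Under this model, a disk that is four times faster than the network link needs a hit ratio above 0.25 before the cache pays off in latency terms; a disk as slow as the network can never win on latency alone, though it may still relieve the server.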
So, FS-Cache can do the following for you:
(1) Allow you to reduce network loading by diverting repeat reads to local
storage.
(2) Allow you to reduce the latency of slow network links by diverting repeat
reads to local storage.
(3) Allow you to reduce the effect of uncontactable servers by serving data
out of local storage.
(4) Allow you to reduce the latency of slow rotating media (such as CDROM
and CD-changers).
(5) Allow you to implement disconnected operation, partly by (3), but also by
caching changes for later syncing.
Now, (1) and (2) are readily demonstrable. I have posted benchmarks to do
this. (3) to (5) are not yet implemented; these have to be mostly implemented
in the filesystems that use FS-Cache rather than FS-Cache itself. FS-Cache
currently has sufficient functionality to do (3) and (4), but needs some extra
bits to ...

Was that information captured/maintained somewhere? It really is important (I

I want to be able to have an answer when someone asks me "why was all that

Of course, their opinions (and supporting explanations) would be valuable.
--
The users that I spoke to from the financial sector that tried it are still quite interested. One simple use case for them is a very large cluster of NFS clients for read-mostly workloads (say, 1000 or more NFS clients for shared system partitions). They like the ability to do persistent caching across reboots which allows them to have less (and less beefy) NFS servers for all of those boxes trying to reboot at once. The other use case was for the large rendering customers, but I don't have first hand knowledge of the details... Regards, Ric --
OK. I do agree that persistent caches can be (sometimes very!) useful for certain workloads. As far as I'm concerned, it is mainly a tool to help scale up the number of clients per server.

One interesting use case that I didn't see David mention is for cluster boot up. In a lot of the HPC clustered set-ups there tend to be a number of 'hot' files that all clients need to access at roughly the same time in the boot cycle. Pre-loading these files into the persistent cache before booting the cluster is one way to solve this problem. Server replication and/or copying the files to local storage on the clients are other solutions.

The advantage the persistent cache confers over the two other solutions is that it simplifies the data management: if you do need to change one or two of those hot files on the server, then the clients will automatically update their caches (both the page cache and the persistent cache) without requiring any further action from the cluster administrator.

The disadvantage is that cachefs doesn't yet appear to have a tool to select those files that are hot and are therefore best suited to cache (please correct me if I'm wrong, David). The fact that a file is 'hot' on some given client is not necessarily equivalent to saying that it is hot on the server and vice versa. Are there any plans to at some point introduce a tool to manage the persistent cache?

Cheers
Trond
--
I've come across something similar, where a large company was distributing /usr to its UNIX/Linux workstations by AFS without persistent local caching. Cue a powercut, that took away the power from a large quantity of machines and then

Yes. I have plans for tools to pin and unpin files, introduce culling priorities, make space reservations in the cache, and cache readahead. These aren't, however, immediately necessary to make local caching useful.

To do this, ideally I want a set of ioctl, fcntl or fadvise commands that are common to all filesystems that will just be ignored if the filesystem isn't currently doing caching.

Our customers also want to be able to configure this statically, perhaps in some /etc file. Something like, on NFS mount X from server Y, fully readahead all files in or under directory Z. I have an idea on how to do this, but I need to thrash it out with Al.

David
--
Not just boot up. Consider a room full of thin clients using nfsroot and
the lecturer saying "Now everybody open a browser" or "Now everybody
open Openoffice". With just NFS, it takes ages (there is a bottleneck of
a single gigabit link between the clients and the NFS server even though
the server itself has a 10gig card). If we redirect most of /usr/lib to
a small local flash with some LD_LIBRARY_PATH and bind mount trickery we
get an acceptable startup time. The flash is too small to hold even
/usr/lib (flash size: 500M, /usr/lib is: 927M) so it is not possible to
keep everything locally.
It would be really nice if the local caching could be handled
automatically and we would not need so many hacks, so I really look
forward to trying FS-Cache if I have time. I used cachefs on Solaris ages
ago and I had good experiences back then; it would be really nice if
Linux would catch up.
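To illustrate what "handled automatically" could mean when the local store (500M here) is smaller than the data set (927M), the following Python sketch keeps the most recently used files within a byte budget and evicts the oldest ones to make room. It is a generic least-recently-used policy written for illustration only; the class and method names are hypothetical, and nothing here is claimed about how FS-Cache or CacheFiles actually culls its cache.

```python
from collections import OrderedDict


class ByteBudgetLRU:
    """Toy whole-file cache bounded by total size (hypothetical, not FS-Cache)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()  # path -> size, least recently used first

    def access(self, path, size):
        """Record a read of `path`; return True if it was already cached."""
        if path in self.files:
            self.files.move_to_end(path)  # mark as most recently used
            return True
        if size > self.capacity:
            return False  # too large to cache at all; go to the server
        # Evict least recently used files until the new one fits.
        while self.used + size > self.capacity:
            _, evicted_size = self.files.popitem(last=False)
            self.used -= evicted_size
        self.files[path] = size
        self.used += size
        return False
```

With a policy like this, the libraries a classroom actually opens would stay on the local flash without any LD_LIBRARY_PATH or bind mount arrangement, while rarely used files would be fetched over NFS on demand.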
Gabor
--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------
--
Local disk cache was great for AFS back around 1992. Typical networks were 10 or 100Mbps (slower than disk access at the time), and memories were small (typical 16MB). FS-Cache appears to help only with read traffic--one reason why the web loves caching--and only for reads that would miss the buffer/page cache (memories are now "large"). Solaris has had CacheFS since ~1995, HPUX had a port of it since ~1997. I'd be interested in evidence of even a small fraction of Solaris and/or HPUX shops using CacheFS. I am aware of customers who thought it sounded like a good idea, but ended up ditching it for various reasons (e.g., CacheFS just adds overhead if you almost always hit your local mem cache). One argument in favor that I don't see here is that local disk cache is persistent (I'm assuming it is in your implementation). Addressing 1 and 2 in your list, I'd be curious how often a miss in core is a hit on disk. Number 3 scares me. How does this play with the expected semantics of NFS? Number 5 is hard, if not provably requiring human intervention to do syncs when writes are involved (see Disconnected AFS work by UM/CITI/Huston, and work at CMU). Add persistence as number 6. This may be the best reason to have it, imho. -Dan
Would a disk cache on SSD make any sense? Seems like it'd change the

More details on the experiences of RHEL/Fedora users might be interesting. My (vague, mostly uninformed) impression is that the group of people who think they need it is indeed a lot larger than the group who really do need it--but that the latter group still exists.
--
It could, as could using a battery backed RAM cache. David --
Yes. If you connect an NFS client to an NFS server by GigE and have no other load on the net, and the NFS server can keep the whole dataset in RAM, then the CacheFiles backend kills your performance. I have pointed out that you

FS-Cache itself doesn't care whether you're trying to cache read traffic or write traffic. The in-kernel AFS filesystem will cache reads and writes, but the NFS fs will only cache reads as I have currently implemented it.

The reason I did this is that if you make a write on an AFS file, you can immediately tell if it overlapped with a write from another client, and you can then nuke your caches. On NFS, you can't, at least not without proper change attribute support in NFSv4. This is not so much a problem with the NFS protocol as a problem with the fact that the backing filesystems that NFS uses don't or didn't support it.

However, the same problem affects the NFS pagecache. If you do a write to an NFS server, *that* can then be undetectably out of sync with the server. If you don't mind that state persisting on disk, FS-Cache presents no barrier to

I know that Solaris and Windows have it; I don't have any facts about how

It can be. As I said in my email:

I've tried to implement just the minimal useful functionality for persistent caching. There is more to be added, but it's not immediately necessary.

The CacheFiles backend is persistent, but you could write a backend that isn't, if you, for example, wanted to make use of the >4G of RAM permitted to some Pentium motherboards as a huge ramdisk. Not having fully persistent caching allows you to skimp on what metadata you

It depends on what you're doing. I'm not sure I can give you a better answer

Yeah, I appreciate that it's, erm, 'yucky'. You can liken it to doing a GIT pull or CVS update into your tree and then having to manually fix up the conflicts.

David
--
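To make the AFS point above concrete, here is a minimal sketch of how a client might tell that its write overlapped with someone else's. It assumes a server that bumps a per-file data version by exactly one on every store. The FileServer and CachingClient classes, the whole-file writes, and the invalidations counter are all made up for illustration; this is not the kAFS or FS-Cache code.

```python
class FileServer:
    """Toy server: one file plus a data version that grows by one per store."""

    def __init__(self, data=b""):
        self.data = data
        self.data_version = 1

    def fetch(self):
        return self.data, self.data_version

    def store(self, data):
        # Assumption: every successful store bumps the version by exactly one.
        self.data = data
        self.data_version += 1
        return self.data_version


class CachingClient:
    """Toy client that caches the file and the version it was fetched at."""

    def __init__(self, server):
        self.server = server
        self.cached_data = None
        self.cached_version = None
        self.invalidations = 0

    def read(self):
        # Serve from the cache if we have one, otherwise fetch and remember.
        if self.cached_data is None:
            self.cached_data, self.cached_version = self.server.fetch()
        return self.cached_data

    def write(self, data):
        self.read()  # make sure we hold a version to compare against
        expected = self.cached_version + 1
        actual = self.server.store(data)
        if actual == expected:
            # Nobody else wrote in between: the cache can be updated in place.
            self.cached_data, self.cached_version = data, actual
            return True
        # Version jumped by more than one: another client wrote, so nuke the cache.
        self.cached_data = None
        self.cached_version = None
        self.invalidations += 1
        return False
```

If two clients both cache version 1 and then write one after the other, the first sees the version go to 2 as expected, while the second sees 3 and drops its cache. Without a reliable version or change attribute the second client has nothing to compare against, which is the NFS problem described above.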
AFS was designed to support local disk cache, so with callbacks you can get a consistent system. I'll ass-u-me that Linux flavors of AFS have callbacks. It should be possible to *integrate* NFSv4.x with FS-Cache similarly (i.e., I don't think you could drop it in as a 'black-box' without breaking something, unless you explicitly build an independent proxy cache server).

-Dan

-----Original Message-----
From: David Howells [mailto:dhowells@redhat.com]
Sent: Friday, December 19, 2008 5:21 AM
To: Muntz, Daniel
Cc: dhowells@redhat.com; Andrew Morton; sfr@canb.auug.org.au; linux-kernel@vger.kernel.org; nfsv4@linux-nfs.org; steved@redhat.com; linux-fsdevel@vger.kernel.org; rwheeler@redhat.com
Subject: Re: Pull request for FS-Cache, including NFS patches

NFS? I don't know. Yet it's something you want to do for AFS, I think.

David
--
NFSv4 has equivalents of both the data version number and callbacks. David --
important. Maybe for consistency, but for the performance benefits of local disk caching, I believe the callbacks are pretty important.

I say that because I regularly use an NFS 3 filesystem on the IBM internal network that is painfully slow on Linux and fine on AIX with CacheFS. It was also fine when this data was in AFS instead, and if I copy all the files to a local disk filesystem.

In this case, the files are _all in local page cache_, so I assume the waiting is for the constant stream of transactions the client uses to make sure the data in the cache is still current every time I open a file. AIX CacheFS doesn't sweat the consistency, so makes these queries only periodically. AFS had the callbacks, so rather than the client asking every time if the data had changed, the server just told it when it did.

--
Bryan Henderson
IBM Almaden Research Center
San Jose CA
Storage Systems
--
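The three behaviours described above (ask the server on every open, ask only periodically, or let the server call back when something changes) can be compared with a rough message count. This is only a sketch under simplifying assumptions: one file, one client, times in seconds, and one message per validation, callback break, or fetch. The function names and the model itself are invented for illustration and do not describe any real client.

```python
def validations_on_every_open(open_times):
    """Client asks the server whether its cache is current on every open."""
    return len(open_times)


def validations_periodic(open_times, interval):
    """Client asks only if at least `interval` seconds passed since the last check."""
    count = 0
    last_check = None
    for t in sorted(open_times):
        if last_check is None or t - last_check >= interval:
            count += 1
            last_check = t
    return count


def messages_with_callbacks(open_times, change_times):
    """Server notifies the client on change; the client refetches on its next open."""
    messages = 0
    have_callback = False
    changes = sorted(change_times)
    i = 0
    for t in sorted(open_times):
        # Deliver callback breaks for every change that happened before this open.
        while i < len(changes) and changes[i] <= t:
            if have_callback:
                messages += 1  # server -> client: callback break
                have_callback = False
            i += 1
        if not have_callback:
            messages += 1  # client -> server: fetch, which re-establishes the callback
            have_callback = True
    return messages
```

With 100 opens one second apart and a single server-side change at t=50.5, this model gives 100 messages for check-on-every-open, 4 for a 30 second periodic check, and 3 with callbacks. In the same model the periodic client keeps serving the old data from t=51 until its next check at t=60, so its low message count comes at the price of a staleness window.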
Indeed, but he said "so with callbacks you can get a _consistent_ system". David --
Callbacks, instead of assuming your data is good for some short period of time. It's all well and good that there are mechanisms to know that data has changed, but without some leasing/callback/(ugh)mandatory locking, you're just estimating consistency.

-Dan

-----Original Message-----
From: David Howells [mailto:dhowells@redhat.com]
Sent: Friday, December 19, 2008 5:20 PM
To: Bryan Henderson
Cc: dhowells@redhat.com; Andrew Morton; Muntz, Daniel; linux-fsdevel@vger.kernel.org; linux-kernel@vger.kernel.org; nfsv4@linux-nfs.org; rwheeler@redhat.com; sfr@canb.auug.org.au; steved@redhat.com
Subject: Re: Pull request for FS-Cache, including NFS patches

Indeed, but he said "so with callbacks you can get a _consistent_ system".

David
--
