A number of Linux kernel developers recently debated "swappiness" at length on the lkml [1], exploring when the kernel should swap an application out rather than reclaim memory from the page cache. Fortunately, a run-time tunable is available through the proc interface for anyone who needs to adapt kernel behavior to their own requirements. To tune, simply echo a value from 0 to 100 into /proc/sys/vm/swappiness (the default is 60). The higher the value, the more readily the system will swap. 2.6 kernel maintainer Andrew Morton [interview [2]] noted that on his own desktop machines he sets swappiness to 100, further explaining:
"My point is that decreasing the tendency of the kernel to swap stuff out is wrong. You really don't want hundreds of megabytes of BloatyApp's untouched memory floating about in the machine. Get it out on the disk, use the memory for something useful."
The other side of the argument is that if "BloatyApp" is swapped out too aggressively, the user must wait for it to be swapped back in when returning to it, and notices the delay. Rik van Riel explains, "Making the user have very bad interactivity for the first minute or so is a Bad Thing, even if the computer did run more efficiently while the user wasn't around to notice... IMHO, the VM on a desktop system really should be optimised to have the best interactive behaviour, meaning decent latency when switching applications." Andrew Morton humorously replied, "I'm gonna stick my fingers in my ears and sing 'la la la' until people tell me 'I set swappiness to zero and it didn't do what I wanted it to do'."
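The tunable can be read and set from any language that can open a file. The following Python sketch (the function names are our own and purely illustrative) does the equivalent of `cat /proc/sys/vm/swappiness` and `echo 10 > /proc/sys/vm/swappiness`, refusing values outside the 0 to 100 range. Writing the real file requires root, and the setting does not survive a reboot.

```python
from pathlib import Path

# The run-time tunable described above; writing to it requires root.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def format_swappiness(value: int) -> str:
    """Validate a swappiness value and return the text to write to /proc."""
    if not 0 <= value <= 100:
        raise ValueError(f"swappiness must be between 0 and 100, got {value}")
    return f"{value}\n"


def read_swappiness(path: Path = SWAPPINESS_PATH) -> int:
    """Return the current setting (the equivalent of `cat` on the file)."""
    return int(path.read_text().strip())


def write_swappiness(value: int, path: Path = SWAPPINESS_PATH) -> None:
    """Apply a new setting (the equivalent of `echo VALUE > file`)."""
    path.write_text(format_swappiness(value))
```

For example, `write_swappiness(100)` reproduces Andrew Morton's desktop setting. The `sysctl -w vm.swappiness=100` command changes the same value, and an entry in /etc/sysctl.conf is the usual way to make it persistent.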
From: Brett E. [email blocked]
To: linux-kernel mailing list [email blocked]
Subject: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 14:27:47 -0700

Same thing happens on 2.4.18.

I attached sar, slabinfo and /proc/meminfo data on the 2.6.5 machine. I reproduce this behavior by simply untarring a 260meg file on a production server, the machine becomes sluggish as it swaps to disk.

Is there a way to limit the cache so this machine, which has 1 gigabyte of memory, doesn't dip into swap?

Thanks,
Brett
From: Andrew Morton [3] [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 17:01:06 -0700

"Brett E." [email blocked] wrote:
>
> I attached sar, slabinfo and /proc/meminfo data on the 2.6.5 machine. I
> reproduce this behavior by simply untarring a 260meg file on a
> production server, the machine becomes sluggish as it swaps to disk.

I see no swapout from the info which you sent. A `vmstat 1' trace would be more useful.

> Is there a way to limit the cache so this machine, which has 1 gigabyte of
> memory, doesn't dip into swap?

Decrease /proc/sys/vm/swappiness?

Swapout is good. It frees up unused memory. I run my desktop machines at swappiness=100.
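Andrew asks for a `vmstat 1' trace because it samples swap traffic once per second, so actual swapout shows up directly. On 2.6 kernels the underlying cumulative counters are also exported in /proc/vmstat as pswpin and pswpout (pages swapped in and out since boot). The sketch below, whose helper names are hypothetical, diffs two snapshots of that file; a nonzero pswpout delta during the untar would confirm the swapping Brett reports.

```python
import time
from pathlib import Path


def parse_vmstat(text: str) -> dict:
    """Parse /proc/vmstat-style "name value" lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def swap_delta(before: dict, after: dict) -> tuple:
    """Return (pages swapped in, pages swapped out) between two snapshots."""
    return (
        after.get("pswpin", 0) - before.get("pswpin", 0),
        after.get("pswpout", 0) - before.get("pswpout", 0),
    )


def sample_swap(interval: float = 1.0, path: str = "/proc/vmstat") -> tuple:
    """Measure swap traffic over one interval, counted in pages."""
    before = parse_vmstat(Path(path).read_text())
    time.sleep(interval)
    after = parse_vmstat(Path(path).read_text())
    return swap_delta(before, after)
```

Calling `sample_swap()` in a loop gives roughly the information in the si/so columns of `vmstat 1`, although vmstat reports those columns in its own units rather than in pages.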
From: Jeff Garzik [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 20:10:14 -0400

Andrew Morton wrote:
> Swapout is good. It frees up unused memory. I run my desktop machines at
> swappiness=100.

The definition of "unused" is quite subjective and app-dependent...

I've seen reports with increasing frequency about the swappiness of the 2.6.x kernels, from people who were already annoyed at the swappiness of 2.4.x kernels :)

Favorite pathological (and quite common) examples are the various 4am cron jobs that scan your entire filesystem. Running that process overnight on a quiet machine practically guarantees a huge burst of disk activity, with unwanted results:
1) Inode and page caches are blown away
2) A lot of your desktop apps are swapped out

Additionally, a (IMO valid) maxim of sysadmins has been "a properly configured server doesn't swap". There should be no reason why this maxim becomes invalid over time. When Linux starts to swap out apps the sysadmin knows will be useful in an hour, or six hours, or a day just because it needs a bit more file cache, I get worried.

There IMO should be some way to balance the amount of anon-vma's such that the sysadmin can say "stop taking 70% of my box's memory for disposable cache, use it instead for apps you would otherwise swap out, you memory-hungry kernel you."

Jeff
From: Nick Piggin [4] [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Thu, 29 Apr 2004 10:21:24 +1000

Jeff Garzik wrote:
> Additionally, a (IMO valid) maxim of sysadmins has been "a properly
> configured server doesn't swap". There should be no reason why this
> maxim becomes invalid over time. When Linux starts to swap out apps the
> sysadmin knows will be useful in an hour, or six hours, or a day just
> because it needs a bit more file cache, I get worried.
>

I don't know. What if you have some huge application that only runs once per day for 10 minutes? Do you want it to be consuming 100MB of your memory for the other 23 hours and 50 minutes for no good reason?

Anyway, I have a small set of VM patches which attempt to improve this sort of behaviour if anyone is brave enough to try them. Against -mm kernels only I'm afraid (the objrmap work causes some porting difficulty).
From: Wakko Warner [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 20:50:59 -0400

> I don't know. What if you have some huge application that only
> runs once per day for 10 minutes? Do you want it to be consuming
> 100MB of your memory for the other 23 hours and 50 minutes for
> no good reason?

I keep soffice open all the time. The box in question has 512mb of ram. This is one app, even though I use it infrequently, would prefer that it never be swapped out. Mainly when I want to use it, I *WANT* it now (ie not waiting for it to come back from swap)

This is just my opinion. I personally feel that cache should use available memory, not already used memory (swapping apps out for more cache).

--
Lab tests show that use of micro$oft causes cancer in lab animals
From: Jeff Garzik [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 20:53:05 -0400

Wakko Warner wrote:
> This is just my opinion. I personally feel that cache should use available
> memory, not already used memory (swapping apps out for more cache).

Strongly agreed, though there are pathological cases that prevent this from being something that's easy to implement on a global basis.

Jeff
From: Brett E. [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 17:49:43 -0700

Jeff Garzik wrote:
> There IMO should be some way to balance the amount of anon-vma's such
> that the sysadmin can say "stop taking 70% of my box's memory for
> disposable cache, use it instead for apps you would otherwise swap out,
> you memory-hungry kernel you."

Or how about "Use ALL the cache you want Mr. Kernel. But when I want more physical memory pages, just reap cache pages and only swap out when the cache is down to a certain size (configurable, say 100megs or something)."
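Brett's proposal amounts to a cache floor: satisfy memory demand from page cache first, and swap application memory only once the cache has shrunk to a configured minimum. The thread suggests no such knob existed in 2.6.5 (Andrew points to swappiness instead). The toy model below, with made-up names and megabyte granularity, only illustrates the proposed policy; it is not how the kernel reclaims memory.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    cache_mb: int  # clean page cache, cheap to drop
    anon_mb: int   # application memory, freed only by swapping it out


def reclaim(mem: Memory, need_mb: int, cache_floor_mb: int = 100) -> tuple:
    """Free need_mb under the proposed (hypothetical) cache-floor policy.

    Cache above cache_floor_mb is dropped first; only the remainder is
    swapped. Returns (cache_dropped_mb, swapped_mb); their sum may fall
    short of need_mb if both pools run out.
    """
    from_cache = min(need_mb, max(mem.cache_mb - cache_floor_mb, 0))
    mem.cache_mb -= from_cache
    from_swap = min(need_mb - from_cache, mem.anon_mb)
    mem.anon_mb -= from_swap
    return from_cache, from_swap
```

With 500 MB of cache, 400 MB of application memory and the 100 MB floor, a 300 MB demand is met entirely from cache; a further 150 MB demand drops the remaining 100 MB above the floor and swaps 50 MB. Unlike swappiness, the decision here depends only on the size of the cache, not on how much trouble reclaim is having.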
From: Andrew Morton [5] [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 18:00:38 -0700

"Brett E." [email blocked] wrote:
>
> Or how about "Use ALL the cache you want Mr. Kernel. But when I want
> more physical memory pages, just reap cache pages and only swap out when
> the cache is down to a certain size (configurable, say 100megs or
> something)."

Have you tried decreasing /proc/sys/vm/swappiness? That's what it is for.

My point is that decreasing the tendency of the kernel to swap stuff out is wrong. You really don't want hundreds of megabytes of BloatyApp's untouched memory floating about in the machine. Get it out on the disk, use the memory for something useful.
From: Jeff Garzik [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 21:24:45 -0400

Andrew Morton wrote:
> Have you tried decreasing /proc/sys/vm/swappiness? That's what it is for.
>
> My point is that decreasing the tendency of the kernel to swap stuff out is
> wrong. You really don't want hundreds of megabytes of BloatyApp's
> untouched memory floating about in the machine. Get it out on the disk,
> use the memory for something useful.

Well, if it's truly untouched, then it never needs to be allocated a page or swapped out at all... just accounted for (overcommit on/off, etc. here)

But I assume you are not talking about that, but instead talking about _rarely_ used pages, that were filled with some amount of data at some point in time. These are at the heart of the thread (or my point, at least) -- BloatyApp may be Oracle with a huge cache of its own, for which swapping out may be a huge mistake. Or Mozilla. After some amount of disk IO on my 512MB machine, Mozilla would be swapped out... when I had only been typing an email minutes before.

BloatyApp? yes. Should it have been swapped out? Absolutely not. The 'SIZE' in top was only 160M and there were no other major apps running. Applications are increasingly playing second fiddle to cache ;-(

Regardless of /proc/sys/vm/swappiness, I think it's a valid concern of sysadmins who request "hard cache limit", because they are seeing pathological behavior such that apps get swapped out when cache is over 50% of all available memory.

Jeff
From: Andrew Morton [6] [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 18:40:08 -0700

Jeff Garzik [email blocked] wrote:
>
> Well, if it's truly untouched, then it never needs to be allocated a
> page or swapped out at all... just accounted for (overcommit on/off,
> etc. here)
>
> But I assume you are not talking about that, but instead talking about
> _rarely_ used pages, that were filled with some amount of data at some
> point in time.

Of course. My fairly modest desktop here stabilises at about 300 megs swapped out, with negligible swapin. That's all just crap which apps aren't using any more. Getting that memory out on disk, relatively freely, is an important optimisation.

> These are at the heart of the thread (or my point, at
> least) -- BloatyApp may be Oracle with a huge cache of its own, for
> which swapping out may be a huge mistake. Or Mozilla. After some
> amount of disk IO on my 512MB machine, Mozilla would be swapped out...
> when I had only been typing an email minutes before.

OK, so it takes four seconds to swap mozilla back in, and you noticed it.

Did you notice that those three kernel builds you just did ran in twenty seconds less time because they had more cache available? Nope.

> Regardless of /proc/sys/vm/swappiness, I think it's a valid concern of
> sysadmins who request "hard cache limit", because they are seeing
> pathological behavior such that apps get swapped out when cache is over
> 50% of all available memory.

We should be sceptical of this. If they can provide *numbers* then fine. Otherwise, the subjective "oh gee, that took a long time" seat-of-the-pants stuff does not impress.

If they want to feel better about it then sure, set swappiness to zero and live with less cache for the things which need it...

Let me point out that the kernel right now, with default swappiness, very much tends to reclaim cache rather than swapping stuff out. The top-of-thread report was incorrect, due to a misreading of kernel instrumentation.
From: Rik van Riel [7] [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 21:47:45 -0400 (EDT)

On Wed, 28 Apr 2004, Andrew Morton wrote:

> OK, so it takes four seconds to swap mozilla back in, and you noticed it.
>
> Did you notice that those three kernel builds you just did ran in twenty
> seconds less time because they had more cache available? Nope.

That's exactly why desktops should be optimised to give the best performance where the user notices it most...

--
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan
From: Rik van Riel [8] [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 21:46:35 -0400 (EDT)

On Wed, 28 Apr 2004, Andrew Morton wrote:

> You really don't want hundreds of megabytes of BloatyApp's untouched
> memory floating about in the machine.

But people do. The point here is LATENCY, when a user comes back from lunch and continues typing in OpenOffice, his system should behave just like he left it.

Making the user have very bad interactivity for the first minute or so is a Bad Thing, even if the computer did run more efficiently while the user wasn't around to notice...

IMHO, the VM on a desktop system really should be optimised to have the best interactive behaviour, meaning decent latency when switching applications.
From: Andrew Morton [9] [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 18:57:20 -0700

Rik van Riel [email blocked] wrote:
>
> IMHO, the VM on a desktop system really should be optimised to
> have the best interactive behaviour, meaning decent latency
> when switching applications.

I'm gonna stick my fingers in my ears and sing "la la la" until people tell me "I set swappiness to zero and it didn't do what I wanted it to do".
From: Marc Singer [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 19:29:44 -0700

On Wed, Apr 28, 2004 at 06:57:20PM -0700, Andrew Morton wrote:
> Rik van Riel [email blocked] wrote:
> >
> > IMHO, the VM on a desktop system really should be optimised to
> > have the best interactive behaviour, meaning decent latency
> > when switching applications.
>
> I'm gonna stick my fingers in my ears and sing "la la la" until people tell
> me "I set swappiness to zero and it didn't do what I wanted it to do".

It does, but it's a bit too coarse of a solution. It just means that the page cache always loses.
From: Andrew Morton [10] [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 19:35:41 -0700

Marc Singer [email blocked] wrote:
>
> It does, but it's a bit too coarse of a solution. It just means that
> the page cache always loses.

That's what people have been asking for. What are you suggesting should happen instead?
From: Marc Singer [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 20:10:59 -0700

On Wed, Apr 28, 2004 at 07:35:41PM -0700, Andrew Morton wrote:
>
> That's what people have been asking for. What are you suggesting should
> happen instead?

I'm thinking that the problem is that the page cache is greedier than most people expect. For example, if I could hold the page cache to be under a specific size, then I could do some performance measurements. E.g, compile kernel with a 768K page cache, 512K, 256K and 128K. On a machine with loads of RAM, where's the optimal page cache size?
From: Andrew Morton [11] [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 20:19:24 -0700

Marc Singer [email blocked] wrote:
>
> > That's what people have been asking for. What are you suggesting should
> > happen instead?
>
> I'm thinking that the problem is that the page cache is greedier than
> most people expect. For example, if I could hold the page cache to be
> under a specific size, then I could do some performance measurements.
> E.g, compile kernel with a 768K page cache, 512K, 256K and 128K. On a
> machine with loads of RAM, where's the optimal page cache size?

Nope, there's no point in leaving free memory floating about when the kernel can and will reclaim clean pagecache on demand.

What you discuss above is just an implementation detail. Forget it. What are the requirements? Thus far I've seen

a) updatedb causes cache reclaim

b) updatedb causes swapout

c) prefer that openoffice/mozilla not get paged out when there's heavy pagecache demand.

For a) we don't really have a solution. Some have been proposed but they could have serious downsides.

For b) and c) we can tune the pageout-vs-cache reclaim tendency with /proc/sys/vm/swappiness, only nobody seems to know that.

What else is there?
From: Marc Singer [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 21:13:03 -0700

On Wed, Apr 28, 2004 at 08:19:24PM -0700, Andrew Morton wrote:
> Marc Singer [email blocked] wrote:
> >
> > > That's what people have been asking for. What are you suggesting should
> > > happen instead?
> >
> > I'm thinking that the problem is that the page cache is greedier than
> > most people expect. For example, if I could hold the page cache to be
> > under a specific size, then I could do some performance measurements.
> > E.g, compile kernel with a 768K page cache, 512K, 256K and 128K. On a
> > machine with loads of RAM, where's the optimal page cache size?
>
> Nope, there's no point in leaving free memory floating about when the
> kernel can and will reclaim clean pagecache on demand.

It could work differently from that. For example, if we had 500M total, we map 200M, then we do 400M of IO. Perhaps we'd like to be able to say that a 400M page cache is too big. The problem isn't about reclaiming pagecache, it's about the cost of swapping pages back in. The page cache can tend to favor swapping mapped pages over reclaiming its own pages that are less likely to be used. Of course, it doesn't know that...which is the rub.

If I thought I had a method for doing this, I'd write code to try it out.

> What you discuss above is just an implementation detail. Forget it. What
> are the requirements? Thus far I've seen

The requirement is that we'd like to see pages aged more gracefully. A mapped page that is used continuously for ten minutes and then left to idle for 10 minutes is more valuable than an IO page that was read once and then not used for ten minutes. As the mapped page ages, its value decays.

> a) updatedb causes cache reclaim
>
> b) updatedb causes swapout
>
> c) prefer that openoffice/mozilla not get paged out when there's heavy
> pagecache demand.
>
> For a) we don't really have a solution. Some have been proposed but they
> could have serious downsides.
>
> For b) and c) we can tune the pageout-vs-cache reclaim tendency with
> /proc/sys/vm/swappiness, only nobody seems to know that.

I've read the source for where swappiness comes into play. Yet I cannot make a statement about what it means. Can you?
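Marc's requirement can be phrased as a recency-weighted access score. One common way to express "value decays as the page ages" is an exponentially decayed access count; the sketch below is a hypothetical illustration of that idea (the function and its half-life parameter are invented here), not something the 2.6 VM implements. It compares Marc's two pages, with time measured in minutes.

```python
def decayed_score(access_times, now, half_life=5.0):
    """Sum over accesses, each weighted by 0.5 ** (age / half_life)."""
    return sum(0.5 ** ((now - t) / half_life) for t in access_times)


# Marc's example, in minutes: the mapped page is used every minute for ten
# minutes and then idles for ten; the I/O page is read once at minute 10.
mapped_page = range(0, 11)  # accesses at t = 0, 1, ..., 10
io_page = [10]
now = 20
```

With a five-minute half-life the mapped page scores roughly 1.5 at minute 20 against 0.25 for the I/O page, and its score keeps shrinking while it idles, so a freshly read page eventually outranks it. Keeping per-page access history like this is the cost Andrew alludes to in his reply: the 2.6 VM approximates age with a few flag bits and LRU positions instead.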
From: Andrew Morton [12] [email blocked]
Subject: Re: ~500 megs cached yet 2.6.5 goes into swap hell
Date: Wed, 28 Apr 2004 21:33:59 -0700

Marc Singer [email blocked] wrote:
>
> It could work differently from that. For example, if we had 500M
> total, we map 200M, then we do 400M of IO. Perhaps we'd like to be
> able to say that a 400M page cache is too big.

Try it - you'll find that the system will leave all of your 200M of mapped memory in place. You'll be left with 300M of pagecache from that I/O activity. There may be a small amount of unmapping activity if the I/O is a write, or if the system has a small highmem zone. Maybe.

Beware that both ARM and NFS seem to be doing odd things, so try it on a PC+disk first ;)

> The problem isn't
> about reclaiming pagecache, it's about the cost of swapping pages back
> in. The page cache can tend to favor swapping mapped pages over
> reclaiming its own pages that are less likely to be used. Of course,
> it doesn't know that...which is the rub.

No, the system will only start to unmap pages if reclaim of unmapped pagecache is getting into difficulty. The threshold of "getting into difficulty" is controlled by /proc/sys/vm/swappiness.

> The requirement is that we'd like to see pages aged more gracefully.
> A mapped page that is used continuously for ten minutes and then left
> to idle for 10 minutes is more valuable than an IO page that was read
> once and then not used for ten minutes. As the mapped page ages, its
> value decays.

yes, remembering aging info over that period of time is hard. We only have six levels of aging: referenced+active, unreferenced+active, referenced+inactive, unreferenced+inactive, plus position-on-lru*2.

> I've read the source for where swappiness comes into play. Yet I
> cannot make a statement about what it means. Can you?

It controls the level of page reclaim distress at which we decide to start reclaiming mapped pages.

We prefer to reclaim pagecache, but we have to start swapping at *some* level of reclaim failure. swappiness sets that level, in rather vague units.

It might make sense to recast swappiness in terms of pages_reclaimed/pages_scanned, which is the real metric of page reclaim distress. But that would only affect the meaning of the actual number - it wouldn't change the tunable's effect on the system.
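Andrew's answer to Marc can be made more concrete. In 2.6 kernels of this period, the decision is made in refill_inactive_zone() in mm/vmscan.c by adding three terms into a "swap tendency" (this description is from our reading of that source; check the tree for a specific kernel version before relying on details). distress is derived from the reclaim scan priority, which starts at 12 (DEF_PRIORITY) and falls toward 0 as reclaim struggles; mapped_ratio is the percentage of memory mapped into processes. Mapped pages become reclaim candidates once the sum reaches 100. The Python transcription below mirrors that integer arithmetic; the function name is ours.

```python
DEF_PRIORITY = 12  # reclaim scans start at this priority and lower it under pressure


def should_reclaim_mapped(prev_priority: int, mapped_pages: int,
                          total_pages: int, swappiness: int) -> bool:
    """Decide whether to start reclaiming (unmapping/swapping) mapped pages.

    Modeled on the 2.6-era swap_tendency heuristic; all arithmetic is integer.
    """
    # 0 at DEF_PRIORITY (no trouble), 100 at priority 0 (great trouble).
    distress = 100 >> prev_priority
    # Percentage of memory mapped into processes; only half of it counts.
    mapped_ratio = (mapped_pages * 100) // total_pages
    swap_tendency = mapped_ratio // 2 + distress + swappiness
    return swap_tendency >= 100
```

With the default swappiness of 60 and 40% of memory mapped, the sum is 80 with no distress, so mapped pages are left alone until the scan priority falls to 2 (100 >> 2 = 25). swappiness=100 reaches the threshold unconditionally, which is why Andrew's desktops swap so readily, while swappiness=0 still permits swapping once distress and mapped memory together reach 100: the "*some* level of reclaim failure" Andrew describes.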
Related Links:
- Archive of above thread [13]
- KernelTrap interview with Andrew Morton [14]
- KernelTrap interview with Nick Piggin [15]
- KernelTrap interview with Rik van Riel [16]
- Slashdot discussion [17]