Linux: VM Swappiness Autoregulation

Submitted by nimrod
on October 23, 2003 - 11:06am

Con Kolivas [interview] strikes again, this time with a patch that regulates the VM subsystem's "swappiness" on-the-fly, depending on the percent of RAM being used by applications (it does not take disk cache into account). Con explained the effects of this patch:

"This has the effect of preventing applications from being swapped out if the ram is filling up with cached data. Conversely, if many applications are in ram the swappiness increases which means the application currently in use gets to stay in physical ram while other less used applications are swapped out.

"For desktop enthusiasts this means if you are copying large files around like ISO images or leave your machine unattended for a while it will not swap out your applications. Conversely if the machine has a lot of applications currently loaded it will give the currently running applications preference and swap out the less used ones."

Swappiness is a kernel "knob" (located in /proc/sys/vm/swappiness) used to tweak how much the kernel favors swap over RAM; high swappiness means the kernel will swap out a lot, and low swappiness means the kernel will try not to use swap space.
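As a rough illustration of how a program might read this knob, here is a minimal userspace sketch in C. The function names are hypothetical, and the accepted range of 0 to 100 is taken from the vmscan.c comment quoted in the patch below; treat both as assumptions rather than a fixed interface.

```c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

/* Parse the text of the swappiness file.
 * Returns the value (0..100), or -1 if the text is not a valid setting. */
int parse_swappiness(const char *text)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || errno != 0)
        return -1;              /* no digits at all, or overflow */
    if (*end != '\0' && *end != '\n')
        return -1;              /* trailing garbage after the number */
    if (v < 0 || v > 100)
        return -1;              /* outside the 0..100 range assumed here */
    return (int)v;
}

/* Read the knob from `path` (normally "/proc/sys/vm/swappiness").
 * Returns the value, or -1 if the file cannot be read or parsed. */
int read_swappiness(const char *path)
{
    char buf[32];
    FILE *f = fopen(path, "r");
    int v = -1;

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) != NULL)
        v = parse_swappiness(buf);
    fclose(f);
    return v;
}
```

Calling read_swappiness("/proc/sys/vm/swappiness") on a Linux machine should return the current setting; the parser is kept separate so it can be checked without touching /proc.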

Update: Con posted an updated version of the patch.


From: Con Kolivas [email blocked]
To: linux-kernel
Subject: [PATCH] Autoregulate vm swappiness 2.6.0-test8
Date: 2003-10-23 13:37:50

The vm_swappiness dial in 2.6 was never quite the right setting without me 
constantly changing it depending on the workload. If I was copying large 
files or encoding video it was best at 0. If I was using lots of applications 
it was best much higher. Furthermore it depended on the amount of ram in the 
machine I was using. This patch was done just for fun a while back but it 
turned out to be quite effectual so I thought I'd make it available for the 
wider community to play with. Do whatever you like with it.

This patch autoregulates the vm_swappiness dial in 2.6 by making it equal to 
the percentage of physical ram consumed by application pages. 

This has the effect of preventing applications from being swapped out if the 
ram is filling up with cached data. 

Conversely, if many applications are in ram the swappiness increases which 
means the application currently in use gets to stay in physical ram while 
other less used applications are swapped out. 

For desktop enthusiasts this means if you are copying large files around like 
ISO images or leave your machine unattended for a while it will not swap out 
your applications. Conversely if the machine has a lot of applications 
currently loaded it will give the currently running applications preference 
and swap out the less used ones.

The performance effect on larger boxes seems to be either unchanged or slight 
improvement (1%) in database benchmarks.

The value in vm_swappiness is updated only when the vm is under pressure to 
swap and you can check the last vm_swappiness value under pressure by
cat /proc/sys/vm/swappiness

Manually setting the swappiness with this patch in situ has no effect. This 
patch has been heavily tested without noticeable harm. Note I am not sure of 
the best way to do this so it may look rather crude.

Patch against 2.6.0-test8

Con

--- linux-2.6.0-test8-base/mm/vmscan.c	2003-10-19 20:24:36.000000000 +1000
+++ linux-2.6.0-test8-am/mm/vmscan.c	2003-10-22 17:56:18.501329888 +1000
@@ -47,7 +47,7 @@
 /*
  * From 0 .. 100.  Higher means more swappy.
  */
-int vm_swappiness = 60;
+int vm_swappiness = 0;
 static long total_memory;
 
 #ifdef ARCH_HAS_PREFETCH
@@ -595,11 +595,13 @@ refill_inactive_zone(struct zone *zone, 
 	int pgmoved;
 	int pgdeactivate = 0;
 	int nr_pages = nr_pages_in;
+	int pg_size;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
 	LIST_HEAD(l_inactive);	/* Pages to go onto the inactive_list */
 	LIST_HEAD(l_active);	/* Pages to go onto the active_list */
 	struct page *page;
 	struct pagevec pvec;
+	struct sysinfo i;
 	int reclaim_mapped = 0;
 	long mapped_ratio;
 	long distress;
@@ -642,6 +644,16 @@ refill_inactive_zone(struct zone *zone, 
 	mapped_ratio = (ps->nr_mapped * 100) / total_memory;
 
 	/*
+	 * Autoregulate vm_swappiness to be application pages % -ck.
+	 */
+	si_meminfo(&i);
+	si_swapinfo(&i);
+	pg_size = get_page_cache_size() - i.bufferram ;
+	vm_swappiness = 100 - (((i.freeram + i.bufferram +
+		(pg_size - swapper_space.nrpages)) * 100) /
+		(i.totalram ? i.totalram : 1));
+
+	/*
 	 * Now decide how much we really want to unmap some pages.  The mapped
 	 * ratio is downgraded - just because there's a lot of mapped memory
 	 * doesn't necessarily mean that page reclaim isn't succeeding.



From: Martin J. Bligh [email blocked]
To: linux-kernel
Subject: Re: [PATCH] Autoregulate vm swappiness 2.6.0-test8
Date: 2003-10-23 14:42:34

> +	 * Autoregulate vm_swappiness to be application pages % -ck.
> +	 */
> +	si_meminfo(&i);
> +	si_swapinfo(&i);
> +	pg_size = get_page_cache_size() - i.bufferram ;
> +	vm_swappiness = 100 - (((i.freeram + i.bufferram +
> +		(pg_size - swapper_space.nrpages)) * 100) /
> +		(i.totalram ? i.totalram : 1));
> +
> +	/*

It seems that you don't need si_swapinfo here, do you? i.freeram,
i.bufferram, and i.totalram all come from meminfo, as far as I can
see? Maybe I'm missing a bit ...

M.



From: Con Kolivas [email blocked]
To: linux-kernel
Subject: Re: [PATCH] Autoregulate vm swappiness 2.6.0-test8
Date: 2003-10-23 15:03:19

On Friday 24 October 2003 00:42, Martin J. Bligh wrote:
> > +	 * Autoregulate vm_swappiness to be application pages % -ck.
> > +	 */
> > +	si_meminfo(&i);
> > +	si_swapinfo(&i);
> > +	pg_size = get_page_cache_size() - i.bufferram ;
> > +	vm_swappiness = 100 - (((i.freeram + i.bufferram +
> > +		(pg_size - swapper_space.nrpages)) * 100) /
> > +		(i.totalram ? i.totalram : 1));
> > +
> > +	/*
>
> It seems that you don't need si_swapinfo here, do you? i.freeram,
> i.bufferram, and i.totalram all come from meminfo, as far as I can
> see? Maybe I'm missing a bit ...

Well I did do it a while ago and it seems I got carried away adding and 
subtracting info indeed. :-) Here's a simpler patch that does the same thing.

Con


--- linux-2.6.0-test8-base/mm/vmscan.c	2003-10-19 20:24:36.000000000 +1000
+++ linux-2.6.0-test8-am/mm/vmscan.c	2003-10-24 00:46:52.000000000 +1000
@@ -47,7 +47,7 @@
 /*
  * From 0 .. 100.  Higher means more swappy.
  */
-int vm_swappiness = 60;
+int vm_swappiness = 0;
 static long total_memory;
 
 #ifdef ARCH_HAS_PREFETCH
@@ -600,6 +600,7 @@ refill_inactive_zone(struct zone *zone, 
 	LIST_HEAD(l_active);	/* Pages to go onto the active_list */
 	struct page *page;
 	struct pagevec pvec;
+	struct sysinfo i;
 	int reclaim_mapped = 0;
 	long mapped_ratio;
 	long distress;
@@ -642,6 +643,13 @@ refill_inactive_zone(struct zone *zone, 
 	mapped_ratio = (ps->nr_mapped * 100) / total_memory;
 
 	/*
+	 * Autoregulate vm_swappiness to be application pages% -ck
+	 */
+	si_meminfo(&i);
+	vm_swappiness = 100 - (((i.freeram + get_page_cache_size() -
+		swapper_space.nrpages) * 100) / (i.totalram ? i.totalram : 1));
+
+	/*
 	 * Now decide how much we really want to unmap some pages.  The mapped
 	 * ratio is downgraded - just because there's a lot of mapped memory
 	 * doesn't necessarily mean that page reclaim isn't succeeding.
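
To see why the second patch "does the same thing" as the first, the arithmetic can be lifted out of the kernel into a small userspace sketch. The function and parameter names below are illustrative stand-ins for i.totalram, i.freeram, i.bufferram, get_page_cache_size() and swapper_space.nrpages; all values are page counts, and the sample figures are made up.

```c
/* Userspace sketch of the arithmetic in the two patches (not kernel code). */

/* First posting: subtracts bufferram from the page cache size,
 * then adds bufferram back in the sum. */
long swappiness_v1(long totalram, long freeram, long bufferram,
                   long page_cache, long swap_cache)
{
    long pg_size = page_cache - bufferram;

    return 100 - ((freeram + bufferram + (pg_size - swap_cache)) * 100) /
                 (totalram ? totalram : 1);
}

/* Simplified repost: the two bufferram terms cancel, leaving
 * free pages plus page cache minus swap cache. */
long swappiness_v2(long totalram, long freeram,
                   long page_cache, long swap_cache)
{
    return 100 - ((freeram + page_cache - swap_cache) * 100) /
                 (totalram ? totalram : 1);
}
```

With 65536 total pages, 4096 free and 40960 in page cache (RAM mostly holding cached data), both versions give 32; with only 8192 pages of cache (RAM mostly holding applications) the result rises to 82. That matches the behaviour Con describes: low swappiness when cache dominates, high swappiness when applications do.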

Keep required tuning and user control

Anonymous
on
October 23, 2003 - 4:25pm

Just a thought:

What about making swappiness a linear combination of freeram, bufferram, etc, by replacing the single swappiness constant in /proc/... with coefficients such as:

freeram_coeff, bufferram_coeff, base_swappiness, min_swappiness, max_swappiness

so now
swappiness = freeram_coeff * i.freeram + bufferram_coeff * i.bufferram + base_swappiness
swappiness = MIN(swappiness, max_swappiness)
swappiness = MAX(swappiness, min_swappiness)
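
A minimal sketch of the clamped linear combination proposed in this comment might look like the following. The struct and function are hypothetical, and two simplifications are assumed: the inputs are percentages of RAM rather than raw page counts, and the coefficients are integers, since the kernel avoids floating point.

```c
/* Hypothetical tunables named after the coefficients in the comment. */
struct swappiness_tunables {
    long freeram_coeff;
    long bufferram_coeff;
    long base_swappiness;
    long min_swappiness;
    long max_swappiness;
};

/* Linear combination of free and buffer memory (given as percentages),
 * clamped into the range [min_swappiness, max_swappiness]. */
long linear_swappiness(const struct swappiness_tunables *t,
                       long freeram_pct, long bufferram_pct)
{
    long s = t->freeram_coeff * freeram_pct +
             t->bufferram_coeff * bufferram_pct +
             t->base_swappiness;

    if (s > t->max_swappiness)
        s = t->max_swappiness;  /* never exceed the ceiling */
    if (s < t->min_swappiness)
        s = t->min_swappiness;  /* never drop below the floor */
    return s;
}
```

With coefficients of -1 and -1, a base of 100 and limits of 10 and 90, a machine with 5% free and 10% buffers would get 85, while a completely full machine would be capped at 90.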

Tunables go against autoregulation

Con Kolivas
on
October 23, 2003 - 6:12pm

While people seem to love tunables, and I did consider putting a ceiling on the swappiness based on one, this goes against the whole point of this patch. Autoregulation should allow a properly designed feedback system to use the best settings for any scenario.

Tunables go against autoregulation?

Anonymous
on
October 25, 2003 - 5:27pm

The reason I suggested this was that I was wondering if:

Is it possible to make the same autoregulation work for server and desktop users?

I didn't imagine that people would play with these, but, for example, the distros would set up the coefficients when the user chooses either a server or workstation install.

Is it possible that one shoe fits all?

One variable sized shoe.

Con Kolivas
on
October 25, 2003 - 8:23pm

Yes, that is the point of autoregulation exactly. The shoe size is being modified on the fly according to the circumstances.

The knob is restricting you to one value which is only going to be correct at a certain workload for a certain hardware. Putting a "range of knobs" is only likely to make it impossible for the admin to get the setting right without intrinsic kernel knowledge, and Andrew Morton has basically created the useful range already (0 - 100). Once you have a range you need some way for the kernel to modify it; and this is what my patch does.

wow this sounds great!

Anonymous
on
October 23, 2003 - 5:38pm

Seems like a simple concept for a long-standing issue. I look forward to seeing it in 2.6....

The Linus Freeze

Anonymous
on
October 23, 2003 - 9:28pm

Would something like this get in now that there is the infamous "linus freeze" ?

Re: The Linus Freeze

nimrod
on
October 23, 2003 - 10:53pm

I think it's a sure bet that it will make it into the -mm tree. That aside, it's a simple patch (some 4 lines or so changed), so you probably won't have trouble applying it to newer Linus releases.

Impressive

Anonymous
on
October 24, 2003 - 3:05am

Even as it stands now, Linux 2.6 "feels" better than, say, Windows in the VM department, no comparison. Hell, everything feels better... I can shake a playing video window in Linux or Mac OS X and it continues to play, no dropped frames. Try that in Windows (same hardware as Linux, dual Athlon, reasonably beefy) and it stutters like a child reporting to the police what happened after he was mugged. There's no excuse for that. Linux 2.6, as of /right now/, is already - IMHO - more responsive than Windows in every way I care to note. So yeah, I'm happy. :-)

now for userspace?

Anonymous
on
October 24, 2003 - 3:14am

I think Linux's userspace could use some sharpening up. Just in general, no specific complaints. I'd rather have apps that are nicely polished with low load times and sharp response. I'll happily give up the kitchen sink ;)

Agreed

Anonymous
on
October 24, 2003 - 4:38am

Gnome and KDE are too heavy, for Christ's sake. When I first started using Linux back with Red Hat 5.1 it was considerably faster in most every way than Windows 98 on my lowly Pentium 133 w/ 48 megs of RAM. What happened? Outside of size/bloat I love Gnome, but someone needs to put it on a diet.

Gnome is on a diet

Cuboci
on
October 24, 2003 - 5:46am

I still remember what Gnome 1.x was like. Gnome 2.x has seen a tremendous performance boost imho. It's not that bad anymore.

er....

Anonymous
on
October 24, 2003 - 6:09am

Run it on older hardware and say that with a straight face. Most people hold onto their computers for longer than perhaps they should; only enthusiasts upgrade constantly.

*puts on a straight face*

Cuboci
on
October 26, 2003 - 12:28am

I run it on my Duron 800 (not exactly state of the art, is it?) and at home on my mum's PII400. The PII was mine and I ran Gnome 1.4 on it. And I really do think it was slower than 2.x. Take nautilus for example. In 1.4 it was almost unusable (performance-wise... the constant crashes were another matter), now it works quite well.

1.4

Kibble
on
October 26, 2003 - 1:41am

IMHO you should consider 1.4 minus nautilus, since it came in late and was shocking.

Re: Gnome is on a diet

catfeeder
on
October 25, 2003 - 3:15am

Really? I sort of preferred Gnome 1.x's lack of power-sucking eye candy. Granted, I turn that stuff off, but I don't remember Gnome on distros like Redhat 6.2 or Mandrake 6.x being bad at all on my Pentium II 233 of the time.

Swappiness ...

Anonymous
on
October 24, 2003 - 3:44am

Swappine-ess is a warm gun mama (ol' song)

:)))))

bang bang shoot shoot when i

Adam M Garrett (not verified)
on
January 4, 2008 - 2:26pm

bang bang shoot shoot

when i hold you in my arms and place my finger in your trigger

bang bang shoot shoot

excellent!

florin
on
October 24, 2003 - 10:56am

This has the effect of preventing applications from being swapped out if the ram is filling up with cached data.

Dude, this is awesome! For a long time, I thought I was crazy or something, or I was the only one in the world who noticed how wrong the current "swapping policy" is. The exact same behaviour that you mentioned above is very annoying for a desktop.

Thank you for fixing it!



Linus, we want this in the mainstream kernel!

not so bad in vanilla 2.6

Anonymous
on
October 24, 2003 - 12:09pm

I had this problem constantly with 2.4: every time I had watched a movie my RAM would be full of useless data (the movie) and practically all my apps were swapped out - absolutely crazy. However, when I switched to 2.6.0-test* that problem disappeared... this patch will probably make it even better though, I'm glad someone has noticed the problem...

O_STREAM

Anonymous
on
October 27, 2003 - 4:27am

That's why 2.6 has a new feature when opening files. If you open a file with O_STREAM the kernel doesn't cache the contents of the file, so it's not wasting your precious RAM on unimportant data.
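
Whether or not a flag by that name is available on a given kernel, POSIX defines posix_fadvise(), which lets a program express a similar intent: read a large file once without keeping it in the page cache. The sketch below is one possible way to use it; the function name is hypothetical, and the advice is only a hint that the kernel is free to ignore.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <fcntl.h>
#include <unistd.h>

/* Read a file sequentially and ask the kernel to drop each chunk from the
 * page cache once it has been consumed.
 * Returns the number of bytes read, or -1 on error. */
long stream_file_without_caching(const char *path)
{
    char buf[65536];
    off_t done = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    /* Hint that the whole file will be read sequentially. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* The range [done, done + n) is no longer needed in cache. */
        posix_fadvise(fd, done, n, POSIX_FADV_DONTNEED);
        done += n;
    }
    close(fd);
    return n < 0 ? -1 : (long)done;
}
```

A media player or file-copy tool could call something like this for large, read-once data so that application pages are less likely to be pushed out by it.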

I'll second that

namtro
on
October 24, 2003 - 7:37pm

Con, your work is impressive, but perhaps more impressive is that you're self taught (and humble). Personally, it serves as motivation for me. Thanks for your work on this and other issues.

Thanks, further refinement is sorely needed!

Anonymous
on
October 27, 2003 - 2:46am

Hi,

I have tried 2.6.0-test7 on a system with the following specs: P3/800, 128MB SDRAM, 3 SCSI HDDs (9.1G, 4G, 2.1G).

I had two Mozilla Firebirds, VNC server, SETI, sometimes OpenOffice, many, many terminals and a cross-compile Toolchain build and many more things running there.

Whatever I did, I had horrible swapping all the way (not really astonishing on such a setup), and whenever it swapped, system performance seemed to be soooooo much worse than what you're used to with the new great scheduling behaviour when having enough RAM available.
In other words: when the system had to swap, I sometimes had complete hangups of EVEN UP TO 15 SECONDS!! (yes, really!) when trying to do something.
"hangup" means that the system was basically unusable for such a long period of time, with no mouse pointer or keyboard input reaction whatsoever.

I think this is quite unacceptable, and something could be done to improve that (I wonder whether any of the main kernel developers are even still doing testing with <= 128MB RAM).

Somehow I've got the impression that either the swap process has a priority that's much too high, or (more likely) too many much too important user-visible parts (X11, IceWM, Mozilla, ...) got swapped out which took very long to get swapped back in.

Of what use is a high value of cache and buffer memory if I then have to wait for hours for my application pages to swap back in?

While I'm aware that 128MB for such daring tasks is ridiculous, I think that Linux should be able to do better than very visible 15 second hangups even on such a miserable setup.
(it's a bit ridiculous to have the whole system improved down to tiniest fractions of a femto-second latency in scheduling ;-), but when it comes to swapping, you easily reach 5 second delays and longer...)

I'm afraid I cannot provide hard evidence ("free" printout etc.) any more, since I had my system upgraded to 256MB memory.
Also, the 128MB stick is gone for now since it turned out to be slightly defective.
And who could blame me for upgrading anyway? I have some more important things to do other than continually waiting for the system to stop swapping... ;-)

All I remember is that 133MB used swap (of 256MB) remained pretty constant after some initial settling time (few hours) after bootup.

But OTOH maybe this setup actually IS crazy enough to account for 15 seconds delays ;-)
Anyway, I just suspect that more tweaking could be done, since most developers probably only use power machines, which the small Linux developer guy in a 3 man company (or so) often won't have access to...
As such, proper swapping optimization potentially needed by many lower-specs people might never get done due to the actual developers having too high-end boxes.

BTW: has anyone actually tried running 2.6.0-testX on a 386SX-16?? :-)
I'd bet it's not too funny any more...

Greetings,

Andreas Mohr

Limit memory

Con Kolivas
on
October 27, 2003 - 3:20am

Try the mem= parameter with your bootloader to emulate any degree of underpowered crapulence. Yes it's true that we need vm scheduling to tame what you describe... but that's for 2.8 now, and I happen to know at least one person who is working on it [not me!].

Compare this to windows2000

Magnus Sundberg
on
October 27, 2003 - 6:51am

I believe Linux has come quite far.
I use Win2000 as my primary desktop, you know, it has MS Office.

My first computer experience for the first six years out of college was Apollo/Domain, SunOS, Solaris and HPUX. I got accustomed to some bad habits, like never logging out, letting your applications run until they crash, running a tremendous amount of applications in parallel, well more or less just keep your computer going. I usually lock the computer with the screen-saver when I leave for the day.

I have continued to do this on my Windows boxes, where I quite often experience the whole machine freezing for much longer than 15 seconds. By the way, the machine is equipped with 256 MB of RAM.



Magnus

Bad habits?

Mr_Z
on
October 28, 2003 - 10:55am

Why do you call these bad habits? I find them quite reasonable. Why wait for a reboot? Why log out? Why wait for your desktop environment to do its thing, only to be followed by you reestablishing your "center"? It takes an awful lot of time to reestablish state in a working environment.

I have a laptop whose "hibernate" function I do not trust. (Corporate forces WinXP and no dual-boot on me, so don't suggest Linux patches to fix it.) I spend about 5 to 10 minutes getting "reestablished" whenever I bring my laptop up. Part of that is just waiting for Windows to finish bringing up all the corporate-mandated stuff (anti-virus, backup software, ZoneAlarm, etc...) and part is opening all the windows I want open. (Two or three command prompts, two or three PuTTY sessions to real computers, Mozilla, and on bad days, MS Word.) It's frustrating that after login, I have to wait a couple minutes before the desktop environment starts being responsive so I can even initiate this process. Then, once it's initiated, it still takes a couple minutes before I'm ready to go.

Linux and UNIX aren't much better. At least I'm not waiting for the reboot, but logging into most windowing environments still hits me with a noticeable time delay from login to actual work.

So, given that, I find staying logged in for months at a time to be entirely reasonable for a real computer. It's sad that even WinXP still requires the occasional hygienic reboot.

I think he may've been being

Anonymous (not verified)
on
August 29, 2006 - 8:06pm

I think he may've been being sarcastic there about the "bad habits".

Pre-swap idea

Anonymous
on
October 27, 2003 - 10:30am

First off, let me say I'm not a kernel developer, so if my idea is crazy, please let me know why.

The question/idea is: would some sort of pre-swap help improve response/performance under memory contention? I would define pre-swap as copying a page of a program's memory into a swap file, but not deleting it from the program's memory space.

If a process tries to update part of its memory that has been pre-swapped, it would page-fault, causing a system routine to unmark the block as pre-swapped and mark the swap file space taken by the block as free. The program would then be allowed to run again.

So if the system needs memory, blocks that have been pre-swapped are chosen first. The blocks are taken from their process, cleared, and given to the process that needs them. No disk activity is needed at this time. For the original owner process, the page would now be marked as paged out, and would have to be copied in from disk to be used again.

A pre-swap thread would decide when to pre-swap based on the following criteria: % of space already pre-swapped, is the disk spinning (laptop battery life), is the disk busy (try to use idle time).

Only under heavy memory contention would the system need to fall back to normal swapping.

Thanks
Adam Ashenfelter
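
The page states implied by the pre-swap proposal above can be sketched as a tiny state machine. This is a toy model with hypothetical names, not kernel code; it only tracks which pages would need a disk write at reclaim time.

```c
#include <stddef.h>

/* Hypothetical page states for the pre-swap proposal. */
enum page_state {
    PAGE_DIRTY,        /* in RAM only, no valid copy in swap          */
    PAGE_PRESWAPPED,   /* in RAM, and an identical copy is in swap    */
    PAGE_SWAPPED_OUT   /* in swap only, RAM frame has been given away */
};

/* Background pre-swap: copy the page to swap but keep it in RAM. */
void preswap_page(enum page_state *p)
{
    if (*p == PAGE_DIRTY)
        *p = PAGE_PRESWAPPED;   /* costs one disk write, ideally in idle time */
}

/* Write fault on a pre-swapped page: the swap copy is now stale. */
void write_fault(enum page_state *p)
{
    if (*p == PAGE_PRESWAPPED)
        *p = PAGE_DIRTY;        /* the swap slot would be freed here */
}

/* Reclaim one frame, preferring pre-swapped pages. Returns the number of
 * disk writes needed at reclaim time (0 or 1), or -1 if nothing in RAM
 * can be reclaimed. */
int reclaim_one(enum page_state *pages, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (pages[i] == PAGE_PRESWAPPED) {
            pages[i] = PAGE_SWAPPED_OUT;
            return 0;           /* no I/O: the copy is already on disk */
        }
    for (i = 0; i < n; i++)
        if (pages[i] == PAGE_DIRTY) {
            pages[i] = PAGE_SWAPPED_OUT;
            return 1;           /* normal swapping: write the page first */
        }
    return -1;
}
```

In this model a reclaim is free only while pre-swapped pages remain; once they run out, the system falls back to ordinary swapping, as the comment describes.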

Good idea?

Con Kolivas
on
October 27, 2003 - 1:34pm

It does this already under certain circumstances, so yes it is a good idea... ;-)

When?

Anonymous
on
October 27, 2003 - 5:27pm

So when does it do this?

I think the system does swap some memory out to have a little free space (top shows free space even though I have swap used), but that would be different, because if you need the pages that are swapped out you have to swap them in. With pre-swap, the pre-swapped pages would be instantly available with no performance degradation (as fast or nearly as fast as free pages).

We could pre-swap (my definition) a lot more space with little performance degradation. The only degradation would be from write operations to pre-swapped pages, which cause page faults. Also, pages brought in from swap on read operations would be marked as pre-swapped.

So for maximum performance on low memory systems, we would want to pre-swap all memory pages that aren't actively being written to (granted we have enough swap space). Very little time would have to be wasted writing swap under memory contention. If we have heavy memory contention, some pages that are being modified might have to be written out.

Thanks
Adam Ashenfelter

Swap/on ram

Con Kolivas
on
October 28, 2003 - 2:15am

No, it does swap out some pages and keep them in ram under pressure, but only reclaims the physical ram if it's actually needed, and then keeps only the reference to the swapped version (or vice versa). Doing it just for the sake of it is of questionable benefit.. what if you happen to start doing something important while it's wasting time writing to swap? This is never free activity as it cannot be interrupted cheaply and maintain low latency.

Would it depend on pre-swap thread?

Anonymous
on
October 28, 2003 - 7:20pm

I could see that being a problem if you were to send every pre-swap candidate to the drive at once. If instead the pre-swap thread were to schedule a few at a time and wait until completion, wouldn't the latency caused by this be minimal, and only related to access to the hard drives with the swap partitions? This would be a slower way, but since nothing is waiting on the pre-swap thread it will not matter.

So the pre-swap thread would use an algorithm similar to the following.

Wake up.
If the pre-swap level > Y, sleep and restart. # Y is the pre-swap level parameter
If no drive with free swap space has an empty queue, sleep and restart.
Pick X candidate pages for pre-swap. # X is a tunable parameter
Mark the pages as pre-swap in progress and read only, and send the pages to the drive.
Sleep until complete.
Mark pages that are still read only as pre-swapped.
Begin again.

The one thing I'm not sure about is using a page that is in the hard drive queue. Can a process access a memory page at the same time it is being copied to the hard drive? If it is possible, and a process needs to update the page, we would page-fault, reset the page as read/write, and remove the pre-swap flag. The bottom half of the pre-swap thread would notice the change and not mark the page as pre-swapped. The swapped page would be discarded from the swap file (marked unused etc., since we can't determine whether it's a good swap page).

Benefits (all when the system would normally use swap)
1. A process allocates more memory.
(standard) Process sleeps while pages are written to swap (hard drive speeds) + swap file accounting.
(pre-swap) Process sleeps while pages are allocated from pre-swap (CPU speeds).
2. Program loading
(standard) For every page read in, a page is written to swap: 2x the amount of drive activity + swap file accounting.
(pre-swap) Program is read into pre-swapped pages: same drive activity as loading without memory contention.
3. Swap in. A program requires a page.
(standard) Another page is first written to swap to make room. The page is then read from the drive (a read and a write for every swap in, and the swap structure is updated).
(pre-swap) Page is read in and marked as pre-swapped (a swap read, no swap file accounting).

On paper, pre-swap looks like it will generate 1/2 the swap file activity during memory contention (I get the feeling this would improve performance on machines with low memory). Memory pages that are not updated will remain in pre-swap until a process exits, so the pre-swap thread will have less work to do the older the processes are on the system.

Cons.

The system would be more likely to swap non-updating pages (program code)
Extra cpu pagefaults generated while updating pre-swapped pages.
The swap files might be used up faster than on a normal system (during high swap file use, pre-swapped pages would have to be thrown out of the swap file to make space).

Anyway, am I trivializing everything, and that is why it sounds like a good idea?
Thanks for taking the time to discuss it.
Adam Ashenfelter
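
The "half the swap activity" estimate in the benefits list above can be written out as simple arithmetic. The sketch below only tallies the comment's own assumptions (one read plus one write per standard swap-in, one read per pre-swap swap-in); the names are hypothetical and nothing here is a measurement.

```c
/* Disk operations counted at the moment of memory contention. */
struct io_count {
    long reads;
    long writes;
};

/* Standard swapping: each swap-in first writes a victim page out,
 * then reads the wanted page in. */
struct io_count standard_swap_ins(long n)
{
    struct io_count c = { n, n };
    return c;
}

/* Pre-swap: victims already have a copy on disk, so only the read
 * remains at contention time. */
struct io_count preswap_swap_ins(long n)
{
    struct io_count c = { n, 0 };
    return c;
}

/* Total operations, reads plus writes. */
long total_ops(struct io_count c)
{
    return c.reads + c.writes;
}
```

For 1000 swap-ins this gives 2000 operations against 1000. Note that the writes do not disappear in the pre-swap case; they are moved to the background thread, and that background cost is what the replies in this thread question.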

Writing is costly

Con Kolivas
on
October 28, 2003 - 7:31pm

No, writing to disk anything at any time no matter how small is costly if you have to interrupt it to do something important. Unless the machine is certain nothing is going to happen when it tries to write then there is no way of guaranteeing that this will be for free. If you had a spare cpu and a spare drive and spare bus all dedicated to this one task... anyway you get the idea.
