Re: ~500 megs cached yet 2.6.5 goes into swap hell

Submitted by kmerley
on June 22, 2005 - 6:24pm

In April 2004 this issue was discussed, and some record of it is at:

http://lwn.net/Articles/83593/

Well, part of the explanation may lie in what meminfo and free consider "cached." I made four 61 MB ramdisks (the default size in this distro) and filled them to over 98% of capacity with regular files. The machine has 328 MB of main RAM, and these ramdisks took up about 240 MB of it, leaving at most about 100 MB for regular RAM usage. Yet meminfo and free both report over 240 MB cached, even though there is less than 100 MB in which any reclaimable cache could live, and a lot of that memory is in use by the operating system and KDE.

More Details:

Here is what I just did. I made four 61 MB ramdisks on my system, which has 328 MB of RAM. I filled all the ramdisks to 98% of capacity, so I know that at least 98% of the ramdisk area is populated with .jpg files. That means at least 230 MB of my 328 MB of RAM is taken up by actual stored files, which leaves at most 328 - 230 = 98 MB for regular RAM use. Despite this, free is telling me I have 254 MB cached, 8 MB in buffers and 4 MB free. Meminfo says I have 271413248 bytes in cache (258 MB).

So either these tools use a different definition of "cached" than what people think it should mean, or the figure is just wrong. With at least 230 MB out of 328 MB of RAM locked in ramdisks, there cannot be 258 MB or so of what we think of as cache, that is, memory that could be reclaimed and used for new applications, etc. Only 98 MB is not in ramdisks (remember, the ramdisks are purposely filled to over 98% of capacity with actual files just to make sure the memory is locked up and can't be used for caches and buffers), and there is no way to fit 258 MB of cached data into 98 MB of RAM.
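The arithmetic behind this objection can be written out as a small sketch. It uses only the figures quoted in the post; the function and variable names are made up for illustration.

```python
# Figures quoted in the post, all in MB.
TOTAL_RAM_MB = 328
RAMDISK_FILES_MB = 230      # lower bound: four 61 MB ramdisks, each about 98% full
CACHED_REPORTED_MB = 258    # 271413248 bytes, as reported by /proc/meminfo


def room_outside_ramdisks(total_mb, pinned_mb):
    """Upper bound on the RAM left for everything that is not ramdisk data."""
    return total_mb - pinned_mb


room = room_outside_ramdisks(TOTAL_RAM_MB, RAMDISK_FILES_MB)
overshoot = CACHED_REPORTED_MB - room

print(f"RAM outside the ramdisks: at most {room} MB")
print(f"Reported cache exceeds that by {overshoot} MB")
```

If the reported cache were all reclaimable, the overshoot could not be positive, so at least that much of the "cached" figure has to be something else.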

So something is amiss here.

But this does explain how someone can think they have ~500 megs cached yet 2.6.x still goes into swap hell. I mean, meminfo and free both report the 500 Megs, but perhaps they really only have 10 MB cached and just 4 MB regular RAM free, and that would result in swap hell.

Below are the outputs of the df -h, free -mt, and cat /proc/meminfo commands on my system that has 4 approx 61 MB ramdisks 98% filled with regular files. Swap doesn't even enter into this equation.

-------------------------------------------------------------

linux:~ # df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/hda5        17G  3.1G   13G  20% /
/dev/hda1       9.8G  7.8G  2.1G  80% /windows/C
tmpfs           165M     0  165M   0% /dev/shm
/dev/ram1        61M   60M  1.4M  98% /sw1
/dev/ram2        61M   60M  1.4M  98% /sw2
/dev/ram3        61M   60M  1.4M  98% /sw3
/dev/ram4        61M   60M  1.4M  98% /sw4
linux:~ # free -mt
             total       used       free     shared    buffers     cached
Mem:           328        323          5          0          8        254
-/+ buffers/cache:         60        268
Swap:          511         21        489
Total:         840        345        495
linux:~ # cat /proc/meminfo
         total:     used:     free:   shared:  buffers:   cached:
Mem:  344920064 339152896   5767168         0   9150464 271413248
Swap: 536657920  22999040 513658880
MemTotal:       336836 kB
MemFree:          5632 kB
MemShared:           0 kB
Buffers:          8936 kB
Cached:         260328 kB
SwapCached:       4724 kB
Active:         237848 kB
Inactive:        73980 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       336836 kB
LowFree:          5632 kB
SwapTotal:      524080 kB
SwapFree:       501620 kB
BigFree:             0 kB

-------------------------------------------------------------------
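A side note on the 254 MB versus 258 MB figures: in this particular listing, the byte-denominated cached value matches Cached plus SwapCached exactly, while the figure from free -m matches Cached alone with the fraction dropped. The sketch below checks that with numbers copied from the listing. Treating this as the general definition of those fields is an assumption, and the helper name is made up for illustration.

```python
# Values copied from the /proc/meminfo listing in this post.
CACHED_BYTES = 271413248    # "cached:" column of the byte-denominated Mem: line
CACHED_KB = 260328          # Cached:
SWAP_CACHED_KB = 4724       # SwapCached:


def kb_to_whole_mb(kb):
    """Convert kB to MB, dropping the fraction."""
    return kb // 1024


byte_line_kb = CACHED_BYTES // 1024

print(kb_to_whole_mb(CACHED_KB))                     # figure shown by free -m
print(kb_to_whole_mb(byte_line_kb))                  # byte line, in whole MB
print(byte_line_kb == CACHED_KB + SWAP_CACHED_KB)    # do the two lines agree?
```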

The ramdisks show up as /dev/ramx.

Kim

Very Interesting

Anony_mous (not verified)
on June 25, 2005 - 9:52am

If this is true it explains a lot. It would explain what I've seen with VMWare.

simple explanation...

on June 25, 2005 - 11:05am

The type of ramdisk you are using lives in the page cache. When you write to a file, on a ram disk or a normal disk, the data is first inserted into the page cache. If the memory is needed and there is a backing store (e.g. a hard disk), the pages are written out. Ram disks don't have a backing store and simply leave their data in the page cache, i.e. in RAM (which is what you would expect). This way they don't waste memory when they are partly empty, the implementation is simple, and their contents can even be swapped out.

Obviously the contents of your ram disks (230M) are counted as cached (aka page cache) (258M), and everything is well.
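As a rough cross-check of this reading, the space used on the /dev/ram* filesystems can be summed from the df -h output in the original post and compared with the Cached figure from /proc/meminfo. This is only a sketch: the sample text is copied from the post, the function name is hypothetical, and df's "Used" column includes filesystem overhead, so the result is approximate. Note that df reports 60M used per ramdisk, or 240 MB in total, slightly more than the 230 MB lower bound quoted above.

```python
import re

# Lines copied from the df -h output in the original post.
DF_LINES = """\
/dev/hda5 17G 3.1G 13G 20% /
tmpfs 165M 0 165M 0% /dev/shm
/dev/ram1 61M 60M 1.4M 98% /sw1
/dev/ram2 61M 60M 1.4M 98% /sw2
/dev/ram3 61M 60M 1.4M 98% /sw3
/dev/ram4 61M 60M 1.4M 98% /sw4
"""
CACHED_KB = 260328  # "Cached:" line of /proc/meminfo in the original post


def ramdisk_used_mb(df_text):
    """Sum the Used column (third field) for every /dev/ramN line."""
    total = 0.0
    for line in df_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and re.fullmatch(r"/dev/ram\d+", fields[0]):
            used = fields[2]
            if not used.endswith("M"):
                raise ValueError(f"expected a size in M, got {used!r}")
            total += float(used[:-1])
    return total


ramdisk_mb = ramdisk_used_mb(DF_LINES)
cached_mb = CACHED_KB / 1024

print(f"ramdisk files:   {ramdisk_mb:.0f} MB")
print(f"reported cached: {cached_mb:.0f} MB")
print(f"cache not explained by ramdisks: about {cached_mb - ramdisk_mb:.0f} MB")
```

On these numbers, nearly all of the reported cache is accounted for by the ramdisk contents, which is consistent with the explanation above.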

If you _really_ want a disk-like ram disk (which wastes the memory not used by files and is not swapped out when memory is needed for better purposes), you have to use the mtdblock and slram (or phram) drivers and boot with a mem= kernel argument to prevent the kernel from using that memory for normal pages. But for perfectly normal cacheable RAM I never found a reason to do this; I only did it with uncacheable (slow) RAM (as swap space) or broken memory modules (to get a scratch disk for unimportant data).

The Problem Still Exists, Though

on June 27, 2005 - 11:58am

This comment from strcmp is very good and thoughtful. It explains some things.

It may be that "Obviously the contents of your ram disks (230M) are counted as cached (aka page cache) (258M), and everything is well."

Perhaps it is all working like it is supposed to work.

But this makes the output of meminfo and free very misleading. It makes people feel like they have a large amount of memory that can be reclaimed when it can't.

I would assume that a tool to find the amount of memory in caches would be able to differentiate between locked caches and caches that can be reclaimed. But these cannot do that apparently. So the usefulness of those tools is severely diminished. They tell you what is in cache, but not how much can be reclaimed, so the information of how many MB are in caches is of limited usefulness and may mislead. Apparently at least the author of the email entitled "~500 Megs in Caches Yet 2.6.5 Goes Into Swap Hell" didn't know of this behavior. And most people apparently assume that the amount shown to be in caches is at least mostly reclaimable. But a full ramdisk is not reclaimable.

So, that is partly how this "Is Swap Necessary" all got started, on the basis of what free and meminfo report. It may also be why people are reporting problems with VMWare even though free and meminfo report multiple hundreds of MB in caches. VMWare probably makes a lot of the memory it uses non-reclaimable cache.

Anyway, this explanation seems reasonable, and as long as people understand that what they get from free and meminfo is not really telling them about non-reclaimable cache, that would help a lot. Then a tool that does report on non-reclaimable cache would be an even larger help.

From "Linux Device Drivers" Rubini & Corbet

on June 28, 2005 - 4:25pm

On page 336, in Chapter 12: Loading Block Drivers:

"Every block passed to a driver's request function either lives in the buffer cache, or, on rare occasion, lives elsewhere but has been made to look as if it lived in the buffer cache.*"

then the footnote:

"* The RAM-disk driver, for example, makes its memory look as if it were in the buffer cache. Since the "disk" buffer is already in system RAM, there's no need to keep a copy in the buffer cache."

So the RAM-disk makes its memory look as if it were in buffer cache.

So free and meminfo believe that "as if it were" and report that we have at least as much buffer cache as is in the RAM-disk. It is not usable by other programs, it is not reclaimable, but it is "as if it were" buffer cache.

So, does VMWare make its memory appear as if it were buffer cache? Other large programs too?

So when free shows memory as free on its "-/+ buffers/cache" line, that may be true, but if you are using a ramdisk or, apparently, VMWare, it is not free at all. Most of the buffers/cache shown will not be usable, so forget it. You don't really have 500 Megs in cache that is reclaimable. A lot of that 500 MB is locked for use.

So the swapping was not the problem. When only 10 MB out of the 500 Megs is actually reclaimable page cache, and there is only 10 MB free and maybe 8 MB in buffers, you are in a world of low memory hurt. That is why it is swapping. That is what it has to do. Nothing is working incorrectly, but the results of free and meminfo are quite misleading.
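To make the point about the "-/+ buffers/cache" line concrete, here is a minimal sketch that recomputes that row from the /proc/meminfo values in the original post and then subtracts the ramdisk contents. It assumes free derives the row as used minus buffers and cached, and free plus buffers and cached, with the fraction dropped; that assumption reproduces the 60 and 268 in the posted output. The names are hypothetical.

```python
# Values in kB, copied from the /proc/meminfo listing in the original post.
MEMINFO_KB = {"MemTotal": 336836, "MemFree": 5632, "Buffers": 8936, "Cached": 260328}
RAMDISK_USED_MB = 240   # four ramdisks at 60M used each, per the df -h output


def buffers_cache_row(m):
    """Return (used, free) in whole MB, as on free's '-/+ buffers/cache' line."""
    used_kb = m["MemTotal"] - m["MemFree"] - m["Buffers"] - m["Cached"]
    free_kb = m["MemFree"] + m["Buffers"] + m["Cached"]
    return used_kb // 1024, free_kb // 1024


used_mb, free_mb = buffers_cache_row(MEMINFO_KB)
adjusted_free_mb = free_mb - RAMDISK_USED_MB

print(f"-/+ buffers/cache: used {used_mb} MB, free {free_mb} MB")
print(f"after subtracting ramdisk contents: about {adjusted_free_mb} MB")
```

The adjusted figure is still only an upper bound, since some of the remaining cache and buffers may not be reclaimable either.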

The True Amount of "Free" Memory?

Anonymous* (not verified)
on June 30, 2005 - 5:34pm

So, how can we tell how much "free" memory there is? How can we know how close we are to running out of memory by ways other than going into swap hell or getting OOMed?

Wow, this would explain OOMs when there is allegedly 300 MB in caches.

And now someone will probably say "everything is working the way it is supposed to." How about making it so we can see why there is swap hell or OOM? Instead of being told there are 300 MB in caches? That is how I would think it should be "supposed to" work.

I'm not aware of any way. Mem

AnonymousC (not verified)
on July 1, 2005 - 6:09am

I'm not aware of any way. Memory management has so many kinds of memory that it's almost forbidden to think about "free" memory, which you correctly quoted. The only kind of free memory is the totally unused memory: nothing is hurting if you eat into the totally unused region.

After that it's all tradeoffs, and something is always hurting by having its share of memory shrinking. Whether it's interesting to you depends on what kind of code you are running at that moment.

So yeah, I guess someone should come up with ways to classify workloads and then measure system metrics based on the workload's type to determine whether your Linux system is performing optimally.

Less Misleading meminfo/free

on July 5, 2005 - 2:58am

That could be the problem, that it is not easy to tell what is locked "cache", and what is cache that can be removed/reclaimed.

Perhaps it is too time expensive to differentiate these because the kernel has to test each allocation block to see what owns it, and if it can be released. It wouldn't be acceptable for free and meminfo to slow machine operation for 10 seconds or more just to give a more accurate picture of memory use.

I know that to reclaim pages this time-expensive process has to be done, so apparently it follows that providing this memory use type information to meminfo and free would require the same time-expensive process. Of course, if there were a small change so that the locked status is kept in an easy to read table, then meminfo and free could easily obtain that info. It might also make page reclamation faster and easier too.

Just looking at it from the outside, it seems like a ramdisk should not be showing up as being composed of cache. It is locked.

How to differentiate reclaimable?

Anonymous! (not verified)
on July 13, 2005 - 12:30pm

Well, how can we differentiate what is reclaimable? The kernel only does that when it absolutely needs the memory, and it only does it to the extent it needs to in order to get the needed memory.

Why isn't this better documented?

Anonymous* (not verified)
on August 12, 2005 - 12:03pm

Why isn't this better documented? I don't see this mentioned in the man pages. Did I miss it?
