> Rik van Riel (riel@redhat.com) wrote:
>> On 11/01/2010 03:43 PM, Mandeep Singh Baines wrote:
>>
>> >Yes, this prevents you from reclaiming the active list all at once. But if the
>> >memory pressure doesn't go away, you'll start to reclaim the active list
>> >little by little. First you'll empty the inactive list, and then you'll
>> >start scanning the active list and moving pages from active to inactive.
>> >The problem is that there is no minimum time limit on how long a page will
>> >sit in the inactive list before it is reclaimed. It just depends on the
>> >scan rate, which does not depend on time.
>> >
>> >In my experiments, I saw the active list get smaller and smaller over
>> >time until eventually it was only a few MB, at which point the system
>> >came grinding to a halt due to thrashing.
>>
>> I believe that changing the active/inactive ratio has other
>> potential thrashing issues. Specifically, when the inactive
>> list is too small, pages may not stick around long enough to
>> be accessed multiple times and get promoted to the active
>> list, even when they are in active use.
>>
>> I prefer a more flexible solution, that automatically does
>> the right thing.
>>
>> The problem you see is that the file list gets reclaimed
>> very quickly, even when it is already very small.
>>
>> I wonder if a possible solution would be to limit how fast
>> file pages get reclaimed, when the page cache is very small.
>> Say, inactive_file * active_file < 2 * zone->pages_high ?
>>
>> At that point, maybe we could slow down the reclaiming of
>> page cache pages to be significantly slower than they can
>> be refilled by the disk. Maybe 100 pages a second - that
>> can be refilled even by an actual spinning metal disk
>> without even the use of readahead.
>>
>> That can be rounded up to one batch of SWAP_CLUSTER_MAX
>> file pages every 1/4 second, when the number of page cache
>> pages is very low.
>>
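
To make the proposal above concrete, here is a minimal userspace sketch of
the throttle, not kernel code. Everything in it except SWAP_CLUSTER_MAX
(32 in the kernel) is hypothetical: struct zone_sketch, its fields, and
file_reclaim_quota() are names made up for illustration. It also assumes
the threshold was meant as the sum inactive_file + active_file, which is my
reading rather than what was written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as the kernel's reclaim batch size. */
#define SWAP_CLUSTER_MAX 32UL
/* Hypothetical: minimum gap between throttled batches (1/4 second). */
#define FILE_RECLAIM_INTERVAL_MS 250UL

/* Hypothetical per-zone state, modelled in userspace for illustration. */
struct zone_sketch {
    unsigned long inactive_file;   /* pages on the inactive file list */
    unsigned long active_file;     /* pages on the active file list */
    unsigned long pages_high;      /* high watermark, in pages */
    unsigned long last_reclaim_ms; /* time of the last throttled batch */
    bool reclaimed_before;         /* false until the first throttled batch */
};

/* Assumption: the threshold compares the SUM of the two file lists. */
static bool page_cache_is_low(const struct zone_sketch *z)
{
    assert(z != NULL);
    return z->inactive_file + z->active_file < 2 * z->pages_high;
}

/*
 * Return how many file pages may be reclaimed right now, given that the
 * caller wants to reclaim `want` pages. When the page cache is low, allow
 * at most one SWAP_CLUSTER_MAX batch per FILE_RECLAIM_INTERVAL_MS.
 */
static unsigned long file_reclaim_quota(struct zone_sketch *z,
                                        unsigned long now_ms,
                                        unsigned long want)
{
    if (!page_cache_is_low(z))
        return want; /* plenty of cache: no throttling */

    if (z->reclaimed_before &&
        now_ms - z->last_reclaim_ms < FILE_RECLAIM_INTERVAL_MS)
        return 0; /* quota for this interval already used */

    z->last_reclaim_ms = now_ms;
    z->reclaimed_before = true;
    return want < SWAP_CLUSTER_MAX ? want : SWAP_CLUSTER_MAX;
}
```

With these numbers the throttled rate is 32 pages per 250 ms, i.e. 128
pages a second, which matches "100 pages a second, rounded up to one batch
every 1/4 second". What the caller does when the quota comes back as 0 is
exactly the OOM-killer question raised below.
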
>> This way HPC and virtual machine hosting nodes can still
>> get rid of totally unused page cache, but on any system
>> that actually uses page cache, some minimal amount of
>> cache will be protected under heavy memory pressure.
>>
>> Does this sound like a reasonable approach?
>>
>> I realize the threshold may have to be tweaked...
>>
>> The big question is, how do we integrate this with the
>> OOM killer? Do we pretend we are out of memory when
>> we've hit our file cache eviction quota and kill something?
>>
>> Would there be any downsides to this approach?
>>
>> Are there any volunteers for implementing this idea?
>> (Maybe someone who needs the feature?)
>>
>
> I've created a patch which takes a slightly different approach.
> Instead of limiting how fast pages get reclaimed, the patch limits
> how fast the active list gets scanned. This should result in the
> active list being a better measure of the working set. I've seen
> fairly good results with this patch and a scan interval of 1
> centisecond. I see no thrashing when the scan interval is non-zero.
>
> I've made it a tunable because I don't know what to set the scan
> interval to. The final patch could set the value based on HZ and some
> other system parameters. Maybe relate it to sched_period?
>
> ---
>
> [PATCH] vmscan: add a configurable scan interval
>
> On ChromiumOS, we see a lot of thrashing under low memory. We do not
> use swap, so the mm system can only free file-backed pages. Eventually,
> we are left with very few file-backed pages (a few MB) and the
> system becomes unresponsive due to thrashing.
>
> Our preference is for the system to OOM instead of becoming unresponsive.
>
> This patch creates a tunable, vmscan_interval_centisecs, for controlling
> the minimum interval between active list scans. At 0, I see the same
> thrashing. At 1, I see no thrashing. The mm system does a good job
> of protecting the working set. If a page has been referenced in the
> last vmscan_interval_centisecs centiseconds, it is kept in memory.
>
> Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
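
Since the diff itself is not quoted here, the following is only a
userspace sketch of how I read the description: a gate that refuses to
start a new active-list scan until the configured interval has passed.
struct active_scan_state and may_scan_active_list() are hypothetical names,
and time is passed in as plain centiseconds. An in-kernel version would
presumably compare against jiffies instead, but that is an assumption on my
part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state for the scan-interval gate. */
struct active_scan_state {
    unsigned long interval_cs;  /* models vmscan_interval_centisecs */
    unsigned long last_scan_cs; /* time of the last permitted scan */
    bool scanned_before;        /* false until the first scan */
};

/*
 * Return true if the active list may be scanned at time now_cs.
 * An interval of 0 disables the gate, so every scan is permitted.
 */
static bool may_scan_active_list(struct active_scan_state *s,
                                 unsigned long now_cs)
{
    assert(s != NULL);

    if (s->interval_cs != 0 && s->scanned_before &&
        now_cs - s->last_scan_cs < s->interval_cs)
        return false; /* too soon since the last scan */

    s->last_scan_cs = now_cs;
    s->scanned_before = true;
    return true;
}
```

In this model, a page referenced at any point between two permitted scans
still has its referenced bit set when the next scan happens, which is how I
understand the claim that pages referenced within the last interval are
kept in memory.
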