"The current VM can get itself into trouble fairly easily on systems with a small ZONE_HIGHMEM, which is common on i686 computers with 1GB of memory," Rik van Riel said explaining a small patch to cmscan.c. He continued, "on one side, page_alloc() will allocate down to zone->pages_low, while on the other side, kswapd() and balance_pgdat() will try to free memory from every zone, until every zone has more free pages than zone->pages_high." He noted that highmem could be filled up with "page tables, ramfs, vmalloc allocations and other unswappable things quite easily and without many bad side effects, since we still have a huge ZONE_NORMAL to do future allocations from. However, as long as the number of free pages in the highmem zone is below zone->pages_high, kswapd will continue swapping things out from ZONE_NORMAL, too! Sami Farin managed to get his system into a stage where kswapd had freed about 700MB of low memory and was still 'going strong'." He described his patch:
"The attached patch will make kswapd stop paging out data from zones when there is more than enough memory free. We do go above zone->pages_high in order to keep pressure between zones equal in normal circumstances, but the patch should prevent the kind of excesses that made Sami's computer totally unusable."
From: Rik van Riel [email blocked]
Subject: [PATCH] prevent kswapd from freeing excessive amounts of lowmem
Date: Wed, 05 Sep 2007 19:01:25 -0400

The current VM can get itself into trouble fairly easily on systems
with a small ZONE_HIGHMEM, which is common on i686 computers with
1GB of memory.

On one side, page_alloc() will allocate down to zone->pages_low,
while on the other side, kswapd() and balance_pgdat() will try
to free memory from every zone, until every zone has more free
pages than zone->pages_high.

Highmem can be filled up to zone->pages_low with page tables,
ramfs, vmalloc allocations and other unswappable things quite
easily and without many bad side effects, since we still have
a huge ZONE_NORMAL to do future allocations from.

However, as long as the number of free pages in the highmem
zone is below zone->pages_high, kswapd will continue swapping
things out from ZONE_NORMAL, too!

Sami Farin managed to get his system into a stage where kswapd
had freed about 700MB of low memory and was still "going strong".

The attached patch will make kswapd stop paging out data from
zones when there is more than enough memory free. We do go above
zone->pages_high in order to keep pressure between zones equal
in normal circumstances, but the patch should prevent the kind
of excesses that made Sami's computer totally unusable.

Please merge this into -mm.

Signed-off-by: Rik van Riel [email blocked]

--- linux-2.6.22.noarch/mm/vmscan.c.excessive 2007-09-05 12:19:49.000000000 -0400
+++ linux-2.6.22.noarch/mm/vmscan.c 2007-09-05 12:21:40.000000000 -0400
@@ -1371,7 +1371,13 @@ loop_again:
 			temp_priority[i] = priority;
 			sc.nr_scanned = 0;
 			note_zone_scanning_priority(zone, priority);
-			nr_reclaimed += shrink_zone(priority, zone, &sc);
+			/*
+			 * We put equal pressure on every zone, unless one
+			 * zone has way too many pages free already.
+			 */
+			if (!zone_watermark_ok(zone, order, 8*zone->pages_high,
+						end_zone, 0))
+				nr_reclaimed += shrink_zone(priority, zone, &sc);
 			reclaim_state->reclaimed_slab = 0;
 			nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,
 						lru_pages);
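To make the effect of the new condition concrete, here is a simplified model of it in C. It is a sketch under stated assumptions, not the kernel implementation: `toy_zone`, `toy_needs_pressure()` and `toy_patched_pass()` are hypothetical names, and `toy_needs_pressure()` stands in for `!zone_watermark_ok(zone, order, 8*zone->pages_high, end_zone, 0)` while ignoring the allocation order and the lowmem reserve that the real function also takes into account.

```c
#include <assert.h>

/* Toy zone (hypothetical, not the kernel's struct zone). */
struct toy_zone {
    long free_pages;   /* pages currently free */
    long pages_high;   /* high watermark */
    long reclaimable;  /* pages that could still be paged out */
};

/*
 * Simplified stand-in for the patch's test: a zone still gets reclaim
 * pressure unless it already has more than 8 * pages_high pages free.
 * Assumption: order and lowmem reserves are ignored here.
 */
static int toy_needs_pressure(const struct toy_zone *z)
{
    return z->free_pages <= 8 * z->pages_high;
}

/*
 * One patched pass over all zones: zones with far too many free pages
 * are skipped. Returns the number of pages reclaimed in this pass.
 */
static long toy_patched_pass(struct toy_zone *zones, int nr_zones, long batch)
{
    long reclaimed = 0;

    assert(batch > 0);
    for (int i = 0; i < nr_zones; i++) {
        long n;

        if (!toy_needs_pressure(&zones[i]))
            continue;  /* "way too many pages free already" */

        n = zones[i].reclaimable < batch ? zones[i].reclaimable : batch;
        zones[i].reclaimable -= n;
        zones[i].free_pages += n;
        reclaimed += n;
    }
    return reclaimed;
}
```

In this model a large zone is still pushed past pages_high, which matches the stated aim of keeping pressure between zones roughly equal, but reclaim from it stops once its free pages exceed eight times pages_high, even if another zone can never be balanced.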
From: Andrew Morton [email blocked]
Subject: Re: [PATCH] prevent kswapd from freeing excessive amounts of lowmem
Date: Wed, 5 Sep 2007 18:23:05 -0700

> On Wed, 05 Sep 2007 19:01:25 -0400 Rik van Riel [email blocked] wrote:
> The current VM can get itself into trouble fairly easily on systems
> with a small ZONE_HIGHMEM, which is common on i686 computers with
> 1GB of memory.
>
> On one side, page_alloc() will allocate down to zone->pages_low,
> while on the other side, kswapd() and balance_pgdat() will try
> to free memory from every zone, until every zone has more free
> pages than zone->pages_high.
>
> Highmem can be filled up to zone->pages_low with page tables,
> ramfs, vmalloc allocations and other unswappable things quite
> easily and without many bad side effects, since we still have
> a huge ZONE_NORMAL to do future allocations from.
>
> However, as long as the number of free pages in the highmem
> zone is below zone->pages_high, kswapd will continue swapping
> things out from ZONE_NORMAL, too!

crap. I guess suitably-fashioned mlock could do the same thing.

> Sami Farin managed to get his system into a stage where kswapd
> had freed about 700MB of low memory and was still "going strong".
>
> The attached patch will make kswapd stop paging out data from
> zones when there is more than enough memory free.

hm. Did highmem's all_unreclaimable get set? If so perhaps we could use
that in some way.

> We do go above
> zone->pages_high in order to keep pressure between zones equal
> in normal circumstances, but the patch should prevent the kind
> of excesses that made Sami's computer totally unusable.
>
> Please merge this into -mm.
>
> Signed-off-by: Rik van Riel [email blocked]
>
>
> [linux-2.6-excessive-pageout.patch text/x-patch (715B)]
> --- linux-2.6.22.noarch/mm/vmscan.c.excessive 2007-09-05 12:19:49.000000000 -0400
> +++ linux-2.6.22.noarch/mm/vmscan.c 2007-09-05 12:21:40.000000000 -0400
> @@ -1371,7 +1371,13 @@ loop_again:
>  			temp_priority[i] = priority;
>  			sc.nr_scanned = 0;
>  			note_zone_scanning_priority(zone, priority);
> -			nr_reclaimed += shrink_zone(priority, zone, &sc);
> +			/*
> +			 * We put equal pressure on every zone, unless one
> +			 * zone has way too many pages free already.
> +			 */
> +			if (!zone_watermark_ok(zone, order, 8*zone->pages_high,
> +						end_zone, 0))
> +				nr_reclaimed += shrink_zone(priority, zone, &sc);
>  			reclaim_state->reclaimed_slab = 0;
>  			nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,
>  						lru_pages);

I guess for a very small upper zone and a very large lower zone this could
still put the scan balancing out of whack, fixable by a smarter version of
"8*zone->pages_high" but it doesn't seem very likely that this will affect
things much.

Why doesn't direct reclaim need similar treatment?
From: Rik van Riel [email blocked]
Subject: Re: [PATCH] prevent kswapd from freeing excessive amounts of lowmem
Date: Thu, 06 Sep 2007 12:38:13 -0400

Andrew Morton wrote:

> I guess for a very small upper zone and a very large lower zone this could
> still put the scan balancing out of whack, fixable by a smarter version of
> "8*zone->pages_high" but it doesn't seem very likely that this will affect
> things much.
>
> Why doesn't direct reclaim need similar treatment?

Because we only go into the direct reclaim path once
every zone is at or below zone->pages_low, and the
direct reclaim path will exit once we have freed more
than swap_cluster_max pages.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
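The bail-out Rik mentions in the reply above can be sketched as a small loop. This is only an illustration of the idea: `toy_direct_reclaim()` and its `batch` parameter are hypothetical, and the stopping test follows the wording in the thread ("freed more than swap_cluster_max pages"), so the exact comparison used by the kernel may differ.

```c
#include <assert.h>

/*
 * Toy direct reclaim: free pages from a pool in batches, but return to
 * the caller as soon as more than swap_cluster_max pages have been
 * freed, instead of continuing until the pool is empty.
 * Returns the number of pages freed by this caller.
 */
static long toy_direct_reclaim(long *reclaimable, long batch,
                               long swap_cluster_max)
{
    long freed = 0;

    assert(batch > 0);
    while (*reclaimable > 0 && freed <= swap_cluster_max) {
        long n = *reclaimable < batch ? *reclaimable : batch;

        *reclaimable -= n;
        freed += n;
    }
    return freed;
}
```

Because the amount of work per call is bounded by swap_cluster_max plus at most one batch, a single caller cannot free hundreds of megabytes the way the unbounded kswapd loop did.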
From: Andrew Morton [email blocked]
Subject: Re: [PATCH] prevent kswapd from freeing excessive amounts of lowmem
Date: Thu, 6 Sep 2007 15:34:26 -0700

> On Thu, 06 Sep 2007 12:38:13 -0400 Rik van Riel [email blocked] wrote:
> Andrew Morton wrote:
>

(What happened to the other stuff I said?)

> > I guess for a very small upper zone and a very large lower zone this could
> > still put the scan balancing out of whack, fixable by a smarter version of
> > "8*zone->pages_high" but it doesn't seem very likely that this will affect
> > things much.
> >
> > Why doesn't direct reclaim need similar treatment?
>
> Because we only go into the direct reclaim path once
> every zone is at or below zone->pages_low, and the
> direct reclaim path will exit once we have freed more
> than swap_cluster_max pages.
>

hm. Now I need to remember why direct-reclaim does that :(
From: Rik van Riel [email blocked]
Subject: Re: [PATCH] prevent kswapd from freeing excessive amounts of lowmem
Date: Thu, 06 Sep 2007 18:47:30 -0400

Andrew Morton wrote:
>> On Thu, 06 Sep 2007 12:38:13 -0400 Rik van Riel [email blocked] wrote:
>> Andrew Morton wrote:
>
> (What happened to the other stuff I said?)

Mlock can cause the problem too.

As for all_unreclaimable, it is ignored when priority == DEF_PRIORITY,
balance_pgdat always seems to start in this stage.

>>> I guess for a very small upper zone and a very large lower zone this could
>>> still put the scan balancing out of whack, fixable by a smarter version of
>>> "8*zone->pages_high" but it doesn't seem very likely that this will affect
>>> things much.
>>>
>>> Why doesn't direct reclaim need similar treatment?
>> Because we only go into the direct reclaim path once
>> every zone is at or below zone->pages_low, and the
>> direct reclaim path will exit once we have freed more
>> than swap_cluster_max pages.
>>
>
> hm. Now I need to remember why direct-reclaim does that :(

This is done so the system does not end up with the first
process that goes into page reclaim staying there forever,
while the other processes in the system happily consume
the pages freed by that poor first process.

There may be other reasons, too.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
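Rik's remark about all_unreclaimable can be read as a simple predicate. The sketch below is a guess at the shape of that check rather than a quote from the kernel: `toy_zone`, `toy_skip_zone()` and `TOY_DEF_PRIORITY` are hypothetical, and the value 12 is used only as an assumed starting priority for the illustration.

```c
#include <assert.h>

/* Assumed starting (lowest-pressure) scan priority for this sketch. */
#define TOY_DEF_PRIORITY 12

/* Toy zone carrying only the flag under discussion (hypothetical). */
struct toy_zone {
    int all_unreclaimable;  /* set when scanning found nothing to free */
};

/*
 * Returns 1 if the zone would be skipped at this scan priority.
 * The flag is honoured only on passes after the first one, because the
 * first pass runs at TOY_DEF_PRIORITY and ignores it.
 */
static int toy_skip_zone(const struct toy_zone *z, int priority)
{
    assert(priority >= 0 && priority <= TOY_DEF_PRIORITY);
    return z->all_unreclaimable && priority != TOY_DEF_PRIORITY;
}
```

Since every balancing run begins at the default priority, a zone marked all_unreclaimable is still considered on that first pass. That is why, as Rik describes it, the flag alone does not keep the unbalanceable highmem zone from triggering more reclaim.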