Re: setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock

Previous thread: [RFC PATCH 0/4] Implementation of IR support using the input subsystem by Jon Smirl on Monday, September 29, 2008 - 9:17 am. (7 messages)

Next thread: [PATCH] x86: enable CPUID on Cyrix cpus with CPUID disabled by Krzysztof Helt on Monday, September 29, 2008 - 10:24 am. (3 messages)
From: Gerald Schaefer
Date: Monday, September 29, 2008 - 10:10 am

Hi,

is zone->lru_lock really the right lock to take in setup_per_zone_pages_min()?
All other functions in mm/page_alloc.c take zone->lock instead, for working
with page->lru free-list or PageBuddy().

setup_per_zone_pages_min() eventually calls move_freepages(), which is also
manipulating the page->lru free-list and checking for PageBuddy(). Both
should be protected by zone->lock instead of zone->lru_lock, if I understood
that right, or else there could be a race with the other functions in
mm/page_alloc.c.

We ran into a list corruption bug in free_pages_bulk() once, during memory
hotplug stress test, but cannot reproduce it easily. So I cannot verify if
using zone->lock instead of zone->lru_lock would fix it, but to me it looks
like this may be the problem.

Any thoughts?

BTW, I also wonder if a spin_lock_irq() would be enough, instead of
spin_lock_irqsave(), because this function should never be called from
interrupt context, right?

Thanks,
Gerald


--

From: Andy Whitcroft
Date: Monday, September 29, 2008 - 10:36 am

The allocator protects it freelists using zone->lock (as we can see in
rmqueue_bulk), so anything which manipulates those should also be using
that lock.  move_freepages() is scanning the cmap and picking up free
pages directly off the free lists, it is expecting those lists to be
stable; it would appear to need zone->lock.  It does look like
setup_per_zone_pages_min() is holding the wrong thing at first look.

-apw
--

From: Gerald Schaefer
Date: Monday, September 29, 2008 - 2:20 pm

I just noticed that the spin_lock in that function is much older than the
call to setup_zone_migrate_reserve(), which then calls move_freepages().
So it seems that the zone->lru_lock there does (did?) have another purpose,
maybe protecting zone->present_pages/pages_min/etc.

Looks like the need for a zone->lock (if any) was added later, but I'm not
sure if makes sense to take both locks together, or if the lru_lock is still
needed at all.

Thanks,
Gerald


--

From: KAMEZAWA Hiroyuki
Date: Monday, September 29, 2008 - 5:40 pm

On Mon, 29 Sep 2008 23:20:05 +0200
At first look, replacing zone->lru_lock with zone->lock is enough...
This function is an only one function which use zone->lru_lock in page_alloc.c
And zone_watermark_ok() which access zone->pages_min/low/high is not under any
locks. So, taking zone->lru_lock here doesn't seem to be necessary...


Thanks,
-Kame

--

From: Yasunori Goto
Date: Monday, September 29, 2008 - 6:53 pm

The zone->lru_lock() have been used before memory hotplug code was

I agree.


Bye.
-- 
Yasunori Goto 


--

From: Gerald Schaefer
Date: Wednesday, October 1, 2008 - 10:39 am

From: Gerald Schaefer <gerald.schaefer@de.ibm.com> 

This replaces zone->lru_lock in setup_per_zone_pages_min() with zone->lock.
There seems to be no need for the lru_lock anymore, but there is a need for
zone->lock instead, because that function may call move_freepages() via
setup_zone_migrate_reserve().

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>

---
 mm/page_alloc.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -4207,7 +4207,7 @@ void setup_per_zone_pages_min(void)
 	for_each_zone(zone) {
 		u64 tmp;
 
-		spin_lock_irqsave(&zone->lru_lock, flags);
+		spin_lock_irqsave(&zone->lock, flags);
 		tmp = (u64)pages_min * zone->present_pages;
 		do_div(tmp, lowmem_pages);
 		if (is_highmem(zone)) {
@@ -4239,7 +4239,7 @@ void setup_per_zone_pages_min(void)
 		zone->pages_low   = zone->pages_min + (tmp >> 2);
 		zone->pages_high  = zone->pages_min + (tmp >> 1);
 		setup_zone_migrate_reserve(zone);
-		spin_unlock_irqrestore(&zone->lru_lock, flags);
+		spin_unlock_irqrestore(&zone->lock, flags);
 	}
 
 	/* update totalreserve_pages */


--

From: KAMEZAWA Hiroyuki
Date: Wednesday, October 1, 2008 - 10:49 pm

On Wed, 01 Oct 2008 19:39:32 +0200
Thank you!.


--

From: Yasunori Goto
Date: Thursday, October 2, 2008 - 3:00 am

Thanks!


-- 
Yasunori Goto 


--

Previous thread: [RFC PATCH 0/4] Implementation of IR support using the input subsystem by Jon Smirl on Monday, September 29, 2008 - 9:17 am. (7 messages)

Next thread: [PATCH] x86: enable CPUID on Cyrix cpus with CPUID disabled by Krzysztof Helt on Monday, September 29, 2008 - 10:24 am. (3 messages)