Re: [PATCH 2/6] change alloc function in pcpu_alloc_pages

Previous thread: Re: [PATCH] drivers:staging: sources for ST core by Pavan Savoy on Tuesday, April 13, 2010 - 8:06 am. (2 messages)

Next thread: [PATCH]: x86: remove extra bootmem.h from arch/x86/mm/init_64.c by Prarit Bhargava on Tuesday, April 13, 2010 - 8:47 am. (2 messages)
From: Minchan Kim
Date: Tuesday, April 13, 2010 - 8:24 am

alloc_pages calls alloc_pages_node with numa_node_id().
alloc_pages_node can't see nid < 0.

So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.

Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 include/linux/gfp.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4c6d413..b65f003 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -308,7 +308,7 @@ extern struct page *alloc_page_vma(gfp_t gfp_mask,
 			struct vm_area_struct *vma, unsigned long addr);
 #else
 #define alloc_pages(gfp_mask, order) \
-		alloc_pages_node(numa_node_id(), gfp_mask, order)
+		alloc_pages_exact_node(numa_node_id(), gfp_mask, order)
 #define alloc_page_vma(gfp_mask, vma, addr) alloc_pages(gfp_mask, 0)
 #endif
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
-- 
1.7.0.5

--

From: Minchan Kim
Date: Tuesday, April 13, 2010 - 8:24 am

alloc_pages_node is called with cpu_to_node(cpu).
I think cpu_to_node(cpu) never returns -1.
(But I am not sure we need double check.)

So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.

Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 mm/percpu.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 768419d..ec3e671 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -720,7 +720,7 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
 		for (i = page_start; i < page_end; i++) {
 			struct page **pagep = &pages[pcpu_page_idx(cpu, i)];
 
-			*pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
+			*pagep = alloc_pages_exact_node(cpu_to_node(cpu), gfp, 0);
 			if (!*pagep) {
 				pcpu_free_pages(chunk, pages, populated,
 						page_start, page_end);
-- 
1.7.0.5

--

From: Mel Gorman
Date: Tuesday, April 13, 2010 - 8:48 am

Well, numa_node_id() is implemented as

#ifndef numa_node_id
#define numa_node_id()          (cpu_to_node(raw_smp_processor_id()))
#endif

and the mapping table on x86 at least is based on possible CPUs in
init_cpu_to_node() leaves the mapping as 0 if the APIC is bad or the numa
node is reported in apicid_to_node as -1. It would appear on power that
the node will be 0 for possible CPUs as well.

Hence, I believe this to be safe but a confirmation from Tejun would be
nice. I would continue digging but this looks like an initialisation path

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: Tejun Heo
Date: Wednesday, April 14, 2010 - 4:39 pm

Hello,


This being a pretty cold path, I don't really see much benefit in
converting it to alloc_pages_node_exact().  It ain't gonna make any
difference.  I'd rather stay with the safer / boring one unless
there's a pressing reason to convert.

Thanks.

-- 
tejun
--

From: Minchan Kim
Date: Wednesday, April 14, 2010 - 6:31 pm

Hi, Tejun.


Actually, It's to weed out not-good API usage as well as some performance gain.
But I don't think to need it strongly.
Okay. Please keep in mind about this and correct it if you confirms it



-- 
Kind regards,
Minchan Kim
--

From: Tejun Heo
Date: Thursday, April 15, 2010 - 12:21 am

Hello,


Hmm... if most users are converting over to alloc_pages_node_exact(),
I think it would be better to convert percpu too.  I thought it was a
performance optimization (of rather silly kind too).  So, this is to
weed out -1 node id usage?  Wouldn't it be better to update
alloc_pages_node() such that it whines once per each caller if it's
called with -1 node id and after updating most users convert the
warning into WARN_ON_ONCE()?  Having two variants for this seems
rather extreme to me.

Thanks.

-- 
tejun
--

From: Minchan Kim
Date: Thursday, April 15, 2010 - 1:00 am

Yes. I don't like it.
With it, someone who does care about API usage uses alloc_pages_exact_node but
someone who don't have a time or careless uses alloc_pages_node.
It would make API fragmentation and not good.
Maybe we can weed out -1 and make new API which is more clear.

* struct page *alloc_pages_any_node(gfp_t gfp_mask, unsigned int order);
* struct page *alloc_pages_exact_node(int nid, gfp_mask, unsigned int order);

So firstly we have to make sure users who use alloc_pages_node can
change alloc_pages_node with alloc_pages_exact_node.

After all of it was weed out, I will change alloc_pages_node with
alloc_pages_any_node.


-- 
Kind regards,
Minchan Kim
--

From: Tejun Heo
Date: Thursday, April 15, 2010 - 1:15 am

Hello,


I'm not an expert on that part of the kernel but isn't
alloc_pages_any_node() identical to alloc_pages_exact_node()?  All
that's necessary to do now is to weed out callers which pass in
negative nid to alloc_pages_node(), right?  If so, why not just do a
clean sweep of alloc_pages_node() users and update them so that they
don't call in w/ -1 nid and add WARN_ON_ONCE() in alloc_pages_node()?
Is there any reason to keep both variants going forward?  If not,
introducing new API just to weed out invalid usages seems like an
overkill.

Thanks.

-- 
tejun
--

From: Minchan Kim
Date: Thursday, April 15, 2010 - 2:40 am

alloc_pages_any_node means user allows allocated pages in any
node(most likely current node)
alloc_pages_exact_node means user allows allocated pages in nid node

It might be my final goal. I hope user uses alloc_pages_any_node

I am not sure someone still need alloc_pages_node. That's because
sometime he want to allocate page from specific node but sometime not.

It might be.

It think it's almost same add_to_page_cache and add_to_page_cache_locked.
If user knows the page is already locked, he can use
add_to_page_cache_locked for performance gain and code readability
which we need to lock the page before calling it.
The important point is that user uses it as he is conscious of locked page.
I think if user already know to want page where from specific node, it
would be better to use alloc_pages_exact_node instead of
alloc_pages_node.

If he want to allocate page from any node[current..fallback list], he
have to use alloc_pages_any_node without nid argument. It would make a

Now, most of user uses alloc_pages_exact_node. So I think we can do it.
But someone else also might think of overkill. :)
It's a not urgent issue. So I will take it easy.

Thanks.

-- 
Kind regards,
Minchan Kim
--

From: Tejun Heo
Date: Thursday, April 15, 2010 - 3:08 am

Hello,


Ooh, sorry, I meant alloc_pages().  What would be the difference

Yeah, if both APIs are necessary at the end of the conversion, sure.
I was just saying that introducing new APIs just to weed out invalid
usages and then later killing the old API would be rather excessive.

I was just wondering whether we could just clean up alloc_pages_node()
users and kill alloc_pages_exact_node().

Thanks.

-- 
tejun
--

From: Minchan Kim
Date: Thursday, April 15, 2010 - 3:21 am

It's no different. It's same. Just naming is more explicit. :)
I think it could be following as.

#define alloc_pages alloc_pages_any_node.
strucdt page * alloc_pages_node() {
   int nid = numa_node_id();
   ...
   return page;

kill alloc_pages_exact_node?
Sorry but I can't understand your point.
I don't want to kill user of alloc_pages_exact_node.
That's opposite.
I want to kill user of alloc_pages_node and change it with
alloc_pages_any_node or alloc_pages_exact_node. :)

I think we can do it. That's because all of caller already can check nid == -1
before calling allocation function explicitly if he cares node locality.

-- 
Kind regards,
Minchan Kim
--

From: Minchan Kim
Date: Thursday, April 15, 2010 - 3:33 am

typo. Sorry.


-- 
Kind regards,
Minchan Kim
--

From: Tejun Heo
Date: Thursday, April 15, 2010 - 4:43 am

Hello,


I see, so...

 alloc_pages()		-> alloc_pages_any_node()
 alloc_pages_node()	-> alloc_pages_exact_node()

right?  It just seems strange to me and different from usual naming
convention - ie. something which doesn't care about nodes usually
doesn't carry _node postfix.  Anyways, no big deal, those names just
felt a bit strange to me.

Thanks.

-- 
tejun
--

From: Minchan Kim
Date: Thursday, April 15, 2010 - 4:49 am

I don't want to remove alloc_pages for UMA system.
#define alloc_pages alloc_page_sexact_node

What I want to remove is just alloc_pages_node. :)
Sorry for confusing you.

-- 
Kind regards,
Minchan Kim
--

From: Christoph Lameter
Date: Friday, April 16, 2010 - 9:07 am

Why remove it? If you want to get rid of -1 handling then check all the
callsites and make sure that they are not using  -1.

Also could you define a constant for -1? -1 may have various meanings. One
is the local node and the other is any node. The difference is if memory
policies are obeyed or not. Note that alloc_pages follows memory policies
whereas alloc_pages_node does not.

Therefore

alloc_pages() != alloc_pages_node(  , -1)

--

From: Lee Schermerhorn
Date: Friday, April 16, 2010 - 12:13 pm

NUMA_NO_NODE is #defined as (-1) and can be used for this purpose.  '-1'
has been replaced by this in many cases.   It can be interpreted as "No
node specified" == "any node is acceptable".  But, it also has multiple
meanings.  E.g., in the hugetlb sysfs attribute and sysctl functions it
indicates the global hstates [all nodes] vs a per node hstate.  So, I
suppose one could define a NUMA_ANY_NODE, to make the intention clear at
the call site.

I believe that all usage of -1 to mean the local node has been removed,
unless I missed one.  Local allocation is now indicated by a mempolicy
mode flag--MPOL_F_LOCAL.  It's treated as a special case of

--

From: Minchan Kim
Date: Sunday, April 18, 2010 - 8:55 am

Hi, Lee. 


Thanks for good information. :)

-- 
Kind regards,
Minchan Kim


--

From: Minchan Kim
Date: Sunday, April 18, 2010 - 8:54 am

Hi, Christoph. 


I don't want to force using '_node' postfix on UMA users.
Maybe they don't care getting page from any node and event don't need to

alloc_pages_node have multiple meaning as you said. So some of users


Yes, now it's totally different. 
On UMA, It's any node but on NUMA, local node.

My concern is following as. 

alloc_pages_node means any node but it has nid argument. 
Why should user of alloc_pages who want to get page from any node pass
nid argument? It's rather awkward. 

Some of user misunderstood it and used alloc_pages_node instead of
alloc_pages_exact_node although he already know exact _NID_. 
Of course, it's not a BUG since if nid >= 0 it works well.

But I want to remove such multiple meaning to clear intention of user. 



-- 
Kind regards,
Minchan Kim


--

From: Tejun Heo
Date: Sunday, April 18, 2010 - 2:22 pm

Yeah, then, remove alloc_pages_any_node().  I can't really see the
point of any_/exact_node.  alloc_pages() and alloc_pages_node() are

The name is fine.  Just clean up the users and make the intended usage
clear in documentation and implementation (ie. trigger a big fat
warning) and make all the callers use named constants instead of -1
for special meanings.

Thanks.

-- 
tejun
--

From: Minchan Kim
Date: Sunday, April 18, 2010 - 5:03 pm

Hi, Tejun.


Let's tidy my table.

I made quick patch to show the concept with one example of pci-dma.
(Sorry but I attach patch since web gmail's mangling.)

On UMA, we can change alloc_pages with
alloc_pages_exact_node(numa_node_id(),....)
(Actually, the patch is already merged mmotm)

on NUMA, alloc_pages is some different meaning, so I don't want to change it.
on NUMA, alloc_pages_node means _ANY_NODE_.
So let's remove nid argument and change naming with alloc_pages_any_node.

Then, whole users of alloc_pages_node can be changed between
alloc_pages_exact_node and alloc_pages_any_node.

It was my intention. What's your concern?

Thanks for your interest, Tejun. :)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index a4ac764..dc511cb 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -152,12 +152,21 @@ void *dma_generic_alloc_coherent(struct device
*dev, size_t size,
        unsigned long dma_mask;
        struct page *page;
        dma_addr_t addr;
+       int nid;

        dma_mask = dma_alloc_coherent_mask(dev, flag);

        flag |= __GFP_ZERO;
 again:
-       page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
+       nid = dev_to_node(dev);
+       /*
+        * If pci-dma maintainer makes sure nid never has NUMA_NO_NODE
+        * we can remove this ugly checking.
+        */
+       if (nid == NUMA_NO_NODE)
+               page = alloc_pages_any_node(flag, get_order(size));
+       else
+               page = alloc_pages_exact_node(nid, flag, get_order(size));
        if (!page)
                return NULL;

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4c6d413..47fba21 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -278,13 +278,10 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
        return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
 }

-static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
+static inline struct page ...
From: Christoph Lameter
Date: Monday, April 19, 2010 - 10:45 am

UMA does not have the concept of nodes. Whatever node you specify is


It means allocate on the indicated node if possible. Memory could come





This is very confusing. Because it is

	alloc_pages_numa_node_id()


alloca_pages_any_node suggests that the kernel randomly picks a node?
From: Minchan Kim
Date: Monday, April 19, 2010 - 5:20 pm

On Tue, Apr 20, 2010 at 2:45 AM, Christoph Lameter

I didn't change API name. The patch is just for optimization.
http://lkml.org/lkml/2010/4/14/225
I think it's reasonable in UMA.
Why do you want to remove it?

Do you dislike alloc_pages_exact_node naming?
I added comment.
http://lkml.org/lkml/2010/4/14/230
Do you think it isn't enough?

This patch results from misunderstanding of alloc_pages_exact_node.
(http://marc.info/?l=linux-mm&m=127109064101184&w=2)
At that time, I thought naming changing is worth.
But many people don't like it.
Okay. It was just trial and if everyone dislike, I don't have any strong cause.
But this patch series don't relate to it. Again said, It's just for
optimization patch.

Let's clarify other's opinion.

1. "I dislike alloc_pages_exact_node naming. Let's change it with more
clear name."
2. "I hate alloc_pages_exact_node. It's trivial optimization. Let's
remove it and replace it with alloc_pages_node."
3. "alloc_pages_exact_node naming is not bad. Let's add the comment to
clear name"
4. "Let's cleanup alloc_pages_xxx in this change as well as 3.
5. "Please, don't touch. Remain whole of thing like as-is."

I think Chrsitop selects 5 or 1, Tejun selects 2, Mel selects 3, me
want to 4 but is satisfied with 3. Right?

If we selects 5, In future, there are confusing between
alloc_pages_node and alloc_pages_exact_node.So I don't want it.

If we select 2, We already have many place of alloc_pages_exact_node.
And I add this patch series. So most of caller uses alloc_pages_exact_node now.
Isn't it trivial?

So I want 3 at lest although you guys don't like 4.
Please, suggest better idea to me. :)

-- 
Kind regards,
Minchan Kim
--

From: Christoph Lameter
Date: Monday, April 19, 2010 - 10:38 am

Its not awkward but an optimization. The page can be placed on any node
but the user would prefer a certain node. Most of the NUMA things are
there for optimization purposes and not for correctness. If you must have
an allocation on certain nodes for correctness (like SLAB) then

Its not clear to me that this renaming etc helps. You
must use GFP_THISNODE if allocation must occur from a certain node.
alloc_pages_exact_node results in more confusion because it does suggest
that fallback to other nodes is not allowed.


--

From: Tejun Heo
Date: Monday, April 19, 2010 - 3:27 pm

Hello, Christoph.


I can't see why alloc_pages_exact_node() exists at all.  It's in the
mainline and if you look at the difference between alloc_pages_node()
and alloc_pages_exact_node(), it's almost silly.  :-(

-- 
tejun
--

From: Mel Gorman
Date: Tuesday, April 20, 2010 - 8:05 am

alloc_pages_exact_node() avoids a branch in a hot path that is checking for
something the caller already knows. That's the reason it exists.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: Christoph Lameter
Date: Wednesday, April 21, 2010 - 7:15 am

We can avoid alloc_pages_exact_node() by making all callers of
alloc_pages_node() never use -1. -1 is ambiguous and only rarely will a
caller pass that to alloc_pages_node().


--

From: Minchan Kim
Date: Wednesday, April 21, 2010 - 10:06 am

On Wed, Apr 21, 2010 at 11:15 PM, Christoph Lameter

That's very reasonable to me.
Then, we can remove alloc_pages_exact_node and nid < 0 check in
alloc_pages_node at the same time.

Mel. Could you agree?

Firstly Tejun suggested this but I didn't got the point.
Sorry for bothering you.

Okay. I will dive into this approach.



-- 
Kind regards,
Minchan Kim
--

From: Tejun Heo
Date: Wednesday, April 21, 2010 - 3:48 am

Hello,


Yeah sure but Minchan is trying to tidy up the API by converting
alloc_pages_node() users to use alloc_pages_exact_node(), at which
point, the distinction becomes pretty useless.  Wouldn't just making
alloc_pages_node() do what alloc_pages_exact_node() does now and
converting all its users be simpler?  IIRC, the currently planned
transformation looks like the following.

 alloc_pages()			-> alloc_pages_any_node()
 alloc_pages_node()		-> basically gonna be obsoleted by _exact_node
 alloc_pages_exact_node()	-> gonna be used by most NUMA aware allocs

So, let's just make sure no one calls alloc_pages_node() w/ -1 nid,
kill alloc_pages_node() and rename alloc_pages_exact_node() to
alloc_pages_node().

Thanks.

-- 
tejun
--

From: Minchan Kim
Date: Thursday, April 22, 2010 - 3:15 am

Yes. It was a stupid idea. I hope Mel agree this suggestion.
Thanks for careful review, Tejun.


-- 
Kind regards,
Minchan Kim
--

From: Minchan Kim
Date: Tuesday, April 13, 2010 - 8:25 am

alloc_slab_page never calls alloc_pages_node with -1.
It means node's validity check is unnecessary.
So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.

Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 mm/slub.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index b364844..9984165 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1084,7 +1084,7 @@ static inline struct page *alloc_slab_page(gfp_t flags, int node,
 	if (node == -1)
 		return alloc_pages(flags, order);
 	else
-		return alloc_pages_node(node, flags, order);
+		return alloc_pages_exact_node(node, flags, order);
 }
 
 static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
-- 
1.7.0.5

--

From: Mel Gorman
Date: Tuesday, April 13, 2010 - 8:52 am

Are you certain? What about

__kmalloc
  -> slab_alloc (passed -1 as a node from __kmalloc)
    -> __slab_alloc
      -> new_slab
        -> allocate_slab

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: Minchan Kim
Date: Tuesday, April 13, 2010 - 9:01 am

Sorry for writing confusing changelog.

I means if node == -1 alloc_slab_page always calls alloc_pages.
So we don't need redundant check.

-- 
Kind regards,
Minchan Kim
--

From: Mel Gorman
Date: Tuesday, April 13, 2010 - 9:14 am

When the changelog is fixed up, feel free to add;

Reviewed-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: David Rientjes
Date: Tuesday, April 13, 2010 - 2:37 pm

Slub changes need to go through its maintainer, Pekka Enberg 
<penberg@cs.helsinki.fi>.
--

From: Minchan Kim
Date: Tuesday, April 13, 2010 - 4:40 pm

Thanks, David. It was by mistake.

Pekka.

This changlog is bad.
I will change it with following as when I send v2.

"alloc_slab_page always checks nid == -1, so alloc_page_node can't be
called with -1.
 It means node's validity check in alloc_pages_node is unnecessary.
 So we can use alloc_pages_exact_node instead of alloc_pages_node.
 It could avoid comparison and branch as 6484eb3e2a81807722 tried."

Thanks.


-- 
Kind regards,
Minchan Kim
--

From: David Rientjes
Date: Tuesday, April 13, 2010 - 4:55 pm

Each patch in this series seems to be independent and can be applied 
seperately, so it's probably not necessary to make them incremental.
--

From: Minchan Kim
Date: Tuesday, April 13, 2010 - 5:02 pm

Surely.
Without considering, I used git-formant-patch -n --thread.
I will keep it in mind.

Thanks, David.


-- 
Kind regards,
Minchan Kim
--

From: KAMEZAWA Hiroyuki
Date: Tuesday, April 13, 2010 - 5:18 pm

On Wed, 14 Apr 2010 00:25:00 +0900

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--

From: Pekka Enberg
Date: Wednesday, April 14, 2010 - 5:23 am

On Wed, Apr 14, 2010 at 3:18 AM, KAMEZAWA Hiroyuki

Minchan, care to send a v2 with proper changelog and reviewed-by attributions?
--

From: Christoph Lameter
Date: Friday, April 16, 2010 - 9:10 am

Still wondering what the big deal about alloc_pages_node_exact is. Its not
exact since we can fall back to another node. It is better to clarify the
API for alloc_pages_node and forbid / clarify the use of -1.

--

From: Pekka Enberg
Date: Sunday, April 18, 2010 - 11:49 am

From: Mel Gorman
Date: Monday, April 19, 2010 - 2:05 am

There should be a comment clarifing it now. I admit the naming fault is
mine. At the time, the intended meaning was "allocate pages from any node in
the fallback list and the caller knows exactly which node to start from". I
did not take into account that the meaning of "exact" depends on context.

With a comment clarifying the meaning, I do not think a rename is necessary.
However, I'd rather not see a mass renaming of functions like alloc_pages()
that have existed a long times. If nothing else, they are documented in books
like "Linux Device Drivers" so why make life harder on device authors than
it already is?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: Minchan Kim
Date: Tuesday, April 13, 2010 - 8:25 am

if node_state is N_HIGH_MEMORY, node doesn't have -1.
It means node's validity check is unnecessary.
So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.

Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 mm/sparse-vmemmap.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 392b9bb..7710ebc 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -53,7 +53,7 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
 		struct page *page;
 
 		if (node_state(node, N_HIGH_MEMORY))
-			page = alloc_pages_node(node,
+			page = alloc_pages_exact_node(node,
 				GFP_KERNEL | __GFP_ZERO, get_order(size));
 		else
 			page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
-- 
1.7.0.5

--

From: Mel Gorman
Date: Tuesday, April 13, 2010 - 8:59 am

Also, if node_state is called with -1, a negative index is being checked in
a bitmap and that would be pretty broken in itself. I can't see a problem
with this patch.


-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: KAMEZAWA Hiroyuki
Date: Tuesday, April 13, 2010 - 5:19 pm

On Wed, 14 Apr 2010 00:25:01 +0900

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--

From: Minchan Kim
Date: Tuesday, April 13, 2010 - 8:25 am

__vmalloc_area_node never pass -1 to alloc_pages_node.
It means node's validity check is unnecessary.
So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.

Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 mm/vmalloc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ae00746..7abf423 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1499,7 +1499,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		if (node < 0)
 			page = alloc_page(gfp_mask);
 		else
-			page = alloc_pages_node(node, gfp_mask, 0);
+			page = alloc_pages_exact_node(node, gfp_mask, 0);
 
 		if (unlikely(!page)) {
 			/* Successfully allocated i pages, free them in __vunmap() */
-- 
1.7.0.5

--

From: Mel Gorman
Date: Tuesday, April 13, 2010 - 9:02 am

Seems fine, the check for node id being negative has already been made.


-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: KAMEZAWA Hiroyuki
Date: Tuesday, April 13, 2010 - 5:22 pm

On Wed, 14 Apr 2010 00:25:02 +0900

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

But, in another thinking,

-	if (node < 0)
-		page = alloc_page(gfp_mask);

may be better ;)

Thanks,

--

From: Minchan Kim
Date: Tuesday, April 13, 2010 - 5:33 pm

On Wed, Apr 14, 2010 at 9:22 AM, KAMEZAWA Hiroyuki

I thought it.
but alloc_page is different function with alloc_pages_node in NUMA.
It calls alloc_pages_current.


-- 
Kind regards,
Minchan Kim
--

From: Minchan Kim
Date: Tuesday, April 13, 2010 - 8:25 am

alloc_pages_exact_node naming makes some people misleading.
They considered it following as.
"This function will allocate pages from node which I wanted
exactly".
But it can allocate pages from fallback list if page allocator
can't find free page from node user wanted.

So let's comment this NOTE.

Actually I wanted to change naming with better.
ex) alloc_pages_explict_node.
But I changed my mind since the comment would be enough.

If anybody suggests better name, I will do with pleasure.

Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Bob Liu <lliubbo@gmail.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 include/linux/gfp.h |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b65f003..7539c17 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -288,6 +288,11 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
 }
 
+/*
+ * NOTE : Allow page from fallback if page allocator can't find free page
+ * in your nid. Only if you want to allocate page from your nid, use
+ * __GFP_THISNODE flags with gfp_mask.
+ */
 static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
-- 
1.7.0.5

--

From: Mel Gorman
Date: Tuesday, April 13, 2010 - 9:13 am

It's a little tough to read. How about

/*
 * Use this instead of alloc_pages_node when the caller knows
 * exactly which node they need (as opposed to passing in -1
 * for current). Fallback to other nodes will still occur
 * unless __GFP_THISNODE is specified.
 */



-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: Minchan Kim
Date: Tuesday, April 13, 2010 - 9:20 am

I agree.
I will repost modified comment after Tejun comment [2/6].
Thanks for quick review, Mel. :)
-- 
Kind regards,
Minchan Kim
--

From: Mel Gorman
Date: Tuesday, April 13, 2010 - 8:32 am

Makes sense.


-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: KAMEZAWA Hiroyuki
Date: Tuesday, April 13, 2010 - 5:04 pm

On Wed, 14 Apr 2010 00:24:58 +0900

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujisu.com>



--

Previous thread: Re: [PATCH] drivers:staging: sources for ST core by Pavan Savoy on Tuesday, April 13, 2010 - 8:06 am. (2 messages)

Next thread: [PATCH]: x86: remove extra bootmem.h from arch/x86/mm/init_64.c by Prarit Bhargava on Tuesday, April 13, 2010 - 8:47 am. (2 messages)