Hello.
This patch set rearranges the event notifier for memory hotplug,
because the old notifier has some defects. For example, it passes no
information such as the new memory's pfn and number of pages to the
callback functions.

Fortunately, nothing uses this notifier so far, so this change has no
impact. (SLUB will use it after this patch set to create the
kmem_cache_node structure.)

In addition, a description of the notifier is added to the memory hotplug
document.

This patch was part of the patch set that creates kmem_cache_node for SLUB
to avoid a panic on memory online. But I think this change is useful
not only for SLUB but also for others. So, I extracted it from that set.

This patch set is for 2.6.23-rc8-mm2.
I tested this patch on my ia64 box.

Please apply.
Bye.
--
Yasunori Goto
-
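As a rough illustration of the information the cover letter says callbacks were missing, here is a minimal userspace sketch. The struct name `memory_notify` and the field `status_change_nid` appear in the SLUB patch in this thread, and `start_pfn` and `nr_pages` are named in the notifier patch description; the exact layout and the helper `range_end_pfn()` are assumptions made for this example only.

```c
/* Userspace model of the notifier payload; the layout is an assumption. */
struct memory_notify {
	unsigned long start_pfn;   /* first page frame of the affected range */
	unsigned long nr_pages;    /* number of pages in the affected range */
	int status_change_nid;     /* node whose memory status changes, or -1 */
};

/*
 * Hypothetical helper: a callback receives the payload as a void pointer
 * and can now compute the range it has to deal with.
 * Returns the first pfn after the range.
 */
static unsigned long range_end_pfn(void *arg)
{
	const struct memory_notify *marg = arg;

	return marg->start_pfn + marg->nr_pages;
}
```

Without such a payload a callback would only learn that some memory event happened, not which pages or which node it concerns.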
This patch set fixes a panic caused by a NULL pointer access in SLUB.
When new memory is hot-added on a new node (or a memoryless node),
kmem_cache_node for the new node is not prepared,
and a panic occurs. So, kmem_cache_node should be created for the node
before new memory becomes available on the node.
Incidentally, it is freed on memory offline if it is no longer necessary.

This is the first user of the memory notifier callback, and it
requires the notifier rearrangement patch set.

This patch set is for 2.6.23-rc8-mm2.
I tested this patch on my ia64 box.

Please apply.
Bye.
--
Yasunori Goto
-
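The ordering requirement described above can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct cache` and `struct node_state` are simplified stand-ins for kmem_cache and kmem_cache_node, and all three function names are hypothetical, not the ones used in the patch.

```c
#include <stdlib.h>

#define MAX_NODES 4

/* Simplified stand-ins for kmem_cache_node and kmem_cache. */
struct node_state { unsigned long nr_slabs; };
struct cache { struct node_state *node[MAX_NODES]; };

/* Going-online step: prepare the per-node structure before pages are usable. */
static int cache_node_going_online(struct cache *c, int nid)
{
	if (nid < 0 || nid >= MAX_NODES)
		return -1;
	if (c->node[nid])
		return 0;               /* already prepared */
	c->node[nid] = calloc(1, sizeof(*c->node[nid]));
	return c->node[nid] ? 0 : -1;
}

/* Offline step: free the per-node structure once no slabs remain on it. */
static int cache_node_offline(struct cache *c, int nid)
{
	if (nid < 0 || nid >= MAX_NODES || !c->node[nid])
		return 0;
	if (c->node[nid]->nr_slabs > 0)
		return -1;              /* still in use, keep it */
	free(c->node[nid]);
	c->node[nid] = NULL;
	return 0;
}

/* Allocation path: the model fails cleanly where the kernel would hit NULL. */
static int cache_account_slab(struct cache *c, int nid)
{
	if (nid < 0 || nid >= MAX_NODES || !c->node[nid])
		return -1;
	c->node[nid]->nr_slabs++;
	return 0;
}
```

Calling `cache_account_slab()` for a node before `cache_node_going_online()` returns -1 in this model; in the kernel the equivalent access is the NULL pointer dereference the patch set is meant to prevent.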
This creates the kmem_cache_nodes of all SLUB caches for a new node when
memory hot-add is called. This fixes a panic due to a NULL pointer access in
discard_slab() after memory hot-add.

If pages on the new node are available, slub can use them before the
new kmem_cache_nodes are created. So, this callback should be called
BEFORE pages on the node are available.

When memory online is called, slab_mem_going_online_callback() is
called to create kmem_cache_node. If it (or another callback) fails,
then slab_mem_offline_callback() is called for rollback.

In memory offline, slab_mem_going_offline_callback() is called to
shrink the caches, then slab_mem_offline_callback() is called later.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
---
mm/slub.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 117 insertions(+)

Index: current/mm/slub.c
===================================================================
--- current.orig/mm/slub.c 2007-10-11 20:31:37.000000000 +0900
+++ current/mm/slub.c 2007-10-11 21:58:10.000000000 +0900
@@ -20,6 +20,7 @@
#include <linux/mempolicy.h>
#include <linux/ctype.h>
#include <linux/kallsyms.h>
+#include <linux/memory.h>

/*
* Lock order:
@@ -2711,6 +2712,120 @@ int kmem_cache_shrink(struct kmem_cache
}
EXPORT_SYMBOL(kmem_cache_shrink);

+#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
+static int slab_mem_going_offline_callback(void *arg)
+{
+ struct kmem_cache *s;
+ struct memory_notify *marg = arg;
+ int local_node, offline_node = marg->status_change_nid;
+
+ if (offline_node < 0)
+ /* node has memory yet. nothing to do. */
+ return 0;
+
+ down_read(&slub_lock);
+ list_for_each_entry(s, &slab_caches, list) {
+ local_node = page_to_nid(virt_to_page(s));
+ if (local_node == offline_node)
+ /* This slub is on the offline node. */
+ return -EBUSY;
+ }
+ up_read(&slub_lock);
+
+ kmem_cache_shrink_node(s, offline_node);
+
+ return ...
On Fri, 12 Oct 2007, Yasunori Goto wrote:
If it's called before pages on the node are available then it must

Please clarify the comment. This seems to indicate that we should not
do anything because the node still has memory?

So this checks if any kmem_cache structure is on the offlined node? If

kmem_cache_shrink(s) would be okay here I would think. The function is

We call this after we have established that no kmem_cache structures are
on this and after we have shrunk the slabs. Is there any guarantee that

It may be clearer to say:
"If nr_slabs > 0 then slabs still exist on the node that is going down.

"We are bringing a node online. No memory is available yet. We must

"kmem_cache_alloc node will fallback to other nodes since memory is
Hmm. My description may be wrong. I would like to just
mention that kmem_cache_node should be created before the node's page

Yes. kmem_cache_node is still necessary for remaining memory on the

If a node doesn't have memory and offline_pages() is called for it,
that must be checked and fail. This callback shouldn't be called.

Right. If slabs' migration is possible, here would be a good place for

If slabs still exist, they can't be migrated and offline_pages() has
to give up the offline. This means the MEM_OFFLINE event is not generated
when slabs are on the node being removed.
In other words, when this event is generated, all of pages on

Again. If nr_slabs > 0, offline_pages must fail due to slabs

kmem_cache_node is created at boot time if the node has memory.
(Or, it is created by this callback on first added memory on the node).

When nid = - 1, kmem_cache_node is created before this node due to

Your mention might be ok.
But I would prefer to define the status of node hotplug
precisely, as follows:

A) Node online -- pgdat is created and can be accessed for this node,
   but there is no guarantee that cpu or memory is onlined.
   This status is very close to a memory-less node.
   But this might be a halfway status for node hotplug.
   Node online bit is set. But N_HIGH_MEMORY
   (or N_NORMAL_MEMORY) might not be set.

B) Node has memory -- one or more sections of memory are onlined on the
   node. N_HIGH_MEMORY (or N_NORMAL_MEMORY) is set.

If the first memory is onlined on the node, the node status changes
from A) to B).

I feel this is very useful to manage the "halfway status" of node
hotplug. (So, the memory-less node patch is very helpful for me.)

So, I would like to avoid using the words "node online" here.
Yes.
Thanks for your comment.
--
Yasunori Goto
-
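The two node states A) and B) defined above, and the way they relate to the node id passed to callbacks, can be sketched as follows. This is a userspace model only: `struct node_model` and the three helpers are hypothetical names, and deciding the transition from a page count is an assumption of this example, not something taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical model of one node. */
struct node_model {
	bool online;                 /* state A: pgdat exists and is accessible */
	unsigned long present_pages; /* > 0 means state B: node has memory */
};

/* State A without state B: the "halfway status" of node hotplug. */
static bool node_is_halfway(const struct node_model *n)
{
	return n->online && n->present_pages == 0;
}

/* Node id to report when onlining memory: nid on an A -> B change, else -1. */
static int nid_changed_by_online(const struct node_model *nodes, int nid)
{
	return nodes[nid].present_pages == 0 ? nid : -1;
}

/* Node id to report when offlining nr_pages: nid on a B -> A change, else -1. */
static int nid_changed_by_offline(const struct node_model *nodes, int nid,
				  unsigned long nr_pages)
{
	return nodes[nid].present_pages == nr_pages ? nid : -1;
}
```

A callback that sees a negative node id, as in the `offline_node < 0` check of the patch, can return early because the node keeps its memory status.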
I think you can avoid this check. The kmem_cache structures are allocated
from the kmalloc array. The check if the kmalloc slabs are empty will fail

Ok can we talk about this as
node online
and
node memory available?
-
Yes. Thanks.
--
Yasunori Goto
-
Add kmem_cache_shrink_node() for the callback routine of the memory hotplug
notifier. This just extracts a part of kmem_cache_shrink().

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
---
mm/slub.c | 111 ++++++++++++++++++++++++++++++++++----------------------------
1 file changed, 61 insertions(+), 50 deletions(-)

Index: current/mm/slub.c
===================================================================
--- current.orig/mm/slub.c 2007-10-11 20:30:45.000000000 +0900
+++ current/mm/slub.c 2007-10-11 21:58:47.000000000 +0900
@@ -2626,6 +2626,56 @@ void kfree(const void *x)
}
EXPORT_SYMBOL(kfree);

+static inline void __kmem_cache_shrink_node(struct kmem_cache *s, int node,
+ struct list_head *slabs_by_inuse)
+{
+ struct kmem_cache_node *n;
+ int i;
+ struct page *page;
+ struct page *t;
+ unsigned long flags;
+
+ n = get_node(s, node);
+
+ if (!n->nr_partial)
+ return;
+
+ for (i = 0; i < s->objects; i++)
+ INIT_LIST_HEAD(slabs_by_inuse + i);
+
+ spin_lock_irqsave(&n->list_lock, flags);
+
+ /*
+ * Build lists indexed by the items in use in each slab.
+ *
+ * Note that concurrent frees may occur while we hold the
+ * list_lock. page->inuse here is the upper limit.
+ */
+ list_for_each_entry_safe(page, t, &n->partial, lru) {
+ if (!page->inuse && slab_trylock(page)) {
+ /*
+ * Must hold slab lock here because slab_free
+ * may have freed the last object and be
+ * waiting to release the slab.
+ */
+ list_del(&page->lru);
+ n->nr_partial--;
+ slab_unlock(page);
+ discard_slab(s, page);
+ } else
+ list_move(&page->lru, slabs_by_inuse + page->inuse);
+ }
+
+ /*
+ * Rebuild the partial list with the slabs filled up most
+ * first and the least used slabs at the end.
+ */
+ for (i = s->objects - 1; i >= 0; i--)
+ list_splice(slabs_by_inuse + i, n->partial.prev);
+
+ spin_unlock_irqrestore(&n->list_lock, flags);
+}
+
/*
...
Could we just call kmem_cache_shrink? It will do the shrink on every node
but memory hotplug is rare?
-
Yes it is. Memory hotplug is rare.
Ok. I'll do it.

Thanks.
--
Yasunori Goto
-
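For readers unfamiliar with the shrink step quoted in the patch (bucket the partial slabs by objects in use, discard empty ones, rebuild the list fullest first), here is a userspace sketch of the same idea. It uses an array and a counting sort instead of kernel lists, and `struct slab` and `shrink_partial()` are hypothetical names. Locking is left out, and the order inside one bucket is not meant to mirror the kernel's.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a partial slab. */
struct slab { int id; unsigned int inuse; };

/*
 * Reorder slabs[] in place: drop slabs with inuse == 0, then sort the rest
 * so that the fullest slabs come first. Precondition: inuse <= objects.
 * Returns the number of slabs kept (n unchanged if allocation fails).
 */
static size_t shrink_partial(struct slab *slabs, size_t n, unsigned int objects)
{
	size_t *start = calloc((size_t)objects + 1, sizeof(*start));
	struct slab *out = malloc((n ? n : 1) * sizeof(*out));
	size_t i, pos = 0;
	unsigned int b;

	if (!start || !out) {
		free(start);
		free(out);
		return n;
	}

	/* Count how many slabs fall into each in-use bucket. */
	for (i = 0; i < n; i++)
		start[slabs[i].inuse]++;

	/* Turn counts into start offsets, fullest bucket first; skip bucket 0. */
	for (b = objects; b >= 1; b--) {
		size_t count = start[b];

		start[b] = pos;
		pos += count;
	}

	/* Place every non-empty slab into its bucket's region. */
	for (i = 0; i < n; i++)
		if (slabs[i].inuse > 0)
			out[start[slabs[i].inuse]++] = slabs[i];

	memcpy(slabs, out, pos * sizeof(*out));
	free(start);
	free(out);
	return pos;
}
```

Putting the fullest slabs first means new allocations tend to fill them up, which gives the sparsely used slabs at the end a better chance of becoming empty and being freed later.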
The current memory notifier still has some defects. (Fortunately, nothing
uses it.) This patch fixes and rearranges it.

- Add start_pfn, nr_pages, and the node id (if the node status
  changes from/to a memoryless node) as information for callback functions.
  Callbacks can't do anything without this information.
- Add notification of the going-online status.
  It is necessary for creating per-node structures before the node's
  pages are available.
- Move the GOING_OFFLINE status notification after page isolation.
  It is a good place for callbacks to return memory such as caches,
  because the returned pages are not used again.
- Add CANCEL events for rolling back when an error occurs.
- Delete the MEM_MAPPING_INVALID notification. It will not be used.
- Fix a compile error of (un)register_memory_notifier().

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
---
drivers/base/memory.c | 9 +--------
include/linux/memory.h | 27 +++++++++++++++------------
mm/memory_hotplug.c | 48 +++++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 61 insertions(+), 23 deletions(-)

Index: current/drivers/base/memory.c
===================================================================
--- current.orig/drivers/base/memory.c 2007-10-11 14:33:02.000000000 +0900
+++ current/drivers/base/memory.c 2007-10-11 14:33:07.000000000 +0900
@@ -137,7 +137,7 @@ static ssize_t show_mem_state(struct sys
return len;
}

-static inline int memory_notify(unsigned long val, void *v)
+int memory_notify(unsigned long val, void *v)
{
return blocking_notifier_call_chain(&memory_chain, val, v);
}
@@ -183,7 +183,6 @@ memory_block_action(struct memory_block
break;
case MEM_OFFLINE:
mem->state = MEM_GOING_OFFLINE;
- memory_notify(MEM_GOING_OFFLINE, NULL);
start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
ret = remove_memory(start_paddr,
PAGES_PER_SECTION << PAGE_SHIFT);
@@ -191,7 +190,6 @@ memory_block_action(struct memory_...
Add description about event notification callback routine to the document.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
---
Documentation/memory-hotplug.txt | 56 ++++++++++++++++++++++++++++++++++++---
1 file changed, 53 insertions(+), 3 deletions(-)

Index: current/Documentation/memory-hotplug.txt
===================================================================
--- current.orig/Documentation/memory-hotplug.txt
+++ current/Documentation/memory-hotplug.txt
@@ -2,7 +2,8 @@
Memory Hotplug
==============

-Last Updated: Jul 28 2007
+Created: Jul 28 2007
+Add description of notifier of memory hotplug Oct 11 2007

This document is about memory hotplug including how-to-use and current status.
Because Memory Hotplug is still under development, contents of this text will
@@ -24,7 +25,8 @@ be changed often.
6.1 Memory offline and ZONE_MOVABLE
6.2. How to offline memory
7. Physical memory remove
-8. Future Work List
+8. Memory hotplug event notifier
+9. Future Work List

Note(1): x86_64's has special implementation for memory hotplug.
This text does not describe it.
@@ -307,8 +309,68 @@ Need more implementation yet....
- Notification completion of remove works by OS to firmware.
- Guard from remove if not yet.

+--------------------------------
+8. Memory hotplug event notifier
+--------------------------------
+Memory hotplug has an event notifier. There are 6 types of notification.
+
+MEMORY_GOING_ONLINE
+ This is notified before memory online. If some structures must be prepared
+ for new memory, it should be done at this event's callback.
+ The new onlining memory can't be used yet.
+
+MEMORY_CANCEL_ONLINE
+ If memory online fails, this event is notified for rollback of setting at
+ MEMORY_GOING_ONLINE.
+ (Currently, this event is notified only the case which a callback routine
+ of MEMORY_GOING_ONLINE fails).
+
+MEMORY_ONLINE
+ This event is called when memory online is completed. The page allocator us...
Looks good. Some suggestions on improving the wording.
Generated before new memory becomes available in order to be able to
prepare subsystems to handle memory. The page allocator is still unable

Generated when memory has successfully been brought online. The callback may

Generated to begin the process of offlining memory. Allocations are no
longer possible from the memory but some of the memory to be offlined
is still in use. The callback can be used to free memory known to a

Generated if MEMORY_GOING_OFFLINE fails. Memory is available again from

Generated after offlining memory is complete.
-
> Looks good. Some suggestions on improving the wording.
Thanks! I'll fix them.
--
Yasunori Goto
-
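The online sequence described in the documentation patch (a going-online event, a cancel event for rollback if a callback fails, and an online event on success) can be sketched in userspace C as below. Everything here is a model under stated assumptions: the `EV_*` names, `mem_callback`, `notify_all()`, `online_memory()` and the tracing callback are hypothetical and do not reproduce the kernel's notifier chain API.

```c
/* Hypothetical event names mirroring the six notifications in the document. */
enum mem_event {
	EV_GOING_ONLINE, EV_CANCEL_ONLINE, EV_ONLINE,
	EV_GOING_OFFLINE, EV_CANCEL_OFFLINE, EV_OFFLINE
};

typedef int (*mem_callback)(enum mem_event ev, void *arg);

/* Call every callback in order; stop at the first one that fails. */
static int notify_all(mem_callback *cbs, int n, enum mem_event ev, void *arg)
{
	int i;

	for (i = 0; i < n; i++)
		if (cbs[i](ev, arg) != 0)
			return -1;
	return 0;
}

/* Model of memory online: prepare, roll back on failure, else complete. */
static int online_memory(mem_callback *cbs, int n, void *arg)
{
	if (notify_all(cbs, n, EV_GOING_ONLINE, arg) != 0) {
		notify_all(cbs, n, EV_CANCEL_ONLINE, arg);
		return -1;
	}
	/* In the kernel, the pages would be given to the page allocator here. */
	notify_all(cbs, n, EV_ONLINE, arg);
	return 0;
}

/* Example callback that records events and can be told to fail. */
struct trace {
	enum mem_event seen[8];
	int n;
	int fail_going_online;
};

static int trace_callback(enum mem_event ev, void *arg)
{
	struct trace *t = arg;

	if (t->n < 8)
		t->seen[t->n++] = ev;
	return (ev == EV_GOING_ONLINE && t->fail_going_online) ? -1 : 0;
}
```

With `fail_going_online` set, the trace holds the going-online event followed by the cancel event; otherwise it holds the going-online event followed by the online event. The offline path would follow the same pattern with the three offline events.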
