Hi KAME,
Thank you for the fix for online/offline page_cgroup panic.
While running memory offline/online tests ran into another
mem_cgroup panic.Thanks,
BadariUnable to handle kernel paging request for data at address 0x00000020
Faulting instruction address: 0xc0000000001055e4
Oops: Kernel access of bad area, sig: 11 [#2]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c0000000001055e4 LR: c00000000010557c CTR: c0000000000bfb74
REGS: c0000000f6c7f1b0 TRAP: 0300 Tainted: G D (2.6.28-rc4)
MSR: 8000000000009032 <EE,ME,IR,DR> CR: 44044422 XER: 20000018
DAR: 0000000000000020, DSISR: 0000000042000000
TASK = c0000000f6c56cc0[4610] 'crash' THREAD: c0000000f6c7c000 CPU: 0
GPR00: c0000000e910b560 c0000000f6c7f430 c000000000b36fc0 0000000000000001
GPR04: c000000005355278 0000000000000001 0000000000000000 0000000000000000
GPR08: c000000005355290 0000000000000018 c0000000e910b558 c0000000e910b548
GPR12: 0000000000000000 c000000000b58300 00000400001ca30a 0000000000000000
GPR16: 0000000000000000 0000000000000006 c0000000d43cb5c0 c0000000e66d0b88
GPR20: 0000000000000004 0000000000000000 c0000000e64c6180 0000000000000000
GPR24: 00000000000000d0 0000000000000005 c000000000bac418 0000000000000001
GPR28: c0000000e910b538 c000000005355278 c000000000aacad8 c0000000f6c7f430
NIP [c0000000001055e4] .mem_cgroup_charge_common+0x26c/0x330
LR [c00000000010557c] .mem_cgroup_charge_common+0x204/0x330
Call Trace:
[c0000000f6c7f430] [c00000000010557c] .mem_cgroup_charge_common+0x204/0x330 (unreliable)
[c0000000f6c7f4f0] [c000000000105c70] .mem_cgroup_cache_charge+0x130/0x154
[c0000000f6c7f590] [c0000000000c29bc] .add_to_page_cache_locked+0x64/0x18c
[c0000000f6c7f640] [c0000000000c2b64] .add_to_page_cache_lru+0x80/0xe4
[c0000000f6c7f6e0] [c000000000144348] .mpage_readpages+0xc8/0x170
[c0000000f6c7f810] [c000000000182e68] .reiserfs_readpages+0x50/0x78
[c0000000f6c7f8b0] [c0000000000cee80] .__do_page_cache_readahead+0x174/0x280
[c0000000f6c7f980] [c0000000000cf6e0] .do_page_...
Badari, I think you used SLUB. If so, page_cgroup's notifier callback was not
called and newly allocated page's page_cgroup wasn't allocated.
This is a fix. (notifier saw STOP_HERE flag added by slub's notifier.)I'm now testing modified kernel, which does alloc/free page_cgroup by notifier.
(Usually, all page_cgroups are from bootmem and not freed.
so, modified a bit for test)And I cannot reproduce panic. I think you do "real" memory hotplug other than
online/offline and saw panic caused by this.Is this slub's behavior intentional ? page_cgroup's notifier has lower priority
than slub, now.Thanks,
-Kame
==
notifier callback's notifier_from_errno() just works well in error
route. (It adds mask for "stop here")Hanlder should return NOTIFY_OK in explict way.
Signed-off-by:KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/page_cgroup.c | 5 ++++-
mm/slub.c | 6 ++++--
2 files changed, 8 insertions(+), 3 deletions(-)Index: mmotm-2.6.28-Nov10/mm/slub.c
===================================================================
--- mmotm-2.6.28-Nov10.orig/mm/slub.c
+++ mmotm-2.6.28-Nov10/mm/slub.c
@@ -3220,8 +3220,10 @@ static int slab_memory_callback(struct n
case MEM_CANCEL_OFFLINE:
break;
}
-
- ret = notifier_from_errno(ret);
+ if (ret)
+ ret = notifier_from_errno(ret);
+ else
+ ret = NOTIFY_OK;
return ret;
}Index: mmotm-2.6.28-Nov10/mm/page_cgroup.c
===================================================================
--- mmotm-2.6.28-Nov10.orig/mm/page_cgroup.c
+++ mmotm-2.6.28-Nov10/mm/page_cgroup.c
@@ -216,7 +216,10 @@ static int page_cgroup_callback(struct n
break;
}- ret = notifier_from_errno(ret);
+ if (ret)
+ ret = notifier_from_errno(ret);
+ else
+ ret = NOTIFY_OK;return ret;
}--
No. I wasn't using SLUB.
# egrep "SLUB|SLAB" .config
CONFIG_SLAB=y
# CONFIG_SLUB is not set
CONFIG_SLABINFO=y
# CONFIG_DEBUG_SLAB is not setI can test the patch and let you know.
Thanks,
--
On Mon, 10 Nov 2008 13:43:28 -0800
Hm, should I avoid freeing mem_cgroup at memory Offline ?
(memmap is also not free AFAIK.)Anyway, I'll dig this. thanks.
--
On Tue, 11 Nov 2008 10:14:40 +0900
it seems not the same kind of bug..Could you give me disassemble of mem_cgroup_charge_common() ?
(I'm not sure I can read ppc asm but I want to know what is "0x20"
of fault address....)As first impression, it comes from page migration..
rc4's page migration handler of memcg handles *usual* path but not so good.new migration code of memcg in mmotm is much better, I think.
Could you try mmotm if you have time ?Thanks,
--
I tried mmtom. Its even worse :(
Ran into following quickly .. Sorry!!
Thanks,
BadariOops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c000000000109e50 LR: c000000000109de8 CTR: c0000000000c2414
REGS: c0000000e66d72d0 TRAP: 0300 Not tainted (2.6.28-rc4-mm1)
MSR: 8000000000009032 <EE,ME,IR,DR> CR: 44008424 XER: 20000010
DAR: 0000000000000008, DSISR: 0000000042000000
TASK = c0000000e7dfa050[4835] 'drmgr' THREAD: c0000000e66d4000 CPU: 0
GPR00: 0000000000000020 c0000000e66d7550 c000000000b472d8 c0000000e910d958
GPR04: c00000000010a730 c0000000000b96a8 c0000000e66d7660 0000000000000000
GPR08: c000000005520440 0000000000000000 c0000000e910d960 c0000000e910d948
GPR12: 0000000024000422 c000000000b68300 00000000200957bc 0000000000000000
GPR16: 0000000000000000 c0000000e66d78f8 0000000000000000 c000000000aeea70
GPR20: 000000000000006d 0000000000000000 c000000003709e78 00000000000e6000
GPR24: 0000000000000000 0000000000000000 0000000000000001 c0000000e910d938
GPR28: c000000005520428 0000000000000001 c000000000abc220 c0000000e66d7550
NIP [c000000000109e50] .__mem_cgroup_add_list+0x98/0xec
LR [c000000000109de8] .__mem_cgroup_add_list+0x30/0xec
Call Trace:
[c0000000e66d7550] [c000000000109de8] .__mem_cgroup_add_list+0x30/0xec (unreliable)
[c0000000e66d75f0] [c00000000010a730] .__mem_cgroup_commit_charge+0x108/0x154
[c0000000e66d7690] [c00000000010adf8] .mem_cgroup_end_migration+0xb4/0x130
[c0000000e66d7730] [c000000000108c84] .migrate_pages+0x460/0x62c
[c0000000e66d7880] [c000000000106760] .offline_pages+0x398/0x5ac
[c0000000e66d7990] [c0000000001069b8] .remove_memory+0x44/0x60
[c0000000e66d7a20] [c00000000040758c] .memory_block_change_state+0x198/0x230
[c0000000e66d7ad0] [c000000000407cac] .store_mem_state+0xcc/0x144
[c0000000e66d7b70] [c0000000003fa8b0] .sysdev_store+0x74/0xa4
[c0000000e66d7c10] [c00000000017b084] .sysfs_write_file+0x128/0x1a4
[c0000000e66d7cd0] [c00000000010fb7c] .vfs_write+0xf0/0x1c4
[c0000000e66...
On Wed, 12 Nov 2008 14:02:56 -0800
the reason doesn't seem to be different from the one you saw in rc4.
We'do add_list() hear, so (maybe) used page_cgroup is zero-cleared, I think.
We usually do migration test on cpuset and confirmed this works with migration.Hmm...I susupect following. could you try ?
Sorry.
-Kame
==Index: mmotm-2.6.28-Nov10/mm/page_cgroup.c
===================================================================
--- mmotm-2.6.28-Nov10.orig/mm/page_cgroup.c
+++ mmotm-2.6.28-Nov10/mm/page_cgroup.c
@@ -166,7 +166,7 @@ int online_page_cgroup(unsigned long sta
end = ALIGN(start_pfn + nr_pages, PAGES_PER_SECTION);for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION) {
- if (!pfn_present(pfn))
+ if (!pfn_valid(pfn))
continue;
fail = init_section_page_cgroup(pfn);
}--
I tried mmtom + startpfn fix + this fix + notifier fix. Didn't help.
I am not using SLUB (using SLAB). Yes. I am testing "real" memory
remove (not just offline/online), since it executes more code of
freeing memmap etc.Code that is panicing is list_add() in mem_cgroup_add_list().
I will debug it further.Thanks,
BadariOops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c000000000109e50 LR: c000000000109de8 CTR: c0000000000c2414
REGS: c0000000e653f2d0 TRAP: 0300 Not tainted (2.6.28-rc4-mm1)
MSR: 8000000000009032 <EE,ME,IR,DR> CR: 44008484 XER: 20000018
DAR: 0000000000000007, DSISR: 0000000042000000
TASK = c0000000e7db9950[4927] 'drmgr' THREAD: c0000000e653c000 CPU: 0
GPR00: 0000000000000020 c0000000e653f550 c000000000b472d8 c0000000e910d558
GPR04: c00000000010a730 c0000000000b96a8 c0000000e653f660 0000000000000000
GPR08: c000000005432358 ffffffffffffffff c0000000e910d560 c0000000e910d548
GPR12: 0000000024000482 c000000000b68300 00000000200957bc 0000000000000000
GPR16: 0000000000000000 c0000000e653f8f8 0000000000000000 c000000000aeea70
GPR20: 0000000000000056 0000000000000000 c00000000370c300 00000000000e6000
GPR24: 0000000000000000 0000000000000000 0000000000000001 c0000000e910d538
GPR28: c000000005432340 0000000000000001 c000000000abc220 c0000000e653f550
NIP [c000000000109e50] .__mem_cgroup_add_list+0x98/0xec
LR [c000000000109de8] .__mem_cgroup_add_list+0x30/0xec
Call Trace:
[c0000000e653f550] [c000000000109de8] .__mem_cgroup_add_list+0x30/0xec (unreliable)
[c0000000e653f5f0] [c00000000010a730] .__mem_cgroup_commit_charge+0x108/0x154
[c0000000e653f690] [c00000000010adf8] .mem_cgroup_end_migration+0xb4/0x130
[c0000000e653f730] [c000000000108c84] .migrate_pages+0x460/0x62c
[c0000000e653f880] [c000000000106760] .offline_pages+0x398/0x5ac
[c0000000e653f990] [c0000000001069b8] .remove_memory+0x44/0x60
[c0000000e653fa20] [c000000000407590] .memory_block_change_state+0x198/0x230
[c0000000e653fad0] [c000000000407c...
On Thu, 13 Nov 2008 10:53:24 -0800
Considering difference between "real" memory hotplug and logical ones,
I found this. I hope this fixes the bug.
But I myself can't do test this..Thanks,
-Kame==
Fixes for memcg/memory hotplug.While memory hotplug allocate/free memmap, page_cgroup doesn't free
page_cgroup at OFFLINE when page_cgroup is allocated via bootomem.
(Because freeing bootmem requires special care.)Then, if page_cgroup is allocated by bootmem and memmap is freed/allocated
by memory hotplug, page_cgroup->page == page is no longer true and
we have to update that.But current MEM_ONLINE handler doesn't check it and update page_cgroup->page
if it's not necessary to allocate page_cgroup.And I noticed that MEM_ONLINE can be called against "part of section".
So, freeing page_cgroup at CANCEL_ONLINE will cause trouble.
(freeing used page_cgroup)
Don't rollback at CANCEL.One more, current memory hotplug notifier is stopped by slub
because it sets NOTIFY_STOP_MASK to return vaule. So, page_cgroup's callback
never be called. (low priority than slub now.)I think this slub's behavior is not intentional(BUG). and fixes it.
Another way to be considered about page_cgroup allocation:
- free page_cgroup at OFFLINE even if it's from bootmem
and remove specieal handler. But it requires more changes.Signed-off-by: KAMEZAWA Hiruyoki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/page_cgroup.c | 39 +++++++++++++++++++++++++++------------
mm/slub.c | 6 ++++--
2 files changed, 31 insertions(+), 14 deletions(-)Index: mmotm-2.6.28-Nov10/mm/page_cgroup.c
===================================================================
--- mmotm-2.6.28-Nov10.orig/mm/page_cgroup.c
+++ mmotm-2.6.28-Nov10/mm/page_cgroup.c
@@ -104,18 +104,30 @@ int __meminit init_section_page_cgroup(u
unsigned long table_size;
int nid, index;- if (section->page_cgroup)
- return 0;
+ if (!section->page_cgroup) {- nid = page_to_nid(pfn_to_page...
Kame,
With this patch I am able to run tests without any issues.
Sorry for delayed response, I wanted to make sure test runs fine over
the weekend.Tested-by: Badari Pulavarty <pbadari@us.ibm.com>
Thanks,
--
On Mon, 17 Nov 2008 13:30:08 -0800
Wow, Thank you!--
On Wed, 12 Nov 2008 14:02:56 -0800
Hmm...mmotm + start_pfn fix ?disassemble is also helpful.
Thanks,
--
| James Bottomley | Re: Integration of SCST in the mainstream Linux kernel |
| Tarkan Erimer | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Greg Kroah-Hartman | [PATCH 008/196] Chinese: add translation of volatile-considered-harmful.txt |
| Kyle Moffett | Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching |
git: | |
| Gerrit Renker | [PATCH 28/37] dccp: Integration of dynamic feature activation - part 3 (client side) |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| David Miller | Re: [GIT]: Networking |
| Andrew Morton | Re: [BUG] New Kernel Bugs |
