hello,
Resend these two patches for bug fixing:
The bug is that when memory hotplug-adding happens for a large enough area that a new PGD entry is
needed for the direct mapping, the PGDs of other processes would not get updated. This leads to some
CPUs oopsing when they have to access the unmapped areas, e.g. onlining CPUs on the new added node.
Thanks!
---
From 04d9fc860db40f15ad629f8b341ff84b83a00a8d Mon Sep 17 00:00:00 2001
From: Haicheng Li <haicheng.li@linux.intel.com>
Date: Wed, 19 May 2010 17:42:14 +0800
Subject: [PATCH] x86, mem-hotplug: separate x86_64 vmalloc_sync_all() into separate functions
No behavior change here.
Move some of vmalloc_sync_all() code into a new function
sync_global_pgds() that will be useful for memory hotplug.
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
CC: Andi Kleen <ak@linux.intel.com>
---
arch/x86/include/asm/pgtable_64.h | 2 ++
arch/x86/mm/fault.c | 24 +-----------------------
arch/x86/mm/init_64.c | 30 ++++++++++++++++++++++++++++++
3 files changed, 33 insertions(+), 23 deletions(-)
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 181be52..317026d 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -102,6 +102,8 @@ static inline void native_pgd_clear(pgd_t *pgd)
native_set_pgd(pgd, native_make_pgd(0));
}
+extern void sync_global_pgds(unsigned long start, unsigned long end);
+
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f627779..7ae0897 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -326,29 +326,7 @@ out:
void vmalloc_sync_all(void)
{
- unsigned long address;
-
- for (address = VMALLOC_START & PGDIR_MASK; address <= VMALLOC_END;
- address ...x86, mem-hotplug: update all PGDs for direct mapping and vmemmap mapping changes on 64bit. When memory hotplug-adding happens for a large enough area that a new PGD entry is needed for the direct mapping, the PGDs of other processes would not get updated. This leads to some CPUs oopsing like below when they have to access the unmapped areas. [ 1139.243192] BUG: soft lockup - CPU#0 stuck for 61s! [bash:6534] [ 1139.243195] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243229] irq event stamp: 8538759 [ 1139.243230] hardirqs last enabled at (8538759): [<ffffffff8100c3fc>] restore_args+0x0/0x30 [ 1139.243236] hardirqs last disabled at (8538757): [<ffffffff810422df>] __do_softirq+0x106/0x146 [ 1139.243240] softirqs last enabled at (8538758): [<ffffffff81042310>] __do_softirq+0x137/0x146 [ 1139.243245] softirqs last disabled at (8538743): [<ffffffff8100cb5c>] call_softirq+0x1c/0x34 [ 1139.243249] CPU 0: [ 1139.243250] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243284] Pid: 6534, comm: bash Tainted: G M 2.6.32-haicheng-cpuhp #7 QSSC-S4R [ 1139.243287] RIP: 0010:[<ffffffff810ace35>] [<ffffffff810ace35>] alloc_arraycache+0x35/0x69 [ 1139.243292] RSP: 0018:ffff8802799f9d78 EFLAGS: 00010286 [ 1139.243295] RAX: ffff8884ffc00000 RBX: ffff8802799f9d98 RCX: 0000000000000000 [ 1139.243297] RDX: 0000000000190018 RSI: 0000000000000001 RDI: ffff8884ffc00010 [ 1139.243300] RBP: ffffffff8100c34e ...
Commit-ID: f4a9f26dd0041423da94bb22dfed742540915487 Gitweb: http://git.kernel.org/tip/f4a9f26dd0041423da94bb22dfed742540915487 Author: Haicheng Li <haicheng.li@linux.intel.com> AuthorDate: Fri, 20 Aug 2010 17:50:16 +0800 Committer: H. Peter Anvin <hpa@linux.intel.com> CommitDate: Wed, 25 Aug 2010 14:01:03 -0700 x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes When memory hotplug-adding happens for a large enough area that a new PGD entry is needed for the direct mapping, the PGDs of other processes would not get updated. This leads to some CPUs oopsing like below when they have to access the unmapped areas. [ 1139.243192] BUG: soft lockup - CPU#0 stuck for 61s! [bash:6534] [ 1139.243195] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243229] irq event stamp: 8538759 [ 1139.243230] hardirqs last enabled at (8538759): [<ffffffff8100c3fc>] restore_args+0x0/0x30 [ 1139.243236] hardirqs last disabled at (8538757): [<ffffffff810422df>] __do_softirq+0x106/0x146 [ 1139.243240] softirqs last enabled at (8538758): [<ffffffff81042310>] __do_softirq+0x137/0x146 [ 1139.243245] softirqs last disabled at (8538743): [<ffffffff8100cb5c>] call_softirq+0x1c/0x34 [ 1139.243249] CPU 0: [ 1139.243250] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243284] Pid: 6534, comm: bash Tainted: G M 2.6.32-haicheng-cpuhp #7 QSSC-S4R [ 1139.243287] RIP: ...
Commit-ID: 9b861528a8012e7bc4d1f7bae07395b225331477 Gitweb: http://git.kernel.org/tip/9b861528a8012e7bc4d1f7bae07395b225331477 Author: Haicheng Li <haicheng.li@linux.intel.com> AuthorDate: Fri, 20 Aug 2010 17:50:16 +0800 Committer: H. Peter Anvin <hpa@linux.intel.com> CommitDate: Thu, 26 Aug 2010 14:02:33 -0700 x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes When memory hotplug-adding happens for a large enough area that a new PGD entry is needed for the direct mapping, the PGDs of other processes would not get updated. This leads to some CPUs oopsing like below when they have to access the unmapped areas. [ 1139.243192] BUG: soft lockup - CPU#0 stuck for 61s! [bash:6534] [ 1139.243195] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243229] irq event stamp: 8538759 [ 1139.243230] hardirqs last enabled at (8538759): [<ffffffff8100c3fc>] restore_args+0x0/0x30 [ 1139.243236] hardirqs last disabled at (8538757): [<ffffffff810422df>] __do_softirq+0x106/0x146 [ 1139.243240] softirqs last enabled at (8538758): [<ffffffff81042310>] __do_softirq+0x137/0x146 [ 1139.243245] softirqs last disabled at (8538743): [<ffffffff8100cb5c>] call_softirq+0x1c/0x34 [ 1139.243249] CPU 0: [ 1139.243250] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243284] Pid: 6534, comm: bash Tainted: G M 2.6.32-haicheng-cpuhp #7 QSSC-S4R [ 1139.243287] RIP: ...
The patches look good to me. Can we please move forward with this? Reviewed-by: Andi Kleen <ak@linux.intel.com> -Andi -- ak@linux.intel.com -- Speaking for myself only. --
The patches are mangled so they don't apply even with patch -l -- Haicheng, could you send me another copy, as an attachment if necessary? -hpa --
Never mind, I fixed them up by hand. -hpa --
Commit-ID: 7b61b1ccb1a84c3391360ac94c908bfbbb35299c Gitweb: http://git.kernel.org/tip/7b61b1ccb1a84c3391360ac94c908bfbbb35299c Author: Haicheng Li <haicheng.li@linux.intel.com> AuthorDate: Wed, 19 May 2010 17:42:14 +0800 Committer: H. Peter Anvin <hpa@linux.intel.com> CommitDate: Wed, 25 Aug 2010 13:47:14 -0700 x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions No behavior change. Move some of vmalloc_sync_all() code into a new function sync_global_pgds() that will be useful for memory hotplug. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <4C6E4ECD.1090607@linux.intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> --
Commit-ID: 6afb5157b9eba4092e2f0f54d24a3806409bdde5 Gitweb: http://git.kernel.org/tip/6afb5157b9eba4092e2f0f54d24a3806409bdde5 Author: Haicheng Li <haicheng.li@linux.intel.com> AuthorDate: Wed, 19 May 2010 17:42:14 +0800 Committer: H. Peter Anvin <hpa@linux.intel.com> CommitDate: Thu, 26 Aug 2010 14:02:29 -0700 x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions No behavior change. Move some of vmalloc_sync_all() code into a new function sync_global_pgds() that will be useful for memory hotplug. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <4C6E4ECD.1090607@linux.intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> --- arch/x86/include/asm/pgtable_64.h | 2 ++ arch/x86/mm/fault.c | 24 +----------------------- arch/x86/mm/init_64.c | 30 ++++++++++++++++++++++++++++++ 3 files changed, 33 insertions(+), 23 deletions(-) diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index 076052c..f96ac9b 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -102,6 +102,8 @@ static inline void native_pgd_clear(pgd_t *pgd) native_set_pgd(pgd, native_make_pgd(0)); } +extern void sync_global_pgds(unsigned long start, unsigned long end); + /* * Conversion functions: convert a page and protection to a page entry, * and a page entry and page directory to the page they refer to. diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 4c4508e..51f7ee7 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -326,29 +326,7 @@ out: void vmalloc_sync_all(void) { - unsigned long address; - - for (address = VMALLOC_START & PGDIR_MASK; address <= VMALLOC_END; - address += PGDIR_SIZE) { - - const pgd_t *pgd_ref = pgd_offset_k(address); - unsigned long flags; - struct page *page; - - if ...
