On my T-60 laptop (an i686 system running a 2.6.37-rc4 kernel), "echo c > /proc/sysrq-trigger" just hangs the system. Kdump works on 2.6.36. Is this a known issue? If not, what information should I provide to help solve it? (I think the easiest way to track down the problem would be to bisect.) Stanislaw --
Here is the photo http://people.redhat.com/sgruszka/20101203_005.jpg There are two BUGs, first "sleeping function called from invalid context" and then "unable to handle null pointer dereference". Stanislaw --
The warning about sleeping is an artifact of the fact that we panic the box with irqs disabled I think (although I would think the fault handler would have re-enabled them properly). Not sure what the NULL pointer is from --
The NULL pointer dereference is OK; that's the way sysrq_handle_crash
triggers a crash. The problem here is that the secondary kdump kernel hangs at
start.
Bisection shows that the bad commit is
commit 72d7c3b33c980843e756681fb4867dc1efd62a76
Author: Yinghai Lu <yinghai@kernel.org>
Date: Wed Aug 25 13:39:17 2010 -0700
x86: Use memblock to replace early_res
Before this commit kdump works. After it the kernel doesn't compile (!?!). I fixed
the compilation, but the crash kernel still could not even be loaded; I fixed that
using hunks from 9f4c13964b58608fbce05540743281ea3146c0e8 "x86, memblock:
Fix crashkernel allocation". After that the crash kernel can be loaded, but
it hangs at start, which is the problem that still happens in -rc4.
I'm attaching my config; I hope this is enough to fix it.
Stanislaw
Please check the debug patches, and boot the first kernel and kexec the second kernel with "ignore_loglevel debug earlyprintk...." Thanks Yinghai
The second kernel does not print anything, so maybe it does not even start. Dmesg from the primary kernel is attached. Stanislaw
please try attached debug patch. Thanks Yinghai
With the debug patch the kdump kernel boots. Dmesgs from the kdump and primary kernels are attached. Stanislaw
Yes, with patch kdump works. Thanks Stanislaw --
peter, vivek,
it seems 32bit kdump needs the crashkernel much lower than we expect...
Maybe we have to add find_in_range_low() to make 32bit kdump happy.
Thanks
Yinghai
Subject: [PATCH] x86, memblock: Add memblock_x86_find_in_range_low()
The generic version goes from high to low, and it seems it cannot find
a right area that is compact enough.
The x86 version goes from goal to limit, just like the way we used
for early_res,
to make crashkernel happy with 32bit kdump.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/include/asm/memblock.h | 2 +
arch/x86/kernel/setup.c | 2 -
arch/x86/mm/memblock.c | 52 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 55 insertions(+), 1 deletion(-)
Index: linux-2.6/arch/x86/mm/memblock.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/memblock.c
+++ linux-2.6/arch/x86/mm/memblock.c
@@ -346,3 +346,55 @@ u64 __init memblock_x86_hole_size(u64 st
return end - start - ((u64)ram << PAGE_SHIFT);
}
+
+/* Check for already reserved areas */
+static inline bool __init check_with_memblock_reserved(u64 *addrp, u64 size, u64 align)
+{
+ u64 addr = *addrp;
+ bool changed = false;
+ struct memblock_region *r;
+again:
+ for_each_memblock(reserved, r) {
+ if ((addr + size) > r->base && addr < (r->base + r->size)) {
+ addr = round_up(r->base + r->size, align);
+ changed = true;
+ goto again;
+ }
+ }
+
+ if (changed)
+ *addrp = addr;
+
+ return changed;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range from bottom up
+ */
+u64 __init memblock_x86_find_in_range_low(u64 start, u64 end, u64 size, u64 align)
+{
+ struct memblock_region *r;
+
+ for_each_memblock(memory, r) {
+ u64 ei_start = r->base;
+ u64 ei_last = ei_start + r->size;
+ u64 addr, last;
+
+ addr = round_up(ei_start, align);
+ if (addr < start)
+ addr = round_up(start, align);
+ if (addr >= ei_last)
+ continue;
+ while ...
Not this garbage again... sigh. Once again, I will want to know what the actual constraint is... not just "oh, this seems to work on this one system." I realize that the kdump interfaces are probably beyond saving -- we have had this discussion enough times -- but I'm not happy about it and I will really want to know what the heck the real issue is. Furthermore, such a function should NOT be private to x86 core; if it's needed at all it should live in the memblock core. -hpa --
Same here, Yinghai. We need to debug what the upper limit is for loading an x86 32bit kernel; once we know and understand that, we can fail the loading of the kdump kernel citing the appropriate reason. Last time our understanding was that as long as we allocate memory below 896MB things should be fine.
Stanislaw, how much memory are you reserving, and at what address, with the -rc4 kernel? Can you please look at /proc/iomem? And try to reserve the same amount of memory at roughly the same address on the 2.6.36 kernel, and see if kdump works.
This is how I used to debug problems in the kdump path:
- Try earlyprintk for the second kernel.
- Try the --debug and --console-serial options with kexec while loading the second kernel. The important thing to know here is whether control reached purgatory or not.
- If that gives me nothing, then it boils down to putting some outb() statements in the first kernel and second kernel boot paths to know where things went wrong.
Because the issue was resolved by reserving memory in the low memory area, it sounds like the second kernel failed to boot early. So early printk might help; otherwise outb() and the serial console are your friends.
Thanks Vivek --
I could debug this problem, but I do not suffer from free time right now :-) It would be better if someone experienced with bootmem/kdump debugged this. I just checked another laptop (T500, 2.6.37-rc5, x86_64, RHEL6 user space, crashkernel=256M, 1.6G mem), and kdump does not work there either. So I do not think the problem is hard to reproduce. Stanislaw --
ok, will try to find some old machine with less memory and devices to duplicate the problem. Yinghai --
please check
[PATCH] x86, crashkernel, 32bit: only try to get range under 512M
Stanislaw reported that kdump on 32bit is broken.
In misc.c, the decompressor does sanity checking to make sure the heap
is under 512M.
So limit it in the first kernel to under 512M for 32bit systems.
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/kernel/setup.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -499,7 +499,19 @@ static inline unsigned long long get_tot
return total << PAGE_SHIFT;
}
+/*
+ * arch/x86/boot/compressed/misc.c will check heap size for decompresser
+ * 32bit will have more strict limitation
+ */
#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF
+#define HEAP_LIMIT_32BIT 0x20000000
+
+#ifdef CONFIG_X86_64
+#define CRASH_KERNEL_LIMIT DEFAULT_BZIMAGE_ADDR_MAX
+#else
+#define CRASH_KERNEL_LIMIT HEAP_LIMIT_32BIT
+#endif
+
static void __init reserve_crashkernel(void)
{
unsigned long long total_mem;
@@ -521,7 +533,7 @@ static void __init reserve_crashkernel(v
* kexec want bzImage is below DEFAULT_BZIMAGE_ADDR_MAX
*/
crash_base = memblock_find_in_range(alignment,
- DEFAULT_BZIMAGE_ADDR_MAX, crash_size, alignment);
+ CRASH_KERNEL_LIMIT, crash_size, alignment);
if (crash_base == MEMBLOCK_ERROR) {
pr_info("crashkernel reservation failed - No suitable area found.\n");
--
The patch fixes the problem on my T-60 laptop. As expected, the patch does not help on my other T-500 x86_64 system; kdump does not work there, but perhaps this is a different problem. I'm going to check it. Stanislaw --
I think limiting kdump below 512 MiB on 32 bits may make sense; perhaps even on 64 bits. It's pretty conservative, after all... Opinions? -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. --
Actually it will be good to know why 512MB. I know in the past we have been talking of reserving memory in higher memory regions and Neil Horman had been trying to boot bzImage in 64 bit mode so that it can be run from higher addresses. So right now limiting it is easy but it is desirable to be able to run bzImage from as high an address as possible and knowing why to limit it to 512MB can help see if there is a way to get rid of that limitation. I probably would not worry about 32bit systems but for 64 bit, I certainly want to make it boot from higher addresses (if it is technically possible). Thanks Vivek --
It's worth noting that there is almost always going to be a need for *some* low memory. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. --
Can you try crashkernel=256M@128M on your T-500 x86_64 system? Thanks Yinghai --
Thanks Yinghai. I am wondering why on 32bit heap has to be with-in 512MB.
I think you are referring to following check in
arch/x86/boot/compressed/misc.c.
if (end > ((-__PAGE_OFFSET-(512 <<20)-1) & 0x7fffffff))
error("Destination address too large");
It was introduced here.
commit 968de4f02621db35b8ae5239c8cfc6664fb872d8
Author: Eric W. Biederman <ebiederm@xmission.com>
Date: Thu Dec 7 02:14:04 2006 +0100
[PATCH] i386: Relocatable kernel support
Eric,
It has been a long time. By any chance would you remember where the above
constraint comes from?
Thanks
--
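For readers following along, the arithmetic behind that check can be worked through in user space. This is only a sketch: `heap_limit()` is a hypothetical helper, not kernel code, and it assumes the expression is evaluated with 32-bit unsigned wraparound as on i386.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): evaluates the bound from the
 * misc.c check, (-__PAGE_OFFSET - (margin_mib << 20) - 1) & 0x7fffffff,
 * using 32-bit unsigned wraparound.
 */
static uint32_t heap_limit(uint32_t page_offset, uint32_t margin_mib)
{
    uint32_t v = 0u - page_offset;  /* bytes between PAGE_OFFSET and 4 GiB */
    v -= margin_mib << 20;          /* margin subtracted by the check */
    v -= 1u;                        /* highest allowed address, inclusive */
    return v & 0x7fffffffu;
}
```

With `__PAGE_OFFSET` at 0xC0000000, `heap_limit(0xC0000000, 512)` evaluates to 0x1fffffff, which is one byte below 512 MiB. A 128 MiB margin would give 0x37ffffff instead, the same value as DEFAULT_BZIMAGE_ADDR_MAX.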
It might, in fact, be bogus; specifically a proxy for the fact that we need the kernel memory including bss and brk below the lowmem boundary, which isn't well-defined. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. --
brk complains if I change that to
if (end > ((-__PAGE_OFFSET-(128 <<20)-1) & 0x7fffffff))
error("Destination address too large");
brk complains when trying to get more for dmi ...
...
I'm in purgatory
bootconsole [uart0] enabled
Kernel Layout:
.text: [0x2e000000-0x2e3f08ca]
.rodata: [0x2e3f2000-0x2e5a2fff]
.data: [0x2e5a3000-0x2e5f6467]
.init: [0x2e5f7000-0x2e670fff]
.bss: [0x2e675000-0x2e76ffff]
.brk: [0x2e770000-0x2e894fff]
memblock_x86_reserve_range: [0x00001000-0x00001fff] EX TRAMPOLINE
memblock_x86_reserve_range: [0x2e000000-0x2e76ffff] TEXT DATA BSS
memblock_x86_reserve_range: [0x35bdd000-0x35f49fff] RAMDISK
memblock_x86_reserve_range: [0x0009c800-0x000fffff] * BIOS reserved
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.37-rc5-tip+ (root@mpk12-3214-189-181) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #4 SMP Wed Dec 15 11:04:32 PST 2010
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
NSC Geode by NSC
Cyrix CyrixInstead
Centaur CentaurHauls
Transmeta GenuineTMx86
Transmeta TransmetaCPU
UMC UMC UMC UMC
BIOS-provided physical RAM map:
BIOS-e820: [0x00000000000100-0x0000000009c7ff] (usable)
BIOS-e820: [0x0000000009c800-0x0000000009ffff] (reserved)
BIOS-e820: [0x000000000e0000-0x000000000fffff] (reserved)
BIOS-e820: [0x00000000100000-0x0000007ff9ffff] (usable)
BIOS-e820: [0x0000007ffae000-0x0000007ffaffff] (usable)
BIOS-e820: [0x0000007ffb0000-0x0000007ffbdfff] (ACPI data)
BIOS-e820: [0x0000007ffbe000-0x0000007ffeffff] (ACPI NVS)
BIOS-e820: [0x0000007fff0000-0x0000007fffffff] (reserved)
BIOS-e820: [0x000000e0000000-0x000000efffffff] (reserved)
BIOS-e820: [0x000000fec00000-0x000000fec00fff] (reserved)
BIOS-e820: [0x000000fee00000-0x000000feefffff] (reserved)
BIOS-e820: [0x000000ff700000-0x000000ffffffff] (reserved)
last_pfn = 0x7ffb0 max_arch_pfn = 0x1000000
NX (Execute Disable) protection: active
user-defined ...
I'm assuming it bails due to: BUG_ON((char *)(_brk_end + size) > __brk_limit); ... could you find out what _brk_end and __brk_limit are? -hpa --
void __init print_kernel_layout(void)
{
printk("Kernel Layout:\n");
printk(" .text: [%#010lx-%#010lx]\n", __pa_symbol(&_text), __pa_symbol(&_etext) - 1);
printk(".rodata: [%#010lx-%#010lx]\n", __pa_symbol(&__start_rodata), __pa_symbol(&__end_rodata) - 1);
printk(" .data: [%#010lx-%#010lx]\n", __pa_symbol(&_sdata), __pa_symbol(&_edata) - 1);
printk(" .init: [%#010lx-%#010lx]\n", __pa_symbol(&__init_begin), __pa_symbol(&__init_end) - 1);
printk(" .bss: [%#010lx-%#010lx]\n", __pa_symbol(&__bss_start), __pa_symbol(&__bss_stop) - 1);
printk(" .brk: [%#010lx-%#010lx]\n", __pa_symbol(&__brk_base), __pa_symbol(&__brk_limit) - 1);
}
so __brk_limit should be right?
--
DMI present.
_brk_end: ee8e6000, __brk_limit: ee895000
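The two values above can be compared directly. The following is a user-space sketch, not kernel code: the helper names are made up for illustration, and 4 KiB pages are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumption: 4 KiB pages */

/* True when the end of brk lies beyond the reserved limit. */
static bool brk_overrun(uint32_t brk_end, uint32_t brk_limit)
{
    return brk_end > brk_limit;
}

/* Number of pages by which brk exceeds the limit (0 if it fits). */
static uint32_t brk_overrun_pages(uint32_t brk_end, uint32_t brk_limit)
{
    if (!brk_overrun(brk_end, brk_limit))
        return 0;
    return (brk_end - brk_limit + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}
```

For the reported values, `brk_overrun_pages(0xee8e6000, 0xee895000)` yields 81, that is 0x51000 bytes past the limit. With PAGE_OFFSET=0xc0000000, the virtual limit 0xee895000 corresponds to physical 0x2e895000, which lines up with the end of the .brk range printed in the kernel layout.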
--
It looks like arch/x86/kernel/head_32.S puts the page tables in _brk.... If the whole range is somewhat high, it will use more buffer in _brk for ... The brk pre-calculation could be wrong and too small. Yinghai --
32bit assumes KERNEL_IMAGE_SIZE is 512M:
arch/x86/include/asm/page_32_types.h:#define KERNEL_IMAGE_SIZE (512 * 1024 * 1024)
arch/x86/include/asm/page_64_types.h:#define KERNEL_IMAGE_SIZE (512 * 1024 * 1024)
arch/x86/kernel/head64.c: BUILD_BUG_ON(MODULES_VADDR-KERNEL_IMAGE_START < KERNEL_IMAGE_SIZE);
arch/x86/kernel/head64.c: BUILD_BUG_ON(MODULES_LEN + KERNEL_IMAGE_SIZE > 2*PUD_SIZE);
arch/x86/kernel/head64.c: max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
arch/x86/kernel/head_32.S: * (KERNEL_IMAGE_SIZE/4096) / 1024 pages (worst case, non PAE)
arch/x86/kernel/head_32.S: * (KERNEL_IMAGE_SIZE/4096) / 512 + 4 pages (worst case for PAE)
arch/x86/kernel/head_32.S: * KERNEL_IMAGE_SIZE should be greater than pa(_end)
arch/x86/kernel/head_32.S:KERNEL_PAGES = (KERNEL_IMAGE_SIZE + MAPPING_BEYOND_END)>>PAGE_SHIFT
and uses that to estimate the BRK size.
So we could change the BRK calculating code to handle 896M, or just limit crashkernel for 32bit to 512M...
The one that handles 896M:
---
arch/x86/boot/compressed/misc.c | 2 +-
arch/x86/kernel/head_32.S | 4 +++-
2 files changed, 4 insertions(+), 2 deletions(-)
Index: linux-2.6/arch/x86/boot/compressed/misc.c
===================================================================
--- linux-2.6.orig/arch/x86/boot/compressed/misc.c
+++ linux-2.6/arch/x86/boot/compressed/misc.c
@@ -365,7 +365,7 @@ asmlinkage void decompress_kernel(void *
if (heap > 0x3fffffffffffUL)
error("Destination address too large");
#else
- if (heap > ((-__PAGE_OFFSET-(512<<20)-1) & 0x7fffffff))
+ if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
error("Destination address too large");
#endif
#ifndef CONFIG_RELOCATABLE
Index: linux-2.6/arch/x86/kernel/head_32.S
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head_32.S
+++ linux-2.6/arch/x86/kernel/head_32.S
@@ -68,8 +68,10 @@ MAPPING_BEYOND_END = \
* Worst-case size of the kernel mapping we need to make:
...
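To put numbers on the worst-case formulas quoted from the head_32.S comments, here is a small user-space sketch. The function names are hypothetical, 4 KiB pages are assumed, and the MAPPING_BEYOND_END term is deliberately left out, so these are not the exact values the kernel reserves.

```c
#include <stdint.h>

#define PAGE_SHIFT_4K 12  /* assumption: 4 KiB pages */

/* Page-table pages needed to map `bytes`, non-PAE: (bytes/4096) / 1024. */
static uint64_t pt_pages_nonpae(uint64_t bytes)
{
    return (bytes >> PAGE_SHIFT_4K) / 1024;
}

/* Page-table pages needed to map `bytes`, PAE: (bytes/4096) / 512 + 4. */
static uint64_t pt_pages_pae(uint64_t bytes)
{
    return (bytes >> PAGE_SHIFT_4K) / 512 + 4;
}
```

For 512 MiB this gives 128 pages without PAE and 260 with PAE. Mapping 896 MiB without PAE would need 224 pages, so a brk sized for a 512 MiB image is too small once the kernel sits higher than that.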
Grmf... this was originally 4 GiB, but someone tried to tighten the bound. I think we should set it back to 4 GiB; 896 MiB is still approximate. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. --
Thinking about it, we probably should *both* fix the brk and limit the crashkernel to 512 MiB (for compatibility with older crashkernels.) -hpa --
Can whomever has a test case for this please test the attached test patch? -hpa
it works ...
with PAGE_OFFSET=0xc0000000
I'm in purgatory
bootconsole [uart0] enabled
Kernel Layout:
.text: [0x2e000000-0x2e3f08ca]
.rodata: [0x2e3f2000-0x2e5a2fff]
.data: [0x2e5a3000-0x2e5f6467]
.init: [0x2e5f7000-0x2e670fff]
.bss: [0x2e675000-0x2e76ffff]
.brk: [0x2e770000-0x2e954fff]
memblock_x86_reserve_range: [0x00001000-0x00001fff] EX TRAMPOLINE
memblock_x86_reserve_range: [0x2e000000-0x2e76ffff] TEXT DATA BSS
memblock_x86_reserve_range: [0x35c20000-0x35f49fff] RAMDISK
memblock_x86_reserve_range: [0x0009c800-0x000fffff] * BIOS reserved
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.37-rc5-tip+ (root@mpk12-3214-189-181) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #9 SMP Thu Dec 16 08:46:56 PST 2010
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
NSC Geode by NSC
Cyrix CyrixInstead
Centaur CentaurHauls
Transmeta GenuineTMx86
Transmeta TransmetaCPU
UMC UMC UMC UMC
BIOS-provided physical RAM map:
BIOS-e820: [0x00000000000100-0x0000000009c7ff] (usable)
BIOS-e820: [0x0000000009c800-0x0000000009ffff] (reserved)
BIOS-e820: [0x000000000e0000-0x000000000fffff] (reserved)
BIOS-e820: [0x00000000100000-0x0000007ff9ffff] (usable)
BIOS-e820: [0x0000007ffae000-0x0000007ffaffff] (usable)
BIOS-e820: [0x0000007ffb0000-0x0000007ffbdfff] (ACPI data)
BIOS-e820: [0x0000007ffbe000-0x0000007ffeffff] (ACPI NVS)
BIOS-e820: [0x0000007fff0000-0x0000007fffffff] (reserved)
BIOS-e820: [0x000000e0000000-0x000000efffffff] (reserved)
BIOS-e820: [0x000000fec00000-0x000000fec00fff] (reserved)
BIOS-e820: [0x000000fee00000-0x000000feefffff] (reserved)
BIOS-e820: [0x000000ff700000-0x000000ffffffff] (reserved)
last_pfn = 0x7ffb0 max_arch_pfn = 0x1000000
NX (Execute Disable) protection: active
user-defined physical RAM map:
user: [0x00000000000000-0x0000000009ffff] (usable)
user: [0x0000002e000000-0x00000035f59fff] (usable)
user: [0x0000007ffb0000-0x0000007ffeffff] ...
with PAGE_OFFSET=0x40000000
I'm in purgatory
bootconsole [uart0] enabled
Kernel Layout:
.text: [0x2f000000-0x2f3fbf4a]
.rodata: [0x2f3fe000-0x2f5b1fff]
.data: [0x2f5b2000-0x2f60e067]
.init: [0x2f60f000-0x2f690fff]
.bss: [0x2f695000-0x2f796fff]
.brk: [0x2f797000-0x2fdbafff]
memblock_x86_reserve_range: [0x00001000-0x00001fff] EX TRAMPOLINE
memblock_x86_reserve_range: [0x2f000000-0x2f796fff] TEXT DATA BSS
memblock_x86_reserve_range: [0x36d8a000-0x36f49fff] RAMDISK
memblock_x86_reserve_range: [0x0009c800-0x000fffff] * BIOS reserved
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.37-rc5-tip+ (root@mpk12-3214-189-181) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #11 SMP Thu Dec 16 10:49:27 PST 2010
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
NSC Geode by NSC
Cyrix CyrixInstead
Centaur CentaurHauls
Transmeta GenuineTMx86
Transmeta TransmetaCPU
UMC UMC UMC UMC
BIOS-provided physical RAM map:
BIOS-e820: [0x00000000000100-0x0000000009c7ff] (usable)
BIOS-e820: [0x0000000009c800-0x0000000009ffff] (reserved)
BIOS-e820: [0x000000000e0000-0x000000000fffff] (reserved)
BIOS-e820: [0x00000000100000-0x0000007ff9ffff] (usable)
BIOS-e820: [0x0000007ffae000-0x0000007ffaffff] (usable)
BIOS-e820: [0x0000007ffb0000-0x0000007ffbdfff] (ACPI data)
BIOS-e820: [0x0000007ffbe000-0x0000007ffeffff] (ACPI NVS)
BIOS-e820: [0x0000007fff0000-0x0000007fffffff] (reserved)
BIOS-e820: [0x000000e0000000-0x000000efffffff] (reserved)
BIOS-e820: [0x000000fec00000-0x000000fec00fff] (reserved)
BIOS-e820: [0x000000fee00000-0x000000feefffff] (reserved)
BIOS-e820: [0x000000ff700000-0x000000ffffffff] (reserved)
last_pfn = 0x7ffb0 max_arch_pfn = 0x1000000
NX (Execute Disable) protection: active
user-defined physical RAM map:
user: [0x00000000000000-0x0000000009ffff] (usable)
user: [0x0000002f000000-0x00000036f59fff] (usable)
user: [0x0000007ffb0000-0x0000007ffeffff] (ACPI data)
DMI ...
Commit-ID: 147dd5610c8d1bacb88a6c1dfdaceaf257946ed0
Gitweb: http://git.kernel.org/tip/147dd5610c8d1bacb88a6c1dfdaceaf257946ed0
Author: H. Peter Anvin <hpa@linux.intel.com>
AuthorDate: Thu, 16 Dec 2010 19:11:09 -0800
Committer: H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Thu, 16 Dec 2010 19:11:09 -0800
x86-32: Make sure we can map all of lowmem if we need to
A relocatable kernel can be anywhere in lowmem -- and in the case of a kdump kernel, is likely to be fairly high. Since the early page tables map everything from address zero up we need to make sure we allocate enough brk that we can map all of lowmem if we need to.
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD3ED.8070607@kernel.org>
---
arch/x86/boot/compressed/misc.c | 2 +-
arch/x86/kernel/head_32.S | 12 +++++++-----
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 23f315c..325c052 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -355,7 +355,7 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap,
if (heap > 0x3fffffffffffUL)
error("Destination address too large");
#else
- if (heap > ((-__PAGE_OFFSET-(512<<20)-1) & 0x7fffffff))
+ if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
error("Destination address too large");
#endif
#ifndef CONFIG_RELOCATABLE
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index bcece91..d7cdf5b 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -60,16 +60,18 @@
#define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD)
#endif
+/* Number of possible pages in the lowmem region */
+LOWMEM_PAGES = (((1<<32) - __PAGE_OFFSET) >> PAGE_SHIFT)
+
/* Enough space to fit pagetables for the low memory linear map */
-MAPPING_BEYOND_END = ...
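As an illustration of the LOWMEM_PAGES expression in this commit, the sketch below evaluates it in user space for the two PAGE_OFFSET values tested earlier in the thread. `lowmem_pages()` is a hypothetical name, and 4 KiB pages are assumed.

```c
#include <stdint.h>

#define PAGE_SHIFT_4K 12  /* assumption: 4 KiB pages */

/*
 * Mirrors LOWMEM_PAGES = ((1<<32) - __PAGE_OFFSET) >> PAGE_SHIFT,
 * computed in 64 bits so that 1<<32 does not overflow.
 */
static uint64_t lowmem_pages(uint32_t page_offset)
{
    return ((UINT64_C(1) << 32) - page_offset) >> PAGE_SHIFT_4K;
}
```

`lowmem_pages(0xC0000000)` is 262144 pages (1 GiB of kernel address space), while `lowmem_pages(0x40000000)` is 786432 pages (3 GiB). That is in line with the much larger .brk range printed in the PAGE_OFFSET=0x40000000 boot log.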
Fix confirmed, thanks Stanislaw --
Yinghai, On my system the above change works fine and I can boot into the second kernel. So it will boil down to knowing what the exact constraints on the heap for decompression are, and whether for 32bit we can allow the heap up to 896MB or not. Thanks Vivek --
really? what is your CONFIG_PAGE_OFFSET? 0x40000000 or 0xc0000000? Yinghai --
By the way, 896 MiB is almost certainly too aggressive; the vmalloc area is adjustable and there are other bits that can chew off a few MiB of address space. I would suggest we either make it 512 or 768 MiB *and* fix the brk limit. Opinions? -hpa --
I'd like to apply a modified version of this patch (attached.) Ack/nak, people? -hpa
Please don't do that to 64 bit. My big system with 1024g memory and a lot of cards, running rhel 6, must have crashkernel=512m to make kdump work, and the second kernel needs to take pci=nomsi. Thanks --
Hm, this seems like an epic FAIL. First of all, the current code still limits it to 896 MiB, so 512 MiB is not a significant restriction. Second, this patch only applies if "crashkernel=" is not specified, so it doesn't even apply to your situation. Third, if you have to specify "crashkernel=" that means that there is yet another problem here that should be fixed. -hpa --
current code:
/* 0 means: find the address automatically */
if (crash_base <= 0) {
const unsigned long long alignment = 16<<20; /* 16M */
/*
* kexec want bzImage is below DEFAULT_BZIMAGE_ADDR_MAX
*/
crash_base = memblock_find_in_range(alignment,
DEFAULT_BZIMAGE_ADDR_MAX, crash_size, alignment);
if (crash_base == MEMBLOCK_ERROR) {
pr_info("crashkernel reservation failed - No suitable area found.\n");
return;
}
} else {
unsigned long long start;
start = memblock_find_in_range(crash_base,
crash_base + crash_size, crash_size, 1<<20);
if (start != crash_base) {
pr_info("crashkernel reservation failed - memory is in use.\n");
return;
}
}
The first branch will take only crash_size.
No, every kdump needs to specify crashkernel=128M or more.
Yinghai
--
Oh, you're referring to crashkernel size. Okay, this is somewhat different. However, the margin is just too small on 64 bits, then. How far up can we actually get away with on 64 bits currently? 4 GiB? -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. --
I agree here that we should not do it for 64 bit.
- Just because we need it for 32 bit does not mean we should limit it for 64bit. And we do want to have the capability to boot the kernel from as high memory as possible, so creating another artificial limit is counter to that.
- I would not worry too much about backward compatibility and would allow booting a 32bit kernel up to 768MB. The reason is that most of the distros use the same kernel for crash dumping as the regular kernel. Maintaining two separate kernels is a big hassle. So the small set of people who run into the issue would need to change the kernel command line to "crashkernel=128M@64M" or something similar.
Thanks --
Do we have actual testing for how high the 64-bit kernel will load? I'm assuming that the usage of a 32-bit kdump kernel for a 64-bit main kernel is nonexistent. -hpa --
In the past I have run into 1-2 folks who were using a 32bit kdump kernel on a 64bit main kernel. But again for those, the workaround is to use the different crashkernel= syntax and explicitly specify where to reserve memory. Thanks Vivek --
if bzImage is used, it is 896M. or crashkernel=... will take two ranges like one high and one low. also kexec bzImage in 64bit should use startup_64 aka 0x200 offset instead of startup_32 in arch/x86/boot/compressed/head_64.S then bzImage can be put above 4G... Thanks Yinghai --
Strangely, on my x86_64 system with 37-rc6, I am trying to reserve memory and nothing shows up in /proc/iomem. dmesg says that I am reserving 128M at 64M, but there is nothing in /proc/iomem. Going back to the .36 kernel to see what happens.
Ok, last time we had looked, kexec-tools had a constraint on loading.
Neil had been trying that but AFAIK he had no success. I don't know the details, but he was struggling with setting up page tables in kexec for 64bit startup. But yes, making use of the 64bit entry point is on the wish list. Thanks Vivek --
Why? 896 MiB is a 32-bit kernel limitation which doesn't have anything to do with the bzImage format. So unless there is something going on here, I suspect you're just plain flat wrong. -hpa --
Yinghai, I think x86_64 might have just inherited the settings of 32bit without giving it too much of thought. At that point of time nobody bothered to load the kernel from high addresses. So these might be artificial limits. Thanks Vivek --
Yinghai, On x86_64, I am not seeing "Crash kernel" entry in /proc/iomem. I see following in dmesg.
"[ 0.000000] Reserving 128MB of memory at 64MB for crashkernel (System RAM: 5120MB)"
Following is my /proc/iomem.
# cat /proc/iomem
00000100-0000ffff : reserved
00010000-00096fff : System RAM
00097000-0009ffff : reserved
000c0000-000e7fff : pnp 00:0f
000e8000-000fffff : reserved
00100000-bffc283f : System RAM
01000000-015d1378 : Kernel code
015d1379-01aee00f : Kernel data
01bc8000-024b4c4f : Kernel bss
bffc2840-bfffffff : reserved
So there is RAM available at the requested address still no entry for "Crash Kernel". This is both with 2.6.36 as well as 37-rc6 kernel. I am wondering if insert_resource() is failing here? Thanks Vivek --
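Checking /proc/iomem for the reservation is easy to script. The sketch below is a hypothetical user-space helper (the name `parse_crash_kernel()` is made up here) that recognizes a "Crash kernel" line in the `start-end : name` format shown above. It assumes the entry is labelled exactly "Crash kernel".

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem line of the form "start-end : name".
 * Returns 1 and fills start/end if the line names the "Crash kernel"
 * region, 0 otherwise. Leading indentation is skipped.
 */
static int parse_crash_kernel(const char *line,
                              unsigned long long *start,
                              unsigned long long *end)
{
    unsigned long long s, e;
    char name[64];

    if (sscanf(line, " %llx-%llx : %63[^\n]", &s, &e, name) != 3)
        return 0;
    if (strcmp(name, "Crash kernel") != 0)
        return 0;
    *start = s;
    *end = e;
    return 1;
}
```

Feeding each line of /proc/iomem through this function would report the reserved range if one exists. For crashkernel=128M@64M, the range one would expect to see is 04000000-0bffffff; none of the lines in the listing above match.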
looks like memblock_x86_reserve() is fine. Following is dmesg output with your debug patches applied.
[ 0.000000] memblock_x86_reserve_range: [0x01000000-0x024bcb77] TEXT DATA BSS
[ 0.000000] memblock_x86_reserve_range: [0x7fafb000-0x7fff3fff] RAMDISK
[ 0.000000] memblock_x86_reserve_range: [0x00097000-0x000fffff] * BIOS reserved
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 2.6.37-rc6+ (root@chilli.lab.bos.redhat.com) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #73 SMP Fri Dec 17 15:24:34 EST 2010
[ 0.000000] Command line: ro root=/dev/mapper/vg_chilli-lv_root rd_LVM_LV=vg_chilli/lv_root rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us console=tty0, console=ttyS0,115200n8 selinux=0 crashkernel=128M@64M kexec_jump_back_entry=0x6148206465520a0f
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000100 - 0000000000097000 (usable)
[ 0.000000] BIOS-e820: 0000000000097000 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 00000000bffc2840 (usable)
[ 0.000000] BIOS-e820: 00000000bffc2840 - 00000000c0000000 (reserved)
[ 0.000000] BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI 2.5 present.
[ 0.000000] DMI: 0A9Ch/HP xw6600 Workstation, BIOS 786F4 v00.32 09/18/2007
[ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[ 0.000000] No AGP bridge found
[ 0.000000] last_pfn = 0x140000 max_arch_pfn = 0x400000000
[ 0.000000] ...
Hi Yinghai, Please ignore this. The problem was with my setup: some user space script was setting kexec_crash_size = 0, hence freeing up the memory. I think it is time to print a kernel message when the memory is freed/shrunk. I wasted a lot of time debugging it. Sorry for the noise here. thanks Vivek --
Can we do this in the meantime, just so we fix the immediate problem? -hpa
Peter, kexec-tools on 64bit currently seems to allow loading bzImage up to 896MB. So I am not too keen to reduce it to 768MB in the kernel, because x86_64 could be booted from even higher addresses and somebody first has to do some auditing and experiments. IMHO, we should have a 768MB limit for 32bit and continue with the 896MB limit for 64bit, and once somebody makes x86_64 boot from even higher addresses reliably, then we can change both the kernel and kexec-tools. Thanks Vivek --
If we're splitting by architectures anyway, why not leave 32 bits at 512 MiB and thus making older crashkernels usable just in case someone has a frozen toolset? -hpa --
If you are more comfortable with 512MB for i386, that's fine with me. I care more for 64bit at this point of time. Thanks Vivek --
I'm not sure what is going on, but I can no longer reproduce the kdump problem with -rc6 on my T-500 x86_64 system. I tested the patch below together with the previous patch "x86-32: Make sure we can map all of lowmem if we need to", and on both of my laptops (i686 and x86_64) the system boots and kdump works. --
Commit-ID: 7f8595bfacef279f06c82ec98d420ef54f2537e0
Gitweb: http://git.kernel.org/tip/7f8595bfacef279f06c82ec98d420ef54f2537e0
Author: H. Peter Anvin <hpa@linux.intel.com>
AuthorDate: Thu, 16 Dec 2010 19:20:41 -0800
Committer: H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 17 Dec 2010 15:04:00 -0800
x86, kexec: Limit the crashkernel address appropriately
Keep the crash kernel address below 512 MiB for 32 bits and 896 MiB for 64 bits. For 32 bits, this retains compatibility with earlier kernel releases, and makes it work even if the vmalloc= setting is adjusted. For 64 bits, we should be able to increase this substantially once a hard-coded limit in kexec-tools is fixed.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20101217195035.GE14502@redhat.com>
---
arch/x86/kernel/setup.c | 17 ++++++++++++++---
1 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 21c6746..c9089a1 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -501,7 +501,18 @@ static inline unsigned long long get_total_mem(void)
return total << PAGE_SHIFT;
}
-#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF
+/*
+ * Keep the crash kernel below this limit. On 32 bits earlier kernels
+ * would limit the kernel to the low 512 MiB due to mapping restrictions.
+ * On 64 bits, kexec-tools currently limits us to 896 MiB; increase this
+ * limit once kexec-tools are fixed.
+ */
+#ifdef CONFIG_X86_32
+# define CRASH_KERNEL_ADDR_MAX (512 << 20)
+#else
+# define CRASH_KERNEL_ADDR_MAX (896 << 20)
+#endif
+
static void __init reserve_crashkernel(void)
{
unsigned long long total_mem;
@@ -520,10 +531,10 @@ static void __init reserve_crashkernel(void)
const unsigned long long alignment = 16<<20; /* 16M */
/*
- * kexec want bzImage is below ...
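To get a feel for what these limits mean for an automatic reservation, here is a simplified sketch. `highest_base_below()` is a hypothetical helper and not the memblock implementation: it ignores already-reserved regions, treats the limit as an exclusive upper bound, and assumes the alignment is a power of two.

```c
#include <stdint.h>

/*
 * Hypothetical helper: highest `align`-aligned base such that
 * [base, base + size) lies entirely below `limit`.
 * Returns UINT64_MAX when the size does not fit below the limit.
 * Assumes `align` is a power of two; ignores reserved regions.
 */
static uint64_t highest_base_below(uint64_t limit, uint64_t size,
                                   uint64_t align)
{
    if (size > limit)
        return UINT64_MAX;
    return (limit - size) & ~(align - 1);
}
```

With a 128 MiB reservation and the 16 MiB alignment used in reserve_crashkernel(), this gives 0x18000000 (384 MiB) under the 32-bit limit and 0x30000000 (768 MiB) under the 64-bit limit. The real search may well return a lower address when parts of that range are already reserved.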
So kexec-tools are broken? -hpa --
Yinghai, is it possible to add the debug patch to upstream too? For debugging future kdump issues like this. Thanks. --
I created a Bugzilla entry at https://bugzilla.kernel.org/show_bug.cgi?id=24372 for your bug report, please add your address to the CC list in there, thanks! -- Maciej Rutecki http://www.maciek.unixy.pl --
