From: "Alexander Beregalov" <a.beregalov@gmail.com>
Something is screwy here... Hmmm...
When I added the changeset in question, it fixed a problem in that
any backtrace of a kernel thread would loop forever at the end.
Any stack backtrace would hang or reach a safety limit (such as
the one imposed by lockdep).

Please double check that you are precisely reverting this patch
below _before_ doing these tests:

commit a051bc5bb1ac6dc138d529077fa20cbbc6622d95
Author: David S. Miller <davem@davemloft.net>
Date: Wed May 21 18:14:28 2008 -0700

sparc64: Fix kernel thread stack termination.

Because of the silly way I set up the initial stack for
new kernel threads, there is a loop at the top of the
stack.

To fix this, properly add another stack frame that is copied
from the parent and terminate it in the child by setting
the frame pointer in that frame to zero.

Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/arch/sparc64/kernel/process.c b/arch/sparc64/kernel/process.c
index 0a0c05f..2084f81 100644
--- a/arch/sparc64/kernel/process.c
+++ b/arch/sparc64/kernel/process.c
@@ -657,20 +657,39 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long sp,
struct task_struct *p, struct pt_regs *regs)
{
struct thread_info *t = task_thread_info(p);
+ struct sparc_stackf *parent_sf;
+ unsigned long child_stack_sz;
char *child_trap_frame;
+ int kernel_thread;

- /* Calculate offset to stack_frame & pt_regs */
- child_trap_frame = task_stack_page(p) + (THREAD_SIZE - (TRACEREG_SZ+STACKFRAME_SZ));
- memcpy(child_trap_frame, (((struct sparc_stackf *)regs)-1), (TRACEREG_SZ+STACKFRAME_SZ));
+ kernel_thread = (regs->tstate & TSTATE_PRIV) ? 1 : 0;
+ parent_sf = ((struct sparc_stackf *) regs) - 1;

- t->flags = (t->flags & ~((0xffUL << TI_FLAG_CWP_SHIFT) | (0xffUL << TI_FLAG_CURRENT_DS_SHIFT))) |
+ /* Calculate offset to stack_frame & pt_regs */
+ ch...
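The commit message above describes terminating the frame chain by zeroing the frame pointer in the topmost frame. A minimal user-space sketch of why that matters for a backtrace walker follows. It is not the kernel code: `struct frame` and `walk_frames` are hypothetical names made up for illustration, and the `max` argument stands in for a safety limit of the kind lockdep imposes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a saved stack frame. */
struct frame {
    struct frame *fp;   /* caller's frame, or NULL at the end of the chain */
    unsigned long pc;   /* return address recorded in this frame */
};

/*
 * Walk the chain starting at 'start', storing at most 'max' return
 * addresses in 'out'. Returns the number of entries stored.
 * The walk stops at a NULL frame pointer or at the safety limit,
 * whichever comes first.
 */
size_t walk_frames(const struct frame *start, unsigned long *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    while (start != NULL && n < max) {
        out[n++] = start->pc;
        start = start->fp;
    }
    return n;
}
```

With a properly terminated chain the walker returns after the last real frame. If the top frame points back at itself, the walker only stops because of the limit, and a walker without such a limit would never return.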
Yes, I am sure. It runs without this commit and hangs with it.
I can connect a serial console, but if it is an infinite loop it will not
provide more info.

$ git log arch/sparc64/kernel/process.c
commit 99d3b2d0d3df1fa171a7ee1d2d3a92f540873b15
Author: alexb <alexb@sparky>
Date: Thu Jun 19 18:49:46 2008 +0400

Revert "sparc64: Fix kernel thread stack termination."
This reverts commit a051bc5bb1ac6dc138d529077fa20cbbc6622d95.

commit a051bc5bb1ac6dc138d529077fa20cbbc6622d95
Author: David S. Miller <davem@davemloft.net>
Date: Wed May 21 18:14:28 2008 -0700

sparc64: Fix kernel thread stack termination.
--
From: "Alexander Beregalov" <a.beregalov@gmail.com>
Ok I have to find some way to reproduce this. Please post the
kernel .config you are using during these tests. Also please
let me know what distribution and compiler version you are using.

Thanks.
--
It is Gentoo; gcc version 4.1.2 (Gentoo 4.1.2 p1.0.1); sys-devel/kgcc64-4.1.2.
I cross-compiled it as you advised me:
make -j2 CROSS_COMPILE=sparc64-unknown-linux-gnu- image modules &&
sudo make modules_install

Config is in attachment.
Hi David
Is it possible to add some debug code to understand what is going on?
--
From: "Alexander Beregalov" <a.beregalov@gmail.com>
What do you want me to add? The thing hangs and we have no idea
where since there are no messages on the console and the hang
happens before we have any kind of console output.

Why do you think I haven't been able to fix this yet?
Besides I'm too busy with some networking stuff to work at
all on this at the moment, so folks will just need to be
patient or do the grunt work of debugging this themselves.
--
David Miller writes:
> From: "Alexander Beregalov" <a.beregalov@gmail.com>
> Date: Mon, 7 Jul 2008 13:19:05 +0400
>
> > Hi David
> >
> > Is it possible to add some debug code to understand what is going on?
>
> What do you want me to add? The thing hangs and we have no idea
> where since there are no messages on the console and the hang
> happens before we have any kind of console output.
>
> Why do you think I haven't been able to fix this yet?
>
> Besides I'm too busy with some networking stuff to work at
> all on this at the moment, so folks will just need to be
> patient or do the grunt work of debugging this themselves.

My Ultra 5 (same mainboard as the Ultra 10) boots 2.6.26-rc9
just fine. So Alexander's problem is probably caused by .config
settings or his toolchain.

My working 2.6.26-rc9 .config and a boot log are available in
<http://user.it.uu.se/~mikpe/linux/ultra5/> in case someone wants
to use them as a starting point for debugging this problem.

The toolchain I used has gcc-4.3.1 and binutils-2.17.50.0.3.
/Mikael
--
I have turned off LOCKDEP and it boots properly.
2.6.26-rc9-00005-g1b40a89

Mikael's config also does not contain LOCKDEP.
--
From: "Alexander Beregalov" <a.beregalov@gmail.com>
I have finally reproduced the problem locally and figured out the
bug.

Please try this patch:
sparc64: Fix end-of-stack checking in save_stack_trace().
Bug reported by Alexander Beregalov.
Before we dereference the stack frame or try to peek at the
pt_regs magic value, make sure the entire object is within
the kernel stack bounds.

Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/arch/sparc64/kernel/stacktrace.c b/arch/sparc64/kernel/stacktrace.c
index c73ce3f..c5576e8 100644
--- a/arch/sparc64/kernel/stacktrace.c
+++ b/arch/sparc64/kernel/stacktrace.c
@@ -25,13 +25,15 @@ void save_stack_trace(struct stack_trace *trace)

/* Bogus frame pointer? */
if (fp < (thread_base + sizeof(struct thread_info)) ||
- fp >= (thread_base + THREAD_SIZE))
+ fp > (thread_base + THREAD_SIZE - sizeof(struct sparc_stackf)))
break;

sf = (struct sparc_stackf *) fp;
regs = (struct pt_regs *) (sf + 1);

- if ((regs->magic & ~0x1ff) == PT_REGS_MAGIC) {
+ if (((unsigned long)regs <=
+ (thread_base + THREAD_SIZE - sizeof(*regs))) &&
+ (regs->magic & ~0x1ff) == PT_REGS_MAGIC) {
if (!(regs->tstate & TSTATE_PRIV))
break;
pc = regs->tpc;
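The idea behind the patch above, checking that the whole object fits inside the stack rather than only its first byte, can be sketched in isolation. This is a user-space illustration under stated assumptions, not the kernel function: `object_within_stack` and its parameters are hypothetical names, with `reserved` playing the role of the `struct thread_info` area at the bottom of the stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true when an object of 'obj_size' bytes starting at 'addr'
 * lies entirely inside [base + reserved, base + size).
 *
 * A check of the form 'addr >= base + size' only rejects pointers whose
 * first byte is past the end; it accepts an object that starts in
 * bounds and runs off the end. Comparing against
 * 'base + size - obj_size' rejects that case as well.
 */
bool object_within_stack(uintptr_t base, size_t size, size_t reserved,
                         uintptr_t addr, size_t obj_size)
{
    assert(reserved <= size);
    if (obj_size > size - reserved)
        return false;           /* object cannot fit at all */
    if (addr < base + reserved)
        return false;           /* below the usable stack area */
    return addr <= base + size - obj_size;
}
```

An object that starts 8 bytes before the end of the stack but is 64 bytes long passes the old style of check and fails this one, which is the situation the patch guards against before dereferencing the frame or reading the pt_regs magic value.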
--
Thanks David, but 2.6.27-rc2-00166-gaeee90d hangs in the same way.
Config is in attachment.
From: "Alexander Beregalov" <a.beregalov@gmail.com>
That patch was for you to add on top of whatever tree you
have handy. Did you apply the patch?

That patch will fix all trees.
--
Yes, I applied it manually on top of 2.6.27-rc2-0166
$ git diff
diff --git a/arch/sparc64/kernel/stacktrace.c b/arch/sparc64/kernel/stacktrace.c
index b3e3737..c22a131 100644
--- a/arch/sparc64/kernel/stacktrace.c
+++ b/arch/sparc64/kernel/stacktrace.c
@@ -26,13 +26,15 @@ void save_stack_trace(struct stack_trace *trace)

/* Bogus frame pointer? */
if (fp < (thread_base + sizeof(struct thread_info)) ||
- fp >= (thread_base + THREAD_SIZE))
+ fp > (thread_base + THREAD_SIZE - sizeof(struct sparc_stackf)))
break;

sf = (struct sparc_stackf *) fp;
regs = (struct pt_regs *) (sf + 1);

- if ((regs->magic & ~0x1ff) == PT_REGS_MAGIC) {
+ if (((unsigned long)regs <=
+ (thread_base + THREAD_SIZE - sizeof(*regs))) &&
+ (regs->magic & ~0x1ff) == PT_REGS_MAGIC) {
if (!(regs->tstate & TSTATE_PRIV))
break;
pc = regs->tpc;
--
From: "Alexander Beregalov" <a.beregalov@gmail.com>
And then the problem goes away right?
--
No, It hangs in the same way, right after
console handover: boot [earlyprom0] -> real [tty0]
--
From: "Alexander Beregalov" <a.beregalov@gmail.com>
Please edit arch/sparc64/kernel/setup.c, where it says:
static struct console prom_early_console = {
.name = "earlyprom",
.write = prom_console_write,
.flags = CON_PRINTBUFFER | CON_BOOT | CON_ANYTIME,
.index = -1,
};

and remove "CON_BOOT |".
This will allow you to see the crash message.
Please also double check that you patched the kernel with my
fix correctly. I used your exact config, on the exact same
kind of system, reproducing the exact same hang, and it goes
away with my fix.
--
It can be a different compiler version,
but I tried 4.3.1 and 4.1.2 with the same result.

After I removed CON_BOOT, warnings appear after
Locking API testsuite:
<..>
Good, all 218 testcases passed!
Dentry cache hash table entries: 131072 (order: 7, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 6, 524288 bytes)
=== here ===
Memory: 1019600k available (2672k kernel code, 1264k data, 128k init)
[fffff80000000000,000000003ff46000]
SLUB: Genslabs=13, HWalign=32, Order=0-2, MinObjects=8, CPUs=1, Nodes=1
Calibrating delay using timer specific routine.. 884.33 BogoMIPS (lpj=4421694)

Is it possible that lockdep messages did not appear when CON_BOOT was
in the flags?

Hope it will help.
--
Yes, I saw it.
There were a few WARNINGs at lib/list_debug.c:__list_add
Those messages went by fast, I cannot see them now.
Now I see a call trace:
__free_pages_ok
__free_pages
__free_pages_bootmem
free_all_bootmem_core
free_all_bootmem
mem_init
start_kernel
tlb_fixup_done

Can it be helpful?
I tried to connect a console cable to it, but nothing was there. I found
I should disconnect the keyboard and VGA to be able to see it, I will try it.
--
From: "Alexander Beregalov" <a.beregalov@gmail.com>
Mikulas Patocka is seeing the same bug (see thread "Re: console
handover badness"). I just posted the following patch there that can
help track this down.

Please try it out on your machine too.
BTW, how much ram is in your system?
Thanks.
diff --git a/arch/sparc64/mm/init.c b/arch/sparc64/mm/init.c
index 217de3e..26b018f 100644
--- a/arch/sparc64/mm/init.c
+++ b/arch/sparc64/mm/init.c
@@ -1643,6 +1643,8 @@ void __init setup_per_cpu_areas(void)
{
}

+extern void sparse_validate_usemap(const char *file, int line);
+
void __init paging_init(void)
{
unsigned long end_pfn, shift, phys_base;
@@ -1788,7 +1790,9 @@ void __init paging_init(void)
#ifndef CONFIG_NEED_MULTIPLE_NODES
max_mapnr = last_valid_pfn;
#endif
+ sparse_validate_usemap(__FILE__, __LINE__);
kernel_physical_mapping_init();
+ sparse_validate_usemap(__FILE__, __LINE__);

{
unsigned long max_zone_pfns[MAX_NR_ZONES];
@@ -1798,12 +1802,15 @@ void __init paging_init(void)
max_zone_pfns[ZONE_NORMAL] = end_pfn;

free_area_init_nodes(max_zone_pfns);
+ sparse_validate_usemap(__FILE__, __LINE__);
}

printk("Booting Linux...\n");
central_probe();
+ sparse_validate_usemap(__FILE__, __LINE__);
cpu_probe();
+ sparse_validate_usemap(__FILE__, __LINE__);
}

int __init page_in_phys_avail(unsigned long paddr)
diff --git a/init/main.c b/init/main.c
index 0bc7e16..80771f5 100644
--- a/init/main.c
+++ b/init/main.c
@@ -536,6 +536,8 @@ void __init __weak thread_info_cache_init(void)
{
}

+extern void sparse_validate_usemap(const char *file, int line);
+
asmlinkage void __init start_kernel(void)
{
char * command_line;
@@ -567,12 +569,19 @@ asmlinkage void __init start_kernel(void)
printk(KERN_NOTICE);
printk(linux_banner);
setup_arch(&command_line);
+ sparse_validate_usemap(__FILE__, __LINE__);
mm_init_owner(&init_mm, &init_task);
+ sparse_validate_usemap(__FILE__, __LINE__);
se...
Bogus migrate type 6
Usemap for section 0 corrupted
paging_init+0xcac/0xd38[arch/sparc64/mm/init.c:1795]

1790 #ifndef CONFIG_NEED_MULTIPLE_NODES
1791 max_mapnr = last_valid_pfn;
1792 #endif
1793 sparse_validate_usemap(__FILE__, __LINE__);
1794 kernel_physical_mapping_init();
1795 sparse_validate_usemap(__FILE__, __LINE__);
1796
1797 {
1798 unsigned long max_zone_pfns[MAX_NR_ZONES];
1799
Ultra 10, 1024 MB

Thanks David
--
From: "Alexander Beregalov" <a.beregalov@gmail.com>
This is probably a different bug than the one I fixed, we'll have
to analyze this somehow.

I'll cook up a patch that will let you see the crash without it
scrolling off the screen.
scrolling off the screen.
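The debugging patch earlier in this thread sprinkles validation calls tagged with __FILE__ and __LINE__ between initialization steps, so the first failing checkpoint brackets the step that corrupted the data (here, the call between lines 1793 and 1795). A minimal user-space sketch of that checkpoint technique follows. The body of sparse_validate_usemap() is not shown in the thread, so `validate_map`, `VALIDATE_MAP` and `MAX_TYPE` below are hypothetical stand-ins, not the real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_TYPE 4  /* hypothetical: valid entries are below this value */

/*
 * Scan 'map' and report the first out-of-range entry together with the
 * caller's source location. Returns 1 if the map is sane, 0 otherwise.
 */
int validate_map(const unsigned char *map, size_t len,
                 const char *file, int line)
{
    size_t i;

    assert(map != NULL);
    for (i = 0; i < len; i++) {
        if (map[i] >= MAX_TYPE) {
            fprintf(stderr, "Bogus type %u at index %zu [%s:%d]\n",
                    (unsigned)map[i], i, file, line);
            return 0;
        }
    }
    return 1;
}

/* Checkpoint macro: records where in the source the check was made. */
#define VALIDATE_MAP(map, len) validate_map((map), (len), __FILE__, __LINE__)
```

Placing one checkpoint before and one after each suspect call narrows the corruption down to a single step without needing a debugger, which is useful this early in boot.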
--
