Hello list, The following patch is to fixed a race in ptrace_stop handling which causes "strace" hang if the target process blocks SIGTRAP with the test case filed at https://bugzilla.redhat.com/show_bug.cgi?id=446200#c16. Please note this is just IA64 problem because just IA64 has arch_ptrace_stop_needed defined, and has arch_ptrace_stop defined that would set notify_resume flags for syncing rbs...but it also opens the door to invoke ia64_do_signal->get_signal_to_deliver before setting current PTRACED flag. Please help review. **The patch is enclosed in text attachment* **I'm using web client to send the patch* * Signed-off-by: Yu Luming <luming.yu@intel.com> -------------------------------------- signal.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
That's probably not what we want. What happens if the task then sleeps during the user-space access? Unless I forgot something obvious, it will never get scheduled again... --
My intention is to disable signal delivering before TASK_TRACED flag is set for correctly handling ptrace_stop() with SIGTRAP masked. Although this patch totally is a hack, but it should clearly shows --
I really cannot figure out from anything you've said what the failure mode is or how you think it should be affected. Thanks, Roland --
At least you should be able to see the related bug report at: https://bugzilla.redhat.com/show_bug.cgi?id=446200 I'm getting Access Denied. :( Petr Tesarik --
That really doesn't help (yes, I can see it). That is a bug for the RHEL4 kernel, which is based on a much older kernel and has none of the code we're discussing here. The test case is a simple program that blocks SIGTRAP using sigprocmask, then forks and exec's a simple shell script that just runs some random command; run this under strace. There is no further information in that bugzilla report that helps us understand the behavior of that test case in the current kernel, which Luming says has a problem. Thanks, Roland --
ok , I just got permit to share the test case here.. Please check out
the attachment.
1. tar jxvf testpro.tar.bz2
testpro/test1.c
test2.sh
2.cd testpro
3.gcc -o test1 test1.c
4.strace -f -o log.txt ./test1
hope you can reproduce the problem.
Thanks,
Luming
Sorry for confusion, Let me try to explain it more: For ia64,the code path is like: ptrace_notify (to let the debugger run)--> ptrace_stop -->spin_unlock_irq->arch_ptrace_stop (ia64_ptrace_stop) ->[sync rbs and set NOTIFY_RESUME.....]-->spin_lock_irq->set TASK_TRACED flag (to let the debugger run) For x86, the code path is like: ptrace_notify (to let the debugger run) ->ptrace_stop->set TASK_TRACED flag (to let the debugger run)-->spin_unlock_irq If TASK_TRACED is not set earlier before arch_ptrace_stop on ia64 ptrace_notify code path, some signals would be delivered without letting debugger run.. (i.e. PTRACED logica in get_signal_to_deliver would be ignored totally!). These should cause the test case hang on ia64. And x86 just works.. If you have any question , I would dig further.. Thanks, Luming --
> Sorry for confusion, Let me try to explain it more: I do not understand this at all, and it has given no information you did not give before. Please describe the scenario you see in fine-grained terms. Thanks, Roland --
In the code path mentioned above, I see only ia64 has chance to let ptraced thread deliver those pending signals before TASK_TRACED is set. Then debugger thread would lose chance to interfere the delivering of those signals if I correctly understand PT_PTRACED logic in get_signal_to_deliver, and the relationship between the two flag : TASK_TRACED and PT_TRACED. Since you write those code, Please clarify, in ptrace_notify code path, is it allowed that ptraced thread can run signal handler without telling debugger what happened? I noticed the only difference between x86 and IA64 , and it does make the test case work on x86, and fail on IA64... So I made the patch trying to eliminate the difference. It indeed seems to solve my problem although it is still hack, and I don't know what kind of signals strace handled has such magic.. As for how the door is only open for ia64, I can explain further if you want to know. --
This is the key thing that makes no sense to me. What do you mean by "deliver" here? The normal meaning of to "deliver" a signal means that get_signal_to_deliver() dequeues and processes it. This can never happen inside any other kernel code path. It certainly can never happen inside ptrace_stop or ptrace_notify. The difference between ia64 and others is that inside ptrace_stop(), it can release the siglock and then can block (in page faults, or by preemption). No signal delivery can happen. What perhaps can happen is that TIF_SIGPENDING being set can cause a TASK_INTERRUPTIBLE wait in the page fault path to return early and ia64_sync_user_rbs will bail out before doing all its work. But you haven't described any problem like that. Please tell us the exact code path you think is happening in the error case you can reproduce. Describe the actual code path, not high-level notions like "deliver those pending signals". Thanks, Roland --
"deliver" here I mean get_signal_to_deliver()... Please clarify, in ptrace_notify code path, is it allowed that ptraced thread can ignore signals (such as SIGTRAP) without telling debugger what happened? This is key problem I'm not quite sure... Because if the test doesn't block SIGTRAP, it works on IA64 too.. Hope I have made the problem clear... --
> "deliver" here I mean get_signal_to_deliver()... This can never happen inside ptrace_stop. It was already quite clear what elements have to be present to cause a problem with the test case (ia64 and blocked SIGTRAP). Nothing else about it is yet clear to me, sorry. Please tell us the exact code path you think is happening in the error case you can reproduce. Thanks, Roland --
I will try to customize kernel to capture call trace for a precise code path. --Luming --
It does happen!!
Call Trace:
[<a000000100011bd0>] show_stack+0x50/0xa0
sp=e000000146bbfbb0 bsp=e000000146bb0e08
[<a000000100011c50>] dump_stack+0x30/0x60
sp=e000000146bbfd80 bsp=e000000146bb0de8
[<a0000001000979a0>] get_signal_to_deliver+0x60/0x6e0
sp=e000000146bbfd80 bsp=e000000146bb0d80
[<a0000001000343d0>] ia64_do_signal+0xb0/0xd00
sp=e000000146bbfd80 bsp=e000000146bb0cd8
[<a000000100012650>] do_notify_resume_user+0xf0/0x140
sp=e000000146bbfe20 bsp=e000000146bb0ca8
[<a00000010000aac0>] notify_resume_user+0x40/0x60
sp=e000000146bbfe20 bsp=e000000146bb0c58
[<a00000010000a9f0>] skip_rbs_switch+0xe0/0x110
sp=e000000146bbfe30 bsp=e000000146bb0c58
[<a000000000010740>] __kernel_syscall_via_break+0x0/0x20
I applied the following patch , and got the call trace above..
If apply my RFC patch as antidote, I don't see "deliver" ...
Is the problem clear now? I will serve you until every thing is clear to you.
Thanks,
Luming
Signed-off-by: Yu Luming <luming.yu@intel.com>
diff -Bru 1/kernel/signal.c 0/kernel/signal.c
--- 1/kernel/signal.c 2008-05-27 15:18:48.000000000 +0800
+++ 0/kernel/signal.c 2008-05-27 15:08:51.000000000 +0800
@@ -38,6 +38,7 @@
*/
static struct kmem_cache *sigqueue_cachep;
+unsigned long global_arch_ptrace_stop_flag =0;
static int __sig_ignored(struct task_struct *t, int sig)
{
@@ -1501,9 +1502,12 @@
* siglock. That must prevent us from sleeping in TASK_TRACED.
* So after regaining the lock, we must check for SIGKILL.
*/
+ global_arch_ptrace_stop_flag = 1;
spin_unlock_irq(&current->sighand->siglock);
arch_ptrace_stop(exit_code, info);
+
spin_lock_irq(&current->sighand->siglock);
+ ...> > if happens, it should be a bug, right? It doesn't even make sense that it should be possible. So if it somehow is possible, that is certainly a bug. So this here shows a perfectly normal trace that bottoms out at a syscall entry from user mode. You seem to be saying that, somehow, inside ptrace_stop(), we tried to return to user mode--I guess you mean losing the kernel stack with the call chain leading to ptrace_stop()--and then With just that diagnostic patch as shown, these might be two different threads. But I guess you've ruled that out somehow? If this does in fact happen in the thread that is supposed to be in ptrace_stop(), then the That's quite a commitment! My full enlightenment may be a long time off. I won't hold you to it once we've fixed this particular bug, though. ;-) What should be happening is that ia64_ptrace_stop() should do its work, possibly blocking, and then return to its caller in ptrace_stop(). At no point should it be possible for ia64_ptrace_stop() to return directly to user mode, or to reenter notify_resume_user() in any fashion. Please focus on the exact code path taken inside the ia64_ptrace_stop() call. It should be possible to identify every step of that and see exactly where it goes astray from what we expect. Thanks, Roland --
I revised patch a bit, and managed to get this:
Call Trace:
[<a000000100011bd0>] show_stack+0x50/0xa0
sp=e000000141c9fbb0 bsp=e000000141c90ea8
[<a000000100011c50>] dump_stack+0x30/0x60
sp=e000000141c9fd80 bsp=e000000141c90e90
[<a000000100097a20>] get_signal_to_deliver+0xa0/0x720
sp=e000000141c9fd80 bsp=e000000141c90e28
[<a0000001000343d0>] ia64_do_signal+0xb0/0xd00
sp=e000000141c9fd80 bsp=e000000141c90d78
[<a000000100012650>] do_notify_resume_user+0xf0/0x140
sp=e000000141c9fe20 bsp=e000000141c90d48
[<a00000010000aac0>] notify_resume_user+0x40/0x60
sp=e000000141c9fe20 bsp=e000000141c90cf8
[<a00000010000a9f0>] skip_rbs_switch+0xe0/0x110
sp=e000000141c9fe30 bsp=e000000141c90cf8
[<a000000000010740>] __kernel_syscall_via_break+0x0/0x20
sp=e000000141ca0000 bsp=e000000141c90cf8
[<a0000001000971c0>] ptrace_stop+0xa0/0x3e0
sp=e00000014716fdb0 bsp=e000000147160ca8
[<a000000100097650>] ptrace_notify+0x150/0x1c0
sp=e00000014716fdb0 bsp=e000000147160c88
[<a00000010002adb0>] syscall_trace+0x30/0xc0
sp=e00000014716fe30 bsp=e000000147160c70
[<a00000010002aea0>] syscall_trace_enter+0x60/0xa0
sp=e00000014716fe30 bsp=e000000147160c18
[<a00000010000a300>] ia64_trace_syscall+0x40/0x110
sp=e00000014716fe30 bsp=e000000147160c18
[<a000000000010740>] __kernel_syscall_via_break+0x0/0x20
sp=e000000147170000 bsp=e000000147160c18
--
If this is a backtrace of a single thread on the same kernel stack, it has a stack depth of over 84MB. You still need to follow the path inside ia64_ptrace_stop() to see how any of this is possible. Thanks, Roland --
Indeed, there seems to be a large hole here. So, this is either a bug in the unwinder, or a bug in the RBS synchronization, which causes corruption. My test machine currently needs some work to run 2.6.25 again, but I'll try your test case as soon as I re-install it later this week. Cheers, --
Just want to check if the test case works for you? --
Yes, the test case hangs here too. But the problem seems to be elsewhere. Did you look into the strace output? This line is pretty suspicious: 3258 clone2(child_stack=0, stack_size=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x200000000004e290) = 1 Obviously, strace cannot attach PID 1, and since it is not designed to handle this situation, it hangs. I'm going to investigate why the return value of the clone2 syscall is seen as 1 by the tracer. Might even turn out to be a bug in strace... Petr Tesarik --
It's definitely a bug in strace. For some reason (I don't care about) the execve() syscall produces an extra notification. However, this notification message is suppressed when SIGTRAP is blocked. This explains why the test case fails only when SIGTRAP is blocked. Now, you may ask why it only fails on ia64 and not on i386 or x86_64. Well, I was so good that I even looked into strace sources to make sure. Whereas for i386 and x86_64, the value of EAX/RAX is checked for -ENOSYS in syscall_fixup(), for ia64 the first ptrace() after an execve() is unconditionally ignored, see code in get_scno(). I don't know why Luming's fix helps here, but, please, fix strace, don't introduce weird behaviour in the kernel. The only thing I'm willing to talk about is why the extra notification message is sent, and how userspace (strace) is supposed to recognize it. FWIW the backtrace (system tap was at __group_send_sig_info): 0xa0000001000b1a60 : __group_send_sig_info+0x0/0x180 [] 0xa0000001000b1e30 : do_notify_parent_cldstop+0x250/0x2c0 [] 0xa0000001000b2230 : ptrace_stop+0x2b0/0x3c0 [] 0xa0000001000b5200 : get_signal_to_deliver+0x200/0xa40 [] 0xa000000100035920 : ia64_do_signal+0xa0/0xee0 [] 0xa000000100014b60 : do_notify_resume_user+0x100/0x160 [] 0xa00000010000d040 : notify_resume_user+0x40/0x60 [] 0xa00000010000cf40 : skip_rbs_switch+0xf0/0x150 [] 0xa000000000010620 : __kernel_syscall_via_break+0x0/0x20 [] Regards, Petr Tesarik --
What do you mean by "extra"? There is a SIGTRAP sent after execve completes when ptraced, even when PTRACE_SYSCALL is not being used. So for an execve that succeeds under PTRACE_SYSCALL, there is a ptrace_notify at syscall entry, then a SIGTRAP queued (i.e., not seen by ptrace if blocked), then a ptrace_notify at syscall exit. If that's what's happening (including the blocked SIGTRAP not being seen by the ptracer, i.e. strace), then there is no mystery (and no bug). Thanks, Roland --
This might not be the same bug ... but I do have a definite 100%
reproducible bug (latest git kernel, old version of strace (4.5.15-1.el4.1))
Run:
$ strace -o logit -f make
in any directory where make is actually going to have to do some
work. You'll see that the command hangs after make outputs the
first action that it will take. Looking at the stack traces of
the 3 processes involved it seems that make forked, the child
stopped in ptrace waiting for some action from strace, but strace
isn't woken from its sleep in wait().
Backtrace of pid 6442 (strace)
Call Trace:
[<a0000001007069b0>] schedule+0x11f0/0x1380
sp=e0000001b28cfdb0 bsp=e0000001b28c0e00
[<a0000001000842d0>] do_wait+0x1110/0x1520
sp=e0000001b28cfdd0 bsp=e0000001b28c0d58
[<a0000001000849c0>] sys_wait4+0x140/0x1a0
sp=e0000001b28cfe30 bsp=e0000001b28c0cd8
[<a00000010000aa60>] ia64_ret_from_syscall+0x0/0x20
sp=e0000001b28cfe30 bsp=e0000001b28c0cd8
[<a000000000010740>] __kernel_syscall_via_break+0x0/0x20
sp=e0000001b28d0000 bsp=e0000001b28c0cd8
Backtrace of pid 6443 (make)
Call Trace:
[<a0000001007069b0>] schedule+0x11f0/0x1380
sp=e0000001b768fb40 bsp=e0000001b7680d58
[<a000000100707800>] schedule_timeout+0x40/0x180
sp=e0000001b768fb60 bsp=e0000001b7680d28
[<a000000100706d60>] wait_for_common+0x220/0x380
sp=e0000001b768fb90 bsp=e0000001b7680cd8
[<a000000100706f00>] wait_for_completion+0x40/0x60
sp=e0000001b768fbf0 bsp=e0000001b7680cb8
[<a0000001000794d0>] do_fork+0x430/0x4a0
sp=e0000001b768fbf0 bsp=e0000001b7680c60
[<a00000010000a340>] sys_clone+0x60/0x80
sp=e0000001b768fc20 bsp=e0000001b7680c10
[<a00000010000a990>] ...This trace (do_fork->wait_for_completion) tells us this is a vfork call. This is the normal trace for the child having received a signal and stopped to tell ptrace about it (not a syscall tracing stop). I think you need to look into what strace is doing. There is far too much going to know much of anything just from the kernel state where the processes sit. In particular, the sequence of ptrace and wait calls strace made. If the same strace (identical everything) behaved differently with an older kernel, then compare the sequence of ptrace and wait calls and see where it differs. Thanks, Roland --
For this test case, utrace doesn't work too.. --
This is exact problem I suspected and I was trying to address in my hack.. Since there are several processes involved in the pretty complex ptrace scenario., I need to capture all processes context with kdump to confirm this is exact root-cause for the problem. But kdump doesn't work for me..I'm trying to solve it now.. I'm also in doubt about the semantic correctness of the test case.. Since SIGTRAP is so necessary to get ptrace work, is it legitimate to block it in test case? One more thing I need to say is: Same strace works for utrace enabled kernel on IA64.. If the bug is in strace, how could it happen? --
No idea, but send me the strace.log file from running strace -o strace.log strace -f -o log.txt ./test1 and I may be able to tell. Petr Tesarik --
Hm, I think without utrace, it gets out-of-sync once, so syscall entries
and exits are swapped from that point on. With utrace, it gets
out-of-sync _twice_, so it eventually looks fine. But the strace output
definitely looks incorrect even with utrace:
5718 execve("./test2.sh", [], [/* 23 vars */]) = 1
5718 execve("", [0x840c001000100003, 0x26230c14203032, 0x8cb0008800140a81, 0xa643100801808402, 0x2400905000040088, 0x11600a0072000001, 0xad814a00402e0, 0x2200012464009344, 0x1180418512c40026, 0x400003081880008, 0x2100010840910404, 0x8045120000800003, 0x6400000c0000600, 0xc20063440501400, 0x1048015002008081, 0xe02226005008c010, ...], [/* 0 vars */]) = 1
5718 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
Note that strace missed a brk() syscall, although I can actually see
this in the other trace you sent me:
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], __WALL, NULL) = 5704
ptrace(PTRACE_PEEKUSER, 5704, psr, NULL) = 4398046511120
ptrace(PTRACE_PEEKUSER, 5704, r15, NULL) = 1060
ptrace(PTRACE_SYSCALL, 5704, 0x1, SIG_0) = 0
Look at the value of r15, and compare it with unistd.h:
#define __NR_brk 1060
I _guess_ this is caused by the fact that test2.sh is a shell script, so
the kernel executes the shell, and maybe utrace produces a second execve
notifications in this case? Roland, can you shed some light?
Petr Tesarik
--
Not really. The utrace kernels Luming is trying are intended to match the vanilla ptrace behavior. I don't think it's very useful to worry about the difference between some utrace kernel and the current vanilla kernel. Let's just look at what the current vanilla kernel is doing and compare that to what an older vanilla kernel did if older versions produced different results for the test case. Thanks, Roland --
Just have chance to re-check the problem. I have spotted the place that strace misbehaves when SIGTRAP is blocked by task being straced with -f flag (follow trace) The problem is that after debugee blocking SIGTRAP the task's exec path would not wake up debugger (strace) with this signal. But strace don't know about it, and expect such a wake up. Upon receiving a subsequent wake up,strace still thought it was a wake up from "SIGTRAP in previous exec path". From now on, Debuger and Debugee misunderstand each other.. With the test case posted before in this thread and a customized kernel, I got the following debug info from kernel: syscall_trace_enter exiting (4717) report_exec:pid= 4717 parent=4715, real_parent=4716,current->ptrace =00000001, report_exec:pid= 4717 parent=4715, real_parent=4716,current->ptrace =00000001, syscall_trace_leave entering (4717, syscall:1033) ..sig_ignored=1.. 4715, PTRACE_PEEKUSR 4717 @addr 00000830 return 00000000 4715, PTRACE_PEEKUSR 4717 @addr 000008c0 return 00000000 4715, PTRACE_PEEKUSR 4717 @addr 000008d0 return 00000000 4715, PTRACE_PEEKUSR 4717 @addr 000008d0 return 00000000 4715, PTRACE_PEEKUSR 4717 @addr 000008c0 return 00000000 syscall_trace current(4717)->exit_code(0) syscall_trace_leave exiting (4717) syscall_trace_enter beginning (4717, syscall: 1060) ..sig_ignored=1.. 4715, PTRACE_PEEKUSR 4717 @addr 00000830 return 00000010 4715, PTRACE_PEEKUSR 4717 @addr 000008b8 return 00000424 syscall_trace current(4717)->exit_code(0) syscall_trace_enter exiting (4717) Delete the line that blocks SIGTRAP in the test case, rerun it, I got: syscall_trace current(4724)->exit_code(0) syscall_trace_enter exiting (4724) report_exec:pid= 4724 parent=4722, real_parent=4723,current->ptrace =00000001, report_exec:pid= 4724 parent=4722, real_parent=4723,current->ptrace =00000001, syscall_trace_leave entering (4724, syscall:1033) ..sig_ignored=1.. 4722, PTRACE_PEEKUSR 4724 @addr 00000830 return 00000000 4722, PTRACE_PEEKUSR 4724 @addr 000008c0 ...
This subject line is quite far from what you're now addressing, btw. I certainly wouldn't recommend that patch. The interference between ptrace uses of SIGTRAP and a program that blocks SIGTRAP is just a fact of life with ptrace. If you wanted to change the exec SIGTRAP to be more consistent with other signals induced by debuggers (such as breakpoints and single-step), you could just change send_sig to force_sig. That unblocks SIGTRAP and resets its handler when it's blocked. Conversely, a tracer program could use PTRACE_O_TRACEEXEC if it cared to deal with tracing a process that blocks SIGTRAP. (That uses ptrace_notify rather than any actual signal that can be blocked.) But, none of this goes to your actual problem at all AFAICT. You've said your actual problem with strace only arises on ia64. The behavior of ptrace'd exec, and the effect of a program blocking SIGTRAP, is exactly the same across all machines. Changing it does not explain your situation. I'm in favor of understanding what is actually going on before changing things. If the SIGTRAP from exec is actually seen by strace on a different machine like it's not on ia64, that suggests that something different happened between the two. In the case that doesn't exhibit the problem, either SIGTRAP got unblocked by something before the exec, or userland (strace) behaves differently so that it doesn't care about not seeing that SIGTRAP. Thanks, Roland --
The reason is not as complicated as I thought. It is because x86 strace don't test TCB_WAITEXECVE. please take a look at defs.h (strace-4.5.16), the flag is defined as follows: #ifdef LINUX # if defined(ALPHA) || defined(SPARC) || defined(SPARC64) || defined(POWERPC) || defined(IA64) || defined(HPPA) || defined(SH) || defined(SH64) || defined(S390) || defined(S390X) || defined(ARM) # define TCB_WAITEXECVE 02000 /* ignore SIGTRAP after exceve */ # endif I'm not an strace expert, so I have no idea what the TCB_WAITEXECVE means. And why x86 strace can handle SIGTRAP after execve but ALL other Arch can not.. Could anybody shield some lights on the flag? --
So is there or is there not any regression (change) in the kernel's behavior? If you are now just talking about strace's own code, then the place for that is strace-devel@lists.sourceforge.net, not any kernel lists. Thanks, Roland --
The another mysterious that I completely don't understand probably needs your help. Why kernels with utrace patch applied on ia64 just work? Sounds like utrace delivers an extra SIGTRAP comparing with kernel without utrace code.. Dose it make sense? Hmm.. Sounds like a bug to me.. --
you are right, I just find outputs of two different thread are mixed! Please just ignore the call trace above.. --
I tend to believe singal "deliver" in ptrace_notify path is impossible.. Given the fact that if not block SIGTRAP, the test case just works on ia64, I still believe some signals have been consumed without letting debugger know.. but it is just my guess, We still need to investigate how it happened..I assume you should know how the test case failed. PS shows the test case hangs in process 6553 which seem to be impossible ... Please help me understand what kind of situation could cause it happen. That would be very helpful. root 6516 0.0 0.0 57344 5184 ttyS1 Ss+ 19:33 0:00 -bash root 6551 0.0 0.0 4992 2560 ttyS1 T 19:33 0:00 \_ strace -f ./test1 root 6552 0.0 0.0 4416 1792 ttyS1 T 19:33 0:00 \_ ./test1 root 6553 0.0 0.0 56832 4288 ttyS1 T 19:33 0:01 \_ /bin/bash ./test2.sh Thanks, Luming --
Some correction about the test case hang, the ps output should looks like the following: 13925 S+ strace -f ./test1 13926 S+ ./test1 13927 T+ /bin/bash ./test2 I'm trying upstream kdump to get more detailed data for help analysis the scenario.. But unfortunately upstream kernel just hang when I echo c to sysrq-trigger. Downgrade the kernel to 2.6.22, 'echo c' doesn't hang, but I just got "zero" dump file... Will try F9 later... But from the symptom shouldn't we ask this question: why process 13927 can't wake up? Roland, please confirm if ptrace_untrace is the only way to get Traced process wake up? Back up to caller of ptrace_untrace, maybe we can find out why my RFC patch happen to fix the problem... Any suggestions are welcome! --Luming --
Upstream kdump doesn't work! don't know what cause the regression, possibly kexec-tools... Need investigation here! --
I'd say the first thing to do is understand the sequence of ptrace calls, wait results seen by the ptracer, and the ptrace_stop() path, in your test case. Compare that on x86 and ia64, where the behavior you see differs. When you pinpoint the difference in that sequence, we will have some sense of what it is that really matters to your test. Thanks, Roland --
