In latest 2.6.27(git) enabling dynamic ftrace makes resume from a suspend to ram reboot instead of resuming. Queued for 2.6.28 is a new method of recording mcount callers at compile time that does not have this issue. But the new method is still too "green" to be pulled into 27, so the old ftraced (daemon method) needs to be fixed for 27. The way dynamic ftrace works with the daemon method is this. On boot up the mcount function simply returns. When ftrace is initialized, it calls kstop_machine to modify the mcount function to call another function called "ftrace_record_ip". This new function will record in a preallocated hash (allocated by the ftrace initializer) all the callers of mcount. A check is made to see if the caller has already been put into the hash, and if so, it is not recorded again. Later on a kernel thread ftraced is created. This kernel thread wakes up once a second and checks to see if any new functions were added to the hash. If so, it then calls kstop_machine and modifies those callers to mcount into nops. Again, this daemon method makes resume from suspend to ram reboot instead of resuming. Now, I'm asking the s2r gurus, what did I miss? Do I need to add a "NO_FREEZE" flag or something to the "ftraced" kernel thread? Just asking for some advice. Thanks, -- Steve --
If I'm not mistaken, it'll probably suffice to make it freezable, so that it doesn't run while the system is suspending and resuming. Would that be acceptable? Please tell me where exactly the ftraced source code is located. Thanks, Rafael --
It's in Linus's latest git tree. The code in question is the ftraced() function in kernel/trace/ftrace.c Thanks, -- Steve --
Thanks, I'll have a look in a while. Rafael --
Can you try the appended patch, please?
Thanks,
Rafael
---
kernel/trace/ftrace.c | 5 +++++
1 file changed, 5 insertions(+)
Index: linux-2.6/kernel/trace/ftrace.c
===================================================================
--- linux-2.6.orig/kernel/trace/ftrace.c
+++ linux-2.6/kernel/trace/ftrace.c
@@ -796,8 +796,13 @@ static int ftraced(void *ignore)
{
unsigned long usecs;
+ set_freezable();
+
while (!kthread_should_stop()) {
+ if (try_to_freeze())
+ continue;
+
set_current_state(TASK_INTERRUPTIBLE);
/* check once a second */
--
makes sense - i've applied it to tip/tracing/urgent, see the tidied up
commit below.
It should be no big issue not being able to trace across suspend+resume
- and that restriction will go away with Steve's build-time based mcount
patching mechanism in v2.6.28.
Ingo
------------->
From 0e556695ddc8eebf6f6dd86bb0c4911b2b90c12a Mon Sep 17 00:00:00 2001
From: Rafael J. Wysocki <rjw@sisk.pl>
Date: Thu, 21 Aug 2008 21:59:36 +0200
Subject: [PATCH] ftrace: fix ftraced and suspend to ram
It will suffice to make it freezable, so that it doesn't run while the
system is suspending and resuming.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/trace/ftrace.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 639e16c..49f4c3f 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -819,8 +819,13 @@ static int ftraced(void *ignore)
{
unsigned long usecs;
+ set_freezable();
+
while (!kthread_should_stop()) {
+ if (try_to_freeze())
+ continue;
+
set_current_state(TASK_INTERRUPTIBLE);
/* check once a second */
--
Patch looks okay to me, but I'm not sure if another issue is not hiding under it. Did anyone actually test ftrace + suspend after applying this? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
I just tested this patch - it didn't help ;( diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 49f4c3f..02e41b2 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -27,6 +27,7 @@ #include <linux/ctype.h> #include <linux/hash.h> #include <linux/list.h> +#include <linux/freezer.h> #include <asm/ftrace.h> --
This is needed to fix compilation, sorry for the omission. Still, did you test ftrace + suspend with the original patch and your fix applied and if you did, then what happend? Rafael --
Suspend works fine. When I hit power button I see some informations from video card (nvidia), then screen goes black and monitor says "no signal" and then I see the same informations from video card, then BIOS and GRUB. When everything works fine, X comes back after first infos from video card. Marcin --
I suspect that something under arch/x86/kernel/acpi/realmode is broken somehow. Thanks, Rafael --
Does ftrace hook itself onto _all_ the functions? Or all .c functions? I guess low-level suspend code needs to be exempt from tracing. Certainly all the assembly functions. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
It hooks into all .c functions that are not annotated with "notrace" or the files have not been marked in the Makefile like: I'm looking into that now too. Are the functions in arch/x86/power/cpu*.c the suspend to ram code? Thanks, -- Steve --
They contain code executed during suspend to RAM, but such code is also: - in all files under arch/x86/kernel/acpi/ - in main.c, console.c under kernel/power - in all files under drivers/acpi/sleep - in drivers/acpi/hardware/hwsleep.c Generally, ACPI is heavily involved and I'm not the right person to ask which of the ACPI functions should get the 'notrace' thing. Also, I'm not sure about the device drivers' ->suspend() and ->resume() callbacks, especially for sysdevs and ->suspend_late(), ->resume_early() for platform devices and PCI. Well, how exactly suspend to RAM is broken by ftrace? Rafael --
I know that the smp_processor_id may be defined in the %fs register, but if ftrace is called before the %fs is set up, it may crash because it uses smp_processor_id. -- Steve --
The wake-up code is the most interesting for you, then. It's located in arch/x86/kernel/acpi/ , but on x86-64 it also uses trampoline code in trampoline_64.S (this is assembly, though). On x86-64, wakeup_long64 in wakeup_64.S is where we get from the real-mode wake-up code, but under arch/x86/kernel/acpi/realmode there are some C files containing functions executed in real mode. Everything in there and everything referred to from there should be 'notrace' IMO. Thanks, Rafael --
I've been painstakingly debugging the issue with suspend to ram and ftraced. The 2.6.28 code does not have this issue, but since the mcount recording is not going to be in 27, this must be solved for the ftrace daemon version. The resume from suspend to ram would reboot because it was triple faulting. Debugging further, I found that calling the mcount function itself was not an issue, but it would fault when it incremented preempt_count. preempt_count is on the tasks info structure that is on the low memory address of the task's stack. For some reason, it could not write to it. Resuming out of suspend to ram does quite a lot of funny tricks to get to work, so it is not surprising at all that simply doing a preempt_disable() would cause a fault. Thanks to Rafael for suggesting to add a "while (1);" to find the place in resuming that is causing the fault. I would place the loop somewhere in the code, compile and reboot and see if it would either reboot (hit the fault) or simply hang (hit the loop). Doing this over and over again, I narrowed it down that it was happening in enable_nonboot_cpus. At this point, I found that it is easier to simply disable tracing around the suspend code, instead of searching for the particular function that can not handle doing a preempt_disable. This patch disables the tracer as it suspends and reenables it on resume. I tested this patch on my Laptop, and it can resume fine with the patch. Signed-off-by: Steven Rostedt <srostedt@redhat.com> --- kernel/power/main.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) Index: linux-compile.git/kernel/power/main.c =================================================================== --- linux-compile.git.orig/kernel/power/main.c 2008-08-27 08:53:11.000000000 -0400 +++ linux-compile.git/kernel/power/main.c 2008-08-27 08:53:58.000000000 -0400 @@ -21,6 +21,7 @@ #include <linux/freezer.h> #include <linux/vmstat.h> #include <linux/syscalls.h> +#include ...
Well, if that's enable_nonboot_cpus() that's faulting, I guess a similar change in the hibernation code is needed. I'll try to prepare a patch for that. Thanks, Rafael --
From: Rafael J. Wysocki <rjw@sisk.pl>
ftrace: disable tracing for hibernation
In accordance with commit f42ac38c59e0a03d6da0c24a63fb211393f484b0
("ftrace: disable tracing for suspend to ram"), disable tracing
around the suspend code in hibernation code paths.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
kernel/power/disk.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -21,6 +21,7 @@
#include <linux/console.h>
#include <linux/cpu.h>
#include <linux/freezer.h>
+#include <linux/ftrace.h>
#include "power.h"
@@ -255,7 +256,7 @@ static int create_image(int platform_mod
int hibernation_snapshot(int platform_mode)
{
- int error;
+ int error, ftrace_save;
/* Free memory before shutting down devices. */
error = swsusp_shrink_memory();
@@ -267,6 +268,7 @@ int hibernation_snapshot(int platform_mo
goto Close;
suspend_console();
+ ftrace_save = __ftrace_enabled_save();
error = device_suspend(PMSG_FREEZE);
if (error)
goto Recover_platform;
@@ -296,6 +298,7 @@ int hibernation_snapshot(int platform_mo
Resume_devices:
device_resume(in_suspend ?
(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
+ __ftrace_enabled_restore(ftrace_save);
resume_console();
Close:
platform_end(platform_mode);
@@ -366,10 +369,11 @@ static int resume_target_kernel(void)
int hibernation_restore(int platform_mode)
{
- int error;
+ int error, ftrace_save;
pm_prepare_console();
suspend_console();
+ ftrace_save = __ftrace_enabled_save();
error = device_suspend(PMSG_QUIESCE);
if (error)
goto Finish;
@@ -384,6 +388,7 @@ int hibernation_restore(int platform_mod
platform_restore_cleanup(platform_mode);
device_resume(PMSG_RECOVER);
Finish:
+ __ftrace_enabled_restore(ftrace_save);
resume_console();
...applied it to tip/tracing/urgent - thanks Rafael! Ingo --
Looks good to me, Thanks Rafael! Acked-by: Steven Rostedt <srostedt@redhat.com> -- Steve --
Looks ok to me. ACK. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
You can add my: Tested-by: Marcin Slusarz <marcin.slusarz@gmail.com> Thanks! Marcin --
So this is going to be reverted after 2.6.27? Okay, then. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Maybe not. The difference is that in 27 it would cause s2ram to fail every time. In 28 (I have not proved this yet), it may fail s2ram if the tracer is on. --
--
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian< |
