Re: [patch] Document use of RTC in pm_trace

Previous thread: man-pages-3.09 is released by Michael Kerrisk on Wednesday, September 10, 2008 - 7:41 am. (1 message)

Next thread: [PATCH] fastboot: fix blackfin breakage due to vmlinux.lds change by Arjan van de Ven on Wednesday, September 10, 2008 - 8:25 am. (3 messages)
From: Frans Pop
Date: Wednesday, September 10, 2008 - 7:58 am

After resuming my system today I noticed that the system time was off.
By 16 years...

$ date
Wed Jul  3 23:09:23 CEST 2024

The kernel log clearly shows the jump happening during a resume:
Sep 10 15:31:41 aragorn kernel: PM: Finishing wakeup.
Sep 10 15:31:41 aragorn kernel: Restarting tasks ... done.
Sep 10 15:31:41 aragorn kernel: wlan0: authenticate with AP 00:14:c1:38:e5:15
Sep 10 15:31:41 aragorn kernel: PM: Adding info for No Bus:vcs63
Sep 10 15:31:41 aragorn kernel: PM: Adding info for No Bus:vcsa63
Sep 10 15:31:41 aragorn kernel: wlan0: authenticated
Sep 10 15:31:41 aragorn kernel: wlan0: associate with AP 00:14:c1:38:e5:15
Sep 10 15:31:41 aragorn kernel: wlan0: RX ReassocResp from 00:14:c1:38:e5:15 (capab=0x411 status=0 aid=1)
Sep 10 15:31:41 aragorn kernel: wlan0: associated
Sep 10 15:31:41 aragorn kernel: PM: Removing info for No Bus:vcs63
Sep 10 15:31:41 aragorn kernel: PM: Removing info for No Bus:vcsa63
Jul  3 23:09:08 aragorn kernel: wlan0: deauthenticated
Jul  3 23:09:09 aragorn kernel: wlan0: authenticate with AP 00:14:c1:38:e5:15
Jul  3 23:09:09 aragorn kernel: wlan0: authenticated
Jul  3 23:09:09 aragorn kernel: wlan0: associate with AP 00:14:c1:38:e5:15
Jul  3 23:09:09 aragorn kernel: wlan0: RX ReassocResp from 00:14:c1:38:e5:15 (capab=0x411 status=0 aid=1)
Jul  3 23:09:09 aragorn kernel: wlan0: associated

I can find no other clues about the jump in my logs.

Running 2.6.27-rc6. Clock source is hpet.
--

From: Alan Cox
Date: Wednesday, September 10, 2008 - 7:29 am

On Wed, 10 Sep 2008 16:58:06 +0200


Is this repeatable ?

Alan

--

From: Frans Pop
Date: Wednesday, September 10, 2008 - 8:55 am

This wasn't the first suspend I've done today and I'd not seen it before.
I'll follow up if it does repeat, but I thought it was strange and 
potentially important enough to at least report this occurrence. It might 
ring a bell with someone.
--

From: Frans Pop
Date: Wednesday, September 10, 2008 - 9:25 am

Oops. Yes, it is repeatable. Just had the same occur again.
Not sure if I can trigger it reliably though or if it will still happen
after a reboot.

The jump occurs at the same point and to almost the same time/date:
Sep 10 18:15:30 aragorn kernel: PM: Adding info for No Bus:vcs63
Sep 10 18:15:30 aragorn kernel: PM: Adding info for No Bus:vcsa63
Sep 10 18:15:30 aragorn kernel: wlan0: authenticate with AP 00:14:c1:38:e5:15
Sep 10 18:15:30 aragorn kernel: wlan0: authenticated
Sep 10 18:15:30 aragorn kernel: wlan0: associate with AP 00:14:c1:38:e5:15
Sep 10 18:15:30 aragorn kernel: wlan0: RX ReassocResp from 00:14:c1:38:e5:15 (capab=0x411 status=0 aid=1)
Sep 10 18:15:30 aragorn kernel: wlan0: associated
Sep 10 18:15:30 aragorn kernel: PM: Removing info for No Bus:vcs63
Sep 10 18:15:30 aragorn kernel: PM: Removing info for No Bus:vcsa63
Jul  3 23:09:01 aragorn kernel: wlan0: disassociating by local choice (reason=3)
Jul  3 23:09:03 aragorn kernel: wlan0: associate with AP 00:14:c1:38:e5:15
Jul  3 23:09:03 aragorn kernel: wlan0: RX ReassocResp from 00:14:c1:38:e5:15 (capab=0x411 status=0 aid=1)

One other thing with this last one. I had the system on AC this time,
which means the screensaver was active. It was running at a crazy speed
(much too fast). If I start the screensaver again now it runs at normal
speed.
--

From: Frans Pop
Date: Wednesday, September 10, 2008 - 9:56 am

Aaah, hmmm.

I'm trying to trace a problem with resume. Once in every 5-10 times resume 
fails with dead display even though most times it resumes perfectly. So I 
was trying the suggestion in Documentation/power/s2ram.txt and am doing
'echo 1 > /sys/power/pm_trace' before suspending.

Just now I find [1] which mentions that the trace info is saved in the 
RTC, which I guess could well explain the jump. If that's what behind 
this, wouldn't it make sense to mention that fact in s2ram.txt?

But I would not expect the RTC to get changed when the resume is 
successful. Or does it just get updated with every resume step without a 
reset to the pre-trace value on completion?

[1]https://lists.linux-foundation.org/pipermail/linux-pm/2006-June/008470.html
--

From: Linus Torvalds
Date: Wednesday, September 10, 2008 - 10:02 am

Indeed. We don't know where the bug happens when tracing, so each 
trace-point just writes its hash value into it. And nothing restores it, 
since nothing knows that the resume is "done" (a lot of problems happen 
due to X interactions much later than the 'core' resume thing).

		Linus
--

From: Frans Pop
Date: Wednesday, September 10, 2008 - 11:07 am

Right. How about this patch then?

---
From: Frans Pop <elendil@planet.nl>

As pm_trace uses the system's hardware clock to save its magic
value, users of that option should be warned that using this debug
option will result in an incorrect system time after resume.
    
Signed-off-by: Frans Pop <elendil@planet.nl>

diff --git a/Documentation/power/s2ram.txt b/Documentation/power/s2ram.txt
index b05f512..2ebdc60 100644
--- a/Documentation/power/s2ram.txt
+++ b/Documentation/power/s2ram.txt
@@ -54,3 +54,21 @@ used to run with "radeonfb" (it's an ATI Radeon mobility). It turns out
 that "radeonfb" simply cannot resume that device - it tries to set the
 PLL's, and it just _hangs_. Using the regular VGA console and letting X
 resume it instead works fine.
+
+NOTE
+====
+pm_trace uses the system's Real Time Clock (RTC) to save the magic number.
+Reason for this is that the RTC is the only reliably available piece of
+hardware during resume operations where a value can be set that will
+survive a reboot.
+
+Consequence is that after a resume (even if it is successful) your system
+clock will have a value corresponding to the magic mumber instead of the
+correct date/time! It is therefore advisable to use a program like ntp-date
+or rdate to reset the correct date/time from an external time source when
+using this trace option.
+
+As the clock keeps ticking it is also essential that the reboot is done
+quickly after the resume failure. The trace option does not use the seconds
+or the low order bits of the minutes of the RTC, but a too long delay will
+corrupt the magic value.
--

From: Willy Tarreau
Date: Wednesday, September 10, 2008 - 2:28 pm

Note that the time jump is 500 million seconds in the future (498977611
to be precise). I don't know if it is possible that some value running
one unit has mistakenly been used in place of another one (eg: internal
clock ticks instead of jiffies, etc...).

Willy

--

From: Geert Uytterhoeven
Date: Wednesday, September 10, 2008 - 1:47 pm

Or your ntp setup that's proliferating to other people's machines?

Alan, your replies usually appear to be sent before the original
email...

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds
--

Previous thread: man-pages-3.09 is released by Michael Kerrisk on Wednesday, September 10, 2008 - 7:41 am. (1 message)

Next thread: [PATCH] fastboot: fix blackfin breakage due to vmlinux.lds change by Arjan van de Ven on Wednesday, September 10, 2008 - 8:25 am. (3 messages)