Re: Time travel experiment?

Previous thread: man-pages-3.09 is released by Michael Kerrisk on Wednesday, September 10, 2008 - 10:41 am. (1 message)

Next thread: [PATCH] fastboot: fix blackfin breakage due to vmlinux.lds change by Arjan van de Ven on Wednesday, September 10, 2008 - 11:25 am. (3 messages)
To: <linux-kernel@...>
Cc: Thomas Gleixner <tglx@...>
Date: Wednesday, September 10, 2008 - 10:58 am

After resuming my system today I noticed that the system time was off.
By 16 years...

$ date
Wed Jul 3 23:09:23 CEST 2024

The kernel log clearly shows the jump happening during a resume:
Sep 10 15:31:41 aragorn kernel: PM: Finishing wakeup.
Sep 10 15:31:41 aragorn kernel: Restarting tasks ... done.
Sep 10 15:31:41 aragorn kernel: wlan0: authenticate with AP 00:14:c1:38:e5:15
Sep 10 15:31:41 aragorn kernel: PM: Adding info for No Bus:vcs63
Sep 10 15:31:41 aragorn kernel: PM: Adding info for No Bus:vcsa63
Sep 10 15:31:41 aragorn kernel: wlan0: authenticated
Sep 10 15:31:41 aragorn kernel: wlan0: associate with AP 00:14:c1:38:e5:15
Sep 10 15:31:41 aragorn kernel: wlan0: RX ReassocResp from 00:14:c1:38:e5:15 (capab=0x411 status=0 aid=1)
Sep 10 15:31:41 aragorn kernel: wlan0: associated
Sep 10 15:31:41 aragorn kernel: PM: Removing info for No Bus:vcs63
Sep 10 15:31:41 aragorn kernel: PM: Removing info for No Bus:vcsa63
Jul 3 23:09:08 aragorn kernel: wlan0: deauthenticated
Jul 3 23:09:09 aragorn kernel: wlan0: authenticate with AP 00:14:c1:38:e5:15
Jul 3 23:09:09 aragorn kernel: wlan0: authenticated
Jul 3 23:09:09 aragorn kernel: wlan0: associate with AP 00:14:c1:38:e5:15
Jul 3 23:09:09 aragorn kernel: wlan0: RX ReassocResp from 00:14:c1:38:e5:15 (capab=0x411 status=0 aid=1)
Jul 3 23:09:09 aragorn kernel: wlan0: associated

I can find no other clues about the jump in my logs.

Running 2.6.27-rc6. Clock source is hpet.
--

To: Frans Pop <elendil@...>
Cc: <linux-kernel@...>, Thomas Gleixner <tglx@...>
Date: Wednesday, September 10, 2008 - 10:29 am

On Wed, 10 Sep 2008 16:58:06 +0200

Is this repeatable ?

Alan

--

To: Alan Cox <alan@...>
Cc: Frans Pop <elendil@...>, <linux-kernel@...>, Thomas Gleixner <tglx@...>
Date: Wednesday, September 10, 2008 - 4:47 pm

Or your ntp setup that's proliferating to other people's machines?

Alan, your replies usually appear to be sent before the original
email...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
--

To: Alan Cox <alan@...>
Cc: <linux-kernel@...>, Thomas Gleixner <tglx@...>
Date: Wednesday, September 10, 2008 - 11:55 am

This wasn't the first suspend I've done today and I'd not seen it before.
I'll follow up if it does repeat, but I thought it was strange and
potentially important enough to at least report this occurrence. It might
ring a bell with someone.
--

To: Alan Cox <alan@...>
Cc: <linux-kernel@...>, Thomas Gleixner <tglx@...>
Date: Wednesday, September 10, 2008 - 12:25 pm

Oops. Yes, it is repeatable. Just had the same occur again.
Not sure if I can trigger it reliably though or if it will still happen
after a reboot.

The jump occurs at the same point and to almost the same time/date:
Sep 10 18:15:30 aragorn kernel: PM: Adding info for No Bus:vcs63
Sep 10 18:15:30 aragorn kernel: PM: Adding info for No Bus:vcsa63
Sep 10 18:15:30 aragorn kernel: wlan0: authenticate with AP 00:14:c1:38:e5:15
Sep 10 18:15:30 aragorn kernel: wlan0: authenticated
Sep 10 18:15:30 aragorn kernel: wlan0: associate with AP 00:14:c1:38:e5:15
Sep 10 18:15:30 aragorn kernel: wlan0: RX ReassocResp from 00:14:c1:38:e5:15 (capab=0x411 status=0 aid=1)
Sep 10 18:15:30 aragorn kernel: wlan0: associated
Sep 10 18:15:30 aragorn kernel: PM: Removing info for No Bus:vcs63
Sep 10 18:15:30 aragorn kernel: PM: Removing info for No Bus:vcsa63
Jul 3 23:09:01 aragorn kernel: wlan0: disassociating by local choice (reason=3)
Jul 3 23:09:03 aragorn kernel: wlan0: associate with AP 00:14:c1:38:e5:15
Jul 3 23:09:03 aragorn kernel: wlan0: RX ReassocResp from 00:14:c1:38:e5:15 (capab=0x411 status=0 aid=1)

One other thing with this last one. I had the system on AC this time,
which means the screensaver was active. It was running at a crazy speed
(much too fast). If I start the screensaver again now it runs at normal
speed.
--

To: Frans Pop <elendil@...>
Cc: Alan Cox <alan@...>, <linux-kernel@...>, Thomas Gleixner <tglx@...>
Date: Wednesday, September 10, 2008 - 5:28 pm

Note that the time jump is 500 million seconds in the future (498977611
to be precise). I don't know if it is possible that some value running
one unit has mistakenly been used in place of another one (eg: internal
clock ticks instead of jiffies, etc...).

Willy

--

To: Alan Cox <alan@...>
Cc: <linux-kernel@...>, Thomas Gleixner <tglx@...>, Linus Torvalds <torvalds@...>
Date: Wednesday, September 10, 2008 - 12:56 pm

Aaah, hmmm.

I'm trying to trace a problem with resume. Once in every 5-10 times resume
fails with dead display even though most times it resumes perfectly. So I
was trying the suggestion in Documentation/power/s2ram.txt and am doing
'echo 1 > /sys/power/pm_trace' before suspending.

Just now I find [1] which mentions that the trace info is saved in the
RTC, which I guess could well explain the jump. If that's what behind
this, wouldn't it make sense to mention that fact in s2ram.txt?

But I would not expect the RTC to get changed when the resume is
successful. Or does it just get updated with every resume step without a
reset to the pre-trace value on completion?

[1]https://lists.linux-foundation.org/pipermail/linux-pm/2006-June/008470.html
--

To: Frans Pop <elendil@...>
Cc: Alan Cox <alan@...>, <linux-kernel@...>, Thomas Gleixner <tglx@...>
Date: Wednesday, September 10, 2008 - 1:02 pm

Indeed. We don't know where the bug happens when tracing, so each
trace-point just writes its hash value into it. And nothing restores it,
since nothing knows that the resume is "done" (a lot of problems happen
due to X interactions much later than the 'core' resume thing).

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Alan Cox <alan@...>, <linux-kernel@...>, Thomas Gleixner <tglx@...>
Date: Wednesday, September 10, 2008 - 2:07 pm

Right. How about this patch then?

---
From: Frans Pop <elendil@planet.nl>

As pm_trace uses the system's hardware clock to save its magic
value, users of that option should be warned that using this debug
option will result in an incorrect system time after resume.

Signed-off-by: Frans Pop <elendil@planet.nl>

diff --git a/Documentation/power/s2ram.txt b/Documentation/power/s2ram.txt
index b05f512..2ebdc60 100644
--- a/Documentation/power/s2ram.txt
+++ b/Documentation/power/s2ram.txt
@@ -54,3 +54,21 @@ used to run with "radeonfb" (it's an ATI Radeon mobility). It turns out
that "radeonfb" simply cannot resume that device - it tries to set the
PLL's, and it just _hangs_. Using the regular VGA console and letting X
resume it instead works fine.
+
+NOTE
+====
+pm_trace uses the system's Real Time Clock (RTC) to save the magic number.
+Reason for this is that the RTC is the only reliably available piece of
+hardware during resume operations where a value can be set that will
+survive a reboot.
+
+Consequence is that after a resume (even if it is successful) your system
+clock will have a value corresponding to the magic mumber instead of the
+correct date/time! It is therefore advisable to use a program like ntp-date
+or rdate to reset the correct date/time from an external time source when
+using this trace option.
+
+As the clock keeps ticking it is also essential that the reboot is done
+quickly after the resume failure. The trace option does not use the seconds
+or the low order bits of the minutes of the RTC, but a too long delay will
+corrupt the magic value.
--

To: Frans Pop <elendil@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>
Cc: Linus Torvalds <torvalds@...>, Alan Cox <alan@...>, <linux-kernel@...>, Thomas Gleixner <tglx@...>
Date: Saturday, September 13, 2008 - 7:20 am

Previous thread: man-pages-3.09 is released by Michael Kerrisk on Wednesday, September 10, 2008 - 10:41 am. (1 message)

Next thread: [PATCH] fastboot: fix blackfin breakage due to vmlinux.lds change by Arjan van de Ven on Wednesday, September 10, 2008 - 11:25 am. (3 messages)