After resume on a 2cpu laptop, kernel builds collapse with a sed hang,
sh or make segfault (often on 20295564), real-time signal to cc1 etc.

Several hurdles to jump, but a manually-assisted bisect led to -rc1's
d2bcbad5f3ad38a1c09861bca7e252dde7bb8259 x86: do not zap_low_mappings
in __smp_prepare_cpus. Though the low mappings were removed at bootup,
they were left behind (with Global flags helping to keep them in TLB)
after resume or cpu online, causing the crashes seen.

Reinstate zap_low_mappings (with local __flush_tlb_all) for each cpu_up
on x86_32. This used to be serialized by smp_commenced_mask: that's now
gone, but a low_mappings flag will do. No need for native_smp_cpus_done
to repeat the zap: let mem_init zap BSP's low mappings just like on UP.

(In passing, fix error code from native_cpu_up: do_boot_cpu returns a
variety of diagnostic values, Dprintk what it says but convert to -EIO.
And save_pg_dir separately before zap_low_mappings: doesn't matter now,
but zapping twice in succession wiped out resume's swsusp_pg_dir.)

That worked well on the duo and one quad, but wouldn't boot 3rd or 4th
cpu on P4 Xeon, oopsing just after unlock_ipi_call_lock. The TLB flush
IPI now being sent reveals a long-standing bug: the booting cpu has its
APIC readied in smp_callin at the top of start_secondary, but isn't put
into the cpu_online_map until just before that unlock_ipi_call_lock.

So native_smp_call_function_mask to online cpus would send_IPI_allbutself,
including the cpu just coming up, though it has been excluded from the
count to wait for: by the time it handles the IPI, the call data on
native_smp_call_function_mask's stack may well have been overwritten.

So fall back to send_IPI_mask while cpu_online_map does not match
cpu_callout_map: perhaps there's a better APICological fix to be
made at the start_secondary end, but I wouldn't know that.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
I've often wondered: should git reject any commit with "bad" in the id...
Hugh,
Many, many thanks!! This fixed my suspend/resume problem on my
X61s as well (where the X server would stop accepting keyboard and
mouse input, and subsequent attempts to restart the X server after a
remote login would fail).

- Ted
--
I've tested it twice already on my P4 desktop (where the problem was
very frequent after "echo standby > /sys/power/state") and the
patch really fixed it!!
--
Great, thanks a lot for sharing the problem and reporting back, Carlos.
Hugh
--
applied, thanks Hugh! You've once again proven that you are worth your
hehe :)
ob'note: would be nice to have the suspend+resume self-test/debug patch
below upstream. It programs the RTC to a 5 second sleep and resumes the
computer, which then wakes itself up afterwards.

Ingo
-------------------------------->
Subject: suspend+resume self-test
From: David Brownell <david-b@pacbell.net>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/power/Kconfig | 10 +++
kernel/power/main.c | 163 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 173 insertions(+)

Index: linux/kernel/power/Kconfig
===================================================================
--- linux.orig/kernel/power/Kconfig
+++ linux/kernel/power/Kconfig
@@ -94,6 +94,16 @@ config SUSPEND
powered and thus its contents are preserved, such as the
suspend-to-RAM state (e.g. the ACPI S3 state).

+config PM_TEST_SUSPEND
+ bool "Test suspend/resume and wakealarm during bootup"
+ depends on SUSPEND && PM_DEBUG && RTC_LIB=y
+ ---help---
+ This option will suspend your machine during bootup, and make
+ it wake up a few seconds later using the RTC's wakeup alarm.
+
+ You probably want to have your system's RTC driver statically
+ linked, ensuring that it's available when this test runs.
+
config SUSPEND_FREEZER
bool "Enable freezer for suspend to RAM/standby" \
if ARCH_WANTS_FREEZER_CONTROL || BROKEN
Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -132,6 +132,52 @@ static inline int suspend_test(int level
#ifdef CONFIG_SUSPEND
+#ifdef CONFIG_PM_TEST_SUSPEND
+
+/*
+ * We test the system suspend code by setting an RTC wakealarm a short
+ * time in the future, then suspending. Suspending the devices won't
+ * normally take long ... some systems o...
ACK. Perhaps you should send it to Andrew for a merge? Or maybe Rafael
wants to add it to his patch queue?
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
Great!
If possible, we don't want to introduce any more ifdefs. It would be
better to do it openly: define low_mappings as always 0 for x86_64. We
--
Glauber Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
--
You're right, something like that (but avoiding the __flush_tlb_all
on x86_64) would have been nicer; never mind, now it's going forward,
I'll leave it as is.

A bigger improvement would be to cut out all that swapper_pg_dir
to and fro, needing a global TLB flush on all cpus for each cpu.
Just have an alternate pg_dir (often already there as swsusp_pg_dir)
to point cr3 at for the bootup (or maybe it needs to be vice versa).

But that's harder to get right, and involves wider changes and much
more testing than I could afford for the bugfix. Plus I expect it's
on your radar if not already done.

Hugh
--
