Ok,
I don't think there really is anything very interesting here, but we're
hopefully whittling down the list of regressions, and fixing various
random other small issues while at it.
Some smallish MIPS updates, networking (and network driver) fixes, removal
of a long obsolete framebuffer driver, etc etc. The shortlog really tells
the story.
We should be getting close to a 2.6.21 release, so please update any
regression reports you've done,
Linus
---
Adrian Bunk (6):
      [DCCP]: make dccp_write_xmit_timer() static again
      9p: make struct v9fs_cached_file_operations static
      drivers/spi/: fix section mismatches
      drivers/eisa/pci_eisa.c:pci_eisa_init() should be init
      drivers/mfd/sm501.c: fix an off-by-one
      net/sunrpc/svcsock.c: fix a check
Alan Cox (2):
      tty: minor merge correction
      pata_pdc202xx_old: LBA48 bug
Alan Stern (1):
      UHCI: Fix problem caused by lack of terminating QH
Albert Lee (5):
      pdc202xx_new: Enable ATAPI DMA
      libata: reorder HSM_ST_FIRST for easier decoding (take 3)
      libata: Clear tf before doing request sense (take 3)
      libata: Limit max sector to 128 for TORiSAN DVD drives (take 3)
      libata: Limit ATAPI DMA to R/W commands only for TORiSAN DVD drives (take 3)
Alexey Dobriyan (1):
      [NET]: Correct accept(2) recovery after sock_attach_fd()
Alexey Kuznetsov (1):
      [NET]: Fix neighbour destructor handling.
Andi Kleen (3):
      x86-64: Disable local APIC timer use on AMD systems with C1E
      x86-64: Let oprofile reserve MSR on all CPUs
      x86-64: Increase NMI watchdog probing timeout
Andreas Oberritter (2):
      V4L/DVB (5495): Tda10086: fix DiSEqC message length
      V4L/DVB (5496): Pluto2: fix incorrect TSCR register setting
Andrew Morton (4):
      proc: fix linkage with CONFIG_SYSCTL=y, CONFIG_PROC_SYSCTL=n
      revert "retries in ext3_prepare_write() violate ordering requirements"
      revert "retries in ext4_prepare_write() violate ...
2.6.21-rc5 is ok. 2.6.21-rc6 results in
[ 14.241665] Unable to handle kernel NULL pointer dereference (address 0000000000000000)
[ 14.250025] swapper[1]: Oops 11003706212352 [1]
[ 14.254753] Modules linked in:
[ 14.258046]
[ 14.258047] Pid: 1, CPU 7, comm: swapper
[ 14.264962] psr : 00001210084a6010 ifs : 8000000000000610 ip : [<a000000100495371>] Not tainted
[ 14.274399] ip is at tg3_chip_reset+0xf1/0x12c0
[ 14.279124] unat: 0000000000000000 pfs : 0000000000000610 rsc : 0000000000000003
[ 14.286862] rnat: e000001005bc7d40 bsps: e000001005bc0000 pr : 68105a9195655599
[ 14.294598] ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
[ 14.302338] csd : 0000000000000000 ssd : 0000000000000000
[ 14.307946] b0 : a0000001004952c0 b6 : a00000010038b2e0 b7 : a000000100486580
[ 14.315688] f6 : 1003e000000054e304351 f7 : 1003e0000000000000640
[ 14.322164] f8 : 1003e000000054e2dd251 f9 : 1003e0000000000000064
[ 14.328643] f10 : 10015e7d113fff182eec0 f11 : 1003e000000000073e88a
[ 14.335116] r1 : a000000100d4be30 r2 : a000000100b68fc0 r3 : a000000100b68eb0
[ 14.342851] r8 : 0000000000000000 r9 : 0000000000000200 r10 : a00000010089d1a8
[ 14.350597] r11 : a000000100486580 r12 : e000001005bc7d70 r13 : e000001005bc0000
[ 14.358332] r14 : 0000000000000002 r15 : e000001005d08f10 r16 : e000001005d08ee0
[ 14.366072] r17 : e000001005d08748 r18 : e000001005d08758 r19 : 0000000000000000
[ 14.373815] r20 : e000001005d08748 r21 : 0000000000000000 r22 : 0000000040027401
[ 14.381557] r23 : 0000000000027401 r24 : 0000000040000000 r25 : a00000010089d2f0
[ 14.389293] r26 : a000000100b5b5c0 r27 : 0000000000000000 r28 : 0000000000000000
[ 14.397035] r29 : 0000000000000000 r30 : 0000000000000000 r31 : e000001005d08708
[ 14.404847]
[ 14.404848] Call Trace:
[ 14.409160] [<a000000100013900>] show_stack+0x80/0xa0
[ 14.409162] sp=e000001005bc7900 bsp=e000001005bc1120
[ ...
Sorry, I think this should fix it:
[TG3]: Fix crash during tg3_init_one().
The driver will crash when the chip has been initialized by EFI before
tg3_init_one(). In this case, the driver will call tg3_chip_reset()
before allocating consistent memory.
The bug is fixed by checking for tp->hw_status before accessing it
during tg3_chip_reset().
Signed-off-by: Michael Chan <mchan@broadcom.com>
diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 0acee9f..256969e 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -4834,8 +4834,10 @@ static int tg3_chip_reset(struct tg3 *tp)
 	 * sharing or irqpoll.
 	 */
 	tp->tg3_flags |= TG3_FLAG_CHIP_RESETTING;
-	tp->hw_status->status = 0;
-	tp->hw_status->status_tag = 0;
+	if (tp->hw_status) {
+		tp->hw_status->status = 0;
+		tp->hw_status->status_tag = 0;
+	}
 	tp->last_tag = 0;
 	smp_mb();
 	synchronize_irq(tp->pdev->irq);
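
A minimal userspace sketch of the guard added by the patch above, under stated assumptions: `status_block_sketch`, `nic_sketch` and `reset_status_sketch` are hypothetical stand-ins and not the driver's own definitions. It only illustrates that a reset path which can run before the status block has been allocated needs to test the pointer before writing through it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the hardware status block. */
struct status_block_sketch {
	unsigned int status;
	unsigned int status_tag;
};

/* Hypothetical stand-in for the per-device state. */
struct nic_sketch {
	struct status_block_sketch *hw_status; /* NULL until memory is allocated */
	unsigned int last_tag;
};

/* Clear the status block only if it exists, then reset the cached tag. */
void reset_status_sketch(struct nic_sketch *tp)
{
	if (tp->hw_status) {
		tp->hw_status->status = 0;
		tp->hw_status->status_tag = 0;
	}
	tp->last_tag = 0;
}

/* Reset before allocation: must not dereference the NULL pointer. */
int reset_before_alloc_is_safe(void)
{
	struct nic_sketch nic = { NULL, 42 };

	reset_status_sketch(&nic);
	return nic.last_tag == 0;
}

/* Reset after allocation: the block itself is cleared as before. */
int reset_after_alloc_clears_block(void)
{
	struct status_block_sketch blk = { 7, 9 };
	struct nic_sketch nic = { &blk, 42 };

	reset_status_sketch(&nic);
	return blk.status == 0 && blk.status_tag == 0 && nic.last_tag == 0;
}
```

The second helper is there to show that the guard does not change behaviour on the normal path, where the block is already allocated.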
-
From: "Michael Chan" <mchan@broadcom.com>
Applied, thanks Michael.
-
FWIW, tested, no panic.
Tested-by: Nishanth Aravamudan <nacc@us.ibm.com>
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
-
regression update for 21-rc6:
1) all s2ram and NO_HZ related things seem to be resolved on my macbook pro, also
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
don't break resume anymore.
2) However I am still having problems with
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_HPET=y
+CONFIG_HPET_MMAP=y
although the machine resumes, I've managed to get the attached oops.
3)
Subject : SATA breakage on resume
References : http://lkml.org/lkml/2007/3/7/233
Submitter : Thomas Gleixner <tglx@linutronix.de>
            Soeren Sonnenburg <kernel@nn7.de>
Status : unknown
I am still seeing these messages after a suspend/resume cycle (though all devices work even after multiple suspend/resume cycles)
ATA: abnormal status 0x80 on port 0x000140df
ata3.01: revalidation failed (errno=-2)
ata3: failed to recover some devices, retrying in 5 secs
ata1.00: configured for UDMA/33
ATA: abnormal status 0x7F on port 0x000140df
ATA: abnormal status 0x7F on port 0x000140df
ata3.01: configured for UDMA/133
So that's been a big step forward...
Soeren
--
For the one fact about the future of which we can be certain is that it will be utterly fantastic. -- Arthur C. Clarke, 1962
[ Added some people to the cc.. Len, Thomas, Ingo - look for the exact report on linux-kernel, but basically it's a "irq 9: nobody cared" issue with acpi_irq on irq9 ]
Ok, interesting. I'd have blamed ACPI for this one (stuck IRQ9 is almost always some ACPI event that got stuck or the SCI got mis-routed and/or marked with the wrong polarity), although from your message I take it you don't get it without high-res timers?
In fact, I have a theory.. Your backtrace is:
[<c0119637>] smp_apic_timer_interrupt+0x57/0x90
[<c0142d30>] retrigger_next_event+0x0/0xb0
[<c0104d30>] apic_timer_interrupt+0x28/0x30
[<c0142d30>] retrigger_next_event+0x0/0xb0
[<c0140068>] __kfifo_put+0x8/0x90
[<c0130fe5>] on_each_cpu+0x35/0x60
[<c0143538>] clock_was_set+0x18/0x20
[<c0135cdc>] timekeeping_resume+0x7c/0xa0
[<c02aabe1>] __sysdev_resume+0x11/0x80
[<c02ab0c7>] sysdev_resume+0x47/0x80
[<c02b0b05>] device_power_up+0x5/0x10
and the thing is, I don't think we should have interrupts enabled at this point in time! I suspect that the timer resume enables interrupts too early! We should be doing the whole "device_power_up()" sequence with
This seems to be normal, and related to some unknown timing issue. If the thing works for you apart from the message, I'd just ignore it..
Linus
-
yeah, i think you are right. timekeeping_resume() itself does not
re-enable interrupts, it's clock_was_set() that does it implicitly:
	void clock_was_set(void)
	{
		/* Retrigger the CPU local events everywhere */
		on_each_cpu(retrigger_next_event, NULL, 0, 1);
	}
on_each_cpu() is safe on SMP during resume 'bootup', because we only
have a single CPU at that point, and smp_call_function() does:
	spin_lock(&call_lock);
	cpus = num_online_cpus() - 1;
	if (!cpus) {
		spin_unlock(&call_lock);
so we just return. Note that the built-in warning of smp_call_function()
does not trigger because it's done too late:
	/* Can deadlock when called with interrupts disabled */
	WARN_ON(irqs_disabled());
we should move this up to the head of the function. But for this bug in
question to trigger we'd have to use an UP kernel, which has this code
for on_each_cpu():
	#define on_each_cpu(func,info,retry,wait) \
		({ \
			local_irq_disable(); \
			func(info); \
			local_irq_enable(); \
ouch!
the solution is this: what we want to call here in timekeeping_resume is
not clock_was_set() but retrigger_next_event() for the current CPU. The
patch below should fix it. Soeren, can you confirm that you are using a
!CONFIG_SMP kernel, and if yes, does the patch below fix the resume
problem for you?
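
The effect described above can be modelled in plain userspace C. This is only a sketch under stated assumptions: the interrupt flag is a simulated boolean, and every `sim_*` name is hypothetical rather than a kernel function. It contrasts a helper shaped like the quoted UP macro, which always ends by enabling interrupts, with a retrigger that touches only the current CPU and leaves the flag as it found it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated interrupt flag for a single CPU (not real hardware state). */
static bool sim_irqs_on;

static void sim_irq_disable(void) { sim_irqs_on = false; }
static void sim_irq_enable(void)  { sim_irqs_on = true; }

/* Counts how often the per-CPU retrigger ran. */
static int sim_retrigger_calls;

static void sim_retrigger(void *info)
{
	(void)info;
	sim_retrigger_calls++;
}

/* Same shape as the UP helper quoted above: disable, call, enable.
 * The final enable is unconditional, whatever the caller's state was. */
static void sim_on_each_cpu_up(void (*func)(void *), void *info)
{
	sim_irq_disable();
	func(info);
	sim_irq_enable();
}

/* Resume-style variant: run the retrigger on the current CPU only and
 * do not touch the interrupt flag at all. */
static void sim_resume_retrigger(void)
{
	sim_retrigger(NULL);
}

/* Start with interrupts off, as during resume, and report the flag
 * after going through the broadcast helper. */
bool irqs_after_broadcast(void)
{
	sim_irq_disable();
	sim_on_each_cpu_up(sim_retrigger, NULL);
	return sim_irqs_on;
}

/* Same starting state, but using the local-only retrigger. */
bool irqs_after_local(void)
{
	sim_irq_disable();
	sim_resume_retrigger();
	return sim_irqs_on;
}
```

In this model `irqs_after_broadcast()` comes back with the flag set even though the caller started with interrupts off, while `irqs_after_local()` leaves it clear.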
Ingo
---------------------------->
Subject: [patch] high-res timers: UP resume fix
From: Ingo Molnar <mingo@elte.hu>
Soeren Sonnenburg reported that upon resume he is getting
this backtrace:
[<c0119637>] smp_apic_timer_interrupt+0x57/0x90
[<c0142d30>] retrigger_next_event+0x0/0xb0
[<c0104d30>] apic_timer_interrupt+0x28/0x30
[<c0142d30>] retrigger_next_event+0x0/0xb0
[<c0140068>] __kfifo_put+0x8/0x90
[<c0130fe5>] on_each_cpu+0x35/0x60
[<c0143538>] clock_was_set+0x18/0x20
...
hm, you seem to have a CONFIG_SMP=y kernel. I don't immediately see where we re-enable interrupts in the SMP case, but could you try my patch nevertheless?
Ingo
-
We do in on_each_cpu() unconditionally. I missed that.
tglx
-
BTW, the on_each_cpu() in clock_was_set() is unnecessary, because timekeeping_resume() is always run on one CPU.
Greetings,
Rafael
-
yes - but that's not the only place where we do clock_was_set(), and the on_each_cpu() is necessary in every other case. So i think the right solution was the patch i did: to split the resume functionality from the clock_was_set() functionality.
Ingo
-
Right, I reused it and just did not notice, that interrupts are enabled unconditionally in on_each_cpu().
tglx
-
I wonder if we should add BUG_ON(interrupts_enabled) just before enabling interrupts to catch similar mistakes early?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
Acked-by: Thomas Gleixner <tglx@linutronix.de>
-
find updated patch below - only the patch description changed: i removed
the 'UP' thing (patch has relevance on SMP too), and added Thomas' ack.
Ingo
---------------------------->
Subject: [patch] high-res timers: resume fix
From: Ingo Molnar <mingo@elte.hu>
Soeren Sonnenburg reported that upon resume he is getting
this backtrace:
[<c0119637>] smp_apic_timer_interrupt+0x57/0x90
[<c0142d30>] retrigger_next_event+0x0/0xb0
[<c0104d30>] apic_timer_interrupt+0x28/0x30
[<c0142d30>] retrigger_next_event+0x0/0xb0
[<c0140068>] __kfifo_put+0x8/0x90
[<c0130fe5>] on_each_cpu+0x35/0x60
[<c0143538>] clock_was_set+0x18/0x20
[<c0135cdc>] timekeeping_resume+0x7c/0xa0
[<c02aabe1>] __sysdev_resume+0x11/0x80
[<c02ab0c7>] sysdev_resume+0x47/0x80
[<c02b0b05>] device_power_up+0x5/0x10
it turns out that on resume we mistakenly re-enable interrupts.
Do the timer retrigger only on the current CPU.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
---
include/linux/hrtimer.h | 3 +++
kernel/hrtimer.c | 12 ++++++++++++
2 files changed, 15 insertions(+)
Index: linux/include/linux/hrtimer.h
===================================================================
--- linux.orig/include/linux/hrtimer.h
+++ linux/include/linux/hrtimer.h
@@ -206,6 +206,7 @@ struct hrtimer_cpu_base {
struct clock_event_device;
extern void clock_was_set(void);
+extern void hres_timers_resume(void);
extern void hrtimer_interrupt(struct clock_event_device *dev);
/*
@@ -236,6 +237,8 @@ static inline ktime_t hrtimer_cb_get_tim
*/
static inline void clock_was_set(void) { }
+static inline void hres_timers_resume(void) { }
+
/*
* In non high resolution mode the time reference is taken from
* the base softirq time variable.
Index: linux/kernel/hrtimer.c
===================================================================
--- linux.orig/kernel/hrtimer.c
+++ linux/kernel/hrtimer.c
@@ -459,6 +459,18 @@ void ...
Hm, I'm probably missing something obvious, but where is it going to be called from?
Rafael
-
doh! :) Find new patch below :-/ Soeren, please test this one.
Ingo
---------------------------->
Subject: [patch] high-res timers: resume fix
From: Ingo Molnar <mingo@elte.hu>
Soeren Sonnenburg reported that upon resume he is getting
this backtrace:
[<c0119637>] smp_apic_timer_interrupt+0x57/0x90
[<c0142d30>] retrigger_next_event+0x0/0xb0
[<c0104d30>] apic_timer_interrupt+0x28/0x30
[<c0142d30>] retrigger_next_event+0x0/0xb0
[<c0140068>] __kfifo_put+0x8/0x90
[<c0130fe5>] on_each_cpu+0x35/0x60
[<c0143538>] clock_was_set+0x18/0x20
[<c0135cdc>] timekeeping_resume+0x7c/0xa0
[<c02aabe1>] __sysdev_resume+0x11/0x80
[<c02ab0c7>] sysdev_resume+0x47/0x80
[<c02b0b05>] device_power_up+0x5/0x10
it turns out that on resume we mistakenly re-enable interrupts.
Do the timer retrigger only on the current CPU.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
---
include/linux/hrtimer.h | 3 +++
kernel/hrtimer.c | 12 ++++++++++++
kernel/timer.c | 2 +-
3 files changed, 16 insertions(+), 1 deletion(-)
Index: linux/include/linux/hrtimer.h
===================================================================
--- linux.orig/include/linux/hrtimer.h
+++ linux/include/linux/hrtimer.h
@@ -206,6 +206,7 @@ struct hrtimer_cpu_base {
struct clock_event_device;
extern void clock_was_set(void);
+extern void hres_timers_resume(void);
extern void hrtimer_interrupt(struct clock_event_device *dev);
/*
@@ -236,6 +237,8 @@ static inline ktime_t hrtimer_cb_get_tim
*/
static inline void clock_was_set(void) { }
+static inline void hres_timers_resume(void) { }
+
/*
* In non high resolution mode the time reference is taken from
* the base softirq time variable.
Index: linux/kernel/hrtimer.c
===================================================================
--- linux.orig/kernel/hrtimer.c
+++ linux/kernel/hrtimer.c
@@ -459,6 +459,18 @@ void clock_was_set(void)
}
/*
+ * During resume ...
OK, I did about 5 suspend/resume cycles with
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
and no oops / no problem ... So I guess the fix take #3 is good :-)
One thing not directly related to this patch (but probably to all the timer stuff) that I noticed with -rc6 is that it takes 10 seconds to suspend (it was ~2 seconds before)
--
Sometimes, there's a moment as you're waking, when you become aware of the real world around you, but you're still dreaming.
-
On Fri, 2007-04-06 at 16:04 -0700, Linus Torvalds wrote:
Argh! Now after intensive use over the last 2 days, I realized that the internal harddisk works OK, but the dvd-drive did not after the 7th suspend/resume cycle - the device was suddenly gone (I could not even eject the disc I just inserted), more verbose dmesg follows:
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: port failed to respond (30 secs, Status 0x80)
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ata1.00: qc timeout (cmd 0xa1)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: port failed to respond (30 secs, Status 0x80)
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ata1.00: qc timeout (cmd 0xa1)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1.00: limiting speed to UDMA/33:PIO3
ata1: failed to recover some devices, retrying in 5 secs
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: port failed to respond (30 secs, Status 0x80)
ATA: abnormal status 0x80 on port 0x000101f7
last message repeated 4 times
ata1.00: qc timeout (cmd 0xa1)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1.00: disabled
Soeren
--
For the one fact about the future of which we can be certain is that it will be utterly fantastic. -- Arthur C. Clarke, 1962
-
Hi all,
This looks like a lockdep problem.
2.6.21-rc6
+ hrtimers_debug.patch (from Ingo)
- skge_wol_support (commit a504e64ab42bcc27074ea37405d06833ed6e0820) dropped due to swsusp problems
[14016.726946] BUG: at /mnt/md0/devel/linux-git/kernel/lockdep.c:2427 check_flags()
[14016.734331] [<c0105039>] show_trace_log_lvl+0x1a/0x2f
[14016.739507] [<c0105720>] show_trace+0x12/0x14
[14016.743982] [<c01057d2>] dump_stack+0x16/0x18
[14016.748460] [<c013b57f>] check_flags+0x95/0x143
[14016.753106] [<c013e334>] lock_acquire+0x29/0x82
[14016.757741] [<c01369dc>] down_write+0x3a/0x54
[14016.762203] [<c0163be2>] sys_munmap+0x23/0x3f
[14016.766661] [<c0104060>] syscall_call+0x7/0xb
[14016.771134] =======================
[14016.774712] irq event stamp: 43076
[14016.778111] hardirqs last enabled at (43075): [<c0104189>] syscall_exit_work+0x11/0x26
[14016.786166] hardirqs last disabled at (43076): [<c0103f09>] ret_from_exception+0x9/0xc
[14016.794118] softirqs last enabled at (42608): [<c012653b>] __do_softirq+0xe4/0xea
[14016.801706] softirqs last disabled at (42599): [<c01069b5>] do_softirq+0x64/0xd1
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc6/git-console.log
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc6/git-config
BTW. I noticed some strange fio (1.15) behavior
Starting 16 processes
file:io_u.c:65, assert idx < f->num_maps failed[ 1605/ 36442 kb/s] [eta 00m:32s]
fio: pid=13734, got signal=11
file:io_u.c:65, assert idx < f->num_maps failed[ 10452/ 0 kb/s] [eta 00m:23s]
fio: pid=13731, got signal=11
Regards,
Michal
--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/)
-
It's definitely there, I can see it in gitweb.. Do you have some really ancient git that didn't fetch the tags automatically?
Linus
-
Oh, my bad. I'd tagged it, but I didn't *sign* the tag, so it was just a tag-reference (and git fetch won't fetch them by default). I replaced the v2.6.21-rc6 tag with a signed one. Do
git fetch --tags
to get the thing.
Linus
-
FWIW, this last reversion didn't do it quite right: the device-mapper was at 253 prior to this patch's parent patch, and now it's at 252, which is still a 'dump it all' change for both tar & dump. Until things settle, I'm going to test and probably use the instructions that Dave Dillow just sent me, which should put it at 238 regardless.
--
Cheers, Gene
"There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author)
"Looks clean and obviously correct to me, but then _everything_ I write always looks obviously correct yo me." - Linus
-
I'm sitting on five patches which look like 2.6.21 material, but which would normally go through subsystem maintainers:
pcmcia:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm...
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm...
driver core:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm...
netdev:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm...
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm...
net:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm...
please send acks, nacks or smacks asap, thanks.
-
Feel free to forward it on with:
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As it was just a documentation update, I figured it was safe to wait for 2.6.22, but I have no objection to it going in now.
thanks,
greg k-h
-
It sounded like this was specific to Ingo. I haven't heard anybody else ACK this one.
Need to send this up, but I'm intentionally avoiding work as we are having a big Easter bash here in Raleigh. Silly bunny-related traditions that have nothing to do with Jesus take priority ;-)
I have a couple other bug fixes to push, but that will wait until Tuesday.
Jeff
-
I'm not sure, it sounds a bit like something I saw a while ago. I would have to check for sure, I made a quick debugging patch (sent to netdev) and it went away so I think my last thought was a miscompilation.
-
the bug has turned into an 'interface hang under high load' (i.e. the hack patch above is not necessary, but the problem is still there). It still affects the latest forcedeth.c in -rc6. I.e. it's still an unresolved regression. The last state i'm aware of is that I have sent Ayaz ethtool output as well of the hang, as requested.
Ingo
-
We should not encourage using platform_device_register_simple as we want to obsolete this function.
--
Dmitry
-
I couldn't get suspend-to-disk to work with 2.6.21-rc6. I've tried set/unset CONFIG_NO_HZ/CONFIG_HPET_TIMER, but nothing worked. With rc5 and Maxim's patch, it worked with CONFIG_NO_HZ unset. This is on ThinkPad X60s.
Jeff.
-
Do you think you could bisect it? You'd have to apply Maxim's patch by hand at each bisection step (up until the point where it's already applied in the git tree, of course), so it's not a totally mindless bisection, but it should still be fairly painless, since there are only 277 commits between -rc5 and -rc6 (so bisection should rather quickly narrow it down)
Linus
-
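
As a rough worked example of why 277 commits narrow down quickly: if each test build halves the set of candidate commits, about nine builds are enough. The sketch below assumes a linear history and an exact halving at every step, which real history with merges only approximates; `bisect_steps` is a hypothetical helper, not part of git.

```c
#include <assert.h>

/* Upper bound on the number of test builds needed to isolate one bad
 * commit among n candidates, assuming each test halves the range
 * (rounding up so the larger half is always counted). */
unsigned int bisect_steps(unsigned int n)
{
	unsigned int steps = 0;

	while (n > 1) {
		n = (n + 1) / 2; /* keep the larger half as the worst case */
		steps++;
	}
	return steps;
}
```

For 277 candidates the range shrinks 277, 139, 70, 35, 18, 9, 5, 3, 2, 1, so `bisect_steps(277)` is 9.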
Linus,
I did that last night and realized that I could suspend to disk/ram with 2.6.21-rc6 with CONFIG_NO_HZ unset. I must have done something wrong before.
Thank you,
Jeff.
-
i just got the crash below (with slab debug enabled) on -rc6-git4. I
never saw this one before, and as you can see from the recompile count,
i've rebuilt this tree a fair number of times - and the config didn't
change much.
I promptly re-tried the same bzImage but the crash did not reoccur.
So we've got a memory corruptor of some sort in v2.6.21-to-be. I'm 100%
sure that i never saw this under any v2.6.20 variant or on any prior
kernel. The crash site corresponds to a module-refcount dec:
(gdb) list *0x00000000c013c1f4
0xc013c1f4 is in module_put (kernel/module.c:801).
796
797 void module_put(struct module *module)
798 {
799 if (module) {
800 unsigned int cpu = get_cpu();
801 local_dec(&module->ref[cpu].count);
802 /* Maybe they're waiting for us to drop reference? */
803 if (unlikely(!module_is_live(module)))
804 wake_up_process(module->waiter);
805 put_cpu();
(gdb)
NOTE: i'm still using a bzImage kernel, so there are no true modules in
the kernel. (This also makes it pretty likely that this is not a build
artifact either.)
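
The register values in the oops that follows can be read with a little arithmetic. With slab debugging enabled, freed objects are filled with the poison byte 0x6b, so a pointer of 6b6b6b6b (edx and edi in the oops) suggests it was loaded from memory that had already been freed, and the faulting address 6b6b6ceb is that pointer plus a field offset. The helpers below are a hypothetical sketch for checking those numbers, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Byte written over freed slab objects when slab debugging is enabled. */
#define SLAB_FREE_POISON_BYTE 0x6bu

/* A 32-bit word made entirely of poison bytes (0x6b6b6b6b). */
uint32_t poison_word(void)
{
	return 0x01010101u * SLAB_FREE_POISON_BYTE;
}

/* Non-zero if the value looks like a pointer read from freed memory. */
int looks_like_freed_pointer(uint32_t value)
{
	return value == poison_word();
}

/* Distance of a faulting address from the poisoned base pointer, i.e.
 * the offset of the field that was being accessed. */
uint32_t offset_from_poison(uint32_t fault_addr)
{
	return fault_addr - poison_word();
}
```

Here `offset_from_poison(0x6b6b6ceb)` gives 0x180, which would be the offset of the per-CPU reference count being decremented, assuming the poisoned value really was the `module` pointer.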
(config and full bootlog attached.)
Ingo
---------------------->
BUG: unable to handle kernel paging request at virtual address 6b6b6ceb
printing eip:
c013c1f5
*pde = 0203000c
Oops: 0002 [#1]
SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c013c1f5>] Not tainted VLI
EFLAGS: 00010256 (2.6.21-rc6 #273)
EIP is at module_put+0x19/0x2d
eax: 6b6b6ceb ebx: f72fee2c ecx: c03c9b36 edx: 6b6b6b6b
esi: f7428f54 edi: 6b6b6b6b ebp: f737bf38 esp: f737bf38
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process udev (pid: 1768, ti=f737a000 task=f7488000 task.ti=f737a000)
Stack: f737bf50 c019e832 f749092c 00000010 f72feda4 f746487c f737bf78 c0167c7f
00000000 00000000 f72f6ba4 c2928d48 f72feda4 f746487c f7be81d4 00000000
f737bf80 c0167d3b f737bf98 c01658b2 ...
On Thu, Apr 05, 2007 at 07:50:11PM -0700, Linus Torvalds wrote:
This one breaks resume for me (from STR) on a vaio SZ. Reverting this commit allows resuming again but leaves me with some periodic and unpleasant:
[ 155.232000] BUG: soft lockup detected on CPU#1!
[ 155.232000] [<c0104cf2>] show_trace_log_lvl+0x1a/0x2f
[ 155.232000] [<c0105344>] show_trace+0x12/0x14
[ 155.232000] [<c01053c8>] dump_stack+0x16/0x18
[ 155.232000] [<c0147240>] softlockup_tick+0xa7/0xb6
[ 155.232000] [<c01284d3>] run_local_timers+0x12/0x14
[ 155.232000] [<c012887a>] update_process_times+0x3e/0x63
[ 155.232000] [<c0137656>] tick_sched_timer+0x50/0x95
[ 155.232000] [<c01340e0>] hrtimer_interrupt+0x10b/0x18b
[ 155.232000] [<c01137b7>] smp_apic_timer_interrupt+0x6c/0x7e
[ 155.232000] [<c0104840>] apic_timer_interrupt+0x28/0x30
[ 155.232000] [<c0102318>] cpu_idle+0x1b/0xc7
[ 155.232000] [<c011297a>] start_secondary+0x32b/0x333
[ 155.232000] [<00000000>] run_init_process+0x3fefed10/0x19
[ 155.232000] =======================
FWIW: I hit the same BUG() in -rc5.
full boot+suspend+resume log: http://oioio.altervista.org/linux/kern-2.6.21-rc6.log
.config: http://oioio.altervista.org/linux/config-2.6.21-rc6-1
I'm available to test more patches or to provide other info.
--
-
A couple more bits of info (probably useless but...):
- I noticed the resume problem in -rc6-mm1 but reverting the same patch there doesn't make the laptop resume again
- last known successfully resuming kernel: 2.6.21-rc5-mm3 (and without hitting the BUG() above after resume)
--
-
Strange, strange...
First of all try to boot with clocksource=acpi_pm
(I want to test whether HPET working as clocksource is a problem)
Then try to boot with hpet=disable or unset CONFIG_HPET_TIMER
(This will disable hpet both as clock source and clockevent)
Please also send the contents of /proc/timer_list
(I want to know whether the APIC timer is enabled there or not)
Best regards,
Maxim Levitsky
-
Yes... strange. I can't reproduce the resume breakage anymore, with or without your patch. I still have the soft lockup anyway after resuming. I'll still keep trying, for now just disregard my previous mail.
--
-
For me, suspend to disk works only once (has been the case for all .21-rcs IIRC, but I didn't get around to reporting it so far). There are some threads about an issue like this, which is supposed to be fixed by disabling CONFIG_PCI_MSI, but on my system the problem persists nonetheless.
On the second suspend attempt, the last message I see is "Suspending console(s)"
If I find the time, I'll try to bisect it this weekend.
.config:
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.21-rc6
# Fri Apr 13 23:08:52 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL ...
Does CONFIG_HPET_TIMER=n make any difference?
Does the latest -git work?
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
-
Still no luck with
Linux melchior 2.6.21-rc6-gd791d413-dirty #4 PREEMPT Sat Apr 14 09:34:21 CEST 2007 x86_64 GNU/Linux
Hmm, I just noticed that CONFIG_HPET_TIMER was forced back on after make oldconfig... Is that expected on amd64?
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.21-rc6
# Sat Apr 14 09:33:36 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not ...
Doesn't help.
Maybe interesting: In the init=/bin/bash run, the first suspend try was without swap and thus bailed out. After swapon, the second try already hung, despite not having 'really' suspended at all on the first try. I tried it once more, with swap on the first try and got the same 'second try doesn't work' result.
git-bisect so far:
git-bisect start
# good: [fa285a3d7924a0e3782926e51f16865c5129a2f7] Linux 2.6.20
git-bisect good fa285a3d7924a0e3782926e51f16865c5129a2f7
# bad: [2eb1ae149a28c1b8ade687c5fbab3c37da4c0fba] Linux 2.6.21-rc1
git-bisect bad 2eb1ae149a28c1b8ade687c5fbab3c37da4c0fba
# bad: [574009c1a895aeeb85eaab29c235d75852b09eb8] Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
git-bisect bad 574009c1a895aeeb85eaab29c235d75852b09eb8
# good: [43187902cbfafe73ede0144166b741fb0f7d04e1] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
git-bisect good 43187902cbfafe73ede0144166b741fb0f7d04e1
# good: [beda9f3a13bbb22cde92a45f230a02ef2afef6a9] kbuild: more Makefile cleanups
git-bisect good beda9f3a13bbb22cde92a45f230a02ef2afef6a9
# bad: [7edc136ab688f751037a86e8a051151d7962d33f] Char: isicom, support higher rates
git-bisect bad 7edc136ab688f751037a86e8a051151d7962d33f
--
Tobias
PGP: http://9ac7e0bc.uguu.de
This mail is made from 100% recycled bits.
-
Yes it is (on i386 you can disable it).
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
-
bisect results:
git-bisect start
# good: [fa285a3d7924a0e3782926e51f16865c5129a2f7] Linux 2.6.20
git-bisect good fa285a3d7924a0e3782926e51f16865c5129a2f7
# bad: [2eb1ae149a28c1b8ade687c5fbab3c37da4c0fba] Linux 2.6.21-rc1
git-bisect bad 2eb1ae149a28c1b8ade687c5fbab3c37da4c0fba
# bad: [574009c1a895aeeb85eaab29c235d75852b09eb8] Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
git-bisect bad 574009c1a895aeeb85eaab29c235d75852b09eb8
# good: [43187902cbfafe73ede0144166b741fb0f7d04e1] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
git-bisect good 43187902cbfafe73ede0144166b741fb0f7d04e1
# good: [beda9f3a13bbb22cde92a45f230a02ef2afef6a9] kbuild: more Makefile cleanups
git-bisect good beda9f3a13bbb22cde92a45f230a02ef2afef6a9
# bad: [7edc136ab688f751037a86e8a051151d7962d33f] Char: isicom, support higher rates
git-bisect bad 7edc136ab688f751037a86e8a051151d7962d33f
# good: [6267276f3fdda9ad0d5ca451bdcbdf42b802d64b] optional ZONE_DMA: deal with cases of ZONE_DMA meaning the first zone
git-bisect good 6267276f3fdda9ad0d5ca451bdcbdf42b802d64b
# bad: [b4ac91a0eac36f347a509afda07e4305e931de61] uml: chan_user.h formatting fixes
git-bisect bad b4ac91a0eac36f347a509afda07e4305e931de61
# bad: [bf0059b23fd2f0b304f647d87fad0aa626ecf0c0] M68KNOMMU: user ARRAY_SIZE macro when appropriate
git-bisect bad bf0059b23fd2f0b304f647d87fad0aa626ecf0c0
# good: [c1725f2af89f1eda3cb9007290971b55084569a4] ARM26: Use ARRAY_SIZE macro when appropriate
git-bisect good c1725f2af89f1eda3cb9007290971b55084569a4
# bad: [9b87ed790714bd3a8d492feb24f6c48f8bb59c3a] m32r: fix do_page_fault and update_mmu_cache
git-bisect bad 9b87ed790714bd3a8d492feb24f6c48f8bb59c3a
# bad: [d12c610e08022a1b84d6bd4412c189214d32e713] swsusp-change-code-ordering-in-userc-sanity
git-bisect bad d12c610e08022a1b84d6bd4412c189214d32e713
# bad: [ed746e3b18f4df18afa3763155972c5835f284c5] swsusp: Change code ordering in disk.c
git-bisect bad ed746e3b18f4df18afa3763155972c5835f284c5
# good: ...
Doesn't apply cleanly against -rc6, but fixes the problem when
reverted from -rc1.
Index: linux-2.6.21-rc1/kernel/power/disk.c
===================================================================
--- linux-2.6.21-rc1.orig/kernel/power/disk.c 2007-04-14 14:16:59.000000000 +0200
+++ linux-2.6.21-rc1/kernel/power/disk.c 2007-04-14 14:17:03.000000000 +0200
@@ -87,24 +87,52 @@
}
}
-static void unprepare_processes(void)
-{
- thaw_processes();
- pm_restore_console();
-}
-
static int prepare_processes(void)
{
int error = 0;
pm_prepare_console();
+
+ error = disable_nonboot_cpus();
+ if (error)
+ goto enable_cpus;
+
if (freeze_processes()) {
error = -EBUSY;
- unprepare_processes();
+ goto thaw;
}
+
+ if (pm_disk_mode == PM_DISK_TESTPROC) {
+ printk("swsusp debug: Waiting for 5 seconds.\n");
+ mdelay(5000);
+ goto thaw;
+ }
+
+ error = platform_prepare();
+ if (error)
+ goto thaw;
+
+ /* Free memory before shutting down devices. */
+ if (!(error = swsusp_shrink_memory()))
+ return 0;
+
+ platform_finish();
+ thaw:
+ thaw_processes();
+ enable_cpus:
+ enable_nonboot_cpus();
+ pm_restore_console();
return error;
}
+static void unprepare_processes(void)
+{
+ platform_finish();
+ thaw_processes();
+ enable_nonboot_cpus();
+ pm_restore_console();
+}
+
/**
* pm_suspend_disk - The granpappy of hibernation power management.
*
@@ -122,45 +150,29 @@
if (error)
return error;
- if (pm_disk_mode == PM_DISK_TESTPROC) {
- printk("swsusp debug: Waiting for 5 seconds.\n");
- mdelay(5000);
- goto Thaw;
- }
- /* Free memory before shutting down devices. */
- error = swsusp_shrink_memory();
- if (error)
- goto Thaw;
-
- error = platform_prepare();
- if (error)
- goto Thaw;
+ if (pm_disk_mode == PM_DISK_TESTPROC)
+ return 0;
suspend_console();
error = device_suspend(PMSG_FREEZE);
if (error) {
- printk(KERN_ERR "PM: Some devices failed to suspend\n");
- goto ...
Now, this was already reported in http://lkml.org/lkml/2007/3/16/126 and I
even flagged that message in my local folder, but apparently forgot to follow
up on it... *sigh*
--
Tobias PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits.
-
Unless I misunderstood something, all of the problems Maxim described in
this email are fixed for him in -rc6.
But it's quite possible that you are running into a different issue
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
-
Yes, it's likely. Tobias, I'm unable to reproduce the problem with your .config, but my hardware is certainly different. Which suspend mode do you use? If that's "platform", can you try to use "shutdown" or "reboot" and see if that helps? Rafael -- If you don't have the time to read, you don't have the time or the tools to write. - Stephen King -
Thanks.
Now, I suspect the problem is somehow related to the hardware, so it would help
a lot if we could identify the piece of hardware (or driver) involved.
AFAICT, your system is a non-SMP one, so we can rule out
disable/enable_nonboot_cpus(). To confirm that the problem is related to
platform_finish(), can you please apply the appended debug patch and
see if the suspend in the 'platform' mode works with it?
Also, would that be feasible for you to use 'shutdown' as a workaround in case
the source of the problem is difficult to find and/or fix?
Rafael
---
kernel/power/disk.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux-2.6.21-rc6/kernel/power/disk.c
===================================================================
--- linux-2.6.21-rc6.orig/kernel/power/disk.c
+++ linux-2.6.21-rc6/kernel/power/disk.c
@@ -170,8 +170,8 @@ int pm_suspend_disk(void)
if (in_suspend) {
enable_nonboot_cpus();
- platform_finish();
device_resume();
+ platform_finish();
resume_console();
pr_debug("PM: writing image.\n");
error = swsusp_write();
@@ -189,8 +189,8 @@ int pm_suspend_disk(void)
Enable_cpus:
enable_nonboot_cpus();
Resume_devices:
- platform_finish();
device_resume();
+ platform_finish();
resume_console();
Thaw:
unprepare_processes();
-
One person reporting a regression against a -rc kernel can mean
hundreds or thousands of people who will run into the same issue after
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
-
Well, in this particular case it is not very likely to happen. I have three x86_64 machines here with totally different chipsets/devices on which I'm not seeing anything like that and I believe we'd have more reports before if that were a common issue. That said, I'm not going to ignore it. I'll do my best to debug and fix it, if Tobias helps me. :-) Greetings, Rafael -
Yes, it's an Asus M2N-SLI-Deluxe Mainboard with an Athlon64 3200+
--
Tobias PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits.
-
Well, I thought it would, but it also would break some other people's systems.
That's the _real_ problem. Let's see if we can learn more.
Can you please revert it for now, apply the appended one and try to
suspend/resume twice in the 'platform' mode (it may or may not work)?
Rafael
---
kernel/power/disk.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
Index: linux-2.6.21-rc6/kernel/power/disk.c
===================================================================
--- linux-2.6.21-rc6.orig/kernel/power/disk.c
+++ linux-2.6.21-rc6/kernel/power/disk.c
@@ -267,12 +267,15 @@ static int software_resume(void)
error = swsusp_read();
if (error) {
swsusp_free();
- platform_finish();
goto Thaw;
}
pr_debug("PM: Preparing devices for restore.\n");
+ error = platform_prepare();
+ if (error)
+ goto Thaw;
+
suspend_console();
error = device_suspend(PMSG_PRETHAW);
if (error)
@@ -285,6 +288,7 @@ static int software_resume(void)
enable_nonboot_cpus();
Free:
swsusp_free();
+ platform_finish();
device_resume();
resume_console();
Thaw:
-
Ok. The patch doesn't apply cleanly to 2.6.21-rc6:
|patching file kernel/power/disk.c
|Hunk #1 FAILED at 267.
|Hunk #2 succeeded at 265 (offset -23 lines).
|1 out of 2 hunks FAILED -- saving rejects to file
|kernel/power/disk.c.rej
wiggle helps; it seems the first part of Hunk #1 is already applied in
2.6.21-rc6.
With CONFIG_PM_DEBUG=y and CONFIG_DISABLE_CONSOLE_SUSPEND=y I see that the
second suspend hangs at "i8042 i8042: EARLY resume".
This is kinda interesting because I'm normally using a USB keyboard and sure
enough, if I hook up a normal keyboard and disable USB legacy support in the
BIOS, then suspend to disk works multiple times.
I'd still rather like to use my USB keyboard though. ;)
--
Tobias PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits.
-
And I can now confirm that unpatched 2.6.21-rc6 works fine as long as USB
legacy support is disabled (however without legacy support I can't use the
USB keyboard to control grub).
--
Tobias PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits.
-
Well, I think that when you're using the USB keyboard and the USB legacy
support, the i8042 driver thinks it has a keyboard to handle and tries to
handle it during the suspend, which fails. I don't know why it fails during
the second suspend, though.
I think using the 'shutdown' mode of suspend would be better. There's little
point in using 'platform' on desktop systems anyway.
Frankly, I don't know what to do about it. If we move platform_finish() after
device_resume(), some systems may be broken and I think there are more such
systems than there are systems that set USB legacy support in the BIOS and
have no PS/2 keyboards attached.
Pavel, what do you think?
Rafael
-
This is weird as i8042 does not use suspend_late/resume_early hooks and so it
is impossible for it to hang there. None of input drivers use these
I would say that every box that does not use PS/2 keyboard does this. IOW
every box with USB keyboard has legacy emulation turned on so quite a few of
them...
--
Dmitry
-
Yes. Tobias, can you please post the dmesg output from after a successful Quite some people I know use USB keyboards with notebooks, but in these cases the PS/2 keyboard is still attached (except for notebooks in which the built-in I have such a machine nearby, so I'll see if I can reproduce the problem. Greetings, Rafael -
Here you go:
[ 0.000000] Linux version 2.6.21-rc6 (ranma@melchior) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #16 PREEMPT Sun Apr 15 09:39:32 CEST 2007
[ 0.000000] Command line: root=/dev/sda5 resume=/dev/sda6 vga=6 apic=verbose ro
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
[ 0.000000] BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000003fee0000 (usable)
[ 0.000000] BIOS-e820: 000000003fee0000 - 000000003fee3000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000003fee3000 - 000000003fef0000 (ACPI data)
[ 0.000000] BIOS-e820: 000000003fef0000 - 000000003ff00000 (reserved)
[ 0.000000] BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
[ 0.000000] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[ 0.000000] Entering add_active_range(0, 256, 261856) 1 entries of 256 used
[ 0.000000] end_pfn_map = 1048576
[ 0.000000] DMI 2.4 present.
[ 0.000000] ACPI: RSDP 000F7B80, 0024 (r2 Nvidia)
[ 0.000000] ACPI: XSDT 3FEE30C0, 004C (r1 Nvidia ASUSACPI 42302E31 AWRD 0)
[ 0.000000] ACPI: FACP 3FEEC540, 00F4 (r3 Nvidia ASUSACPI 42302E31 AWRD 0)
[ 0.000000] ACPI: DSDT 3FEE3240, 92AD (r1 NVIDIA AWRDACPI 1000 MSFT 3000000)
[ 0.000000] ACPI: FACS 3FEE0000, 0040
[ 0.000000] ACPI: SSDT 3FEEC740, 00F4 (r1 PTLTD POWERNOW 1 LTP 1)
[ 0.000000] ACPI: HPET 3FEEC880, 0038 (r1 Nvidia ASUSACPI 42302E31 AWRD 98)
[ 0.000000] ACPI: MCFG 3FEEC900, 003C (r1 Nvidia ASUSACPI 42302E31 AWRD 0)
[ 0.000000] ACPI: APIC 3FEEC680, 007C (r1 Nvidia ASUSACPI 42302E31 AWRD 0)
[ 0.000000] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[ 0.000000] Entering add_active_range(0, 256, 261856) 1 ...
Thanks.
[--snip--]
Hmm, it looks like i8042 is the last thing on the dpm_off_irq list. Still,
if the ACPI resume fails, the next messages may not make it to the console
(it's not very probable, though).
I've tried to reproduce your problem on another box on which I have no PS/2
keyboard (USB keyboard/mouse only) and the USB legacy support set, but I can't.
There must be something very special in your configuration.
Have you tried the patch that I posted some time ago (appended again for
convenience)?
Rafael
drivers/input/serio/i8042.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: linux-2.6.21-rc6/drivers/input/serio/i8042.c
===================================================================
--- linux-2.6.21-rc6.orig/drivers/input/serio/i8042.c 2007-04-07 12:15:19.000000000 +0200
+++ linux-2.6.21-rc6/drivers/input/serio/i8042.c 2007-04-15 18:30:01.000000000 +0200
@@ -846,7 +846,8 @@ static long i8042_panic_blink(long count
static int i8042_suspend(struct platform_device *dev, pm_message_t state)
{
if (dev->dev.power.power_state.event != state.event) {
- if (state.event == PM_EVENT_SUSPEND)
+ if (state.event == PM_EVENT_SUSPEND
+ || state.event == PM_EVENT_PRETHAW)
i8042_controller_reset();
dev->dev.power.power_state = state;
-
And NVidia southbridge, so OHCI not UHCI (plus EHCI) ... one experiment would be to disable the EHCI (high speed USB) support in BIOS, to make for a simpler hardware configuration, and see if that makes BIOS happier. (Or better, just take EHCI out of your Linux config.) Likewise, taking the 8042 drivers out of Linux. I wouldn't be surprised if those factors didn't matter, but it'd be good The "legacy" support in at least some cases involves BIOS having a small USB stack -- enough to handle a keyboard or mouse in "boot mode" (plus sometimes a USB disk or CDROM) -- and poking the i8042 chip to act as if *IT* received the data bytes that really came over USB. I sure don't know the ins-and-outs of such schemes (ISTR there are others), but my guess is that either the 8042 or OHCI got confused, at least in conjunction with the lowlevel magic ACPI was doing. What I'm curious about is exactly why the patch matters. What ACPI magic is being invoked to confuse, or unconfuse, those controllers? - Dave -
Well, my theory is the following:
Without the patch, platform_finish() runs before the i8042's .resume() which is
done as though a real keyboard were present, but the ACPI magic is not done
and this confuses the heck out of the controller. Still, it doesn't go mad at
this point just yet (it probably isn't fully functional either, although we
don't see that, because it's not really used), but next, during the subsequent
suspend, it gets poked while device_power_up() is running and goes belly
I think the patch helps, because it makes the ACPI magic be done while the
i8042's .resume() is being executed.
Which makes me think the following patch might help:
drivers/input/serio/i8042.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: linux-2.6.21-rc6/drivers/input/serio/i8042.c
===================================================================
--- linux-2.6.21-rc6.orig/drivers/input/serio/i8042.c 2007-04-07 12:15:19.000000000 +0200
+++ linux-2.6.21-rc6/drivers/input/serio/i8042.c 2007-04-15 18:30:01.000000000 +0200
@@ -846,7 +846,8 @@ static long i8042_panic_blink(long count
static int i8042_suspend(struct platform_device *dev, pm_message_t state)
{
if (dev->dev.power.power_state.event != state.event) {
- if (state.event == PM_EVENT_SUSPEND)
+ if (state.event == PM_EVENT_SUSPEND
+ || state.event == PM_EVENT_PRETHAW)
i8042_controller_reset();
dev->dev.power.power_state = state;
-
Yeah, lack of PRETHAW support could be an issue. As you may recall, it was added because otherwise statically linked USB host controllers came up under the mistaken belief that they were getting a real resume event rather than a restart-after-power-off ... and there needed to be a way to force a hard reset. Seems like a similar issue here. -
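For readers following the i8042 patch discussed above, the changed condition can be modelled outside the kernel. This is only a user-space sketch: the enum values and the function name i8042_needs_reset() are stand-ins invented for illustration, not the kernel's definitions, and the real i8042_suspend() also records the new state in dev->dev.power.power_state.

```c
#include <stdbool.h>

/* Stand-in event codes for this sketch only (assumption); the real
 * PM_EVENT_* values live in the kernel's pm headers. */
enum pm_event_sketch {
	SKETCH_EVENT_ON,
	SKETCH_EVENT_FREEZE,
	SKETCH_EVENT_SUSPEND,
	SKETCH_EVENT_PRETHAW,
};

/* Hypothetical helper mirroring the condition in the proposed patch:
 * reset the controller only when the state actually changes and the
 * new event is SUSPEND or PRETHAW. */
bool i8042_needs_reset(enum pm_event_sketch current_event,
		       enum pm_event_sketch next_event)
{
	if (current_event == next_event)
		return false;	/* no state change, nothing to do */
	return next_event == SKETCH_EVENT_SUSPEND ||
	       next_event == SKETCH_EVENT_PRETHAW;
}
```

Under this model a FREEZE transition still leaves the controller alone, while the PRETHAW transition issued before the image is restored now forces a reset, which is the hard-reset behaviour described for the USB host controllers.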
--
Tobias PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits.
-
Well, this means i8042 can be ruled out, so the problem probably is related to the ACPI resume which makes it _much_ more difficult to debug. Can you compile the ACPI drivers: processor, thermal, fan, battery, etc. as modules, boot the kernel with init=/bin/bash and see if the problem is still present (please keep CONFIG_SERIO_I8042 unset just in case)? Rafael -
I first tried it with acpi+cpufreq completely disabled (works).
Then I tried it with acpi enabled, but everything as modules and those
not loaded (init=/bin/bash, hangs at second suspend).
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.21-rc7
# Sun Apr 22 09:26:07 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
# ...
Have you tried with ACPI and without cpufreq? Rafael -
Yes, the second one was with ACPI enabled and cpufreq disabled
(CONFIG_X86_ACPI_CPUFREQ is not set).
--
Tobias PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits.
-
This email lists some known regressions in Linus' tree compared to 2.6.20.
If you find your name in the Cc header, you are either the submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch of
yours caused a breakage, or I'm considering you in any other way possibly
involved with one or more of these issues.
Due to the huge number of recipients, please trim the Cc when answering.

Subject    : ali_pata: boot from CD fails
References : http://lkml.org/lkml/2007/3/31/160
Submitter  : Stephen Clark <Stephen.Clark@seclark.us>
Status     : unknown

Subject    : kernels fail to boot with drives on ATIIXP controller
             (ACPI/IRQ related)
References : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229621
             http://lkml.org/lkml/2007/3/4/257
Submitter  : Michal Jaegermann <michal@ellpspace.math.ualberta.ca>
Status     : unknown

Subject    : boot failure: rtl8139: exception in interrupt routine
References : http://lkml.org/lkml/2007/3/31/160
Submitter  : Stephen Clark <Stephen.Clark@seclark.us>
Status     : unknown

Subject    : laptops with e1000: lockups
References : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229603
Submitter  : Dave Jones <davej@redhat.com>
Handled-By : Jesse Brandeburg <jesse.brandeburg@intel.com>
Status     : problem is being debugged

Subject    : forcedeth: interface hangs under load
References : http://lkml.org/lkml/2007/4/3/39
Submitter  : Ingo Molnar <mingo@elte.hu>
Handled-By : Ingo Molnar <mingo@elte.hu>
             Ayaz Abdulla <aabdulla@nvidia.com>
Status     : problem is being debugged
-
This email lists some known regressions in Linus' tree compared to 2.6.20.
If you find your name in the Cc header, you are either the submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch of
yours caused a breakage, or I'm considering you in any other way possibly
involved with one or more of these issues.
Due to the huge number of recipients, please trim the Cc when answering.

Subject    : suspend to disk works only once
References : http://lkml.org/lkml/2007/4/13/240
Submitter  : Tobias Diedrich <ranma+kernel@tdiedrich.de>
Status     : unknown

Subject    : ThinkPad X60: resume no longer works (PCI related?)
             workaround: booting with "hpet=disable"
References : http://lkml.org/lkml/2007/3/13/3
Submitter  : Dave Jones <davej@redhat.com>
             Jeremy Fitzhardinge <jeremy@goop.org>
Caused-By  : PCI merge
             commit 78149df6d565c36675463352d0bfe0000b02b7a7
Handled-By : Eric W. Biederman <ebiederm@xmission.com>
             Rafael J. Wysocki <rjw@sisk.pl>
Status     : problem is being debugged

Subject    : Suspend to RAM doesn't work anymore (ACPI?)
References : http://lkml.org/lkml/2007/3/19/128
             http://bugzilla.kernel.org/show_bug.cgi?id=8247
Submitter  : Tobias Doerffel <tobias.doerffel@gmail.com>
Handled-By : Rafael J. Wysocki <rjw@sisk.pl>
             Len Brown <len.brown@intel.com>
Status     : problem is being debugged

Subject    : resume from RAM corrupts vesafb console
References : http://lkml.org/lkml/2007/3/26/76
Submitter  : Marcus Better <marcus@better.se>
Handled-By : Pavel Machek <pavel@ucw.cz>
Status     : problem is being debugged

Subject    : suspend to disk hangs (CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/3/25/217
Submitter  : Jeff Chua <jeff.chua.linux@gmail.com>
Status     : unknown
-
Hi Marcus,
A screen with blinking green blocks implies that your display is in text
mode, not in graphics mode. I don't know what options you are using, but
have you tried using:
acpi_sleep=s3_mode
If the above does not work, also try
acpi_sleep=s3_bios,s3_mode
If it is still not working, you can add this to your suspend script:
vbetool vbemode set <VESA mode ID>
where VESA mode ID = "vga=" value - 512 (0x200)
Tony
PS: If your BIOS setup has an option to re-POST the graphics card on resume,
that is a big help.
Tony
-
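The rule quoted above for the VESA mode ID is a plain subtraction, so it is easy to work out by hand or in a suspend helper. A minimal C sketch follows; the function name vga_param_to_vesa_mode() and the macro are made up for illustration and are not part of vbetool or the kernel.

```c
#include <assert.h>

/* Offset between the "vga=" boot value and the VESA mode ID, as given
 * in the message above: 512 (0x200). */
#define VGA_PARAM_VESA_OFFSET 0x200u

/* Hypothetical helper (assumption, not an existing API): derive the
 * mode ID to pass to "vbetool vbemode set" from the "vga=" value. */
unsigned int vga_param_to_vesa_mode(unsigned int vga_param)
{
	/* The rule only makes sense for values at or above the offset. */
	assert(vga_param >= VGA_PARAM_VESA_OFFSET);
	return vga_param - VGA_PARAM_VESA_OFFSET;
}
```

For example, booting with vga=791 (0x317) gives 791 - 512 = 279 (0x117), so the suspend script would run vbetool vbemode set 279.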
Will try, but I'm using "s2ram -f -a3" which should mean precisely the above
IIUC.
Marcus
Just for clarification, do you suspend from VESA framebuffer console or from VGA text console? If from the latter, that's actually worse from the user's point of view, but I can modify vgacon so that it saves its Okay. Tony -
Have you tried other combinations?
s2ram -m -p -f
s2ram -s -p -f
Tony
-
Yes, I tried these slightly different combinations:
s2ram -f -a3 -s: Works! The screen becomes green but is restored quickly. It
prints the following messages:
Allocated buffer at 0x11000 (base is 0x0)
ES: 0x1100 EBX: 0x0000
Save video state failed
Calling restore_state_from
Function not supported?
Restore video state failed
Switching back to vt1
s2ram -f -a3 -p: Screen goes green and then blank. Everything hangs, doesn't
react to keyboard input.
s2ram -f -a3 -m: Works!
(Tested with 2.6.21-rc7.)
Thanks,
Marcus
Thanks. Should we consider this regression resolved? There is really nothing much vesafb can do to restore its previous state, except through the use of userland tools. Tony -
Uhuh. This is second report of this strangeness. On thinkpad r60, -a3 used to work, and now it needs more options. Can you locate patch causing this? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Already fixed in rc5-git9, see http://bugzilla.kernel.org/show_bug.cgi?id=8247 Tobias
On Sat, Apr 14, 2007 at 02:38:08AM +0200, Adrian Bunk wrote:
> Subject    : ThinkPad X60: resume no longer works (PCI related?)
>              workaround: booting with "hpet=disable"
> References : http://lkml.org/lkml/2007/3/13/3
> Submitter  : Dave Jones <davej@redhat.com>
>              Jeremy Fitzhardinge <jeremy@goop.org>
> Caused-By  : PCI merge
>              commit 78149df6d565c36675463352d0bfe0000b02b7a7
> Handled-By : Eric W. Biederman <ebiederm@xmission.com>
>              Rafael J. Wysocki <rjw@sisk.pl>
> Status     : problem is being debugged
I'm at a loss on this one. git bisect was non-conclusive.
I even tried beating up on Eric's console-over-usb to try and get
more useful info, but I failed miserably.
Dave
--
http://www.codemonkey.org.uk
-
This email lists some known regressions in Linus' tree compared to 2.6.20.
If you find your name in the Cc header, you are either the submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch of
yours caused a breakage, or I'm considering you in any other way possibly
involved with one or more of these issues.
Due to the huge number of recipients, please trim the Cc when answering.

Subject    : snd_hda_intel doesn't work with ASUS M2V mainboard
References : http://bugzilla.kernel.org/show_bug.cgi?id=8273
Submitter  : Hans-Georg Rist <hg.rist@web.de>
Status     : unknown

Subject    : snd_intel8x0: divide error: 0000
References : http://lkml.org/lkml/2007/3/5/252
Submitter  : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Status     : unknown

Subject    : hal daemon crashes after pulling a USB serial device
References : http://www.opensubscriber.com/message/linux-usb-devel@lists.sourceforge.net/6369800.html
Submitter  : Andi Kleen <ak@suse.de>
Handled-By : Oliver Neukum <oneukum@suse.de>
Status     : problem is being debugged

Subject    : USB: iPod doesn't work (CONFIG_USB_SUSPEND)
References : http://lkml.org/lkml/2007/3/21/320
Submitter  : Tino Keitel <tino.keitel@gmx.de>
Caused-By  : Marcelo Tosatti <marcelo@kvack.org>
             commit 1d619f128ba911cd3e6d6ad3475f146eb92f5c27
Handled-By : Oliver Neukum <oneukum@suse.de>
Status     : problem is being debugged

Subject    : USB: Oops when changing DVB-T adapter
References : http://lkml.org/lkml/2007/3/9/212
Submitter  : CIJOML <cijoml@volny.cz>
Handled-By : Markus Rechberger <markus.rechberger@amd.com>
Patch      : http://lkml.org/lkml/2007/4/5/154
Status     : patches available
-
