Oh well.. I'm not very proud of this, because quite frankly, -rc2 has way more changes than I really like. And yeah, it's largely my fault, because I simply missed a V4L/DVB merge that came in before the merge window closed, but since I didn't notice it, it didn't make -rc1, and as such it got merged late and is in -rc2 instead.

But because I'll flail around wildly and rather blame anything else than my own incompetence, I'll just claim that all the other kernel developers have been irresponsible, and caused -rc2 to be bigger than needed. In some areas (you know who you are) it may even be true..

Apart from the V4L/DVB merge, we've got a late PARISC update, and a number of driver updates (ata, networking, usb). Along with the normal smattering of random stuff (core networking, selinux, infiniband, agp, mips, arm).

Anyway, I really hope the thing starts calming down now, and everybody should take a hard look at the regressions lists that Adrian has started sending out. We already fixed some of them, but there is more to go..

Thanks,
Linus
-
I got this compile error:
CC arch/i386/kernel/io_apic.o
arch/i386/kernel/io_apic.c: In function 'setup_IO_APIC_irqs':
arch/i386/kernel/io_apic.c:1357: error: 'struct irq_desc' has no member named 'affinity'
arch/i386/kernel/io_apic.c: In function 'io_apic_set_pci_routing':
arch/i386/kernel/io_apic.c:2878: error: 'struct irq_desc' has no member named 'affinity'
make[1]: *** [arch/i386/kernel/io_apic.o] Error 1
make: *** [arch/i386/kernel] Error 2
didn't happen with rc1
- David Brown
-
Ingo's patch correcting a bug in the SMT scheduler (http://lkml.org/lkml/2007/2/26/103) seems to have been missed. It is quite important when using dynticks. -- Damien Wyart -
Hi Linus,
rc2 fails to build on my thinkpad t43:
CC arch/i386/kernel/io_apic.o
arch/i386/kernel/io_apic.c: In function 'setup_IO_APIC_irqs':
arch/i386/kernel/io_apic.c:1357: error: 'struct irq_desc' has no member
named 'affinity'
arch/i386/kernel/io_apic.c: In function 'io_apic_set_pci_routing':
arch/i386/kernel/io_apic.c:2878: error: 'struct irq_desc' has no member
named 'affinity'
make[1]: *** [arch/i386/kernel/io_apic.o] Error 1
make: *** [arch/i386/kernel] Error 2
The problem is caused by affinity being within #ifdef SMP in struct
irq_desc in irq.h:
#ifdef CONFIG_SMP
cpumask_t affinity;
unsigned int cpu;
#endif
I don't know whether the whole functions or just a single line should be
placed under #ifdef CONFIG_SMP. Eric (in CC) will know better than I do.
My config is attached.
Brice

Yes. I goofed, and missed that stupid case. The offending lines should just die. Patch already sent to Linus. Eric -
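For readers who want to see the failure mode in isolation, here is a minimal sketch, assuming a made-up struct (demo_irq_desc) rather than the real struct irq_desc. Eric's actual fix was to delete the offending lines; the sketch only shows the alternative Brice raised, which is to guard the access with the same #ifdef that guards the member. Every demo_ name is hypothetical.

```c
/* Leave CONFIG_SMP undefined to mimic the uniprocessor config that failed to build. */
/* #define CONFIG_SMP 1 */

/* Hypothetical stand-in for struct irq_desc; only the #ifdef layout matters. */
struct demo_irq_desc {
    unsigned int status;
#ifdef CONFIG_SMP
    unsigned long affinity;   /* exists only in SMP builds */
    unsigned int cpu;
#endif
};

/*
 * An unguarded "desc->affinity = mask;" would fail on UP with
 * "has no member named 'affinity'". Guarding the access with the
 * same #ifdef as the member keeps both configurations compiling.
 */
void demo_set_affinity(struct demo_irq_desc *desc, unsigned long mask)
{
#ifdef CONFIG_SMP
    desc->affinity = mask;
#else
    (void)desc;   /* nothing to record on UP */
    (void)mask;
#endif
}

/* Reports which variant was compiled: 1 for SMP, 0 for UP. */
int demo_built_for_smp(void)
{
#ifdef CONFIG_SMP
    return 1;
#else
    return 0;
#endif
}
```

Building once as-is and once with CONFIG_SMP defined exercises both branches; whether a guard or outright removal is right depends on whether the assignment is needed at all, which is Eric's call in the real code.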
Could the patch be posted? or could I see a git commit so I can get it myself? Thanks, David Brown -
I'm attaching it below. It hit the git commits mailing list just a few minutes ago. --- ~Randy
I got this warning so far :)
drivers/video/Kconfig:1622:warning: 'select' used by config symbol 'FB_PS3' refer to undefined symbol 'PS3_PS3AV'
Regards, Gabriel -
This email lists some known regressions in 2.6.21-rc2 compared to 2.6.20
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.

Subject    : sparc64 compile error due to GENERIC_ISA_DMA removal
References : http://bugzilla.kernel.org/show_bug.cgi?id=8097
Submitter  : Horst H. von Brand <vonbrand@inf.utfsm.cl>
Caused-By  : David S. Miller <davem@sunset.davemloft.net>
             commit 1b51d3a08b6c80a1e47d4c579c41abbe56cd3c44
Status     : unknown

Subject    : mmc reader no longer works
References : http://lkml.org/lkml/2007/2/27/91
Submitter  : Pavel Machek <pavel@ucw.cz>
Caused-By  : Oliver Neukum <oneukum@suse.de>
Status     : problem is being debugged

Subject    : usb-serial broken
             (ftdi serial device shows up as ttyUSB140 instead of ttyUSB0)
Submitter  : Craig Schlenter <craig@codefountain.com>
Caused-By  : Oliver Neukum <oneukum@suse.de>
             commit 34ef50e5b1f96c2d8c0f3d28b7d407743806256c
Handled-By : Oliver Neukum <oneukum@suse.de>
Status     : patch available

Subject    : Oops in rtc_cmos
References : http://lkml.org/lkml/2007/3/4/112
             http://lkml.org/lkml/2007/2/18/172
Submitter  : Paul Rolland <rol@as2917.net>
             Rafael J. Wysocki <rjw@sisk.pl>
Caused-By  : David Brownell <david-b@pacbell.net>
             commit 7be2c7c96aff2871240d61fef508c41176c688b5
Patch      : http://lkml.org/lkml/2007/2/23/184
Status     : patch available
-
Patch is queued up in my tree and will go to Linus in a few days. But I think there is another usb-serial patch that Oliver needs to send me to fix another problem with the usb-serial core... thanks, greg k-h -
From: Adrian Bunk <bunk@stusta.de>
Fixed in current GIT.
commit 74bd7d093b8e87f35eaf3b14459b96a0e20d1d10
Author: David S. Miller <davem@sunset.davemloft.net>
Date: Wed Feb 28 13:09:34 2007 -0800
[SPARC64]: Fix parport_pc build.
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Horst's problem is with the floppy driver and
claim_dma_lock/release_dma_lock in include/asm-sparc64/dma.h .
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
-
From: Adrian Bunk <bunk@stusta.de>
Here is the fix I will send to Linus for this, thanks:
commit 08414aa2516da65ae7a522c6834b8ea576f38c4b
Author: David S. Miller <davem@sunset.davemloft.net>
Date: Sun Mar 4 20:36:18 2007 -0800
[SPARC64]: Fix floppy build failure.
Just define a local {claim,release}_dma_lock() implementation
for the floppy driver to use so we don't need to define and
export to modules the silly dma_spin_lock.
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/include/asm-sparc64/dma.h b/include/asm-sparc64/dma.h
index 1bf4f7a..a9fd061 100644
--- a/include/asm-sparc64/dma.h
+++ b/include/asm-sparc64/dma.h
@@ -15,17 +15,6 @@
#include <asm/delay.h>
#include <asm/oplib.h>
-extern spinlock_t dma_spin_lock;
-
-#define claim_dma_lock() \
-({ unsigned long flags; \
- spin_lock_irqsave(&dma_spin_lock, flags); \
- flags; \
-})
-
-#define release_dma_lock(__flags) \
- spin_unlock_irqrestore(&dma_spin_lock, __flags);
-
/* These are irrelevant for Sparc DMA, but we leave it in so that
* things can compile.
*/
diff --git a/include/asm-sparc64/floppy.h b/include/asm-sparc64/floppy.h
index dbe033e..331013a 100644
--- a/include/asm-sparc64/floppy.h
+++ b/include/asm-sparc64/floppy.h
@@ -854,4 +854,15 @@ static unsigned long __init sun_floppy_init(void)
#define EXTRA_FLOPPY_PARAMS
+static DEFINE_SPINLOCK(dma_spin_lock);
+
+#define claim_dma_lock() \
+({ unsigned long flags; \
+ spin_lock_irqsave(&dma_spin_lock, flags); \
+ flags; \
+})
+
+#define release_dma_lock(__flags) \
+ spin_unlock_irqrestore(&dma_spin_lock, __flags);
+
#endif /* !(__ASM_SPARC64_FLOPPY_H) */
-
From: Adrian Bunk <bunk@stusta.de>
Thanks for the clarification, I was thinking of the parport problem reported by Meelis Roos. I'll look into this. -
This email lists some known regressions in 2.6.21-rc2 compared to 2.6.20
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.

Subject    : soft lockup detected on CPU#0
References : http://lkml.org/lkml/2007/3/3/152
Submitter  : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status     : unknown

Subject    : dynticks makes ksoftirqd1 use unreasonable amount of cpu time
References : http://bugzilla.kernel.org/show_bug.cgi?id=8100
Submitter  : Emil Karlson <jkarlson@cc.hut.fi>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status     : problem is being debugged

Subject    : ThinkPad T60: system doesn't come out of suspend to RAM
             (CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/2/22/391
Submitter  : Michael S. Tsirkin <mst@mellanox.co.il>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
             Ingo Molnar <mingo@elte.hu>
Status     : unknown

Subject    : macbook pro suspend to ram broken (clockevents)
References : http://lkml.org/lkml/2007/3/4/110
Submitter  : Soeren Sonnenburg <kernel@nn7.de>
Caused-By  : Thomas Gleixner <tglx@linutronix.de>
             commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Status     : unknown

Subject    : i386: no boot with nmi_watchdog=1 (clockevents)
References : http://lkml.org/lkml/2007/2/21/208
Submitter  : Daniel Walker <dwalker@mvista.com>
Caused-By  : Thomas Gleixner <tglx@linutronix.de>
             commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status     : problem is being debugged
-
FYI, this is not a "wont boot" problem, this should be a "NMI watchdog does not work" problem - which has far lower severity. Also, Thomas did a fix for this which is now in -mm. Ingo -
If a system normally runs a watchdog, and some do, then nmi would be forced on by grub.conf and the system would not boot. And if the system was counting on nmi to look for a hanging problem, "nmi does not work" would be a real problem if the failure was silent. Actually, a lack of nmi would be worse than not booting, it would be a time bomb waiting for a bad moment to hang. -- Bill Davidsen <davidsen@tmr.com> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot -
uhm, what? The NMI watchdog is totally optional. If it doesnt work, we print some stuff and just continue with the bootup. Ingo -
Thanks for your corrections, it's now:

Subject    : i386: NMI watchdog does not work (clockevents)
References : http://lkml.org/lkml/2007/2/21/208
Submitter  : Daniel Walker <dwalker@mvista.com>
Caused-By  : Thomas Gleixner <tglx@linutronix.de>
             commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Fixed-By   : Thomas Gleixner <tglx@linutronix.de>
Commit     : a5f5e43e2b1377392f9afe93aca29b9abf1d6a44

cu
Adrian
-
yup, you should be able to cross this one off, Adrian. The fix worked for me, at least. -
I didn't see a fix for this one go by .. I'll check the usual places I guess .. Daniel -
find it below.
Ingo
------------------------------------------------------
Subject: fix "NMI appears to be stuck"
From: Thomas Gleixner <tglx@linutronix.de>
Testing NMI watchdog ... CPU#0: NMI appears to be stuck (54->54)!
CPU#1: NMI appears to be stuck (0->0)!
Keep the PIT/HPET alive when nmi_watchdog = 1 is given on the command
line.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/i386/kernel/apic.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)
diff -puN arch/i386/kernel/apic.c~fix-nmi-appears-to-be-stuck arch/i386/kernel/apic.c
--- a/arch/i386/kernel/apic.c~fix-nmi-appears-to-be-stuck
+++ a/arch/i386/kernel/apic.c
@@ -493,8 +493,15 @@ void __init setup_boot_APIC_clock(void)
/* No broadcast on UP ! */
if (num_possible_cpus() == 1)
return;
- } else
- lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
+ } else {
+ /*
+ * If nmi_watchdog is set to IO_APIC, we need the
+ * PIT/HPET going. So we register lapic as a dummy
+ * device.
+ */
+ if (nmi_watchdog != NMI_IO_APIC)
+ lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
+ }
/* Setup the lapic or request the broadcast */
setup_APIC_timer();
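The decision the patch adds can be looked at on its own. What follows is only a sketch under assumptions: the flag values, the enum and the function name are invented for illustration (the real CLOCK_EVT_FEAT_DUMMY and NMI_IO_APIC definitions live in the kernel headers and are not reproduced here). It shows the one rule from the patch comment: the dummy flag stays set when the NMI watchdog runs through the IO-APIC, so that the PIT/HPET keeps ticking.

```c
/* Hypothetical flag bits; the kernel's real values are not shown in this thread. */
#define DEMO_FEAT_PERIODIC 0x1u
#define DEMO_FEAT_DUMMY    0x4u

/* Hypothetical mirror of the nmi_watchdog modes mentioned in the patch. */
enum demo_nmi_mode { DEMO_NMI_NONE, DEMO_NMI_IO_APIC, DEMO_NMI_LOCAL_APIC };

/*
 * Returns the feature mask the local APIC clock event device would end up
 * with. The dummy bit is cleared unless the watchdog is in IO-APIC mode,
 * in which case the lapic stays registered as a dummy device.
 */
unsigned int demo_lapic_features(unsigned int features, enum demo_nmi_mode mode)
{
    if (mode != DEMO_NMI_IO_APIC)
        features &= ~DEMO_FEAT_DUMMY;
    return features;
}
```

Calling demo_lapic_features() with DEMO_NMI_IO_APIC leaves the mask untouched, while any other mode strips the dummy bit and keeps the rest.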
-
This fix gets the NMI to tick like the timer (in /proc/interrupts) ..
However, now that it's ticking I don't think it's actually working
properly ..
I used this simple test case given by you a long time ago,
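#include <sys/io.h>	/* iopl() */

int main(void)
{
	/* raise the I/O privilege level so user space may execute "cli" (needs root) */
	iopl(3);
	/* keep interrupts disabled and spin forever: this locks up the CPU on purpose */
	for (;;)
		asm("cli");
}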
This doesn't look like a regression tho, 2.6.19-rc6 has the same
behavior ..
Daniel
-
hm, so this doesnt get detected as a lockup at all? That's bad and needs fixing ... Ingo -
I can reproduce this on my dual core VAIO. There are some issues:

1) clockevents code probably needs a resume fix, which I'm working on

2) The BOGOMIPS calibration of CPU#1 after resume is completely hosed. CPU#1 has ~ 200000 Bogomips after resume, which is off by factor 500. BOGOMIPS is calibrated vs. jiffies. This was also observed by Ingo on his T60. This is similar to the LAPIC calibration problems I had seen before the LAPIC rework. IRQ#0 comes in extremely slow (probably caused by SMM crap, e.g. PS/2 keyboard emulation)

3) on suspend we shut down CPU#1 on dual core machines. resume restarts CPU#1. Now one would expect that acpi_processor_power_verify would be called, so that the "LAPIC/TSC stops in C3"-workaround gets applied to CPU#1. Len confirmed that it should be called, when the cpu is brought online. This was not noticed before as the old apic code kept the broadcast bit on cpu shutdown.

Delayed until the above is sorted out.

tglx
-
Yeah, I think I can too, on my dual-core Mac Mini. I'm not done with my bisection, but e9e2cdb4 is among the 28 commits left, so I'm pretty sure I'm hitting the same bug. I'll do a few more bootups to be 100% sure. Linus -
Ok, it's in the last six candidates, so yeah, I'm pretty sure. I'll do a final compile/boot cycle to verify, but if you don't hear from me, you can pretty much assume that was it.

Thomas, Ingo, I'd _really_ like to get -rc3 out there, but I'd like to cut down the regression list a bit, and a number of them were about resume from RAM, and this is probably it. So I'd *really* like to get this one nailed, especially since the causing commit is known. Can you look at it as a high-priority thing, please?

I don't see anything interesting in my logs.

..
Time: acpi_pm clocksource has been installed.
Real Time Clock Driver v1.12ac
hpet_acpi_add: no address or irqs in _CRS
..

(I also see "Time: tsc clocksource has been installed." on some boots)

Linus
-
Sure. I fought it all day. Can you please test the patch I sent a couple of minutes ago? Would be great to have your feedback tomorrow morning. We need to fix that ACPI problem (acpi_processor_start is not called when CPU#1 is resumed) as well. I'll look into this tomorrow unless Len beats me. tglx -
Compiling right now - delayed because I just had to pick up one of the Ok, thanks. Linus -
Ok, it does indeed solve the problem for me. Mind sending a signed-off thing with explanations etc? Linus -
Not yet for me unfortunately, although this seems to help. Is this the patch I should have applied? http://lkml.org/lkml/2007/3/5/445 With this applied, on resume I get *some* screen output soon after resume (e.g. with s2ram I get several characters on VGA, X starts drawing some windows) but then the crescent symbol starts blinking again and the system hangs. Could this be the ACPI problem Thomas mentions (acpi_processor_start is not called when CPU#1 is resumed)? -- MST -
could you try this via s2ram on a text console, to see whether the kernel spits out any warning before it locks up? Ingo -
Yes, that's what I did. Unfortunately only a couple of characters were shown before it locked up. I still need to check what does this do in the NO_HZ configuration. BTW, Ingo, can you suspend/resume any number of times with this patch? -- MST -
Was it 'Linu'? That's a different debugging hack. Anyway, I suspect your problem is debuggable with printk+mdelay... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
yeah, i can now suspend/resume an arbitrary number of times, vga,
network, SATA all works fine after that. (i tried it 5 times)
i also have the patch below applied - but i dont think it should make a
difference to your case. (maybe it does though) I've attached my config
as well.
Ingo
------>
From: Thomas Gleixner <tglx@linutronix.de>
The TIMER_SOFTIRQ runs the hrtimers during bootup until a usable
clocksource and clock event sources are registered. The switch to high
resolution mode happens inside of the TIMER_SOFTIRQ, but runs the
softirq afterwards. That way the tick emulation timer, which was set up
in the switch to highres might be executed in the softirq context, which
is a BUG. The rbtree must not be touched by the softirq after the
highres switch.
This BUG was observed by Andres Salomon, who provided the information to
debug it.
Return early from the softirq, when the switch was successful.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/hrtimer.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
Index: linux/kernel/hrtimer.c
===================================================================
--- linux.orig/kernel/hrtimer.c
+++ linux/kernel/hrtimer.c
@@ -540,19 +540,19 @@ static inline int hrtimer_enqueue_reprog
/*
* Switch to high resolution mode
*/
-static void hrtimer_switch_to_hres(void)
+static int hrtimer_switch_to_hres(void)
{
struct hrtimer_cpu_base *base = &__get_cpu_var(hrtimer_bases);
unsigned long flags;
if (base->hres_active)
- return;
+ return 1;
local_irq_save(flags);
if (tick_init_highres()) {
local_irq_restore(flags);
- return;
+ return 0;
}
base->hres_active = 1;
base->clock_base[CLOCK_REALTIME].resolution = KTIME_HIGH_RES;
@@ -565,13 +565,14 @@ static void hrtimer_switch_to_hres(void)
local_irq_restore(flags);
printk(KERN_INFO "Switched to high resolution mode on CPU %d\n",
smp_processor_id())...

This is just a coding style thing, but I thought I should really point it
out, because these kinds of things quite often result in nasty bugs simply
because the source code is so hard to read properly:
Ok, so here's the quiz: does this function return "true on success, false
Ohh-oh! This is clearly a failure scenario! And indeed,
"tick_init_highres()" will do the "negative on failure, zero on success"
thing.
BUT! That means that you're testing the return value WRONG!
A function that returns a negative error value should be tested with

    if (tick_init_highres() < 0) {
        local_irq_restore(flags);
        return 0;
    }

because now you *see* that it's a failure.

So here's the coding style:

 - "true on success, false on failure" should be tested by just doing the
   implicit test against zero (because that's how C booleans work!)

   Example:

    if (everything_is_done())
        return;

   Or:

    if (!something_worked_ok()) {
        printk("Aiee! Bug!\n");
        return;
    }

 - "negative error values" should preferably always be tested as such

    if (tick_init_highres() < 0) {
        printk("Aieee! Couldn't init!\n");
        return 0;
    }

   or, much better, actually use a temporary variable called "err" or
   "error" or something, at which point "!error" is suddenly readable
   again:

    err = tick_init_highres();
    if (!err)
        return;

I know this sounds stupid, but we've long since come to the point where
source code readability on a *local* scale is damn important, simply
because that's how people look at code: they may not always remember
whether "zero is success" or "zero is false".

In general, I would suggest:

 - ALWAYS use "negative means error". If you had done that in this case,
   then hrtimer_switch_to_hres() would have been a lot more readable,
   *and* it could actually have returned the error code that it got to the
   caller. In general, it's just more information when you see

    error = some_function();
    if (error)
        return error;
...

So this one above and [2] below lose the obvious "negative error" information, and also prevent such functions from returning a positive value (> 0), e.g., to indicate a successful amount of work done (like bytes read or written). The second version above [1b] also does not quite agree with your statement: - "negative error values" should preferably always be tested as such
--- ~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
yeah, agreed. We'll fix this. Ingo -
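To make the convention under discussion concrete outside the kernel tree, here is a small sketch, assuming two invented functions (demo_init_highres and demo_switch_to_hres) that only mimic the shape of the code in the patch. It is not the hrtimer code; it just shows a callee returning 0 or a negative errno, and a caller that keeps the value in err, tests it as "< 0", and passes it up.

```c
#include <errno.h>

/* Hypothetical init routine: 0 on success, negative errno on failure. */
int demo_init_highres(int hw_ok)
{
    return hw_ok ? 0 : -ENODEV;
}

/*
 * Caller in the "negative means error" style: the temporary makes the
 * failure test explicit and lets the original error code reach the
 * caller instead of being flattened to true/false.
 */
int demo_switch_to_hres(int hw_ok, int *hres_active)
{
    int err;

    err = demo_init_highres(hw_ok);
    if (err < 0)
        return err;      /* failure path, visibly so */

    *hres_active = 1;    /* only reached on success */
    return 0;
}
```

A caller can then write "err = demo_switch_to_hres(...); if (err) return err;" without having to remember whether zero means success or false, which is the readability point made above. Randy's objection still applies: a plain "if (err)" would also swallow any positive return value, so it only fits functions that never return one.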
/me pleads guilty !
Replacement patch below.
tglx
--------------------->
The TIMER_SOFTIRQ runs the hrtimers during bootup until a usable
clocksource and clock event sources are registered. The switch to high
resolution mode happens inside of the TIMER_SOFTIRQ, but runs the
softirq afterwards. That way the tick emulation timer, which was set up
in the switch to highres might be executed in the softirq context, which
is a BUG. The rbtree must not be touched by the softirq after the
highres switch.
This BUG was observed by Andres Salomon, who provided the information to
debug it.
Return early from the softirq, when the switch was successful.
Also remove the superfluous hres_active check on top of
hrtimer_switch_to_hres(), as we never get there, once we switched over.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andres Salomon <dilinger@debian.org>
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index de93a81..2e465df 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -540,19 +540,18 @@ static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
/*
* Switch to high resolution mode
*/
-static void hrtimer_switch_to_hres(void)
+static int hrtimer_switch_to_hres(void)
{
struct hrtimer_cpu_base *base = &__get_cpu_var(hrtimer_bases);
unsigned long flags;
-
- if (base->hres_active)
- return;
+ int err;
local_irq_save(flags);
- if (tick_init_highres()) {
+ err = tick_init_highres();
+ if (err < 0) {
local_irq_restore(flags);
- return;
+ return err;
}
base->hres_active = 1;
base->clock_base[CLOCK_REALTIME].resolution = KTIME_HIGH_RES;
@@ -565,13 +564,14 @@ static void hrtimer_switch_to_hres(void)
local_irq_restore(flags);
printk(KERN_INFO "Switched to high resolution mode on CPU %d\n",
smp_processor_id());
+ return 0;
}
#else
static inline int hrtimer_hres_active(void) { return 0; }
static inline...

Well, I already applied the original one that came through Andrew, so I really just wanted to note the coding style in general, and your fixed patch no longer applied ;)

which I guess is ok, if only because we simply don't care about what the exact error was. But it means that this particular code sequence ends up having the same problem (which is still fewer places than the original patch, so we're good).

I personally hate the

    if (hrtimer_switch_to_hres() == SUCCESS)
        return;

kind of syntax (it's just too long, and it's *not* obvious at all that SUCCESS is zero and that this is a "negative error or zero" kind of function, so it's actually *worse* than just doing what you did), but some projects seem to have that kind of approach.

We could encourage people to do

    if (hrtimer_switch_to_hres() >= 0)
        return;

which is fairly obviously a "success" case for a negative error value, but I'm not sure the extra typing really is worth it.

Does anybody have any smart ideas that people might even be ok with following (just making things more cumbersome is anti-productive, so I don't want to have some stupid rule that everybody really hates)?

Linus
-
Haven't tried that yet. -- MST -
well I could at least two times in a row in my minimalistic setup (config attached) however using the full kernel config it still hangs ... (config also attached in case you want to compare it). Soeren -- Sometimes, there's a moment as you're waking, when you become aware of the real world around you, but you're still dreaming.
Same problem for me on 2.6.21-rc2/rc1 on IBM X60s. I've applied this patch and Ingo Molnar's patch and s2ram still can't suspend. I tried turning off CONFIG_KVM as well, but makes no difference. Will try turning off CONFIG_NO_HZ to see if this makes any difference. 2.6.20 works fine. Jeff. -
i can confirm that with your full config i see a hang too. This is most likely the ACPI problem - we are debugging this now and will come up with a patch. Ingo -
update: this only happens with simulation, and even then it's not a full hang but 'hangs until i hit a key'. With real resume i'm hitting a real key anyway so it doesnt hang. Ingo -
How 'bout my .config? -- MST -
your config works fine here too - did full suspend/resume twice without any problems. Maybe it's the ACPI related regression Thomas is seeing. Ingo -
Hmm. The key is consumed in the BIOS, so there should be no interrupt. /me digs deeper. tglx -
Here's mine too, in case someone's interested. -- MST
here's Thomas' patch with explanations:
------------------------->
From: Thomas Gleixner <tglx@linutronix.de>
The programming of periodic tick devices needs to be saved/restored
across suspend/resume - otherwise we might end up with a system coming
up that relies on getting a PIT (or HPET) interrupt, while those devices
default to 'no interrupts' after powerup. (To confuse things it worked
to a certain degree on some systems because the lapic gets initialized
as a side-effect of SMP bootup.)
This suspend / resume thing was dropped unintentionally during the
last-minute -mm code reshuffling.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 12b3efe..5567745 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -284,6 +284,42 @@ void tick_shutdown_broadcast(unsigned int *cpup)
spin_unlock_irqrestore(&tick_broadcast_lock, flags);
}
+void tick_suspend_broadcast(void)
+{
+ struct clock_event_device *bc;
+ unsigned long flags;
+
+ spin_lock_irqsave(&tick_broadcast_lock, flags);
+
+ bc = tick_broadcast_device.evtdev;
+ if (bc && tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
+ clockevents_set_mode(bc, CLOCK_EVT_MODE_SHUTDOWN);
+
+ spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+}
+
+int tick_resume_broadcast(void)
+{
+ struct clock_event_device *bc;
+ unsigned long flags;
+ int broadcast = 0;
+
+ spin_lock_irqsave(&tick_broadcast_lock, flags);
+
+ bc = tick_broadcast_device.evtdev;
+ if (bc) {
+ if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC &&
+ !cpus_empty(tick_broadcast_mask))
+ tick_broadcast_start_periodic(bc);
+
+ broadcast = cpu_isset(smp_processor_id(), tick_broadcast_mask);
+ }
+ spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+
+ return broadcast;
+}
+
+
#ifdef CONFIG_TICK_ONESHOT
static cpumask_t tick_broadcast_oneshot_mask;
diff --git a/ker...

Signed-off-by: Thomas Gleixner <tglx@linutronix.de> -
I just got the resume fix cleaned up. The suspend / resume thing was
dropped unintentionally during the -mm code reshuffling.
It still needs the broadcast fix though.
Does this make the problem go away ?
tglx
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 12b3efe..5567745 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -284,6 +284,42 @@ void tick_shutdown_broadcast(unsigned int *cpup)
spin_unlock_irqrestore(&tick_broadcast_lock, flags);
}
+void tick_suspend_broadcast(void)
+{
+ struct clock_event_device *bc;
+ unsigned long flags;
+
+ spin_lock_irqsave(&tick_broadcast_lock, flags);
+
+ bc = tick_broadcast_device.evtdev;
+ if (bc && tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
+ clockevents_set_mode(bc, CLOCK_EVT_MODE_SHUTDOWN);
+
+ spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+}
+
+int tick_resume_broadcast(void)
+{
+ struct clock_event_device *bc;
+ unsigned long flags;
+ int broadcast = 0;
+
+ spin_lock_irqsave(&tick_broadcast_lock, flags);
+
+ bc = tick_broadcast_device.evtdev;
+ if (bc) {
+ if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC &&
+ !cpus_empty(tick_broadcast_mask))
+ tick_broadcast_start_periodic(bc);
+
+ broadcast = cpu_isset(smp_processor_id(), tick_broadcast_mask);
+ }
+ spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+
+ return broadcast;
+}
+
+
#ifdef CONFIG_TICK_ONESHOT
static cpumask_t tick_broadcast_oneshot_mask;
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 0986a2b..43ba1bd 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -298,6 +298,28 @@ static void tick_shutdown(unsigned int *cpup)
spin_unlock_irqrestore(&tick_device_lock, flags);
}
+static void tick_suspend_periodic(void)
+{
+ struct tick_device *td = &__get_cpu_var(tick_cpu_device);
+ unsigned long flags;
+
+ spin_lock_irqsave(&tick_device_lock, f...

yes. works in my isolated test-setup. I'm now going to test this in git-HEAD with the full config. Soeren -
*argh* when using the full config on HEAD something still causes a hang on resume, but no time to trace it (this probably new problem) down atm :( Soeren -
This email lists some known regressions in 2.6.21-rc2 compared to 2.6.20
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.

Subject    : Asus A8N-VM motherboard:
             framebuffer/console boot failure (ACPI related)
References : http://lkml.org/lkml/2007/2/23/132
Submitter  : Andrew Nelless <andrew@nelless.net>
Caused-By  : Len Brown <len.brown@intel.com>
             commit 7f8f97c3cc75d5783d0b45cf323dedf17684be19
Handled-By : Antonino A. Daplas <adaplas@gmail.com>
Status     : problem is being debugged

Subject    : LCD is dimmed (ibm-acpi related)
References : http://lkml.org/lkml/2007/2/25/206
Submitter  : Jiri Kosina <jikos@jikos.cz>
Caused-By  : Richard Purdie <rpurdie@rpsys.net>
             commit 994efacdf9a087b52f71e620b58dfa526b0cf928
Handled-By : Jiri Kosina <jikos@jikos.cz>
             Richard Purdie <rpurdie@rpsys.net>
             Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Status     : patches are being discussed

Subject    : no backlight on radeon
References : http://lkml.org/lkml/2007/2/19/1
Submitter  : Yaroslav Halchenko <kernel@onerussian.com>
             Alex Romosan <romosan@sycorax.lbl.gov>
             David Miller <davem@davemloft.net>
Caused-By  : James Simmons <jsimmons@infradead.org>
             commit e0e34ef7f02915cfe50e501e9f32c24217177a96
Handled-By : Richard Purdie <rpurdie@rpsys.net>
             James Simmons <jsimmons@infradead.org>
             Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Status     : problem is being discussed

Subject    : nvidiafb broken
References : [...]

The confirmed fixes for both of these are in the backlight tree. I was waiting for any further feedback on the first issue above but will send them to Linus now. Cheers, Richard -
This is not a framebuffer nor console problem. I think Andrew Nelless confirmed that the cause is from the above commit. How to fix it, I don't know. Perhaps the acpi_skip_timer_override boot option has to be used. Tony -
Yes, apologies for taking so long with this. I tried the acpi_skip_timer_override boot option last night, after Tony pointed it out, and this also works around the problem.

To summarize, the cause is the changes made to early-quirks.c in the mentioned commit, and when this is reverted the problem goes away. There doesn't seem to be any sign of a living HPET on this board or any way of enabling it in the current BIOS revision but it seems on intermittent boots the check in early-quirks.c returns, the timer override doesn't happen, and the kernel fails to boot properly.

Btw, this is the Asus A8N-VM *CSM* main board, the non-CSM variety actually has a nForce 410 rather than an nForce 430 chip. I don't know whether they behave any differently but the two boards actually have different BIOS releases.

If reverting the commit would disable the HPET on boards that do actually support it I personally don't mind using the acpi_skip_timer_override workaround.

- Andrew
-
Looks like I got fooled by the negative logic for the nvidia_bugs().
Please test this patch -- it should fix it,
as well as simplify the code a bit.
thanks,
-Len
Subject: ACPI: repair nvidia early quirk breakage on x86_64
x86_64 nvidia_bugs() broke when we bailed out on not finding the HPET.
However, the quirk works by checking for _not_ finding the HPET...
Delete the nvidia_hpet_detected flag and simply test for
not finding the HPET, which is simple to do now that
acpi_table_parse returns 1 on failure.
Signed-off-by: Len Brown <len.brown@intel.com>
---
i386/kernel/acpi/earlyquirk.c | 7 +------
x86_64/kernel/early-quirks.c | 9 +--------
2 files changed, 2 insertions(+), 14 deletions(-)
diff --git a/arch/i386/kernel/acpi/earlyquirk.c b/arch/i386/kernel/acpi/earlyquirk.c
index bf86f76..7fdba8a 100644
--- a/arch/i386/kernel/acpi/earlyquirk.c
+++ b/arch/i386/kernel/acpi/earlyquirk.c
@@ -14,11 +14,8 @@
#ifdef CONFIG_ACPI
-static int nvidia_hpet_detected __initdata;
-
static int __init nvidia_hpet_check(struct acpi_table_header *header)
{
- nvidia_hpet_detected = 1;
return 0;
}
#endif
@@ -29,9 +26,7 @@ static int __init check_bridge(int vendor, int device)
/* According to Nvidia all timer overrides are bogus unless HPET
is enabled. */
if (!acpi_use_timer_override && vendor == PCI_VENDOR_ID_NVIDIA) {
- nvidia_hpet_detected = 0;
- acpi_table_parse(ACPI_SIG_HPET, nvidia_hpet_check);
- if (nvidia_hpet_detected == 0) {
+ if (acpi_table_parse(ACPI_SIG_HPET, nvidia_hpet_check)) {
acpi_skip_timer_override = 1;
printk(KERN_INFO "Nvidia board "
"detected. Ignoring ACPI "
diff --git a/arch/x86_64/kernel/early-quirks.c b/arch/x86_64/kernel/early-quirks.c
index 8047ea8..dec587b 100644
--- a/arch/x86_64/kernel/early-quirks.c
+++ b/arch/x86_64/kernel/early-quirks.c
@@ -30,11 +30,8 @@ static void via_bugs(void)
#ifdef CONFIG_ACPI
-static int nvidia_hpet_detected __initdata;
-
stati...

Yep. You can knock this one off the regression list :) Thanks, Andrew -
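The "negative logic" that tripped up the quirk is easy to get backwards, so here is a stand-alone sketch of it. It assumes an invented lookup function (demo_table_parse) that, like the commit message says of acpi_table_parse, returns nonzero when the table is not found; the table list is just an array of signature strings made up for the example. The real check_bridge() also tests the vendor ID and acpi_use_timer_override, which the sketch leaves out.

```c
#include <string.h>

/* Hypothetical lookup: 0 if a table with this signature exists, 1 if not. */
int demo_table_parse(const char *const *tables, int count, const char *sig)
{
    int i;

    for (i = 0; i < count; i++)
        if (strcmp(tables[i], sig) == 0)
            return 0;
    return 1;
}

/*
 * Quirk decision as described in the commit message: timer overrides are
 * treated as bogus unless an HPET table is present, so the override is
 * skipped exactly when the lookup reports "not found".
 */
int demo_skip_timer_override(const char *const *tables, int count)
{
    if (demo_table_parse(tables, count, "HPET"))
        return 1;   /* no HPET: ignore the ACPI timer override */
    return 0;       /* HPET present: keep the override */
}
```

On a board like Andrew's, with no HPET table, demo_skip_timer_override() returns 1, which matches the behaviour he got by hand with the acpi_skip_timer_override boot option.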
This email lists some known regressions in 2.6.21-rc2 compared to 2.6.20
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.

Subject    : NCQ problem with ahci and Hitachi drive
References : http://lkml.org/lkml/2007/3/4/178
Submitter  : Mathieu Bérard <Mathieu.Berard@crans.org>
