Re: [5/6] 2.6.21-rc2: known regressions

To: Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, February 28, 2007 - 1:16 am

Oh well.. I'm not very proud of this, because quite frankly, -rc2 has way 
more changes than I really like.

And yeah, it's largely my fault, because I simply missed a V4L/DVB merge 
that came in before the merge window closed. Since I didn't notice it, it 
didn't make -rc1, and as such it got merged late and is in -rc2 instead.

But because I'll flail around wildly and rather blame anything else than 
my own incompetence, I'll just claim that all the other kernel developers 
have been irresponsible, and caused -rc2 to be bigger than needed. In some 
areas (you know who you are) it may even be true..

Apart from the V4L/DVB merge, we've got a late PARISC update, and a number 
of driver updates (ata, networking, usb), along with the normal 
smattering of random stuff (core networking, selinux, infiniband, agp, 
mips, arm).

Anyway, I really hope the thing starts calming down now, and everybody 
should take a hard look at the regressions lists that Adrian has started 
sending out. We already fixed some of them, but there is more to go..

Thanks,

		Linus
-
To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, February 28, 2007 - 3:23 am

I got this compile error:

  CC      arch/i386/kernel/io_apic.o
arch/i386/kernel/io_apic.c: In function 'setup_IO_APIC_irqs':
arch/i386/kernel/io_apic.c:1357: error: 'struct irq_desc' has no member named 'affinity'
arch/i386/kernel/io_apic.c: In function 'io_apic_set_pci_routing':
arch/i386/kernel/io_apic.c:2878: error: 'struct irq_desc' has no member named 'affinity'
make[1]: *** [arch/i386/kernel/io_apic.o] Error 1
make: *** [arch/i386/kernel] Error 2

didn't happen with rc1

- David Brown
-
To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, February 28, 2007 - 3:59 am

Ingo's patch correcting a bug in the SMT scheduler
(http://lkml.org/lkml/2007/2/26/103) seems to have been missed. It is
quite important when using dynticks.

-- 
Damien Wyart
-
To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Eric W. Biederman <ebiederm@...>
Date: Wednesday, February 28, 2007 - 3:39 am

Hi Linus,

rc2 fails to build on my thinkpad t43:

  CC      arch/i386/kernel/io_apic.o
arch/i386/kernel/io_apic.c: In function 'setup_IO_APIC_irqs':
arch/i386/kernel/io_apic.c:1357: error: 'struct irq_desc' has no member named 'affinity'
arch/i386/kernel/io_apic.c: In function 'io_apic_set_pci_routing':
arch/i386/kernel/io_apic.c:2878: error: 'struct irq_desc' has no member named 'affinity'
make[1]: *** [arch/i386/kernel/io_apic.o] Error 1
make: *** [arch/i386/kernel] Error 2

The problem is caused by affinity being within #ifdef CONFIG_SMP in struct
irq_desc in irq.h:
#ifdef CONFIG_SMP
        cpumask_t               affinity;
        unsigned int            cpu;
#endif

I don't know whether the whole functions or just a single line should be
placed under #ifdef CONFIG_SMP. Eric (in CC) will know better than I do.
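
The failure mode described above can be reproduced outside the kernel. What follows is only a minimal userspace sketch, not the fix that was applied (Eric's reply says the offending lines were simply removed). The names irq_desc_sketch and set_default_affinity are hypothetical, and CONFIG_SMP is treated as an ordinary preprocessor macro. It shows the "single line under #ifdef" option: a member declared only for SMP builds may only be touched under the same guard.

```c
/* Hypothetical stand-in for the kernel's struct irq_desc. */
struct irq_desc_sketch {
	unsigned int status;
#ifdef CONFIG_SMP
	unsigned long affinity;	/* exists only in SMP builds */
	unsigned int cpu;
#endif
};

/*
 * Every access to an SMP-only member sits under the same guard as its
 * declaration; without the guard a non-SMP build fails with
 * "has no member named 'affinity'".
 * Returns 1 if a mask was stored, 0 if there was nothing to store.
 */
static int set_default_affinity(struct irq_desc_sketch *desc,
				unsigned long mask)
{
#ifdef CONFIG_SMP
	desc->affinity = mask;
	return 1;
#else
	(void)desc;
	(void)mask;
	return 0;
#endif
}
```

Building the sketch with and without -DCONFIG_SMP exercises both branches.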

My config is attached.

Brice
To: Brice Goglin <Brice.Goglin@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, February 28, 2007 - 9:09 am

Yes. I goofed, and missed that stupid case.  The offending lines
should just die.  Patch already sent to Linus.  

Eric
-
To: Eric W. Biederman <ebiederm@...>
Cc: Brice Goglin <Brice.Goglin@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, February 28, 2007 - 12:44 pm

Could the patch be posted? Or could I see a git commit so I can get it myself?

Thanks,
David Brown
-
To: David Brown <dmlb2000@...>
Cc: Eric W. Biederman <ebiederm@...>, Brice Goglin <Brice.Goglin@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, February 28, 2007 - 1:07 pm

I'm attaching it below.  It hit the git commits mailing list
just a few minutes ago.

---
~Randy
To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, February 28, 2007 - 1:50 am

I got this warning so far :)

drivers/video/Kconfig:1622:warning: 'select' used by config symbol 
'FB_PS3' refer to undefined symbol 'PS3_PS3AV'


Regards,

Gabriel

-
To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Horst H. von Brand <vonbrand@...>, David S. Miller <davem@...>, <sparclinux@...>, Pavel Machek <pavel@...>, Oliver Neukum <oneukum@...>, <gregkh@...>, <linux-usb-devel@...>, Craig Schlenter <craig@...>, Paul Rolland <rol@...>, Rafael J. Wysocki <rjw@...>, David Brownell <david-b@...>, <a.zummo@...>, <rtc-linux@...>
Date: Sunday, March 4, 2007 - 9:50 pm

This email lists some known regressions in 2.6.21-rc2 compared to 2.6.20
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, the author
of a patch that caused a breakage, or someone I consider in some other way
possibly involved with one or more of these issues.

Due to the large number of recipients, please trim the Cc when answering.


Subject    : sparc64 compile error due to GENERIC_ISA_DMA removal
References : http://bugzilla.kernel.org/show_bug.cgi?id=8097
Submitter  : Horst H. von Brand <vonbrand@inf.utfsm.cl>
Caused-By  : David S. Miller <davem@sunset.davemloft.net>
             commit 1b51d3a08b6c80a1e47d4c579c41abbe56cd3c44
Status     : unknown


Subject    : mmc reader no longer works
References : http://lkml.org/lkml/2007/2/27/91
Submitter  : Pavel Machek <pavel@ucw.cz>
Caused-By  : Oliver Neukum <oneukum@suse.de>
Status     : problem is being debugged


Subject    : usb-serial broken
             (ftdi serial device shows up as ttyUSB140 instead of ttyUSB0)
Submitter  : Craig Schlenter <craig@codefountain.com>
Caused-By  : Oliver Neukum <oneukum@suse.de>
             commit 34ef50e5b1f96c2d8c0f3d28b7d407743806256c
Handled-By : Oliver Neukum <oneukum@suse.de>
Status     : patch available


Subject    : Oops in rtc_cmos
References : http://lkml.org/lkml/2007/3/4/112
             http://lkml.org/lkml/2007/2/18/172
Submitter  : Paul Rolland <rol@as2917.net>
             Rafael J. Wysocki <rjw@sisk.pl>
Caused-By  : David Brownell <david-b@pacbell.net>
             commit 7be2c7c96aff2871240d61fef508c41176c688b5
Patch      : http://lkml.org/lkml/2007/2/23/184
Status     : patch available

-
To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Horst H. von Brand <vonbrand@...>, David S. Miller <davem@...>, <sparclinux@...>, Pavel Machek <pavel@...>, Oliver Neukum <oneukum@...>, <linux-usb-devel@...>, Craig Schlenter <craig@...>, Paul Rolland <rol@...>, Rafael J. Wysocki <rjw@...>, David Brownell <david-b@...>, <a.zummo@...>, <rtc-linux@...>
Date: Sunday, March 4, 2007 - 11:32 pm

Patch is queued up in my tree and will go to Linus in a few days.

But I think there is another usb-serial patch that Oliver needs to send
me to fix another problem with the usb-serial core...

thanks,

greg k-h
-
To: <bunk@...>
Cc: <torvalds@...>, <akpm@...>, <linux-kernel@...>, <vonbrand@...>, <davem@...>, <sparclinux@...>, <pavel@...>, <oneukum@...>, <gregkh@...>, <linux-usb-devel@...>, <craig@...>, <rol@...>, <rjw@...>, <david-b@...>, <a.zummo@...>, <rtc-linux@...>
Date: Sunday, March 4, 2007 - 10:07 pm

From: Adrian Bunk <bunk@stusta.de>

Fixed in current GIT.

commit 74bd7d093b8e87f35eaf3b14459b96a0e20d1d10
Author: David S. Miller <davem@sunset.davemloft.net>
Date:   Wed Feb 28 13:09:34 2007 -0800

    [SPARC64]: Fix parport_pc build.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

-
To: David Miller <davem@...>
Cc: <torvalds@...>, <akpm@...>, <linux-kernel@...>, <vonbrand@...>, <davem@...>, <sparclinux@...>
Date: Sunday, March 4, 2007 - 10:26 pm

Horst's problem is with the floppy driver and 
claim_dma_lock/release_dma_lock in include/asm-sparc64/dma.h .

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To: <bunk@...>
Cc: <torvalds@...>, <akpm@...>, <linux-kernel@...>, <vonbrand@...>, <davem@...>, <sparclinux@...>
Date: Monday, March 5, 2007 - 12:42 am

From: Adrian Bunk <bunk@stusta.de>

Here is the fix I will send to Linus for this, thanks:

commit 08414aa2516da65ae7a522c6834b8ea576f38c4b
Author: David S. Miller <davem@sunset.davemloft.net>
Date:   Sun Mar 4 20:36:18 2007 -0800

    [SPARC64]: Fix floppy build failure.
    
    Just define a local {claim,release}_dma_lock() implementation
    for the floppy driver to use so we don't need to define and
    export to modules the silly dma_spin_lock.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/asm-sparc64/dma.h b/include/asm-sparc64/dma.h
index 1bf4f7a..a9fd061 100644
--- a/include/asm-sparc64/dma.h
+++ b/include/asm-sparc64/dma.h
@@ -15,17 +15,6 @@
 #include <asm/delay.h>
 #include <asm/oplib.h>
 
-extern spinlock_t  dma_spin_lock;
-
-#define claim_dma_lock() \
-({	unsigned long flags; \
-	spin_lock_irqsave(&dma_spin_lock, flags); \
-	flags; \
-})
-
-#define release_dma_lock(__flags) \
-	spin_unlock_irqrestore(&dma_spin_lock, __flags);
-
 /* These are irrelevant for Sparc DMA, but we leave it in so that
  * things can compile.
  */
diff --git a/include/asm-sparc64/floppy.h b/include/asm-sparc64/floppy.h
index dbe033e..331013a 100644
--- a/include/asm-sparc64/floppy.h
+++ b/include/asm-sparc64/floppy.h
@@ -854,4 +854,15 @@ static unsigned long __init sun_floppy_init(void)
 
 #define EXTRA_FLOPPY_PARAMS
 
+static DEFINE_SPINLOCK(dma_spin_lock);
+
+#define claim_dma_lock() \
+({	unsigned long flags; \
+	spin_lock_irqsave(&dma_spin_lock, flags); \
+	flags; \
+})
+
+#define release_dma_lock(__flags) \
+	spin_unlock_irqrestore(&dma_spin_lock, __flags);
+
 #endif /* !(__ASM_SPARC64_FLOPPY_H) */
-
To: <bunk@...>
Cc: <torvalds@...>, <akpm@...>, <linux-kernel@...>, <vonbrand@...>, <davem@...>, <sparclinux@...>
Date: Sunday, March 4, 2007 - 10:29 pm

From: Adrian Bunk <bunk@stusta.de>

Thanks for the clarification, I was thinking of the parport
problem reported by Meelis Roos.

I'll look into this.
-
To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, Soeren Sonnenburg <kernel@...>, Daniel Walker <dwalker@...>
Date: Sunday, March 4, 2007 - 9:50 pm

This email lists some known regressions in 2.6.21-rc2 compared to 2.6.20
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, the author
of a patch that caused a breakage, or someone I consider in some other way
possibly involved with one or more of these issues.

Due to the large number of recipients, please trim the Cc when answering.


Subject    : soft lockup detected on CPU#0
References : http://lkml.org/lkml/2007/3/3/152
Submitter  : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status     : unknown


Subject    : dynticks makes ksoftirqd1 use unreasonable amount of cpu time
References : http://bugzilla.kernel.org/show_bug.cgi?id=8100
Submitter  : Emil Karlson <jkarlson@cc.hut.fi>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status     : problem is being debugged


Subject    : ThinkPad T60: system doesn't come out of suspend to RAM
             (CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/2/22/391
Submitter  : Michael S. Tsirkin <mst@mellanox.co.il>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
             Ingo Molnar <mingo@elte.hu>
Status     : unknown


Subject    : macbook pro suspend to ram broken  (clockevents)
References : http://lkml.org/lkml/2007/3/4/110
Submitter  : Soeren Sonnenburg <kernel@nn7.de>
Caused-By  : Thomas Gleixner <tglx@linutronix.de>
             commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Status     : unknown


Subject    : i386: no boot with nmi_watchdog=1  (clockevents)
References : http://lkml.org/lkml/2007/2/21/208
Submitter  : Daniel Walker <dwalker@mvista.com>
Caused-By  : Thomas Gleixner <tglx@linutronix.de>
             commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status     : problem is being debugged

-
To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>, Daniel Walker <dwalker@...>
Date: Monday, March 5, 2007 - 3:57 am

FYI, this is not a "wont boot" problem, this should be a "NMI watchdog 
does not work" problem - which has far lower severity. Also, Thomas did 
a fix for this which is now in -mm.

	Ingo
-
To: Ingo Molnar <mingo@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>, Daniel Walker <dwalker@...>
Date: Monday, March 5, 2007 - 12:14 pm

If a system normally runs a watchdog, and some do, then nmi would be 
forced on by grub.conf and the system would not boot. And if the system 
was counting on nmi to look for a hanging problem, "nmi does not work" 
would be a real problem if the failure was silent.

Actually, a lack of nmi would be worse than not booting, it would be a 
time bomb waiting for a bad moment to hang.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-
To: Bill Davidsen <davidsen@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>, Daniel Walker <dwalker@...>
Date: Monday, March 5, 2007 - 12:21 pm

uhm, what? The NMI watchdog is totally optional. If it doesnt work, we 
print some stuff and just continue with the bootup.

	Ingo
-
To: Ingo Molnar <mingo@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>, Daniel Walker <dwalker@...>
Date: Monday, March 5, 2007 - 7:12 pm

Thanks for your corrections, it's now:

Subject    : i386: NMI watchdog does not work  (clockevents)
References : http://lkml.org/lkml/2007/2/21/208
Submitter  : Daniel Walker <dwalker@mvista.com>
Caused-By  : Thomas Gleixner <tglx@linutronix.de>
             commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Fixed-By   : Thomas Gleixner <tglx@linutronix.de>
Commit     : a5f5e43e2b1377392f9afe93aca29b9abf1d6a44

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To: Ingo Molnar <mingo@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>, Daniel Walker <dwalker@...>
Date: Monday, March 5, 2007 - 4:13 am

yup, you should be able to cross this one off, Adrian.  The fix worked for me,
at least.
-
To: Andrew Morton <akpm@...>
Cc: Ingo Molnar <mingo@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>
Date: Monday, March 5, 2007 - 11:25 am

I didn't see a fix for this one go by .. I'll check the usual places I
guess ..

Daniel

-
To: Daniel Walker <dwalker@...>
Cc: Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>
Date: Monday, March 5, 2007 - 11:27 am

find it below.

	Ingo

------------------------------------------------------
Subject: fix "NMI appears to be stuck"
From: Thomas Gleixner <tglx@linutronix.de>

  Testing NMI watchdog ... CPU#0: NMI appears to be stuck (54->54)!
  CPU#1: NMI appears to be stuck (0->0)!

Keep the PIT/HPET alive when nmi_watchdog = 1 is given on the command
line.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/i386/kernel/apic.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff -puN arch/i386/kernel/apic.c~fix-nmi-appears-to-be-stuck arch/i386/kernel/apic.c
--- a/arch/i386/kernel/apic.c~fix-nmi-appears-to-be-stuck
+++ a/arch/i386/kernel/apic.c
@@ -493,8 +493,15 @@ void __init setup_boot_APIC_clock(void)
 		/* No broadcast on UP ! */
 		if (num_possible_cpus() == 1)
 			return;
-	} else
-		lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
+	} else {
+		/*
+		 * If nmi_watchdog is set to IO_APIC, we need the
+		 * PIT/HPET going. So we register lapic as a dummy
+		 * device.
+		 */
+		if (nmi_watchdog != NMI_IO_APIC)
+			lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
+	}
 
 	/* Setup the lapic or request the broadcast */
 	setup_APIC_timer();
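
The decision made in the else branch of the hunk above can be modelled in user space. This is only a sketch of that one branch: the _SKETCH constants carry placeholder values and lapic_features_sketch is a hypothetical helper, none of them taken from the kernel headers. The dummy bit is cleared unless the IO-APIC NMI watchdog is in use, in which case the local APIC stays registered as a dummy device and the PIT/HPET keeps running.

```c
/* Placeholder values, chosen for this sketch only. */
#define FEAT_DUMMY_SKETCH	0x10u
#define NMI_NONE_SKETCH		0
#define NMI_IO_APIC_SKETCH	1

/*
 * Returns the feature mask the local APIC clock event device ends up
 * with. The dummy bit survives only when the IO-APIC NMI watchdog was
 * requested.
 */
static unsigned int lapic_features_sketch(unsigned int features,
					  int nmi_watchdog)
{
	if (nmi_watchdog != NMI_IO_APIC_SKETCH)
		features &= ~FEAT_DUMMY_SKETCH;
	return features;
}
```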
-
To: Ingo Molnar <mingo@...>
Cc: Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>
Date: Monday, March 5, 2007 - 12:42 pm

This fix gets the NMI to tick like the timer (in /proc/interrupts) ..
However, now that it's ticking I don't think it's actually working
properly ..

I used this simple test case given by you a long time ago,

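#include <sys/io.h>

int main(void)
{
	iopl(3);		/* needs root; lets user space execute cli */
	for (;;)
		asm("cli");	/* spin with interrupts off to provoke a hard lockup */
}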

This doesn't look like a regression tho, 2.6.19-rc6 has the same
behavior ..

Daniel

-
To: Daniel Walker <dwalker@...>
Cc: Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>
Date: Monday, March 5, 2007 - 3:30 pm

hm, so this doesnt get detected as a lockup at all? That's bad and needs 
fixing ...

	Ingo
-
To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Monday, March 5, 2007 - 7:43 pm

I can reproduce this on my dual core VAIO. There are some issues:

1) clockevents code probably needs a resume fix, which I'm working on

2) The BOGOMIPS calibration of CPU#1 after resume is completely hosed.
CPU#1 has ~ 200000 Bogomips after resume, which is off by factor 500.
BOGOMIPS is calibrated vs. jiffies. This was also observed by Ingo on
his T60. This is similar to the LAPIC calibration problems I had seen
before the LAPIC rework. IRQ#0 comes in extremely slow (probably caused
by SMM crap, e.g. PS/2 keyboard emulation)

3) on suspend we shut down CPU#1 on dual core machines. resume restarts
CPU#1. Now one would expect that acpi_processor_power_verify would be
called, so that the "LAPIC/TSC stops in C3"-workaround gets applied to
CPU#1. Len confirmed that it should be called, when the cpu is brought
online. This was not noticed before as the old apic code kept the
broadcast bit on cpu shutdown.


Delayed until the above is sorted out.

	tglx


-
To: Thomas Gleixner <tglx@...>
Cc: Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Monday, March 5, 2007 - 7:45 pm

Yeah, I think I can too, on my dual-core Mac Mini. 

I'm not done with my bisection, but e9e2cdb4 is among the 28 commits left, 
so I'm pretty sure I'm hitting the same bug. I'll do a few more bootups to 
be 100% sure.

		Linus
-
To: Thomas Gleixner <tglx@...>
Cc: Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Monday, March 5, 2007 - 8:38 pm

Ok, it's in the last six candidates, so yeah, I'm pretty sure. I'll do a 
final compile/boot cycle to verify, but if you don't hear from me, you can 
pretty much assume that was it.

Thomas, Ingo, I'd _really_ like to get -rc3 out there, but I'd like to cut 
down the regression list a bit, and a number of them were about resume 
from RAM, and this is probably it. So I'd *really* like to get this one 
nailed, especially since the causing commit is known. Can you look at it 
as a high-priority thing, please?

I don't see anything interesting in my logs.

	..
	Time: acpi_pm clocksource has been installed.
	Real Time Clock Driver v1.12ac
	hpet_acpi_add: no address or irqs in _CRS
	..

(I also see "Time: tsc clocksource has been installed." on some boots)

		Linus

-
To: Linus Torvalds <torvalds@...>
Cc: Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Monday, March 5, 2007 - 9:02 pm

Sure. I fought it all day. Can you please test the patch I sent a couple
of minutes ago ? Would be great to have your feedback tomorrow morning. 

We need to fix that ACPI problem (acpi_processor_start is not called
when CPU#1 is resumed) as well. I'll look into this tomorrow unless Len
beats me to it.

	tglx


-
To: Thomas Gleixner <tglx@...>
Cc: Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Monday, March 5, 2007 - 9:31 pm

Compiling right now - delayed because I just had to pick up one of the 

Ok, thanks.

		Linus
-
To: Thomas Gleixner <tglx@...>
Cc: Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Monday, March 5, 2007 - 10:18 pm

Ok, it does indeed solve the problem for me.

Mind sending a signed-off thing with explanations etc?

		Linus
-
To: Linus Torvalds <torvalds@...>
Cc: Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Ingo Molnar <mingo@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 6:33 am

Not yet for me unfortunately, although this seems to help.
Is this the patch I should have applied?
http://lkml.org/lkml/2007/3/5/445

With this applied, on resume I get *some* screen output soon after resume
(e.g. with s2ram I get several characters on VGA, X starts drawing some windows)
but then the crescent symbol starts blinking again and the system hangs.

Could this be the ACPI problem Thomas mentions (acpi_processor_start is not called
when CPU#1 is resumed)?

-- 
MST
-
To: Michael S. Tsirkin <mst@...>
Cc: Linus Torvalds <torvalds@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 6:37 am

could you try this via s2ram on a text console, to see whether the 
kernel spits out any warning before it locks up?

	Ingo
-
To: Ingo Molnar <mingo@...>
Cc: Linus Torvalds <torvalds@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 6:46 am

Yes, that's what I did. Unfortunately only a couple of characters were
shown before it locked up.

I still need to check what this does in the NO_HZ configuration.

BTW, Ingo, can you suspend/resume any number of times with this patch?

-- 
MST
-
To: Michael S. Tsirkin <mst@...>
Cc: Ingo Molnar <mingo@...>, Linus Torvalds <torvalds@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Sunday, March 11, 2007 - 1:32 pm

Was it 'Linu'? That's a different debugging hack.

Anyway, I suspect your problem is debuggable with printk+mdelay...

							Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To: Michael S. Tsirkin <mst@...>
Cc: Linus Torvalds <torvalds@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 7:32 am

yeah, i can now suspend/resume an arbitrary number of times, vga, 
network, SATA all works fine after that. (i tried it 5 times)

i also have the patch below applied - but i dont think it should make a 
difference to your case. (maybe it does though) I've attached my config 
as well.

	Ingo

------>
From: Thomas Gleixner <tglx@linutronix.de>

The TIMER_SOFTIRQ runs the hrtimers during bootup until a usable 
clocksource and clock event sources are registered. The switch to high 
resolution mode happens inside of the TIMER_SOFTIRQ, but runs the 
softirq afterwards. That way the tick emulation timer, which was set up 
in the switch to highres might be executed in the softirq context, which 
is a BUG. The rbtree must not be touched by the softirq after the 
highres switch.

This BUG was observed by Andres Salomon, who provided the information to
debug it.

Return early from the softirq, when the switch was successful. 

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/hrtimer.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Index: linux/kernel/hrtimer.c
===================================================================
--- linux.orig/kernel/hrtimer.c
+++ linux/kernel/hrtimer.c
@@ -540,19 +540,19 @@ static inline int hrtimer_enqueue_reprog
 /*
  * Switch to high resolution mode
  */
-static void hrtimer_switch_to_hres(void)
+static int hrtimer_switch_to_hres(void)
 {
 	struct hrtimer_cpu_base *base = &__get_cpu_var(hrtimer_bases);
 	unsigned long flags;
 
 	if (base->hres_active)
-		return;
+		return 1;
 
 	local_irq_save(flags);
 
 	if (tick_init_highres()) {
 		local_irq_restore(flags);
-		return;
+		return 0;
 	}
 	base->hres_active = 1;
 	base->clock_base[CLOCK_REALTIME].resolution = KTIME_HIGH_RES;
@@ -565,13 +565,14 @@ static void hrtimer_switch_to_hres(void)
 	local_irq_restore(flags);
 	printk(KERN_INFO "Switched to high resolution mode on CPU %d\n",
 	       smp_processor_id())...
To: Ingo Molnar <mingo@...>
Cc: Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 12:44 pm

This is just a coding style thing, but I thought I should really point it 
out, because these kinds of things quite often result in nasty bugs simply 
because the source code is so hard to read properly:


Ok, so here's the quiz: does this function return "true on success, false 


Ohh-oh! This is clearly a failure scenario! And indeed, 
"tick_init_highres()" will do the "negative on failure, zero on success" 
thing.

BUT! That means that you're testing the return value WRONG!

A function that returns a negative error value should be tested with

	if (tick_init_highres() < 0) {
		local_irq_restore(flags);
		return 0;
	}

because now you *see* that it's a failure.

So here's the coding style:

 - "true on success, false on failure" should be tested by just doing the 
   implicit test against zero (because that's how C booleans work!)

   Example:

	if (everything_is_done())
		return;

   Or:

	if (!something_worked_ok()) {
		printk("Aiee! Bug!\n");
		return;
	}

 - "negative error values" should preferably always be tested as such

	if (tick_init_highres() < 0) {
		printk("Aieee! Couldn't init!\n");
		return 0;
	}

   or, much better, actually use a temporary variable called "err" or 
   "error" or something, at which point "!error" is suddenly readable 
   again:

	err = tick_init_highres();
	if (!err)
		return;

I know this sounds stupid, but we've long since come to the point where 
source code readability on a *local* scale is damn important, simply 
because that's how people look at code: they may not always remember 
whether "zero is success" or "zero is false".
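
A minimal user-space sketch of the two conventions described above, so the
difference is visible at the call site. The names init_highres(),
already_active() and switch_to_hres() are hypothetical stand-ins, not the
kernel functions from the patch; the only assumption is the usual "zero on
success, negative errno on failure" return style.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: 0 on success, negative errno on failure. */
static int init_highres(bool hw_ok)
{
	return hw_ok ? 0 : -ENODEV;
}

/* Hypothetical predicate: a plain C boolean, true means "already done". */
static bool already_active(int active)
{
	return active != 0;
}

/*
 * Returns 0 on success or a negative errno, so the caller gets the
 * real error code instead of a bare true/false.
 */
static int switch_to_hres(int *active, bool hw_ok)
{
	int err;

	/* Boolean-style function: the implicit test reads naturally. */
	if (already_active(*active))
		return 0;

	/* Negative-error function: test it as such, via a variable named err. */
	err = init_highres(hw_ok);
	if (err < 0)
		return err;

	*active = 1;
	return 0;
}
```

With the error kept in a variable called "err", both "if (err < 0)" and
"if (!err)" stay readable without looking up what the callee returns.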

In general, I would suggest:

 - ALWAYS use "negative means error". If you had done that in this case, 
   then hrtimer_switch_to_hres() would have been a lot more readable, 
   *and* it could actually have returned the error code that it got to the 
   caller. In general, it's just more information when you see

	error = some_function();
	if (error)
		return error;

   ...
To: Linus Torvalds <torvalds@...>
Cc: Ingo Molnar <mingo@...>, Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Friday, March 16, 2007 - 11:18 am

So this one above and [2] below lose the obvious "negative error"
information, and also prevent such functions from returning a
positive value (> 0), e.g., to indicate a successful amount of
work done (like bytes read or written).  The second version above
[1b] also does not quite agree with your statement:
  - "negative error values" should preferably always be tested as such
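
To make the point about positive return values concrete, here is a small
sketch, assuming a hypothetical copy_bytes() helper that returns the number
of bytes copied on success and a negative errno on failure. Neither function
below exists in the kernel; they are only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copies at most cap bytes from src to dst.
 * Returns the number of bytes copied (>= 0) or a negative errno.
 */
static long copy_bytes(char *dst, size_t cap, const char *src, size_t len)
{
	size_t n;

	if (!dst || !src)
		return -EINVAL;

	n = len < cap ? len : cap;
	memcpy(dst, src, n);
	return (long)n;
}

/*
 * Caller: only "ret < 0" means failure. A plain "if (ret)" test would
 * treat every successful non-empty copy as an error.
 */
static int fill_buffer(char *dst, size_t cap, const char *src, size_t len)
{
	long ret = copy_bytes(dst, cap, src, len);

	if (ret < 0)
		return (int)ret;

	return 0;
}
```

Testing with "< 0" keeps the positive range free for "amount of work done",
which "if (error)" cannot do.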



---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
To: Linus Torvalds <torvalds@...>
Cc: Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 1:05 pm

yeah, agreed. We'll fix this.

	Ingo
-
To: Linus Torvalds <torvalds@...>
Cc: Ingo Molnar <mingo@...>, Michael S. Tsirkin <mst@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 1:29 pm

/me pleads guilty !

Replacement patch below.

	tglx

--------------------->

The TIMER_SOFTIRQ runs the hrtimers during bootup until a usable
clocksource and clock event sources are registered. The switch to high
resolution mode happens inside of the TIMER_SOFTIRQ, but runs the
softirq afterwards. That way the tick emulation timer, which was set up
in the switch to highres might be executed in the softirq context, which
is a BUG. The rbtree must not be touched by the softirq after the
highres switch.

This BUG was observed by Andres Salomon, who provided the information to
debug it.

Return early from the softirq when the switch was successful.

Also remove the superfluous hres_active check at the top of
hrtimer_switch_to_hres(), as we never get there once we have switched over.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andres Salomon <dilinger@debian.org>

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index de93a81..2e465df 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -540,19 +540,18 @@ static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
 /*
  * Switch to high resolution mode
  */
-static void hrtimer_switch_to_hres(void)
+static int hrtimer_switch_to_hres(void)
 {
 	struct hrtimer_cpu_base *base = &__get_cpu_var(hrtimer_bases);
 	unsigned long flags;
-
-	if (base->hres_active)
-		return;
+	int err;
 
 	local_irq_save(flags);
 
-	if (tick_init_highres()) {
+	err = tick_init_highres();
+	if (err < 0) {
 		local_irq_restore(flags);
-		return;
+		return err;
 	}
 	base->hres_active = 1;
 	base->clock_base[CLOCK_REALTIME].resolution = KTIME_HIGH_RES;
@@ -565,13 +564,14 @@ static void hrtimer_switch_to_hres(void)
 	local_irq_restore(flags);
 	printk(KERN_INFO "Switched to high resolution mode on CPU %d\n",
 	       smp_processor_id());
+	return 0;
 }
 
 #else
 
 static inline int hrtimer_hres_active(void) { return 0; }
 static inline...
To: Thomas Gleixner <tglx@...>
Cc: Ingo Molnar <mingo@...>, Michael S. Tsirkin <mst@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 1:41 pm

Well, I already applied the original one that came through Andrew, so I 
really just wanted to note the coding style in general, and your fixed 
patch no longer applied ;)


which I guess is ok, if only because we simply don't care about what the 
exact error was. But it means that this particular code sequence ends up 
having the same problem (which is still fewer places than the original 
patch, so we're good).

I personally hate the

	if (hrtimer_switch_to_hres() == SUCCESS)
		return;

kind of syntax (it's just too long, and it's *not* obvious at all that 
SUCCESS is zero and that this is a "negative error or zero" kind of 
function), so it's actually *worse* than just doing what you did, but some 
projects seem to have that kind of approach.

We could encourage people to do

	if (hrtimer_switch_to_hres() >= 0)
		return;

which is fairly obviously a "success" case for a negative error value, but 
I'm not sure the extra typing really is worth it. Does anybody have any 
smart ideas that people might even be ok with following (just making 
things more cumbersome is anti-productive, so I don't want to have some 
stupid rule that everybody really hates)?

			Linus
-
To: Ingo Molnar <mingo@...>
Cc: Linus Torvalds <torvalds@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 8:20 am

Haven't tried that yet.

-- 
MST
-
To: Michael S. Tsirkin <mst@...>
Cc: Ingo Molnar <mingo@...>, Linus Torvalds <torvalds@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 7:36 am

well I could at least two times in a row in my minimalistic setup
(config attached) however using the full kernel config it still
hangs ... (config also attached in case you want to compare it).

Soeren
-- 
Sometimes, there's a moment as you're waking, when you become aware of
the real world around you, but you're still dreaming.
To: Soeren Sonnenburg <kernel@...>
Cc: Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, Linus Torvalds <torvalds@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 8:09 am

Same problem for me on 2.6.21-rc2/rc1 on IBM X60s. I've applied this
patch and Ingo Molnar's patch and s2ram still can't suspend. I tried
turning off CONFIG_KVM as well, but makes no difference.

Will try turning off CONFIG_NO_HZ to see if this makes any difference.
2.6.20 works fine.
Jeff.
-
To: Soeren Sonnenburg <kernel@...>
Cc: Michael S. Tsirkin <mst@...>, Linus Torvalds <torvalds@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 8:07 am

i can confirm that with your full config i see a hang too. This is most 
likely the ACPI problem - we are debugging this now and will come up 
with a patch.

	Ingo
-
To: Soeren Sonnenburg <kernel@...>
Cc: Michael S. Tsirkin <mst@...>, Linus Torvalds <torvalds@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 8:51 am

update: this only happens with simulation, and even then it's not a full 
hang but 'hangs until i hit a key'. With real resume i'm hitting a real 
key anyway so it doesn't hang.

	Ingo
-
To: Ingo Molnar <mingo@...>
Cc: Soeren Sonnenburg <kernel@...>, Linus Torvalds <torvalds@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 8:55 am

How 'bout my .config?


-- 
MST
-
To: Michael S. Tsirkin <mst@...>
Cc: Soeren Sonnenburg <kernel@...>, Linus Torvalds <torvalds@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 9:03 am

your config works fine here too - did full suspend/resume twice without 
any problems. Maybe it's the ACPI related regression Thomas is seeing.

	Ingo
-
To: Ingo Molnar <mingo@...>
Cc: Soeren Sonnenburg <kernel@...>, Michael S. Tsirkin <mst@...>, Linus Torvalds <torvalds@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 9:09 am

Hmm. The key is consumed in the BIOS, so there should be no
interrupt. /me digs deeper.

	tglx


-
To: Ingo Molnar <mingo@...>
Cc: Soeren Sonnenburg <kernel@...>, Linus Torvalds <torvalds@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 8:15 am

Here's mine too, in case someone's interested.

-- 
MST
To: Linus Torvalds <torvalds@...>
Cc: Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 3:25 am

here's Thomas' patch with explanations:

------------------------->
From: Thomas Gleixner <tglx@linutronix.de>

The programming of periodic tick devices needs to be saved/restored 
across suspend/resume - otherwise we might end up with a system coming 
up that relies on getting a PIT (or HPET) interrupt, while those devices 
default to 'no interrupts' after powerup. (To confuse things it worked 
to a certain degree on some systems because the lapic gets initialized 
as a side-effect of SMP bootup.)

This suspend / resume thing was dropped unintentionally during the 
last-minute -mm code reshuffling.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 12b3efe..5567745 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -284,6 +284,42 @@ void tick_shutdown_broadcast(unsigned int *cpup)
 	spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
 
+void tick_suspend_broadcast(void)
+{
+	struct clock_event_device *bc;
+	unsigned long flags;
+
+	spin_lock_irqsave(&tick_broadcast_lock, flags);
+
+	bc = tick_broadcast_device.evtdev;
+	if (bc && tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
+		clockevents_set_mode(bc, CLOCK_EVT_MODE_SHUTDOWN);
+
+	spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+}
+
+int tick_resume_broadcast(void)
+{
+	struct clock_event_device *bc;
+	unsigned long flags;
+	int broadcast = 0;
+
+	spin_lock_irqsave(&tick_broadcast_lock, flags);
+
+	bc = tick_broadcast_device.evtdev;
+	if (bc) {
+		if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC &&
+		    !cpus_empty(tick_broadcast_mask))
+			tick_broadcast_start_periodic(bc);
+
+		broadcast = cpu_isset(smp_processor_id(), tick_broadcast_mask);
+	}
+	spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+
+	return broadcast;
+}
+
+
 #ifdef CONFIG_TICK_ONESHOT
 
 static cpumask_t tick_broadcast_oneshot_mask;
diff --git a/ker...
To: Ingo Molnar <mingo@...>
Cc: Linus Torvalds <torvalds@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 4:09 am

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


-
To: Linus Torvalds <torvalds@...>
Cc: Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, Soeren Sonnenburg <kernel@...>, Len Brown <lenb@...>
Date: Monday, March 5, 2007 - 8:25 pm

I just got the resume fix cleaned up. The suspend / resume thing was
dropped unintentionally during the -mm code reshuffling.

It still needs the broadcast fix though.

Does this make the problem go away ?

	tglx

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 12b3efe..5567745 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -284,6 +284,42 @@ void tick_shutdown_broadcast(unsigned int *cpup)
 	spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
 
+void tick_suspend_broadcast(void)
+{
+	struct clock_event_device *bc;
+	unsigned long flags;
+
+	spin_lock_irqsave(&tick_broadcast_lock, flags);
+
+	bc = tick_broadcast_device.evtdev;
+	if (bc && tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
+		clockevents_set_mode(bc, CLOCK_EVT_MODE_SHUTDOWN);
+
+	spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+}
+
+int tick_resume_broadcast(void)
+{
+	struct clock_event_device *bc;
+	unsigned long flags;
+	int broadcast = 0;
+
+	spin_lock_irqsave(&tick_broadcast_lock, flags);
+
+	bc = tick_broadcast_device.evtdev;
+	if (bc) {
+		if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC &&
+		    !cpus_empty(tick_broadcast_mask))
+			tick_broadcast_start_periodic(bc);
+
+		broadcast = cpu_isset(smp_processor_id(), tick_broadcast_mask);
+	}
+	spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+
+	return broadcast;
+}
+
+
 #ifdef CONFIG_TICK_ONESHOT
 
 static cpumask_t tick_broadcast_oneshot_mask;
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 0986a2b..43ba1bd 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -298,6 +298,28 @@ static void tick_shutdown(unsigned int *cpup)
 	spin_unlock_irqrestore(&tick_device_lock, flags);
 }
 
+static void tick_suspend_periodic(void)
+{
+	struct tick_device *td = &__get_cpu_var(tick_cpu_device);
+	unsigned long flags;
+
+	spin_lock_irqsave(&tick_device_lock, f...
To: <tglx@...>
Cc: Linus Torvalds <torvalds@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 2:49 am

yes. works in my isolated test-setup. I'm now going to test this in
git-HEAD with the full config.

Soeren
-- 
Sometimes, there's a moment as you're waking, when you become aware of
the real world around you, but you're still dreaming.
-
To: <tglx@...>
Cc: Linus Torvalds <torvalds@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, Len Brown <lenb@...>
Date: Tuesday, March 6, 2007 - 3:49 am

*argh* when using the full config on HEAD something still causes a hang
on resume, but I have no time to trace it (this is probably a new problem) down
atm :(

Soeren
-- 
Sometimes, there's a moment as you're waking, when you become aware of
the real world around you, but you're still dreaming.
-
To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Andrew Nelless <andrew@...>, Len Brown <len.brown@...>, Antonino A. Daplas <adaplas@...>, <linux-acpi@...>, Jiri Kosina <jikos@...>, Richard Purdie <rpurdie@...>, Henrique de Moraes Holschuh <hmh@...>, Yaroslav Halchenko <kernel@...>, Alex Romosan <romosan@...>, David Miller <davem@...>, James Simmons <jsimmons@...>, <benh@...>, Andreas Schwab <schwab@...>
Date: Sunday, March 4, 2007 - 9:50 pm

This email lists some known regressions in 2.6.21-rc2 compared to 2.6.20
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.


Subject    : Asus A8N-VM motherboard:
             framebuffer/console boot failure (ACPI related)
References : http://lkml.org/lkml/2007/2/23/132
Submitter  : Andrew Nelless <andrew@nelless.net>
Caused-By  : Len Brown <len.brown@intel.com>
             commit 7f8f97c3cc75d5783d0b45cf323dedf17684be19
Handled-By : Antonino A. Daplas <adaplas@gmail.com>
Status     : problem is being debugged


Subject    : LCD is dimmed  (ibm-acpi related)
References : http://lkml.org/lkml/2007/2/25/206
Submitter  : Jiri Kosina <jikos@jikos.cz>
Caused-By  : Richard Purdie <rpurdie@rpsys.net>
             commit 994efacdf9a087b52f71e620b58dfa526b0cf928
Handled-By : Jiri Kosina <jikos@jikos.cz>
             Richard Purdie <rpurdie@rpsys.net>
             Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Status     : patches are being discussed


Subject    : no backlight on radeon
References : http://lkml.org/lkml/2007/2/19/1
Submitter  : Yaroslav Halchenko <kernel@onerussian.com>
             Alex Romosan <romosan@sycorax.lbl.gov>
             David Miller <davem@davemloft.net>
Caused-By  : James Simmons <jsimmons@infradead.org>
             commit e0e34ef7f02915cfe50e501e9f32c24217177a96
Handled-By : Richard Purdie <rpurdie@rpsys.net>
             James Simmons <jsimmons@infradead.org>
             Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Status     : problem is being discussed


Subject    : nvidiafb broken
References : http://lkml.org/lkml/2007...
To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Andrew Nelless <andrew@...>, Len Brown <len.brown@...>, Antonino A. Daplas <adaplas@...>, <linux-acpi@...>, Jiri Kosina <jikos@...>, Henrique de Moraes Holschuh <hmh@...>, Yaroslav Halchenko <kernel@...>, Alex Romosan <romosan@...>, David Miller <davem@...>, James Simmons <jsimmons@...>, <benh@...>, Andreas Schwab <schwab@...>
Date: Monday, March 5, 2007 - 8:21 am

The confirmed fixes for both of these are in the backlight tree. I was
waiting for any further feedback on the first issue above but will send
them to Linus now.

Cheers,

Richard

-
To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Andrew Nelless <andrew@...>, Len Brown <len.brown@...>, <linux-acpi@...>, Jiri Kosina <jikos@...>, Richard Purdie <rpurdie@...>, Henrique de Moraes Holschuh <hmh@...>, Yaroslav Halchenko <kernel@...>, Alex Romosan <romosan@...>, David Miller <davem@...>, James Simmons <jsimmons@...>, <benh@...>, Andreas Schwab <schwab@...>
Date: Monday, March 5, 2007 - 6:35 am

This is not a framebuffer nor console problem.

I think Andrew Nelless confirmed that the cause is from the above
commit. How to fix it, I don't know.  Perhaps the
acpi_skip_timer_override boot option has to be used.

Tony

-
To: Antonino A. Daplas <adaplas@...>
Cc: <linux-kernel@...>
Date: Monday, March 5, 2007 - 11:06 am

Yes, apologies for taking so long with this.
I tried the acpi_skip_timer_override boot option last night,
after Tony pointed it out, and this also works around the
problem.

To summarize, the cause is the changes made to early-quirks.c
in the mentioned commit; when this is reverted the problem
goes away.

There doesn't seem to be any sign of a living HPET on this
board or any way of enabling it in the current BIOS revision
but it seems on intermittent boots the check in early-quirks.c
returns, the timer override doesn't happen, and the kernel
fails to boot properly.

Btw, this is the Asus A8N-VM *CSM* main board, the non-CSM
variety actually has a nForce 410 rather than an
nForce 430 chip. I don't know whether they behave any
differently but the two boards actually have different BIOS
releases.

If reverting the commit would disable the HPET on boards
that do actually support it I personally don't mind using
the acpi_skip_timer_override workaround.

-
  Andrew

-
To: Antonino A. Daplas <adaplas@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Andrew Nelless <andrew@...>, <linux-acpi@...>, Jiri Kosina <jikos@...>, Richard Purdie <rpurdie@...>, Henrique de Moraes Holschuh <hmh@...>, Yaroslav Halchenko <kernel@...>, Alex Romosan <romosan@...>, David Miller <davem@...>, James Simmons <jsimmons@...>, <benh@...>, Andreas Schwab <schwab@...>
Date: Thursday, March 8, 2007 - 7:28 pm

Looks like I got fooled by the negative logic for the nvidia_bugs().
Please test this patch -- it should fix it,
as well as simplify the code a bit.

thanks,
-Len


Subject: ACPI: repair nvidia early quirk breakage on x86_64

x86_64 nvidia_bugs() broke when we bailed out on not finding the HPET.
However, the quirk works by checking for _not_ finding the HPET...

Delete the nvidia_hpet_detected flag and simply test for
not finding the HPET, which is simple to do now that
acpi_table_parse returns 1 on failure.

Signed-off-by: Len Brown <len.brown@intel.com>
---
 i386/kernel/acpi/earlyquirk.c |    7 +------
 x86_64/kernel/early-quirks.c  |    9 +--------
 2 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/arch/i386/kernel/acpi/earlyquirk.c b/arch/i386/kernel/acpi/earlyquirk.c
index bf86f76..7fdba8a 100644
--- a/arch/i386/kernel/acpi/earlyquirk.c
+++ b/arch/i386/kernel/acpi/earlyquirk.c
@@ -14,11 +14,8 @@
 
 #ifdef CONFIG_ACPI
 
-static int nvidia_hpet_detected __initdata;
-
 static int __init nvidia_hpet_check(struct acpi_table_header *header)
 {
-	nvidia_hpet_detected = 1;
 	return 0;
 }
 #endif
@@ -29,9 +26,7 @@ static int __init check_bridge(int vendor, int device)
 	/* According to Nvidia all timer overrides are bogus unless HPET
 	   is enabled. */
 	if (!acpi_use_timer_override && vendor == PCI_VENDOR_ID_NVIDIA) {
-		nvidia_hpet_detected = 0;
-		acpi_table_parse(ACPI_SIG_HPET, nvidia_hpet_check);
-		if (nvidia_hpet_detected == 0) {
+		if (acpi_table_parse(ACPI_SIG_HPET, nvidia_hpet_check)) {
 			acpi_skip_timer_override = 1;
 			  printk(KERN_INFO "Nvidia board "
                        "detected. Ignoring ACPI "
diff --git a/arch/x86_64/kernel/early-quirks.c b/arch/x86_64/kernel/early-quirks.c
index 8047ea8..dec587b 100644
--- a/arch/x86_64/kernel/early-quirks.c
+++ b/arch/x86_64/kernel/early-quirks.c
@@ -30,11 +30,8 @@ static void via_bugs(void)
 
 #ifdef CONFIG_ACPI
 
-static int nvidia_hpet_detected __initdata;
-
 stati...
To: Len Brown <lenb@...>
Cc: <linux-kernel@...>
Date: Friday, March 9, 2007 - 3:25 pm

Yep. You can knock this one off the regression
list :)

Thanks,
  Andrew


-
To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Mathieu <Mathieu.Berard@...>, <jgarzik@...>, <linux-ide@...>, Michal Jaegermann <michal@...>, Fabio Comolli <fabio.comolli@...>, Tejun Heo <htejun@...>, Janosch Machowinski <jmachowinski@...>, Lukas Hejtmanek <xhejtman@...>, Meelis Roos <mroos@...>, Olivier Mondoloni <darkcore71@...>, Thomas Renninger <trenn@...>, Robert Moore <robert.moore@...>, <lenb@...>, <linux-acpi@...>
Date: Sunday, March 4, 2007 - 9:50 pm

This email lists some known regressions in 2.6.21-rc2 compared to 2.6.20
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.


Subject    : NCQ problem with ahci and Hitachi drive
References : http://lkml.org/lkml/2007/3/4/178
Submitter  : Mathieu Bérard &lt;Mathieu.Berard@crans.org&am