Re: Linux v2.6.21-rc3

To: Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, March 7, 2007 - 12:59 am

We've finally hopefully started to put a dent in the regressions,
especially the suspend/resume problems introduced since 2.6.20.

So 2.6.21-rc3 is out there now, and there's some hope that it will work
more widely than -rc1 and -rc2 did. Please do give it a good testing, and
update Adrian and the mailing list (and me) about any regressions
(hopefully many more of the "it's fixed now" than other kinds, but all
regressions are interesting).

The appended shortlog gives a reasonable overview. In general we're
definitely calming down, and most of the changes are fairly small and
obvious fixes.

Let's keep the fixes to a minimum, especially since I'm planning on biting
people's heads off if I get any more pull requests for things that aren't
real and obvious fixes.

Linus

---

Adam Litke (1):
Fix get_unmapped_area and fsync for hugetlb shm segments

Adrian Bunk (8):
HID: hid-debug.c should #include <linux/hid-debug.h>
arch/arm26/kernel/entry.S: remove dead code
make ipc/shm.c:shm_nopage() static
mm/{,tiny-}shmem.c cleanups
drivers/video/sm501fb.c: make 4 functions static
fix the SYSCTL=n compilation
arch/i386/kernel/vmi.c must #include <asm/kmap_types.h>
remove arch/i386/kernel/tsc.c:custom_sched_clock

Ahmed S. Darwish (1):
KVM: Use ARRAY_SIZE macro instead of manual calculation.

Akira Iguchi (1):
scc_pata: bugfix for checking DMA IRQ status

Alan Cox (4):
libata-core: Fix simplex handling
pata_qdi: Fix initialisation
siimage: DRAC4 note
ide: remove a ton of pointless #undef REALLY_SLOW_IO

Alexandr Andreev (1):
[IA64] sync compat getdents

Alexey Dobriyan (1):
geode-aes: use unsigned long for spin_lock_irqsave

Allan Graves (1):
uml: enable RAW

Andres Salomon (3):
i386: make x86_64 tsc header require i386 rather than vice-versa
hrtimers: fix HRTIMER_CB_IRQSAFE_NO_SOFTIRQ description
hrtimers: hrtimer_clock...

To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Mark Lord <lkml@...>, Jim Radford <radford@...>, Oliver Neukum <oneukum@...>, <gregkh@...>, <linux-usb-devel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Randy Cushman <rcushman_linux@...>, Takashi Iwai <tiwai@...>, <perex@...>, <alsa-devel@...>, Vladimir Brik <no.hope@...>, Andi Kleen <ak@...>, Ray Lee <ray-lk@...>, Alexey Starikovskiy <alexey.y.starikovskiy@...>, Len Brown <lenb@...>, <linux-acpi@...>
Date: Wednesday, March 14, 2007 - 2:11 pm

This email lists known regressions in Linus' tree compared to 2.6.20
with patches available.

If possible, the patches should be included in 2.6.21-rc4 to reduce
the number of known regressions in -rc4 a little bit.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.

Subject : USB: Oops when connecting USB 1.1 docks
References : http://lkml.org/lkml/2007/3/4/266
Submitter : Mark Lord <lkml@rtr.ca>
Caused-By : Jim Radford <radford@blackbean.org>
commit d9a7ecacac5f8274d2afce09aadcf37bdb42b93a
Handled-By : Oliver Neukum <oneukum@suse.de>
Jim Radford <radford@blackbean.org>
Patch : http://lkml.org/lkml/2007/3/13/217
Status : patch available

Subject : snd-intel8x0: no 3d surround sound
References : http://lkml.org/lkml/2007/3/5/164
Submitter : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Caused-By : Randy Cushman <rcushman_linux@earthlink.net>
commit 831466f4ad2b5fe23dff77edbe6a7c244435e973
Handled-By : Takashi Iwai <tiwai@suse.de>
Status : patch available

Subject : AMD Elan: Crash after "Allocating PCI resources"
References : http://bugzilla.kernel.org/show_bug.cgi?id=8161
Submitter : Vladimir Brik <no.hope@gmail.com>
Handled-By : Andi Kleen <ak@muc.de>
Status : patch available

Subject : laptop immediately resumes after suspend
References : http://lkml.org/lkml/2007/3/8/469
Submitter : Ray Lee <ray-lk@madrabbit.org>
Caused-By : Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
commit ed41dab90eb40ac4911e60406bc653661f0e4ce1
Handled-By : Len Brown <lenb@kernel.org>
Patch : [ message continues ]


To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Greg Kroah-Hartman <gregkh@...>, David Miller <davem@...>, Jeff Garzik <jeff@...>
Date: Tuesday, March 13, 2007 - 3:26 pm

Here is a quick summary of the regressions I am looking at.

- Currently we appear to have a pid leak in tty_io.c
http://lkml.org/lkml/2007/3/8/222

- There is a missing INIT_WORK in vt.c that causes an oops
when we attempt to use SAK.
http://lkml.org/lkml/2007/3/11/148

- We have a network ABI regression caused by the latest sysfs
changes to net-sysfs.c. In particular we now cannot rename network
devices if our destination name happens to be the name of a sysfs file that
the network device appears in, and if we try the kernel gets very
confused and we lose access to the network device.

Do we just want to revert commit 43cb76d91ee85f579a69d42bc8efc08bac560278?
Greg has been working on this off and on and has not found a
simple solution yet.

- pci_save_state and pci_restore_state are broken, and have been for a
while, if used on anything besides plain PCI (PCI-X, PCI-E and MSI)
and not used in pairs. (gregkh and Andrew have the patches to
correct this).

- I am still confirming that I have fixed all of the irq handling
problems that resulted in the "No irq for vector" message. I think
I have but I have at least one indirect bug report that I'm still
following up on.

Eric
-

To: Eric W. Biederman <ebiederm@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, David Miller <davem@...>, Jeff Garzik <jeff@...>
Date: Tuesday, March 13, 2007 - 3:40 pm

I do not think this should be reverted, as the odds that someone will
rename their network device to be "irq" or something else that is in the
pci device's directory are pretty slim. It also only shows up if
CONFIG_SYSFS_DEPRECATED is disabled, not the common option.

But I am still working on it, I sent you and Kay a patch that, while it

I think these are already in Linus's tree right now, right?

thanks,

greg k-h
-

To: Greg KH <gregkh@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, David Miller <davem@...>, Jeff Garzik <jeff@...>
Date: Tuesday, March 13, 2007 - 4:04 pm

Oops I missed that.

Eric
-

To: Greg KH <gregkh@...>
Cc: Eric W. Biederman <ebiederm@...>, Linux Kernel Mailing List <linux-kernel@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, David Miller <davem@...>, Jeff Garzik <jeff@...>
Date: Tuesday, March 13, 2007 - 3:48 pm

Yes. I just wanted some more testing of it, and while I didn't hear much,
at least Auke added his ack, and the old state was clearly broken, so they
got applied yesterday.

Linus
-

To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Vladimir Brik <no.hope@...>, Andi Kleen <ak@...>, <gregkh@...>, <linux-pci@...>, Randy Dunlap <randy.dunlap@...>, <lenb@...>, <linux-acpi@...>, Ray Lee <ray-lk@...>, Colchao <colchaodemola@...>, Mathieu <Mathieu.Berard@...>, Tejun Heo <htejun@...>, <jgarzik@...>, <linux-ide@...>, Michal Jaegermann <michal@...>, Fabio Comolli <fabio.comolli@...>, Plamen Petrov <plamen.petrov@...>, Laurent Riffard <laurent.riffard@...>
Date: Tuesday, March 13, 2007 - 8:50 am

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.

Subject : AMD Elan: Crash after "Allocating PCI resources"
References : http://bugzilla.kernel.org/show_bug.cgi?id=8161
Submitter : Vladimir Brik <no.hope@gmail.com>
Handled-By : Andi Kleen <ak@muc.de>
Status : problem is being debugged

Subject : x86_64: boot hangs unless CONFIG_PCIEPORTBUS=n and acpi=off
References : http://bugzilla.kernel.org/show_bug.cgi?id=8162
Submitter : Randy Dunlap <randy.dunlap@oracle.com>
Status : unknown

Subject : ACPI regression with noapic
References : http://lkml.org/lkml/2007/3/8/468
Submitter : Ray Lee <ray-lk@madrabbit.org>
Status : unknown

Subject : acpi_serialize locks system during boot
References : http://bugzilla.kernel.org/show_bug.cgi?id=8171
Submitter : Colchao <colchaodemola@gmail.com>
Status : unknown

Subject : NCQ problem with ahci and Hitachi drive (ACPI related)
References : http://lkml.org/lkml/2007/3/4/178
http://lkml.org/lkml/2007/3/9/475
Submitter : Mathieu Bérard <Mathieu.Berard@crans.org>
Handled-By : Tejun Heo <htejun@gmail.com>
Status : unknown

Subject : kernels fail to boot with drives on ATIIXP controller
(ACPI/IRQ related)
References : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229621
http://lkml.org/lkml/2007/3/4/257
Submitter : Michal Jaegermann <michal@ellpspace.math.ualberta.ca>
Status : unknown

Subject : libata: PATA UDMA/100 configured as UDMA/33
References : http://lkml.org/lkml/2007/2/20/294
[ message continues ]


To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Vladimir Brik <no.hope@...>
Date: Tuesday, March 13, 2007 - 11:13 am

It uses RDTSC when it shouldn't. Already got a fix for that.

-Andi
-

To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Vladimir Brik <no.hope@...>, Andi Kleen <ak@...>, <gregkh@...>, <linux-pci@...>, Randy Dunlap <randy.dunlap@...>, <lenb@...>, <linux-acpi@...>, Ray Lee <ray-lk@...>, Colchao <colchaodemola@...>, Mathieu <Mathieu.Berard@...>, Tejun Heo <htejun@...>, <jgarzik@...>, <linux-ide@...>, Michal Jaegermann <michal@...>, Fabio Comolli <fabio.comolli@...>, Plamen Petrov <plamen.petrov@...>, Laurent Riffard <laurent.riffard@...>
Date: Tuesday, March 13, 2007 - 10:03 am

Some cases should be fixed now but probably not all (eg the Nvidia one)
-

To: Alan Cox <alan@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Vladimir Brik <no.hope@...>, Andi Kleen <ak@...>, <gregkh@...>, <linux-pci@...>, Randy Dunlap <randy.dunlap@...>, <lenb@...>, <linux-acpi@...>, Ray Lee <ray-lk@...>, Colchao <colchaodemola@...>, Mathieu Bérard <Mathieu.Berard@...>, Tejun Heo <htejun@...>, <jgarzik@...>, <linux-ide@...>, Michal Jaegermann <michal@...>, Plamen Petrov <plamen.petrov@...>, Laurent Riffard <laurent.riffard@...>
Date: Tuesday, March 13, 2007 - 4:12 pm

This regression is still present in 2.6.21-rc3-g8b9909de (pulled from
Linus' tree less than one hour ago).

Fabio
-

To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Len Brown <lenb@...>, <davem@...>, <kuznet@...>, <pekkas@...>, <jmorris@...>, <yoshfuji@...>, <kaber@...>, <netdev@...>, Pavel Machek <pavel@...>, Marcel Holtmann <marcel@...>, <maxk@...>, <bluez-devel@...>, Albert Hopkins <kernel@...>, Ayaz Abdulla <aabdulla@...>, <jgarzik@...>
Date: Tuesday, March 13, 2007 - 8:50 am

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.

Subject : ipv6 crash
References : http://lkml.org/lkml/2007/3/10/2
Submitter : Len Brown <lenb@kernel.org>
Status : unknown

Subject : ThinkPad X60: bluetooth hardlocks
References : http://lkml.org/lkml/2007/3/2/85
Submitter : Pavel Machek <pavel@ucw.cz>
Handled-By : Marcel Holtmann <marcel@holtmann.org>
Status : unknown

Subject : forcedeth: skb_over_panic
References : http://bugzilla.kernel.org/show_bug.cgi?id=8058
Submitter : Albert Hopkins <kernel@marduk.letterboxes.org>
Handled-By : Ayaz Abdulla <aabdulla@nvidia.com>
Status : problem is being debugged

-

To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, Marcel Holtmann <marcel@...>, <maxk@...>, <bluez-devel@...>, Mark Lord <lkml@...>
Date: Tuesday, March 13, 2007 - 9:30 am


Does this still happen with -rc3? I'd have thought Mark's patch in
0de1517e23c2e28d58a6344b97a120596ea200bb fixed that...
-

To: Cornelia Huck <cornelia.huck@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, Marcel Holtmann <marcel@...>, <maxk@...>, <bluez-devel@...>
Date: Tuesday, March 13, 2007 - 9:35 am

Pavel? Could you retest this now on a ThinkPad X60 ?

???
-

To: Mark Lord <lkml@...>
Cc: Cornelia Huck <cornelia.huck@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Marcel Holtmann <marcel@...>, <maxk@...>, <bluez-devel@...>
Date: Tuesday, March 13, 2007 - 2:13 pm

I can confirm it is fixed.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Stephane Casset <sept@...>, Thomas Gleixner <tglx@...>, Jiri Slaby <jirislaby@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Ingo Molnar <mingo@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>, Tejun Heo <htejun@...>, Rafael J. Wysocki <rjw@...>, <pavel@...>, <linux-pm@...>
Date: Tuesday, March 13, 2007 - 8:50 am

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.

Subject : Dynticks and High resolution Timer hanging the system
References : http://lkml.org/lkml/2007/3/7/504
Submitter : Stephane Casset <sept@logidee.com>
Caused-By : Thomas Gleixner <tglx@linutronix.de>
Status : unknown

Subject : Clocksource tsc unstable (delta = -154983451 ns)
References : http://lkml.org/lkml/2007/3/9/271
Submitter : Jiri Slaby <jirislaby@gmail.com>
Status : unknown

Subject : hrtimer_switch_to_hres():
wrong tick_init_highres() return value handling
References : http://lkml.org/lkml/2007/3/6/262
Submitter : Linus Torvalds <torvalds@linux-foundation.org>
Caused-By : Thomas Gleixner <tglx@linutronix.de>
commit 54cdfdb47f73b5af3d1ebb0f1e383efbe70fde9e
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status : unknown

Subject : soft lockup detected on CPU#0
References : http://lkml.org/lkml/2007/3/3/152
Submitter : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar <mingo@elte.hu>
Status : unknown

Subject : dynticks makes ksoftirqd1 use unreasonable amount of cpu time
References : http://bugzilla.kernel.org/show_bug.cgi?id=8100
Submitter : Emil Karlson <jkarlson@cc.hut.fi>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status : problem is being debugged

Subject : system doesn't come out of suspend (CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/2/22/391
Submitter : Michael S. Tsirkin <mst@mellanox.co.il>
...

To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Stephane Casset <sept@...>, Jiri Slaby <jirislaby@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Ingo Molnar <mingo@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>, Tejun Heo <htejun@...>, Rafael J. Wysocki <rjw@...>, <pavel@...>, <linux-pm@...>
Date: Tuesday, March 13, 2007 - 4:46 pm

That's not a regression. That's an informational message, printed when the
TSC watchdog detects that the TSC is unreliable.

tglx
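
A rough model of the check behind that message, as a sketch only: a watchdog compares how much time the watched clocksource (here the TSC) thinks has elapsed against a reference clock over the same interval, and reports the clocksource as unstable when the two disagree by more than a threshold. The function names and the 1/16 second threshold are assumptions made for this illustration, not taken from the thread; the delta value is the one quoted in the bug report subject.

```python
NSEC_PER_SEC = 1_000_000_000

# Assumed threshold for this sketch (1/16 of a second); the value used by
# a given kernel version may differ.
WATCHDOG_THRESHOLD_NS = NSEC_PER_SEC >> 4


def watchdog_check(cs_elapsed_ns, wd_elapsed_ns, threshold_ns=WATCHDOG_THRESHOLD_NS):
    """Compare the watched clocksource against the reference clock.

    Returns (unstable, delta_ns), where delta_ns is how far the watched
    clocksource ran ahead of (positive) or behind (negative) the reference.
    """
    delta_ns = cs_elapsed_ns - wd_elapsed_ns
    return abs(delta_ns) > threshold_ns, delta_ns


def format_report(name, cs_elapsed_ns, wd_elapsed_ns):
    """Return the informational message, or None if the clocks agree."""
    unstable, delta_ns = watchdog_check(cs_elapsed_ns, wd_elapsed_ns)
    if unstable:
        return f"Clocksource {name} unstable (delta = {delta_ns} ns)"
    return None


if __name__ == "__main__":
    # Reference clock saw 500 ms pass; the TSC only accounted for ~345 ms.
    print(format_report("tsc", 345_016_549, 500_000_000))
```

In this model the message only says that the TSC was disqualified as a time source; whether the system then misbehaves is a separate question, which matches the point made above.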

-

To: Thomas Gleixner <tglx@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Jiri Slaby <jirislaby@...>, Ingo Molnar <mingo@...>, Florian Lohoff <flo@...>
Date: Wednesday, March 14, 2007 - 7:44 am

Looking at [1], there also seems to be a probably related "doesn't boot"
problem.
My first guess would be commit 6bb74df481223731af6c7e0ff3adb31f6442cfcd
"clocksource init adjustments (fix bug #7426)".

Jiri, is the message also present with 2.6.21-rc2 (at a different place

cu
Adrian

[1] http://lkml.org/lkml/2007/3/13/219

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Adrian Bunk <bunk@...>
Cc: Thomas Gleixner <tglx@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Jiri Slaby <jirislaby@...>, Ingo Molnar <mingo@...>
Date: Wednesday, March 14, 2007 - 2:02 pm

With the current git of today the halt on boot is gone. I am running
it now ...

Flo
--
Florian Lohoff flo@rfc822.org +49-171-2280134
Those who would give up a little freedom to get a little
security shall soon have neither - Benjamin Franklin

To: Florian Lohoff <flo@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Jiri Slaby <jirislaby@...>, Ingo Molnar <mingo@...>
Date: Wednesday, March 14, 2007 - 2:28 pm

I'm really curious what made it go away.

tglx

-

To: Adrian Bunk <bunk@...>
Cc: Thomas Gleixner <tglx@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, Florian Lohoff <flo@...>
Date: Wednesday, March 14, 2007 - 8:16 am

Yes, it's present there too, a few lines below the place where it appears
in -rc3.

regards,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E
-

To: Jiri Slaby <jirislaby@...>
Cc: Thomas Gleixner <tglx@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, Florian Lohoff <flo@...>
Date: Wednesday, March 14, 2007 - 1:31 pm

cu
Adrian


-

To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Stephane Casset <sept@...>, Jiri Slaby <jirislaby@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Ingo Molnar <mingo@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>, Tejun Heo <htejun@...>, Rafael J. Wysocki <rjw@...>, <pavel@...>, <linux-pm@...>
Date: Tuesday, March 13, 2007 - 4:05 pm

Linus merged the original patch, which solved the real problem.

He just gave me a lesson on how to do it right next time.

tglx

-

To: Thomas Gleixner <tglx@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Stephane Casset <sept@...>, Jiri Slaby <jirislaby@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Ingo Molnar <mingo@...>, Emil Karlson <jkarlson@...>, Michael S. Tsirkin <mst@...>, Soeren Sonnenburg <kernel@...>, Tejun Heo <htejun@...>, Rafael J. Wysocki <rjw@...>, <pavel@...>, <linux-pm@...>
Date: Wednesday, March 14, 2007 - 7:31 am

cu
Adrian


-

To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, <linux-pm@...>, <lenb@...>, <linux-acpi@...>, Tomas Janousek <tomi@...>, Thomas Gleixner <tglx@...>, Soeren Sonnenburg <kernel@...>, <jgarzik@...>, <linux-ide@...>, Michael S. Tsirkin <mst@...>, Arkadiusz Miskiewicz <arekm@...>, Konstantin Karasyov <konstantin.a.karasyov@...>, Lukas Hejtmanek <xhejtman@...>
Date: Tuesday, March 13, 2007 - 8:50 am

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.

Subject : resume: slab error in verify_redzone_free(): cache `size-512':
memory outside object was overwritten
References : http://lkml.org/lkml/2007/2/24/41
Submitter : Pavel Machek <pavel@ucw.cz>
Status : unknown

Subject : beeps get longer after suspend
References : http://lkml.org/lkml/2007/2/26/276
Submitter : Pavel Machek <pavel@ucw.cz>
Status : unknown

Subject : suspend/resume hangs until keypress
References : http://bugzilla.kernel.org/show_bug.cgi?id=8181
Submitter : Tomas Janousek <tomi@nomi.cz>
Status : unknown

Subject : SATA breakage on resume
References : http://lkml.org/lkml/2007/3/7/233
Submitter : Thomas Gleixner <tglx@linutronix.de>
Soeren Sonnenburg <kernel@nn7.de>
Status : unknown

Subject : first disk access after resume takes several minutes
References : http://lkml.org/lkml/2007/3/8/117
Submitter : Michael S. Tsirkin <mst@mellanox.co.il>
Status : unknown

Subject : after resume: X hangs after drawing a couple of windows
References : http://lkml.org/lkml/2007/3/8/117
Submitter : Michael S. Tsirkin <mst@mellanox.co.il>
Status : unknown

Subject : ThinkPad Z60m: usb mouse stops working after suspend to ram
References : http://lkml.org/lkml/2007/2/21/413
http://lkml.org/lkml/2007/2/28/172
Submitter : Arkadiusz Miskiewicz <arekm@maven.pl>
Caused-By : Konstantin Karasyov <konstantin.a.karasyov@intel.com>
commit 0a6139027f3986162233adc17285151e78b39cac
Handled-By : Kons...

To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, <linux-pm@...>, <lenb@...>, <linux-acpi@...>, Tomas Janousek <tomi@...>, Thomas Gleixner <tglx@...>, Soeren Sonnenburg <kernel@...>, <jgarzik@...>, <linux-ide@...>, Michael S. Tsirkin <mst@...>, Konstantin Karasyov <konstantin.a.karasyov@...>, Lukas Hejtmanek <xhejtman@...>
Date: Tuesday, March 13, 2007 - 5:46 pm

It's fixed in git tree. Commit ff24ba74b6d3befbfbafa142582211b5a6095d45

--
Arkadiusz Miśkiewicz PLD/Linux Team
arekm / maven.pl http://ftp.pld-linux.org/
-

To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, <linux-pm@...>, <lenb@...>, <linux-acpi@...>, Tomas Janousek <tomi@...>, Thomas Gleixner <tglx@...>, Soeren Sonnenburg <kernel@...>, <jgarzik@...>, <linux-ide@...>, Michael S. Tsirkin <mst@...>, Arkadiusz Miskiewicz <arekm@...>, Konstantin Karasyov <konstantin.a.karasyov@...>, Lukas Hejtmanek <xhejtman@...>
Date: Tuesday, March 13, 2007 - 2:14 pm

To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, <linux-pm@...>, <lenb@...>, <linux-acpi@...>
Date: Tuesday, March 13, 2007 - 9:29 am

seems to be fixed in 2.6.21-rc3

--
Lukáš Hejtmánek
-

To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Dave Jones <davej@...>, Eric W. Biederman <ebiederm@...>, Rafael J. Wysocki <rjw@...>, <pavel@...>, <linux-pm@...>, <gregkh@...>, <linux-pci@...>, <lenb@...>, <linux-acpi@...>, Jens Axboe <jens.axboe@...>, Jeff Chua <jeff.chua.linux@...>, Ray Lee <ray-lk@...>, Alexey Starikovskiy <alexey.y.starikovskiy@...>
Date: Tuesday, March 13, 2007 - 8:50 am

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.

Subject : ThinkPad X60: resume no longer works (PCI related?)
References : http://lkml.org/lkml/2007/3/13/3
Submitter : Dave Jones <davej@redhat.com>
Caused-By : PCI merge
commit 78149df6d565c36675463352d0bfe0000b02b7a7
Handled-By : Eric W. Biederman <ebiederm@xmission.com>
Rafael J. Wysocki <rjw@sisk.pl>
Status : problem is being debugged

Subject : ThinkPad doesn't resume from suspend to RAM
References : http://lkml.org/lkml/2007/2/27/80
http://lkml.org/lkml/2007/2/28/348
Submitter : Jens Axboe <jens.axboe@oracle.com>
Jeff Chua <jeff.chua.linux@gmail.com>
Status : unknown

Subject : suspend to disk hangs
References : http://lkml.org/lkml/2007/3/6/142
Submitter : Jeff Chua <jeff.chua.linux@gmail.com>
Status : unknown

Subject : laptop immediately resumes after suspend
References : http://lkml.org/lkml/2007/3/8/469
Submitter : Ray Lee <ray-lk@madrabbit.org>
Caused-By : Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
commit ed41dab90eb40ac4911e60406bc653661f0e4ce1
Handled-By : Len Brown <lenb@kernel.org>
Patch : http://lkml.org/lkml/2007/3/12/228
Status : patch available
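
For anyone tracking these lists by script, the entries follow a simple "Key : value" layout, with extra lines (a commit id, a second handler) continuing the previous field. A minimal parsing sketch follows; the function name, the regular expression and the set of recognised keys are assumptions based only on the entries visible in this thread.

```python
import re

# A field line looks like "Key : value"; URLs such as "http://..." do not
# match because the colon must be surrounded by whitespace.
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z-]*)\s+:\s+(.*)$")
KNOWN_KEYS = {"Subject", "References", "Submitter", "Caused-By",
              "Handled-By", "Patch", "Status"}


def parse_entries(text):
    """Parse regression entries into a list of {key: [values]} dicts."""
    entries, current, last_key = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current entry.
            if current:
                entries.append(current)
            current, last_key = None, None
            continue
        match = FIELD_RE.match(line)
        if match and match.group(1) in KNOWN_KEYS:
            key, value = match.group(1), match.group(2).strip()
            if key == "Subject" and current:
                # A new Subject starts a new entry even without a blank line.
                entries.append(current)
                current = None
            if current is None:
                current = {}
            current.setdefault(key, []).append(value)
            last_key = key
        elif current is not None and last_key is not None:
            # Continuation line: attach it to the most recent field.
            current[last_key].append(line)
    if current:
        entries.append(current)
    return entries


SAMPLE = """\
Subject : laptop immediately resumes after suspend
References : http://lkml.org/lkml/2007/3/8/469
Submitter : Ray Lee <ray-lk@madrabbit.org>
Caused-By : Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
commit ed41dab90eb40ac4911e60406bc653661f0e4ce1
Handled-By : Len Brown <lenb@kernel.org>
Patch : http://lkml.org/lkml/2007/3/12/228
Status : patch available
"""

if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry["Subject"][0], "->", entry["Status"][0])
```

Keeping every field as a list is a deliberate choice here, since several entries in these mails name more than one submitter, reference or handler.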

-

To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Oliver Pinter <oliver.pntr@...>, Sid Boyce <g3vbv@...>, Pavel Machek <pavel@...>, <drzeus-mmc@...>, Mark Lord <lkml@...>, Jim Radford <radford@...>, Oliver Neukum <oneukum@...>, <greg@...>, <linux-usb-devel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Takashi Iwai <tiwai@...>, <perex@...>, <alsa-devel@...>, Randy Cushman <rcushman_linux@...>
Date: Tuesday, March 13, 2007 - 8:49 am

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.

Subject : crashes in KDE
References : http://bugzilla.kernel.org/show_bug.cgi?id=8157
Submitter : Oliver Pinter <oliver.pntr@gmail.com>
Status : unknown

Subject : kwin dies silently
References : http://lkml.org/lkml/2007/2/28/112
Submitter : Sid Boyce <g3vbv@blueyonder.co.uk>
Status : unknown

Subject : mmc card reader no longer works
References : http://lkml.org/lkml/2007/2/27/91
Submitter : Pavel Machek <pavel@ucw.cz>
Handled-By : Oliver Neukum <oneukum@suse.de>
Status : unknown

Subject : USB: Oops when connecting USB 1.1 docks
References : http://lkml.org/lkml/2007/3/4/266
Submitter : Mark Lord <lkml@rtr.ca>
Caused-By : Jim Radford <radford@blackbean.org>
commit d9a7ecacac5f8274d2afce09aadcf37bdb42b93a
Handled-By : Oliver Neukum <oneukum@suse.de>
Jim Radford <radford@blackbean.org>
Status : problem is being debugged

Subject : snd_intel8x0: divide error: 0000
References : http://lkml.org/lkml/2007/3/5/252
Submitter : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Handled-By : Takashi Iwai <tiwai@suse.de>
Status : submitter was asked to test a patch

Subject : snd-intel8x0: no 3d surround sound
References : http://lkml.org/lkml/2007/3/5/164
Submitter : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Caused-By : Randy Cushman <rcushman_linux@earthlink.net>
commit 831466f4ad2b5fe23dff77edbe6a7c244435e973
Handled-By : Randy Cushman <rcushman_linux@earthlink.net>
Takashi ...

To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, <perex@...>, <alsa-devel@...>, Randy Cushman <rcushman_linux@...>
Date: Tuesday, March 13, 2007 - 9:40 am


Already fixed. The patch is in the ALSA HG tree, but not synced to
git...
Jaroslav, could you prepare and send a push request ASAP, please?

thanks,

Takashi
-

To: Pavel Machek <pavel@...>
Cc: Adrian Bunk <bunk@...>, Linux Kernel Mailing List <linux-kernel@...>, Oliver Neukum <oneukum@...>
Date: Tuesday, March 13, 2007 - 9:08 am

First I heard of this. The error report is a bit thin so Pavel will need to
elaborate a bit more.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
-

To: Pierre Ossman <drzeus-mmc@...>
Cc: Pavel Machek <pavel@...>, Adrian Bunk <bunk@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Tuesday, March 13, 2007 - 9:36 am

The device is a USB serial device. USB serial was known to have issues
in the version where this happened. As far as I know the bug has not been
replicated after those bugs were fixed.

Regards
Oliver
-

To: Oliver Neukum <oneukum@...>
Cc: Pierre Ossman <drzeus-mmc@...>, Adrian Bunk <bunk@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Tuesday, March 13, 2007 - 2:11 pm

Ahha, now I see where the confusion comes from.

No, the reader is not a serial device, it is the reader built into the X60.
The USB serial device (Siemens SX1) has a separate problem.

Device is

15:00.2 Generic system peripheral [0805]: Ricoh Co Ltd R5C822
SD/SDIO/MMC/MS/MSPro Host Adapter (rev 18)

root@amd:~# ls -al /dev/mmc
brw-r--r-- 1 root root 251, 0 Nov 5 16:57 /dev/mmc
...

...anything else I should try? Card is obviously detected, but I can't
access it..

Uhuh. User error, let's close the report.

mmc changed the major to

236 mmc

... while it was something else in 2.6.20. Can we get stable device
allocation for mmc?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
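
The mismatch described above (a static /dev/mmc node created with major 251 while the driver now registers major 236) can be checked by comparing the node's major number with the "Block devices:" section of /proc/devices. The sketch below is illustrative only: the helper names are made up for this example, and the sample text is an assumed excerpt that reuses the two major numbers quoted in the message.

```python
import os


def block_majors(proc_devices_text):
    """Map driver name -> major number from the 'Block devices:' section.

    If a name is listed more than once, the last major seen wins.
    """
    majors, in_block = {}, False
    for line in proc_devices_text.splitlines():
        stripped = line.strip()
        if stripped.endswith("devices:"):
            # Section header: only the block section is of interest.
            in_block = stripped == "Block devices:"
            continue
        if in_block and stripped:
            number, name = stripped.split(None, 1)
            majors[name] = int(number)
    return majors


def node_matches_driver(node_major, driver, majors):
    """True if the device node's major equals the driver's current major."""
    return majors.get(driver) == node_major


def check_node(path, driver):
    """Check a real device node against the running kernel's /proc/devices."""
    with open("/proc/devices") as f:
        majors = block_majors(f.read())
    node_major = os.major(os.stat(path).st_rdev)
    return node_matches_driver(node_major, driver, majors)


# Assumed excerpt of /proc/devices for illustration.
SAMPLE = """\
Character devices:
  1 mem
  4 tty

Block devices:
  8 sd
236 mmc
"""

if __name__ == "__main__":
    majors = block_majors(SAMPLE)
    print("stale node (251):", node_matches_driver(251, "mmc", majors))
    print("fresh node (236):", node_matches_driver(236, "mmc", majors))
```

On a system with a static /dev, something like `check_node("/dev/mmc", "mmc")` returning False would point at a stale node rather than a driver failure.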
-

To: Pavel Machek <pavel@...>
Cc: Oliver Neukum <oneukum@...>, Adrian Bunk <bunk@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Tuesday, March 13, 2007 - 3:07 pm

What kind of savages do not use udev these days?! ;)

I don't have the time and energy to jump through all the hoops required to get
an official number right now. Most users use udev and those that don't can use
the "major" parameter for mmc_block.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
-

To: Pierre Ossman <drzeus-mmc@...>
Cc: Oliver Neukum <oneukum@...>, Adrian Bunk <bunk@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Tuesday, March 13, 2007 - 4:05 pm

That's okay, but if one of those savages got major for you, would you
be willing to use it? :-).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

To: Pavel Machek <pavel@...>
Cc: Oliver Neukum <oneukum@...>, Adrian Bunk <bunk@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Tuesday, March 13, 2007 - 4:31 pm

Indeed I would.

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
-

To: Pierre Ossman <drzeus-mmc@...>
Cc: Pavel Machek <pavel@...>, Oliver Neukum <oneukum@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Tuesday, March 13, 2007 - 3:15 pm

Those whose Linux installation predates the devfs hype
and postdates the devfs hype
and predates the udev hype
and will postdate the udev hype
and predates the next hype

cu
Adri "static /dev" an

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: <linux-kernel@...>
Date: Tuesday, March 13, 2007 - 3:12 pm

hi,

i don't know if you ever used linux on embedded devices like set-top-boxes.

you have a mostly fixed device infrastructure on those devices.

even if you call it a "kind of savage",
using udev there instead of fixed major device numbers is crap.

best regards
marcel

-

To: <linux-kernel@...>
Date: Thursday, March 8, 2007 - 1:28 pm

(Dropped LKML, whoops.)

Robert and Jeff already know about these, but I thought I'd send out a
reminder.

ata2: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000
status 0x500 next cpb count 0x0 next cpb idx 0x0
ata2: CPB 0: ctl_flags 0xd, resp_flags 0x1
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd 35/00:30:b5:c1:8f/00:01:01:00:00/e0 tag 0 cdb 0x0 data 155648 out
res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support
DPO or FUA

They didn't happen (or didn't happen as frequently) in 2.6.20; it's a serious
bug. Happened in -rc2 and -rc3. A patch from Robert reverting
721449bf0d51213fe3abf0ac3e3561ef9ea7827a seems to make them go away.
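
[Editorial sketch, not part of the original message: the "cmd" line in the log above can be decoded by hand. This assumes libata prints the taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device; the struct and function names below are illustrative only. Under that assumption the command byte 0x35 is WRITE DMA EXT, and the sector count 0x0130 times 512 bytes gives 155648, which agrees with "data 155648 out" in the log.]

```c
#include <stdint.h>
#include <stdio.h>

/* Taskfile bytes in the order they are assumed to appear on the "cmd" line. */
struct tf_dump {
    unsigned int command, feature, nsect, lbal, lbam, lbah;
    unsigned int hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
    unsigned int device;
};

/* Parse "cc/ff:nn:ll:mm:hh/FF:NN:LL:MM:HH/dd"; returns 0 on success, -1 otherwise. */
int parse_tf_dump(const char *s, struct tf_dump *tf)
{
    int n = sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
                   &tf->command, &tf->feature, &tf->nsect,
                   &tf->lbal, &tf->lbam, &tf->lbah,
                   &tf->hob_feature, &tf->hob_nsect,
                   &tf->hob_lbal, &tf->hob_lbam, &tf->hob_lbah,
                   &tf->device);

    return n == 12 ? 0 : -1;
}

/* 16-bit sector count of a 48-bit command; a count of 0 encodes 65536. */
uint32_t tf_sectors(const struct tf_dump *tf)
{
    uint32_t count = (tf->hob_nsect << 8) | tf->nsect;

    return count ? count : 65536;
}

/* 48-bit LBA assembled from the current and "hob" (high-order) bytes. */
uint64_t tf_lba48(const struct tf_dump *tf)
{
    return ((uint64_t)tf->hob_lbah << 40) | ((uint64_t)tf->hob_lbam << 32) |
           ((uint64_t)tf->hob_lbal << 24) | ((uint64_t)tf->lbah << 16) |
           ((uint64_t)tf->lbam << 8) | (uint64_t)tf->lbal;
}
```

For the string "35/00:30:b5:c1:8f/00:01:01:00:00/e0" this yields 304 sectors starting at LBA 0x018fc1b5, i.e. a 152 KiB write that timed out.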

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
-

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Jeff Garzik <jeff@...>
Date: Wednesday, March 7, 2007 - 10:22 am

Still having SATA breakage on resume:

Caught that one (from screen)

ATA: abnormal status 0x7F on port 0x000118cf
irq 21: nobody cared (try booting ......)
...
Disabling IRQ #21

During normal boot I see the "ATA: abnormal status 0x7F on port
0x000118cf" once, but there the system behaves normally

tglx

-

To: <tglx@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Jeff Garzik <jeff@...>
Date: Wednesday, March 7, 2007 - 1:42 pm

maybe that is also causing the hang I am still seeing with the full
config... :(
(no display, no usb device activation, but I tend to think the mbp wants
to access the hdd...)

SCSI device sda: write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: port failed to respond (30 secs, Status 0x80)
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ata1.00: qc timeout (cmd 0xa1)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: port failed to respond (30 secs, Status 0x80)
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7

Soeren
--
Sometimes, there's a moment as you're waking, when you become aware of
the real world around you, but you're still dreaming.
-

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Jeff Garzik <jeff@...>
Date: Wednesday, March 7, 2007 - 1:14 pm

I enabled ATA_DEBUG and hacked it to provide debug output only on
resume. Now the disk resumes and no stale interrupt happens.

Full log at: http://www.tglx.de/private/tglx/sata-2.6.21-rc3.log

Both states are fully reproducible. (DEBUG ON/OFF == GOOD/BAD)

/me continues the libata exploration

tglx

-

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Tim Waugh <tim@...>, <linux-parport@...>
Date: Wednesday, March 7, 2007 - 9:09 am

BTW. Does anyone care about parport console?
console=lp0 hangs since at least 2.6.18

Calling initcall 0xc0438939: pty_init+0x0/0x231()
Calling initcall 0xc0439235: lp_init_module+0x0/0x238()
lp: driver loaded but no devices found
Calling initcall 0xc043947f: mod_init+0x0/0x286()
intel_rng: FWH not detected
Calling initcall 0xc0439aa9: serial8250_init+0x0/0x114()
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
PM: Adding info for platform:serial8250
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
PM: Adding info for No Bus:ttyS0
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
PM: Adding info for No Bus:ttyS1
PM: Adding info for No Bus:ttyS2
PM: Adding info for No Bus:ttyS3
Calling initcall 0xc0439c6c: serial8250_pnp_init+0x0/0xf()
PM: Removing info for No Bus:ttyS0
00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
PM: Adding info for No Bus:ttyS0
PM: Removing info for No Bus:ttyS1
00:07: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
PM: Adding info for No Bus:ttyS1
Calling initcall 0xc0439c7b: serial8250_pci_init+0x0/0x16()
Calling initcall 0xc043a16d: parport_default_proc_register+0x0/0x16()
Calling initcall 0xc043a250: parport_pc_init+0x0/0x196()
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
lp0: using parport0 (interrupt-driven).

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3/git-...

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-

To: <linux-parport@...>
Cc: Michal Piotrowski <michal.k.k.piotrowski@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Tim Waugh <tim@...>
Date: Wednesday, March 7, 2007 - 1:14 pm

For the record, I used console=lp0 quite recently (stock 2.6.19 according to
the printout, running on i386) [to find out what was causing a panic that
immediately vanished off the top of the screen because of "atkbd.c: Spurious
ACK..."s from the flashing kb LEDs] and it worked just fine.

The parport-related lines went:

lp: driver loaded but no devices found
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE,EPP]
parport0: Printer, EPSON Stylus COLOR 600
lp0: using parport0 (interrupt-driven)
lp0: console ready

... then the kernel continued booting until the panic occurred (it was a silly
storage-related misconfig on my part).

If anyone wants me to try anything (newer kernel or different parport-related
BIOS settings, perhaps, to see if I can duplicate the problem?) and report
back, let me know.

Stephen
-

To: Stephen Mollett <molletts@...>
Cc: <linux-parport@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Tim Waugh <tim@...>
Date: Wednesday, March 7, 2007 - 1:35 pm

ISTR lp consoles block indefinitely until the printer is ready, so
if you ask for a lp console but don't have a working printer connected
it will hang.

--
Russell King
Linux kernel maintainer of:
2.6 ARM Linux - http://www.arm.linux.org.uk/
-

To: Michal Piotrowski <michal.k.k.piotrowski@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Tim Waugh <tim@...>, <linux-parport@...>
Date: Wednesday, March 7, 2007 - 12:25 pm

I do think we care, but I don't think anybody in particular feels singled

Ok, that's not exactly new then, which implies that not a *lot* of people
even care ;)

Do you think you'd be willing to try to figure out when it started? You
seem to be the first one to have even noticed.

(I tried to google it, and the most recent thing google finds is your
report, although I also saw a report from somebody who tried it under qemu in
July last year and also got a hang)

Looking through the history of the last few years (it's in git), I don't see
anything even *remotely* suspicious there, so it's probably either
(a) really old, and hasn't worked in a loong time and nobody just uses it
(b) something really stupid that happened while doing other cleanups (but
the changes in the last two years are *literally* just things like
removing devfs support)
(c) some infrastructure change that subtly broke lpconsole, probably
causing an oops during printk, which obviously results in a printk
itself, which thus hangs.

It would be good to get it fixed, although for obvious reasons it's not a
huge priority..

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, Rafael J. Wysocki <rjw@...>
Date: Wednesday, March 7, 2007 - 8:56 am

Hi,

I get this while
echo shutdown > /sys/power/disk; echo disk > /sys/power/state

BUG: using smp_processor_id() in preemptible [00000001] code: swsusp_shutdown/3359
caller is check_tsc_sync_source+0x1b/0xef
[<c010503d>] show_trace_log_lvl+0x1a/0x2f
[<c0105724>] show_trace+0x12/0x14
[<c01057d6>] dump_stack+0x16/0x18
[<c01f835e>] debug_smp_processor_id+0xa2/0xb4
[<c0113cc5>] check_tsc_sync_source+0x1b/0xef
[<c011367d>] __cpu_up+0x136/0x158
[<c0141aec>] _cpu_up+0x74/0xbf
[<c0141b5d>] cpu_up+0x26/0x38
[<c0141bbc>] enable_nonboot_cpus+0x4d/0x9a
[<c0146ae0>] pm_suspend_disk+0x11c/0x210
[<c014597e>] enter_state+0x50/0x1d0
[<c0145b84>] state_store+0x86/0x9c
[<c01a53d0>] subsys_attr_store+0x20/0x25
[<c01a54ea>] sysfs_write_file+0xc1/0xe9
[<c017199b>] vfs_write+0xaf/0x138
[<c0171f65>] sys_write+0x3d/0x61
[<c0104064>] syscall_call+0x7/0xb
=======================

l *check_tsc_sync_source+0x1b/0xef
0xc0113caa is in check_tsc_sync_source (/mnt/md0/devel/linux-git/arch/i386/kernel/../../x86_64/kernel/tsc_sync.c:99).
94 /*
95 * Source CPU calls into this - it waits for the freshly booted
96 * target CPU to arrive and then starts the measurement:
97 */
98 void __cpuinit check_tsc_sync_source(int cpu)
99 {
100 int cpus = 2;
101
102 /*
103 * No need to check if we already know that the TSC is not

echo platform > /sys/power/disk; echo disk > /sys/power/state
doesn't work (as always).

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3/boot...
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3/git-...

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-

To: Michal Piotrowski <michal.k.k.piotrowski@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, Rafael J. Wysocki <rjw@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Wednesday, March 7, 2007 - 12:34 pm

[ Ingo and Thomas added to Cc, because I think this is them.. ]

Ingo, I think this came in during commit 95492e4646, "x86: rewrite SMP TSC
sync code".

(Leaving the original message quoted in full for Ingo and Thomas, sorry
for the waste of bandwidth)

Linus

---
-

To: Linus Torvalds <torvalds@...>
Cc: Michal Piotrowski <michal.k.k.piotrowski@...>, Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, Rafael J. Wysocki <rjw@...>, Thomas Gleixner <tglx@...>
Date: Wednesday, March 7, 2007 - 1:12 pm

Michal, could you try the patch below?

Ingo

----------------------------->
Subject: [patch] CPU hotplug: call check_tsc_sync_source() with irqs off
From: Ingo Molnar <mingo@elte.hu>

check_tsc_sync_source() depends on being called with irqs disabled (it
checks whether the TSC is coherent across two specific CPUs). This is
incidentally true during bootup, but not during cpu hotplug __cpu_up().
This got found via smp_processor_id() debugging.

disable irqs explicitly and remove the unconditional enabling of
interrupts. Add touch_nmi_watchdog() to the cpu_online_map busy loop.

this bug is present both on i386 and on x86_64.

Reported-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/i386/kernel/smpboot.c | 16 ++++++++++------
arch/x86_64/kernel/smpboot.c | 5 ++++-
2 files changed, 14 insertions(+), 7 deletions(-)

Index: linux/arch/i386/kernel/smpboot.c
===================================================================
--- linux.orig/arch/i386/kernel/smpboot.c
+++ linux/arch/i386/kernel/smpboot.c
@@ -50,6 +50,7 @@
#include <linux/notifier.h>
#include <linux/cpu.h>
#include <linux/percpu.h>
+#include <linux/nmi.h>

#include <linux/delay.h>
#include <linux/mc146818rtc.h>
@@ -1283,8 +1284,9 @@ void __cpu_die(unsigned int cpu)

int __cpuinit __cpu_up(unsigned int cpu)
{
+ unsigned long flags;
#ifdef CONFIG_HOTPLUG_CPU
- int ret=0;
+ int ret = 0;

/*
* We do warm boot only on cpus that had booted earlier
@@ -1302,23 +1304,25 @@ int __cpuinit __cpu_up(unsigned int cpu)
/* In case one didn't come up */
if (!cpu_isset(cpu, cpu_callin_map)) {
printk(KERN_DEBUG "skipping cpu%d, didn't come online\n", cpu);
- local_irq_enable();
return -EIO;
}

- local_irq_enable();
-
per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
/* Unleash the CPU! */
cpu_set(cpu, smp_commenced_mask);

/*
- * Check TSC synch...

To: Linux Kernel Mailing List <linux-kernel@...>
Cc: Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Wednesday, March 7, 2007 - 3:12 pm

Hi,

I just tried linux-2.6.21-rc3 on my machine (P4HT 2.8GHz, with 512 MB)
with Tickless System (Dynamic Ticks) and High Resolution Timer Support
(.config attached).

The problem is that the kernel hangs on boot. I tried different
combinations of nohz and highres on the kernel command line.

The only combination that works is : nohz=off highres=off

I also tried compiling the kernel without Tickless and without High
Resolution Timer; this kernel works ok and is one of the first
kernels to suspend and resume from RAM. Congratulations ! ;p

I tried to compile the kernel with only Tickless System or only High
Resolution Timer; both hang on boot.

The hang is just after :
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18
ICH5: chipset revision 2
ICH5: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0x2040-0x2047, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0x2048-0x204f, BIOS settings: hdc:DMA, hdd:pio

And I have the message :
Switched to NOHZ mode on CPU #1
or
Switched to high resolution mode on CPU #1
Depending on the option enabled/disabled

What can I do to help find the bug ?

dmesg and .config of the system booted with nohz=off highres=off are
attached.

Regards
--
St

To: Stephane Casset <sept@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>
Date: Wednesday, March 7, 2007 - 3:52 pm

There should be no difference between compile time and runtime

Can you capture a boot log with highres and/or dynticks enabled ?

Enable CONFIG_SERIAL_8250_CONSOLE and add "console=ttyS0,115200" to the
commandline. Capture the output with minicom on a second box.

Also please enable CONFIG_MAGIC_SYSRQ and try to send a SysRq-T and a
SysRq-Q to the machine via keyboard or the serial line.

Thanks

tglx

-

To: Thomas Gleixner <tglx@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>
Date: Wednesday, March 7, 2007 - 5:16 pm

When the system hangs, the keyboard is dead :(

I just tried clocksource=acpi_pm and the hang disappears...

I tested 2.6.21-rc1, which also hangs but not always. When it hangs I
tried SysRq-T and got this; I noted in parentheses some values from when it
doesn't hang...

SysRq : Show Pending Timers
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: X
now at XXXXXXXXXXX nsecs
cpu: 0
clock 0:
.index: 0
.resolution: 10000000 nsecs / 1ns (when it doesn't hang)
.get_time: ktime_get_real
.offset: 0 nsecs
active timers:
clock 1:
.index: 1
.resolution: 10000000 nsecs / 1ns (when it doesn't hang)
.get_time: ktime_get
.offset: 0 nsecs
active timers:
.expires_next : 9223372036854775807 nsecs (something reasonable when not hanging)
Almost the same for cpu1
and

Tick Device: mode: 1
Clock Event Device: pit
max_delta_ns: 27461866
min_delta_ns: 12571
mult: 5124677
shift: 32
mode: 3
next_event: 9223372036854775807 nsecs
set_next_event: pit_next_event
set_mode: init_pit_timer
event_handler: tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000001
tick_broadcast_oneshot_mask: 00000000

Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 672715459
min_delta_ns: 1202
mult: 53557254
shift: 32
mode: 3
next_event: 84460000000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt

Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 672715459
min_delta_ns: 1202
mult: 53557254
shift: 32
mode: 3
next_event: 84790000000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt

So it seems that the clock source selection is not working properly or the pit
(the default clock source right ?) is not correctly initialised...
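
[Editorial sketch, not part of the original message: the mult/shift pairs in the dump can be sanity-checked by hand. This assumes the clock event convention ticks = (ns * mult) >> shift; the function names are illustrative, not kernel API. With the pit values (mult 5124677, shift 32) one second comes out at roughly 1.19 million ticks, consistent with the PC PIT input clock, and max_delta_ns lands just under 0x7fff ticks. The next_event value 9223372036854775807 is simply the largest signed 64-bit number, which suggests that no event was programmed on the pit at that point rather than a real deadline.]

```c
#include <stdint.h>

/* Convert nanoseconds to device ticks: (ns * mult) >> shift. */
uint64_t ns_to_ticks(uint64_t ns, uint32_t mult, uint32_t shift)
{
    return (ns * mult) >> shift;
}

/* Device frequency in Hz implied by mult/shift: ticks in one second. */
uint64_t implied_hz(uint32_t mult, uint32_t shift)
{
    return ns_to_ticks(1000000000ULL, mult, shift);
}

/* True if next_event is the largest signed 64-bit value, i.e. "never". */
int never_expires(int64_t next_event_ns)
{
    return next_event_ns == INT64_MAX;
}
```

The same check on the lapic entries (mult 53557254, shift 32) gives roughly 12.5 MHz, and their next_event values are finite, so by this reading only the pit has nothing scheduled.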

If you need the complete SYSRQ...

To: Stephane Casset <sept@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Ingo Molnar <mingo@...>, Len Brown <lenb@...>, Arjan van de Ven <arjan@...>
Date: Wednesday, March 7, 2007 - 6:09 pm

Hrmpf. Netconsole should work.

Enable CONFIG_NETCONSOLE and compile the network driver into your
kernel. See Documentation/networking/netconsole.txt for the kernel
command line option.

------------------------------^

ACPI does only take care of one CPU

ACPI: processor limited to max C-state 1
ACPI: CPU0 (power states: C1[C1] C3[C3])
ACPI: Processor [CPU0] (supports 8 throttling states)

but there is no entry for the second CPU.

Also it seems that the power state limit is possibly ignored.

That would explain the hang, as TSC and local APIC might get stuck.

Broken BIOS/ACPI I fear. Can you please go to

http://www.linuxfirmwarekit.org/download.php

Not now.

tglx

-

To: Ingo Molnar <mingo@...>
Cc: Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, Rafael J. Wysocki <rjw@...>, Thomas Gleixner <tglx@...>
Date: Wednesday, March 7, 2007 - 1:45 pm

I think that this patch fixes the problem. Thanks!

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-

To: Greg KH <greg@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Wednesday, March 7, 2007 - 6:25 am

Greg, I think we should revert that patch in the 2.6.20.x stable series too,
as get_order is broken there as well, causing random kernel memory
corruption every now and then, among other things.

Cheers,
Ben

-

To: Benjamin Herrenschmidt <benh@...>
Cc: Greg KH <greg@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, March 7, 2007 - 11:39 am

Did you confirm that that was indeed the cause of the problem you saw?

As far as I can tell, the bug (because it tested the wrong #define) would
only affect the constant-size case, and only for something larger than a
single page, and only for a non-power-of-two size. So it looked fairly
hard to trigger, if only because all the obvious constants I saw seemed
to already be powers-of-two..

So did you hunt it down to a particular case where it triggers?

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Greg KH <greg@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Thursday, March 8, 2007 - 4:08 am

Well, at least one of the problems I caught with my ppc32 implementation
of DEBUG_PAGEALLOC, yes. PowerPC dma_alloc_coherent, on machines with
cache consistent PCI DMA, would use get_order to allocate pages and then
memset over the size passed in. The ide-pmac driver, among others, would
trigger that bug by asking for 0x1020 bytes while get_order only
returned 0. (I should look into making the ide-pmac driver allocate <=
4K but that's a different matter).

Yup, the above. Calls to dma_alloc_coherent with a constant size that
is not a multiple of the page size and larger than one page. (Our
dma_alloc_coherent implementation on 32 bits is inline).
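
[Editorial sketch, not part of the original message: to make the failure concrete, here is a reference version of what the order computation has to return. It is not the kernel's get_order() itself; it assumes 4 KiB pages, and order_for_size() and the SKETCH_ macros are made-up names. A 0x1020-byte request needs order 1 (two pages). With an order of 0 only one page is allocated, so a memset over 0x1020 bytes runs 0x20 bytes past the end of the allocation.]

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Smallest order such that (page size << order) covers `size` bytes. */
int order_for_size(size_t size)
{
    int order = 0;

    while ((SKETCH_PAGE_SIZE << order) < size)
        order++;
    return order;
}
```

Sizes up to 0x1000 map to order 0, 0x1001 through 0x2000 map to order 1, and so on; any constant size just above a page boundary, like the 0x1020 here, is the kind of input that exposes an off-by-one order.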

Ben.

-

To: Linus Torvalds <torvalds@...>
Cc: Benjamin Herrenschmidt <benh@...>, Greg KH <greg@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, March 7, 2007 - 4:52 pm

IIRC, it crashed on boot in the powerpc iommu code when slab
debugging is enabled. Not sure if it was on Cell or on benh's
powerbook though.

Arnd <><
-

To: Arnd Bergmann <arnd@...>
Cc: Linus Torvalds <torvalds@...>, Greg KH <greg@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Thursday, March 8, 2007 - 4:10 am

Not iommu code, but dma_alloc_coherent() for non-iommu 32 bits
machines :-) Oh and it wasn't slab but DEBUG_PAGEALLOC :-)

Ben.

-

To: Benjamin Herrenschmidt <benh@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Wednesday, March 7, 2007 - 9:26 am

Now added to the -stable tree, thanks for pointing it out to me.

greg k-h
-

To: Greg KH <greg@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, Adrian Bunk <bunk@...>
Date: Wednesday, March 7, 2007 - 10:15 am

Greg / Adrian,

I didn't see anything in -rc3 to address the USB hub/serial crashes
reported here for -rc2. What's the status for those, or who should
I be pinging to get them fixed?

Thanks

-

To: Mark Lord <lkml@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, Adrian Bunk <bunk@...>
Date: Wednesday, March 7, 2007 - 10:22 am

I have a series of USB bugfixes that need to get sent to Linus that
should fix the serial issues. I'll get to them after I drag this next
-stable release out the door...

thanks,

greg k-h
-
