We've finally, hopefully, started to put a dent in the regressions,
especially the suspend/resume problems introduced since 2.6.20.

So 2.6.21-rc3 is out there now, and there's some hope that it will work
more widely than -rc1 and -rc2 did. Please do give it a good testing, and
update Adrian and the mailing list (and me) about any regressions
(hopefully many more of the "it's fixed now" than other kinds, but all
regressions are interesting).

The appended shortlog gives a reasonable overview. In general we're
definitely calming down, and most of the changes are fairly small and
obvious fixes.

Let's keep the fixes to a minimum, especially since I'm planning on biting
people's heads off if I get any more pull requests for things that aren't
real and obvious fixes.

Linus
---
Adam Litke (1):
      Fix get_unmapped_area and fsync for hugetlb shm segments

Adrian Bunk (8):
      HID: hid-debug.c should #include <linux/hid-debug.h>
      arch/arm26/kernel/entry.S: remove dead code
      make ipc/shm.c:shm_nopage() static
      mm/{,tiny-}shmem.c cleanups
      drivers/video/sm501fb.c: make 4 functions static
      fix the SYSCTL=n compilation
      arch/i386/kernel/vmi.c must #include <asm/kmap_types.h>
      remove arch/i386/kernel/tsc.c:custom_sched_clock

Ahmed S. Darwish (1):
      KVM: Use ARRAY_SIZE macro instead of manual calculation.

Akira Iguchi (1):
      scc_pata: bugfix for checking DMA IRQ status

Alan Cox (4):
      libata-core: Fix simplex handling
      pata_qdi: Fix initialisation
      siimage: DRAC4 note
      ide: remove a ton of pointless #undef REALLY_SLOW_IO

Alexandr Andreev (1):
      [IA64] sync compat getdents

Alexey Dobriyan (1):
      geode-aes: use unsigned long for spin_lock_irqsave

Allan Graves (1):
      uml: enable RAW

Andres Salomon (3):
      i386: make x86_64 tsc header require i386 rather than vice-versa
      hrtimers: fix HRTIMER_CB_IRQSAFE_NO_SOFTIRQ description
      hrtimers: hrtimer_clock...
This email lists known regressions in Linus' tree compared to 2.6.20
with patches available.

If possible, the patches should be included in 2.6.21-rc4 to reduce
the number of known regressions in -rc4 a little bit.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in some other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.

Subject : USB: Oops when connecting USB 1.1 docks
References : http://lkml.org/lkml/2007/3/4/266
Submitter : Mark Lord <lkml@rtr.ca>
Caused-By : Jim Radford <radford@blackbean.org>
commit d9a7ecacac5f8274d2afce09aadcf37bdb42b93a
Handled-By : Oliver Neukum <oneukum@suse.de>
Jim Radford <radford@blackbean.org>
Patch : http://lkml.org/lkml/2007/3/13/217
Status : patch available

Subject : snd-intel8x0: no 3d surround sound
References : http://lkml.org/lkml/2007/3/5/164
Submitter : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Caused-By : Randy Cushman <rcushman_linux@earthlink.net>
commit 831466f4ad2b5fe23dff77edbe6a7c244435e973
Handled-By : Takashi Iwai <tiwai@suse.de>
Status : patch available

Subject : AMD Elan: Crash after "Allocating PCI resources"
References : http://bugzilla.kernel.org/show_bug.cgi?id=8161
Submitter : Vladimir Brik <no.hope@gmail.com>
Handled-By : Andi Kleen <ak@muc.de>
Status : patch available

Subject : laptop immediately resumes after suspend
References : http://lkml.org/lkml/2007/3/8/469
Submitter : Ray Lee <ray-lk@madrabbit.org>
Caused-By : Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
commit ed41dab90eb40ac4911e60406bc653661f0e4ce1
Handled-By : Len Brown <lenb@kernel.org>
Patch : [ message continues ]
Here is a quick summary of the regressions I am looking at.

- Currently we appear to have a pid leak in tty_io.c
  http://lkml.org/lkml/2007/3/8/222

- There is a missing INIT_WORK in vt.c that causes an oops
  when we attempt to use SAK.
  http://lkml.org/lkml/2007/3/11/148

- We have a network ABI regression caused by the latest sysfs
  changes to net-sysfs.c. In particular we now cannot rename network
  devices if our destination name happens to be the name of a sysfs file that
  the network device appears in, and if we try, the kernel gets very
  confused and we lose access to the network device.

  Do we just want to revert commit 43cb76d91ee85f579a69d42bc8efc08bac560278?
  Greg has been working on this off and on and has not found a
  simple solution yet.

- pci_save_state and pci_restore_state are broken, and have been for a
  while, if used on anything besides plain pci (pci-x, pci-e and msi)
  and are not used in pairs. (gregkh and Andrew have the patches to
  correct this.)

- I am still confirming that I have fixed all of the irq handling
  problems that resulted in the "No irq for vector" message. I think
  I have, but I have at least one indirect bug report that I'm still
  following up on.

Eric
-
I do not think this should be reverted, as the odds that someone will
rename their network device to be "irq" or something else that is in the
pci device's directory are pretty slim. It also only shows up if
CONFIG_SYSFS_DEPRECATED is disabled, which is not the common option.

But I am still working on it, I sent you and Kay a patch that, while it
I think these are already in Linus's tree right now, right?
thanks,
greg k-h
-
Oops I missed that.
Eric
-
Yes. I just wanted some more testing of it, and while I didn't hear much,
at least Auke added his ack, and the old state was clearly broken, so they
got applied yesterday.

Linus
-
This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in some other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.

Subject : AMD Elan: Crash after "Allocating PCI resources"
References : http://bugzilla.kernel.org/show_bug.cgi?id=8161
Submitter : Vladimir Brik <no.hope@gmail.com>
Handled-By : Andi Kleen <ak@muc.de>
Status : problem is being debugged

Subject : x86_64: boot hangs unless CONFIG_PCIEPORTBUS=n and acpi=off
References : http://bugzilla.kernel.org/show_bug.cgi?id=8162
Submitter : Randy Dunlap <randy.dunlap@oracle.com>
Status : unknown

Subject : ACPI regression with noapic
References : http://lkml.org/lkml/2007/3/8/468
Submitter : Ray Lee <ray-lk@madrabbit.org>
Status : unknown

Subject : acpi_serialize locks system during boot
References : http://bugzilla.kernel.org/show_bug.cgi?id=8171
Submitter : Colchao <colchaodemola@gmail.com>
Status : unknown

Subject : NCQ problem with ahci and Hitachi drive (ACPI related)
References : http://lkml.org/lkml/2007/3/4/178
http://lkml.org/lkml/2007/3/9/475
Submitter : Mathieu Bérard <Mathieu.Berard@crans.org>
Handled-By : Tejun Heo <htejun@gmail.com>
Status : unknown

Subject : kernels fail to boot with drives on ATIIXP controller
(ACPI/IRQ related)
References : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229621
http://lkml.org/lkml/2007/3/4/257
Submitter : Michal Jaegermann <michal@ellpspace.math.ualberta.ca>
Status : unknown

Subject : libata: PATA UDMA/100 configured as UDMA/33
References : http://lkml.org/lkml/2007/2/20/294
[ message continues ]
It uses RDTSC when it shouldn't. Already got a fix for that.
-Andi
-
Some cases should be fixed now but probably not all (eg the Nvidia one)
-
This regression is still present in 2.6.21-rc3-g8b9909de (pulled from
Linus' tree less than one hour ago).

Fabio
-
This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in some other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.

Subject : ipv6 crash
References : http://lkml.org/lkml/2007/3/10/2
Submitter : Len Brown <lenb@kernel.org>
Status : unknown

Subject : ThinkPad X60: bluetooth hardlocks
References : http://lkml.org/lkml/2007/3/2/85
Submitter : Pavel Machek <pavel@ucw.cz>
Handled-By : Marcel Holtmann <marcel@holtmann.org>
Status : unknown

Subject : forcedeth: skb_over_panic
References : http://bugzilla.kernel.org/show_bug.cgi?id=8058
Submitter : Albert Hopkins <kernel@marduk.letterboxes.org>
Handled-By : Ayaz Abdulla <aabdulla@nvidia.com>
Status : problem is being debugged
-
On Tue, 13 Mar 2007 13:50:03 +0100,
Does this still happen with -rc3? I'd have thought Mark's patch in
0de1517e23c2e28d58a6344b97a120596ea200bb fixed that...
-
Pavel? Could you retest this now on a ThinkPad X60 ?
???
-
I can confirm it is fixed.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in some other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.

Subject : Dynticks and High resolution Timer hanging the system
References : http://lkml.org/lkml/2007/3/7/504
Submitter : Stephane Casset <sept@logidee.com>
Caused-By : Thomas Gleixner <tglx@linutronix.de>
Status : unknown

Subject : Clocksource tsc unstable (delta = -154983451 ns)
References : http://lkml.org/lkml/2007/3/9/271
Submitter : Jiri Slaby <jirislaby@gmail.com>
Status : unknown

Subject : hrtimer_switch_to_hres():
wrong tick_init_highres() return value handling
References : http://lkml.org/lkml/2007/3/6/262
Submitter : Linus Torvalds <torvalds@linux-foundation.org>
Caused-By : Thomas Gleixner <tglx@linutronix.de>
commit 54cdfdb47f73b5af3d1ebb0f1e383efbe70fde9e
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status : unknown

Subject : soft lockup detected on CPU#0
References : http://lkml.org/lkml/2007/3/3/152
Submitter : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar <mingo@elte.hu>
Status : unknown

Subject : dynticks makes ksoftirqd1 use unreasonable amount of cpu time
References : http://bugzilla.kernel.org/show_bug.cgi?id=8100
Submitter : Emil Karlson <jkarlson@cc.hut.fi>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status : problem is being debugged

Subject : system doesn't come out of suspend (CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/2/22/391
Submitter : Michael S. Tsirkin <mst@mellanox.co.il>
...
That's not a regression. That's an informational message, printed when the
TSC watchdog detects that the TSC is unreliable.

tglx
-
Looking at [1], there also seems to be a probably related "doesn't boot"
problem.

My first guess would be commit 6bb74df481223731af6c7e0ff3adb31f6442cfcd
"clocksource init adjustments (fix bug #7426)".

Jiri, is the message also present with 2.6.21-rc2 (at a different place

cu
Adrian

[1] http://lkml.org/lkml/2007/3/13/219

--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
-
With the current git of today the halt on boot is gone. I am running
it now ...

Flo
--
Florian Lohoff flo@rfc822.org +49-171-2280134
Those who would give up a little freedom to get a little
security shall soon have neither - Benjamin Franklin
I'm really curious what made it go away.
tglx
-
Yes, it's present there too, a few lines below the place where it appears
in -rc3.

regards,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E
-
Linus merged the original patch, which solved the real problem.
He just gave me a lesson how to do it right next time.
tglx
-
This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in some other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.

Subject : resume: slab error in verify_redzone_free(): cache `size-512':
memory outside object was overwritten
References : http://lkml.org/lkml/2007/2/24/41
Submitter : Pavel Machek <pavel@ucw.cz>
Status : unknown

Subject : beeps get longer after suspend
References : http://lkml.org/lkml/2007/2/26/276
Submitter : Pavel Machek <pavel@ucw.cz>
Status : unknown

Subject : suspend/resume hangs until keypress
References : http://bugzilla.kernel.org/show_bug.cgi?id=8181
Submitter : Tomas Janousek <tomi@nomi.cz>
Status : unknown

Subject : SATA breakage on resume
References : http://lkml.org/lkml/2007/3/7/233
Submitter : Thomas Gleixner <tglx@linutronix.de>
Soeren Sonnenburg <kernel@nn7.de>
Status : unknown

Subject : first disk access after resume takes several minutes
References : http://lkml.org/lkml/2007/3/8/117
Submitter : Michael S. Tsirkin <mst@mellanox.co.il>
Status : unknown

Subject : after resume: X hangs after drawing a couple of windows
References : http://lkml.org/lkml/2007/3/8/117
Submitter : Michael S. Tsirkin <mst@mellanox.co.il>
Status : unknown

Subject : ThinkPad Z60m: usb mouse stops working after suspend to ram
References : http://lkml.org/lkml/2007/2/21/413
http://lkml.org/lkml/2007/2/28/172
Submitter : Arkadiusz Miskiewicz <arekm@maven.pl>
Caused-By : Konstantin Karasyov <konstantin.a.karasyov@intel.com>
commit 0a6139027f3986162233adc17285151e78b39cac
Handled-By : Kons...
It's fixed in git tree. Commit ff24ba74b6d3befbfbafa142582211b5a6095d45
--
Arkadiusz Miśkiewicz PLD/Linux Team
arekm / maven.pl http://ftp.pld-linux.org/
-
Seems fixed in -rc3.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
seems to be fixed in 2.6.21-rc3
--
Lukáš Hejtmánek
-
This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in some other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.

Subject : ThinkPad X60: resume no longer works (PCI related?)
References : http://lkml.org/lkml/2007/3/13/3
Submitter : Dave Jones <davej@redhat.com>
Caused-By : PCI merge
commit 78149df6d565c36675463352d0bfe0000b02b7a7
Handled-By : Eric W. Biederman <ebiederm@xmission.com>
Rafael J. Wysocki <rjw@sisk.pl>
Status : problem is being debugged

Subject : ThinkPad doesn't resume from suspend to RAM
References : http://lkml.org/lkml/2007/2/27/80
http://lkml.org/lkml/2007/2/28/348
Submitter : Jens Axboe <jens.axboe@oracle.com>
Jeff Chua <jeff.chua.linux@gmail.com>
Status : unknown

Subject : suspend to disk hangs
References : http://lkml.org/lkml/2007/3/6/142
Submitter : Jeff Chua <jeff.chua.linux@gmail.com>
Status : unknown

Subject : laptop immediately resumes after suspend
References : http://lkml.org/lkml/2007/3/8/469
Submitter : Ray Lee <ray-lk@madrabbit.org>
Caused-By : Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
commit ed41dab90eb40ac4911e60406bc653661f0e4ce1
Handled-By : Len Brown <lenb@kernel.org>
Patch : http://lkml.org/lkml/2007/3/12/228
Status : patch available
-
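The entries in these lists share a simple "Field : value" layout, with continuation lines (a second handler, a commit id, a second URL) carrying no field name of their own. A minimal Python sketch of reading such entries into records is shown here; the function name `parse_entries` and the fixed set of field names are assumptions made for illustration and are not part of any tool used on the list.

```python
import re

# Field names as they appear in the regression lists in this thread.
FIELD_RE = re.compile(
    r"^(Subject|References|Submitter|Caused-By|Handled-By|Patch|Status)\s*:\s*(.*)$"
)


def parse_entries(text):
    """Return a list of dicts mapping each field name to a list of values.

    A new entry starts at every "Subject" line. A line without a known
    field name is treated as a continuation of the previous field.
    """
    entries = []
    current = None
    last_field = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = FIELD_RE.match(line)
        if match:
            field, value = match.groups()
            if field == "Subject":
                current = {}
                entries.append(current)
            if current is None:
                continue  # a field before any Subject line: ignore it
            current.setdefault(field, []).append(value)
            last_field = field
        elif current is not None and last_field is not None:
            current[last_field].append(line)
    return entries


SAMPLE = """\
Subject    : ThinkPad X60: resume no longer works (PCI related?)
References : http://lkml.org/lkml/2007/3/13/3
Submitter  : Dave Jones <davej@redhat.com>
Caused-By  : PCI merge
             commit 78149df6d565c36675463352d0bfe0000b02b7a7
Handled-By : Eric W. Biederman <ebiederm@xmission.com>
             Rafael J. Wysocki <rjw@sisk.pl>
Status     : problem is being debugged

Subject    : suspend to disk hangs
References : http://lkml.org/lkml/2007/3/6/142
Submitter  : Jeff Chua <jeff.chua.linux@gmail.com>
Status     : unknown
"""

if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry["Subject"][0], "->", entry["Status"][0])
```

Keeping every field as a list avoids special-casing entries that have two submitters or two handlers.
-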
This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in some other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.

Subject : crashes in KDE
References : http://bugzilla.kernel.org/show_bug.cgi?id=8157
Submitter : Oliver Pinter <oliver.pntr@gmail.com>
Status : unknown

Subject : kwin dies silently
References : http://lkml.org/lkml/2007/2/28/112
Submitter : Sid Boyce <g3vbv@blueyonder.co.uk>
Status : unknown

Subject : mmc card reader no longer works
References : http://lkml.org/lkml/2007/2/27/91
Submitter : Pavel Machek <pavel@ucw.cz>
Handled-By : Oliver Neukum <oneukum@suse.de>
Status : unknown

Subject : USB: Oops when connecting USB 1.1 docks
References : http://lkml.org/lkml/2007/3/4/266
Submitter : Mark Lord <lkml@rtr.ca>
Caused-By : Jim Radford <radford@blackbean.org>
commit d9a7ecacac5f8274d2afce09aadcf37bdb42b93a
Handled-By : Oliver Neukum <oneukum@suse.de>
Jim Radford <radford@blackbean.org>
Status : problem is being debugged

Subject : snd_intel8x0: divide error: 0000
References : http://lkml.org/lkml/2007/3/5/252
Submitter : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Handled-By : Takashi Iwai <tiwai@suse.de>
Status : submitter was asked to test a patch

Subject : snd-intel8x0: no 3d surround sound
References : http://lkml.org/lkml/2007/3/5/164
Submitter : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Caused-By : Randy Cushman <rcushman_linux@earthlink.net>
commit 831466f4ad2b5fe23dff77edbe6a7c244435e973
Handled-By : Randy Cushman <rcushman_linux@earthlink.net>
Takashi ...
At Tue, 13 Mar 2007 13:49:57 +0100,
Already fixed. The patch is in the ALSA HG tree, but not yet synced to
git...
Jaroslav, could you prepare and send a push request ASAP, please?

thanks,
Takashi
-
First I heard of this. The error report is a bit thin so Pavel will need to
elaborate a bit more.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
-
The device is a USB serial device. USB serial was known to have issues
in the version where this happened. As far as I know the bug has not been
replicated after these bugs were fixed.

Regards
Oliver
-
Ahha, now I see where the confusion comes from.
No, the reader is not a serial device, it is the reader built into the X60.
The USB serial device (Siemens SX1) has a separate problem.

The device is
15:00.2 Generic system peripheral [0805]: Ricoh Co Ltd R5C822
SD/SDIO/MMC/MS/MSPro Host Adapter (rev 18)

root@amd:~# ls -al /dev/mmc
brw-r--r-- 1 root root 251, 0 Nov 5 16:57 /dev/mmc
...

...anything else I should try? Card is obviously detected, but I can't
access it.

Uhuh. User error, let's close the report.
mmc changed the major to
236 mmc
... while it was something else in 2.6.20. Can we get stable device
allocation for mmc?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
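The mismatch Pavel describes (a static /dev/mmc node created with major 251 while the kernel now registers mmc under major 236) can be checked mechanically. The following is a minimal Python sketch, assuming the "236 mmc" line he quotes is in the format of the "Block devices:" section of /proc/devices; the helper names `block_majors`, `node_major` and `node_is_stale` are hypothetical.

```python
import os


def block_majors(proc_devices_text):
    """Map driver name -> major from the 'Block devices:' section."""
    majors = {}
    in_block = False
    for line in proc_devices_text.splitlines():
        line = line.strip()
        if line.endswith("devices:"):
            # Section headers are "Character devices:" and "Block devices:".
            in_block = line.startswith("Block")
            continue
        if in_block and line:
            number, name = line.split(None, 1)
            majors[name] = int(number)
    return majors


def node_major(path):
    """Major number stored in an existing device node."""
    return os.major(os.stat(path).st_rdev)


def node_is_stale(major_in_node, driver, majors):
    """True when the node's major differs from the one the driver registered."""
    return driver in majors and majors[driver] != major_in_node


# Shortened, made-up excerpt; only the "236 mmc" line comes from the thread.
SAMPLE = """\
Character devices:
  4 tty

Block devices:
  8 sd
236 mmc
"""

if __name__ == "__main__":
    majors = block_majors(SAMPLE)
    # 251 is the major shown by "ls -al /dev/mmc" in the message above.
    print(node_is_stale(251, "mmc", majors))
```

On a live system the text would come from reading /proc/devices and the node's major from `node_major("/dev/mmc")`; with udev the node is recreated with whatever major the driver obtained, which is why the problem only shows up with a static /dev.
-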
What kind of savages do not use udev these days?! ;)
I don't have the time and energy to jump through all the hoops required to get
an official number right now. Most users use udev and those that don't can use
the "major" parameter for mmc_block.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
-
That's okay, but if one of those savages got major for you, would you
be willing to use it? :-).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
Indeed I would.
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
-
Those whose Linux installation predates the devfs hype
and postdates the devfs hype
and predates the udev hype
and will postdate the udev hype
and predates the next hype

cu
Adri "static /dev" an

--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
-
hi,
i don't know if you ever used linux on embedded devices like set-top-boxes.
you have a mostly fixed device infrastructure on those devices.
even if you call it a "kind of savage",
using udev there instead of fixed major device numbers is crap.

best regards
marcel
-
(Dropped LKML, whoops.)
Robert and Jeff already know about these, but I thought I'd send out a
reminder.

ata2: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000
status 0x500 next cpb count 0x0 next cpb idx 0x0
ata2: CPB 0: ctl_flags 0xd, resp_flags 0x1
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd 35/00:30:b5:c1:8f/00:01:01:00:00/e0 tag 0 cdb 0x0 data 155648 out
res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support
DPO or FUA

They didn't happen (or didn't happen as frequently) in 2.6.20; it's a serious
bug. Happened in -rc2 and -rc3. A patch from Robert reverting
721449bf0d51213fe3abf0ac3e3561ef9ea7827a seems to make them go away.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
-
Still having SATA breakage on resume:
Caught that one (from screen)
ATA: abnormal status 0x7F on port 0x000118cf
irq 21: nobody cared (try booting ......)
...
Disabling IRQ #21

During normal boot I see the "ATA: abnormal status 0x7F on port
0x000118cf" once, but there the system behaves normally.

tglx
-
maybe that is also causing the hang I am still seeing with the full
config... :(
(no display, no usb device activation, but I tend to think the mbp wants
to access the hdd...)

SCSI device sda: write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: port failed to respond (30 secs, Status 0x80)
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ata1.00: qc timeout (cmd 0xa1)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: port failed to respond (30 secs, Status 0x80)
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7

Soeren
--
Sometimes, there's a moment as you're waking, when you become aware of
the real world around you, but you're still dreaming.
-
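The two status values in the logs above, 0x7F and 0x80, are easier to compare once split into individual bits. The sketch below is a small Python decoder, assuming the classic ATA status register layout (BSY in bit 7 down to ERR in bit 0; the names used for bits 4, 2 and 1 vary between spec revisions). The helper names are hypothetical and the code is not taken from libata.

```python
import re

# Classic ATA status register bits, most significant first.
ATA_STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORR"),
    (0x02, "IDX"),
    (0x01, "ERR"),
]

LINE_RE = re.compile(
    r"ATA: abnormal status (0x[0-9A-Fa-f]+) on port (0x[0-9A-Fa-f]+)"
)


def decode_status(value):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if value & mask]


def parse_line(line):
    """Return (status, port) from an 'abnormal status' log line, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


if __name__ == "__main__":
    for text in (
        "ATA: abnormal status 0x7F on port 0x000118cf",
        "ATA: abnormal status 0x80 on port 0x000101f7",
    ):
        status, port = parse_line(text)
        print(hex(port), hex(status), decode_status(status))
```

Decoded this way, 0x80 is BSY alone (the port keeps reporting busy, consistent with the "port is slow to respond" lines), while 0x7F is every bit except BSY.
-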
I enabled ATA_DEBUG and hacked it to provide debug output only on
resume. Now the disk resumes and no stale interrupt happens.

Full log at: http://www.tglx.de/private/tglx/sata-2.6.21-rc3.log
Both states are fully reproducible. (DEBUG ON/OFF == GOOD/BAD)
/me continues the libata exploration
tglx
-
BTW. Does anyone care about parport console?
console=lp0 hangs since at least 2.6.18

Calling initcall 0xc0438939: pty_init+0x0/0x231()
Calling initcall 0xc0439235: lp_init_module+0x0/0x238()
lp: driver loaded but no devices found
Calling initcall 0xc043947f: mod_init+0x0/0x286()
intel_rng: FWH not detected
Calling initcall 0xc0439aa9: serial8250_init+0x0/0x114()
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
PM: Adding info for platform:serial8250
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
PM: Adding info for No Bus:ttyS0
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
PM: Adding info for No Bus:ttyS1
PM: Adding info for No Bus:ttyS2
PM: Adding info for No Bus:ttyS3
Calling initcall 0xc0439c6c: serial8250_pnp_init+0x0/0xf()
PM: Removing info for No Bus:ttyS0
00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
PM: Adding info for No Bus:ttyS0
PM: Removing info for No Bus:ttyS1
00:07: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
PM: Adding info for No Bus:ttyS1
Calling initcall 0xc0439c7b: serial8250_pci_init+0x0/0x16()
Calling initcall 0xc043a16d: parport_default_proc_register+0x0/0x16()
Calling initcall 0xc043a250: parport_pc_init+0x0/0x196()
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
lp0: using parport0 (interrupt-driven).

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3/git-...
Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
For the record, I used console=lp0 quite recently (stock 2.6.19 according to
the printout, running on i386) [to find out what was causing a panic that
immediately vanished off the top of the screen because of "atkbd.c: Spurious
ACK..."s from the flashing kb LEDs] and it worked just fine.

The parport-related lines went:
lp: driver loaded but no devices found
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE,EPP]
parport0: Printer, EPSON Stylus COLOR 600
lp0: using parport0 (interrupt-driven)
lp0: console ready

... then the kernel continued booting until the panic occurred (it was a silly
storage-related misconfig on my part).

If anyone wants me to try anything (newer kernel or different parport-related
BIOS settings, perhaps, to see if I can duplicate the problem?) and report
back, let me know.

Stephen
-
ISTR lp consoles block indefinitely until the printer is ready, so
if you ask for an lp console but don't have a working printer connected
it will hang.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
-
I do think we care, but I don't think anybody in particular feels singled
Ok, that's not exactly new then, which implies that not a *lot* of people
even care ;)

Do you think you'd be willing to try to figure out when it started? You
seem to be the first one to have even noticed.

(I tried to google it, and the most recent thing google finds is your
report, although I also saw a report of somebody trying it under qemu in
July last year who also reported a hang)

Looking through the history of the last few years (it's in git), I don't see
anything even *remotely* suspicious there, so it's probably either

 (a) really old, and hasn't worked in a loong time and nobody just uses it
 (b) something really stupid that happened while doing other cleanups (but
     the changes in the last two years are *literally* just things like
     removing devfs support)
 (c) some infrastructure change that subtly broke lpconsole, probably
     causing an oops during printk, which obviously results in a printk
     itself, which thus hangs.

It would be good to get it fixed, although for obvious reasons it's not a
huge priority..

Linus
-
Hi,
I get this while running
echo shutdown > /sys/power/disk; echo disk > /sys/power/state

BUG: using smp_processor_id() in preemptible [00000001] code: swsusp_shutdown/3359
caller is check_tsc_sync_source+0x1b/0xef
[<c010503d>] show_trace_log_lvl+0x1a/0x2f
[<c0105724>] show_trace+0x12/0x14
[<c01057d6>] dump_stack+0x16/0x18
[<c01f835e>] debug_smp_processor_id+0xa2/0xb4
[<c0113cc5>] check_tsc_sync_source+0x1b/0xef
[<c011367d>] __cpu_up+0x136/0x158
[<c0141aec>] _cpu_up+0x74/0xbf
[<c0141b5d>] cpu_up+0x26/0x38
[<c0141bbc>] enable_nonboot_cpus+0x4d/0x9a
[<c0146ae0>] pm_suspend_disk+0x11c/0x210
[<c014597e>] enter_state+0x50/0x1d0
[<c0145b84>] state_store+0x86/0x9c
[<c01a53d0>] subsys_attr_store+0x20/0x25
[<c01a54ea>] sysfs_write_file+0xc1/0xe9
[<c017199b>] vfs_write+0xaf/0x138
[<c0171f65>] sys_write+0x3d/0x61
[<c0104064>] syscall_call+0x7/0xb
=======================

l *check_tsc_sync_source+0x1b/0xef
0xc0113caa is in check_tsc_sync_source (/mnt/md0/devel/linux-git/arch/i386/kernel/../../x86_64/kernel/tsc_sync.c:99).
94 /*
95 * Source CPU calls into this - it waits for the freshly booted
96 * target CPU to arrive and then starts the measurement:
97 */
98 void __cpuinit check_tsc_sync_source(int cpu)
99 {
100 int cpus = 2;
101
102 /*
103 * No need to check if we already know that the TSC is not

echo platform > /sys/power/disk; echo disk > /sys/power/state
doesn't work (as always).

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3/boot...
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3/git-...

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
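Each frame in the call trace above has the form `[<address>] symbol+0xoffset/0xsize`, so subtracting the offset from the address gives the start of the function. A minimal Python sketch of that arithmetic follows; `parse_frames` is a hypothetical helper, and the sample holds three frames copied from the report.

```python
import re

# One frame looks like: [<c0113cc5>] check_tsc_sync_source+0x1b/0xef
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(trace):
    """Return (address, symbol, offset, size) for every frame in the text."""
    frames = []
    for match in FRAME_RE.finditer(trace):
        address, symbol, offset, size = match.groups()
        frames.append((int(address, 16), symbol, int(offset, 16), int(size, 16)))
    return frames


SAMPLE = """\
[<c01f835e>] debug_smp_processor_id+0xa2/0xb4
[<c0113cc5>] check_tsc_sync_source+0x1b/0xef
[<c011367d>] __cpu_up+0x136/0x158
"""

if __name__ == "__main__":
    for address, symbol, offset, size in parse_frames(SAMPLE):
        start = address - offset
        print(f"{symbol}: starts at {start:#x}, frame is {offset} bytes in")
```

For the check_tsc_sync_source frame the computed start is 0xc0113caa, the same address that appears in the source listing quoted in the report.
-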
[ Ingo and Thomas added to Cc, because I think this is them.. ]
Ingo, I think this came in during commit 95492e4646, "x86: rewrite SMP TSC
sync code".

(Leaving the original message quoted in full for Ingo and Thomas, sorry
for the waste of bandwidth)

Linus
---
-
Michal, could you try the patch below?
Ingo
----------------------------->
Subject: [patch] CPU hotplug: call check_tsc_sync_source() with irqs off
From: Ingo Molnar <mingo@elte.hu>

check_tsc_sync_source() depends on being called with irqs disabled (it
checks whether the TSC is coherent across two specific CPUs). This is
incidentally true during bootup, but not during cpu hotplug __cpu_up().
This got found via smp_processor_id() debugging.

Disable irqs explicitly and remove the unconditional enabling of
interrupts. Add touch_nmi_watchdog() to the cpu_online_map busy loop.

This bug is present both on i386 and on x86_64.

Reported-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/i386/kernel/smpboot.c | 16 ++++++++++------
arch/x86_64/kernel/smpboot.c | 5 ++++-
2 files changed, 14 insertions(+), 7 deletions(-)

Index: linux/arch/i386/kernel/smpboot.c
===================================================================
--- linux.orig/arch/i386/kernel/smpboot.c
+++ linux/arch/i386/kernel/smpboot.c
@@ -50,6 +50,7 @@
#include <linux/notifier.h>
#include <linux/cpu.h>
#include <linux/percpu.h>
+#include <linux/nmi.h>

#include <linux/delay.h>
#include <linux/mc146818rtc.h>
@@ -1283,8 +1284,9 @@ void __cpu_die(unsigned int cpu)

int __cpuinit __cpu_up(unsigned int cpu)
{
+ unsigned long flags;
#ifdef CONFIG_HOTPLUG_CPU
- int ret=0;
+ int ret = 0;

/*
* We do warm boot only on cpus that had booted earlier
@@ -1302,23 +1304,25 @@ int __cpuinit __cpu_up(unsigned int cpu)
/* In case one didn't come up */
if (!cpu_isset(cpu, cpu_callin_map)) {
printk(KERN_DEBUG "skipping cpu%d, didn't come online\n", cpu);
- local_irq_enable();
return -EIO;
}

- local_irq_enable();
-
per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
/* Unleash the CPU! */
cpu_set(cpu, smp_commenced_mask);

/*
- * Check TSC synch...
Hi,
I just tried linux-2.6.21-rc3 on my machine (P4HT 2.8GHz, with 512MB)
with Tickless System (Dynamic Ticks) and High Resolution Timer Support
(.config attached).

The problem is that the kernel hangs on boot. I tried different
combinations of nohz and highres on the kernel command line.

The only combination that works is: nohz=off highres=off

I also tried compiling the kernel without Tickless and without High
Resolution Timer; this kernel works OK and is one of the first
kernels to suspend and resume from RAM. Congratulations! ;p

I tried to compile the kernel with only Tickless System or only High
Resolution Timer; both hang on boot.

The hang is just after:
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18
ICH5: chipset revision 2
ICH5: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0x2040-0x2047, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0x2048-0x204f, BIOS settings: hdc:DMA, hdd:pio

And I get one of these messages:
Switched to NOHZ mode on CPU #1
or
Switched to high resolution mode on CPU #1
depending on which option is enabled.

What can I do to help find the bug?

dmesg and .config of the system booted with nohz=off highres=off are
attached.

Regards
--
St
There should be no difference between compile time and runtime.

Can you capture a boot log with highres and/or dynticks enabled?
Enable CONFIG_SERIAL_8250_CONSOLE and add "console=ttyS0,115200" to the
command line. Capture the output with minicom on a second box.

Also please enable CONFIG_MAGIC_SYSRQ and try to send a SysRq-T and a
SysRq-Q to the machine via keyboard or the serial line.

Thanks
tglx
-
When the system hangs, the keyboard is dead :(

I just tried clocksource=acpi_pm and the hang disappears...

I tested 2.6.21-rc1, which also hangs, but not always. When it hung I
tried SysRq-T and got this (I noted in parentheses some values from when it
doesn't hang)...

SysRq : Show Pending Timers
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: X
now at XXXXXXXXXXX nsecs
cpu: 0
clock 0:
.index: 0
.resolution: 10000000 nsecs / 1ns (when it doesn't hang)
.get_time: ktime_get_real
.offset: 0 nsecs
active timers:
clock 1:
.index: 1
.resolution: 10000000 nsecs / 1ns (when it doesn't hang)
.get_time: ktime_get
.offset: 0 nsecs
active timers:
.expires_next : 9223372036854775807 nsecs (something reasonable when not hanging)
Almost the same for cpu1, and:

Tick Device: mode: 1
Clock Event Device: pit
max_delta_ns: 27461866
min_delta_ns: 12571
mult: 5124677
shift: 32
mode: 3
next_event: 9223372036854775807 nsecs
set_next_event: pit_next_event
set_mode: init_pit_timer
event_handler: tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000001
tick_broadcast_oneshot_mask: 00000000

Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 672715459
min_delta_ns: 1202
mult: 53557254
shift: 32
mode: 3
next_event: 84460000000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt

Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 672715459
min_delta_ns: 1202
mult: 53557254
shift: 32
mode: 3
next_event: 84790000000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt

So it seems that the clock source selection is not working properly, or the PIT
(the default clock source, right?) is not correctly initialised...

If you need the complete SYSRQ...
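
A note on reading the dump above: as far as I understand the clockevents
code, mult and shift scale a delta in nanoseconds into device ticks,
ticks = (delta_ns * mult) >> shift, after the delta has been clamped to
[min_delta_ns, max_delta_ns]. The sketch below redoes that arithmetic in
userspace C. struct sketch_clock_event and sketch_ns_to_ticks() are made-up
names for illustration, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the fields printed in the dump. */
struct sketch_clock_event {
	uint32_t mult;
	uint32_t shift;
	uint64_t min_delta_ns;
	uint64_t max_delta_ns;
};

/*
 * Convert a delta in nanoseconds to device ticks:
 * clamp to the device limits, then scale by mult / 2^shift.
 */
uint64_t sketch_ns_to_ticks(const struct sketch_clock_event *dev,
			    uint64_t delta_ns)
{
	assert(dev->shift < 64);

	if (delta_ns > dev->max_delta_ns)
		delta_ns = dev->max_delta_ns;
	if (delta_ns < dev->min_delta_ns)
		delta_ns = dev->min_delta_ns;

	/* No overflow here for the values in the dump: the product is < 2^64. */
	return (delta_ns * dev->mult) >> dev->shift;
}
```

With the PIT numbers from the dump (mult 5124677, shift 32), a 1 ms delta
comes out as 1193 ticks, which is consistent with the PIT's roughly 1.19 MHz
input clock. The value 9223372036854775807 shown for expires_next and for the
PIT's next_event is the largest signed 64-bit number, which presumably means
that no event is programmed there.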
Hrmpf. Netconsole should work.
Enable CONFIG_NETCONSOLE and compile the network driver into your
kernel. See Documentation/networking/netconsole.txt for the kernel
command line option.

------------------------------^
ACPI only takes care of one CPU:

ACPI: processor limited to max C-state 1
ACPI: CPU0 (power states: C1[C1] C3[C3])
ACPI: Processor [CPU0] (supports 8 throttling states)

but there is no entry for the second CPU.
Also it seems that the power state limit is possibly ignored.
That would explain the hang, as TSC and local APIC might get stuck.
Broken BIOS/ACPI I fear. Can you please go to
http://www.linuxfirmwarekit.org/download.php
Not now.
tglx
-
I think that this patch fixes the problem. Thanks!
Regards,
Michal
--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
Greg, I think we should revert that patch in the 2.6.20.x stable series too,
as get_order is broken there as well, causing random kernel memory
corruption every now and then, among other things.

Cheers,
Ben
-
Did you confirm that that was indeed the cause of the problem you saw?
As far as I can tell, the bug (because it tested the wrong #define) would
only affect the constant-size case, and only for something larger than a
single page, and only for a non-power-of-two size. So it looked fairly
hard to trigger, if only because all the obvious constants I saw seemed
to already be powers-of-two..

So did you hunt it down to a particular case where it triggers?
Linus
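
To make those conditions concrete, here is a small sketch of a check for
sizes that meet the two that are visible at run time: larger than a single
page and not a power of two. Whether the size is a compile-time constant is
a property of the call site and is not modelled. The sketch only encodes the
description above, assumes 4 KiB pages, and uses hypothetical names
throughout; it is not taken from the kernel.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* True if exactly one bit is set in size. */
bool sketch_is_power_of_two(unsigned long size)
{
	return size != 0 && (size & (size - 1)) == 0;
}

/*
 * True if size is larger than one page and not a power of two,
 * i.e. it matches the size-related conditions described above.
 */
bool sketch_size_is_exposed(unsigned long size)
{
	return size > SKETCH_PAGE_SIZE && !sketch_is_power_of_two(size);
}
```

Sizes such as 4096 or 8192 fail the check, which fits the observation that
the obvious constants are already powers of two.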
-
Well, at least one of the problems I caught with my ppc32 implementation
of DEBUG_PAGEALLOC yes. PowerPC dma_alloc_coherent, on machines with
cache consistent PCI DMA, would use get_order to allocate pages and then
memset over the size passed in. The ide-pmac driver, among others, would
trigger that bug by asking for 0x1020 bytes while get_order only
returned 0. (I should look into making the ide-pmac driver allocate <=
4K but that's a different matter).

Yup, the above. Calls to dma_alloc_consistent with a constant size that
is not a multiple of the page size and larger than one page. (Our
dma_alloc_consistent implementation on 32 bits is inline).

Ben.
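
The arithmetic behind the 0x1020 case, as a sketch: a correct get_order()
has to return the smallest order whose allocation covers the requested size,
so with 4 KiB pages 0x1020 bytes needs order 1 (two pages), and an order of 0
leaves a memset over the full size running 0x20 bytes past the single page.
The code below is a plain userspace reference for that rounding. It assumes
4 KiB pages, the sketch_* names are hypothetical, and it is not the kernel's
implementation.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Smallest order such that (page size << order) >= size, for size > 0. */
int sketch_get_order(unsigned long size)
{
	int order = 0;

	assert(size > 0);
	size = (size - 1) >> SKETCH_PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

/* Bytes actually covered by an allocation of the given order. */
unsigned long sketch_alloc_bytes(int order)
{
	return 1UL << (SKETCH_PAGE_SHIFT + order);
}

/* Bytes by which a write of `size` bytes overruns an order-`order` block. */
unsigned long sketch_overrun(unsigned long size, int order)
{
	unsigned long covered = sketch_alloc_bytes(order);

	return size > covered ? size - covered : 0;
}
```

For example, sketch_get_order(0x1020) is 1, while sketch_overrun(0x1020, 0)
is 0x20, the part of the memset that lands outside the allocation when the
order comes back as 0.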
-
IIRC, it crashed on boot in the powerpc iommu code when slab
debugging is enabled. Not sure if it was on Cell or on benh's
powerbook though.

Arnd <><
-
Not iommu code, but dma_alloc_coherent() for non-iommu 32 bits
machines :-) Oh and it wasn't slab but DEBUG_PAGEALLOC :-)

Ben.
-
Now added to the -stable tree, thanks for pointing it out to me.
greg k-h
-
Greg / Adrian,
I didn't see anything in -rc3 to address the USB hub/serial crashes
reported here for -rc2. What's the status for those, or who should
I be pinging to get them fixed?

Thanks
-
I have a series of USB bugfixes that need to get sent to Linus that
should fix the serial issues. I'll get to them after I drag this next
-stable release out the door...

thanks,
greg k-h
-
