Re: [5/6] 2.6.21-rc3: known regressions

From: Linus Torvalds
Date: Tuesday, March 6, 2007 - 9:59 pm

We've finally hopefully started to put a dent in the regressions, 
especially the suspend/resume problems introduced since 2.6.20.

So 2.6.21-rc3 is out there now, and there's some hope that it will work 
more widely than -rc1 and -rc2 did. Please do give it a good testing, and 
update Adrian and the mailing list (and me) about any regressions 
(hopefully many more of the "it's fixed now" than other kinds, but all 
regressions are interesting).

The appended shortlog gives a reasonable overview. In general we're 
definitely calming down, and most of the changes are fairly small and 
obvious fixes. 

Let's keep the fixes to a minimum, especially since I'm planning on biting
people's heads off if I get any more pull requests for things that aren't
real and obvious fixes.

		Linus

---

Adam Litke (1):
      Fix get_unmapped_area and fsync for hugetlb shm segments

Adrian Bunk (8):
      HID: hid-debug.c should #include <linux/hid-debug.h>
      arch/arm26/kernel/entry.S: remove dead code
      make ipc/shm.c:shm_nopage() static
      mm/{,tiny-}shmem.c cleanups
      drivers/video/sm501fb.c: make 4 functions static
      fix the SYSCTL=n compilation
      arch/i386/kernel/vmi.c must #include <asm/kmap_types.h>
      remove arch/i386/kernel/tsc.c:custom_sched_clock

Ahmed S. Darwish (1):
      KVM: Use ARRAY_SIZE macro instead of manual calculation.

Akira Iguchi (1):
      scc_pata: bugfix for checking DMA IRQ status

Alan Cox (4):
      libata-core: Fix simplex handling
      pata_qdi: Fix initialisation
      siimage: DRAC4 note
      ide: remove a ton of pointless #undef REALLY_SLOW_IO

Alexandr Andreev (1):
      [IA64] sync compat getdents

Alexey Dobriyan (1):
      geode-aes: use unsigned long for spin_lock_irqsave

Allan Graves (1):
      uml: enable RAW

Andres Salomon (3):
      i386: make x86_64 tsc header require i386 rather than vice-versa
      hrtimers: fix HRTIMER_CB_IRQSAFE_NO_SOFTIRQ description
      hrtimers: hrtimer_clock_base ...
From: Benjamin Herrenschmidt
Date: Wednesday, March 7, 2007 - 3:25 am

Greg, I think we should revert that patch in the 2.6.20.x stable series too,
as get_order is broken there as well, causing random kernel memory
corruption every now and then, among other things.

Cheers,
Ben

-

From: Greg KH
Date: Wednesday, March 7, 2007 - 6:26 am

Now added to the -stable tree, thanks for pointing it out to me.

greg k-h
-

From: Mark Lord
Date: Wednesday, March 7, 2007 - 7:15 am

Greg / Adrian,

I didn't see anything in -rc3 to address the USB hub/serial crashes
reported here for -rc2.  What's the status for those, or who should
I be pinging to get them fixed?

Thanks

-

From: Greg KH
Date: Wednesday, March 7, 2007 - 7:22 am

I have a series of USB bugfixes that need to get sent to Linus that
should fix the serial issues.  I'll get to them after I drag this next
-stable release out the door...

thanks,

greg k-h
-

From: Linus Torvalds
Date: Wednesday, March 7, 2007 - 8:39 am

Did you confirm that that was indeed the cause of the problem you saw?

As far as I can tell, the bug (because it tested the wrong #define) would 
only affect the constant-size case, and only for something larger than a 
single page, and only for a non-power-of-two size. So it looked fairly 
hard to trigger, if only because all the obvious constants I saw seemed 
to already be powers-of-two..

So did you hunt it down to a particular case where it triggers?

		Linus
-

From: Arnd Bergmann
Date: Wednesday, March 7, 2007 - 1:52 pm

IIRC, it crashed on boot in the powerpc iommu code when slab
debugging is enabled. Not sure if it was on Cell or on benh's
powerbook though.

	Arnd <><
-

From: Benjamin Herrenschmidt
Date: Thursday, March 8, 2007 - 1:10 am

Not iommu code, but dma_alloc_coherent() for non-iommu 32-bit
machines :-) Oh and it wasn't slab but DEBUG_PAGEALLOC :-)

Ben.


-

From: Benjamin Herrenschmidt
Date: Thursday, March 8, 2007 - 1:08 am

Well, at least one of the problems I caught with my ppc32 implementation
of DEBUG_PAGEALLOC, yes. PowerPC dma_alloc_coherent, on machines with
cache consistent PCI DMA, would use get_order to allocate pages and then
memset over the size passed in. The ide-pmac driver, among others, would
trigger that bug by asking for 0x1020 bytes while get_order only
returned 0. (I should look into making the ide-pmac driver allocate <=
4K but that's a different matter).
 

Yup, the above. Calls to dma_alloc_coherent with a constant size that
is not a multiple of the page size and larger than one page. (Our
dma_alloc_coherent implementation on 32-bit is inline).
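
To make the failure mode concrete, here is a minimal user-space sketch of
the order computation that get_order is expected to perform. It assumes
4 KiB pages; the names sketch_get_order, sketch_order_bytes and
SKETCH_PAGE_SHIFT are made up for illustration. It is not the kernel
source and it does not reproduce the broken constant-size path, it only
shows what the correct answer for 0x1020 bytes should be.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, so a page shift of 12. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Smallest order such that (1 << order) pages cover `size` bytes.
 * Hypothetical helper; `size` must be non-zero.
 */
static int sketch_get_order(unsigned long size)
{
	int order = -1;

	assert(size != 0);

	/* Round up to whole pages, then count how many doublings are needed. */
	size = (size - 1) >> (SKETCH_PAGE_SHIFT - 1);
	do {
		size >>= 1;
		order++;
	} while (size);

	return order;
}

/* Number of bytes backed by an allocation of the given order. */
static unsigned long sketch_order_bytes(int order)
{
	return 1UL << (order + SKETCH_PAGE_SHIFT);
}
```

Under these assumptions sketch_get_order(0x1020) is 1, i.e. two pages or
8192 bytes. An order of 0 backs only 4096 bytes, so a memset over 0x1020
bytes would run 0x20 bytes past the end of the allocation, which is the
kind of overrun DEBUG_PAGEALLOC can catch.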

Ben.

-

From: Michal Piotrowski
Date: Wednesday, March 7, 2007 - 5:56 am

Hi,


I get this while
echo shutdown > /sys/power/disk; echo disk > /sys/power/state

BUG: using smp_processor_id() in preemptible [00000001] code: swsusp_shutdown/3359
caller is check_tsc_sync_source+0x1b/0xef
 [<c010503d>] show_trace_log_lvl+0x1a/0x2f
 [<c0105724>] show_trace+0x12/0x14
 [<c01057d6>] dump_stack+0x16/0x18
 [<c01f835e>] debug_smp_processor_id+0xa2/0xb4
 [<c0113cc5>] check_tsc_sync_source+0x1b/0xef
 [<c011367d>] __cpu_up+0x136/0x158
 [<c0141aec>] _cpu_up+0x74/0xbf
 [<c0141b5d>] cpu_up+0x26/0x38
 [<c0141bbc>] enable_nonboot_cpus+0x4d/0x9a
 [<c0146ae0>] pm_suspend_disk+0x11c/0x210
 [<c014597e>] enter_state+0x50/0x1d0
 [<c0145b84>] state_store+0x86/0x9c
 [<c01a53d0>] subsys_attr_store+0x20/0x25
 [<c01a54ea>] sysfs_write_file+0xc1/0xe9
 [<c017199b>] vfs_write+0xaf/0x138
 [<c0171f65>] sys_write+0x3d/0x61
 [<c0104064>] syscall_call+0x7/0xb
 =======================

l *check_tsc_sync_source+0x1b/0xef
0xc0113caa is in check_tsc_sync_source (/mnt/md0/devel/linux-git/arch/i386/kernel/../../x86_64/kernel/tsc_sync.c:99).
94      /*
95       * Source CPU calls into this - it waits for the freshly booted
96       * target CPU to arrive and then starts the measurement:
97       */
98      void __cpuinit check_tsc_sync_source(int cpu)
99      {
100             int cpus = 2;
101
102             /*
103              * No need to check if we already know that the TSC is not

echo platform > /sys/power/disk; echo disk > /sys/power/state
doesn't work (as always).

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3/boot.log
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3/git-config

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-

From: Linus Torvalds
Date: Wednesday, March 7, 2007 - 9:34 am

[ Ingo and Thomas added to  Cc, because I think this is them.. ]

Ingo, I think this came in during commit 95492e4646, "x86: rewrite SMP TSC 
sync code".

(Leaving the original message quoted in full for Ingo and Thomas, sorry 
for the waste of bandwidth)

		Linus

---
-

From: Ingo Molnar
Date: Wednesday, March 7, 2007 - 10:12 am

Michal, could you try the patch below?

	Ingo

----------------------------->
Subject: [patch] CPU hotplug: call check_tsc_sync_source() with irqs off
From: Ingo Molnar <mingo@elte.hu>

check_tsc_sync_source() depends on being called with irqs disabled (it 
checks whether the TSC is coherent across two specific CPUs). This is 
incidentally true during bootup, but not during cpu hotplug __cpu_up(). 
This got found via smp_processor_id() debugging.

disable irqs explicitly and remove the unconditional enabling of 
interrupts. Add touch_nmi_watchdog() to the cpu_online_map busy loop.

this bug is present both on i386 and on x86_64.

Reported-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/i386/kernel/smpboot.c   |   16 ++++++++++------
 arch/x86_64/kernel/smpboot.c |    5 ++++-
 2 files changed, 14 insertions(+), 7 deletions(-)

Index: linux/arch/i386/kernel/smpboot.c
===================================================================
--- linux.orig/arch/i386/kernel/smpboot.c
+++ linux/arch/i386/kernel/smpboot.c
@@ -50,6 +50,7 @@
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <linux/percpu.h>
+#include <linux/nmi.h>
 
 #include <linux/delay.h>
 #include <linux/mc146818rtc.h>
@@ -1283,8 +1284,9 @@ void __cpu_die(unsigned int cpu)
 
 int __cpuinit __cpu_up(unsigned int cpu)
 {
+	unsigned long flags;
 #ifdef CONFIG_HOTPLUG_CPU
-	int ret=0;
+	int ret = 0;
 
 	/*
 	 * We do warm boot only on cpus that had booted earlier
@@ -1302,23 +1304,25 @@ int __cpuinit __cpu_up(unsigned int cpu)
 	/* In case one didn't come up */
 	if (!cpu_isset(cpu, cpu_callin_map)) {
 		printk(KERN_DEBUG "skipping cpu%d, didn't come online\n", cpu);
-		local_irq_enable();
 		return -EIO;
 	}
 
-	local_irq_enable();
-
 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
 	/* Unleash the CPU! */
 	cpu_set(cpu, smp_commenced_mask);
 
 	/*
-	 * Check TSC synchronization with the AP:
+	 * Check TSC ...
From: Michal Piotrowski
Date: Wednesday, March 7, 2007 - 10:45 am

I think that this patch fixes the problem. Thanks!
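
For readers following along, the idea in the patch description (disable
irqs explicitly instead of enabling them unconditionally) can be modeled
in user space. This is only a sketch: the visible hunk adds an
"unsigned long flags" variable, which suggests a save/restore pair, but
the hunk that would show the actual calls is truncated in this archive.
Every model_* name below is hypothetical and merely stands in for the
kernel's interrupt flag handling.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-in for the CPU interrupt-enable flag (hypothetical). */
static bool model_irqs_enabled = true;

/* Modeled on local_irq_save(): remember the current state, then disable. */
static unsigned long model_irq_save(void)
{
	unsigned long flags = model_irqs_enabled ? 1UL : 0UL;

	model_irqs_enabled = false;
	return flags;
}

/* Modeled on local_irq_restore(): put back whatever the caller had. */
static void model_irq_restore(unsigned long flags)
{
	model_irqs_enabled = (flags != 0);
}

/* Stand-in for a routine that must only run with irqs off. */
static void model_check_sync(void)
{
	assert(!model_irqs_enabled);
}

/*
 * Bring-up path: correct whether the caller arrives with irqs on
 * (the hotplug case) or off (the boot case), and it leaves the
 * caller's state untouched instead of enabling unconditionally.
 */
static void model_cpu_up(void)
{
	unsigned long flags = model_irq_save();

	model_check_sync();
	model_irq_restore(flags);
}
```

The point of saving and restoring rather than enabling afterwards is that
the function no longer depends on, or changes, the interrupt state of
whoever called it.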

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-

From: Stephane Casset
Date: Wednesday, March 7, 2007 - 12:12 pm

Hi,

I just tried linux-2.6.21-rc3 on my machine (P4HT 2.8GHz, with 512 MB)
with Tickless System (Dynamic Ticks) and High Resolution Timer Support
(.config attached).

The problem is that the kernel hangs on boot. I tried different
configurations with nohz and highres on the kernel command line.

The only combination that works is : nohz=off highres=off

I also tried compiling the kernel without Tickless and without High
Resolution Timer; this kernel works OK and is one of the first
kernels to suspend and resume from RAM. Congratulations! ;p

I tried to compile the kernel with only Tickless System or only High
Resolution Timer; both hang on boot.

The hang is just after :
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18
ICH5: chipset revision 2
ICH5: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0x2040-0x2047, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0x2048-0x204f, BIOS settings: hdc:DMA, hdd:pio

And I have the message :
Switched to NOHZ mode on CPU #1
or
Switched to high resolution mode on CPU #1
Depending on the option enabled/disabled 

What can I do to help find the bug?

dmesg and .config of the system booted with nohz=off highres=off are
attached.

Regards
-- 
St
From: Thomas Gleixner
Date: Wednesday, March 7, 2007 - 12:52 pm

There should be no difference between compile time and runtime

Can you capture a boot log with highres and/or dynticks enabled ? 

Enable CONFIG_SERIAL_8250_CONSOLE and add "console=ttyS0,115200" to the
commandline. Capture the output with minicom on a second box.

Also please enable CONFIG_MAGIC_SYSRQ and try to send a SysRq-T and a
SysRq-Q to the machine via keyboard or the serial line.

Thanks

	tglx


-

From: Stephane Casset
Date: Wednesday, March 7, 2007 - 2:16 pm

When the system hangs, the keyboard is dead :(

I just tried clocksource=acpi_pm and the hang disappears...

I tested 2.6.21-rc1, which also hangs, but not always. When it hung I
tried Sysrq-T and got this; I noted in parentheses some values from when
it doesn't hang...

SysRq : Show Pending Timers
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: X
now at XXXXXXXXXXX nsecs
cpu: 0
 clock 0:
  .index:      0
  .resolution: 10000000 nsecs / 1ns (when it doesn't hang)
  .get_time:   ktime_get_real
  .offset:     0 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 10000000 nsecs / 1ns (when it doesn't hang)
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
  .expires_next   : 9223372036854775807 nsecs (something reasonable when not hanging)
Almost the same for cpu1
and

Tick Device: mode:     1
Clock Event Device: pit
 max_delta_ns:   27461866
 min_delta_ns:   12571
 mult:           5124677
 shift:          32
 mode:           3
 next_event:     9223372036854775807 nsecs
 set_next_event: pit_next_event
 set_mode:       init_pit_timer
 event_handler:  tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000001
tick_broadcast_oneshot_mask: 00000000

Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   672715459
 min_delta_ns:   1202
 mult:           53557254
 shift:          32
 mode:           3
 next_event:     84460000000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   672715459
 min_delta_ns:   1202
 mult:           53557254
 shift:          32
 mode:           3
 next_event:     84790000000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

So it seems that the clock source selection is not working properly or the pit
(the default clock source right ?) is not correctly initialised...
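
As an aid to reading the dump above, here is a small sketch of how the
mult and shift fields relate device ticks to nanoseconds. It assumes
they are the usual scaled-math factors, ns = (ticks << shift) / mult,
and that the PIT limits come from tick counts of 0xF and 0x7FFF; the
latter is an inference from the arithmetic, not something stated in this
thread. The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a device tick count to nanoseconds: ns = (ticks << shift) / mult. */
static uint64_t sketch_ticks_to_ns(uint64_t ticks, uint32_t mult, uint32_t shift)
{
	assert(mult != 0 && shift < 64);
	return (ticks << shift) / mult;
}

/* Convert nanoseconds to device ticks: ticks = (ns * mult) >> shift. */
static uint64_t sketch_ns_to_ticks(uint64_t ns, uint32_t mult, uint32_t shift)
{
	assert(shift < 64);
	return (ns * mult) >> shift;
}
```

With the pit values from the dump (mult 5124677, shift 32), 0x7FFF ticks
come out as 27461866 ns and 0xF ticks as 12571 ns, matching max_delta_ns
and min_delta_ns. The 9223372036854775807 shown for next_event and
.expires_next is INT64_MAX, the largest signed 64-bit value, which reads
like a "nothing scheduled" marker rather than a real expiry time.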

If you need the complete ...
From: Thomas Gleixner
Date: Wednesday, March 7, 2007 - 3:09 pm

Hrmpf. Netconsole should work.

Enable CONFIG_NETCONSOLE and compile the network driver into your
kernel. See Documentation/networking/netconsole.txt for the kernel
command line option.




------------------------------^

ACPI does only take care of one CPU

ACPI: processor limited to max C-state 1
ACPI: CPU0 (power states: C1[C1] C3[C3])
ACPI: Processor [CPU0] (supports 8 throttling states)

but there is no entry for the second CPU.

Also it seems that the power state limit is possibly ignored.

That would explain the hang, as TSC and local APIC might get stuck.

Broken BIOS/ACPI I fear. Can you please go to

http://www.linuxfirmwarekit.org/download.php



Not now.

	tglx


-

From: Michal Piotrowski
Date: Wednesday, March 7, 2007 - 6:09 am

BTW. Does anyone care about parport console?
console=lp0 hangs since at least 2.6.18

Calling initcall 0xc0438939: pty_init+0x0/0x231()
Calling initcall 0xc0439235: lp_init_module+0x0/0x238()
lp: driver loaded but no devices found
Calling initcall 0xc043947f: mod_init+0x0/0x286()
intel_rng: FWH not detected
Calling initcall 0xc0439aa9: serial8250_init+0x0/0x114()
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
PM: Adding info for platform:serial8250
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
PM: Adding info for No Bus:ttyS0
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
PM: Adding info for No Bus:ttyS1
PM: Adding info for No Bus:ttyS2
PM: Adding info for No Bus:ttyS3
Calling initcall 0xc0439c6c: serial8250_pnp_init+0x0/0xf()
PM: Removing info for No Bus:ttyS0
00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
PM: Adding info for No Bus:ttyS0
PM: Removing info for No Bus:ttyS1
00:07: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
PM: Adding info for No Bus:ttyS1
Calling initcall 0xc0439c7b: serial8250_pci_init+0x0/0x16()
Calling initcall 0xc043a16d: parport_default_proc_register+0x0/0x16()
Calling initcall 0xc043a250: parport_pc_init+0x0/0x196()
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
lp0: using parport0 (interrupt-driven).

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3/git-config

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-

From: Linus Torvalds
Date: Wednesday, March 7, 2007 - 9:25 am

I do think we care, but I don't think anybody in particular feels singled 

Ok, that's not exactly new then, which implies that not a *lot* of people 
even care ;)

Do you think you'd be willing to try to figure out when it started? You 
seem to be the first one to have even noticed.

(I tried to google it, and the most recent thing google finds is your 
report, although I also saw a report of somebody trying it under qemu in 
July last year and also reported a hang)

Looking through the history of the last few years (i.e. in git), I don't see
anything even *remotely* suspicious there, so it's probably either 
 (a) really old, and hasn't worked in a loong time and nobody just uses it
 (b) something really stupid that happened while doing other cleanups (but 
     the changes in the last two years are *literally* just things like 
     removing devfs support)
 (c) some infrastructure change that subtly broke lpconsole, probably 
     causing an oops during printk, which obviously results in a printk 
     itself, which thus hangs.

It would be good to get it fixed, although for obvious reasons it's not a 
huge priority..

		Linus
-

From: Stephen Mollett
Date: Wednesday, March 7, 2007 - 10:14 am

For the record, I used console=lp0 quite recently (stock 2.6.19 according to 
the printout, running on i386) [to find out what was causing a panic that 
immediately vanished off the top of the screen because of "atkbd.c: Spurious 
ACK..."s from the flashing kb LEDs] and it worked just fine.

The parport-related lines went:

lp: driver loaded but no devices found
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE,EPP]
parport0: Printer, EPSON Stylus COLOR 600
lp0: using parport0 (interrupt-driven)
lp0: console ready

... then the kernel continued booting until the panic occurred (it was a silly 
storage-related misconfig on my part).

If anyone wants me to try anything (newer kernel or different parport-related 
BIOS settings, perhaps, to see if I can duplicate the problem?) and report 
back, let me know.

Stephen
-

From: Russell King
Date: Wednesday, March 7, 2007 - 10:35 am

ISTR lp consoles block indefinitely until the printer is ready, so
if you ask for a lp console but don't have a working printer connected
it will hang.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
-

From: Thomas Gleixner
Date: Wednesday, March 7, 2007 - 7:22 am

Still having SATA breakage on resume:

Caught that one (from screen)

ATA: abnormal status 0x7F on port 0x000118cf
irq 21: nobody cared (try booting ......)
...
Disabling IRQ #21


During normal boot I see the "ATA: abnormal status 0x7F on port
0x000118cf" once, but there the system behaves normally.

	tglx


-

From: Thomas Gleixner
Date: Wednesday, March 7, 2007 - 10:14 am

I enabled ATA_DEBUG and hacked it to provide debug output only on
resume. Now the disk resumes and no stale interrupt happens.

Full log at: http://www.tglx.de/private/tglx/sata-2.6.21-rc3.log

Both states are fully reproducible. (DEBUG ON/OFF == GOOD/BAD)

/me continues the libata exploration

	tglx


-

From: Soeren Sonnenburg
Date: Wednesday, March 7, 2007 - 10:42 am

maybe that is also causing the hang I am still seeing with the full
config... :(
(no display, no usb device activation, but I tend to think the mbp wants
to access the hdd...)

SCSI device sda: write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: port failed to respond (30 secs, Status 0x80)
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ata1.00: qc timeout (cmd 0xa1)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: port failed to respond (30 secs, Status 0x80)
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7
ATA: abnormal status 0x80 on port 0x000101f7

Soeren
-- 
Sometimes, there's a moment as you're waking, when you become aware of
the real world around you, but you're still dreaming.
-

From: Alistair John Strachan
Date: Thursday, March 8, 2007 - 10:28 am

(Dropped LKML, whoops.)


Robert and Jeff already know about these, but I thought I'd send out a
reminder.

ata2: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 
status 0x500 next cpb count 0x0 next cpb idx 0x0
ata2: CPB 0: ctl_flags 0xd, resp_flags 0x1
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd 35/00:30:b5:c1:8f/00:01:01:00:00/e0 tag 0 cdb 0x0 data 155648 out
         res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA

They didn't happen (or didn't happen as frequently) in 2.6.20; it's a serious
bug. Happened in -rc2 and -rc3. A patch from Robert reverting
721449bf0d51213fe3abf0ac3e3561ef9ea7827a seems to make them go away.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
-

From: Adrian Bunk
Date: Tuesday, March 13, 2007 - 5:49 am

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.


Subject    : crashes in KDE
References : http://bugzilla.kernel.org/show_bug.cgi?id=8157
Submitter  : Oliver Pinter <oliver.pntr@gmail.com>
Status     : unknown


Subject    : kwin dies silently
References : http://lkml.org/lkml/2007/2/28/112
Submitter  : Sid Boyce <g3vbv@blueyonder.co.uk>
Status     : unknown


Subject    : mmc card reader no longer works
References : http://lkml.org/lkml/2007/2/27/91
Submitter  : Pavel Machek <pavel@ucw.cz>
Handled-By : Oliver Neukum <oneukum@suse.de>
Status     : unknown


Subject    : USB: Oops when connecting USB 1.1 docks
References : http://lkml.org/lkml/2007/3/4/266
Submitter  : Mark Lord <lkml@rtr.ca>
Caused-By  : Jim Radford <radford@blackbean.org>
             commit d9a7ecacac5f8274d2afce09aadcf37bdb42b93a
Handled-By : Oliver Neukum <oneukum@suse.de>
             Jim Radford <radford@blackbean.org>
Status     : problem is being debugged


Subject    : snd_intel8x0: divide error: 0000
References : http://lkml.org/lkml/2007/3/5/252
Submitter  : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Handled-By : Takashi Iwai <tiwai@suse.de>
Status     : submitter was asked to test a patch


Subject    : snd-intel8x0: no 3d surround sound
References : http://lkml.org/lkml/2007/3/5/164
Submitter  : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Caused-By  : Randy Cushman <rcushman_linux@earthlink.net>
             commit 831466f4ad2b5fe23dff77edbe6a7c244435e973
Handled-By : Randy Cushman <rcushman_linux@earthlink.net>
             Takashi Iwai <tiwai@suse.de>
Status     : problem is being debugged

-

From: Pierre Ossman
Date: Tuesday, March 13, 2007 - 6:08 am

First I heard of this. The error report is a bit thin so Pavel will need to
elaborate a bit more.

Rgds
-- 
     -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  PulseAudio, core developer          http://pulseaudio.org
  rdesktop, core developer          http://www.rdesktop.org
-

From: Oliver Neukum
Date: Tuesday, March 13, 2007 - 6:36 am

The device is a USB serial device. USB serial was known to have issues
in the version where this happened. As far as I know the bug has not been
replicated since those bugs were fixed.

	Regards
		Oliver
-

From: Pavel Machek
Date: Tuesday, March 13, 2007 - 11:11 am

Ahha, now I see where the confusion comes from.

No, the reader is not a serial device; it is the reader built into the
x60. The USB serial device (siemens sx1) has a separate problem.

Device is 

15:00.2 Generic system peripheral [0805]: Ricoh Co Ltd R5C822
SD/SDIO/MMC/MS/MSPro Host Adapter (rev 18)

root@amd:~# ls -al /dev/mmc
brw-r--r-- 1 root root 251, 0 Nov  5 16:57 /dev/mmc
...

...anything else I should try? Card is obviously detected, but I can't
access it..

Uhuh. User error, let's close the report.

mmc changed the major to 

236 mmc

... while it was something else in 2.6.20. Can we get stable device
allocation for mmc?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Pierre Ossman
Date: Tuesday, March 13, 2007 - 12:07 pm

What kind of savages do not use udev these days?! ;)

I don't have the time and energy to jump through all the hoops required to get
an official number right now. Most users use udev and those that don't can use
the "major" parameter for mmc_block.
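
For static /dev setups, the major that was assigned at run time is
listed in /proc/devices. A minimal sketch of looking it up from text in
that format follows; the helper name sketch_block_major is hypothetical,
and the caller is assumed to have read the file into a string already.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan text in /proc/devices format and return the major number listed
 * under `name` in the "Block devices:" section, or -1 if it is absent.
 * Hypothetical helper, for illustration only.
 */
static int sketch_block_major(const char *text, const char *name)
{
	int in_block = 0;

	assert(text != NULL && name != NULL);

	while (*text) {
		char line[128];
		size_t len = strcspn(text, "\n");
		size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
		int major;
		char dev[64];

		/* Copy one line (truncated if overlong) and step past it. */
		memcpy(line, text, n);
		line[n] = '\0';
		text += len;
		if (*text == '\n')
			text++;

		if (strcmp(line, "Block devices:") == 0)
			in_block = 1;
		else if (strcmp(line, "Character devices:") == 0)
			in_block = 0;
		else if (in_block && sscanf(line, "%d %63s", &major, dev) == 2 &&
			 strcmp(dev, name) == 0)
			return major;
	}
	return -1;
}
```

Given a listing whose block section contains "236 mmc", as in Pavel's
message, the helper returns 236, and a static device node can then be
created with that major instead of the stale one.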

Rgds
-- 
     -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  PulseAudio, core developer          http://pulseaudio.org
  rdesktop, core developer          http://www.rdesktop.org
-

From: Mws
Date: Tuesday, March 13, 2007 - 12:12 pm

hi,

i don't know if you ever used linux on embedded devices like set-top-boxes.

you have a mostly fixed device infrastructure on those devices.

even if you call it a "kind of savage",
using udev there instead of fixed major device numbers is crap.

best regards
marcel





-

From: Adrian Bunk
Date: Tuesday, March 13, 2007 - 12:15 pm

Those whose Linux installation predates the devfs hype
and postdates the devfs hype
and predates the udev hype
and will postdate the udev hype
and predates the next hype

cu
Adri "static /dev" an

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-

From: Pavel Machek
Date: Tuesday, March 13, 2007 - 1:05 pm

That's okay, but if one of those savages got a major for you, would you
be willing to use it? :-).
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Pierre Ossman
Date: Tuesday, March 13, 2007 - 1:31 pm

Indeed I would.

-- 
     -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  PulseAudio, core developer          http://pulseaudio.org
  rdesktop, core developer          http://www.rdesktop.org
-

From: Takashi Iwai
Date: Tuesday, March 13, 2007 - 6:40 am

At Tue, 13 Mar 2007 13:49:57 +0100,

Already fixed.  The patch is in ALSA HG tree, but not synced to
git... 
Jaroslav, could you prepare and send a push request ASAP, please?

thanks,

Takashi
-

From: Adrian Bunk
Date: Tuesday, March 13, 2007 - 5:50 am

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.


Subject    : ThinkPad X60: resume no longer works  (PCI related?)
References : http://lkml.org/lkml/2007/3/13/3
Submitter  : Dave Jones <davej@redhat.com>
Caused-By  : PCI merge
             commit 78149df6d565c36675463352d0bfe0000b02b7a7
Handled-By : Eric W. Biederman <ebiederm@xmission.com>
             Rafael J. Wysocki <rjw@sisk.pl>
Status     : problem is being debugged


Subject    : ThinkPad doesn't resume from suspend to RAM
References : http://lkml.org/lkml/2007/2/27/80
             http://lkml.org/lkml/2007/2/28/348
Submitter  : Jens Axboe <jens.axboe@oracle.com>
             Jeff Chua <jeff.chua.linux@gmail.com>
Status     : unknown


Subject    : suspend to disk hangs
References : http://lkml.org/lkml/2007/3/6/142
Submitter  : Jeff Chua <jeff.chua.linux@gmail.com>
Status     : unknown


Subject    : laptop immediately resumes after suspend
References : http://lkml.org/lkml/2007/3/8/469
Submitter  : Ray Lee <ray-lk@madrabbit.org>
Caused-By  : Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
             commit ed41dab90eb40ac4911e60406bc653661f0e4ce1
Handled-By : Len Brown <lenb@kernel.org>
Patch      : http://lkml.org/lkml/2007/3/12/228
Status     : patch available



-

From: Adrian Bunk
Date: Tuesday, March 13, 2007 - 5:50 am

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.


Subject    : resume: slab error in verify_redzone_free(): cache `size-512':
                     memory outside object was overwritten
References : http://lkml.org/lkml/2007/2/24/41
Submitter  : Pavel Machek <pavel@ucw.cz>
Status     : unknown


Subject    : beeps get longer after suspend
References : http://lkml.org/lkml/2007/2/26/276
Submitter  : Pavel Machek <pavel@ucw.cz>
Status     : unknown


Subject    : suspend/resume hangs until keypress
References : http://bugzilla.kernel.org/show_bug.cgi?id=8181
Submitter  : Tomas Janousek <tomi@nomi.cz>
Status     : unknown


Subject    : SATA breakage on resume
References : http://lkml.org/lkml/2007/3/7/233
Submitter  : Thomas Gleixner <tglx@linutronix.de>
             Soeren Sonnenburg <kernel@nn7.de>
Status     : unknown


Subject    : first disk access after resume takes several minutes
References : http://lkml.org/lkml/2007/3/8/117
Submitter  : Michael S. Tsirkin <mst@mellanox.co.il>
Status     : unknown


Subject    : after resume: X hangs after drawing a couple of windows
References : http://lkml.org/lkml/2007/3/8/117
Submitter  : Michael S. Tsirkin <mst@mellanox.co.il>
Status     : unknown


Subject    : ThinkPad Z60m: usb mouse stops working after suspend to ram
References : http://lkml.org/lkml/2007/2/21/413
             http://lkml.org/lkml/2007/2/28/172
Submitter  : Arkadiusz Miskiewicz <arekm@maven.pl>
Caused-By  : Konstantin Karasyov <konstantin.a.karasyov@intel.com>
             commit 0a6139027f3986162233adc17285151e78b39cac
Handled-By : Konstantin Karasyov ...
From: Lukas Hejtmanek
Date: Tuesday, March 13, 2007 - 6:29 am

seems to be fixed in 2.6.21-rc3

-- 
Lukáš Hejtmánek
-

From: Pavel Machek
Date: Tuesday, March 13, 2007 - 11:14 am

From: Arkadiusz Miskiewicz
Date: Tuesday, March 13, 2007 - 2:46 pm

It's fixed in git tree. Commit ff24ba74b6d3befbfbafa142582211b5a6095d45

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
-

From: Adrian Bunk
Date: Tuesday, March 13, 2007 - 5:50 am

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.


Subject    : Dynticks and High resolution Timer hanging the system
References : http://lkml.org/lkml/2007/3/7/504
Submitter  : Stephane Casset <sept@logidee.com>
Caused-By  : Thomas Gleixner <tglx@linutronix.de>
Status     : unknown


Subject    : Clocksource tsc unstable (delta = -154983451 ns)
References : http://lkml.org/lkml/2007/3/9/271
Submitter  : Jiri Slaby <jirislaby@gmail.com>
Status     : unknown


Subject    : hrtimer_switch_to_hres():
             wrong tick_init_highres() return value handling
References : http://lkml.org/lkml/2007/3/6/262
Submitter  : Linus Torvalds <torvalds@linux-foundation.org>
Caused-By  : Thomas Gleixner <tglx@linutronix.de>
             commit 54cdfdb47f73b5af3d1ebb0f1e383efbe70fde9e
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status     : unknown


Subject    : soft lockup detected on CPU#0
References : http://lkml.org/lkml/2007/3/3/152
Submitter  : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
             Ingo Molnar <mingo@elte.hu>
Status     : unknown


Subject    : dynticks makes ksoftirqd1 use unreasonable amount of cpu time
References : http://bugzilla.kernel.org/show_bug.cgi?id=8100
Submitter  : Emil Karlson <jkarlson@cc.hut.fi>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status     : problem is being debugged


Subject    : system doesn't come out of suspend  (CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/2/22/391
Submitter  : Michael S. Tsirkin <mst@mellanox.co.il>
             Soeren Sonnenburg <kernel@nn7.de>
Handled-By : Thomas ...
From: Thomas Gleixner
Date: Tuesday, March 13, 2007 - 1:05 pm

Linus merged the original patch, which solved the real problem. 

He just gave me a lesson on how to do it right next time.

	tglx


-

From: Adrian Bunk
Date: Wednesday, March 14, 2007 - 4:31 am

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-

From: Thomas Gleixner
Date: Tuesday, March 13, 2007 - 1:46 pm

That's not a regression. That's an informational message, printed when the
TSC watchdog detects that the TSC is unreliable.

	tglx


-

From: Adrian Bunk
Date: Wednesday, March 14, 2007 - 4:44 am

Looking at [1], there also seems to be a probably related "doesn't boot"
problem.
My first guess would be commit 6bb74df481223731af6c7e0ff3adb31f6442cfcd
"clocksource init adjustments (fix bug #7426)".

Jiri, is the message also present with 2.6.21-rc2 (at a different place 

cu
Adrian

[1] http://lkml.org/lkml/2007/3/13/219

-

From: Jiri Slaby
Date: Wednesday, March 14, 2007 - 5:16 am

Yes, it's present there too, a few lines below the place where it appears
in -rc3.

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E
-

From: Adrian Bunk
Date: Wednesday, March 14, 2007 - 10:31 am

cu
Adrian

-

From: Florian Lohoff
Date: Wednesday, March 14, 2007 - 11:02 am

With the current git of today the halt on boot is gone. I am running
it now ...

Flo
-- 
Florian Lohoff                  flo@rfc822.org             +49-171-2280134
	Those who would give up a little freedom to get a little
          security shall soon have neither - Benjamin Franklin
From: Thomas Gleixner
Date: Wednesday, March 14, 2007 - 11:28 am

I'm really curious what made it go away.

	tglx


-

From: Adrian Bunk
Date: Tuesday, March 13, 2007 - 5:50 am

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, the author
of a patch that caused a breakage, or someone I consider in some other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject    : ipv6 crash
References : http://lkml.org/lkml/2007/3/10/2
Submitter  : Len Brown <lenb@kernel.org>
Status     : unknown


Subject    : ThinkPad X60: bluetooth hardlocks
References : http://lkml.org/lkml/2007/3/2/85
Submitter  : Pavel Machek <pavel@ucw.cz>
Handled-By : Marcel Holtmann <marcel@holtmann.org>
Status     : unknown


Subject    : forcedeth: skb_over_panic
References : http://bugzilla.kernel.org/show_bug.cgi?id=8058
Submitter  : Albert Hopkins <kernel@marduk.letterboxes.org>
Handled-By : Ayaz Abdulla <aabdulla@nvidia.com>
Status     : problem is being debugged


-

From: Cornelia Huck
Date: Tuesday, March 13, 2007 - 6:30 am

On Tue, 13 Mar 2007 13:50:03 +0100,

Does this still happen with -rc3? I'd have thought Mark's patch in
0de1517e23c2e28d58a6344b97a120596ea200bb fixed that...
-

From: Mark Lord
Date: Tuesday, March 13, 2007 - 6:35 am

Pavel?  Could you retest this now on a ThinkPad X60 ?

???
-

From: Pavel Machek
Date: Tuesday, March 13, 2007 - 11:13 am

From: Adrian Bunk
Date: Tuesday, March 13, 2007 - 5:50 am

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, the author
of a patch that caused a breakage, or someone I consider in some other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject    : AMD Elan: Crash after "Allocating PCI resources"
References : http://bugzilla.kernel.org/show_bug.cgi?id=8161
Submitter  : Vladimir Brik <no.hope@gmail.com>
Handled-By : Andi Kleen <ak@muc.de>
Status     : problem is being debugged


Subject    : x86_64: boot hangs unless CONFIG_PCIEPORTBUS=n and acpi=off
References : http://bugzilla.kernel.org/show_bug.cgi?id=8162
Submitter  : Randy Dunlap <randy.dunlap@oracle.com>
Status     : unknown


Subject    : ACPI regression with noapic
References : http://lkml.org/lkml/2007/3/8/468
Submitter  : Ray Lee <ray-lk@madrabbit.org>
Status     : unknown


Subject    : acpi_serialize locks system during boot
References : http://bugzilla.kernel.org/show_bug.cgi?id=8171
Submitter  : Colchao <colchaodemola@gmail.com>
Status     : unknown


Subject    : NCQ problem with ahci and Hitachi drive  (ACPI related)
References : http://lkml.org/lkml/2007/3/4/178
             http://lkml.org/lkml/2007/3/9/475
Submitter  : Mathieu Bérard <Mathieu.Berard@crans.org>
Handled-By : Tejun Heo <htejun@gmail.com>
Status     : unknown


Subject    : kernels fail to boot with drives on ATIIXP controller
             (ACPI/IRQ related)
References : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229621
             http://lkml.org/lkml/2007/3/4/257
Submitter  : Michal Jaegermann <michal@ellpspace.math.ualberta.ca>
Status     : unknown


Subject    : libata: PATA UDMA/100 configured as UDMA/33
References : http://lkml.org/lkml/2007/2/20/294
             http://www.mail-archive.com/linux-ide@vger.kernel.org/msg04115.html
      ...
From: Alan Cox
Date: Tuesday, March 13, 2007 - 7:03 am

Some cases should be fixed now, but probably not all (e.g. the Nvidia one).
-

From: Fabio Comolli
Date: Tuesday, March 13, 2007 - 1:12 pm

This regression is still present in 2.6.21-rc3-g8b9909de (pulled from
Linus' tree less than one hour ago).

Fabio
-

From: Andi Kleen
Date: Tuesday, March 13, 2007 - 8:13 am

It uses RDTSC when it shouldn't. Already got a fix for that.

-Andi
-

From: Eric W. Biederman
Date: Tuesday, March 13, 2007 - 12:26 pm

Here is a quick summary of the regressions I am looking at.

- Currently we appear to have a pid leak in tty_io.c
  http://lkml.org/lkml/2007/3/8/222

- There is a missing INIT_WORK in vt.c that causes an oops
  when we attempt to use SAK.
  http://lkml.org/lkml/2007/3/11/148

- We have a network ABI regression caused by the latest sysfs
  changes to net-sysfs.c. In particular, we can no longer rename a network
  device if the destination name happens to be the name of a sysfs file in
  the directory that the network device appears in. If we try, the kernel
  gets very confused and we lose access to the network device.

  Do we just want to revert commit 43cb76d91ee85f579a69d42bc8efc08bac560278?
  Greg has been working on this off and on and has not found a
  simple solution yet.

- pci_save_state and pci_restore_state are broken, and have been for a
  while, if they are used on anything besides plain pci (pci-x, pci-e and
  msi) and are not used in pairs. (gregkh and Andrew have the patches to
  correct this.)

- I am still confirming that I have fixed all of the irq handling
  problems that resulted in the "No irq for vector" message.  I think
  I have but I have at least one indirect bug report that I'm still
  following up on.

Eric
-

From: Greg KH
Date: Tuesday, March 13, 2007 - 12:40 pm

I do not think this should be reverted, as the odds that someone will
rename their network device to "irq" or something else that is in the
pci device's directory are pretty slim.  It also only shows up if
CONFIG_SYSFS_DEPRECATED is disabled, which is not the common option.

But I am still working on it, I sent you and Kay a patch that, while it

I think these are already in Linus's tree right now, right?

thanks,

greg k-h
-

From: Linus Torvalds
Date: Tuesday, March 13, 2007 - 12:48 pm

Yes. I just wanted some more testing of it, and while I didn't hear much, 
at least Auke added his ack, and the old state was clearly broken, so they 
got applied yesterday.

		Linus
-

From: Eric W. Biederman
Date: Tuesday, March 13, 2007 - 1:04 pm

Oops I missed that.

Eric
-

From: Adrian Bunk
Date: Wednesday, March 14, 2007 - 11:11 am

This email lists known regressions in Linus' tree compared to 2.6.20
with patches available.

If possible, the patches should be included in 2.6.21-rc4 to reduce
the number of known regressions in -rc4 a little bit.


If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, the author
of a patch that caused a breakage, or someone I consider in some other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject    : USB: Oops when connecting USB 1.1 docks
References : http://lkml.org/lkml/2007/3/4/266
Submitter  : Mark Lord <lkml@rtr.ca>
Caused-By  : Jim Radford <radford@blackbean.org>
             commit d9a7ecacac5f8274d2afce09aadcf37bdb42b93a
Handled-By : Oliver Neukum <oneukum@suse.de>
             Jim Radford <radford@blackbean.org>
Patch      : http://lkml.org/lkml/2007/3/13/217
Status     : patch available


Subject    : snd-intel8x0: no 3d surround sound
References : http://lkml.org/lkml/2007/3/5/164
Submitter  : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Caused-By  : Randy Cushman <rcushman_linux@earthlink.net>
             commit 831466f4ad2b5fe23dff77edbe6a7c244435e973
Handled-By : Takashi Iwai <tiwai@suse.de>
Status     : patch available


Subject    : AMD Elan: Crash after "Allocating PCI resources"
References : http://bugzilla.kernel.org/show_bug.cgi?id=8161
Submitter  : Vladimir Brik <no.hope@gmail.com>
Handled-By : Andi Kleen <ak@muc.de>
Status     : patch available


Subject    : laptop immediately resumes after suspend
References : http://lkml.org/lkml/2007/3/8/469
Submitter  : Ray Lee <ray-lk@madrabbit.org>
Caused-By  : Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
             commit ed41dab90eb40ac4911e60406bc653661f0e4ce1
Handled-By : Len Brown <lenb@kernel.org>
Patch      : http://lkml.org/lkml/2007/3/12/228
Status     : patch available



-
