Re: 2.6.26-rc9: Reported regressions from 2.6.25

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:39 am

This message contains a list of some regressions from 2.6.25, for which there
are no fixes in the mainline I know of.  If any of them have been fixed already,
please let me know.

If you know of any other unresolved regressions from 2.6.25, please let me know
as well and I'll add them to the list.  Also, please let me know if any of the
entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2008-07-06      166       38          26
  2008-06-29      158       43          31
  2008-06-22      148       39          28
  2008-06-14      130       37          28
  2008-06-07      125       48          33
  2008-05-31      115       52          31
  2008-05-24       94       47          28
  2008-05-18       80       51          37
  2008-05-11       53       46          34
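
The table can be read as a rough progress indicator. The sketch below derives two extra figures per row. It assumes that "Pending" counts entries not yet closed and that "Unresolved" counts pending entries with no known patch; this message does not define the columns, so both readings are assumptions, and the names used are illustrative only.

```python
# Rows copied from the statistics table: (date, total, pending, unresolved).
ROWS = [
    ("2008-07-06", 166, 38, 26),
    ("2008-06-29", 158, 43, 31),
    ("2008-06-22", 148, 39, 28),
    ("2008-06-14", 130, 37, 28),
    ("2008-06-07", 125, 48, 33),
    ("2008-05-31", 115, 52, 31),
    ("2008-05-24", 94, 47, 28),
    ("2008-05-18", 80, 51, 37),
    ("2008-05-11", 53, 46, 34),
]


def summarize(total, pending, unresolved):
    """Derive counts under the assumed column meanings.

    Assumption: closed = total - pending,
                pending with a patch = pending - unresolved.
    """
    return {"closed": total - pending, "with_patch": pending - unresolved}


for date, total, pending, unresolved in ROWS:
    s = summarize(total, pending, unresolved)
    print(f"{date}  closed={s['closed']:3d}  pending-with-patch={s['with_patch']:2d}")
```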


Unresolved regressions
----------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11041
Subject		: All 2.6.26-rcX hang immediately after loading ohci_hcd
Submitter	: Andrey Borzenkov <arvidjaar@mail.ru>
Date		: 2008-07-05 7:08 (2 days old)
References	: http://marc.info/?l=linux-kernel&m=121524504505805&w=4
Handled-By	: Linus Torvalds <torvalds@linux-foundation.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11040
Subject		: 2.6.26-rc: host can not shutdown: ata problem
Submitter	: Alexander Beregalov <a.beregalov@gmail.com>
Date		: 2008-07-03 21:43 (4 days old)
References	: http://marc.info/?l=linux-kernel&m=121512197225068&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11039
Subject		: 2.6.28-rc8-git3 forcedeth WARNING (kills the interface)
Submitter	: Brad Campbell <brad@wasp.net.au>
Date		: 2008-07-03 10:07 (4 days old)
References	: ...
From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:39 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10493
Subject		: mips BCM47XX compile error
Submitter	: Adrian Bunk <adrian.bunk@movial.fi>
Date		: 2008-04-20 17:07 (78 days old)
References	: http://lkml.org/lkml/2008/4/20/34
		  http://lkml.org/lkml/2008/5/12/30
		  http://lkml.org/lkml/2008/5/18/131
		  http://lkml.org/lkml/2008/5/31/202
		  http://lkml.org/lkml/2008/6/7/154
Patch		: http://marc.info/?l=linux-kernel&m=120876451216558&w=2
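
Entries in this report share a simple "Field : value" layout, with continuation lines indented under multi-valued fields such as References and Handled-By. A minimal sketch of a parser for one entry follows; it assumes only the layout visible in this thread, and the function name is hypothetical rather than part of any existing tool.

```python
import re

# A field line starts at column 0: "Name<whitespace>: value".
# Continuation lines start with whitespace, so they never match.
FIELD_RE = re.compile(r"^([A-Za-z-]+)\s*:\s*(.*)$")


def parse_entry(text):
    """Parse one Bug-Entry block into a dict of field name -> list of values."""
    entry = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = FIELD_RE.match(line)
        if match:
            current = match.group(1)
            entry.setdefault(current, []).append(match.group(2).strip())
        elif current is not None:
            # Indented continuation of the previous field.
            entry[current].append(line.strip())
    return entry


SAMPLE = (
    "Bug-Entry\t: http://bugzilla.kernel.org/show_bug.cgi?id=10493\n"
    "Subject\t\t: mips BCM47XX compile error\n"
    "References\t: http://lkml.org/lkml/2008/4/20/34\n"
    "\t\t  http://lkml.org/lkml/2008/5/12/30\n"
    "Patch\t\t: http://marc.info/?l=linux-kernel&m=120876451216558&w=2\n"
)

entry = parse_entry(SAMPLE)
print(entry["Subject"][0])
print(len(entry["References"]), "reference(s)")
```

Keeping every field as a list avoids special-casing the fields that can span several lines.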


--

From: Adrian Bunk
Date: Sunday, July 6, 2008 - 6:39 am

cu
Adrian
--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10629
Subject		: 2.6.26-rc1-$sha1: RIP __d_lookup+0x8c/0x160
Submitter	: Alexey Dobriyan <adobriyan@gmail.com>
Date		: 2008-05-05 09:59 (63 days old)
References	: http://lkml.org/lkml/2008/5/5/28
Handled-By	: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10714
Subject		: powerpc: Badness seen on 2.6.26-rc2 with lockdep enabled
Submitter	: Balbir Singh <balbir@linux.vnet.ibm.com>
Date		: 2008-05-14 12:57 (54 days old)
References	: http://marc.info/?l=linux-kernel&m=121076917429133&w=4
Handled-By	: Benjamin Herrenschmidt <benh@kernel.crashing.org>


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10741
Subject		: bug in `tty: BKL pushdown'?
Submitter	: Johannes Weiner <hannes@saeurebad.de>
Date		: 2008-05-18 2:16 (50 days old)
References	: http://marc.info/?l=linux-kernel&m=121107706506181&w=4
		  http://lkml.org/lkml/2008/6/16/104
Handled-By	: Alan Cox <alan@lxorguk.ukuu.org.uk>


--

From: Johannes Weiner
Date: Sunday, July 6, 2008 - 7:47 am

Hi,



	Hannes
--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10821
Subject		: rt25xx: lock dependency warning, association failure, and kmalloc corruption
Submitter	: Christian Casteyde <casteyde.christian@free.fr>
Date		: 2008-05-29 14:30 (39 days old)
Handled-By	: Ivo van Doorn <IvDoorn@gmail.com>


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10786
Subject		: parisc: 64bit SMP does not boot on J5600
Submitter	: Domenico Andreoli <cavokz@gmail.com>
Date		: 2008-05-22 16:14 (46 days old)
References	: http://marc.info/?l=linux-kernel&m=121147328028081&w=4


--

From: Domenico Andreoli
Date: Thursday, July 10, 2008 - 6:42 am

I am going to check it later. Last time I saw it was around 2.6.26-rc7/8.

Anyway, the error message was not the same as the one first reported there.

ciao,
Domenico

-----[ Domenico Andreoli, aka cavok
 --[ http://www.dandreoli.com/gpgkey.asc
   ---[ 3A0F 2F80 F79C 678A 8936  4FEE 0677 9033 A20E BC50
--

From: Domenico Andreoli
Date: Thursday, July 24, 2008 - 9:43 am

Hi,


I finally have the test result:

Linux version 2.6.26 (cavok@ska) (gcc version 4.1.3 20080623 (prerelease) (Debian 4.1.2-23)) #34 SMP Wed Jul 23 13:50:45 CEST 2008
FP[0] enabled: Rev 1 Model 16
The 64-bit Kernel has started...
console [ttyB0] enabled
Initialized PDC Console for debugging.
Determining PDC firmware type: System Map.
model 00005d10 00000491 00000000 00000002 77b406fc 100000f0 00000008 000000b2 000000b2
vers  00000300
CPUID vers 17 rev 10 (0x0000022a)
capabilities 0x3
model 9000/785/J5600
Total Memory: 3840 MB
LCD display at fffffff0f05d0008,fffffff0f05d0000 registered
SMP: bootstrap CPU ID is 0
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 969600
Kernel command line: root=/dev/sdb5 panic=60 HOME=/ console=ttyS0 TERM=vt102 palo_kernel=2/vmlinux.git
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour dummy device 160x64
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Memory: 3858944k/3932160k available (2967k kernel code, 73036k reserved, 1304k data, 240k init)
virtual kernel memory layout:
    vmalloc : 0x0000000000008000 - 0x000000003f000000   (1007 MB)
    memory  : 0x0000000040000000 - 0x0000000130000000   (3840 MB)
      .init : 0x00000000405c4000 - 0x0000000040600000   ( 240 kB)
      .data : 0x00000000403e5d88 - 0x000000004052c000   (1304 kB)
      .text : 0x0000000040100000 - 0x00000000403e5d88   (2967 kB)
Calibrating delay loop... <6>1101.00 BogoMIPS (lpj=5505024)
Security Framework initialized
Mount-cache hash table entries: 256
Brought up 1 CPUs
net_namespace: 1432 bytes
NET: Registered protocol family 16
EISA bus registered
Searching for devices...
Found devices:
1. Astro BC Runway Port at 0xfffffffffed00000 [10] { 12, 0x0, 0x582, 0x0000b }
2. Elroy PCI Bridge at 0xfffffffffed30000 [10/0] { 13, 0x0, 0x782, 0x0000a }
3. Elroy PCI Bridge at 0xfffffffffed32000 [10/1] { 13, 0x0, 0x782, 0x0000a }
4. Elroy PCI Bridge at ...
From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10725
Subject		: USB Mass storage mount fails: Write protect on
Submitter	: Maciej Rutecki <maciej.rutecki@gmail.com>
Date		: 2008-05-16 14:55 (52 days old)
References	: http://marc.info/?l=linux-kernel&m=121095168003572&w=4
Handled-By	: Alan Stern <stern@rowland.harvard.edu>
Patch		: http://marc.info/?l=linux-scsi&m=121433068314568&w=2


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10906
Subject		: repeatable slab corruption with LTP msgctl08
Submitter	: Andrew Morton <akpm@linux-foundation.org>
Date		: 2008-06-12 5:13 (25 days old)
References	: http://marc.info/?l=linux-kernel&m=121324775927704&w=4
Handled-By	: Pekka J Enberg <penberg@cs.helsinki.fi>
		  Christoph Lameter <clameter@sgi.com>
		  Manfred Spraul <manfred@colorfullife.com>
		  Andi Kleen <andi@firstfloor.org>


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10954
Subject		: hda_intel: azx_get_response timeout, switching to polling mode: last cmd=0x011f000c
Submitter	: Justin Mattock <justinmattock@gmail.com>
Date		: 2008-06-21 2:05 (16 days old)
References	: http://marc.info/?l=linux-kernel&m=121401399622190&w=4
		  http://marc.info/?t=121416231700010&r=1&w=4


--

From: Justin Mattock
Date: Sunday, July 6, 2008 - 10:47 am

Yes.
As a rough estimate (I've been busy with another bug), I'd say that out of
100 reboots this message will
appear 2 times in a row, and then disappear.
regards;

-- 
Justin P. Mattock
--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10919
Subject		: [regression] display dimming is slow and laggy - Acer Travelmate 661lci
Submitter	: Maximilian Engelhardt <maxi@daemonizer.de>
Date		: 2008-06-14 22:31 (23 days old)
References	: http://marc.info/?l=linux-kernel&m=121348428828320&w=4


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10955
Subject		: v2.6.26-rc7: BUG task_struct: Poison overwritten
Submitter	: Vegard Nossum <vegard.nossum@gmail.com>
Date		: 2008-06-21 19:24 (16 days old)
References	: http://marc.info/?l=linux-kernel&m=121407641925121&w=4
Handled-By	: Peter Zijlstra <a.p.zijlstra@chello.nl>


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10984
Subject		: MMC print trace information when resume from suspend
Submitter	: Jie Luo <clotho67@gmail.com>
Date		: 2008-06-26 01:15 (11 days old)


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10960
Subject		: 2.6.26-rc: SPARC: Sun Ultra 10 can not boot
Submitter	: Alexander Beregalov <a.beregalov@gmail.com>
Date		: 2008-06-19 14:07 (18 days old)
References	: http://marc.info/?l=linux-kernel&m=121388456519637&w=4
Handled-By	: David Miller <davem@davemloft.net>


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10989
Subject		: kernel oopses when wiggling the mouse to make it known to hidd
Submitter	: Daniel Vetter <daniel@ffwll.ch>
Date		: 2008-06-26 10:32 (11 days old)


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11008
Subject		: after laptop re-dock: usb-storage device no longer detected
Submitter	: Lukas Hejtmanek <xhejtman@ics.muni.cz>
Date		: 2008-06-26 17:37 (11 days old)
References	: http://lkml.org/lkml/2008/6/26/391
Handled-By	: Alan Stern <stern@rowland.harvard.edu>
Patch		: http://lkml.org/lkml/2008/6/30/305


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11009
Subject		: No console on Riva TNT since 2.6.26-0.rc4
Submitter	: Quel Qun <kelk1@comcast.net>
Date		: 2008-06-26 20:04 (11 days old)
References	: http://marc.info/?l=linux-kernel&m=121451344229718&w=4


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11025
Subject		: [problem] raid performance loss with 2.6.26-rc8 on 32-bit x86 (bisected)
Submitter	: Dan Williams <dan.j.williams@intel.com>
Date		: 2008-07-01 1:57 (6 days old)
References	: http://marc.info/?l=linux-kernel&m=121487749429883&w=4
Handled-By	: Mel Gorman <mel@csn.ul.ie>
		  Andy Whitcroft <apw@shadowen.org>


--

From: Mel Gorman
Date: Monday, July 7, 2008 - 9:16 am

Fixed by commit 494de90098784b8e2797598cefdd34188884ec2e and should be
closed.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: Rafael J. Wysocki
Date: Monday, July 7, 2008 - 3:30 pm

Already closed.

Thanks,
Rafael
--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11035
Subject		: System hangs on 2.6.26-rc8
Submitter	: Roman Mindalev <lists@r000n.net>
Date		: 2008-07-02 14:25 (5 days old)
References	: http://marc.info/?l=linux-kernel&m=121500871414995&w=4


--

From: Roman Mindalev
Date: Thursday, July 10, 2008 - 5:29 am

I have this problem with 2.6.25 too.
Unfortunately, I can't check 2.6.24. With the config from 2.6.26-rc8 I got
a build error:

  LD      .tmp_vmlinux1
kernel/built-in.o: In function `timespec_add_ns':
/usr/src/kernels/linux-2.6.24/include/linux/time.h:179: undefined
reference to `__umoddi3'
kernel/built-in.o: In function `do_gettimeofday':
/usr/src/kernels/linux-2.6.24/kernel/time/timekeeping.c:131: undefined
reference to `__udivdi3'
/usr/src/kernels/linux-2.6.24/kernel/time/timekeeping.c:132: undefined
reference to `__umoddi3'
kernel/built-in.o: In function `timespec_add_ns':
/usr/src/kernels/linux-2.6.24/include/linux/time.h:174: undefined
reference to `__udivdi3'
/usr/src/kernels/linux-2.6.24/include/linux/time.h:179: undefined
reference to `__umoddi3'
/usr/src/kernels/linux-2.6.24/include/linux/time.h:174: undefined
reference to `__udivdi3'
/usr/src/kernels/linux-2.6.24/include/linux/time.h:179: undefined
reference to `__umoddi3'
/usr/src/kernels/linux-2.6.24/include/linux/time.h:174: undefined
reference to `__udivdi3'
/usr/src/kernels/linux-2.6.24/include/linux/time.h:179: undefined
reference to `__umoddi3'
make: *** [.tmp_vmlinux1] Error 1

I attached 3 configs: from 2.6.26-rc8 (where the system first hung as a
result of SIGSEGV), from 2.6.25, and from 2.6.24.
From: Luiz Fernando N. Capitulino
Date: Thursday, July 10, 2008 - 11:08 am

On Thu, 10 Jul 2008 16:29:39 +0400
Roman Mindalev <lists@r000n.net> wrote:

| Rafael J. Wysocki wrote:
| > This message has been generated automatically as a part of a report
| > of recent regressions.
| > 
| > The following bug entry is on the current list of known regressions
| > from 2.6.25.  Please verify if it still should be listed.
| > 
| > 
| > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11035
| > Subject		: System hangs on 2.6.26-rc8
| > Submitter	: Roman Mindalev <lists@r000n.net>
| > Date		: 2008-07-02 14:25 (5 days old)
| > References	: http://marc.info/?l=linux-kernel&m=121500871414995&w=4
| > 
| > 
| > 
| I have this problem with 2.6.25 too.
| Unfortunately, I can't check 2.6.24. With the config from 2.6.26-rc8 I got
| a build error:
| 
|   LD      .tmp_vmlinux1
| kernel/built-in.o: In function `timespec_add_ns':
| /usr/src/kernels/linux-2.6.24/include/linux/time.h:179: undefined
| reference to `__umoddi3'

 [...]

 You're trying to compile with gcc 4.3, right? You can try the
attached fix or the following cherry-pick if you're compiling
from a git tree:

$ git cherry-pick 38332cb98772f5ea757e6486bed7ed0381cb5f98

 But I can't tell whether the compiler and/or the fix will
change the behavior of the bug you're reporting.

-- 
Luiz Fernando N. Capitulino
From: Roman Mindalev
Date: Saturday, July 12, 2008 - 8:07 am

Yes, GCC 4.3.1. With time.patch kernel compiled successfully. Thanks for
help!
--

From: Roman Mindalev
Date: Saturday, July 12, 2008 - 8:48 am

Note: this is a long story; if you don't want to read it, go directly
to the assumption ;)

Short description of the problem: SIGSEGV under high I/O load (reading the
package database, untarring a kernel) without any records in the logs.

Background: 2.6.26-rc7 works, 2.6.26-rc8 is buggy, and the configs are very
similar.

History: over the last few days I compiled several 2.6.25 kernels (I chose it
because it is stable) with different configs (disabling, step by step, the
options that differ between -rc7 and -rc8) and got the following results:

2.6.25 (preemptible, preempt rcu, 300 Hz, snd_sequencer, snd_seq_dummy,
snd_rtctimer, snd_seq_rtctimer_default, debug_preempt, rcu_torture_test) - bug
2.6.25 (preemptible, preempt rcu, 300 Hz, snd_sequencer, snd_seq_dummy,
debug_preempt, rcu_torture_test) - bug
2.6.25 (preemptible, preempt rcu, 300 Hz, snd_sequencer, debug_preempt,
rcu_torture_test) - bug
2.6.25 (preemptible, preempt rcu, 300 Hz, debug_preempt,
rcu_torture_test) - bug
2.6.25 (preemptible, preempt rcu, 250 Hz, debug_preempt,
rcu_torture_test) - bug
2.6.25 (preemptible, preempt rcu, 250 Hz, snd_rtctimer, debug_preempt,
rcu_torture_test) - bug
2.6.25 (preemptible, preempt rcu, 100 Hz, debug_preempt,
rcu_torture_test) - bug
2.6.25 (preemptible, 250 Hz, debug_preempt, rcu_torture_test) - bug
2.6.25 (250 Hz, rcu_torture_test) - bug
2.6.25 (250 Hz) - bug

Then I understood: the problem is not (only?) in the kernel, but in GCC too (I
updated the whole system in June).

2.6.24 is the only kernel (I tested from it up to the latest rc) that (with
time.patch) works with GCC 4.3.1.
2.6.25 and above (tested with 2.6.26-rc7, 2.6.26-rc8, 2.6.26-rc9) don't
work if compiled with this compiler version.

Then I looked at my (working) kernels: 2.6.26-rc6 was compiled with GCC
4.2.4, and 2.6.26-rc7 too...

For testing purposes I took some of the listed kernels and recompiled them
with the other GCC version.

Summary of results:
GCC 4.3.1, kernel 2.6.24 - works
GCC 4.2.4, kernel 2.6.25 - works
GCC 4.3.1, kernel 2.6.25 - bug
GCC 4.2.4, kernel 2.6.26-rc7 - works
GCC 4.3.1, ...
From: Roman Mindalev
Date: Tuesday, July 15, 2008 - 6:40 am

I did a bisection.
The result is below:

8f46924600e30b140445f5b84abe9b80d2fff5fb is first bad commit
commit 8f46924600e30b140445f5b84abe9b80d2fff5fb
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed Jan 30 13:34:09 2008 +0100

    x86: enable CONFIG_DEBUG_PAGEALLOC more widely

    make CONFIG_DEBUG_PAGEALLOC universally available.

    CONFIG_HIBERNATION and CONFIG_HUGETLBFS was disabling it, for no
    particular reason.

    If there are any unfixed bugs here we'll fix it, but do not disable
    vital debugging facilities like that ..

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
446e8dd9bb2fcb1698d038b09800dc8aa8c335ab M      arch

Commit body:

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 2a859a7..347e33e 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -40,7 +40,7 @@ comment "Page alloc debug is incompatible with Software Suspend on i386"

 config DEBUG_PAGEALLOC
        bool "Debug page memory allocations"
-       depends on DEBUG_KERNEL && !HIBERNATION && !HUGETLBFS
+       depends on DEBUG_KERNEL
        help
          Unmap pages from the kernel linear mapping after free_pages().
          This results in a large slowdown, but helps to find certain types

I applied it (reversed) to the 2.6.25 source and compiled a new kernel,
with hibernation and hugetlbfs both enabled. The difference between the configs:

diff config-2.6.25-old config-2.6.25-new
4c4
< # Sat Jul 12 17:24:17 2008
1927d1926
< CONFIG_DEBUG_PAGEALLOC=y

I have no problems with this (new) config.
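
The plain diff output mixes the timestamp comment with the one option that actually changed. A minimal sketch of an option-level comparison follows. It assumes the usual `.config` line forms (`CONFIG_FOO=value` and `# CONFIG_FOO is not set`); the function names are illustrative, and treating a missing option as `n` is a simplification.

```python
import re

# "CONFIG_FOO=value" and "# CONFIG_FOO is not set" are the two line forms handled.
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return a dict of option name -> value; other comment lines are ignored."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for options whose value differs.

    Simplification: an option absent from a file is treated as "n".
    """
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name, "n"), new.get(name, "n")
        if before != after:
            changes[name] = (before, after)
    return changes


OLD = "# Sat Jul 12 17:24:17 2008\nCONFIG_HIBERNATION=y\nCONFIG_DEBUG_PAGEALLOC=y\n"
NEW = "# (different timestamp)\nCONFIG_HIBERNATION=y\n"

for name, (before, after) in diff_configs(OLD, NEW).items():
    print(f"{name}: {before} -> {after}")
```

Comparing parsed options rather than raw lines keeps header timestamps and reordered lines out of the result.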

Could this be a conflict between new features in GCC 4.3.1 and pagealloc debug?
From: Ingo Molnar
Date: Friday, July 18, 2008 - 12:11 am

as far as i can see you see a lockup under certain circumstances, right?

this debug option catches use-after-free and other types of invalid 
memory accesses. When it catches a bug the kernel most likely crashes 
and produces a backlog. Because you are in graphical mode that is not 
visible.

This would possibly be debuggable if you set up netconsole logging to 
another system on a local LAN - see 
Documentation/networking/netconsole.txt.
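
On the receiving system, netconsole output arrives as plain UDP datagrams, so a small listener is enough to capture it. A minimal sketch follows, assuming the sender targets UDP port 6666 (netconsole's default target port as far as I know; adjust it to match your setup). The function names are illustrative, and the sender-side parameters are described in the document referenced above.

```python
import socket

NETCONSOLE_PORT = 6666  # assumption: default netconsole target port


def open_listener(host="0.0.0.0", port=NETCONSOLE_PORT):
    """Return a UDP socket bound to host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock, bufsize=65535):
    """Block until one datagram arrives; return (sender_ip, decoded_text)."""
    data, (sender_ip, _sender_port) = sock.recvfrom(bufsize)
    return sender_ip, data.decode("utf-8", errors="replace")


def log_messages(sock, limit=None):
    """Print received messages; stop after `limit` datagrams if a limit is given."""
    count = 0
    while limit is None or count < limit:
        sender_ip, text = receive_one(sock)
        print(f"[{sender_ip}] {text}", end="")
        count += 1
```

Calling `log_messages(open_listener())` on the logging machine prints kernel messages as they arrive, which should preserve the last lines emitted before a crash even when the crashing machine is in graphical mode.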

Vegard - would it be possible to make DEBUG_PAGEALLOC faults single-shot 
and non-fatal, just like kmemcheck does it? That way people would see a 
nice kernel message instead of an immediate crash. That means we'd have 
to find a reliable filter for DEBUG_PAGEALLOC-provoked pagefaults though 
...

	Ingo
--

From: Nigel Cunningham
Date: Friday, July 18, 2008 - 12:17 am

Hi.


Not that it matters now, but that original commit message was wrong -
CONFIG_HIBERNATION used to be incompatible with CONFIG_DEBUG_PAGEALLOC
because [u]swsusp didn't (until very recently) handle the fact that
DEBUG_SLAB unmaps empty pages on x86.

Regards,

Nigel

--

From: Vegard Nossum
Date: Friday, July 18, 2008 - 12:28 am

Hm.. Yes, we could do it in a similar fashion using single-stepping.
It should take little effort; we already have most of the code to do
it; mmiotrace does the same thing too, after all.

These are some considerations:

1. If the page is kernel space but currently unmapped, does it point
to a valid page of RAM even though it is non-present?
2. Should we allow reading/writing of the underlying physical page (if
it exists), or should we prevent writes (i.e. allow the instruction to
proceed, but don't really write anything) and reads (i.e. allow the
instruction to read 0 or another magic number).

For the filter you mentioned, we could perhaps use one more bit in the
PTE. This is what we do for kmemcheck, and IIRC DEBUG_PAGEALLOC is
incompatible with kmemcheck anyway (I don't remember why exactly), so
we could reuse the same bit.

BTW, I didn't consider that argument (of continuing as far as
possible) before, but it's a good one; if we don't crash completely,
the user can still copy the log and we have a better report of it. I guess
kerneloops.org is currently missing out a great deal of reports which
all shut down the machine immediately without a chance to go into the
log.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: Ingo Molnar
Date: Friday, July 18, 2008 - 3:25 pm

yes. There are two techniques to improve the 'yield' of kerneloops.org: 
1) make a better job of getting the logs off the box 2) make a better 
job of not crashing the box when we can do better.

For example lockdep tries very hard to never crash the box. It is a 
feature that warns about a chance of a lockup, not about a lockup itself 
- so crashing the box at the point of the bug detection is 
counter-productive.

The same applies to DEBUG_PAGEALLOC as well: technically nobody (but the 
buggy code itself) is hurt by accessing already freed data. So we could 
try and let it run.

(Btw., this might be a way to share a mechanism between kmemcheck and 
DEBUG_PAGEALLOC, and make kmemcheck more useful to the general kernel as 
a whole.)

	Ingo
--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11039
Subject		: 2.6.28-rc8-git3 forcedeth WARNING (kills the interface)
Submitter	: Brad Campbell <brad@wasp.net.au>
Date		: 2008-07-03 10:07 (4 days old)
References	: http://marc.info/?l=linux-netdev&m=121508714430752&w=4


--

From: Brad Campbell
Date: Sunday, July 6, 2008 - 6:44 am

While it is certainly a problem I can't verify it as a regression. When I got the machine I ran it 
with 2.6.25 but found SATA errors were locking the box.

The SATA issue is resolved with 2.6.26-rc and I'm not terribly keen to risk my data to go back and 
check unless someone absolutely needs me to.

It does appear to be quite a problem though.

brad@srv:~$ dmesg | head -n5
[    0.000000] Linux version 2.6.26-rc8-git4 (brad@srv) (gcc version 4.1.2 20061115 (prerelease) 
(Debian 4.1.1-21)) #5 SMP Fri Jul 4 23:08:38 GST 2008
[    0.000000] Command line: root=/dev/md1 ro
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009d400 (usable)
[    0.000000]  BIOS-e820: 000000000009d400 - 00000000000a0000 (reserved)

brad@srv:~$ dmesg | grep 'eth1: tx_timeout' | wc -l
27

brad@srv:~$ uptime
  17:40:25 up 1 day,  1:15,  5 users,  load average: 0.73, 0.61, 0.49

Regards,
Brad
-- 
Dolphins are so intelligent that within a few weeks they can
train Americans to stand at the edge of the pool and throw them
fish.
--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11040
Subject		: 2.6.26-rc: host can not shutdown: ata problem
Submitter	: Alexander Beregalov <a.beregalov@gmail.com>
Date		: 2008-07-03 21:43 (4 days old)
References	: http://marc.info/?l=linux-kernel&m=121512197225068&w=4


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9791
Subject		: Clock is running too fast^Wslow using acpi_pm clocksource
Submitter	: tosn00j02@sneakemail.com
Date		: 2008-05-03 05:09 (65 days old)
Handled-By	: Maciej W. Rozycki <macro@linux-mips.org>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=16180


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11023
Subject		: 2.6.26-rc8-git2 - kernel BUG at mm/page_alloc.c:585
Submitter	: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Date		: 2008-07-02 11:55 (5 days old)
References	: http://lkml.org/lkml/2008/7/2/32
Handled-By	: Andrew Morton <akpm@linux-foundation.org>


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10865
Subject		: Oops trying to mount an ntfs partition on thinkpad
Submitter	: Alex Romosan <romosan@sycorax.lbl.gov>
Date		: 2008-06-05 14:47 (32 days old)
References	: http://marc.info/?l=linux-kernel&m=121267834421414&w=4


--

From: Alex Romosan
Date: Sunday, July 6, 2008 - 9:07 am

same behaviour with rc9.

--alex--

-- 
| I believe the moment is at hand when, by a paranoiac and active |
|  advance of the mind, it will be possible (simultaneously with  |
|  automatism and other passive states) to systematize confusion  |
|  and thus to help to discredit completely the world of reality. |
--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10860
Subject		: total system freeze at boot with 2.6.26-rc
Submitter	: Christian Casteyde <casteyde.christian@free.fr>
Date		: 2008-06-05 12:38 (32 days old)
Handled-By	: Tejun Heo <htejun@gmail.com>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=16556


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11041
Subject		: All 2.6.26-rcX hang immediately after loading ohci_hcd
Submitter	: Andrey Borzenkov <arvidjaar@mail.ru>
Date		: 2008-07-05 7:08 (2 days old)
References	: http://marc.info/?l=linux-kernel&m=121524504505805&w=4
Handled-By	: Linus Torvalds <torvalds@linux-foundation.org>


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11042
Subject		: build issue #477 for v2.6.26-rc8-290-gb8a0b6c : input_event" [drivers/media/dvb/ttpci/dvb-ttpci.ko] undefined!
Submitter	: Toralf Förster <toralf.foerster@gmx.de>
Date		: 2008-07-05 15:25 (2 days old)
References	: http://marc.info/?l=linux-kernel&m=121527158632563&w=4
Handled-By	: Oliver Endriss <o.endriss@gmx.de>
Patch		: http://marc.info/?l=linux-kernel&m=121529790229531&w=4


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10815
Subject		: 2.6.26-rc4: RIP find_pid_ns+0x6b/0xa0
Submitter	: Alexey Dobriyan <adobriyan@gmail.com>
Date		: 2008-05-27 09:23 (41 days old)
References	: http://lkml.org/lkml/2008/5/27/9
		  http://lkml.org/lkml/2008/6/14/87
Handled-By	: Oleg Nesterov <oleg@tv-sign.ru>
		  Linus Torvalds <torvalds@linux-foundation.org>
		  Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Patch		: http://lkml.org/lkml/2008/5/28/16
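
The entries in these reports share a simple layout: a field name, a colon,
a value, and indented continuation lines that carry further values for the
same field. A minimal Python sketch of a parser for that layout is shown
below. It is only an illustration, not the script that generates the
reports, and the names parse_entry and SAMPLE are hypothetical.

```python
import re

# A field line starts in column 0: name, optional whitespace, colon, value.
FIELD_RE = re.compile(r"^([A-Za-z-]+)\s*:\s*(.*)$")


def parse_entry(text):
    """Parse one regression entry into {field name: [values, ...]}."""
    entry = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        match = FIELD_RE.match(line)
        if match:
            current = match.group(1)
            entry.setdefault(current, []).append(match.group(2).strip())
        elif current is not None:
            # Indented continuation: one more value for the previous field.
            entry[current].append(line.strip())
    return entry


SAMPLE = "\n".join([
    "Bug-Entry\t: http://bugzilla.kernel.org/show_bug.cgi?id=10815",
    "Subject\t\t: 2.6.26-rc4: RIP find_pid_ns+0x6b/0xa0",
    "Submitter\t: Alexey Dobriyan <adobriyan@gmail.com>",
    "Date\t\t: 2008-05-27 09:23 (41 days old)",
    "References\t: http://lkml.org/lkml/2008/5/27/9",
    "\t\t  http://lkml.org/lkml/2008/6/14/87",
    "Handled-By\t: Oleg Nesterov <oleg@tv-sign.ru>",
    "\t\t  Linus Torvalds <torvalds@linux-foundation.org>",
    "\t\t  Paul E. McKenney <paulmck@linux.vnet.ibm.com>",
    "Patch\t\t: http://lkml.org/lkml/2008/5/28/16",
])

entry = parse_entry(SAMPLE)
for name, values in entry.items():
    print(name, values)
```

Every field maps to a list, so single-valued fields such as Subject and
multi-valued ones such as Handled-By can be handled the same way.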


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11024
Subject		: 2.6.25 to 2.6.26-rc8 regression (related to ahci and acpi _GTF)
Submitter	: Mathieu Bérard <Mathieu.Berard@crans.org>
Date		: 2008-07-01 9:39 (6 days old)
References	: http://marc.info/?t=121490593600001&r=1&w=4
Handled-By	: Tejun Heo <htejun@gmail.com>
Patch		: http://marc.info/?l=linux-kernel&m=121514631317343&w=4


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10726
Subject		: x86-64 NODES_SHIFT compile failure.
Submitter	: Dave Jones <davej@codemonkey.org.uk>
Date		: 2008-05-16 12:54 (52 days old)
References	: http://lkml.org/lkml/2008/5/16/312
Handled-By	: Mike Travis <travis@sgi.com>
Patch		: http://lkml.org/lkml/2008/5/16/343


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10985
Subject		: backlight doesn't come on after resume with i915 video
Submitter	: Jon Dowland <jon+bugzilla.kernel.org@alcopop.org>
Date		: 2008-06-26 02:09 (11 days old)


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10957
Subject		: pata_pcmcia with Sandisk Extreme III 8GB
Submitter	: Komuro <komurojun-mbn@nifty.com>
Date		: 2008-06-07 13:37 (30 days old)
References	: http://marc.info/?l=linux-kernel&m=121284627119861&w=4
Handled-By	: Tejun Heo <htejun@gmail.com>
		  Dominik Brodowski <linux@dominikbrodowski.net>
		  Komuro <komurojun-mbn@nifty.com>
Patch		: http://marc.info/?l=linux-kernel&m=121530861605673&w=4


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10971
Subject		: radeonfb : radeon X800 family support (atombios)
Submitter	: Jimmy.Jazz@gmx.net
Date		: 2008-06-23 14:35 (14 days old)


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11006
Subject		: 2.6.26-rc6: pcmcia stopped working
Submitter	: Pavel Machek <pavel@suse.cz>
Date		: 2008-06-22 22:40 (15 days old)
References	: http://marc.info/?l=linux-kernel&m=121420740806363&w=4
		  http://marc.info/?t=121439185700001&r=1&w=4
Handled-By	: Tejun Heo <htejun@gmail.com>
		  Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Patch		: http://marc.info/?l=linux-kernel&m=121526230022719&w=4


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10724
Subject		: ACPI: EC: GPE storm detected, disabling EC GPE
Submitter	: Justin Mattock <justinmattock@gmail.com>
Date		: 2008-05-16 6:17 (52 days old)
References	: http://marc.info/?l=linux-kernel&m=121091875711824&w=4
		  http://lkml.org/lkml/2008/5/18/168
		  http://lkml.org/lkml/2008/5/25/195
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=16364&action=view
		  http://bugzilla.kernel.org/attachment.cgi?id=16365&action=view


--

From: Justin Mattock
Date: Sunday, July 6, 2008 - 10:44 am

Yes;
from what I can see so far, the GPE storm detector is going off
because of too many interrupts from the battery. As an example, there was
a bug filed for the same issue with MacBooks:
https://bugzilla.novell.com/show_bug.cgi?id=301365#c10
Using acpi_osi=Darwin does prevent the GPE storm detector from
going off, but you lose any info on the battery. If anybody has any ideas on
modifying any of the battery modules, or adjusting the DSDT, it sure
would be appreciated.
Regards;

-- 
Justin P. Mattock
--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10861
Subject		: 2.6.26-rc4-git2 - long pause during boot
Submitter	: Chris Clayton <chris2553@googlemail.com>
Date		: 2008-06-01 4:15 (36 days old)
References	: http://marc.info/?l=linux-kernel&m=121229382917834&w=4


--

From: James Bottomley
Date: Sunday, July 6, 2008 - 7:33 am

Erm, this isn't a kernel regression, it's a udev rule problem.

The rule has been identified and fixed; the reporter confirms it fixes
the problem and the upstream maintainer of udev is putting out a release
with it in ... what more exactly needs to be done?

James


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 10:16 am

Closed as "invalid".

Thanks,
Rafael
--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10862
Subject		: forcedeth: lockdep warning on ethtool -s
Submitter	: Tobias Diedrich <ranma+kernel@tdiedrich.de>
Date		: 2008-06-01 8:37 (36 days old)
References	: http://marc.info/?l=linux-kernel&m=121230964032247&w=4


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10843
Subject		: Display artifacts on XOrg logout with PAT kernel and VESA framebuffer
Submitter	: Frans Pop <elendil@planet.nl>
Date		: 2008-05-31 14:04 (37 days old)
References	: http://lkml.org/lkml/2008/6/7/206
		  http://lkml.org/lkml/2008/6/15/119
		  http://lkml.org/lkml/2008/6/23/160


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:45 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10872
Subject		: x86_64 boot hang when CONFIG_NUMA=n
Submitter	: Randy Dunlap <randy.dunlap@oracle.com>
Date		: 2008-06-05 21:50 (32 days old)
References	: http://marc.info/?l=linux-kernel&m=121270308607116&w=4
		  http://lkml.org/lkml/2008/6/11/355
		  http://lkml.org/lkml/2008/6/15/117
Handled-By	: Yinghai Lu <yhlu.kernel@gmail.com>


--

From: Randy Dunlap
Date: Sunday, July 6, 2008 - 11:17 am

This still happens with 2.6.26-rc9.  Using CONFIG_NUMA=y boots OK.

The last lines from the (netconsole) log are:

calling  early_fill_mp_bus_info+0x0/0x7b2
node 0 link 1: io port [1000, 3fff]
node 1 link 2: io port [4000, ffff]
TOM: 0000000080000000 aka 2048M
node 0 link 1: mmio [e8000000, fddfffff]
node 1 link 2: mmio [fde00000, fdffffff]
node 0 link 1: mmio [80000000, 83ffffff]
node 1 link 2: mmio [84000000, 8fffffff]
node 0 link 1: mmio [a0000, bffff]
TOM2: 0000000280000000 aka 10240M
bus: [00,3f] on node 0 link 1
bus: 00 index 0 io port: [0, 3fff]
bus: 00 index 1 mmio: [90000000, fddfffff]
bus: 00 index 2 mmio: [80000000, 83ffffff]
bus: 00 index 3 mmio: [a0000, bffff]
bus: 00 index 4 mmio: [fe000000, ffffffff]
bus: 00 index 5 mmio: [280000000, fcffffffff]
bus: [40,ff] on node 1 link 2
bus: 40 index 0 io port: [4000, ffff]
bus: 40 index 1 mmio: [fde00000, fdffffff]


---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/
--

From: Linus Torvalds
Date: Sunday, July 6, 2008 - 11:33 am

Ok, then it wasn't the nr_zones thing.

Since it seems to be repeatable for you, can you bisect it?

		Linus
--

From: Ingo Molnar
Date: Sunday, July 6, 2008 - 11:32 pm

one guess would be:

| commit e8ee6f0ae5cd860e8e6c02807edfa3c1fa01bcb5
| Author: Yinghai Lu <yhlu.kernel@gmail.com>
| Date:   Sun Apr 13 01:41:58 2008 -0700
|
|     x86: work around io allocation overlap of HT links

but ... since CONFIG_NUMA makes it work, i'm not sure about that.

Randy, could you post the full CONFIG_NUMA bootlog as well, does it show 
any difference in resource allocations?

	Ingo

--

From: Yinghai Lu
Date: Sunday, July 6, 2008 - 11:57 pm

I looked at the resource allocations in that bootlog.

all my AMD test servers work well with Randy's config (!NUMA)
( Linus tree or tip tree)

YH
--

From: Randy Dunlap
Date: Monday, July 7, 2008 - 11:39 am

Good and bad boot logs are attached.  There are several differences, but I don't
see any that are significant.

I've started bisecting with:

$ git bisect start
$ git bisect bad v2.6.26-rc1
$ git bisect good v2.6.25

That's only about 1.29M lines of changes.

---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/
--
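
A range of that size is still tractable because bisection halves the
candidate set with every kernel that is built and tested. The Python sketch
below illustrates the idea on a linear history; real git bisect works on a
commit graph, and the history size, the culprit and the name first_bad are
all hypothetical values chosen for the example.

```python
def first_bad(commits, is_bad):
    """Binary-search an ordered history for the first bad commit.

    Assumes commits are ordered oldest to newest, the newest one is bad,
    and every commit after the first bad one is bad as well.
    Returns (first bad commit, number of commits tested).
    """
    lo, hi = 0, len(commits) - 1  # the first bad commit lies in [lo, hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid        # culprit is mid or something older
        else:
            lo = mid + 1    # culprit is strictly newer than mid
    return commits[lo], tests


# Hypothetical linear history of 10,000 commits; commit 6789 broke booting.
history = list(range(10_000))
culprit, tests = first_bad(history, lambda commit: commit >= 6789)
print("first bad commit:", culprit, "found after", tests, "tests")
```

The search is only as good as its test: is_bad has to reproduce the failure
under the same conditions in which it was originally seen, otherwise every
step reports "good" and the result is meaningless.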

From: Randy Dunlap
Date: Monday, July 7, 2008 - 3:40 pm

git bisect and normal rebooting did not find a problem.

I'll repeat this using kexec to boot the new kernel and see if that
locates any issues... since I normally use kexec to load/test new kernels
and that was how the failure occurred (occurs).

---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/
--

From: Yinghai Lu
Date: Monday, July 7, 2008 - 5:24 pm

Is it the same non-NUMA kernel kexec-ing the non-NUMA kernel?

Or does some other kernel kexec it?

YH
--

From: Randy Dunlap
Date: Monday, July 7, 2008 - 10:28 pm

Ah.  Good question.  I hadn't noticed that.
NUMA kernel kexec-ing a non-NUMA kernel now fails, but it worked in 2.6.25.

---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/
--

From: Yinghai Lu
Date: Tuesday, July 8, 2008 - 12:07 am

Can you resend those two configs?

YH
--

From: Randy Dunlap
Date: Tuesday, July 8, 2008 - 8:44 am

The host/first kernel that loads the second/failing kernel uses
config-2625-work.  The second kernel that hangs during boot uses
kconfig.numa.bad .  (both attached)

---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/
--

From: Yinghai Lu
Date: Tuesday, July 8, 2008 - 12:02 pm

Too bad, I still cannot duplicate it here with your sequence.

YH
--

From: Ingo Molnar
Date: Monday, July 7, 2008 - 10:16 pm

ah, so the hang only occurs if you kexec a post-v2.6.25 kernel? (from 
any other kernel, or from a post-v2.6.25 kernel?)

	Ingo
--

From: Randy Dunlap
Date: Tuesday, July 8, 2008 - 8:39 am

Host (first) kernel is 2.6.25 with NUMA=y.
kexec of 2.6.25 with NUMA=n works, kexec of 2.6.26-rc[123456789] with
NUMA=n fails/hangs.  kexec of 2.6.26-rc* with NUMA=y works.

kernel configs will be in next email reply to YH.

---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/
--

From: Ingo Molnar
Date: Sunday, July 6, 2008 - 11:42 pm

one thing i dont see you having followed up on is whether tip/master 
works fine:

  http://lkml.org/lkml/2008/6/11/355

(or, whether linux-next works fine with the same !NUMA config.)

i.e. whether this is a genuine new problem or something we already 
fixed. (just didnt realize the upstream relevance of)

( if tip/master works fine then it would be very useful to do an 
  'inverse bisection' for the fix. )

	Ingo
--

From: Yinghai Lu
Date: Monday, July 7, 2008 - 12:15 am

in the response to me, he said tip/master has the same problem

YH
--

From: Ingo Molnar
Date: Monday, July 7, 2008 - 12:24 am

ok, so an unfixed problem.

	Ingo
--

From: Ingo Molnar
Date: Sunday, July 6, 2008 - 7:14 am

fixed by:

 commit efac41894df57d32b483ac622d03541b5b2692c0
 Author: Thomas Gleixner <tglx@linutronix.de>
 Date:   Tue Jul 1 08:56:32 2008 +0200

     x86: fix NODES_SHIFT Kconfig range

	Ingo
--

From: Linus Torvalds
Date: Sunday, July 6, 2008 - 8:46 am

Fixed by commit 494de90098784b8e2797598cefdd34188884ec2e: "Do not 

I wonder if this one could be related. The 'nr_zones' overwriting bug 
would result in kswapd not reclaiming any memory asynchronously, so the 
kernel would basically be constantly under a low-memory situation, and 
processes would be forced to do synchronous reclaim. 

That, in turn, could easily explain laggy operation, especially if it is 
something bigger that needs to allocate new memory (not that I know if X 
dimming needs to, but I could imagine that it does some double buffering 

This really doesn't sound like low memory, but the CONFIG_NUMA thing is 
intriguing, since again, the 'nr_zones' thing depended on that. It would 
break 'balance_pgdat()' entirely, and maybe some balancing operation can 
get confused even before you actually run out of memory


This patch got merged: commit 1236edf1c70107a0d31b3fba0b2a8783615d0d24 

Ditto: commit 7cd95f56cb61f5348d062527c9d3653196f6e629 ("ide: fix 

Fixed in commit 70a3143af87c6ca188107cbd49ab5eec2c86c456 ("sata_uli: 

This one is the same thing that is reported as unresolved, and no, I don't 
think that existing patch was ever really tested to fix anything. Paul?

I suspect SRCU will need to be simply marked BROKEN for now, because 
nobody knows what the problem Alexey sees is. Apparently it's been seen by 

Fixed in commit efac41894df57d32b483ac622d03541b5b2692c0 ("x86: fix 

Hmm. James?

				Linus
--

From: Adrian Bunk
Date: Sunday, July 6, 2008 - 8:58 am

Thanks, all closed.

(I do try to close fixed regression bugs immediately after a fix enters
 your tree, but especially for commits without a reference to a Bugzilla

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 10:05 am

Also, in the next couple of days I'll be closing the bugs the reporters of
which have been totally unresponsive.

I'll post an updated report after that.

Thanks,
Rafael
--

From: Adrian Bunk
Date: Sunday, July 6, 2008 - 10:40 am

If inaction on our side is combined with automated weekly emails, then
a lack of responses to the latter is not unexpected.

Such a submitter might be perfectly responsive to actual work on a bug 
while really pissed off by getting the bug closed.

Look e.g. at #10865 that had a 1 month gap in submitter responses,
but the actual problem is that no one of us ever bothered to look at
this Oops...

I just did a run through all open 2.6.26-rc regressions, and I did not 
find a single one where we seem to be waiting for some time for an 

cu
Adrian

[1] unless there was communication not linked from Bugzilla

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 11:02 am

The following are my candidates:

10629
10786
10815
10906 (the thread has apparently died)
11009

Thanks,
Rafael
--

From: Adrian Bunk
Date: Sunday, July 6, 2008 - 11:26 am

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10629
Subject         : 2.6.26-rc1-$sha1: RIP __d_lookup+0x8c/0x160
Submitter       : Alexey Dobriyan <adobriyan@gmail.com>
Date            : 2008-05-05 09:59 (63 days old)
References      : http://lkml.org/lkml/2008/5/5/28
Handled-By      : Paul E. McKenney <paulmck@linux.vnet.ibm.com>



Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10786
Subject         : parisc: 64bit SMP does not boot on J5600
Submitter       : Domenico Andreoli <cavokz@gmail.com>
Date            : 2008-05-22 16:14 (46 days old)
References      : http://marc.info/?l=linux-kernel&m=121147328028081&w=4


Submitter sent bug report, it seems no kernel developer ever bothered
to answer.


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10815
Subject         : 2.6.26-rc4: RIP find_pid_ns+0x6b/0xa0
Submitter       : Alexey Dobriyan <adobriyan@gmail.com>
Date            : 2008-05-27 09:23 (41 days old)
References      : http://lkml.org/lkml/2008/5/27/9
                  http://lkml.org/lkml/2008/6/14/87
Handled-By      : Oleg Nesterov <oleg@tv-sign.ru>
                  Linus Torvalds <torvalds@linux-foundation.org>
                  Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Patch           : http://lkml.org/lkml/2008/5/28/16


Linus just restarted discussing this one and #10629.

My impression of both bugs is not that Alexey was unresponsive but 

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10906
Subject         : repeatable slab corruption with LTP msgctl08
Submitter       : Andrew Morton <akpm@linux-foundation.org>
Date            : 2008-06-12 5:13 (25 days old)
References      : http://marc.info/?l=linux-kernel&m=121324775927704&w=4
Handled-By      : Pekka J Enberg <penberg@cs.helsinki.fi>
                  Christoph Lameter <clameter@sgi.com>
                  Manfred Spraul <manfred@colorfullife.com>
                  Andi Kleen <andi@firstfloor.org>

Andrew Morton said [1]:
  ...
--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 2:04 pm

The report is 46 days old and the reporter has been sent a request to confirm
the presence of the problem every week.  Since he hasn't responded to any
of those requests, I assume we're not going to hear from him.  Thus, it's not
useful to track this any more.

Thanks,
Rafael
--

From: Adrian Bunk
Date: Sunday, July 6, 2008 - 2:32 pm

Andrew Morton also never bothered to answer the automated emails you 
sent him regarding the regression he reported...

Humans react differently to programs than to humans interacting with 
them, and tons of automated mails without any actual efforts by humans 
can easily be considered a non-friendly act.

But for this bug I now found the commit that fixed it back in May.

Is there any specific reason why your automated emails only go to the 
submitters but not to the maintainers of the code in question? If you 
had Cc'ed Kyle once during the last 46 days he might have remembered 

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 2:47 pm

Yes, there is.  We'd have to add special annotations to bug reports for that

He might have looked at the regression reports just as well.

Thanks,
Rafael
--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 2:56 pm

BTW, the automated emails I'm sending are to let the reporters know that I'm
interested in the current status of the bug.  They are free not to reply to
them, but in that case I assume they don't really care whether or not I'm
tracking the bugs they reported.

Thanks,
Rafael
--

From: Rene Herman
Date: Sunday, July 6, 2008 - 3:10 pm

I did/do wonder by the way when I get them if I should be replying if 
the status is unchanged from my viewpoint...

I believe your automated emails say something like "please verify if 
this problem is still relevant" but don't spell out what to do after you
verified that it is. It's sort of natural to take that as "I need to 
reply telling people it's fixed if it is but can remain silent if 
nothing changed".

Being more explicit about liking a reporter to report "yes, nothing 
changed" would probably be good if that IS what's wanted.

Rene.


--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 3:57 pm

The exact wording is

"The following bug entry is on the current list of known regressions

Well, I can change it to

"Please verify if it still should be listed and let me know."

if that's better.

Thanks,
Rafael
--

From: Rene Herman
Date: Sunday, July 6, 2008 - 4:05 pm

Or even "please verify that it should still be listed, and let me know 
either way". Simple "yes" replies feel a little silly, to non-frequent 
posters at least, without that kind of explicit encouragement so it 
might make for a few more replies.

Rene.
--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 4:15 pm

Okay, I will do this.

Thanks,
Rafael
--

From: Nigel Cunningham
Date: Sunday, July 6, 2008 - 4:11 pm

Hi Rafael etc.


I would suggest that you should assume it's still relevant until the
bugzilla entry gets closed. The person fixing the bug should be
responsible for modifying the report to say that a patch is available
and then has been merged (or for saying it's an invalid report etc).

This way, you're making the whole process less burdensome rather than
more so.

Regards,

Nigel

--

From: Adrian Bunk
Date: Sunday, July 6, 2008 - 3:07 pm

Then make a wild guess.

Long ago when I was doing regression tracking I sometimes added a dozen 
addresses from 3 different MAINTAINERS entries for one bug report for 
getting emails to all involved parties.

Not sure whether Bugzilla is the right place for maintaining this 
information, but even if it takes a few minutes to add it to 
all regression reports when sending the emails it's in my experience 

There is a huge difference between offering information for maintainers 
that are actively searching for it and regularly annoying maintainers 

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Linus Torvalds
Date: Sunday, July 6, 2008 - 2:47 pm

Adrian - please just stop this.

*DEVELOPERS* are human too.

And if you cannot accept the fact that developers need feedback as well as 
reporters - a simple "is it still a problem" report - then why the *hell* 

Exactly. A _lot_ of problems are fixed independently of a specific 
bug-report, either because others reported the issue too (and people 
didn't necessarily even realize that it was the same problem), or because 

Is there any specific reason that you have been complaining about this for 
A HELL OF A LONG TIME, without ever actually listening to what people like 
me tell you, over and over and over again?

And no, this email wasn't autogenerated. But you seem to never react to 
this.

Bug-reports absolutely *have* to be closed if they don't get minimal 
feedback, including just a "it's still a problem".

Just accept it, Adrian.

		Linus
--

From: Adrian Bunk
Date: Sunday, July 6, 2008 - 3:19 pm

When did you tell me that maintainers should not or cannot be Cc'ed on 
regression reports?

Sorry, but I honestly don't remember this.

You did not complain when I did this when I was doing regression 
tracking, and if you explained to me why it's now no longer wanted
I must have completely missed it.


cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Linus Torvalds
Date: Sunday, July 6, 2008 - 3:27 pm

That is not what I'm complaining about.

I'm complaining about the fact that you *always* argue against closing 
bugreports.

You have argued against it for over a YEAR now. And every single time I 
tell you that you are wrong, and exactly *why* you are wrong.

If a reporter doesn't respond to say "it's still open", it needs to be 
closed. It doesn't matter one whit whether there has been developer action 
on it or not. We cannot keep old reports open - it's a total waste for 
developers to even _look_ at anything that is more than roughly a month 
old and hasn't been verified to still be an issue.

			Linus
--

From: Adrian Bunk
Date: Sunday, July 6, 2008 - 4:06 pm

I'm not always against closing bugs, and e.g. during the last years I've
closed about 500 bugs in the kernel Bugzilla due to submitters having

We only differ on whether a human should ask this question once before 
closing a bug or whether regular automated requests are enough.

E.g. although Andrew hasn't responded to Rafael's emails for nearly a 
month whether the slab corruption he reported is still present I 
wouldn't take this as a definitive indication that he won't answer when 
someone has a question. I'd bet Andrew will answer if a human asks him 
about the status of this regression.

A developer asking manually "Is this still present?" does cost nearly no 
time and gives the submitter a much better feeling than only automated 

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Johannes Weiner
Date: Sunday, July 6, 2008 - 4:55 pm

Hi,


I prefer being bugged regularly as a reporter.  At the moment I have a
bug at a machine I do not use every day and if I get this email, it
reminds me to test the latest kernel on that machine (or try to
reproduce the bug if it happens in situations not common in my usual
workflow).  Then I report back.

If these reminders weren't sent, I might forget about a
bug and come back to it when it's a real pain to hunt it down by
change-history or when a possible cause for the bug has left the
developer's mind a long time ago.

	Hannes
--

From: Adrian Bunk
Date: Monday, July 7, 2008 - 1:58 am

It often depends on the kind of bug.

E.g. if you reported "my main computer crashes twice a day" you would be 
more interested to see some developer actually working on it before 

You get me wrong.

I'm not saying the automated reminders should vanish.

But before closing a bug as "reporter does not respond" IMHO a manual 
request should be done first.

Otherwise we get people into the "I reported a bug, got only automated 
emails, and the only action by the developers was to close the bug once 

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: James Bottomley
Date: Sunday, July 6, 2008 - 9:11 am

Oh ... it was reported against 2.6.26-rc1, so the fix is actually
currently queued for scsi-misc in my internal test rig.  However, the
reason for the long QA is that there's a potential danger to the fix
which is clearing the buffer from the reported length to the actual
length will break a SCSI card that misreports the returned length.  It
seems to be OK on my esoteric SCSI card collection, so I'll push it out
to scsi-rc-fixes.  However, I think it still wants to run in linux-next
for a few days to see if anyone else can turn up a problem.

The true issue, of course, is that we won't see a problem until the
2.6.26 release because the hardware that triggers it isn't in the set
we're testing with.  A less risky fix for 2.6.26-rc9 might be to move
the clearing into the affected subsystem (usbstorage).

James
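
The trade-off James describes can be sketched in a few lines of Python.
This is not the SCSI patch itself, only a model of the idea: the part of a
buffer that the device says it did not fill is zeroed, which is harmless
when the reported length is right and destructive when the device
under-reports. The name clear_unfilled_tail and the buffer contents are
hypothetical.

```python
def clear_unfilled_tail(buf, reported_len):
    """Zero buf from the length the device reported up to its real size."""
    if not 0 <= reported_len <= len(buf):
        raise ValueError("reported length outside the buffer")
    # Same-length slice assignment: the buffer keeps its size.
    buf[reported_len:] = bytes(len(buf) - reported_len)


# Well-behaved device: it filled 5 bytes and reports 5.
honest = bytearray(b"\xff" * 8)   # stale data left in the buffer
honest[:5] = b"HELLO"
clear_unfilled_tail(honest, 5)
print(honest)                     # stale tail is gone

# Misreporting device: it filled all 8 bytes but reports only 5.
lying = bytearray(b"ABCDEFGH")
clear_unfilled_tail(lying, 5)
print(lying)                      # the last 3 valid bytes were wiped
```

The second case is the risk mentioned above: the clearing is only as
trustworthy as the length the hardware returns.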


--

From: Ingo Molnar
Date: Sunday, July 6, 2008 - 9:58 am

I'm not sure it's directly related to SRCU - it can change timings and 
freeing patterns enough to tickle other bugs. Since Alexey Dobriyan has 
reported it - are perhaps namespaces in use during this stress-test? 
Maybe it's some namespaces related bug that is more easily reproduced 
under SRCU - namespaces is not a commonly tested feature.

Also, i've been running rcutorture stress-tests on a number of 
test-systems ever since this got reported (and they are running 
currently as well) and cannot see it - neither could Paul reproduce it.

( and Paul is very good in producing RCU related problems - he's
  triggered and fixed many RCU related problems that no-one else saw
  before. )

	Ingo
--

From: Paul E. McKenney
Date: Friday, August 1, 2008 - 2:09 pm

I have CONFIG_NAMESPACES=y.  Should I also set one or more of
CONFIG_UTS_NS, CONFIG_IPC_NS, CONFIG_USER_NS, or CONFIG_PID_NS?

What the heck, I will just set them all and pound away on kernbenchx170
and rcutorture.
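
Before a long stress run it can be worth checking that the namespace
options really ended up in the kernel's .config. The Python sketch below
does that for the five options named above. It relies only on the usual
.config conventions (CONFIG_FOO=y for enabled, "# CONFIG_FOO is not set"
for disabled); the sample text is hypothetical and is not Paul's
configuration.

```python
NAMESPACE_OPTIONS = (
    "CONFIG_NAMESPACES",
    "CONFIG_UTS_NS",
    "CONFIG_IPC_NS",
    "CONFIG_USER_NS",
    "CONFIG_PID_NS",
)


def option_status(config_text, names=NAMESPACE_OPTIONS):
    """Return {option: True if it is set to 'y' in config_text}."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # comments, "is not set" lines and blanks
        key, _, value = line.partition("=")
        if value == "y":
            enabled.add(key)
    return {name: name in enabled for name in names}


# Hypothetical .config excerpt, for illustration only.
SAMPLE_CONFIG = """\
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
"""

status = option_status(SAMPLE_CONFIG)
missing = [name for name, on in status.items() if not on]
print("still disabled:", missing)
```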

--

From: Pavel Machek
Date: Monday, July 7, 2008 - 10:37 am

Hmm, but that would mean whole system is slow, right?

I'd bet this is ACPI EC problem...
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Maximilian Engelhardt
Date: Monday, July 7, 2008 - 11:26 am

I didn't notice that my system is slow. Also, I think this patch got
included in 2.6.26-rc9, but I still have this problem with it.

Maxi
--

From: Paul E. McKenney
Date: Friday, August 1, 2008 - 2:09 pm

Alexey tested the above patch, and it did not fix his failure
(http://lkml.org/lkml/2008/6/15/93).  Neither did the patch
at http://lkml.org/lkml/2008/6/14/209.  I was never able to
reproduce Alexey's failure, whether by running LTP in parallel
with 170 kernel builds or by running either in parallel with
rcutorture.  Some enhancements to make rcutorture more vicious
were unable to provoke failures.

Alexey is able to provoke the failure on a maxcpus=1 configuration,
which should narrow things down quite a bit.  I dug through
assembly, and found no issues at that level.

Alexey, would you be willing to send along your vmlinux or disassembly
of the RCU functions?


PREEMPT_RCU is already marked "default n" with a "Say N if you are
unsure."  Shouldn't that cover it?

I don't believe that SRCU is involved, please let me know if I missed
something.

Nick Piggin mentioned seeing failures similar to Alexey's, and I still
need his repeat-by.  Nick?

							Thanx, Paul
--

From: Linus Torvalds
Date: Sunday, July 6, 2008 - 2:54 pm

The revert that was confirmed by Andrey to fix this regression is now 
committed as 09ca8adbe9f724a7e96f512c0039c4c4a1c5dcc0.

		Linus
--

From: Rafael J. Wysocki
Date: Sunday, July 6, 2008 - 3:00 pm

Thanks, I have closed the bug.

Rafael
--

From: Benjamin Herrenschmidt
Date: Sunday, July 6, 2008 - 5:51 pm

Still can't reproduce that one, waiting to get access to Balbir's
machine.

Cheers,
Ben.


--
