Re: 2.6.26-rc6-git2: Reported regressions from 2.6.25

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:04 pm

This message contains a list of some regressions from 2.6.25, for which there
are no fixes in the mainline I know of.  If any of them have been fixed already,
please let me know.

If you know of any other unresolved regressions from 2.6.25, please let me know
as well and I'll add them to the list.  Also, please let me know if any of the
entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2008-06-14      130       37          28
  2008-06-07      125       48          33
  2008-05-31      115       52          31
  2008-05-24       94       47          28
  2008-05-18       80       51          37
  2008-05-11       53       46          34


Unresolved regressions
----------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10912
Subject		: Regressions in the last kernels
Submitter	: werner <werner@sys-linux.yi.org>
Date		: 2008-06-14 18:26 (1 day old)
References	: http://marc.info/?l=linux-kernel&m=121346933911641&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10908
Subject		: IPF Montvale machine panic when running a network-relevent testing
Submitter	: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Date		: 2008-06-13 8:19 (2 days old)
References	: http://marc.info/?l=linux-kernel&m=121334523711437&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10906
Subject		: repeatable slab corruption with LTP msgctl08
Submitter	: Andrew Morton <akpm@linux-foundation.org>
Date		: 2008-06-12 5:13 (3 days old)
References	: http://marc.info/?l=linux-kernel&m=121324775927704&w=4
Handled-By	: Pekka J Enberg <penberg@cs.helsinki.fi>
		  Christoph Lameter <clameter@sgi.com>
		  Manfred Spraul <manfred@colorfullife.com>
		  Andi Kleen <andi@firstfloor.org>


Bug-Entry	: ...
From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:04 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10493
Subject		: mips BCM47XX compile error
Submitter	: Adrian Bunk <adrian.bunk@movial.fi>
Date		: 2008-04-20 17:07 (56 days old)
References	: http://lkml.org/lkml/2008/4/20/34
		  http://lkml.org/lkml/2008/5/12/30
		  http://lkml.org/lkml/2008/5/18/131
		  http://lkml.org/lkml/2008/5/31/202
		  http://lkml.org/lkml/2008/6/7/154
Patch		: http://marc.info/?l=linux-kernel&m=120876451216558&w=2


--

From: Adrian Bunk
Date: Saturday, June 14, 2008 - 2:26 pm

cu
Adrian
--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10629
Subject		: 2.6.26-rc1-$sha1: RIP __d_lookup+0x8c/0x160
Submitter	: Alexey Dobriyan <adobriyan@gmail.com>
Date		: 2008-05-05 09:59 (41 days old)
References	: http://lkml.org/lkml/2008/5/5/28
Handled-By	: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10711
Subject		: BUG: unable to handle kernel paging request - scsi_bus_uevent
Submitter	: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date		: 2008-05-14 11:23 (32 days old)
References	: http://lkml.org/lkml/2008/5/14/111


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10642
Subject		: general protection fault: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
Submitter	: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date		: 2008-05-07 16:03 (39 days old)
References	: http://lkml.org/lkml/2008/5/7/48


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10714
Subject		: Badness seen on 2.6.26-rc2 with lockdep enabled
Submitter	: Balbir Singh <balbir@linux.vnet.ibm.com>
Date		: 2008-05-14 12:57 (32 days old)
References	: http://marc.info/?l=linux-kernel&m=121076917429133&w=4


--

From: Adrian Bunk
Date: Saturday, June 14, 2008 - 4:04 pm

Benjamin, you said you wanted to have a look at this?

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Benjamin Herrenschmidt
Date: Saturday, June 14, 2008 - 4:29 pm

Ah yes, slipped out of my mind. It's probably a missing annotation in
the RTAS code, I'll have a look tomorrow.

Thanks,
Ben.


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10724
Subject		: ACPI: EC: GPE storm detected, disabling EC GPE
Submitter	: Justin Mattock <justinmattock@gmail.com>
Date		: 2008-05-16 6:17 (30 days old)
References	: http://marc.info/?l=linux-kernel&m=121091875711824&w=4
		  http://lkml.org/lkml/2008/5/18/168
		  http://lkml.org/lkml/2008/5/25/195
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=16364&action=view
		  http://bugzilla.kernel.org/attachment.cgi?id=16365&action=view


--

From: Justin Mattock
Date: Saturday, June 14, 2008 - 3:29 pm

I've just pulled the latest git and applied the latest patch that was
sent to me; I'm not seeing this message at the moment.
I am unsure whether the problem is fixed or not.


-- 
Justin P. Mattock
--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10726
Subject		: x86-64 NODES_SHIFT compile failure.
Submitter	: Dave Jones <davej@codemonkey.org.uk>
Date		: 2008-05-16 12:54 (30 days old)
References	: http://lkml.org/lkml/2008/5/16/312
Handled-By	: Mike Travis <travis@sgi.com>
Patch		: http://lkml.org/lkml/2008/5/16/343


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10730
Subject		: build issue #503 for v2.6.26-rc2-433-gf26a398 : undefined reference to `request_firmware'
Submitter	: Toralf Förster <toralf.foerster@gmx.de>
Date		: 2008-05-16 17:06 (30 days old)
References	: http://marc.info/?l=linux-kernel&m=121095777616792&w=4
Handled-By	: James Bottomley <James.Bottomley@HansenPartnership.com>
Patch		: http://marc.info/?l=linux-scsi&m=121101871800303&w=4


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10741
Subject		: bug in `tty: BKL pushdown'?
Submitter	: Johannes Weiner <hannes@saeurebad.de>
Date		: 2008-05-18 2:16 (28 days old)
References	: http://marc.info/?l=linux-kernel&m=121107706506181&w=4
Handled-By	: Alan Cox <alan@lxorguk.ukuu.org.uk>


--

From: Johannes Weiner
Date: Monday, June 16, 2008 - 3:03 am

Hi,


The bug still exists; however, a bisect on another machine with the same
userland leads to a different commit
(47f86834bbd4193139d61d659bebf9ab9d691e37 "redo locking of tty->pgrp"),
so the result is not all that clear or stable.

I will investigate further but the entry should probably stay for now.

	Hannes
--

From: Alan Cox
Date: Monday, June 16, 2008 - 3:33 am

Now that would actually make a lot more sense as a root cause.
--

From: Alan Cox
Date: Monday, June 16, 2008 - 4:46 am

On Mon, 16 Jun 2008 11:33:13 +0100

Experiment time. In _proc_set_tty() in tty_io.c move the

	tty->session = get_pid(task_session(tsk));

back inside the lock just before

	tty->pgrp = get_pid(task_pgrp(tsk));

Alan
--
  "Standards committees don't like hashing.  It looks complicated and
    insufficiently deterministic on an overhead projector."
					- Vern Schryver
--

From: Johannes Weiner
Date: Monday, June 16, 2008 - 8:33 am

Hi,


Like this?

spin_lock()
put_pid()
put_pid()
tty->session =
tty->pgrp =
spin_unlock()

That does not fix it.
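
The ordering tried above can be mimicked in user space. What follows is
only a minimal sketch, not the kernel code: mock_tty, mock_pid, the
helper names, and the atomic_flag used in place of the tty spinlock are
all hypothetical stand-ins. It shows both pointers being dropped and
reassigned inside one critical section.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's reference-counted struct pid. */
struct mock_pid {
    int refcount;
};

/* Hypothetical stand-in for the relevant fields of tty_struct. */
struct mock_tty {
    atomic_flag ctrl_lock;      /* plays the role of the tty spinlock */
    struct mock_pid *session;
    struct mock_pid *pgrp;
};

/* Take a reference; NULL is passed through unchanged. */
static struct mock_pid *mock_get_pid(struct mock_pid *p)
{
    if (p)
        p->refcount++;
    return p;
}

/* Drop a reference; NULL is ignored. */
static void mock_put_pid(struct mock_pid *p)
{
    if (p)
        p->refcount--;
}

static void mock_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set(l))
        ;   /* spin until the flag was previously clear */
}

static void mock_unlock(atomic_flag *l)
{
    atomic_flag_clear(l);
}

/* Release the old session/pgrp and install the new ones under one lock. */
void mock_proc_set_tty(struct mock_tty *tty,
                       struct mock_pid *session, struct mock_pid *pgrp)
{
    mock_lock(&tty->ctrl_lock);
    mock_put_pid(tty->session);
    mock_put_pid(tty->pgrp);
    tty->session = mock_get_pid(session);   /* now inside the lock */
    tty->pgrp = mock_get_pid(pgrp);
    mock_unlock(&tty->ctrl_lock);
}
```

With this ordering, no observer that takes the same lock can see a new
pgrp paired with an old session; that is the only property the sketch
is meant to illustrate.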

	Hannes
--

From: Alan Cox
Date: Monday, June 16, 2008 - 11:22 am

Thanks. That rules out the one case I could see that might have pointed
to a potential bug.

Alan
--

From: Johannes Weiner
Date: Thursday, June 19, 2008 - 4:06 am

Hi,


The second bisection was the wrong one, sorry for the confusion.

I tried again (manually) and the result is (still) this:

Everything fine with HEAD at e5238442 "serial_core: Prepare for BKL push
down".
  
Weird behaviour as described with HEAD at 04f378b19 "tty: BKL pushdown".

	Hannes
--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10725
Subject		: Write protect on on
Submitter	: Maciej Rutecki <maciej.rutecki@gmail.com>
Date		: 2008-05-16 14:55 (30 days old)
References	: http://marc.info/?l=linux-kernel&m=121095168003572&w=4


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10764
Subject		: some serial configurations are now broken
Submitter	: Russell King <rmk+lkml@arm.linux.org.uk>
Date		: 2008-05-20 7:35 (26 days old)
References	: http://marc.info/?l=linux-kernel&m=121126931810706&w=2
Handled-By	: Javier Herrero <jherrero@hvsistemas.es>
		  Russell King <rmk+lkml@arm.linux.org.uk>


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10786
Subject		: 2.6.26-rc3 64bit SMP does not boot on J5600
Submitter	: Domenico Andreoli <cavokz@gmail.com>
Date		: 2008-05-22 16:14 (24 days old)
References	: http://marc.info/?l=linux-kernel&m=121147328028081&w=4


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10799
Subject		: sky2 general protection fault
Submitter	: Nicolas Mailhot <Nicolas.Mailhot@LaPoste.net>
Date		: 2008-05-26 11:05 (20 days old)


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10794
Subject		: mips: CONF_CM_DEFAULT build error
Submitter	: Adrian Bunk <adrian.bunk@movial.fi>
Date		: 2008-05-25 10:11 (21 days old)
References	: http://lkml.org/lkml/2008/5/25/168
		  http://lkml.org/lkml/2008/6/11/295
Patch		: http://lkml.org/lkml/2008/6/1/125


--

From: Adrian Bunk
Date: Saturday, June 14, 2008 - 2:24 pm

cu
Adrian
--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10815
Subject		: 2.6.26-rc4: RIP find_pid_ns+0x6b/0xa0
Submitter	: Alexey Dobriyan <adobriyan@gmail.com>
Date		: 2008-05-27 09:23 (19 days old)
References	: http://lkml.org/lkml/2008/5/27/9
		  http://lkml.org/lkml/2008/6/14/87
Handled-By	: Oleg Nesterov <oleg@tv-sign.ru>
		  Linus Torvalds <torvalds@linux-foundation.org>
		  Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Patch		: http://lkml.org/lkml/2008/5/28/16


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10819
Subject		: Fatal DMA error with b43 driver since 2.6.26
Submitter	: Christian Casteyde <casteyde.christian@free.fr>
Date		: 2008-05-29 13:16 (17 days old)


--

From: Michael Buesch
Date: Saturday, June 14, 2008 - 2:34 pm

This regression is fixed by
21691a38db9d465a109c5ec25cd3956a18cfcf5d

Author: Michael Buesch <mb@bu3sch.de>  2008-06-12 15:33:13
Committer: John W. Linville <linville@tuxdriver.com>  2008-06-14 01:18:58
Parent: 9983f35f12b8be71d13b8aca6dbf781d3342c7aa (rt2x00: LEDS build failure)
Child:  33593dbf334869456167bc66511bc54c4ba39dc5 (mac80211 : fix for iwconfig in ad-hoc mode)
Branches: master, remotes/origin/master
Follows: merge-2008-06-14
Precedes: master-2008-06-14

    ssb: Fix coherent DMA mask for PCI devices
    
    This fixes setting the coherent DMA mask for PCI devices.
    
    Signed-off-by: Michael Buesch <mb@bu3sch.de>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>

------------------------------ drivers/ssb/main.c ------------------------------
index 7cf8851..d184f2a 100644
@@ -1168,15 +1168,21 @@ EXPORT_SYMBOL(ssb_dma_translation);
 int ssb_dma_set_mask(struct ssb_device *ssb_dev, u64 mask)
 {
 	struct device *dma_dev = ssb_dev->dma_dev;
+	int err = 0;
 
 #ifdef CONFIG_SSB_PCIHOST
-	if (ssb_dev->bus->bustype == SSB_BUSTYPE_PCI)
-		return dma_set_mask(dma_dev, mask);
+	if (ssb_dev->bus->bustype == SSB_BUSTYPE_PCI) {
+		err = pci_set_dma_mask(ssb_dev->bus->host_pci, mask);
+		if (err)
+			return err;
+		err = pci_set_consistent_dma_mask(ssb_dev->bus->host_pci, mask);
+		return err;
+	}
 #endif
 	dma_dev->coherent_dma_mask = mask;
 	dma_dev->dma_mask = &dma_dev->coherent_dma_mask;
 
-	return 0;
+	return err;
 }
 EXPORT_SYMBOL(ssb_dma_set_mask);
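
The shape of the change can be shown outside the kernel. The sketch
below is a user-space mock under stated assumptions: mock_pci_dev, its
supported_mask field, and the mock_* helpers are hypothetical and only
imitate the two-step "set streaming mask, then coherent mask, and
propagate the first error" flow of the patch above.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for a PCI device's DMA state. */
struct mock_pci_dev {
    uint64_t supported_mask;     /* widest mask the mock platform accepts */
    uint64_t dma_mask;           /* streaming DMA mask */
    uint64_t coherent_dma_mask;  /* coherent (consistent) DMA mask */
};

/* Set the streaming mask, rejecting bits the platform cannot address. */
static int mock_set_dma_mask(struct mock_pci_dev *d, uint64_t mask)
{
    if (mask & ~d->supported_mask)
        return -EIO;
    d->dma_mask = mask;
    return 0;
}

/* Set the coherent mask with the same acceptance rule. */
static int mock_set_coherent_dma_mask(struct mock_pci_dev *d, uint64_t mask)
{
    if (mask & ~d->supported_mask)
        return -EIO;
    d->coherent_dma_mask = mask;
    return 0;
}

/* Set both masks; stop and report the error if the first call fails. */
int mock_ssb_dma_set_mask(struct mock_pci_dev *d, uint64_t mask)
{
    int err = mock_set_dma_mask(d, mask);

    if (err)
        return err;
    return mock_set_coherent_dma_mask(d, mask);
}
```

A caller that previously checked only the streaming mask would, with
this flow, also end up with a matching coherent mask or a non-zero
error code.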
 



-- 
Greetings Michael.
--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10821
Subject		: rt25xx: lock dependency warning, association failure, and kmalloc corruption
Submitter	: Christian Casteyde <casteyde.christian@free.fr>
Date		: 2008-05-29 14:30 (17 days old)


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10826
Subject		: NFS oops in 2.6.26rc4
Submitter	: Dave Jones <davej@redhat.com>
Date		: 2008-05-27 19:04 (19 days old)
References	: http://marc.info/?l=linux-kernel&m=121191548915522&w=4


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10827
Subject		: 2.6.26rc4 GFS2 oops.
Submitter	: Dave Jones <davej@redhat.com>
Date		: 2008-05-27 15:44 (19 days old)
References	: http://lkml.org/lkml/2008/5/27/297


--

From: Adrian Bunk
Date: Sunday, June 22, 2008 - 2:09 am

Dave, what is the status of this bug?

It's currently listed as a 2.6.26-rc regression.

Is it actually confirmed that 2.6.25 is fine?

According to the thread of the bug report there should now be a bug 
report in the Red Hat Bugzilla for it. Bug number?

Thanks
Adrian

--

From: Bob Peterson
Date: Monday, June 23, 2008 - 7:40 am

Hi,

This appears to be a known bug.  There's a Fedora bugzilla record for
it here, which contains a patch to fix the problem:

https://bugzilla.redhat.com/show_bug.cgi?id=448866

The bug does not appear to be in 2.6.25; 2.6.25 is fine afaict.

Regards,

Bob Peterson
Red Hat GFS


--

From: Adrian Bunk
Date: Monday, June 23, 2008 - 8:14 am

Yup, the patch in your Bugzilla is for code that is new in 2.6.26.

Can you push your patch for inclusion into 2.6.26 so that 2.6.26 won't 

Thanks
Adrian

--

From: Bob Peterson
Date: Monday, June 23, 2008 - 8:32 am

Hi Adrian,

Unfortunately, I cannot.  All access to the gfs2 "-nmw" git tree is
controlled by Steve Whitehouse, and he is on vacation/holiday until
tomorrow.

I've submitted the patch to cluster-devel, so hopefully he'll push it
as soon as he returns tomorrow.

Regards,

Bob Peterson
Red Hat GFS


--

From: Rafael J. Wysocki
Date: Monday, June 23, 2008 - 10:02 am

You can post the patch in this thread, with CC to Andrew Morton.

Thanks,
Rafael
--

From: Adrian Bunk
Date: Monday, June 23, 2008 - 10:05 am

If Steve is on vacation only until tomorrow there's not a need to bypass

cu
Adrian

--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10828
Subject		: [2.6.25-git18 => 2.6.26-rc1-git1] Xorg crash with xf86MapVidMem error
Submitter	: Rufus & Azrael <rufus-azrael@numericable.fr>
Date		: 2008-05-04 10:24 (42 days old)
References	: http://lkml.org/lkml/2008/5/4/37
Handled-By	: Ingo Molnar <mingo@elte.hu>
		  H. Peter Anvin <hpa@zytor.com>
		  Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>
Patch		: http://lkml.org/lkml/2008/5/29/371


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10843
Subject		: Display artifacts on XOrg logout with PAT kernel and VESA framebuffer
Submitter	: Frans Pop <elendil@planet.nl>
Date		: 2008-05-31 14:04 (15 days old)
References	: http://lkml.org/lkml/2008/6/7/206


--

From: Romano Giannetti
Date: Sunday, June 15, 2008 - 3:22 am

It happens to me too. Do you see
http://bugzilla.kernel.org/show_bug.cgi?id=10892 ?

Romano


-- 
Romano Giannetti                            Dep. de Electrónica y Automática
http://www.dea.icai.upcomillas.es/romano    Univ. Pontificia Comillas (MADRID)


--
This communication contains confidential information. It is for the exclusive use of the intended addressee. If you are not the intended addressee, please note that any form of distribution, copying or use of this communication or the information in it is strictly prohibited by law. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy this message. Thank you for your cooperation.
--

From: Rafael J. Wysocki
Date: Sunday, June 15, 2008 - 3:40 am

Do these two bug entries refer to the same problem?

Rafael
--

From: Frans Pop
Date: Sunday, June 15, 2008 - 4:25 am

Exactly what happens to you too? Do you also see the artifacts?
Do you also use vesafb? Do the artifacts go away if you boot 
with 'video=vfb:off' to disable the framebuffer? Do they go away if you 


Looks unrelated to me.

Cheers,
FJP
--

From: Romano Giannetti
Date: Sunday, June 15, 2008 - 4:35 am

Yes, maybe you're right. I could not test either: booting with nopat gave me
a black screen twice in a row.

Mmhhh....

Romano 
-- 
Romano Giannetti                            Dep. de Electrónica y Automática
http://www.dea.icai.upcomillas.es/romano    Univ. Pontificia Comillas (MADRID)


--

From: Romano Giannetti
Date: Sunday, June 15, 2008 - 4:26 am

I do not know, it was just a wild guess. I noticed the flashing colors
during shutdown, and Intel is the common chipset...

-- 
Romano Giannetti                            Dep. de Electrónica y Automática
http://www.dea.icai.upcomillas.es/romano    Univ. Pontificia Comillas (MADRID)


--

From: Adrian Bunk
Date: Sunday, June 15, 2008 - 3:39 am

cu
Adrian

--

From: Romano Giannetti
Date: Sunday, June 15, 2008 - 4:25 am

Dunno. Do I need to recompile, or is there some boot option to disable
it?

Romano
-- 
Romano Giannetti                            Dep. de Electrónica y Automática
http://www.dea.icai.upcomillas.es/romano    Univ. Pontificia Comillas (MADRID)


--

From: Adrian Bunk
Date: Sunday, June 15, 2008 - 5:23 am

cu
Adrian

--

From: Siddha, Suresh B
Date: Sunday, June 15, 2008 - 12:29 pm

Frans, with or without PAT, in recent kernels (such as 2.6.26-rc4/rc5), ioremap()
uses UC-, and PCI mmap of /sys/devices/pci.../resource (used by X) also uses UC-.

And fb_mmap() also uses UC-.

It's interesting that you don't see this artifact with "nopat". Essentially, with
or without PAT enabled, we use the same memory attributes. So, depending on the
MTRR setup (set by the X server), the effective memory attribute across the different
mappings should be the same (UC-, or WC with an MTRR).

Can you also check whether the vesafb kernel boot parameter "mtrr:3" has any impact?

thanks,
suresh
--

From: Frans Pop
Date: Sunday, June 15, 2008 - 4:02 pm

Hello Suresh. Thanks for responding.

I've done 4 successive boots with the boot parameters as shown below.
Each boot was basically: set correct parameters in grub -> login to KDE
                         -> reboot and check for artifacts.

1) (none)                      --> clean
2) vga=791                     --> artifacts
3) vga=791 nopat               --> clean
4) vga=791 video=vesafb:mtrr:3 --> artifacts

So the mtrr option did not help (if I passed it correctly; the double ":"
is somewhat non-intuitive). The kernel log also does not show any
difference I can see in the last boot, but I don't know if the mtrr option
is supposed to show up in any way.

From the kernel log for each boot:
1) x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
   Console: colour VGA+ 80x25
   console [tty0] enabled

2) x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
   Console: colour dummy device 80x25
   console [tty0] enabled
   vesafb: framebuffer at 0x80000000, mapped to 0xffffc20000900000, using 3072k, total 7872k
   vesafb: mode is 1024x768x16, linelength=2048, pages=4
   vesafb: scrolling: redraw
   vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
   Console: switching to colour frame buffer device 128x48
   fb0: VESA VGA frame buffer device

3) PAT support disabled
   Console: colour dummy device 80x25
   console [tty0] enabled
   vesafb: framebuffer at 0x80000000, mapped to 0xffffc20000900000, using 3072k, total 7872k
   vesafb: mode is 1024x768x16, linelength=2048, pages=4
   vesafb: scrolling: redraw
   vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
   Console: switching to colour frame buffer device 128x48
   fb0: VESA VGA frame buffer device

4) x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
   Console: colour dummy device 80x25
   console [tty0] enabled
   vesafb: framebuffer at 0x80000000, mapped to 0xffffc20000900000, using 3072k, total 7872k
   vesafb: mode is 1024x768x16, linelength=2048, pages=4
   vesafb: ...
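
The two values in the "x86 PAT enabled" lines above can be decoded by
hand: the PAT MSR holds eight one-byte entries, and the low three bits
of each byte select a memory type. The following is a small sketch of
that decoding; the function names are made up for illustration, and the
encodings (0 UC, 1 WC, 4 WT, 5 WP, 6 WB, 7 UC-) are the architectural
ones as far as I know.

```c
#include <stdint.h>

/* Map a 3-bit PAT memory-type encoding to its conventional name. */
const char *pat_type_name(unsigned enc)
{
    switch (enc & 0x7) {
    case 0: return "UC";
    case 1: return "WC";
    case 4: return "WT";
    case 5: return "WP";
    case 6: return "WB";
    case 7: return "UC-";
    default: return "reserved";   /* encodings 2 and 3 */
    }
}

/* Return the raw encoding of PAT entry idx (0..7); one byte per entry. */
unsigned pat_entry(uint64_t msr, unsigned idx)
{
    return (unsigned)(msr >> (idx * 8)) & 0x7;
}
```

Applied to the logged values, entries 1 and 5 change from WT (old
0x7040600070406) to WC (new 0x7010600070106), while the other entries
(WB, UC-, UC) stay as they were.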
From: Suresh Siddha
Date: Sunday, June 15, 2008 - 5:41 pm

If the init level is '3', then the mtrr option will show up in /proc/mtrr,
otherwise not. In init level '5', the X server will add the MTRR (irrespective
of the boot option, if it's not already there) and will remove it when the X process
completes its execution.

Can you also please check whether "mtrr:1" makes any difference? This will set up the
mapping as UC during boot. Apart from a PAT WC mapping (which we shouldn't be using
in your current setup), a UC MTRR should override all the other PAT mappings and
should be consistent across the X and VT console mappings. As such, if the
problem is caused by improper aliasing, then my understanding is that we
shouldn't see any artifacts with "mtrr:1".

With mtrr:1, we should now see a UC MTRR setting in /proc/mtrr.
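
The combination rule referred to above can be written down as a small
lookup. This is a sketch that models only the cases mentioned in this
thread (PAT UC, UC- and WC, plus a UC MTRR overriding everything except
PAT WC); the enum and function names are made up for illustration, and
the table values are an assumption based on my reading of the Intel SDM
rather than on anything in the kernel source.

```c
/* Memory types; MT_UC_MINUS exists only on the PAT side. */
enum memtype {
    MT_UC,
    MT_UC_MINUS,
    MT_WC,
    MT_WT,
    MT_WP,
    MT_WB,
    MT_UNMODELED = -1   /* combination not covered by this sketch */
};

/* Effective type for a page with PAT type `pat` under MTRR type `mtrr`. */
int effective_memtype(enum memtype pat, enum memtype mtrr)
{
    if (pat == MT_WC)
        return MT_WC;               /* PAT WC wins regardless of MTRR */
    if (pat == MT_UC)
        return MT_UC;               /* strong UC is never relaxed */
    if (mtrr == MT_UC)
        return MT_UC;               /* UC MTRR overrides the remaining PAT types */
    if (pat == MT_UC_MINUS)
        return mtrr == MT_WC ? MT_WC : MT_UC;   /* UC- yields only to a WC MTRR */
    return MT_UNMODELED;
}
```

Under this model a UC- mapping over the write-combining MTRR that X
sets up comes out as WC, and the same mapping over a UC MTRR (the
"mtrr:1" case) comes out as UC.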

thanks,
suresh
--

From: Frans Pop
Date: Monday, June 16, 2008 - 3:53 am

That was a useful pointer. I do see some differences when I compare

mtrr:1 still gives the artifacts and makes no difference to /proc/mtrr.

Here's /proc/cmdline + /proc/mtrr for three different boots:
root=/dev/mapper/main-root ro vga=791 quiet
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x7f800000 (2040MB), size=   8MB: uncachable, count=1
reg02: base=0x7f700000 (2039MB), size=   1MB: uncachable, count=1
reg03: base=0x80000000 (2048MB), size= 256MB: write-combining, count=1

root=/dev/mapper/main-root ro vga=791 quiet video=vesafb:mtrr:1
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x7f800000 (2040MB), size=   8MB: uncachable, count=1
reg02: base=0x7f700000 (2039MB), size=   1MB: uncachable, count=1
reg03: base=0x80000000 (2048MB), size= 256MB: write-combining, count=1

root=/dev/mapper/main-root ro vga=791 quiet video=vesafb:mtrr:3
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x7f800000 (2040MB), size=   8MB: uncachable, count=1
reg02: base=0x7f700000 (2039MB), size=   1MB: uncachable, count=1
reg03: base=0x80000000 (2048MB), size= 256MB: write-combining, count=1
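
To make the comparison of these listings less error-prone, the lines can be
parsed and checked against the framebuffer address that vesafb reports
(0x80000000). The following is only a minimal sketch: the helper names are
made up for illustration, and the pattern handles only the "MB" line format
shown above, not every format /proc/mtrr may use.

```python
import re

# Matches one line in the format shown above, e.g.
# "reg03: base=0x80000000 (2048MB), size= 256MB: write-combining, count=1"
MTRR_LINE = re.compile(
    r"reg(?P<reg>\d+): base=0x(?P<base>[0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(?P<size>\d+)MB: (?P<type>[\w-]+), count=\d+"
)


def parse_mtrr(text):
    """Return a list of (base, size_in_bytes, type) tuples (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        m = MTRR_LINE.match(line.strip())
        if m:
            base = int(m.group("base"), 16)
            size = int(m.group("size")) * 1024 * 1024
            entries.append((base, size, m.group("type")))
    return entries


def types_covering(entries, addr):
    """Return the memory types of all parsed ranges that contain addr."""
    return [t for base, size, t in entries if base <= addr < base + size]


# The listing from the boots above (identical in all three cases).
SAMPLE = """\
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x7f800000 (2040MB), size=   8MB: uncachable, count=1
reg02: base=0x7f700000 (2039MB), size=   1MB: uncachable, count=1
reg03: base=0x80000000 (2048MB), size= 256MB: write-combining, count=1
"""

if __name__ == "__main__":
    # Which MTRR types apply at the vesafb framebuffer base?
    print(types_covering(parse_mtrr(SAMPLE), 0x80000000))
```

With the listing above, only the write-combining range in reg03 contains the
framebuffer base, which is consistent with no UC entry having been added.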


I do see some differences in Xorg logs, so it does seem that the mtrr
options _are_ being recognized.
Attached my "normal" Xorg log (with 'vga=791') which I used as the base
for the diffs below. Other than shown, the logs are identical.

With mtrr:1 I get (added at the end of the log):
@@ -688,3 +688,11 @@
 (II) evaluating device (Generic Keyboard)
 (II) XINPUT: Adding extended input device "Generic Keyboard" (type: KEYBOARD)
 (II) Configured Mouse: ps2EnableDataReporting: succeeded
+(II) intel(0): xf86UnbindGARTMemory: unbind key 0
+(II) intel(0): xf86UnbindGARTMemory: unbind key 1
+(II) intel(0): xf86UnbindGARTMemory: unbind key 2
+(II) intel(0): xf86UnbindGARTMemory: unbind key 3
+(II) intel(0): xf86UnbindGARTMemory: unbind key 4
+(II) intel(0): [drm] removed 1 reserved context for kernel
+(II) ...
From: Frans Pop
Date: Monday, June 16, 2008 - 4:07 am

Oops. Just realized that this is completely bogus. I used the .old log for 
this one while I used logs for still running Xorg sessions for the 
others. So this was actually the only one that contains Xorg shutdown 

This is still valid though.

Sorry for the confusion.
--

From: Frans Pop
Date: Monday, June 23, 2008 - 5:38 am

Any progress on this issue? It's still there with -rc7, but I doubt that 
comes as a surprise.

Has anyone tried to reproduce this? I would think that should be trivial.

Just as a summary:
- Intel 82945G/GZ graphics [8086:2772] (ICH7 based system)
- FB_VESA=y, FRAMEBUFFER_CONSOLE=y
- boot with vga=791
- Log in to X and KDE; I do need to really log in, as there are no artifacts
  if I exit X from the kdm login dialog
- artifacts show on logout

I doubt it's KDE related or even related to my specific graphics card.

It may well be related to what is or has been displayed on the display 
before logging out, so running some apps may make sense. Seems like I do 
see remnants of for example aptitude (Debian apt frontend) after I've run 
it in an X term (KDE's konsole).

Cheers,
FJP
--

From: Suresh Siddha
Date: Tuesday, June 24, 2008 - 4:22 pm

FJP, we will try to reproduce this and get back to you. Your earlier responses
did not give many clues.

thanks,
suresh
--

From: Frans Pop
Date: Friday, September 12, 2008 - 3:54 am

Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=10843


Hello all,

I'd like to bring this issue to your attention once again as it is still 
present in 2.6.27-rc6.

Note also that I can trivially reproduce exactly the same behavior on 
three rather different systems. The artifacts even look similar and in 
all cases they disappear with 'nopat'.

The systems are:
Toshiba Satellite A40 laptop:
- Intel 82852/855GM Integrated Graphics Device [8086:3582]
- ICH4 based
- Mobile Intel Pentium 4 processor, i386 kernel
Intel desktop system:
- Intel 82945G/GZ Integrated Graphics Controller [8086:2772]
- ICH7 based
- Pentium D processor, x86_64 kernel
HP Compaq 2510p laptop:
- Intel Mobile GM965/GL960 Integrated Graphics Controller [8086:2a02]
- ICH8 based
- Core2 Duo processor, x86_64 kernel


Well, unfortunately I can only provide the info you ask for :-)

Cheers,
FJP
--

From: Pallipadi, Venkatesh
Date: Friday, September 12, 2008 - 5:43 am

Hi,

What does the output of x86/pat_memtype_list under debugfs look like?

You may need the following if you are not already mounting debugfs.
mount -t debugfs debugfs /proc/sys/debug
cat /proc/sys/debug/x86/pat_memtype_list

Thanks,
Venki
--

From: Frans Pop
Date: Friday, September 12, 2008 - 6:33 am

I've attached that file and dmesg output for two of the machines.

Cheers,
FJP

From: Pallipadi, Venkatesh
Date: Friday, September 12, 2008 - 9:05 am

OK. This is the same issue as the one on this thread here
http://www.ussg.iu.edu/hypermail/linux/kernel/0808.2/1532.html


We have too many entries in PAT list due to RAM pages being marked
UC by drivers. Unfortunately, there are no quick fixes that can fix
this for 2.6.27. However, we are aware of the problem here and
working on a more complete fix for this. We should have the patch
for it soon.
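
For anyone wanting to see how large the list gets on their own machine, the
entries can be counted per memory type. This is a rough sketch under a stated
assumption: the file contents are not shown in this thread, so the
"<type> @ 0x<start>-0x<end>" line format and the sample data below are
assumptions made for illustration, and the helper name is hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: each entry looks like "uncached-minus @ 0x7f700000-0x7f800000".
# The real file contents are not shown in this thread.
ENTRY = re.compile(
    r"^(?P<type>[\w-]+) @ 0x(?P<start>[0-9a-fA-F]+)-0x(?P<end>[0-9a-fA-F]+)$"
)


def count_memtypes(text):
    """Count entries per memory type; lines that do not match are ignored."""
    counts = Counter()
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            counts[m.group("type")] += 1
    return counts


# Hypothetical sample data, invented for illustration only.
SAMPLE = """\
PAT memtype list:
uncached-minus @ 0x7f700000-0x7f800000
uncached-minus @ 0x7f800000-0x7f801000
write-combining @ 0x80000000-0x80300000
"""

if __name__ == "__main__":
    for memtype, n in count_memtypes(SAMPLE).items():
        print(memtype, n)
```

To use it on a real system, the text read from the pat_memtype_list file
under the debugfs mount point would be passed to count_memtypes().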

Thanks,
Venki
--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10830
Subject		: two different oopses with 2.6.26-rc4
Submitter	: Alejandro Riveira Fernández <alejandro.riveira@gmail.com>
Date		: 2008-05-28 9:50 (18 days old)
References	: http://marc.info/?l=linux-kernel&m=121196833026310&w=4
Handled-By	: Johannes Berg <johannes@sipsolutions.net>
		  Andrew Morton <akpm@linux-foundation.org>
		  Peter Zijlstra <a.p.zijlstra@chello.nl>
Patch		: http://lkml.org/lkml/2008/5/20/683


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10860
Subject		: total system freeze at boot with 2.6.26-rc
Submitter	: Christian Casteyde <casteyde.christian@free.fr>
Date		: 2008-06-05 12:38 (10 days old)


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10861
Subject		: 2.6.26-rc4-git2 - long pause during boot
Submitter	: Chris Clayton <chris2553@googlemail.com>
Date		: 2008-06-01 4:15 (14 days old)
References	: http://marc.info/?l=linux-kernel&m=121229382917834&w=4


--

From: Chris Clayton
Date: Saturday, June 14, 2008 - 11:12 pm

Well, we know how to create (and to avoid) the problem. Since the original patch 
adversely affects existing user space, it could be argued that it is a 
regression, but I see that there is a counter argument that the udev rule that 
triggers the problem is quite simply a bad rule.

I'll leave it to those more closely concerned to rule on that.

Thanks

Chris

-- 
Beauty is in the eye of the beerholder.
--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10865
Subject		: i get the following oops trying to mount an ntfs partition on thinkpad
Submitter	: Alex Romosan <romosan@sycorax.lbl.gov>
Date		: 2008-06-05 14:47 (10 days old)
References	: http://marc.info/?l=linux-kernel&m=121267834421414&w=4


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10864
Subject		: [regression][bisected] ~90,000 wakeups as of 2.6.26-rc3
Submitter	: Németh Márton <nm127@freemail.hu>
Date		: 2008-06-03 5:18 (12 days old)
References	: http://marc.info/?l=linux-kernel&m=121247101601790&w=4


--

From: Németh Márton
Date: Saturday, June 14, 2008 - 11:30 pm

Hi,

I already mentioned in the bug report that this is fixed in 2.6.26-rc6.

Maybe tell your robot to first check the latest activity in the bug
report since the last -rc release. I would also like your robot to
mention what action should be taken if the bug should still be listed
or when the bug can be closed.

Regards,

	Márton Németh


--

From: Ingo Molnar
Date: Sunday, June 15, 2008 - 12:50 am

i think the current regression tracking methods that Rafael uses work 
very well and i'd like to thank Rafael for those efforts - to me as a 
subsystem maintainer it is a _very_ useful thing.

In this case there was no real harm from the "this bug is already fixed" 
condition - just an extra email. Real harm would only come from missed 
regressions or from incorrectly closed regressions - but those are not 
happening.

note that there is no "robot" involved in changing the state of bugs - 
the real important work here is done by Rafael and checking whether a 
bug is still relevant is an inevitably manual work. The mails and 
reports are auto-generated but crawling discussions and determining the 
status of a regression is very hard to automate.

	Ingo
--

From: Németh Márton
Date: Sunday, June 15, 2008 - 1:54 am

Sorry, I thought a robot missed my comments in the bug tracking system for
the second time:

1.
The comment was made on 2008-06-06 13:27:16 ( http://bugzilla.kernel.org/show_bug.cgi?id=10864#c3 )
The mail came on 7 Jun 2008 22:42:57 +0200 (CEST) ( http://lkml.org/lkml/2008/6/7/193 )

2.
The comment was made on 2008-06-13 23:19:53 ( http://bugzilla.kernel.org/show_bug.cgi?id=10864#c4 )
The mail came on 14 Jun 2008 22:12:04 +0200 (CEST) ( http://lkml.org/lkml/2008/6/14/160 )

Nevertheless the bug #10864 can be closed I think.

Regards,

	Márton Németh
--

From: Rafael J. Wysocki
Date: Sunday, June 15, 2008 - 3:48 am

Closed now.

Thanks,
Rafael
--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10862
Subject		: forcedeth: lockdep warning on ethtool -s
Submitter	: Tobias Diedrich <ranma+kernel@tdiedrich.de>
Date		: 2008-06-01 8:37 (14 days old)
References	: http://marc.info/?l=linux-kernel&m=121230964032247&w=4


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10868
Subject		: Oops on loading ipaq module since 2.6.26, prevents use of device
Submitter	: Adam Williamson <awilliamson@mandriva.com>
Date		: 2008-06-05 17:39 (10 days old)
Handled-By	: Alan Cox <alan@redhat.com>


--

From: Alan Cox
Date: Saturday, June 14, 2008 - 2:26 pm

On Sat, 14 Jun 2008 22:12:04 +0200 (CEST)

Still waiting for the actual attached result of the test patch to debug
this further. Guess it will miss 2.6.26
--

From: Adam Williamson
Date: Saturday, June 14, 2008 - 11:44 pm

I replied via email - as you requested earlier in the thread - and
attached the result to that email. Did you not get it?
-- 
adamw

--

From: Alan Cox
Date: Monday, June 16, 2008 - 2:10 am

Andrew may have asked you to use email not me.

The last I have is "Okay, output with the patch is attached. Thanks for your help"

only the output in question isn't attached to the bug ?

Alan

--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10866
Subject		: /dev/rtc was missing until I disabled CONFIG_RTC_CLASS
Submitter	: Lior Dotan <liodot@gmail.com>
Date		: 2008-06-05 15:04 (10 days old)
References	: http://marc.info/?l=linux-kernel&m=121267834521432&w=4


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10892
Subject		: Sometime (often) X come out blank (black screen) on cold boot - Intel chipset
Submitter	: Romano Giannetti <romano.giannetti@gmail.com>
Date		: 2008-06-10 05:33 (5 days old)
References	: http://lkml.org/lkml/2008/6/10/137


--

From: Romano Giannetti
Date: Sunday, June 15, 2008 - 3:21 am

It happened, again, with the new Ubuntu x-intel driver. Rebooting with
nousplash solved the problem, but alas, sometimes simply rebooting solves
the problem. It seems to be a race...

Romano

--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10872
Subject		: x86_64 boot hang when CONFIG_NUMA=n
Submitter	: Randy Dunlap <randy.dunlap@oracle.com>
Date		: 2008-06-05 21:50 (10 days old)
References	: http://marc.info/?l=linux-kernel&m=121270308607116&w=4
		  http://lkml.org/lkml/2008/6/11/355
Handled-By	: Yinghai Lu <yhlu.kernel@gmail.com>


--

From: Randy Dunlap
Date: Sunday, June 15, 2008 - 9:35 am

Yes, still happens for me on 2.6.26-rc6-git2.


---
~Randy
'"Daemon' is an old piece of jargon from the UNIX operating system,
where it referred to a piece of low-level utility software, a
fundamental part of the operating system."
--

From: Yinghai Lu
Date: Sunday, June 15, 2008 - 12:18 pm

please send out whole boot log with numa on and numa off
and boot with debug

please apply attached debug patch too.

YH
From: Randy Dunlap
Date: Sunday, June 15, 2008 - 6:11 pm

OK, did all of that.
I should probably note that in both cases, the kernel is loaded/booted by using kexec.
Both boot logs were captured via netconsole.

The failing boot log is netcon-4409.log.  The working boot log (CONFIG_NUMA=y) is
netcon-4410.log.  Enabling CONFIG_NUMA makes the following changes:

4c4
< # Sun Jun 15 15:00:56 2008
241c241,246
< # CONFIG_NUMA is not set


Thanks,
---
~Randy
'"Daemon' is an old piece of jargon from the UNIX operating system,
where it referred to a piece of low-level utility software, a
fundamental part of the operating system."
From: Yinghai Lu
Date: Sunday, June 15, 2008 - 9:12 pm

the print out looks all right.

any chance to use normal serial console to capture the boot log?

YH
--

From: Randy Dunlap
Date: Sunday, June 15, 2008 - 10:14 pm

Not that I know of.

---
~Randy
'"Daemon' is an old piece of jargon from the UNIX operating system,
where it referred to a piece of low-level utility software, a
fundamental part of the operating system."
--

From: Yinghai Lu
Date: Sunday, June 15, 2008 - 9:15 pm

how about the numa=off on the kernel with CONFIG_NUMA=y?

YH
--

From: Randy Dunlap
Date: Monday, June 16, 2008 - 8:32 am

Hi,
That hangs the same way as the previous.

BTW, that kernel boot option needs to be documented in
Documentation/kernel-parameters.txt ...

---
~Randy
'"Daemon' is an old piece of jargon from the UNIX operating system,
where it referred to a piece of low-level utility software, a
fundamental part of the operating system."
--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10903
Subject		: ssh connections hang with 2.6.26-rc5
Submitter	: Didier Raboud <didier@raboud.com>
Date		: 2008-06-13 02:39 (2 days old)


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10905
Subject		: 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
Submitter	: Miquel van Smoorenburg <miquels@cistron.nl>
Date		: 2008-05-21 13:30 (25 days old)
References	: http://lkml.org/lkml/2008/5/21/131
		  http://lkml.org/lkml/2008/6/12/121
Handled-By	: Glauber Costa <gcosta@redhat.com>
		  Andi Kleen <andi@firstfloor.org>
		  Miquel van Smoorenburg <mikevs@xs4all.net>
Patch		: http://lkml.org/lkml/2008/5/28/42


--

From: Miquel van Smoorenburg
Date: Monday, June 16, 2008 - 4:26 am

This bug was recently fixed in 2.6.26 by commit
0269c5c6d9a9de22715ecda589730547435cd3e8

Mike.

--

From: Rafael J. Wysocki
Date: Monday, June 16, 2008 - 6:19 am

Thanks, closed.

Rafael
--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10906
Subject		: repeatable slab corruption with LTP msgctl08
Submitter	: Andrew Morton <akpm@linux-foundation.org>
Date		: 2008-06-12 5:13 (3 days old)
References	: http://marc.info/?l=linux-kernel&m=121324775927704&w=4
Handled-By	: Pekka J Enberg <penberg@cs.helsinki.fi>
		  Christoph Lameter <clameter@sgi.com>
		  Manfred Spraul <manfred@colorfullife.com>
		  Andi Kleen <andi@firstfloor.org>


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10908
Subject		: IPF Montvale machine panic when running a network-relevent testing
Submitter	: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Date		: 2008-06-13 8:19 (2 days old)
References	: http://marc.info/?l=linux-kernel&m=121334523711437&w=4


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10912
Subject		: Regressions in the last kernels
Submitter	: werner <werner@sys-linux.yi.org>
Date		: 2008-06-14 18:26 (1 days old)
References	: http://marc.info/?l=linux-kernel&m=121346933911641&w=4


--

From: Rafael J. Wysocki
Date: Saturday, June 14, 2008 - 1:12 pm

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.25.  Please verify if it still should be listed.


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9791
Subject		: Clock is running too fast^Wslow using acpi_pm clocksource
Submitter	: tosn00j02@sneakemail.com
Date		: 2008-05-03 05:09 (43 days old)


--

From: Linus Torvalds
Date: Saturday, June 14, 2008 - 2:42 pm

I don't believe this is a regression, at least the 8GB thing. The 
HIGHMEM64G config option has had a

	depends on !M386 && !M486

for quite a while now. It certainly was in 2.6.25 already.

So if you want PAE support, we do require that you ask for a kernel that 
has cmpxchg8b support (needed for the atomic 64-bit clearing of a PAE page 
table entry). Not to mention a CPU that supports PAE. And that is simply 
incompatible with "I want it to work on an i486 too".
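
As a rough illustration of that requirement, a machine can be checked for both
features from userspace before a PAE kernel is picked for it. This is only a
sketch: the function names are hypothetical, the sample flag lines are
invented and abbreviated, and it assumes the x86 /proc/cpuinfo convention of
a "flags" line that lists "pae" and "cx8" (cmpxchg8b) when present.

```python
def cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def can_run_pae_kernel(cpuinfo_text):
    """True only if the CPU reports both PAE and cmpxchg8b ('cx8')."""
    flags = cpu_flags(cpuinfo_text)
    return "pae" in flags and "cx8" in flags


# Hypothetical, abbreviated flag lines for illustration only.
I486_LIKE = "flags\t\t: fpu vme\n"
PPRO_LIKE = "flags\t\t: fpu vme de pse tsc msr pae mce cx8\n"

if __name__ == "__main__":
    print(can_run_pae_kernel(I486_LIKE))
    print(can_run_pae_kernel(PPRO_LIKE))
```

On a live system the text would come from reading /proc/cpuinfo; the check
fails as soon as either flag is missing.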

So saying "I want a kernel that uses PAE _and_ works on an i486" is simply 
nonsensical. If we ever supported it, it was a mistake, and wouldn't have 

I think this got fixed by ec0a196626bd12e0ba108d7daa6d95a4fb25c2c5: "tcp: 

I think this is likely fixed by the same revert as above.

David?

		Linus
--

From: Maciej W. Rozycki
Date: Saturday, June 14, 2008 - 3:00 pm

From what you have written it looks like the dependency should actually be:

	depends on !M386 && !M486 && !M586 && !M586TSC && !M586MMX

as none of the pre-Pentium-Pro processors had the PAE feature (I am not
sure about non-Intel implementations, so the case of M586 would have to be
investigated).  It was originally planned for the Pentium, but abandoned
because of the die size required -- the details behind the story were
obviously never very well known, but it was definitely related to some
cost implications.  The feature was reportedly documented in the earlier
not-so-widely-available revisions of the Pentium manuals and later on
removed while some of the other stuff was migrated to the (in)famous
Appendix H.  This also explains the odd location of the PAE bits among the
CPUID flags and in the CR4 register, which was initially marked as
reserved.
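
The effect of the extended dependency can be modelled in a few lines. This is
a simplified sketch, not Kconfig itself: it assumes exactly one processor
family symbol is selected at a time, so "depends on !A && !B" reduces to a set
membership test. The function name is hypothetical, and M686 (the Pentium Pro
option) is used as an example symbol that is not mentioned in this thread.

```python
# Processor-family symbols excluded by the existing dependency.
CURRENT_EXCLUDED = frozenset({"M386", "M486"})

# Symbols excluded by the extended dependency suggested above.
PROPOSED_EXCLUDED = CURRENT_EXCLUDED | {"M586", "M586TSC", "M586MMX"}


def highmem64g_allowed(selected_cpu, excluded):
    """Model 'depends on !A && !B ...' when exactly one CPU symbol is selected."""
    return selected_cpu not in excluded


if __name__ == "__main__":
    for cpu in ("M486", "M586", "M586TSC", "M586MMX", "M686"):
        print(cpu,
              highmem64g_allowed(cpu, CURRENT_EXCLUDED),
              highmem64g_allowed(cpu, PROPOSED_EXCLUDED))
```

The only configurations whose result changes are the three M586 variants,
which is where the open question about non-Intel parts lies.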

  Maciej
--

From: Linus Torvalds
Date: Saturday, June 14, 2008 - 5:00 pm

Yes, it's the non-intel ones that would keep me from saying !M586.

For intel, PAE was a PPro feature (at least officially, as you point out), 
but I do not know about various other manufacturers. From personal 
experience, the line between Pentium and PPro features doesn't tend to be 
totally black-and-white (although I suspect that when it comes to PAE it 
_may_ be).

		Linus
--

From: Maciej W. Rozycki
Date: Sunday, June 15, 2008 - 4:31 pm

Well, PAE is quite a significant block to implement and Intel kept it
hidden until they published the long awaited PentiumPro manual sometime in
1996.  I am fairly sure the K5 did not implement it (it may have had PSE 
and VME, especially in the later revisions) and Google does not show up 
any Cyrix processors with PAE.  I may have a K5 manual somewhere, so I can 
see if I can verify it.

 Please also note these processors tried to compete with Intel on the
desktop market where 4GB of RAM was completely unreasonable in late 90s.  
I think unless someone can recall a counter-example, it can be safely
assumed these chips did not have the PAE.  We could try to extend the
dependency and see if anybody screams.

  Maciej
--

From: Dave Jones
Date: Tuesday, June 17, 2008 - 8:24 am

On Mon, Jun 16, 2008 at 12:31:52AM +0100, Maciej W. Rozycki wrote:
 > On Sat, 14 Jun 2008, Linus Torvalds wrote:
 > 
 > > >  From what you have written it looks the dependency should actually be:
 > > > 
 > > > 	depends on !M386 && !M486 && !M586 && !M586TSC && !M586MMX
 > > > 
 > > > as none of the pre-Pentium-Pro processors had the PAE feature (I am not
 > > > sure about non-Intel implementations, so the case of M586 would have to be
 > > > investigated).
 > > 
 > > Yes, it's the non-intel ones that would keep me from saying !M586.
 > > 
 > > For intel, PAE was a PPro feature (at least officially, as you point out), 
 > > but I do not know about various other manufacturers. From personal 
 > > experience, the line between Pentium and PPro features doesn't tend to be 
 > > totally black-and-white (although I suspect that when it comes to PAE it 
 > > _may_ be).
 > 
 >  Well, PAE is quite a significant block to implement and Intel kept it
 > hidden until they published the long awaited PentiumPro manual sometime in
 > 1996.  I am fairly sure the K5 did not implement it (it may have had PSE 
 > and VME, especially in the later revisions) and Google does not show up 
 > any Cyrix processors with PAE.  I may have a K5 manual somewhere, so I can 
 > see if I can verify it.

Even the K6 didn't have PAE.  The Athlon was AMD's first CPU that had it.
 
 >  Please also note these processors tried to compete with Intel on the
 > desktop market where 4GB of RAM was completely unreasonable in late 90s.  
 > I think unless someone can recall a counter-example, it can be safely
 > assumed these chips did not have the PAE.  We could try to extend the
 > dependency and see if anybody screams.
 
I agree.  To the best of my knowledge (and looking through output of
x86info from lots of old CPUs), Intel had the only CPUs with PAE in
that era.

	Dave

-- 
http://www.codemonkey.org.uk
--

From: David Miller
Date: Saturday, June 14, 2008 - 4:31 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

No, this is looking like a different bug.

The behavior of that bug would not usually be a crash, but
rather stuck connections, and I severely doubt anything in
that specweb test setup is using the deferred-accept option

I think this is also a separate bug.  Ilpo has asked the reporter for
more information.
--

From: Linus Torvalds
Date: Saturday, June 14, 2008 - 5:41 pm

Are you sure? Because that revert seems to basically revert all changes 
since 2.6.25 in tcp_rcv_established(), which is the function that oopses. 
After that revert, the function is back to exactly what it used to be.

Of course, inlining makes it less obvious what other changes end up doing, 
but even the offset in the function (not quite at the very end of it, but 
not that far off that end either) matches where you'd expect that that 
'tcp_defer_accept_check()' thing used to be before the revert.

Also: see the report saying

  "As a matter of fact, kernel paniced at statement
   "queue->rskq_accept_tail->dl_next = req" in function reqsk_queue_add,
   because queue->rskq_accept_tail is NULL.  The call chain is:
   tcp_rcv_established => inet_csk_reqsk_queue_add => reqsk_queue_add."

and realize that that whole inet_csk_reqsk_queue_add() call only exists
in that tcp_defer_accept_check() thing that no longer exists.
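
The failure mode in that quoted statement can be shown with a small model of
a head/tail linked queue. This is a simplified sketch in Python rather than
the kernel code: the class and function names are hypothetical, and the
head check before the tail dereference is an assumption about the shape of
the append logic, since the report quotes only the one statement.

```python
class Request:
    """Stand-in for a queued connection request (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.dl_next = None


class AcceptQueue:
    """Stand-in for a queue tracked by head and tail pointers."""

    def __init__(self):
        self.head = None
        self.tail = None


def queue_add(queue, req):
    """Append req to the queue using the head/tail pattern."""
    if queue.head is None:
        queue.head = req
    else:
        # Mirrors "queue->rskq_accept_tail->dl_next = req": this fails
        # if tail is None while head is not, i.e. inconsistent state.
        queue.tail.dl_next = req
    queue.tail = req
    req.dl_next = None


if __name__ == "__main__":
    q = AcceptQueue()
    first, second = Request("first"), Request("second")
    queue_add(q, first)
    queue_add(q, second)
    print(q.head.name, q.tail.name)

    # Inconsistent queue: head set, tail missing.
    broken = AcceptQueue()
    broken.head = first
    try:
        queue_add(broken, Request("third"))
    except AttributeError as exc:
        print("append failed:", exc)
```

In the model the append raises an exception; in the kernel the equivalent
dereference of a NULL tail pointer is what produces the oops.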

IOW, I'm pretty damn sure that the bug entry above is very much a result
of the tcp_defer_accept_check() thing, and that commit ec0a196626 fixed

Hey, I might be wrong. But see above. I don't think I am. I think the 
deferred-accept was just even buggier than you believed.

But who knows. 

		Linus
--

From: David Miller
Date: Saturday, June 14, 2008 - 6:07 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

I agree with the gist of your analysis.

And it seems that Apache does try to use the deferred accept socket
option.  So we may indeed have a hit on this IA64 bug.

The wording in the report about versions is a little confusing:

	With kernel 2.6.26-rc5 and a git kernel just between rc4
	and rc5, my kernel panic...

Does this mean that the problem appeared between rc4 and rc5?  Or
that all 2.6.26-rcX releases have the problem?  That's an important
fact because the change in question showed up in 2.6.26-rc1, as it

Because of the requirements to trigger the new code, this case is
not likely to match the revert.  SSH absolutely does not use the
deferred accept socket option.

Let's look at the change in question.

Every single code path touched in the data paths is guarded
with "tp->defer_tcp_accept.request" which will be NULL unless
1) defer-accept socket option enabled and 2) a new connection
got queued up there.

Nothing about the normal accept queue handling got modified by those
changes which were reverted.

And note that this means the behavior change only hits listening
sockets.  So if we have a report that client outgoing SSH
connections hang with the current kernel, that report cannot
reasonably match this revert.

I also anticipate that if this change could trigger problems for
non-deferred-accept cases, we'd see a ton more reports than we have.

And we did some research, and about the only major servers that use
this obscure defer-accept feature are distcc and apache.  It is this
element of Ingo's bug report (that he uses distcc heavily and it was a
distcc socket which hung) that helped us narrow things down.

The SSH report clearly states "With kernel 2.6.26-rc5, ssh connections
to _remote_ servers randomly hang".  So this is a report about SSH
client connections under 2.6.26-rc5, not SSH server connections and
therefore not listening sockets.

So right now I'd say that the IA64 case could ...
From: Linus Torvalds
Date: Saturday, June 14, 2008 - 7:15 pm

Ok. The only reason I matched up the ssh case was because it was reported 
to fix the stuck connections that Ingo had with distcc-over-loopback, so I 
thought the stuck-ssh thing could be the same.

But if you are sure ssh doesn't trigger the same thing, I really didn't 
have anything else to go on than "stuck TCP connection sounds familiar".

		Linus
--

From: Rafael J. Wysocki
Date: Sunday, June 15, 2008 - 3:51 am

No, it is not.  The other problem described in this message seems to be a
recent regression, though, at least the reporter thinks so.

Thanks,
Rafael

--

From: Vegard Nossum
Date: Saturday, June 14, 2008 - 3:09 pm

http://lkml.org/lkml/2008/6/14/62

was just reported today. Seems to have been caused by

commit 3ac7fe5a4aab409bd5674d0b070bce97f9d20872
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Apr 30 00:55:01 2008 -0700

    infrastructure to debug (dynamic) objects

which was introduced just after v2.6.25, but not discovered until now,
probably because it requires the (admittedly obscure) combination of
lockdep and slub/object debugging.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: Rafael J. Wysocki
Date: Sunday, June 15, 2008 - 4:14 am

Added to the list as http://bugzilla.kernel.org/show_bug.cgi?id=10918 .

Thanks,
Rafael
--
