This message contains a list of some regressions from 2.6.25, for which there are no fixes in the mainline I know of. If any of them have been fixed already, please let me know.

If you know of any other unresolved regressions from 2.6.25, please let me know too and I'll add them to the list. Also, please let me know if any of the entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply to this message with CCs to the people involved in reporting and handling the issue.

Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2008-06-14      130       37          28
  2008-06-07      125       48          33
  2008-05-31      115       52          31
  2008-05-24       94       47          28
  2008-05-18       80       51          37
  2008-05-11       53       46          34

Unresolved regressions
----------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10912
Subject : Regressions in the last kernels
Submitter : werner <werner@sys-linux.yi.org>
Date : 2008-06-14 18:26 (1 day old)
References : http://marc.info/?l=linux-kernel&m=121346933911641&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10908
Subject : IPF Montvale machine panic when running a network-relevent testing
Submitter : Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Date : 2008-06-13 8:19 (2 days old)
References : http://marc.info/?l=linux-kernel&m=121334523711437&w=4

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10906
Subject : repeatable slab corruption with LTP msgctl08
Submitter : Andrew Morton <akpm@linux-foundation.org>
Date : 2008-06-12 5:13 (3 days old)
References : http://marc.info/?l=linux-kernel&m=121324775927704&w=4
Handled-By : Pekka J Enberg <penberg@cs.helsinki.fi>
             Christoph Lameter <clameter@sgi.com>
             Manfred Spraul <manfred@colorfullife.com>
             Andi Kleen <andi@firstfloor.org>

Bug-Entry : ...
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10493
Subject : mips BCM47XX compile error
Submitter : Adrian Bunk <adrian.bunk@movial.fi>
Date : 2008-04-20 17:07 (56 days old)
References : http://lkml.org/lkml/2008/4/20/34
             http://lkml.org/lkml/2008/5/12/30
             http://lkml.org/lkml/2008/5/18/131
             http://lkml.org/lkml/2008/5/31/202
             http://lkml.org/lkml/2008/6/7/154
Patch : http://marc.info/?l=linux-kernel&m=120876451216558&w=2
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10629
Subject : 2.6.26-rc1-$sha1: RIP __d_lookup+0x8c/0x160
Submitter : Alexey Dobriyan <adobriyan@gmail.com>
Date : 2008-05-05 09:59 (41 days old)
References : http://lkml.org/lkml/2008/5/5/28
Handled-By : Paul E. McKenney <paulmck@linux.vnet.ibm.com>
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10711
Subject : BUG: unable to handle kernel paging request - scsi_bus_uevent
Submitter : Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date : 2008-05-14 11:23 (32 days old)
References : http://lkml.org/lkml/2008/5/14/111
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10642
Subject : general protection fault: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
Submitter : Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date : 2008-05-07 16:03 (39 days old)
References : http://lkml.org/lkml/2008/5/7/48
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10714
Subject : Badness seen on 2.6.26-rc2 with lockdep enabled
Submitter : Balbir Singh <balbir@linux.vnet.ibm.com>
Date : 2008-05-14 12:57 (32 days old)
References : http://marc.info/?l=linux-kernel&m=121076917429133&w=4
--
Benjamin, you said you wanted to have a look at this?
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
--
Ah yes, it slipped my mind. It's probably a missing annotation in the RTAS code; I'll have a look tomorrow. Thanks, Ben. --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10724
Subject : ACPI: EC: GPE storm detected, disabling EC GPE
Submitter : Justin Mattock <justinmattock@gmail.com>
Date : 2008-05-16 6:17 (30 days old)
References : http://marc.info/?l=linux-kernel&m=121091875711824&w=4
             http://lkml.org/lkml/2008/5/18/168
             http://lkml.org/lkml/2008/5/25/195
Patch : debug EC GPE: http://bugzilla.kernel.org/attachment.cgi?id=16364&action=view
        debug EC GPE: http://bugzilla.kernel.org/attachment.cgi?id=16365&action=view
--
I've just pulled the latest git and applied the latest patch that was sent to me, and I'm not seeing this message at the moment. I am unsure whether the problem is fixed or not. -- Justin P. Mattock --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10726
Subject : x86-64 NODES_SHIFT compile failure.
Submitter : Dave Jones <davej@codemonkey.org.uk>
Date : 2008-05-16 12:54 (30 days old)
References : http://lkml.org/lkml/2008/5/16/312
Handled-By : Mike Travis <travis@sgi.com>
Patch : http://lkml.org/lkml/2008/5/16/343
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10730
Subject : build issue #503 for v2.6.26-rc2-433-gf26a398 : undefined reference to `request_firmware'
Submitter : Toralf Förster <toralf.foerster@gmx.de>
Date : 2008-05-16 17:06 (30 days old)
References : http://marc.info/?l=linux-kernel&m=121095777616792&w=4
Handled-By : James Bottomley <James.Bottomley@HansenPartnership.com>
Patch : http://marc.info/?l=linux-scsi&m=121101871800303&w=4
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10741
Subject : bug in `tty: BKL pushdown'?
Submitter : Johannes Weiner <hannes@saeurebad.de>
Date : 2008-05-18 2:16 (28 days old)
References : http://marc.info/?l=linux-kernel&m=121107706506181&w=4
Handled-By : Alan Cox <alan@lxorguk.ukuu.org.uk>
--
Hi, the bug still exists. However, a bisect on another machine with the same userland leads to a different commit (47f86834bbd4193139d61d659bebf9ab9d691e37 "redo locking of tty->pgrp"), so the result is not all that clear or stable. I will investigate further, but the entry should probably stay for now. Hannes --
Now that would actually make a lot more sense as a root cause. --
On Mon, 16 Jun 2008 11:33:13 +0100
Experiment time. In _proc_set_tty() in tty_io.c move the
tty->session = get_pid(task_session(tsk));
back inside the lock just before
tty->pgrp = get_pid(task_pgrp(tsk));
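The idea behind this experiment is that both fields get assigned inside one critical section, so anyone who takes the same lock always sees a matching session/pgrp pair. A minimal user-space model of that locking pattern, assuming a C11 atomic_flag in place of the kernel spinlock and plain ints in place of struct pid references (the get_pid()/put_pid() reference counting is left out entirely); fake_tty and the model_* helpers are hypothetical names, not tty_io.c code:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the two tty fields discussed above. */
struct fake_tty {
	atomic_flag ctrl_lock;	/* models the spinlock guarding the ids */
	int session;		/* models tty->session */
	int pgrp;		/* models tty->pgrp */
};

static void model_spin_lock(atomic_flag *lock)
{
	/* Busy-wait until this caller is the one that set the flag. */
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;
}

static void model_spin_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Writer: both assignments happen inside the same critical section. */
static void model_set_tty_ids(struct fake_tty *tty, int session, int pgrp)
{
	assert(tty);
	model_spin_lock(&tty->ctrl_lock);
	tty->session = session;
	tty->pgrp = pgrp;
	model_spin_unlock(&tty->ctrl_lock);
}

/* Reader: taking the same lock means it never sees a half-updated pair. */
static void model_get_tty_ids(struct fake_tty *tty, int *session, int *pgrp)
{
	model_spin_lock(&tty->ctrl_lock);
	*session = tty->session;
	*pgrp = tty->pgrp;
	model_spin_unlock(&tty->ctrl_lock);
}
```

This only illustrates the ordering being tested; as the follow-up in this thread reports, the reordering did not make the symptom go away.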
Alan
--
"Standards committees don't like hashing. It looks complicated and
insufficiently deterministic on an overhead projector."
- Vern Schryver
--
Hi, like this?
	spin_lock()
	put_pid()
	put_pid()
	tty->session =
	tty->pgrp =
	spin_unlock()
That does not fix it. Hannes --
Thanks. That rules out the one case I could see that might have pointed to a potential bug. Alan --
Hi, the second bisection was the wrong one; sorry for the confusion. I tried again (manually) and the result is (still) this:
Everything fine with HEAD at e5238442 "serial_core: Prepare for BKL push down".
Weird behaviour as described with HEAD at 04f378b19 "tty: BKL pushdown".
Hannes --
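Narrowing a regression down to one good and one bad commit, by hand or with git bisect, is a binary search: test the midpoint, keep the half that still contains the good/bad boundary, repeat until the two are adjacent. A minimal sketch, assuming a linear history indexed 0..n-1 with index 0 known good and n-1 known bad, and a hypothetical is_bad callback standing in for "build, boot and check"; git bisect itself walks a commit graph and is more involved:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test callback: returns true if commit idx shows the bug. */
typedef bool (*is_bad_fn)(size_t idx, void *ctx);

/* Returns the index of the first bad commit in a linear history. */
static size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx)
{
	size_t good = 0, bad = n - 1;

	assert(n >= 2);			/* need one good and one bad endpoint */
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;	/* boundary is at or before mid */
		else
			good = mid;	/* boundary is after mid */
	}
	return bad;
}

/* Example predicate: every commit from *ctx onwards is "bad". */
static bool bad_from_threshold(size_t idx, void *ctx)
{
	return idx >= *(const size_t *)ctx;
}
```

Each step halves the remaining range, so about log2(n) build-and-test cycles are needed; a single mis-judged step (for example, a flaky reproduction) sends the search into the wrong half, which is one way two bisections can disagree.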
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10725
Subject : Write protect on on
Submitter : Maciej Rutecki <maciej.rutecki@gmail.com>
Date : 2008-05-16 14:55 (30 days old)
References : http://marc.info/?l=linux-kernel&m=121095168003572&w=4
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10764
Subject : some serial configurations are now broken
Submitter : Russell King <rmk+lkml@arm.linux.org.uk>
Date : 2008-05-20 7:35 (26 days old)
References : http://marc.info/?l=linux-kernel&m=121126931810706&w=2
Handled-By : Javier Herrero <jherrero@hvsistemas.es>
             Russell King <rmk+lkml@arm.linux.org.uk>
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10786
Subject : 2.6.26-rc3 64bit SMP does not boot on J5600
Submitter : Domenico Andreoli <cavokz@gmail.com>
Date : 2008-05-22 16:14 (24 days old)
References : http://marc.info/?l=linux-kernel&m=121147328028081&w=4
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10799
Subject : sky2 general protection fault
Submitter : Nicolas Mailhot <Nicolas.Mailhot@LaPoste.net>
Date : 2008-05-26 11:05 (20 days old)
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10794
Subject : mips: CONF_CM_DEFAULT build error
Submitter : Adrian Bunk <adrian.bunk@movial.fi>
Date : 2008-05-25 10:11 (21 days old)
References : http://lkml.org/lkml/2008/5/25/168
             http://lkml.org/lkml/2008/6/11/295
Patch : http://lkml.org/lkml/2008/6/1/125
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10815
Subject : 2.6.26-rc4: RIP find_pid_ns+0x6b/0xa0
Submitter : Alexey Dobriyan <adobriyan@gmail.com>
Date : 2008-05-27 09:23 (19 days old)
References : http://lkml.org/lkml/2008/5/27/9
             http://lkml.org/lkml/2008/6/14/87
Handled-By : Oleg Nesterov <oleg@tv-sign.ru>
             Linus Torvalds <torvalds@linux-foundation.org>
             Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Patch : http://lkml.org/lkml/2008/5/28/16
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10819
Subject : Fatal DMA error with b43 driver since 2.6.26
Submitter : Christian Casteyde <casteyde.christian@free.fr>
Date : 2008-05-29 13:16 (17 days old)
--
This regression is fixed by
21691a38db9d465a109c5ec25cd3956a18cfcf5d
Author: Michael Buesch <mb@bu3sch.de> 2008-06-12 15:33:13
Committer: John W. Linville <linville@tuxdriver.com> 2008-06-14 01:18:58
Parent: 9983f35f12b8be71d13b8aca6dbf781d3342c7aa (rt2x00: LEDS build failure)
Child: 33593dbf334869456167bc66511bc54c4ba39dc5 (mac80211 : fix for iwconfig in ad-hoc mode)
Branches: master, remotes/origin/master
Follows: merge-2008-06-14
Precedes: master-2008-06-14
ssb: Fix coherent DMA mask for PCI devices
This fixes setting the coherent DMA mask for PCI devices.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
------------------------------ drivers/ssb/main.c ------------------------------
index 7cf8851..d184f2a 100644
@@ -1168,15 +1168,21 @@ EXPORT_SYMBOL(ssb_dma_translation);
 int ssb_dma_set_mask(struct ssb_device *ssb_dev, u64 mask)
 {
 	struct device *dma_dev = ssb_dev->dma_dev;
+	int err = 0;
 #ifdef CONFIG_SSB_PCIHOST
-	if (ssb_dev->bus->bustype == SSB_BUSTYPE_PCI)
-		return dma_set_mask(dma_dev, mask);
+	if (ssb_dev->bus->bustype == SSB_BUSTYPE_PCI) {
+		err = pci_set_dma_mask(ssb_dev->bus->host_pci, mask);
+		if (err)
+			return err;
+		err = pci_set_consistent_dma_mask(ssb_dev->bus->host_pci, mask);
+		return err;
+	}
 #endif
 	dma_dev->coherent_dma_mask = mask;
 	dma_dev->dma_mask = &dma_dev->coherent_dma_mask;
-	return 0;
+	return err;
 }
 EXPORT_SYMBOL(ssb_dma_set_mask);
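Before this fix, the PCI path called dma_set_mask(), which only changes the streaming DMA mask; the coherent mask that governs consistent/coherent allocations was left as it was. The fix sets both masks on the host PCI device and returns the first error. The user-space model below shows that difference; struct model_dev, the "narrower than 24 bits is refused" rule and every model_* name are hypothetical, and nothing here is kernel API. That a stale coherent mask is exactly what produced the b43 "Fatal DMA error" is an assumption, not something stated in this thread. (Later kernels gained a dma_set_mask_and_coherent() helper for this common pattern.)

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(n) macro. */
#define MODEL_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical device: only the two masks matter here. */
struct model_dev {
	uint64_t dma_mask;		/* limit for streaming mappings */
	uint64_t coherent_dma_mask;	/* limit for coherent allocations */
};

/* Hypothetical platform rule: masks narrower than 24 bits are refused. */
static int model_mask_supported(uint64_t mask)
{
	return mask >= MODEL_DMA_BIT_MASK(24);
}

static int model_set_dma_mask(struct model_dev *d, uint64_t mask)
{
	if (!model_mask_supported(mask))
		return -EIO;
	d->dma_mask = mask;
	return 0;
}

static int model_set_coherent_mask(struct model_dev *d, uint64_t mask)
{
	if (!model_mask_supported(mask))
		return -EIO;
	d->coherent_dma_mask = mask;
	return 0;
}

/* Pre-fix behaviour: only the streaming mask changes. */
static int model_old_set_mask(struct model_dev *d, uint64_t mask)
{
	return model_set_dma_mask(d, mask);
}

/* Post-fix behaviour: set both masks, stopping at the first error. */
static int model_new_set_mask(struct model_dev *d, uint64_t mask)
{
	int err;

	assert(d);
	err = model_set_dma_mask(d, mask);
	if (err)
		return err;
	return model_set_coherent_mask(d, mask);
}
```

With a device that starts at 32-bit masks, asking the old variant for a 30-bit mask leaves coherent_dma_mask at 32 bits, so coherent buffers could still be placed where a 30-bit DMA engine cannot reach them; the new variant narrows both.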
--
Greetings Michael.
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10821
Subject : rt25xx: lock dependency warning, association failure, and kmalloc corruption
Submitter : Christian Casteyde <casteyde.christian@free.fr>
Date : 2008-05-29 14:30 (17 days old)
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10826
Subject : NFS oops in 2.6.26rc4
Submitter : Dave Jones <davej@redhat.com>
Date : 2008-05-27 19:04 (19 days old)
References : http://marc.info/?l=linux-kernel&m=121191548915522&w=4
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10827
Subject : 2.6.26rc4 GFS2 oops.
Submitter : Dave Jones <davej@redhat.com>
Date : 2008-05-27 15:44 (19 days old)
References : http://lkml.org/lkml/2008/5/27/297
--
Dave, what is the status of this bug?
It's currently listed as a 2.6.26-rc regression.
Is it actually confirmed that 2.6.25 is fine?
According to the thread of the bug report there should now be a bug
report in the Red Hat Bugzilla for it. Bug number?
Thanks
Adrian
--
Hi, This appears to be a known bug. There's a Fedora bugzilla record for it here, which contains a patch to fix the problem: https://bugzilla.redhat.com/show_bug.cgi?id=448866 The bug does not appear to be in 2.6.25; 2.6.25 is fine afaict. Regards, Bob Peterson Red Hat GFS --
Yup, the patch in your Bugzilla is for code that is new in 2.6.26.
Can you push your patch for inclusion into 2.6.26 so that 2.6.26 won't
Thanks
Adrian
--
Hi Adrian, Unfortunately, I cannot. All access to the gfs2 "-nmw" git tree is controlled by Steve Whitehouse, and he is on vacation/holiday until tomorrow. I've submitted the patch to cluster-devel, so hopefully he'll push it as soon as he returns tomorrow. Regards, Bob Peterson Red Hat GFS --
You can post the patch in this thread, with CC to Andrew Morton. Thanks, Rafael --
If Steve is on vacation only until tomorrow there's not a need to bypass
cu
Adrian
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10828
Subject : [2.6.25-git18 => 2.6.26-rc1-git1] Xorg crash with xf86MapVidMem error
Submitter : Rufus & Azrael <rufus-azrael@numericable.fr>
Date : 2008-05-04 10:24 (42 days old)
References : http://lkml.org/lkml/2008/5/4/37
Handled-By : Ingo Molnar <mingo@elte.hu>
             H. Peter Anvin <hpa@zytor.com>
             Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>
Patch : http://lkml.org/lkml/2008/5/29/371
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10843
Subject : Display artifacts on XOrg logout with PAT kernel and VESA framebuffer
Submitter : Frans Pop <elendil@planet.nl>
Date : 2008-05-31 14:04 (15 days old)
References : http://lkml.org/lkml/2008/6/7/206
--
Yes. See also: http://lkml.org/lkml/2008/6/13/159 --
It happens to me too. Have you seen http://bugzilla.kernel.org/show_bug.cgi?id=10892 ? Romano -- Romano Giannetti Dep. de Electrónica y Automática http://www.dea.icai.upcomillas.es/romano Univ. Pontificia Comillas (MADRID) -- This communication contains confidential information. It is for the exclusive use of the intended addressee. If you are not the intended addressee, please note that any form of distribution, copying or use of this communication or the information in it is strictly prohibited by law. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy this message. Thank you for your cooperation. --
Do these two bug entries refer to the same problem? Rafael --
What exactly happens in your case? Do you also see the artifacts? Do you also use vesafb? Do the artifacts go away if you boot with 'video=vfb:off' to disable the framebuffer? Do they go away if you Looks unrelated to me. Cheers, FJP --
Yes, maybe you're right. I could not test it either: booting with nopat gave me a black screen twice in a row. Mmhhh.... Romano --
I don't know, just a wild guess. I noticed the flashing colors during shutdown, and Intel being the common chipset... Romano Giannetti --
cu
Adrian
--
Dunno. Do I need to recompile, or is there some boot option to disable it? Romano --
cu
Adrian
--
Frans, with or without PAT, in recent kernels (like 2.6.26-rc4/rc5 etc.), ioremap() uses UC-, PCI mmap of /sys/devices/pci.../resource (used by X) uses UC-, and fb_mmap() also uses UC-. It's interesting that you don't see this artifact with "nopat": essentially, with or without PAT enabled, we use the same memory attributes. So depending on the MTRR setup (set by the X server), the effective memory attribute across the different mappings should be the same (which is UC-, or WC with an MTRR). Can you also check whether the vesafb kernel boot parameter "mtrr:3" has any impact? thanks, suresh --
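The "effective memory attribute" here comes from combining the MTRR type of the physical range with the PAT type of each virtual mapping. The sketch below encodes only the combinations that come up in this thread (MTRR UC, WC, WB against PAT UC, UC-, WC, WB); Intel's architecture manuals give the full table, including WT and WP. The enum and function names are hypothetical:

```c
#include <assert.h>

/* Subset of x86 memory types relevant to this thread. */
enum mem_type { MT_UC, MT_UC_MINUS, MT_WC, MT_WB };

/*
 * Effective type for an access when an MTRR range type meets a PAT
 * page type. Only MTRR types UC, WC and WB are handled; UC- exists
 * only as a PAT type.
 */
static enum mem_type effective_type(enum mem_type mtrr, enum mem_type pat)
{
	assert(mtrr != MT_UC_MINUS);

	switch (pat) {
	case MT_UC:
		return MT_UC;		/* strong UC always wins */
	case MT_WC:
		return MT_WC;		/* PAT WC wins even over a UC MTRR */
	case MT_UC_MINUS:
		/* UC- behaves as UC unless the MTRR says WC */
		return mtrr == MT_WC ? MT_WC : MT_UC;
	case MT_WB:
		return mtrr;		/* WB defers to the MTRR type */
	}
	return MT_UC;			/* not reached for valid input */
}
```

So under a write-combining MTRR, every UC- mapping of the framebuffer (ioremap(), the PCI resource mmap, fb_mmap()) should come out as WC, while a strict UC mapping would stay UC. That is the consistency argument made above, and also why a UC MTRR would force everything except a PAT WC mapping to UC.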
Hello Suresh. Thanks for responding.
I've done 4 successive boots with the boot parameters as shown below.
Each boot was basically: set correct parameters in grub -> login to KDE
-> reboot and check for artifacts.
1) (none) --> clean
2) vga=791 --> artifacts
3) vga=791 nopat --> clean
4) vga=791 video=vesafb:mtrr:3 --> artifacts
So the mtrr option did not help (if I passed it correctly; the double ":"
is somewhat non-intuitive). The kernel log also does not show any
difference I can see in the last boot, but I don't know if the mtrr option
is supposed to show up in any way.
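The double colon comes from two levels of parsing: a "video=" value has the form driver:options, and for vesafb the option itself is spelled mtrr:N inside a comma-separated option list. The sketch below mimics that split in user space; parse_vesafb_mtrr() is a hypothetical helper, and the real kernel path (fb_get_options() followed by vesafb's own option loop) differs in detail:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser for a "video=" value such as "vesafb:mtrr:3".
 * The first ':' ends the driver name; the rest is a comma-separated
 * option list in which "mtrr:N" is one option. Returns N, or -1 if
 * the value is for another driver or has no mtrr option.
 */
static int parse_vesafb_mtrr(const char *video_arg)
{
	static const char prefix[] = "vesafb:";
	const size_t plen = sizeof prefix - 1;
	const char *opt;

	assert(video_arg);
	if (strncmp(video_arg, prefix, plen) != 0)
		return -1;			/* meant for another driver */

	opt = video_arg + plen;			/* start of the option list */
	while (*opt != '\0') {
		const char *end = strchr(opt, ',');
		size_t len = end ? (size_t)(end - opt) : strlen(opt);

		/* "mtrr:" plus at least one digit; strtol stops at ',' */
		if (len > 5 && strncmp(opt, "mtrr:", 5) == 0)
			return (int)strtol(opt + 5, NULL, 0);

		if (!end)
			break;
		opt = end + 1;			/* skip the comma */
	}
	return -1;
}
```

For example, "vesafb:mtrr:3" yields 3, "vesafb:ywrap,mtrr:1" yields 1, and "vesafb:ywrap" yields -1, so "video=vesafb:mtrr:3" is the expected spelling.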
From the kernel log for each boot:
1) x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Console: colour VGA+ 80x25
console [tty0] enabled
2) x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Console: colour dummy device 80x25
console [tty0] enabled
vesafb: framebuffer at 0x80000000, mapped to 0xffffc20000900000, using 3072k, total 7872k
vesafb: mode is 1024x768x16, linelength=2048, pages=4
vesafb: scrolling: redraw
vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
Console: switching to colour frame buffer device 128x48
fb0: VESA VGA frame buffer device
3) PAT support disabled
Console: colour dummy device 80x25
console [tty0] enabled
vesafb: framebuffer at 0x80000000, mapped to 0xffffc20000900000, using 3072k, total 7872k
vesafb: mode is 1024x768x16, linelength=2048, pages=4
vesafb: scrolling: redraw
vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
Console: switching to colour frame buffer device 128x48
fb0: VESA VGA frame buffer device
4) x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Console: colour dummy device 80x25
console [tty0] enabled
vesafb: framebuffer at 0x80000000, mapped to 0xffffc20000900000, using 3072k, total 7872k
vesafb: mode is 1024x768x16, linelength=2048, pages=4
vesafb: ...
If the init level is '3', the mtrr option will show up in /proc/mtrr; otherwise it won't. In init level '5', the X server will add the MTRR (irrespective of the boot option, if it's not already there) and will remove it when the X process exits. Can you also please try whether "mtrr:1" makes any difference? This will set up the mapping as UC during boot. Apart from a PAT WC mapping (which we shouldn't be using in your current setup), a UC MTRR should override all the other PAT mappings and should be consistent across the X and VT console mappings. So if the problem is due to improper aliasing, my understanding is that with this UC MTRR we shouldn't see any artifacts with "mtrr:1". With mtrr:1, we should also see a UC MTRR entry in /proc/mtrr. thanks, suresh --
That was a useful pointer. I do see some differences when I compare
mtrr:1 still gives the artifacts and makes no difference to /proc/mtrr. Here's /proc/cmdline + /proc/mtrr for three different boots:

root=/dev/mapper/main-root ro vga=791 quiet
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x7f800000 (2040MB), size= 8MB: uncachable, count=1
reg02: base=0x7f700000 (2039MB), size= 1MB: uncachable, count=1
reg03: base=0x80000000 (2048MB), size= 256MB: write-combining, count=1

root=/dev/mapper/main-root ro vga=791 quiet video=vesafb:mtrr:1
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x7f800000 (2040MB), size= 8MB: uncachable, count=1
reg02: base=0x7f700000 (2039MB), size= 1MB: uncachable, count=1
reg03: base=0x80000000 (2048MB), size= 256MB: write-combining, count=1

root=/dev/mapper/main-root ro vga=791 quiet video=vesafb:mtrr:3
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x7f800000 (2040MB), size= 8MB: uncachable, count=1
reg02: base=0x7f700000 (2039MB), size= 1MB: uncachable, count=1
reg03: base=0x80000000 (2048MB), size= 256MB: write-combining, count=1

I do see some differences in the Xorg logs, so it does seem that the mtrr options _are_ being recognized. Attached is my "normal" Xorg log (with 'vga=791'), which I used as the base for the diffs below. Other than shown, the logs are identical.

With mtrr:1 I get (added at the end of the log):
@@ -688,3 +688,11 @@
 (II) evaluating device (Generic Keyboard)
 (II) XINPUT: Adding extended input device "Generic Keyboard" (type: KEYBOARD)
 (II) Configured Mouse: ps2EnableDataReporting: succeeded
+(II) intel(0): xf86UnbindGARTMemory: unbind key 0
+(II) intel(0): xf86UnbindGARTMemory: unbind key 1
+(II) intel(0): xf86UnbindGARTMemory: unbind key 2
+(II) intel(0): xf86UnbindGARTMemory: unbind key 3
+(II) intel(0): xf86UnbindGARTMemory: unbind key 4
+(II) intel(0): [drm] removed 1 reserved context for kernel
+(II) ...
Oops. Just realized that this is completely bogus. I used the .old log for this one, while I used the logs of still-running Xorg sessions for the others. So this was actually the only one that contains the Xorg shutdown. This is still valid, though. Sorry for the confusion. --
Any progress on this issue? It's still there with -rc7, but I doubt that comes as a surprise. Has anyone tried to reproduce this? I would think that should be trivial.
Just as a summary:
- Intel 82945G/GZ graphics [8086:2772] (ICH7 based system)
- FB_VESA=y, FRAMEBUFFER_CONSOLE=y
- boot with vga=791
- log in to X and KDE; I really do need to log in: there are no artifacts if I exit X from the kdm login dialog
- artifacts show on logout
I doubt it's KDE related or even related to my specific graphics card. It may well be related to what is or has been displayed on the screen before logging out, so running some apps may make sense. It seems I do see remnants of, for example, aptitude (the Debian apt frontend) after I've run it in an X terminal (KDE's konsole). Cheers, FJP --
FJP, we will try to reproduce this and get back to you. Your earlier responses did not give many clues. thanks, suresh --
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10843
Hello all, I'd like to bring this issue to your attention once again as it is still present in 2.6.27-rc6. Note also that I can trivially reproduce exactly the same behavior on three rather different systems. The artifacts even look similar and in all cases they disappear with 'nopat'. The systems are:
Toshiba Satellite A40 laptop:
- Intel 82852/855GM Integrated Graphics Device [8086:3582]
- ICH4 based
- Mobile Intel Pentium 4 processor, i386 kernel
Intel desktop system:
- Intel 82945G/GZ Integrated Graphics Controller [8086:2772]
- ICH7 based
- Pentium D processor, x86_64 kernel
HP Compaq 2510p laptop:
- Intel Mobile GM965/GL960 Integrated Graphics Controller [8086:2a02]
- ICH8 based
- Core2 Duo processor, x86_64 kernel
Well, unfortunately I can only provide the info you ask for :-) Cheers, FJP --
Hi, what does the output of x86/pat_memtype_list under debugfs look like? You may need the following if you are not already mounting debugfs:
    mount -t debugfs debugfs /proc/sys/debug
    cat /proc/sys/debug/x86/pat_memtype_list
Thanks, Venki --
I've attached that file and dmesg output for two of the machines. Cheers, FJP
OK. This is the same issue as the one in this thread: http://www.ussg.iu.edu/hypermail/linux/kernel/0808.2/1532.html We have too many entries in the PAT list due to RAM pages being marked UC by drivers. Unfortunately, there is no quick fix for this in 2.6.27. However, we are aware of the problem and are working on a more complete fix. We should have the patch for it soon. Thanks, Venki --
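The "PAT list" is the kernel's record of which memory type each reserved physical range has been given, used to catch conflicting aliases. A rough model of such a list, assuming a fixed-size array and a simple rule that an overlapping request with a different type is refused; the kernel's own bookkeeping and conflict handling differ, and every name here is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum cache_type { CT_WB, CT_WC, CT_UC_MINUS, CT_UC };

/* One reservation covering the half-open range [start, end). */
struct memtype_entry {
	uint64_t start, end;
	enum cache_type type;
};

#define MODEL_MAX_ENTRIES 64

struct memtype_list {
	struct memtype_entry e[MODEL_MAX_ENTRIES];
	size_t n;
};

/*
 * Try to reserve [start, end) with the given type. Fails if an
 * overlapping reservation uses a different type (an alias conflict)
 * or if the list is full. Every request scans all existing entries.
 */
static bool memtype_reserve(struct memtype_list *l, uint64_t start,
			    uint64_t end, enum cache_type type)
{
	assert(start < end);
	for (size_t i = 0; i < l->n; i++) {
		const struct memtype_entry *m = &l->e[i];
		bool overlap = start < m->end && m->start < end;

		if (overlap && m->type != type)
			return false;
	}
	if (l->n == MODEL_MAX_ENTRIES)
		return false;
	l->e[l->n++] = (struct memtype_entry){ start, end, type };
	return true;
}
```

In this model every page a driver marks UC adds an entry, and every later request walks the whole list, so cost grows linearly with the number of entries; a fixed-size table also runs out. Which of these effects matters for the artifacts is not stated in the thread.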
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10830
Subject : two different oopses with 2.6.26-rc4
Submitter : Alejandro Riveira Fernández <alejandro.riveira@gmail.com>
Date : 2008-05-28 9:50 (18 days old)
References : http://marc.info/?l=linux-kernel&m=121196833026310&w=4
Handled-By : Johannes Berg <johannes@sipsolutions.net>
             Andrew Morton <akpm@linux-foundation.org>
             Peter Zijlstra <a.p.zijlstra@chello.nl>
Patch : http://lkml.org/lkml/2008/5/20/683
--
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10860 Subject : total system freeze at boot with 2.6.26-rc Submitter : Christian Casteyde <casteyde.christian@free.fr> Date : 2008-06-05 12:38 (10 days old) --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10861 Subject : 2.6.26-rc4-git2 - long pause during boot Submitter : Chris Clayton <chris2553@googlemail.com> Date : 2008-06-01 4:15 (14 days old) References : http://marc.info/?l=linux-kernel&m=121229382917834&w=4 --
Well, we know how to create (and to avoid) the problem. Since the original patch adversely affects existing user space, it could be argued that it is a regression, but I see that there is a counter-argument that the udev rule that triggers the problem is quite simply a bad rule. I'll leave it to those more closely concerned to rule on that. Thanks Chris -- Beauty is in the eye of the beerholder. --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10865 Subject : i get the following oops trying to mount an ntfs partition on thinkpad Submitter : Alex Romosan <romosan@sycorax.lbl.gov> Date : 2008-06-05 14:47 (10 days old) References : http://marc.info/?l=linux-kernel&m=121267834421414&w=4 --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10864 Subject : [regression][bisected] ~90,000 wakeups as of 2.6.26-rc3 Submitter : Németh Márton <nm127@freemail.hu> Date : 2008-06-03 5:18 (12 days old) References : http://marc.info/?l=linux-kernel&m=121247101601790&w=4 --
Hi, I already mentioned in the bug report that this is fixed in 2.6.26-rc6. Maybe tell your robot to first check the latest activity in the bug report since the last -rc release. I would also like your robot to say what action should be taken if the bug should still be listed, or when it can be closed. Regards, Márton Németh --
I think the current regression tracking methods that Rafael uses work very well and I'd like to thank Rafael for those efforts - to me as a subsystem maintainer it is a _very_ useful thing. In this case there was no real harm from the "this bug is already fixed" condition - just an extra email. Real harm would only come from missed regressions or from incorrectly closed regressions - but those are not happening. Note that there is no "robot" involved in changing the state of bugs - the really important work here is done by Rafael, and checking whether a bug is still relevant is inevitably manual work. The mails and reports are auto-generated, but crawling discussions and determining the status of a regression is very hard to automate. Ingo --
Sorry, I thought a robot had missed my comments in the bug tracking system for the second time: 1. I commented on 2008-06-06 13:27:16 ( http://bugzilla.kernel.org/show_bug.cgi?id=10864#c3 ), and the report mail still arrived on 7 Jun 2008 22:42:57 +0200 (CEST) ( http://lkml.org/lkml/2008/6/7/193 ). 2. I commented on 2008-06-13 23:19:53 ( http://bugzilla.kernel.org/show_bug.cgi?id=10864#c4 ), and the report mail still arrived on 14 Jun 2008 22:12:04 +0200 (CEST) ( http://lkml.org/lkml/2008/6/14/160 ). Nevertheless, I think bug #10864 can be closed. Regards, Márton Németh --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10862 Subject : forcedeth: lockdep warning on ethtool -s Submitter : Tobias Diedrich <ranma+kernel@tdiedrich.de> Date : 2008-06-01 8:37 (14 days old) References : http://marc.info/?l=linux-kernel&m=121230964032247&w=4 --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10868 Subject : Oops on loading ipaq module since 2.6.26, prevents use of device Submitter : Adam Williamson <awilliamson@mandriva.com> Date : 2008-06-05 17:39 (10 days old) Handled-By : Alan Cox <alan@redhat.com> --
On Sat, 14 Jun 2008 22:12:04 +0200 (CEST) Still waiting for the actual attached result of the test patch to debug this further. Guess it will miss 2.6.26 --
I replied via email - as you requested earlier in the thread - and attached the result to that email. Did you not get it? -- adamw --
Andrew may have asked you to use email, not me. The last thing I have is "Okay, output with the patch is attached. Thanks for your help", but the output in question isn't attached to the bug. Alan --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10866 Subject : /dev/rtc was missing until I disabled CONFIG_RTC_CLASS Submitter : Lior Dotan <liodot@gmail.com> Date : 2008-06-05 15:04 (10 days old) References : http://marc.info/?l=linux-kernel&m=121267834521432&w=4 --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10892 Subject : Sometime (often) X come out blank (black screen) on cold boot - Intel chipset Submitter : Romano Giannetti <romano.giannetti@gmail.com> Date : 2008-06-10 05:33 (5 days old) References : http://lkml.org/lkml/2008/6/10/137 --
It happened again with the new Ubuntu x-intel driver. Rebooting with nousplash solved the problem, but alas, sometimes simply rebooting solves the problem too. It seems to be a race... Romano --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10872 Subject : x86_64 boot hang when CONFIG_NUMA=n Submitter : Randy Dunlap <randy.dunlap@oracle.com> Date : 2008-06-05 21:50 (10 days old) References : http://marc.info/?l=linux-kernel&m=121270308607116&w=4 http://lkml.org/lkml/2008/6/11/355 Handled-By : Yinghai Lu <yhlu.kernel@gmail.com> --
Yes, still happens for me on 2.6.26-rc6-git2. --- ~Randy '"Daemon' is an old piece of jargon from the UNIX operating system, where it referred to a piece of low-level utility software, a fundamental part of the operating system." --
Please send out the whole boot log with NUMA on and NUMA off, and boot with "debug". Please apply the attached debug patch too. YH
OK, did all of that. I should probably note that in both cases, the kernel is loaded/booted by using kexec. Both boot logs were captured via netconsole. The failing boot log is netcon-4409.log. The working boot log (CONFIG_NUMA=y) is netcon-4410.log. Enabling CONFIG_NUMA makes the following changes:

4c4
< # Sun Jun 15 15:00:56 2008
241c241,246
< # CONFIG_NUMA is not set

Thanks, --- ~Randy '"Daemon' is an old piece of jargon from the UNIX operating system, where it referred to a piece of low-level utility software, a fundamental part of the operating system."
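The config diff in Randy's mail is only partly preserved and looks like plain diff(1) output, where timestamp comments show up as noise. As a hedged alternative, the sketch below compares two .config files by option value rather than by line, treating "# CONFIG_FOO is not set" as the value n and ignoring other comments. The helper names are hypothetical, and the test data is made up rather than taken from Randy's configs.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set".
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(lines):
    """Map each CONFIG_ symbol to its value, with 'n' for '# ... is not set'.

    Other comment lines (such as the generation timestamp) are ignored.
    """
    options = {}
    for line in lines:
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def config_changes(old_lines, new_lines):
    """Return {symbol: (old, new)} for symbols whose value differs.

    None means the symbol does not appear at all in that file.
    """
    old, new = parse_config(old_lines), parse_config(new_lines)
    return {
        sym: (old.get(sym), new.get(sym))
        for sym in sorted(old.keys() | new.keys())
        if old.get(sym) != new.get(sym)
    }
```

Comparing by value keeps the output focused on the options that actually changed when CONFIG_NUMA was enabled.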
The printout looks all right. Any chance you could use a normal serial console to capture the boot log? YH --
Not that I know of. --- ~Randy '"Daemon' is an old piece of jargon from the UNIX operating system, where it referred to a piece of low-level utility software, a fundamental part of the operating system." --
How about booting the kernel with CONFIG_NUMA=y using numa=off? YH --
Hi, That hangs the same way as the previous one. BTW, that kernel boot option needs to be documented in Documentation/kernel-parameters.txt ... --- ~Randy '"Daemon' is an old piece of jargon from the UNIX operating system, where it referred to a piece of low-level utility software, a fundamental part of the operating system." --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10903 Subject : ssh connections hang with 2.6.26-rc5 Submitter : Didier Raboud <didier@raboud.com> Date : 2008-06-13 02:39 (2 days old) --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10905 Subject : 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ? Submitter : Miquel van Smoorenburg <miquels@cistron.nl> Date : 2008-05-21 13:30 (25 days old) References : http://lkml.org/lkml/2008/5/21/131 http://lkml.org/lkml/2008/6/12/121 Handled-By : Glauber Costa <gcosta@redhat.com> Andi Kleen <andi@firstfloor.org> Miquel van Smoorenburg <mikevs@xs4all.net> Patch : http://lkml.org/lkml/2008/5/28/42 --
This bug was recently fixed in 2.6.26 by commit 0269c5c6d9a9de22715ecda589730547435cd3e8 Mike. --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10906 Subject : repeatable slab corruption with LTP msgctl08 Submitter : Andrew Morton <akpm@linux-foundation.org> Date : 2008-06-12 5:13 (3 days old) References : http://marc.info/?l=linux-kernel&m=121324775927704&w=4 Handled-By : Pekka J Enberg <penberg@cs.helsinki.fi> Christoph Lameter <clameter@sgi.com> Manfred Spraul <manfred@colorfullife.com> Andi Kleen <andi@firstfloor.org> --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10908 Subject : IPF Montvale machine panic when running a network-relevent testing Submitter : Zhang, Yanmin <yanmin_zhang@linux.intel.com> Date : 2008-06-13 8:19 (2 days old) References : http://marc.info/?l=linux-kernel&m=121334523711437&w=4 --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10912 Subject : Regressions in the last kernels Submitter : werner <werner@sys-linux.yi.org> Date : 2008-06-14 18:26 (1 days old) References : http://marc.info/?l=linux-kernel&m=121346933911641&w=4 --
This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.25. Please verify if it still should be listed. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9791 Subject : Clock is running too fast^Wslow using acpi_pm clocksource Submitter : tosn00j02@sneakemail.com Date : 2008-05-03 05:09 (43 days old) --
I don't believe this is a regression, at least the 8GB thing. The HIGHMEM64G config option has had a depends on !M386 && !M486 for quite a while now. It certainly was in 2.6.25 already. So if you want PAE support, we do require that you ask for a kernel that has cmpxchg8b support (needed for the atomic 64-bit clearing of a PAE page table entry). Not to mention a CPU that supports PAE. And that is simply incompatible with "I want it to work on an i486 too". So saying "I want a kernel that uses PAE _and_ works on an i486" is simply nonsensical. If we ever supported it, it was a mistake, and wouldn't have I think this got fixed by ec0a196626bd12e0ba108d7daa6d95a4fb25c2c5: "tcp: I think this is likely fixed by the same revert as above. David? Linus --
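Linus's point is that a PAE page table entry is 64 bits wide, so a 32-bit CPU needs cmpxchg8b (absent on the i386 and i486) to access it atomically. The toy model below is purely illustrative and is not kernel code. It shows what a concurrent observer could see if a 64-bit entry were instead rewritten with two ordinary 32-bit stores: an intermediate value that is neither the old entry nor the new one.

```python
MASK32 = 0xFFFFFFFF


def split64(value):
    """Split a 64-bit entry into (low, high) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32


def join64(low, high):
    """Combine two 32-bit halves back into one 64-bit entry."""
    return (high << 32) | low


def two_store_update(old, new):
    """Yield every value a concurrent reader could observe while a 64-bit
    entry is rewritten with two separate 32-bit stores (low half first)."""
    old_lo, old_hi = split64(old)
    new_lo, new_hi = split64(new)
    yield join64(old_lo, old_hi)  # before either store
    yield join64(new_lo, old_hi)  # after the low store only: a torn entry
    yield join64(new_lo, new_hi)  # after both stores
```

With a single atomic 8-byte operation, only the first and last values could ever be observed.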
From what you have written it looks like the dependency should actually be: depends on !M386 && !M486 && !M586 && !M586TSC && !M586MMX as none of the pre-Pentium-Pro processors had the PAE feature (I am not sure about non-Intel implementations, so the case of M586 would have to be investigated). It was originally planned for the Pentium, but abandoned because of the die size required -- the details behind the story were obviously never very well known, but it was definitely related to some cost implications. The feature was reportedly documented in the earlier not-so-widely-available revisions of the Pentium manuals and later on removed while some of the other stuff was migrated to the (in)famous Appendix H. This also explains the odd location of the PAE bits among the CPUID flags and in the CR4 register, which was initially marked as reserved. Maciej --
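The CPUID flags Maciej mentions live in leaf 1, register EDX. As a sketch, using bit positions from the Intel/AMD CPUID documentation (VME bit 1, PSE bit 3, PAE bit 6, CX8 i.e. cmpxchg8b bit 8; separately, CR4.PAE is bit 5 of CR4), a decoder can check the two features a PAE (HIGHMEM64G) kernel relies on. The function names are hypothetical, and obtaining the raw EDX value is left out.

```python
# CPUID leaf 1, EDX feature bits (subset relevant to this thread).
EDX_FEATURES = {
    1: "vme",  # virtual-8086 mode extensions
    3: "pse",  # page size extension (4 MB pages)
    6: "pae",  # physical address extension
    8: "cx8",  # cmpxchg8b instruction
}


def decode_edx(edx):
    """Return the names of the known feature bits set in CPUID.1:EDX."""
    return {name for bit, name in EDX_FEATURES.items() if edx & (1 << bit)}


def supports_pae_kernel(edx):
    """A PAE kernel needs both PAE itself and cmpxchg8b for 64-bit entries."""
    return {"pae", "cx8"} <= decode_edx(edx)
```

On a running Linux system the same two features appear as "pae" and "cx8" in the flags line of /proc/cpuinfo.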
Yes, it's the non-intel ones that would keep me from saying !M586. For intel, PAE was a PPro feature (at least officially, as you point out), but I do not know about various other manufacturers. From personal experience, the line between Pentium and PPro features doesn't tend to be totally black-and-white (although I suspect that when it comes to PAE it _may_ be). Linus --
Well, PAE is quite a significant block to implement and Intel kept it hidden until they published the long awaited PentiumPro manual sometime in 1996. I am fairly sure the K5 did not implement it (it may have had PSE and VME, especially in the later revisions) and Google does not turn up any Cyrix processors with PAE. I may have a K5 manual somewhere, so I can see if I can verify it. Please also note these processors tried to compete with Intel on the desktop market, where 4GB of RAM was completely unreasonable in the late 90s. I think unless someone can recall a counter-example, it can be safely assumed these chips did not have PAE. We could try to extend the dependency and see if anybody screams. Maciej --
On Mon, Jun 16, 2008 at 12:31:52AM +0100, Maciej W. Rozycki wrote: > On Sat, 14 Jun 2008, Linus Torvalds wrote: > > > > From what you have written it looks the dependency should actually be: > > > > > > depends on !M386 && !M486 && !M586 && !M586TSC && !M586MMX > > > > > > as none of the pre-Pentium-Pro processors had the PAE feature (I am not > > > sure about non-Intel implementations, so the case of M586 would have to be > > > investigated). > > > > Yes, it's the non-intel ones that would keep me from saying !M586. > > > > For intel, PAE was a PPro feature (at least officially, as you point out), > > but I do not know about various other manufacturers. From personal > > experience, the line between Pentium and PPro features doesn't tend to be > > totally black-and-white (although I suspect that when it comes to PAE it > > _may_ be). > > Well, PAE is quite a significant block to implement and Intel kept it > hidden until they published the long awaited PentiumPro manual sometime in > 1996. I am fairly sure the K5 did not implement it (it may have had PSE > and VME, especially in the later revisions) and Google does not show up > any Cyrix processors with PAE. I may have a K5 manual somewhere, so I can > see if I can verify it. Even the K6 didn't have PAE. The Athlon was AMD's first CPU that had it. > Please also note these processors tried to compete with Intel on the > desktop market where 4GB of RAM was completely unreasonable in late 90s. > I think unless someone can recall a counter-example, it can be safely > assumed these chips did not have the PAE. We could try to extend the > dependency and see if anybody screams. I agree. To the best of my knowledge (and looking through output of x86info from lots of old CPUs), Intel had the only CPUs with PAE in that era. Dave -- http://www.codemonkey.org.uk --
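To compare the current and proposed HIGHMEM64G dependencies, the sketch below evaluates a Kconfig-style conjunction against the one processor-family symbol that is enabled. It handles only "&&" of plain or "!"-negated symbols; real Kconfig also has "||", parentheses and tristate y/m/n logic, which this hypothetical eval_depends helper ignores.

```python
def eval_depends(expr, enabled):
    """Evaluate a Kconfig-style conjunction such as '!M386 && !M486'.

    `enabled` is the set of symbols set to y. Only '&&' of plain or
    '!'-negated symbols is supported in this simplified sketch.
    """
    for term in expr.split("&&"):
        term = term.strip()
        if term.startswith("!"):
            if term[1:].strip() in enabled:
                return False
        elif term not in enabled:
            return False
    return True


# The existing dependency and the extended one proposed in this thread.
CURRENT = "!M386 && !M486"
PROPOSED = "!M386 && !M486 && !M586 && !M586TSC && !M586MMX"
```

Under CURRENT an M586MMX kernel can still select HIGHMEM64G; under PROPOSED it cannot, while M686 (Pentium Pro and later) is unaffected.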
From: Linus Torvalds <torvalds@linux-foundation.org> No, this is looking like a different bug. The behavior of that bug would not usually be a crash, but rather stuck connections, and I seriously doubt anything in that specweb test setup is using the deferred-accept option I think this is also a separate bug. Ilpo has asked the reporter for more information. --
Are you sure? Because that revert seems to basically revert all changes since 2.6.25 in tcp_rcv_established(), which is the function that oopses. After that revert, the function is back to exactly what it used to be. Of course, inlining makes it less obvious what other changes end up doing, but even the offset in the function (not quite at the very end of it, but not that far off that end either) matches where you'd expect that that 'tcp_defer_accept_check()' thing used to be before the revert. Also: see the report saying "As a matter of fact, kernel paniced at statement "queue->rskq_accept_tail->dl_next = req" in function reqsk_queue_add, because queue->rskq_accept_tail is NULL. The call chain is: tcp_rcv_established => inet_csk_reqsk_queue_add => reqsk_queue_add." and realize that that whole inet_csk_reqsk_queue_add() call only exists in that tcp_defer_accept_check() thing that no longer exists. IOW, I'm pretty damn sure that the bug entry above is very much a result of the tcp_defer_accept_check() thing, and that commit ec0a196626 fixed Hey, I might be wrong. But see above. I don't think I am. I think the deferred-accept was just even buggier than you believed. But who knows. Linus --
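The quoted panic is easier to follow with the append logic written out. The sketch below is a Python model of a singly linked accept queue with head and tail pointers. It reuses the field names from the quoted statement (rskq_accept_head, rskq_accept_tail, dl_next) and assumes reqsk_queue_add() follows the usual pattern of dereferencing the tail only when the head is non-empty. It is an illustration, not the kernel source. Under that assumption, a NULL tail at that statement means the head/tail invariant had already been broken elsewhere.

```python
class Request:
    """Stand-in for a queued connection request."""

    def __init__(self, name):
        self.name = name
        self.dl_next = None


class AcceptQueue:
    """Singly linked FIFO with head and tail pointers.

    Invariant: head is None exactly when tail is None.
    """

    def __init__(self):
        self.rskq_accept_head = None
        self.rskq_accept_tail = None

    def add(self, req):
        if self.rskq_accept_head is None:
            self.rskq_accept_head = req
        else:
            # The statement from the report; it fails if tail is None
            # while head is not, i.e. if the invariant was broken earlier.
            self.rskq_accept_tail.dl_next = req
        self.rskq_accept_tail = req
        req.dl_next = None

    def pop(self):
        req = self.rskq_accept_head
        if req is not None:
            self.rskq_accept_head = req.dl_next
            if self.rskq_accept_head is None:
                self.rskq_accept_tail = None  # keep the invariant on empty
        return req
```

In the model, the NULL dereference becomes an AttributeError.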
From: Linus Torvalds <torvalds@linux-foundation.org> I agree with the gist of your analysis. And it seems that Apache does try to use the deferred accept socket option. So we may indeed have a hit on this IA64 bug. The wording in the report about versions is a little confusing: With kernel 2.6.26-rc5 and a git kernel just between rc4 and rc5, my kernel panic... Does this mean that the problem appeared between rc4 and rc5? Or that all 2.6.26-rcX releases have the problem? That's an important fact because the change in question showed up in 2.6.26-rc1, as it Because of the requirements to trigger the new code, this case is not likely to match the revert. SSH absolutely does not use the deferred accept socket option. Let's look at the change in question. Every single code path touched in the data paths is guarded with "tp->defer_tcp_accept.request", which will be NULL unless 1) the defer-accept socket option is enabled and 2) a new connection got queued up there. Nothing about the normal accept queue handling got modified by those changes which were reverted. And note that this means the behavior change only hits listening sockets. So if we have a report that client outgoing SSH connections hang with the current kernel, that report cannot reasonably match this revert. I also anticipate that if this change could trigger problems for non-deferred-accept cases, we'd see a ton more reports than we have. And we did some research, and the only major servers that use this obscure defer-accept feature are distcc and Apache. It is this element of Ingo's bug report (that he uses distcc heavily and it was a distcc socket which hung) that helped us narrow things down. The SSH report clearly states "With kernel 2.6.26-rc5, ssh connections to _remote_ servers randomly hang". So this is a report about SSH client connections under 2.6.26-rc5, not SSH server connections and therefore not listening sockets. So right now I'd say that the IA64 case could ...
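To make the "listening sockets only" point concrete: TCP_DEFER_ACCEPT is something a server opts into on its listening socket, and an outgoing client connection such as ssh to a remote host never sets it. Below is a minimal Linux-only sketch using Python's socket module. The helper name is hypothetical, and 9 is the Linux value of TCP_DEFER_ACCEPT from <netinet/tcp.h>, used only as a fallback. The kernel may store the timeout internally in a different unit, so the value read back is not guaranteed to equal the one set.

```python
import socket

# Linux value of TCP_DEFER_ACCEPT from <netinet/tcp.h>; used only if the
# socket module does not export the constant.
TCP_DEFER_ACCEPT = getattr(socket, "TCP_DEFER_ACCEPT", 9)


def make_deferred_listener(seconds, host="127.0.0.1", port=0):
    """Create a listening TCP socket with TCP_DEFER_ACCEPT enabled.

    With this option a listener is only woken when data arrives on a new
    connection (see tcp(7)). Returns (socket, value read back).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, TCP_DEFER_ACCEPT, seconds)
    sock.bind((host, port))
    sock.listen(16)
    # The value read back may be rounded relative to `seconds`.
    return sock, sock.getsockopt(socket.IPPROTO_TCP, TCP_DEFER_ACCEPT)
```

A client socket created with socket.create_connection() never passes through this setsockopt call, which is why a client-side ssh hang is unlikely to involve the reverted code.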
Ok. The only reason I matched up the ssh case was because it was reported to fix the stuck connections that Ingo had with distcc-over-loopback, so I thought the stuck-ssh thing could be the same. But if you are sure ssh doesn't trigger the same thing, I really didn't have anything else to go on than "stuck TCP connection sounds familiar". Linus --
No, it is not. The other problem described in this message seems to be a recent regression, though, at least the reporter thinks so. Thanks, Rafael --
http://lkml.org/lkml/2008/6/14/62 was just reported today. Seems to have been caused by commit 3ac7fe5a4aab409bd5674d0b070bce97f9d20872 Author: Thomas Gleixner <tglx@linutronix.de> Date: Wed Apr 30 00:55:01 2008 -0700 infrastructure to debug (dynamic) objects which was introduced just after v2.6.25, but not discovered until now, probably because it requires the (admittedly obscure) combination of lockdep and slub/object debugging. Vegard -- "The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation." -- E. W. Dijkstra, EWD1036 --
Added to the list as http://bugzilla.kernel.org/show_bug.cgi?id=10918 . Thanks, Rafael --
