In the five days since I booted a 2.6.25-rc6 kernel on my development machine, I have already come back to the machine three times to find that the X server had exited, killing every application that still had a window open. This never happened with previous kernel versions (2.6.22.17, 2.6.24.3, 2.6.25-rc5), so I suspect it is related to the latest rc kernel. Syslog entries around the time of the crash (identical each time):

Mar 24 21:59:57 xenon kernel: [424469.161592] PM: Removing info for No Bus:vcs7
Mar 24 21:59:57 xenon kernel: [424469.162591] PM: Removing info for No Bus:vcsa7
Mar 24 21:59:57 xenon kernel: [424469.179556] uhci_hcd 0000:00:1d.1: release dev 4 ep81-INT, period 8, phase 4, 93 us
Mar 24 21:59:57 xenon gconfd (ts-21705): Received signal 15, shutting down cleanly
Mar 24 21:59:57 xenon gconfd (ts-21705): Exiting
Mar 24 21:59:57 xenon gdm[21341]: WARNING: gdm_slave_xioerror_handler: Fatal X error - restarting :0
Mar 24 21:59:58 xenon kernel: [426716.212842] PM: Adding info for No Bus:vcs7
Mar 24 21:59:58 xenon kernel: [426716.213300] PM: Adding info for No Bus:vcsa7
Mar 24 21:59:58 xenon kernel: [426716.295997] PM: Removing info for No Bus:vcs7
Mar 24 21:59:58 xenon kernel: [426716.296110] PM: Removing info for No Bus:vcsa7
Mar 24 21:59:59 xenon kernel: [426716.434662] PM: Adding info for No Bus:vcs7
Mar 24 21:59:59 xenon kernel: [426716.434742] PM: Adding info for No Bus:vcsa7
Mar 24 22:00:00 xenon kernel: [426717.727427] uhci_hcd 0000:00:1d.1: reserve dev 4 ep81-INT, period 8, phase 4, 93 us
Mar 24 22:00:00 xenon kernel: [424472.319521] uhci_hcd 0000:00:1d.1: release dev 4 ep81-INT, period 8, phase 4, 93 us
Mar 24 22:00:00 xenon kernel: [426717.997926] uhci_hcd 0000:00:1d.1: reserve dev 4 ep81-INT, period 8, phase 4, 93 us
Mar 24 22:00:02 xenon gconfd (root-15531): starting (version 2.20.0), process ID 15531, user »root«
Mar 24 22:00:02 xenon gconfd (root-15531): The address »xml:reado...
Anything in the X.org log files? Does it happen with DRI off? I don't think any of the DRM changes I made could cause this, so I'd be looking at something more generic.
Ah, of course. Sorry, I forgot. Xorg.0.log.old ends with this:

| Backtrace:
| 0: /usr/bin/X(xf86SigHandler+0x81) [0x80e6d81]
| 1: [0xb7fc9400]
| 2: /usr/lib/xorg/modules//extensions/libGLcore.so(_swrast_Triangle+0x2d) [0xb51ea5dd]
| 3: /usr/lib/xorg/modules//extensions/libGLcore.so [0xb51f1075]
| 4: /usr/lib/xorg/modules//extensions/libGLcore.so [0xb5206f2e]
| 5: /usr/lib/xorg/modules//extensions/libGLcore.so [0xb52080b8]
| 6: /usr/lib/xorg/modules//extensions/libGLcore.so(_tnl_run_pipeline+0x153) [0xb520cb83]
| 7: /usr/lib/xorg/modules//extensions/libGLcore.so(_tnl_draw_prims+0x3f0) [0xb51f8340]
| 8: /usr/lib/xorg/modules//extensions/libGLcore.so(vbo_exec_vtx_flush+0x1fb) [0xb52471eb]
| 9: /usr/lib/xorg/modules//extensions/libGLcore.so(vbo_exec_FlushVertices+0x78) [0xb52488e8]
| 10: /usr/lib/xorg/modules//extensions/libGLcore.so(_mesa_Flush+0x83) [0xb5112663]
| 11: /usr/lib/xorg/modules//extensions/libglx.so [0xb7c6a107]
| 12: /usr/lib/xorg/modules//extensions/libglx.so [0xb7c4aa6d]
| 13: /usr/bin/X [0x815809e]
| 14: /usr/bin/X(Dispatch+0x1af) [0x808f68f]
| 15: /usr/bin/X(main+0x47e) [0x807717e]
| 16: /lib/libc.so.6(__libc_start_main+0xe0) [0xb7d5dfe0]
| 17: /usr/bin/X(FontFileCompleteXLFD+0x1e5) [0x8076501]
|
| Fatal server error:
| Caught signal 11. Server aborting

I have put the entire file at http://gollum.phnxsoft.com/~ts/linux/Xorg.0.log.old

As for DRI: I guess it does happen with DRI off, as my Xorg.0.log{,.old} says:

| (WW) MGA(0): Direct rendering disabled

HTH,
Tilman
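A quick way to answer the "does it happen with DRI off?" question from the server log alone is to grep for the driver's direct-rendering status line. This is a minimal sketch: it assumes the default log path /var/log/Xorg.0.log and that the driver logs a line containing "Direct rendering enabled" or "Direct rendering disabled", as the MGA driver does in the log above; the function name dri_status is hypothetical.

```shell
#!/bin/sh
# dri_status [LOGFILE]: print the driver's direct-rendering status line(s)
# from an Xorg log. The default path is an assumption; adjust as needed.
# grep exits non-zero if no such line is found.
dri_status() {
  log=${1:-/var/log/Xorg.0.log}
  grep -E 'Direct rendering (enabled|disabled)' "$log"
}
```

For example, dri_status /var/log/Xorg.0.log.old checks the log of the server that crashed rather than the one currently running.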
Looks like a GL screensaver kicks in and blows away your X server's software GL (sw-GL). Not sure how a new kernel could cause this, though. Memory layout changes, maybe?
Any luck finding what might have caused this? It certainly looks like some heap or address-space change is affecting Mesa.

Dave.
Not really. It's obviously triggered by the screensaver. The machine runs SuSE's default: gnome-screensaver with mode = random and a list of 189 candidate screensavers, so I can't tell which one(s) actually trigger the problem. But perhaps that doesn't matter anyway. I recently collected another X backtrace with kernel 2.6.25-rc7, which looks quite different in the libglx section:

0: /usr/bin/X(xf86SigHandler+0x81) [0x80e6d81]
1: [0xb7ef7400]
2: /usr/lib/xorg/modules//extensions/libglx.so [0xb7b7d8e8]
3: /usr/lib/xorg/modules//extensions/libglx.so(DoRender+0xdd) [0xb7b7645d]
4: /usr/lib/xorg/modules//extensions/libglx.so [0xb7b7657c]
5: /usr/lib/xorg/modules//extensions/libglx.so [0xb7b7aa6d]
6: /usr/bin/X [0x815809e]
7: /usr/bin/X(Dispatch+0x1af) [0x808f68f]
8: /usr/bin/X(main+0x47e) [0x807717e]
9: /lib/libc.so.6(__libc_start_main+0xe0) [0xb7c8bfe0]
10: /usr/bin/X(FontFileCompleteXLFD+0x1e5) [0x8076501]

But I know next to nothing about GL, so I'm out of my depth there. I'd welcome any suggestions on how to provoke the crash more regularly so that I could attempt to bisect.

Thanks,
Tilman

-- 
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
This message is made from 100% recycled bits.
Best before (if unopened): see reverse side.
This is still quite elusive. It only happens once every couple of days, and the only thing that seems clear is that it's triggered by a GL screensaver. So, if that's OK, I'll keep sharing X crash backtraces while I go on groping for a lead. Here's the next one, this time with kernel 2.6.25-rc8:

0: /usr/bin/X(xf86SigHandler+0x81) [0x80e6d81]
1: [0xb7f22400]
2: /usr/lib/xorg/modules//extensions/libGLcore.so(_mesa_set_enable+0x1343) [0xb50aa673]
3: /usr/lib/xorg/modules//extensions/libGLcore.so(_mesa_Disable+0x5a) [0xb50abc1a]
4: /usr/lib/xorg/modules//extensions/libglx.so [0xb7ba8b98]
5: /usr/lib/xorg/modules//extensions/libglx.so(DoRender+0xdd) [0xb7ba145d]
6: /usr/lib/xorg/modules//extensions/libglx.so [0xb7ba157c]
7: /usr/lib/xorg/modules//extensions/libglx.so [0xb7ba5a6d]
8: /usr/bin/X [0x815809e]
9: /usr/bin/X(Dispatch+0x1af) [0x808f68f]
10: /usr/bin/X(main+0x47e) [0x807717e]
11: /lib/libc.so.6(__libc_start_main+0xe0) [0xb7cb6fe0]
12: /usr/bin/X(FontFileCompleteXLFD+0x1e5) [0x8076501]

As for Pavel's suggestion of disabling heap randomization: given the current crash rate, I'd have to run that way for several weeks before I could be reasonably sure it helped. I think it'll be more useful to keep looking for a way to reproduce the crash more reliably. Perhaps I should start playing some GL-based games. :-)

Hope this helps (slim chance, I know),
T.
If you think it's one of the screensavers but don't know which one, then open up a bash shell and run

    for i in /usr/lib/xscreensaver/*; do echo $i; $i; done

hitting Ctrl-C to break out of each screensaver once you think it's been cleared of any wrongdoing. They'll default to rendering in a window rather than the root window, but perhaps that'll be good enough to hit the bug (it also lets you see the name of the screensaver that just ran).
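A variant of the loop above gives every hack a fixed time slot instead of relying on Ctrl-C, so it can be left running unattended. This is only a sketch: it assumes the coreutils timeout command is installed and that the hacks live in /usr/lib/xscreensaver as above; run_each is a hypothetical name.

```shell
#!/bin/sh
# run_each DIR SECS: run every executable file in DIR for at most SECS
# seconds, printing its path first, so that if X dies the last path
# printed names the hack that was running at the time.
run_each() {
  dir=$1
  secs=$2
  for prog in "$dir"/*; do
    [ -f "$prog" ] && [ -x "$prog" ] || continue
    echo "$prog"
    # timeout sends SIGTERM when the slot expires; ignore the exit status
    timeout "$secs" "$prog" || true
  done
}
```

For example, run_each /usr/lib/xscreensaver 10 gives each hack ten seconds. Redirecting the output to a file keeps the last name around even if the terminal dies with the X server.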
OK, I think I can now reproduce it, and it does not, in fact, appear to be kernel-related. Specifically, running

    % /usr/lib/xscreensaver/antspotlight & /usr/lib/xscreensaver/atlantis

in a console window (i.e. running *two* GL screensavers in parallel) reliably kills my X server on kernels 2.6.22.17, 2.6.24.4 and 2.6.25-rc9, with a backtrace much like the ones I posted. So it seems to be an X server problem that I just happened to encounter for the first time while trying out 2.6.25-rc6.

Thanks,
Tilman
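The reproduction recipe above (two GL hacks at once) can be wrapped so that it also ends on its own after a given time, which makes it easier to repeat with other pairs. A sketch assuming coreutils timeout; run_pair is a hypothetical helper, and with the two hacks named above it is expected to take the X server down rather than return cleanly.

```shell
#!/bin/sh
# run_pair SECS PROG_A PROG_B: start both programs in parallel, each
# limited to SECS seconds, then wait until both have exited.
run_pair() {
  secs=$1
  timeout "$secs" "$2" &
  timeout "$secs" "$3" &
  wait
}
```

For example: run_pair 60 /usr/lib/xscreensaver/antspotlight /usr/lib/xscreensaver/atlantis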
Good idea. I did one pass through the screensavers that way, letting each one run a couple of seconds until it seemed to become repetitive. None of them crashed my X server, which at least seems to tell me that it is not one particular screensaver that always triggers the problem. But twice during that test I was called away, left the thing running, and came back to find that the X server had exited again, apparently after the "real" screensaver had kicked in. Twice in one day is much more than ever before. So you may have found me a way to trigger the crash with greater probability.

Thanks,
T.
It's one of the GL screensavers; not sure how to narrow it down. Use ldd to figure out which ones are using GL. I've no idea how the kernel is influencing this.

Dave.
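Following the ldd suggestion, here is a sketch that lists which hacks in a directory are dynamically linked against the GL library. It assumes that GL hacks link libGL.so (for example libGL.so.1) directly, so that it shows up in ldd's output; list_gl_hacks is a hypothetical name. Note that ldd may run the program's dynamic loader, so it should only be used on binaries you trust.

```shell
#!/bin/sh
# list_gl_hacks DIR: print each executable file in DIR whose ldd output
# mentions libGL.so, i.e. that is linked against the GL library.
# Scripts and static binaries make ldd fail; that output is discarded.
list_gl_hacks() {
  for prog in "$1"/*; do
    [ -f "$prog" ] && [ -x "$prog" ] || continue
    if ldd "$prog" 2>/dev/null | grep -q 'libGL\.so'; then
      echo "$prog"
    fi
  done
}
```

For example, list_gl_hacks /usr/lib/xscreensaver prints the candidate list, which can then be fed to a loop like the one suggested earlier in the thread.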
Hi, I'm running kernel 2.6.25 on my AmigaOne G4 PowerPC machine, and since 2.6.25-rc8 the Xorg server dies if DRI is enabled. I'm using a Radeon 9200 in PCIGART mode. The test was performed on a Debian Lenny system. Any ideas what could be wrong here?

Thanks!

Regards,
Gerhard

BTW: Please put me on CC:!
Sorry for hijacking this thread, but I have a similar problem with the Xorg X server. I'm using kernel 2.6.25-rc8 on my AmigaOne (PowerPC), and the X server dies right at startup, but only if DRI/DRM is enabled. I tested this with both Xorg 7.1 (the Debian Etch version) and Xorg 7.2 (Lenny). The X server was working fine with kernel 2.6.25-rc6. I should mention that the AmigaOne uses uncached memory for DMA operations. Benjamin Herrenschmidt recently fixed DRI for non-cache-coherent systems, and his patches work fine (I applied them to kernel 2.6.25-rc6). Please take a look at the attached Xorg.0.log file or at the excerpt below:

(**) RADEON(0): Initializing backing store
(==) RADEON(0): Backing store disabled
(**) RADEON(0): DRI Finishing init !
(II) RADEON(0): X context handle = 0x1

Backtrace:
0: /usr/bin/X(xf86SigHandler+0x94) [0x100934c4]
1: [0x100374]
2: [0x102272a8]
3: /usr/lib/xorg/modules/extensions//libdri.so(DRIFinishScreenInit+0xb0) [0xf9948b0]
4: /usr/lib/xorg/modules/drivers//radeon_drv.so(RADEONDRIFinishScreenInit+0x68) [0xf8a699c]
5: /usr/lib/xorg/modules/drivers//radeon_drv.so(RADEONScreenInit+0xf74) [0xf895f78]
6: /usr/bin/X(AddScreen+0x21c) [0x1002d79c]
7: /usr/bin/X(InitOutput+0x284) [0x1006d364]
8: /usr/bin/X(main+0x294) [0x1002e004]
9: /lib/libc.so.6 [0xfc75b10]
10: /lib/libc.so.6 [0xfc75cd0]

Fatal server error:
Caught signal 11. Server aborting

(**) RADEON(0): RADEONLeaveVT

Regards,
Gerhard

BTW: Please put me on CC:!
On Mon, Apr 7, 2008 at 12:34 AM, Gerhard Pircher wrote:
[...]

This is not the same problem at all, so no thread hijacking for you.

Dave.
Hm, so it looks like I'm not seeing ghosts on mine either. With 2.6.25-rc8, Xorg runs under Etch, but under Lenny it enters a tight loop (no syscalls) with 100% CPU load. A 2.6.24-based kernel works under both Etch and Lenny... Haven't had time to try to localize the problem. Glint Xorg driver; DRM and AGP are off; CONFIG_FB_PM2=y.

Thanks,
Guennadi
---
Guennadi Liakhovetski
OK, here's more info. It DOES seem to be a regression. I spent half a day yesterday trying to track this down, with no success: multiple bisect attempts ended up with non-bootable kernels on the very first iteration. Today I just wanted to post all possible logs and configs to Bugzilla, and it "suddenly" worked... And the change that made it work was that I had disabled CONFIG_FB_PM2 in the kernel. So I think this should be classified as a regression, although I have no idea where. pm2fb.c itself hasn't changed (effectively, for little endian) since 2.6.24. To make it visual:

              etch     lenny
    v2.6.24   works    works
    v2.6.25   works    fails

/me thinks he should trade his ap400 to the Linux Kernel Testers Inc. for LOTS of money.

Thanks,
Guennadi
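For the bisection problem mentioned above (candidate kernels that do not boot at all), git bisect has a skip subcommand that marks the current revision as untestable so that bisection moves on to a nearby one instead. A minimal sketch of starting such a session between the good and bad versions from the table; start_kernel_bisect is a hypothetical helper, and v2.6.24 and v2.6.25 are assumed to be the mainline release tags.

```shell
#!/bin/sh
# start_kernel_bisect GOOD BAD: begin a bisection in the current git tree
# between a known-good and a known-bad revision. git then checks out a
# revision roughly halfway between them for you to build, boot and test.
start_kernel_bisect() {
  git bisect start &&
  git bisect bad "$2" &&
  git bisect good "$1"
}
```

Usage, after start_kernel_bisect v2.6.24 v2.6.25, for each kernel git checks out:

    git bisect good     # X works under Lenny
    git bisect bad      # X hangs under Lenny
    git bisect skip     # kernel does not boot; try a nearby revision
    git bisect reset    # when done, return to the original branch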
Strange! I guess I should recompile the kernel and test again. :( I also experienced a high load under Lenny, but IIRC this showed up with an older kernel (2.6.18), too (after the X server is started, standard [...])

The X server only dies on my machine if DRM is on. I read somewhere that the latest DRM patches need a newer Xorg version (~7.3). Could that be the problem with kernel 2.6.25-rc8?

Thanks!
Gerhard
I have this problem too: the X server is dying, but independently of the kernel version, so I think it is an X server problem.

Package: xserver-xorg-video-ati
Priority: optional
Section: x11
Installed-Size: 880
Architecture: i386
Version: 1:6.6.3-2

Under Etch with 2.6.22.22 and 2.6.24.y kernels it is OK; under Lenny with 2.6.22.22, 2.6.24.y or 2.6.25, on first start the [...]

Thanks,
Oliver
It looks like one of these patches fixes the X server crash on my machine (applied on top of the v2.6.25 release):

http://lkml.org/lkml/2008/4/19/244
http://lkml.org/lkml/2008/4/19/243

System load also seems to be lower. The only problem is that the X server now crashes badly on shutdown (i.e. when executing /etc/init.d/gdm stop). Even the console gets corrupted. I attached the kernel log file that shows the kernel oopses.

Regards,
Gerhard
... Mmm. I've had one X server death so far with 2.6.25-rc7. Didn't keep the log file, though.

-ml
You could try disabling heap randomization...

-- 
(English) http://www.livejournal.com/~pavelmachek
(Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
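On Linux, the knob for this is /proc/sys/kernel/randomize_va_space: 0 disables address-space randomization, 1 randomizes the stack, mmap base and vDSO, and 2 additionally randomizes the heap (brk). Writing 1 therefore turns off only heap randomization. A sketch under those assumptions; the helper names are hypothetical, writing requires root, and the setting is system-wide and is lost on reboot.

```shell
#!/bin/sh
# Linux sysctl controlling address-space layout randomization.
VA_SPACE=/proc/sys/kernel/randomize_va_space

# show_va_randomization: print the current mode
# (0 = off, 1 = stack/mmap/vDSO, 2 = mode 1 plus heap/brk).
show_va_randomization() {
  cat "$VA_SPACE"
}

# disable_heap_randomization: switch to mode 1, keeping the other
# randomization in place. Needs root; not persistent across reboots.
disable_heap_randomization() {
  echo 1 > "$VA_SPACE"
}
```

The same change can be made with sysctl -w kernel.randomize_va_space=1. As noted in the thread, with a crash only every few days it would take weeks to tell whether this helps.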
