[2.6.25-rc6] possible regression: X server dying

To: <linux-kernel@...>
Date: Monday, March 24, 2008 - 6:38 pm

In the five days since I booted a 2.6.25-rc6 kernel on my development
machine, it has happened three times already that I came back to the
machine to find that the X server had exited, killing all applications
that still had a window open. This has never happened with previous
kernel versions (2.6.22.17, 2.6.24.3, 2.6.25-rc5), so I suspect some
relation to the latest rc kernel.

Syslog entries around the time of the crash (identical each time):

Mar 24 21:59:57 xenon kernel: [424469.161592] PM: Removing info for No
Bus:vcs7
Mar 24 21:59:57 xenon kernel: [424469.162591] PM: Removing info for No
Bus:vcsa7
Mar 24 21:59:57 xenon kernel: [424469.179556] uhci_hcd 0000:00:1d.1:
release dev 4 ep81-INT, period 8, phase 4, 93 us
Mar 24 21:59:57 xenon gconfd (ts-21705): Signal 15 erhalten,
ordungsgemäßes Herunterfahren
Mar 24 21:59:57 xenon gconfd (ts-21705): Beenden
Mar 24 21:59:57 xenon gdm[21341]: WARNING: gdm_slave_xioerror_handler:
Schwerwiegender X-Fehler - :0 wird neu gestartet
Mar 24 21:59:58 xenon kernel: [426716.212842] PM: Adding info for No
Bus:vcs7
Mar 24 21:59:58 xenon kernel: [426716.213300] PM: Adding info for No
Bus:vcsa7
Mar 24 21:59:58 xenon kernel: [426716.295997] PM: Removing info for No
Bus:vcs7
Mar 24 21:59:58 xenon kernel: [426716.296110] PM: Removing info for No
Bus:vcsa7
Mar 24 21:59:59 xenon kernel: [426716.434662] PM: Adding info for No
Bus:vcs7
Mar 24 21:59:59 xenon kernel: [426716.434742] PM: Adding info for No
Bus:vcsa7
Mar 24 22:00:00 xenon kernel: [426717.727427] uhci_hcd 0000:00:1d.1:
reserve dev 4 ep81-INT, period 8, phase 4, 93 us
Mar 24 22:00:00 xenon kernel: [424472.319521] uhci_hcd 0000:00:1d.1:
release dev 4 ep81-INT, period 8, phase 4, 93 us
Mar 24 22:00:00 xenon kernel: [426717.997926] uhci_hcd 0000:00:1d.1:
reserve dev 4 ep81-INT, period 8, phase 4, 93 us
Mar 24 22:00:02 xenon gconfd (root-15531): (Version 2.20.0) wird
gestartet, Prozesskennung 15531, Benutzer »root«
Mar 24 22:00:02 xenon gconfd (root-15531): Die Adresse
»xml:reado...
To: Tilman Schmidt <tilman@...>
Cc: <linux-kernel@...>
Date: Monday, March 24, 2008 - 7:22 pm

anything in the X.org logfiles? does it happen with DRI off?

I don't think any of the DRM changes I made could cause this, so I'd
be looking towards something more generic..

--
To: Dave Airlie <airlied@...>
Cc: <linux-kernel@...>
Date: Monday, March 24, 2008 - 7:45 pm

Ah, of course. Sorry I forgot. Xorg.0.log.old ends with this:

| Backtrace:
| 0: /usr/bin/X(xf86SigHandler+0x81) [0x80e6d81]
| 1: [0xb7fc9400]
| 2:
/usr/lib/xorg/modules//extensions/libGLcore.so(_swrast_Triangle+0x2d)
[0xb51ea5dd]
| 3: /usr/lib/xorg/modules//extensions/libGLcore.so [0xb51f1075]
| 4: /usr/lib/xorg/modules//extensions/libGLcore.so [0xb5206f2e]
| 5: /usr/lib/xorg/modules//extensions/libGLcore.so [0xb52080b8]
| 6:
/usr/lib/xorg/modules//extensions/libGLcore.so(_tnl_run_pipeline+0x153)
[0xb520cb83]
| 7:
/usr/lib/xorg/modules//extensions/libGLcore.so(_tnl_draw_prims+0x3f0)
[0xb51f8340]
| 8:
/usr/lib/xorg/modules//extensions/libGLcore.so(vbo_exec_vtx_flush+0x1fb)
[0xb52471eb]
| 9:
/usr/lib/xorg/modules//extensions/libGLcore.so(vbo_exec_FlushVertices+0x78)
[0xb52488e8]
| 10: /usr/lib/xorg/modules//extensions/libGLcore.so(_mesa_Flush+0x83)
[0xb5112663]
| 11: /usr/lib/xorg/modules//extensions/libglx.so [0xb7c6a107]
| 12: /usr/lib/xorg/modules//extensions/libglx.so [0xb7c4aa6d]
| 13: /usr/bin/X [0x815809e]
| 14: /usr/bin/X(Dispatch+0x1af) [0x808f68f]
| 15: /usr/bin/X(main+0x47e) [0x807717e]
| 16: /lib/libc.so.6(__libc_start_main+0xe0) [0xb7d5dfe0]
| 17: /usr/bin/X(FontFileCompleteXLFD+0x1e5) [0x8076501]
|
| Fatal server error:
| Caught signal 11.  Server aborting

I have put the entire file at
http://gollum.phnxsoft.com/~ts/linux/Xorg.0.log.old

As for whether it happens with DRI off: I guess it does, as my
Xorg.0.log{,.old} says:

| (WW) MGA(0): Direct rendering disabled

HTH
Tilman
To: Tilman Schmidt <tilman@...>
Cc: <linux-kernel@...>
Date: Monday, March 24, 2008 - 8:53 pm

Looks like a GL screensaver kicks in and blows away your X server sw-GL..

not sure how a new kernel could cause this though..

memory layout changes maybe..

--
To: Tilman Schmidt <tilman@...>
Cc: <linux-kernel@...>
Date: Monday, March 31, 2008 - 12:53 am

any luck finding what might have caused this?

it certainly looks like some heap or address space change affects Mesa.

Dave.
--
To: Dave Airlie <airlied@...>
Cc: <linux-kernel@...>
Date: Tuesday, April 1, 2008 - 5:02 am

Not really. It's obviously triggered by the screensaver. The machine
runs SuSE's default: gnome-screensaver with mode = random and a list
of 189 candidate screensavers, so I can't tell which one(s) of those
actually triggers the problem.

But perhaps that doesn't matter anyway. I recently collected another
X backtrace with kernel 2.6.25-rc7 which looks quite different in
the libglx section:

0: /usr/bin/X(xf86SigHandler+0x81) [0x80e6d81]
1: [0xb7ef7400]
2: /usr/lib/xorg/modules//extensions/libglx.so [0xb7b7d8e8]
3: /usr/lib/xorg/modules//extensions/libglx.so(DoRender+0xdd) [0xb7b7645d]
4: /usr/lib/xorg/modules//extensions/libglx.so [0xb7b7657c]
5: /usr/lib/xorg/modules//extensions/libglx.so [0xb7b7aa6d]
6: /usr/bin/X [0x815809e]
7: /usr/bin/X(Dispatch+0x1af) [0x808f68f]
8: /usr/bin/X(main+0x47e) [0x807717e]
9: /lib/libc.so.6(__libc_start_main+0xe0) [0xb7c8bfe0]
10: /usr/bin/X(FontFileCompleteXLFD+0x1e5) [0x8076501]

But I know next to nothing about GL, so I'm out of my depth there.
I'd welcome any suggestions on how to provoke the crash more
regularly so that I could attempt to bisect.

Thanks,
Tilman

-- 
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
To: Dave Airlie <airlied@...>
Cc: <linux-kernel@...>, <pavel@...>
Date: Monday, April 7, 2008 - 6:22 pm

This is still quite elusive. It only happens once every couple of days,
and the only thing that seems to be clear is that it's triggered by a GL
screensaver. So if it's ok I'll continue sharing X crash backtraces
while I go on groping for a lead. Here's my next one, this time with
kernel 2.6.25-rc8:

0: /usr/bin/X(xf86SigHandler+0x81) [0x80e6d81]
1: [0xb7f22400]
2:
/usr/lib/xorg/modules//extensions/libGLcore.so(_mesa_set_enable+0x1343)
[0xb50aa673]
3: /usr/lib/xorg/modules//extensions/libGLcore.so(_mesa_Disable+0x5a)
[0xb50abc1a]
4: /usr/lib/xorg/modules//extensions/libglx.so [0xb7ba8b98]
5: /usr/lib/xorg/modules//extensions/libglx.so(DoRender+0xdd) [0xb7ba145d]
6: /usr/lib/xorg/modules//extensions/libglx.so [0xb7ba157c]
7: /usr/lib/xorg/modules//extensions/libglx.so [0xb7ba5a6d]
8: /usr/bin/X [0x815809e]
9: /usr/bin/X(Dispatch+0x1af) [0x808f68f]
10: /usr/bin/X(main+0x47e) [0x807717e]
11: /lib/libc.so.6(__libc_start_main+0xe0) [0xb7cb6fe0]
12: /usr/bin/X(FontFileCompleteXLFD+0x1e5) [0x8076501]

As for Pavel's suggestion of disabling heap randomization, given the
current crash rate I'd have to try that for several weeks before I could
be reasonably sure that it helped. I think it'll be more useful if I go
on looking for a way to reproduce the crash more reliably. Perhaps I
should start playing some GL based games. :-)

Hope this helps (slim chance, I know)
T.
To: Tilman Schmidt <tilman@...>
Cc: Dave Airlie <airlied@...>, <linux-kernel@...>, <pavel@...>
Date: Monday, April 7, 2008 - 6:55 pm

If you think it's one of the screensavers but don't know which one,
then open up a bash shell and do a

for i in /usr/lib/xscreensaver/*; do echo "$i"; "$i"; done

hitting control-c to break out of the screensaver when you think it's
been cleared of any wrongdoing. They'll default to rendering in a
window rather than the root window, but perhaps that'll be good enough
to hit the bug (it also lets you see the name of the screensaver it
just ran).
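
A minimal sketch of the same loop with a time limit per screensaver, so that nobody has to sit there pressing control-c. It assumes timeout(1) from GNU coreutils is installed; the function name run_each and the 20-second default are made up for illustration and are not from the thread.

```shell
# Hypothetical helper: run every executable in a directory for a few seconds.
# Default directory as in the loop above; the 20-second default is an assumption.
run_each() {
    dir=${1:-/usr/lib/xscreensaver}
    secs=${2:-20}
    for hack in "$dir"/*; do
        # Skip anything that is not an executable regular file.
        [ -f "$hack" ] && [ -x "$hack" ] || continue
        echo "running: $hack"
        # timeout stops the program after $secs seconds; its exit status
        # is ignored so the loop always moves on to the next one.
        timeout "$secs" "$hack" || true
    done
}
```

Calling `run_each /usr/lib/xscreensaver 30` would give each program 30 seconds. If the X server dies, the last "running:" line printed points at the program that was active at the time.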
--
To: Ray Lee <ray-lk@...>
Cc: Rafael J. Wysocki <rjw@...>, Dave Airlie <airlied@...>, <linux-kernel@...>, <pavel@...>
Date: Sunday, April 13, 2008 - 9:02 pm

Ok, I think I can now reproduce it, and it does not, in fact, appear
to be kernel related. Specifically, doing:

% /usr/lib/xscreensaver/antspotlight & /usr/lib/xscreensaver/atlantis

in a console window (i.e. running *two* GL screensavers in parallel)
reliably kills my X server on kernels 2.6.22.17, 2.6.24.4 and 2.6.25-rc9,
with a backtrace much like the ones I posted. So it seems to be an X server
problem that I just happened to encounter for the first time while trying
out 2.6.25-rc6.
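
For anyone wanting to repeat this with other pairs, here is a rough sketch that starts two programs side by side and stops both after a fixed time. It assumes timeout(1) from GNU coreutils; the name run_pair and its argument order are invented for this example.

```shell
# Hypothetical helper: run two programs in parallel, each for at most $1 seconds.
run_pair() {
    secs=$1
    first=$2
    second=$3
    timeout "$secs" "$first" &
    pid1=$!
    timeout "$secs" "$second" &
    pid2=$!
    # Wait for both background jobs. timeout exits with 124 when the limit
    # is hit, so the exit status is deliberately ignored here.
    wait "$pid1" || true
    wait "$pid2" || true
}
```

With the two screensavers named above this would be `run_pair 60 /usr/lib/xscreensaver/antspotlight /usr/lib/xscreensaver/atlantis`. If it is started from a terminal inside the X session, a server crash takes the shell down with it, so running it from a text console or over ssh with DISPLAY set is probably more practical.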

Thanks,
Tilman
To: Ray Lee <ray-lk@...>
Cc: Dave Airlie <airlied@...>, <linux-kernel@...>, <pavel@...>
Date: Wednesday, April 9, 2008 - 7:57 am

Good idea. I did one pass through the screensavers that way, letting
each one run a couple of seconds until it seemed to become repetitive.
None of this crashed my X server, which at least seems to tell me that
it is not one particular screensaver always triggering the problem.

But: twice during that test I was called away, left the thing running,
and came back to find the X server had exited again, apparently after
the "real" screensaver had kicked in. Twice in a day, that's much more
than ever before. So you may have found me a way to trigger the crash
with greater probability.

Thanks,
T.

-- 
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
To: Tilman Schmidt <tilman@...>
Cc: <linux-kernel@...>
Date: Sunday, April 6, 2008 - 3:44 am

It's one of the GL screensavers. I'm not sure how to narrow it down;
use ldd to figure out which ones are using GL.
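
A small sketch of that ldd check, assuming the screensavers live in /usr/lib/xscreensaver as elsewhere in this thread and that GL users show up with a libGL.so entry in the ldd output. The function names uses_gl and list_gl_hacks are made up for illustration.

```shell
# Hypothetical helper: succeed if the given binary links against libGL.
uses_gl() {
    # ldd prints the shared libraries a dynamically linked binary needs;
    # errors (e.g. for shell scripts) are discarded.
    ldd "$1" 2>/dev/null | grep -q 'libGL\.so'
}

# Hypothetical helper: print every executable in a directory that uses GL.
list_gl_hacks() {
    dir=${1:-/usr/lib/xscreensaver}
    for hack in "$dir"/*; do
        [ -f "$hack" ] && [ -x "$hack" ] || continue
        if uses_gl "$hack"; then
            echo "$hack"
        fi
    done
}
```

Running `list_gl_hacks` with no arguments would print one path per GL-linked program, which shortens the list of candidates to try.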

I've no idea how the kernel is influencing this.

Dave.
--
To: <linux-kernel@...>
Date: Saturday, April 19, 2008 - 7:30 am

Hi,

I'm running kernel 2.6.25 here on my AmigaOne G4 PowerPC machine and the
Xorg server dies since version 2.6.25-rc8, if DRI is enabled. I'm using a
Radeon9200 in PCIGART mode. The test was performed on a Debian Lenny
system.



Any ideas what could be wrong here?

Thanks!

regards,

Gerhard

BTW: Please put me on CC:!
--
To: Dave Airlie <airlied@...>
Cc: <linux-kernel@...>
Date: Sunday, April 6, 2008 - 10:34 am

Sorry for hijacking this thread, but I have a similar problem with the
Xorg X server. I'm using kernel 2.6.25-rc8 on my AmigaOne (PowerPC) and
the X server dies right on startup, but only if DRI/DRM is enabled.
I tested this with both Xorg 7.1 (Debian Etch version) and Xorg 7.2
(Lenny). The X server was working fine with kernel 2.6.25-rc6. I have
to mention that the AmigaOne uses uncached memory for DMA operations.
Benjamin Herrenschmidt fixed DRI recently for non cache coherent systems
and his patches are working fine (I applied them to kernel 2.6.25-rc6).
Please take a look at the attached Xorg.0.log file or at the excerpt
below:

(**) RADEON(0): Initializing backing store
(==) RADEON(0): Backing store disabled
(**) RADEON(0): DRI Finishing init !
(II) RADEON(0): X context handle = 0x1

Backtrace:
0: /usr/bin/X(xf86SigHandler+0x94) [0x100934c4]
1: [0x100374]
2: [0x102272a8]
3: /usr/lib/xorg/modules/extensions//libdri.so(DRIFinishScreenInit+0xb0) [0xf9948b0]
4: /usr/lib/xorg/modules/drivers//radeon_drv.so(RADEONDRIFinishScreenInit+0x68) [0xf8a699c]
5: /usr/lib/xorg/modules/drivers//radeon_drv.so(RADEONScreenInit+0xf74) [0xf895f78]
6: /usr/bin/X(AddScreen+0x21c) [0x1002d79c]
7: /usr/bin/X(InitOutput+0x284) [0x1006d364]
8: /usr/bin/X(main+0x294) [0x1002e004]
9: /lib/libc.so.6 [0xfc75b10]
10: /lib/libc.so.6 [0xfc75cd0]

Fatal server error:
Caught signal 11.  Server aborting

(**) RADEON(0): RADEONLeaveVT

regards,

Gerhard

BTW: Please put me on CC:!
To: Gerhard Pircher <gerhard_pircher@...>
Cc: <linux-kernel@...>
Date: Monday, April 7, 2008 - 6:27 pm

On Mon, Apr 7, 2008 at 12:34 AM, Gerhard Pircher

This is not the same problem at all.

So no thread hijacking for you.
Dave.
--
To: Dave Airlie <airlied@...>
Cc: Gerhard Pircher <gerhard_pircher@...>, <linux-kernel@...>
Date: Tuesday, April 8, 2008 - 5:01 pm

Hm, so it looks like I'm not just seeing ghosts on my machine either. With
2.6.25-rc8, Xorg runs under etch, but enters a tight (no syscalls) loop with
100% CPU load under lenny. A 2.6.24 based kernel works under both etch and
lenny... Haven't had time to try to localize the problem. Glint xorg driver,
DRM and AGP are off, CONFIG_FB_PM2=y.

Thanks
Guennadi
---
Guennadi Liakhovetski
--
To: Dave Airlie <airlied@...>
Cc: Gerhard Pircher <gerhard_pircher@...>, <linux-kernel@...>, <adaplas@...>, <linux-fbdev-devel@...>
Date: Thursday, April 24, 2008 - 12:03 pm

Ok, here's more info. It DOES seem to be a regression. I spent half a day 
yesterday trying to trace this down, with no success. Multiple bisect 
attempts ended up with non-bootable kernels on the very first iteration. 
Today I just wanted to post all possible logs / configs to Bugzilla and - 
it "suddenly" worked... And the change that made it work was - I disabled 
CONFIG_FB_PM2 in the kernel...

So I think this should be qualified as a regression, although I have no
idea where. pm2fb.c itself hasn't (effectively for little endian) changed
since 2.6.24. To make it visual:

		etch		lenny

v2.6.24		works		works
v2.6.25		works		fails

/me thinks he should trade his ap400 to the Linux Kernel Testers Inc. for 
LOTS of money.

Thanks
Guennadi
---
Guennadi Liakhovetski
--
To: Guennadi Liakhovetski <g.liakhovetski@...>, <airlied@...>
Cc: <linux-kernel@...>
Date: Wednesday, April 9, 2008 - 4:54 am

Strange! I guess I should recompile the kernel and test again. :(
I also experienced a high load under Lenny, but IIRC this showed up under
an older kernel (2.6.18), too (after X server is started, standard
The X server only dies on my machine if DRM is on. I read somewhere that
the latest DRM patches need a newer Xorg version (~7.3). Could that be
the problem with kernel 2.6.25-rc8?

Thanks!

Gerhard
--
To: Gerhard Pircher <gerhard_pircher@...>
Cc: Guennadi Liakhovetski <g.liakhovetski@...>, <airlied@...>, <linux-kernel@...>
Date: Thursday, April 24, 2008 - 12:40 pm

I have this problem too: the X server is dying, but independently of the
kernel version. I think it is an X server problem.

Package: xserver-xorg-video-ati
Priority: optional
Section: x11
Installed-Size: 880
<>
Architecture: i386
Version: 1:6.6.3-2

Under etch, the 2.6.22.22 and 2.6.24.y kernels are OK.
Under lenny with 2.6.22.22, 2.6.24.y or 2.6.25, on the first start the


-- 
Thanks,
Oliver
--
To: Oliver Pinter <oliver.pntr@...>
Cc: <linux-kernel@...>, <airlied@...>, <g.liakhovetski@...>
Date: Thursday, April 24, 2008 - 4:37 pm

It looks like one of these patches fixes the crash of the X server on my
machine (applied on v2.6.25 release):
http://lkml.org/lkml/2008/4/19/244
http://lkml.org/lkml/2008/4/19/243

The system load also seems to be lower. The only problem is that the X server
now crashes badly on shutdown (that is, when executing /etc/init.d/gdm stop).
Even the console gets corrupted. I attached the kernel log file that
shows the kernel oopses.

regards,

Gerhard
To: Dave Airlie <airlied@...>
Cc: Tilman Schmidt <tilman@...>, <linux-kernel@...>
Date: Monday, March 31, 2008 - 1:45 pm

...

Mmm.. I've had one X-server death so far with 2.6.25-rc7.
Didn't keep the logfile, though.

-ml
--
To: Dave Airlie <airlied@...>
Cc: Tilman Schmidt <tilman@...>, <linux-kernel@...>
Date: Wednesday, March 26, 2008 - 2:52 pm

You can try to disable heap randomization...
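
One possible way to do that, sketched under the assumption that the kernel exposes the randomize_va_space sysctl (where 2 adds heap/brk randomization on top of what 1 does). The function names describe_aslr and show_aslr are invented for this example.

```shell
# Describe a randomize_va_space value.
describe_aslr() {
    case "$1" in
        0) echo "address space randomization off" ;;
        1) echo "stack, mmap and VDSO randomized; heap (brk) not randomized" ;;
        2) echo "as 1, plus heap (brk) randomization" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Print the current system-wide setting, if the sysctl file is readable.
show_aslr() {
    f=/proc/sys/kernel/randomize_va_space
    if [ -r "$f" ]; then
        describe_aslr "$(cat "$f")"
    else
        echo "cannot read $f"
    fi
}

# To switch off only heap randomization until the next reboot (as root):
#   sysctl -w kernel.randomize_va_space=1
```

Calling `show_aslr` before and after the change confirms that the setting took effect. The X server has to be restarted afterwards, since the address layout is fixed when a process starts.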

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--