Re: [Regression: 2.6.25-rc5: Blank Screen: Intel 945]

Previous thread: Bug?Devices:Network:Wi-Fi:Broadcom,Bug?Devices:Audio:Conexant by Denis Lotarev on Thursday, March 13, 2008 - 9:54 am. (1 message)

Next thread: [GIT PATCH] PCI fix for 2.6.25-rc5 git tree by Greg KH on Thursday, March 13, 2008 - 10:24 am. (2 messages)
From: Jesse Barnes
Date: Thursday, March 13, 2008 - 10:09 am

Well, the blank screen is probably due to a pipe programming timing problem.  
IGD devices have been getting less & less sensitive to it over time (830s 
would hang if you looked at them the wrong way) but it's still not that hard 
to mis-program them.

Can you narrow down the problem at all?  Did it occur just after a kernel 
upgrade?  Or did you also upgrade your X driver?  There haven't been any real 
changes in intelfb since Oct. when Krzysztof added interlaced mode support, 

The framebuffer drivers provide a simple interface to applications wanting to 
draw to the screen.  It's usually used in embedded devices and for boot 
splash screens.  Generally you don't want to run X through the framebuffer 

I think the only thing you'll lose is the boot splash screen.  But you could 
also try using vesafb.  It works with more hardware than intelfb (including 
laptops) and we've fixed some vesafb related bugs in the X driver recently, 
so it may work better for you.

Jesse
--

From: Justin Madru
Date: Thursday, March 13, 2008 - 9:22 pm

Not that I can think of. Other than a custom kernel I use standard 
Ubuntu 7.10. I was testing out the kernel starting with 2.6.23-rc4 and 
I've tested each rc, recompiling when a new one came out (except for rc1 
and rc2, they didn't compile). So, when 2.6.25-rc3 came out I recompiled 
and came across this error.

I've removed all fb devices and the blank screen _almost_ never happens 
(also, I _don't_ have to disable the splash screen). There is one 
_big_ exception: the problem still happens when I'm at my University (and 
_very_ consistently). There, I have to disable the splash screen, and 
that ~mostly~ fixes the problem.

I can't believe it could be related, but when I boot the computer at the 
Univ. I get a constant flood of kernel messages related to the wireless, 
although I can still connect:

 > kernel: wlan0_rename: RX too short data frame payload

_Could_ a wireless problem be related to a graphics problem? I 
wouldn't think so. But it would explain why I get the blank screen 
mostly when I get the wireless problem. But, then again it could be 
Well, if the fb is not necessary, then I think I'll just leave it out. 
Unless you think using the vesafb would help. But, I thought that the X 
server doesn't interact with the fb driver.

Maybe it's not related to the fb but the other intel driver:

CONFIG_AGP=y
CONFIG_AGP_INTEL=y
CONFIG_DRM=y
CONFIG_DRM_I915=m


I've updated my config if you need to see the newest version:
http://jdserver.homelinux.org/linux/config-2.6.25-rc5

Justin

--

From: Jesse Barnes
Date: Friday, March 14, 2008 - 10:29 am

It would be interesting if you could get register dumps at a couple of 
different points, using the intel_reg_dumper tool in 
git://git.freedesktop.org/git/xorg/driver/xf86-video-intel (in 
src/reg_dumper).  You'll probably have to modify your boot scripts though.  
It would be good to get them:
  - at startup time before the splash screen
  - sometime while the splash screen is running
  - after X starts and you see the blank screen

Since I think this is actually an X driver bug, can you file a bug at 


Well, it's possible that the wireless issue is affecting the pipe programming 
timing enough to expose the bug (stuff like this is usually a problem with 
either not waiting for a register program to take effect and/or programming 

Yeah, looks like it's off.  Ubuntu may be falling back to using the X vesa 
driver or something though...

Jesse
--

From: Justin Madru
Date: Friday, March 14, 2008 - 8:48 pm

I couldn't get it to compile. What am I supposed to do? I ran autogen.sh 
in the top level dir and it gave the error message:

./configure: line 20486: syntax error near unexpected token `XINERMA,'
./configure: line 20486: `XORG_DRIVER_CHECK_EXT(XINERMA, xineramaproto)'

Then I also tried to run make and make -f Makefile.in (in the reg_dumper 
dir and top level dir), and it said:

Makefile.in:15 *** missing separator. Stop.

That part of Makefile.in contains:
@SET_MAKE@
VPATH = @srcdir@

What am I doing wrong? You'll have to forgive me, I'm not that 
How exactly would I do that? I would think I could just add a command at 
the end of 3 files in /etc/init.d. Would that work? It should run my 
command when it starts that script and gets to the end. Or do boot 
scripts exit in the middle of the script, which would prevent my command 
I'm not sure. I just have a home wireless router (Hawking HWR54G). I 
have no idea what the Uni. router is.
On both, I connect to an unencrypted network, but at the Uni. it 
redirects to a login page where I have to log on before I can actually 
use the Internet. Is there any way I could get more information about the 
Uni. router?

Justin
--

From: Jesse Barnes
Date: Saturday, March 15, 2008 - 10:35 am

[Adding Bryce to cc list, he may have a copy of intel_reg_dumper already built 
for Ubuntu.  Bryce, please see below for a few more questions, thanks.]


This usually means you don't have the xorg autoconf macros installed.  Your 

No problem, building X still isn't quite as easy as it should be, but it's 
only slightly more complicated than the typical './configure;make;make 
install' due to the dependencies between packages.  You can check out the X 

Hm, I haven't looked at the Ubuntu scripts before.  I know they're using 
upstart, but if they haven't diverged too much from the old style init, you 
may be able to modify rc.sysinit to get the boot time register dump.

For the splash screen dump, you'd just have to add the intel_reg_dumper 
command to one of the other init scripts that runs while the splash screen is 
up (maybe the HAL daemon script or something).

Once your screen is blank, it's probably easiest to ssh into your machine and 
capture the dump that way.


Probably, though the messages in your log from your Univ. connection may be 
enough for the networking guys to figure things out.  It's probably best to 
track the wireless thing as a separate issue though, I'd recommend mailing 
linux-wireless@vger.kernel.org with the problem.

Thanks,
Jesse


--

From: Bryce Harrington
Date: Saturday, March 15, 2008 - 11:16 am

The following command will pull in all the dependencies you need for
building -intel:


If you're having troubles while the splash screen is displayed, you may
want to look at / fiddle with the settings in /etc/gdm/gdm.conf.

Also have a look in /etc/gdm/, where the Init, PreSession, and
PostSession scripts are located.  Those are additional places where
scripts can be tied in during this early startup phase.  (I don't know
if that would actually be of use here though.)

Bryce
--

From: Justin Madru
Date: Sunday, March 16, 2008 - 6:28 pm

I'm just compiling the intel_reg_dumper right? I don't have to recompile 
all of X to use it?


Ok, got that. I tried to compile and found out I also had to install 
libpciaccess-dev, probably because I'm using the git tree and not 
the one that Ubuntu 7.10 uses. But I still can't compile the 
intel_reg_dumper; I get the following error:

intel-driver/src/reg_dumper$ make
gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -fno-strict-aliasing -I./.. -DREG_DUMPER -g -O2 -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o main.c
main.c: In function ‘main’:
main.c:72: warning: implicit declaration of function ‘pci_device_map_range’
main.c:72: warning: nested extern declaration of ‘pci_device_map_range’
main.c:75: error: ‘PCI_DEV_MAP_FLAG_WRITABLE’ undeclared (first use in this function)
main.c:75: error: (Each undeclared identifier is reported only once
main.c:75: error: for each function it appears in.)
make: *** [main.o] Error 1

Justin
--

From: Bryce Harrington
Date: Tuesday, March 18, 2008 - 12:07 pm

<jbarnes> bryce: would it be possible for you to include
  intel_reg_dumper in your intel driver pkg?

jesse - I looked into this, and in fact debian has already enabled this
in the 2.2.1 driver, which we are carrying in Hardy.  Timo had to
disable it though since we currently carry libpciaccess in universe, not
main.

However, I note that even after installing libpciaccess, it still fails

Yes, the same error occurs with the 2.2.1 driver we have in Hardy right now:

bryce@chideok:~/src/xserver-xorg-video-intel/xserver-xorg-video-intel-2.2.1-build/src/reg_dumper$ make
gcc -DHAVE_CONFIG_H -I. -I../..     -Wall -Wpointer-arith -Wstrict-prototypes   -Wmissing-prototypes -Wmissing-declarations     -Wnested-externs -fno-strict-aliasing -I./.. -DREG_DUMPER -g -O2 -MT intel_reg_dumper-main.o -MD -MP -MF .deps/intel_reg_dumper-main.Tpo -c -o intel_reg_dumper-main.o `test -f 'main.c' || echo './'`main.c
main.c: In function 'main':
main.c:72: warning: implicit declaration of function 'pci_device_map_range'
main.c:72: warning: nested extern declaration of 'pci_device_map_range'
main.c:75: error: 'PCI_DEV_MAP_FLAG_WRITABLE' undeclared (first use in this function)
main.c:75: error: (Each undeclared identifier is reported only once
main.c:75: error: for each function it appears in.)
make: *** [intel_reg_dumper-main.o] Error 1

I could get it to build by ifdefing out the pci_device_map_range call,
and then adding -lpciaccess to the linker, however it segfaults without
that call.

Does LIBPCIACCESS require xserver 1.4?  (We have 1.3 in Hardy).

Bryce
--

From: Jesse Barnes
Date: Tuesday, March 18, 2008 - 1:00 pm

No, libpciaccess should be standalone, but the Intel driver requires 0.10 or 
better... Which version are you building against?

--

From: Bryce Harrington
Date: Tuesday, March 18, 2008 - 2:14 pm

I've put in a sync request for this to get updated for hardy.  (It still
will only be in universe, so won't be built automatically with -intel,
but at least users will be able to more easily compile it by hand.)

Meanwhile, here is an Ubuntu Hardy build of Debian's package:

  http://people.ubuntu.com/~bryce/Testing/libpciaccess/libpciaccess-dev_0.10-1_i386.deb
  http://people.ubuntu.com/~bryce/Testing/libpciaccess/libpciaccess0_0.10-1_i386.deb

Bryce

--

From: Jesse Barnes
Date: Tuesday, March 18, 2008 - 2:17 pm

Thanks a lot Bryce, hopefully this will help Justin build the register dumper 
so we can track down his problem...

Jesse
--

From: Justin Madru
Date: Wednesday, March 19, 2008 - 4:38 pm

Thanks for all your help so far! Well I tried to install the deb files 
that Bryce created, unfortunately they're dependent on a newer version 
of libc6. I think (_hope_) this is the one in Hardy _because_ that's 
what I'm doing right now; I've decided to upgrade to Hardy. (so sad 
because I have to download _1.5GB_)

Jesse, do you still think it's a bug in the X server driver instead of 
the kernel driver (dri I guess)? Because I can trigger the blank screen 
before the X server even starts. If I press ctrl+alt+f1 at the _right_ 
time when the kernel is booting (or I guess when the boot splash is 
showing) it will instantly blank out. That doesn't seem like something 
related to X.

Anyways maybe it's multiple things, and by upgrading to Hardy it might 
fix it (which would prove that it's not a kernel bug, but some software 
version dependency bug - which would still be good to know).

Justin
--

From: Jesse Barnes
Date: Wednesday, March 19, 2008 - 4:48 pm

Depends on the boot splash program.  I think in some configurations it'll be X 

Good luck with the upgrade...

Jesse
--

From: Bryce Harrington
Date: Wednesday, March 19, 2008 - 7:00 pm

Mmm, well indeed that may be the kernel framebuffer.

Justin, one thing you could try is to set your system to boot up without
starting gdm.  Either do this via /etc/X11/default-display-manager
(comment it out, or set to xdm or something) or bypass it entirely by 
mv /etc/rc3.d/{S30,K30}gdm, and then append '3' to the end of the boot line
in grub.  (Maybe there's a better way to achieve this with upstart, but
my upstart-fu is limited.)

Bryce
--

From: Justin Madru
Date: Monday, March 24, 2008 - 8:07 pm

Well, the upgrade went ok, and compiling the reg_dumper using the libpciaccess .deb from Bryce worked.
Then I tried to add to the boot scripts a call to reg_dumper...
...To make a long story short...
I somehow killed my boot scripts! Anyways, I did a fresh reinstall of Ubuntu 8.04 Beta.

I'm still getting the blank screen problem with the 2.6.25-rc6 kernel,
so I guess it wasn't an Ubuntu software problem (or I hope not, because that could be really hard to find).

What I did was create a script that took a reg_dump every 6 secs for 1 min.
I made that as rc2.d/S01regdump so it should've been the very first thing called.
So, I hope there's enough "data points" to see what's happening.
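
A rough sketch of that kind of sampling loop, in Python rather than the script actually used (which isn't shown in this thread). The function name `capture_dumps`, the output file naming, and the injectable `sleep` parameter are illustrative assumptions; the only details taken from the message are the 6 second interval and the 1 minute duration.

```python
import subprocess
import time
from pathlib import Path


def capture_dumps(command, out_dir, interval=6.0, duration=60.0, sleep=time.sleep):
    """Run `command` every `interval` seconds for `duration` seconds.

    The output of each run is written to a numbered file in `out_dir`.
    Returns the list of files written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = int(duration // interval)
    paths = []
    for index in range(count):
        # Capture stdout and stderr together so a failing run is still recorded.
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            check=False,
        )
        path = out_dir / f"reg_dump.{index:02d}.txt"
        path.write_text(result.stdout)
        paths.append(path)
        if index + 1 < count:
            sleep(interval)
    return paths
```

Called as `capture_dumps(["intel_reg_dumper"], "/var/log/reg_dump")` from an early init script, this would leave ten numbered files to compare; both the command location and the output directory are placeholders here.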

Reg Dump Information
	http://jdserver.homelinux.org/linux/reg_dump.tar.bz2
Detailed System Information
	http://jdserver.homelinux.org/linux/sysinfo-2.6.25-rc6
Kernel Config
	http://jdserver.homelinux.org/linux/config-2.6.25-rc6

Hope that can help find the problem. If you need me to test anything else I'll try.

Justin

--

From: Jesse Barnes
Date: Tuesday, March 25, 2008 - 1:08 pm

Wow, that's a lot of dump files. :)

I was worried that in the "blank" case we may see the same register dump
as in the working case, but thankfully they're different.  In fact, in all
the dumps after 0 in the 2.6.25-blank case, both pipes are disabled and
the LCD itself is disabled.

The important bits:

@@ -24,7 +24,7 @@
 (II):          DVOB_SRCDIM: 0x00000000
 (II):          DVOC_SRCDIM: 0x00000000
 (II):           PP_CONTROL: 0x00000001 (power target: on)
-(II):            PP_STATUS: 0xc0000008 (on, ready, sequencing idle)
+(II):            PP_STATUS: 0x0000000a (off, not ready, sequencing idle)
 (II):         PFIT_CONTROL: 0x80002668
 (II):      PFIT_PGM_RATIOS: 0x00000000
 (II):      PORT_HOTPLUG_EN: 0x00000020
@@ -36,7 +36,7 @@
 (II):             DSPABASE: 0x00000000
 (II):             DSPASURF: 0x00000000
 (II):          DSPATILEOFF: 0x00000000
-(II):            PIPEACONF: 0x00000000 (disabled, single-wide)
+(II):            PIPEACONF: 0x000c0000 (disabled, single-wide)
 (II):             PIPEASRC: 0x027f01df (640, 480)
 (II):            PIPEASTAT: 0x80000203 (status: FIFO_UNDERRUN VSYNC_INT_STATUS VBLANK_INT_STATUS OREG_UPDATE_STATUS)
 (II):         FBC_CFB_BASE: 0x00000000
@@ -59,16 +59,16 @@
 (II):              VSYNC_A: 0x01eb01e9 (490 start, 492 end)
 (II):            BCLRPAT_A: 0x00000000
 (II):         VSYNCSHIFT_A: 0x00000000
-(II):             DSPBCNTR: 0x95000000 (enabled, pipe B)
+(II):             DSPBCNTR: 0x15000000 (disabled, pipe B)
 (II):           DSPBSTRIDE: 0x00000500 (1280 bytes)
 (II):              DSPBPOS: 0x00000000 (0, 0)
 (II):             DSPBSIZE: 0x01df027f (640, 480)
 (II):             DSPBBASE: 0x00000000
 (II):             DSPBSURF: 0x00000000
 (II):          DSPBTILEOFF: 0x00000000
-(II):            PIPEBCONF: 0x80000000 (enabled, single-wide)
+(II):            PIPEBCONF: 0x000c0000 (disabled, single-wide)
 (II):             PIPEBSRC: 0x027f01df (640, 480)
-(II):            PIPEBSTAT: 0x00000202 (status: VSYNC_INT_STATUS ...
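
The decoded text in parentheses can be cross-checked against the raw values. The sketch below is Python, with bit positions inferred only from the values quoted in this dump (bit 31 is set wherever the dump says "enabled" or "on", bit 30 wherever PP_STATUS says "ready"). They are not taken from the hardware documentation, and the helper names are made up for illustration.

```python
# Bit positions inferred from the dump values quoted in this thread, not from a spec.
ENABLE_BIT = 1 << 31    # set in 0x80000000 / 0x95000000 / 0xc0000008
PP_READY_BIT = 1 << 30  # set in 0xc0000008, clear in 0x0000000a


def is_enabled(value):
    """True if the top bit of a PIPExCONF / DSPxCNTR style value is set."""
    return bool(value & ENABLE_BIT)


def describe_panel_power(pp_status):
    """Render the on/ready part of PP_STATUS the way the dump does."""
    power = "on" if pp_status & ENABLE_BIT else "off"
    ready = "ready" if pp_status & PP_READY_BIT else "not ready"
    return f"{power}, {ready}"


def changed_bits(before, after):
    """Bits that differ between two register values."""
    return before ^ after
```

For example, `changed_bits(0x95000000, 0x15000000)` gives `0x80000000`, so the only difference between the two DSPBCNTR values above is the top bit.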

--

From: Justin Madru
Date: Wednesday, March 26, 2008 - 11:32 am

Ok, I have the X logs:
http://jdserver.homelinux.org/linux/Xorg.0.log-blank
http://jdserver.homelinux.org/linux/Xorg.0.log-good

Below is just a portion of the diff of those files.

--- Xorg.0.log-blank    2008-03-26 11:14:12.000000000 -0700
+++ Xorg.0.log-good    2008-03-26 11:14:35.000000000 -0700
@@ -20,7 +20,7 @@
 Markers: (--) probed, (**) from config file, (==) default setting,
     (++) from command line, (!!) notice, (II) informational,
     (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
-(==) Log file: "/var/log/Xorg.0.log", Time: Wed Mar 26 09:59:56 2008
+(==) Log file: "/var/log/Xorg.0.log", Time: Wed Mar 26 10:02:12 2008
 (==) Using config file: "/etc/X11/xorg.conf"
 (==) ServerLayout "Default Layout"
 (**) |-->Screen "Default Screen" (0)
@@ -470,9 +470,9 @@
 (WW) intel(0): Register 0x61200 (PP_STATUS) changed from 0xc0000008 to 0xd0000009
 (WW) intel(0): PP_STATUS before: on, ready, sequencing idle
 (WW) intel(0): PP_STATUS after: on, ready, sequencing on
-(WW) intel(0): Register 0x71024 (PIPEBSTAT) changed from 0x00000202 to 0x00000242
-(WW) intel(0): PIPEBSTAT before: status: VSYNC_INT_STATUS VBLANK_INT_STATUS
-(WW) intel(0): PIPEBSTAT after: status: VSYNC_INT_STATUS LBLC_EVENT_STATUS VBLANK_INT_STATUS
+(WW) intel(0): Register 0x71024 (PIPEBSTAT) changed from 0x80000202 to 0x80000242
+(WW) intel(0): PIPEBSTAT before: status: FIFO_UNDERRUN VSYNC_INT_STATUS VBLANK_INT_STATUS
+(WW) intel(0): PIPEBSTAT after: status: FIFO_UNDERRUN VSYNC_INT_STATUS LBLC_EVENT_STATUS VBLANK_INT_STATUS
 (WW) intel(0): Register 0x68000 (TV_CTL) changed from 0x10000000 to 0x000c0000
 (WW) intel(0): Register 0x68010 (TV_CSC_Y) changed from 0x00000000 to 0x0332012d
 (WW) intel(0): Register 0x68014 (TV_CSC_Y2) changed from 0x00000000 to 0x07d30104
@@ -735,11 +735,73 @@
 (II) intel(0): fbc disabled on plane a
 (II) intel(0): fbc disabled on plane a
 (II) intel(0): fbc disabled on plane a
-(II) intel(0): xf86UnbindGARTMemory: unbind key 0
-(II) intel(0): ...
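
To pull the "changed from ... to ..." pairs out of logs like these and see which bits differ, something along these lines may help. It is a Python sketch: the regular expression is an assumption based on the lines quoted in this message, and `parse_changes` is a hypothetical helper, not part of any X.org tool.

```python
import re

# Pattern assumed from the "(WW) intel(0): Register ..." lines quoted in this thread.
# The \s+ before the final value tolerates line wrapping added by mail clients.
CHANGE_RE = re.compile(
    r"Register 0x[0-9a-fA-F]+ \((\w+)\) changed from "
    r"(0x[0-9a-fA-F]+) to\s+(0x[0-9a-fA-F]+)"
)


def parse_changes(log_text):
    """Return (register, before, after, toggled_bits) for each change line."""
    changes = []
    for name, before_hex, after_hex in CHANGE_RE.findall(log_text):
        before = int(before_hex, 16)
        after = int(after_hex, 16)
        changes.append((name, before, after, before ^ after))
    return changes
```

For the PIPEBSTAT lines in this diff, the toggled bits come out as 0x40 in both logs. The two logs themselves differ only in bit 31, which the driver's own decoding labels FIFO_UNDERRUN.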

--

From: Justin Madru
Date: Monday, March 31, 2008 - 12:24 pm

Well, I disabled gdm and tried to trigger the blank screen. I did about 16 reboots and it blanked out on me only 2 times.
I was using a script to reboot and/or startx. One time I'm not exactly sure if X was started, so it might have blanked out on a "startx".
The other time I'm fairly sure X wasn't started, so it blanked out on a terminal login.

Both of the blank outs were different than the ones with gdm started. Pressing ctrl+alt+f# changed nothing on the screen;
the screen seemed almost completely off (no or little backlight). A few seconds after pressing the power button the shutdown splash screen would show, but this time it was _very_ faint.

Usually, when gdm is enabled, pressing ctrl+alt+f# would "refresh" (or mode/resolution change) the screen, but it would still be blank.
Also the backlight still seemed to be on and at full brightness (although, still displaying black).

Well, I don't know what to say, it's the strangest of problems.

Justin

--

From: Jesse Barnes
Date: Tuesday, April 1, 2008 - 1:22 pm

Yeah, seems pretty weird.  Given that you see it w/o the fb stuff loaded as 
well and we still have a few open bugs against the intel X driver regarding 
VT switch & mode programming, I don't think this is a real kernel regression.  
It's more likely that some timing or memory layout changed subtly and is 
causing you to hit one of our existing bugs more frequently than you did 
before.  Can you file a bug against the intel X driver at 
bugs.freedesktop.org so we can track it there?  Unless we can find a way to 
reproduce it reliably it'll probably take a long time to fix, but we don't 
want to lose it either...

Thanks,
Jesse
--

From: Justin Madru
Date: Tuesday, April 8, 2008 - 9:56 pm

Well, I'll file a bug on bugs.freedesktop.org, but if you don't think 
it's a kernel regression then I'll wait until the final release of 
2.6.25 comes out (unless you _really_ need me to file it sooner).

I still think it's somehow related to something that changed in the 
kernel from v24 to v25 because I've never had it happen with a kernel 
version less than 2.6.25. You say it's a timing issue; I've searched and 
found two things that have changed in v25: Preemptive RCU and I/O Port 
Delay.

I had both preemptive RCU and no I/O port delay enabled, so I recompiled 
with both disabled and found that the blank screen _still_ happens. So, 
I'm figuring that _maybe_ in adding these options the kernel developers 
needed to change something that breaks an assumption the intel X.org 
driver relies on, one that's no longer necessarily true 
(or something like that - do you get what I'm trying to say?).



Or is there another X.org intel driver? And if so how are they 
(agp/drm/X.org) related?

Justin
--

From: Jesse Barnes
Date: Wednesday, April 9, 2008 - 8:24 am

Well, given what we've tested so far, it really doesn't seem like a 
framebuffer layer regression nor a DRM regression... I suppose you could try 
bisecting, but given that the problem doesn't happen every time that might 
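
To put a rough number on why an intermittent failure makes bisecting awkward: a kernel can only be marked "good" after enough clean boots that a miss is unlikely. The Python sketch below assumes each boot fails independently with a fixed probability, which may well not hold here. The 2-in-16 rate is the one reported earlier in the thread with gdm disabled, and `clean_boots_needed` is an illustrative name.

```python
import math


def clean_boots_needed(failure_rate, confidence=0.95):
    """Clean boots required before calling a kernel good with this confidence.

    Assumes every boot blanks independently with probability `failure_rate`.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    miss_probability = 1.0 - confidence
    # Smallest n such that (1 - failure_rate) ** n <= miss_probability.
    return math.ceil(math.log(miss_probability) / math.log(1.0 - failure_rate))
```

With a failure rate of 2/16, `clean_boots_needed(2 / 16)` comes to 23 boots per "good" verdict, and that cost repeats at each bisection step.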

You could try using the X vesa driver instead of the intel driver...

Jesse
--

From: Justin Madru
Date: Sunday, April 13, 2008 - 8:46 pm

Ok, so I tried with the vesa X.org driver and it never had the blank screen problem (so far).
There were some screen distortions on mode change but no blank screen.
I noticed that when using the vesa X.org driver the i915 kernel module didn't load.

I think we're getting closer! It seems like it's not a kernel regression but a bug in the intel Xorg driver.
Or maybe an interaction between the i915 DRM kernel module (>=2.6.25) and the intel X.org driver,
because it only happens with the 2.6.25 kernel, and I'd think if it was only an X driver bug,
then it should've happened on other kernel versions.

Justin

--

From: Bryce Harrington
Date: Tuesday, March 18, 2008 - 1:53 pm

Looks like we still are on 0.8.  I can check into upping us to 0.10 this
afternoon or tomorrow - even though it's still in universe and we can't
include it yet by default for -intel, that'll make it simpler for users
that need to build it.  Stay tuned.
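
The 0.8 versus 0.10 distinction is easy to get backwards if versions are compared as plain strings or decimals, since "0.10" sorts before "0.8" as text. A minimal Python sketch of a component-wise comparison follows. The helper names `version_tuple` and `meets_minimum` are made up for illustration, and only the 0.10 minimum comes from this thread.

```python
def version_tuple(version):
    """Split a dotted version string such as '0.10' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, required="0.10"):
    """True if `installed` is at least `required`, compared part by part."""
    return version_tuple(installed) >= version_tuple(required)
```

Here `meets_minimum("0.8")` is false and `meets_minimum("0.10")` is true. On a system with pkg-config, `pkg-config --atleast-version=0.10 pciaccess` should answer the same question, assuming the library installs a `pciaccess.pc` file.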

Bryce

--
