[Bug #11507] usb: sometimes dead keyboard after boot

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:52 am

This message contains a list of some regressions from 2.6.26, for which there
are no fixes in the mainline I know of.  If any of them have been fixed already,
please let me know.

If you know of any other unresolved regressions from 2.6.26, please let me know
as well and I'll add them to the list.  Also, please let me know if any of the
entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2008-09-21      169       45          36


Unresolved regressions
----------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11611
Subject		: Commit 2344abbcbdb82140050e8be29d3d55e4f6fe860b breaks resume on nx6325
Submitter	: Rafael J. Wysocki <rjw@sisk.pl>
Date		: 2008-09-20 23:24 (2 days old)
References	: http://marc.info/?l=linux-kernel&m=122195277606974&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11610
Subject		: Problem with kernel commit 664d080c41463570b95717b5ad86e79dc1be0877
Submitter	: Michal 'vorner' Vaner <vorner@ucw.cz>
Date		: 2008-09-21 17:35 (1 days old)
References	: http://marc.info/?l=linux-acpi&m=122201853409501&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11609
Subject		: oops in find_get_page
Submitter	: Marcin Slusarz <marcin.slusarz@gmail.com>
Date		: 2008-09-20 14:53 (2 days old)
References	: http://marc.info/?l=linux-kernel&m=122192251101892&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11608
Subject		: 2.6.27-rc6 BUG: unable to handle kernel paging request
Submitter	: John Daiker <daikerjohn@gmail.com>
Date		: 2008-09-16 23:00 (6 days old)
References	: http://marc.info/?l=linux-kernel&m=122160611517267&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11607
Subject		: 2.6.27-rc6 Bug in ...


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:52 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11207
Subject		: VolanoMark regression with 2.6.27-rc1
Submitter	: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Date		: 2008-07-31 3:20 (53 days old)
References	: http://marc.info/?l=linux-kernel&m=121747464114335&w=4
Handled-By	: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
		  Peter Zijlstra <a.p.zijlstra@chello.nl>
		  Dhaval Giani <dhaval@linux.vnet.ibm.com>
		  Miao Xie <miaox@cn.fujitsu.com>


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11230
Subject		: Kconfig no longer outputs a .config with freshly updated defconfigs
Submitter	: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Date		: 2008-08-02 16:03 (51 days old)
References	: http://marc.info/?l=linux-kernel&m=121769306319391&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11220
Subject		: Screen stays black after resume
Submitter	: Nico Schottelius <nico@schottelius.org>
Date		: 2008-07-31 21:05 (53 days old)
References	: http://marc.info/?l=linux-kernel&m=121753882422899&w=4


--

From: Pavel Machek
Date: Tuesday, September 30, 2008 - 3:25 pm

This is actually three problems in one :-(.

If you try to suspend with minimum config, will resume still take 30
seconds?

Is the problem still there in 2.6.27-rc7?

Is there a chance to bisect it?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11215
Subject		: INFO: possible recursive locking detected ps2_command
Submitter	: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date		: 2008-07-31 9:41 (53 days old)
References	: http://marc.info/?l=linux-kernel&m=121749737011637&w=4
Handled-By	: Peter Zijlstra <a.p.zijlstra@chello.nl>


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11224
Subject		: Only three cores found on quad-core machine.
Submitter	: Dave Jones <davej@redhat.com>
Date		: 2008-08-01 18:15 (52 days old)
References	: http://marc.info/?l=linux-kernel&m=121761475224719&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11271
Subject		: BUG: fealnx in 2.6.27-rc1
Submitter	: Jaswinder Singh <jaswinderlinux@gmail.com>
Date		: 2008-08-05 14:58 (48 days old)
References	: http://marc.info/?l=linux-netdev&m=121794762016830&w=4
		  http://lkml.org/lkml/2008/8/10/98
Handled-By	: Francois Romieu <romieu@fr.zoreil.com>


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11237
Subject		: corrupt PMD after resume
Submitter	: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Date		: 2008-08-02 9:51 (51 days old)
References	: http://marc.info/?l=linux-kernel&m=121767073424952&w=4
Handled-By	: Hugh Dickins <hugh@veritas.com>
		  Jeremy Fitzhardinge <jeremy@goop.org>
Patch		: http://marc.info/?l=linux-kernel&m=122001615314700&w=2


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11210
Subject		: libata badness
Submitter	: Kumar Gala <galak@kernel.crashing.org>
Date		: 2008-07-31 18:53 (53 days old)
References	: http://marc.info/?l=linux-ide&m=121753059307310&w=4
Handled-By	: Kumar Gala <galak@kernel.crashing.org>


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11264
Subject		: Invalid op opcode in kernel/workqueue
Submitter	: Jean-Luc Coulon <jean.luc.coulon@gmail.com>
Date		: 2008-08-07 04:18 (46 days old)


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11272
Subject		: BUG: parport_serial in 2.6.27-rc1 for NetMos Technology PCI 9835
Submitter	: Jaswinder Singh <jaswinderlinux@gmail.com>
Date		: 2008-08-05 15:12 (48 days old)
References	: http://marc.info/?l=linux-kernel&m=121794900319776&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11340
Subject		: LTP overnight run resulted in unusable box
Submitter	: Alexey Dobriyan <adobriyan@gmail.com>
Date		: 2008-08-13 9:24 (40 days old)
References	: http://marc.info/?l=linux-kernel&m=121861951902949&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11357
Subject		: Can not boot up with zd1211rw USB-Wlan Stick
Submitter	: uwe <kender@freenet.de>
Date		: 2008-08-16 14:17 (37 days old)


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11335
Subject		: 2.6.27-rc2-git5 BUG: unable to handle kernel paging request
Submitter	: Randy Dunlap <randy.dunlap@oracle.com>
Date		: 2008-08-12 4:18 (41 days old)
References	: http://marc.info/?l=linux-kernel&m=121851477201960&w=4
		  http://lkml.org/lkml/2008/8/16/274
Handled-By	: Hugh Dickins <hugh@veritas.com>


--

From: Randy Dunlap
Date: Sunday, September 21, 2008 - 4:49 pm

-- 
~Randy
--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11404
Subject		: BUG: in 2.6.23-rc3-git7 in do_cciss_intr
Submitter	: rdunlap <randy.dunlap@oracle.com>
Date		: 2008-08-21 5:52 (32 days old)
References	: http://marc.info/?l=linux-kernel&m=121929819616273&w=4
		  http://marc.info/?l=linux-kernel&m=121932889105368&w=4
Handled-By	: Miller, Mike (OS Dev) <Mike.Miller@hp.com>
		  James Bottomley <James.Bottomley@hansenpartnership.com>


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject		: tbench regression on each kernel release from 2.6.22 -> 2.6.28
Submitter	: Christoph Lameter <cl@linux-foundation.org>
Date		: 2008-08-11 18:36 (42 days old)
References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4
		  http://marc.info/?l=linux-kernel&m=122125737421332&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11380
Subject		: lockdep warning: cpu_add_remove_lock at:cpu_maps_update_begin+0x14/0x16
Submitter	: Ingo Molnar <mingo@elte.hu>
Date		: 2008-08-20 6:44 (33 days old)
References	: http://marc.info/?l=linux-kernel&m=121921480931970&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11382
Subject		: e1000e: 2.6.27-rc1 corrupts EEPROM/NVM
Submitter	: David Vrabel <david.vrabel@csr.com>
Date		: 2008-08-08 10:47 (45 days old)
References	: http://marc.info/?l=linux-kernel&m=121819267211679&w=4
Handled-By	: Christopher Li <chrisl@vmware.com>


--

From: David Miller
Date: Sunday, September 21, 2008 - 4:51 pm

From: "Rafael J. Wysocki" <rjw@sisk.pl>

Fixed by:

commit 78566fecbb12a7616ae9a88b2ffbc8062c4a89e3
Author: Christopher Li <chrisl@vmware.com>
Date:   Fri Sep 5 14:04:05 2008 -0700

    e1000: prevent corruption of EEPROM/NVM
    
    Andrey reports e1000 corruption, and that a patch in vmware's ESX fixed
    it.
    
    The EEPROM corruption is triggered by concurrent access of the EEPROM
    read/write. Putting a lock around it solve the problem.
    
    [akpm@linux-foundation.org: use DEFINE_SPINLOCK to avoid confusing lockdep]
    Signed-off-by: Christopher Li <chrisl@vmware.com>
    Reported-by: Andrey Borzenkov <arvidjaar@mail.ru>
    Cc: Zach Amsden <zach@vmware.com>
    Cc: Pratap Subrahmanyam <pratap@vmware.com>
    Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Cc: Bruce Allan <bruce.w.allan@intel.com>
    Cc: PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>
    Cc: John Ronciak <john.ronciak@intel.com>
    Cc: Jeff Garzik <jeff@garzik.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
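The commit message describes the fix only as "putting a lock around" concurrent EEPROM read/write access. As a rough illustration of why that matters, here is a minimal userspace sketch, assuming a toy device whose NVM write is a three-step sequence through shared registers. Everything in it (`struct toy_nvm`, `nvm_write_word()` and friends) is hypothetical, and a C11 `atomic_flag` merely stands in for the kernel spinlock the commit adds:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NVM_WORDS 64

/* Hypothetical toy device: one NVM write takes three steps through shared
 * interface registers, so an accessor that runs in the middle of another
 * one's sequence can redirect or replace its data. */
struct toy_nvm {
	uint16_t addr_reg;               /* step 1: latched word address */
	uint16_t data_reg;               /* step 2: latched data */
	uint16_t words[TOY_NVM_WORDS];   /* step 3 commits data_reg here */
	atomic_flag lock;                /* stands in for the NVM spinlock */
};

static void nvm_init(struct toy_nvm *nvm)
{
	nvm->addr_reg = 0;
	nvm->data_reg = 0;
	for (int i = 0; i < TOY_NVM_WORDS; i++)
		nvm->words[i] = 0xFFFF;      /* "erased" contents */
	atomic_flag_clear(&nvm->lock);
}

static void nvm_step_addr(struct toy_nvm *nvm, uint16_t addr) { nvm->addr_reg = addr; }
static void nvm_step_data(struct toy_nvm *nvm, uint16_t data) { nvm->data_reg = data; }
static void nvm_step_commit(struct toy_nvm *nvm)
{
	assert(nvm->addr_reg < TOY_NVM_WORDS);
	nvm->words[nvm->addr_reg] = nvm->data_reg;
}

/* Non-spinning acquire so contention is visible in a single thread;
 * real code would spin or sleep until the lock is free. */
static bool nvm_trylock(struct toy_nvm *nvm)
{
	return !atomic_flag_test_and_set_explicit(&nvm->lock, memory_order_acquire);
}

static void nvm_unlock(struct toy_nvm *nvm)
{
	atomic_flag_clear_explicit(&nvm->lock, memory_order_release);
}

/* The whole three-step sequence runs with the lock held. Returns false
 * (and touches nothing) if another accessor currently owns the NVM. */
static bool nvm_write_word(struct toy_nvm *nvm, uint16_t addr, uint16_t data)
{
	if (!nvm_trylock(nvm))
		return false;
	nvm_step_addr(nvm, addr);
	nvm_step_data(nvm, data);
	nvm_step_commit(nvm);
	nvm_unlock(nvm);
	return true;
}
```

Interleaving the bare steps of two writers by hand (writer A latches word 5, writer B runs its whole sequence for word 63, then A finishes) leaves A's data in word 63 and word 5 untouched. Going through `nvm_write_word()` instead, a second writer is refused until the first releases the lock. In a driver the equivalent is holding the lock across the entire read or write sequence, not just around individual register accesses.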
--

From: Dave Airlie
Date: Sunday, September 21, 2008 - 11:59 pm

Just noticed I replied to davem and not to everyone, so I did some
further hunting.

Okay, so e1000e seems to have a problem in this area that this *DOESN'T* fix.

I've reconstructed my boot timeline from message logs:

Sep 3rd, I booted rawhide kernel 2.6.27-0.290.rc5.fc10.i686

I suspended/resumed a few times in between with no issues.

Sep 8th I booted my own 2.6.27-rc5 kernel based on
ec0c15afb41fd9ad45b53468b60db50170e22346

This got a corrupted e1000e checksum and every kernel since has.
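For reference on the "corrupted e1000e checksum": as far as I know, the e1000/e1000e drivers validate the NVM by adding up words 0x00 through 0x3F (the last one being the checksum word) with 16-bit wraparound and expecting 0xBABA, which the driver headers call `NVM_SUM`. A minimal sketch of that check over a plain word array; the function names are hypothetical and reading the words from the hardware is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NVM_CHECKSUM_REG 0x3F    /* word index holding the checksum */
#define NVM_SUM          0xBABA  /* expected sum of words 0x00..0x3F */

/* 16-bit wrapping sum of words 0x00..0x3F inclusive. */
static uint16_t nvm_sum(const uint16_t *words)
{
	uint16_t sum = 0;

	assert(words != NULL);
	for (size_t i = 0; i <= NVM_CHECKSUM_REG; i++)
		sum = (uint16_t)(sum + words[i]);
	return sum;
}

static int nvm_checksum_ok(const uint16_t *words)
{
	return nvm_sum(words) == NVM_SUM;
}

/* Value the checksum word must hold so that the total becomes NVM_SUM. */
static uint16_t nvm_checksum_word(const uint16_t *words)
{
	uint16_t partial = 0;

	for (size_t i = 0; i < NVM_CHECKSUM_REG; i++)
		partial = (uint16_t)(partial + words[i]);
	return (uint16_t)(NVM_SUM - partial);
}
```

An image overwritten with 0xff throughout, as reported later in the thread, sums to 0xFFC0 and fails the check. Recomputing the checksum word only makes the sum consistent again; it does not bring back the lost contents.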

Dave.
--

From: David Miller
Date: Monday, September 22, 2008 - 12:01 am

From: "Dave Airlie" <airlied@gmail.com>


Ok.
--

From: Jiri Kosina
Date: Monday, September 22, 2008 - 3:15 pm

Have you restored the EEPROM contents after it got corrupted for the first 
time?

Once the EEPROM contents get corrupted, the card will then be broken
forever even on kernel that gets this fixed one day. 

This is pretty serious bug in fact, as it renders hardware of poor users 
unusable, and just patching kernel is then not enough to put things back 
to shape.

-- 
Jiri Kosina
SUSE Labs

--

From: David Miller
Date: Monday, September 22, 2008 - 3:28 pm

From: Jiri Kosina <jkosina@suse.cz>

The top priority is to root cause this, so that we can stop the
problem from happening as fast as possible, and I'm still waiting for
the SHA1 ID that was used for the last kernel Dave booted before the
problem occurred which is pretty damn critical for making forward
progress here.

It could even be some PCI or x86 layer change that caused the corruption,
we don't even know yet.
--

From: Dave Airlie
Date: Monday, September 22, 2008 - 6:26 pm

It was exactly 2.6.27-rc5 + Fedora at the time, but we rarely touch
these areas; most of the extra code is in other places, and since
people are seeing it on !Fedora as well, I would assume it wasn't these.

I think people may have seen it on earlier kernels, but I'm not sure.

Really, Intel needs to get a fix of some sort out so we can repair the
hw and then root-cause the problem.

--

From: David Miller
Date: Monday, September 22, 2008 - 6:59 pm

From: "Dave Airlie" <airlied@gmail.com>

So I went through the changes from 2.6.27-rc5 until the SHA1
ID ec0c15afb41fd9ad45b53468b60db50170e22346 and there were
definitely no E1000 or E1000E changes during that time.

Included in there is the HPET revert and other similarly themed
changes.

commit b4609472116bb806a95e98d04767189406c74c70
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Fri Aug 29 14:38:03 2008 -0700

    Revert "x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3"
    
    This reverts commit a2bd7274b47124d2fc4dfdb8c0591f545ba749dd.

Some power management related changes stand out slightly:

commit 9d3593574702ae1899e23a1535da1ac71f928042
Author: John Kacur <jkacur@gmail.com>
Date:   Tue Sep 2 14:36:13 2008 -0700

    pm_qos_requirement might sleep

and

commit 74c4633da7994eddcfcd2762a448c6889cc2b5bd
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Tue Sep 2 14:36:11 2008 -0700

    rtc-cmos: wake again from S5

The rest of the changes in that range look completely benign.
--

From: Jiri Kosina
Date: Tuesday, September 23, 2008 - 7:29 am

Some recent comments on [1] seem to indicate that this is somehow coupled
to prior problems/panics with Intel graphics.

David, was this also your case, or did the EEPROM get garbled all of a
sudden?

[1] https://bugzilla.novell.com/show_bug.cgi?id=425480

-- 
Jiri Kosina
SUSE Labs
--

From: Renato S. Yamane
Date: Tuesday, September 23, 2008 - 9:38 am

And...
<http://lwn.net/Articles/299787>
<http://bugzilla.kernel.org/show_bug.cgi?id=11382>
<https://bugzilla.redhat.com/show_bug.cgi?id=459202>
<https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555>

Best regards,
Renato
--

From: Dave Airlie
Date: Tuesday, September 23, 2008 - 2:03 pm

I have no evidence in my logs of a graphics panic, but I do do a lot
of graphics devel, so it might be a possibility, but I'd hate to
handwave it away at that.

--

From: David Miller
Date: Tuesday, September 23, 2008 - 3:05 pm

From: "Dave Airlie" <airlied@gmail.com>

Let's not handwave, but rather try to figure out if that is part of
the pattern.

Right now we don't have any real leads, so data acquisition is really
important at this phase.
--

From: David Newall
Date: Tuesday, September 23, 2008 - 11:02 pm

Isn't this reliably reproducible?  Assuming yes, Intel are such swell
guys that you might ask them to ship a few dozen cards to you to break
until you've tracked down the problem.  I mean, it's a lot easier to
find this sort of fault when you can see it first-hand than by trying to
guess from third parties' reports, isn't it?  For some reasonable value
of "you", that is.
--

From: David Miller
Date: Tuesday, September 23, 2008 - 2:05 pm

From: Jiri Kosina <jkosina@suse.cz>

My current suspicion in all of this is either the GEM kernel patches
or recent X server.

However, the eeprom/nvram programming sequence seems non-trivial on
the e1000e.  You have to execute a set of precise register writes
and register polls to successfully write things out to the nvram.

This makes something like a random scribble out to MMIO space less
likely to cause this problem.

Is there some linear mapping of the nvram that could be written to
on these cards?
--

From: Dave Airlie
Date: Tuesday, September 23, 2008 - 2:09 pm

I don't think OpenSUSE was shipping any of the GEM bits.

--

From: David Miller
Date: Tuesday, September 23, 2008 - 3:07 pm

From: "Dave Airlie" <airlied@gmail.com>

Good data point, can someone confirm this?  Also, what X server version
is the affected OpenSUSE shipping?
--

From: Jeff Kirsher
Date: Tuesday, September 23, 2008 - 3:12 pm

OpenSuSE 11 ships x server version 7.3.

-- 
Cheers,
Jeff
--

From: Jiri Kosina
Date: Tuesday, September 23, 2008 - 3:19 pm

Opensuse 11 is fine.

The problem can be reproduced [not only] on opensuse 11.1 beta1, which has

	xorg-x11-7.4-1.6.x86_64.rpm

-- 
Jiri Kosina
--

From: David Miller
Date: Tuesday, September 23, 2008 - 9:12 pm

From: Jiri Kosina <jkosina@suse.cz>

I did some snooping around, and while doing so I noticed that the PCI
mmap code for x86 doesn't do one bit of range checking on the size, or
any other aspect of the request, wrt. the MMIO regions actually mapped
in the BARs of the PCI device.

Yikes!

It just does a reserve_memtype() on the address range, and says "ok".

So if, for example, the X server tries to mmap() more than an MMIO bar
actually maps, the kernel lets the user do this.

It would be very interesting to add the appropriate checks to
pci_mmap_page_range() in arch/x86/pci/i386.c, anyone who wants to do
this can use the code in arch/sparc64/kernel/pci.c:
__pci_mmap_make_offset() as a guide, and see what happens.
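The check proposed here boils down to refusing any mapping that does not fit entirely inside one of the device's memory BARs. A minimal sketch of that containment test, assuming a simplified resource table; `struct bar_res` and `pci_mmap_range_ok()` are hypothetical and are not the real `pci_mmap_page_range()` or `__pci_mmap_make_offset()` code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one BAR: an inclusive [start, end]
 * physical range, in the style of struct resource. */
struct bar_res {
	uint64_t start;
	uint64_t end;       /* inclusive */
	bool is_mem;        /* MMIO BAR rather than an I/O port BAR */
};

/* True only if [addr, addr + len) lies entirely inside a single memory
 * BAR of the device. */
static bool pci_mmap_range_ok(const struct bar_res *bars, size_t nbars,
			      uint64_t addr, uint64_t len)
{
	assert(bars != NULL || nbars == 0);
	if (len == 0)
		return false;
	for (size_t i = 0; i < nbars; i++) {
		const struct bar_res *r = &bars[i];

		if (!r->is_mem || r->end < r->start)
			continue;
		if (addr < r->start || addr > r->end)
			continue;
		/* last requested byte (addr + len - 1) must not pass r->end */
		if (len - 1 <= r->end - addr)
			return true;
	}
	return false;
}
```

The comparison is written as `len - 1 <= end - addr` so that a huge `len` cannot wrap `addr + len` around and slip past the check. As later messages in the thread note, the affected machines had their MMIO regions far apart, so a check like this would be a diagnostic aid rather than a confirmed fix.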

If the MMIO space regions of the video cards sit right before the
E1000E ones on the affected systems, that would pretty much
convince me that this is the kind of problem we are having here.

This also reminds me that there was that whole set of issues that
had to get worked out wrt. write-caching of mappings on x86.
--

From: Dave Airlie
Date: Tuesday, September 23, 2008 - 10:45 pm

I'm still dubious about this; wouldn't we see other weird-ass side
effects if X was trashing the BARs on other devices?

I think tglx is on the right path: same problem as e1000. The code is
stupid; it can reenter the nvram read/write code from irq context and
pwn itself.
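The hazard described here is the classic case a plain lock does not cover: if an interrupt handler on the same CPU enters the NVM code while process context is midway through a multi-step sequence, the sequence gets corrupted (without a lock), or, if the handler takes the same plain spinlock, the CPU spins forever on a lock it already holds. The usual kernel answer is `spin_lock_irqsave()`, which also masks local interrupts while the lock is held. A single-threaded simulation of that idea, assuming a hypothetical `raise_irq()` that runs the handler immediately unless "interrupts" are masked, in which case it is deferred until they are restored; all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared NVM interface registers and backing store. */
static uint16_t addr_reg, data_reg;
static uint16_t nvm[64];

static bool irqs_masked;       /* models local_irq_save() state */
static bool irq_pending;       /* an interrupt arrived while masked */
static uint16_t irq_read_result;

/* Interrupt handler that also uses the NVM interface (a read of word 0). */
static void nvm_irq_handler(void)
{
	addr_reg = 0;                  /* clobbers any latched address */
	irq_read_result = nvm[addr_reg];
}

static void raise_irq(void)
{
	if (irqs_masked)
		irq_pending = true;    /* delivered on restore */
	else
		nvm_irq_handler();     /* runs right now, mid-sequence */
}

static void irq_save(void) { irqs_masked = true; }

static void irq_restore(void)
{
	irqs_masked = false;
	if (irq_pending) {
		irq_pending = false;
		nvm_irq_handler();
	}
}

/* Multi-step write with an interrupt arriving between latching the address
 * and committing the data. With protect == true the sequence runs with
 * interrupts masked, roughly what spin_lock_irqsave() arranges. */
static void nvm_write(uint16_t addr, uint16_t data, bool protect)
{
	assert(addr < 64);
	if (protect)
		irq_save();
	addr_reg = addr;
	raise_irq();
	data_reg = data;
	nvm[addr_reg] = data_reg;
	if (protect)
		irq_restore();
}
```

Unprotected, a write meant for word 10 lands in word 0 because the handler re-latched the address; protected, the write lands where intended and the handler runs afterwards.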

Dave.
--

From: David Miller
Date: Wednesday, September 24, 2008 - 12:36 am

From: "Dave Airlie" <airlied@gmail.com>

Sure.  My theory is that it's a recent xorg change causing this,
so I've been going through GIT history for xserver, libpciaccess,
and the intel driver for the past year looking for clues.

If there is usually a gap after the video device, there would just
be no response from the PCI bus, and the way that's handled is
chipset specific.  At least a while back, most x86 systems would
silently ignore writes and return all 1's in such a case, but
they may be generating bus error events these days.  I simply don't

The e1000e side here is reproducible way too easily for it to be the
same case, as far as I see it.

The e1000 driver has probably had this problem for years and we've
only recently had some concrete cases of it triggering.

Also, what utility are you running on your system that is even
accessing the NVRAM on the e1000e card?  Knowing that might help
us understand why this problem has appeared now.  Maybe there is
some diagnostic or monitoring tool that is now becoming prevalent
in these distributions where it triggers.

This problem started happening seemingly "all of a sudden", even to
people who have been keeping sort-of recent with their kernels, such
as yourself.

Yet we can't get any sense yet what range of kernel versions are in
use when the problem triggers.

I'm about to leave for a week or so in Paris for the netfilter
workshop, so I hope that someone other than myself will do some data
mining like I have instead of (merely) tossing theories around and
finger pointing.
--

From: Dave Airlie
Date: Wednesday, September 24, 2008 - 1:59 am

The only thing I can think of then is either the pciaccess conversion
of the intel Xorg driver,

The driver seems quite happy to access the NVRAM, I think Thomas has
some backtraces that show

I've seen it reported at least at 2.6.27-rc1 and maybe even one of
Fedora's -rc0 kernels.

--

From: David Miller
Date: Wednesday, September 24, 2008 - 2:01 am

From: "Dave Airlie" <airlied@gmail.com>

I don't dispute that the locking is dodgy and likely needs to be fixed
like e1000.

I'm asking what userland tool or kernel event is triggering the nvram
access.

It shouldn't even touch the thing after probing and initializing
the card.
--

From: Dave Airlie
Date: Wednesday, September 24, 2008 - 2:16 am

Hopefully tglx can supply some traces. I think getting an interrupt
during device startup can possibly access the nvram, and this trace:

http://www.tglx.de/~tglx/wtf2.txt

seems to suggest bad things could happen.

Dave.
--

From: Jiri Kosina
Date: Wednesday, September 24, 2008 - 9:33 am

Actually another user has just reported [1] that his e1000e card got 
screwed up exactly at the point when the installer was probing the X 
configuration. So this really seems a lot like some lethal interaction 
between intel graphics and the network card.

Dave (Airlie, too many Daves on CC here really), do you by any chance see
any recent change in the Intel graphics parts of the kernel DRM that could
be causing this breakage?

[1] https://bugzilla.novell.com/show_bug.cgi?id=425480#c69

-- 
Jiri Kosina
SUSE Labs
--

From: Jiri Kosina
Date: Wednesday, September 24, 2008 - 9:37 am

BTW, why is the PAT fix implemented in commit 242e3df80 needed only for 
radeons?

-- 
Jiri Kosina
SUSE Labs
--

From: Jiri Kosina
Date: Wednesday, September 24, 2008 - 11:10 am

Further important observation: as far as I can see, all machines
affected by this bug (and the number of reports is increasing) were
using i915 DRM.

-- 
Jiri Kosina
SUSE Labs
--

From: Dave Airlie
Date: Wednesday, September 24, 2008 - 1:18 pm

Good question. Mainly because only radeons showed the illegal mapping
crash: mapping via the sysfs _wc files and then doing a UC mapping in
the kernel over the same address space would fail. However, this was
VRAM related, and these things don't have VRAM.

--

From: Dave Airlie
Date: Wednesday, September 24, 2008 - 1:07 pm

Okay, so from the kernel side: if this isn't in 2.6.26, the drm has
introduced no patches I can even remotely claim might affect this. So
it's either userspace or PAT related.

Dave.
--

From: Parag Warudkar
Date: Wednesday, September 24, 2008 - 3:54 pm

Another data point in support of this theory - I've been running
all the various 2.6.27-rc releases (including rc7) on my HP machine which
has an embedded 82566 and Radeon x1650 graphics - and so far I have
not seen any problems.

Parag
--

From: Jonathan Corbet
Date: Wednesday, September 24, 2008 - 9:27 am

On Wed, 24 Sep 2008 00:36:38 -0700 (PDT)

A data point, just in case it helps...  I've not had time to update my
desktop system, so this all-Intel, ICH9, e1000e-based box has been stuck
at 2.6.27-rc3.  It has rawhide as of shortly after the floodgates
reopened (but with my own kernel); that means X server 1.5.0 and i810
2.4.2-3.

It's happy as a clam. I'm not sure how often this problem bites, but it
hasn't gotten me.

jon
--

From: Jiri Kosina
Date: Wednesday, September 24, 2008 - 9:56 am

Thanks for the information.

Seems like it quite often triggers during the very first probing of the
graphics card during the initial X startup. Karsten is currently writing a
tool that will safely restore the EEPROM contents to the card. When this
is done, testing will get much easier and hopefully we'll be able to
isolate whether it is the e1000e driver (I currently don't think so), DRM
kernel code, or xorg 7.4 causing this.

Thanks,

-- 
Jiri Kosina
SUSE Labs
--

From: Theodore Tso
Date: Wednesday, September 24, 2008 - 1:47 pm

I'm running a 2.6.26-rc6 kernel on an X61s laptop, which is an
all-Intel ICH8, using the e1000e driver, and I haven't been
bitten by the problem either.  I'm using an Ubuntu Hardy userspace,
which means I'm using a 1.4.0.90 X server with an i915 drm version
1.6.0 20060119, and my e1000 EEPROM hasn't been blasted to oblivion
yet!

Personally, I don't plan on upgrading to a newer userspace until we
figure out what the heck is going on.  :-)

					- Ted
--

From: Jiri Kosina
Date: Thursday, September 25, 2008 - 12:01 pm

If any of you guys has a Lenovo ThinkPad (T60p ideally) with an 8086:104b 
revision 3 card, could you please send me the respective "ethtool -e" 
dump?

Thanks,

-- 
Jiri Kosina
SUSE Labs
--

From: Kyle McMartin
Date: Wednesday, September 24, 2008 - 12:10 pm

I've been working on a patch to detect (using a timer and checking at
up/down) whether or not the flash has been corrupted and, if it has,
rewrite it with the saved good copy (which obviously only helps if
it's the same boot).
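The scheme described above (keep a known-good copy, compare it against the device from a timer and at up/down, and rewrite the device on mismatch) can be sketched in a few lines of plain C. This is a hypothetical illustration rather than the patch in this message: `struct nvm_guard` and its helpers are invented names, `NVM_BYTES` is an arbitrary size, and a caller-supplied buffer stands in for the real flash reads and writes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NVM_BYTES 128  /* hypothetical size of the guarded region */

struct nvm_guard {
	uint8_t good[NVM_BYTES];  /* copy taken when the device was probed */
	bool have_good;
};

/* Record the contents read at probe time as the known-good copy. */
static void nvm_guard_init(struct nvm_guard *g, const uint8_t *probe_copy)
{
	memcpy(g->good, probe_copy, NVM_BYTES);
	g->have_good = true;
}

/* Compare 'live' (the current device contents) with the saved copy. On a
 * mismatch, store the first differing offset in *first_bad, copy the good
 * contents back into 'live' (standing in for rewriting the flash) and
 * return true. Returns false if nothing changed or no copy was saved. */
static bool nvm_guard_check(struct nvm_guard *g, uint8_t *live,
			    size_t *first_bad)
{
	assert(first_bad != NULL);
	if (!g->have_good)
		return false;
	for (size_t i = 0; i < NVM_BYTES; i++) {
		if (live[i] != g->good[i]) {
			*first_bad = i;
			memcpy(live, g->good, NVM_BYTES);
			return true;
		}
	}
	return false;
}
```

As the message says, the saved copy only lives in memory, so this can repair corruption only within the same boot; reporting `first_bad` is what would help with the "method to the madness" question.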

Unfortunately, I don't have enough time to finish it before I go away
for the weekend, so I'll toss it over the wall and see if it sticks to
anything.

At a glance, one would need to add support for rewriting
adapter->hw.flash from ethtool if someone reprograms the good firmware
back, and writing the good flash back on down/remove if it detects
a change.

Bear in mind, super quick hack, and I haven't even run-tested it yet.

If nobody decides to run with it, I'll probably give it another poke
late tonight.

Definitely-not-signed-off-by-or-tested-by: Kyle

At the very least, if someone pokes in a hexdump of the firmware, we
might be able to see some of the method to the madness of the
corruption pattern.

diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index ac4e506..08cce8c 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -168,6 +168,7 @@ struct e1000_adapter {
 	struct timer_list watchdog_timer;
 	struct timer_list phy_info_timer;
 	struct timer_list blink_timer;
+	struct timer_list flash_timer;
 
 	struct work_struct reset_task;
 	struct work_struct watchdog_task;
diff --git a/drivers/net/e1000e/hw.h b/drivers/net/e1000e/hw.h
index 74f263a..ca3f645 100644
--- a/drivers/net/e1000e/hw.h
+++ b/drivers/net/e1000e/hw.h
@@ -863,6 +863,11 @@ struct e1000_hw {
 
 	u8 __iomem *hw_addr;
 	u8 __iomem *flash_address;
+	int flash_len;
+
+	u8 *flash;
+	u8 *flash_backup;
+	spinlock_t flashlock;
 
 	struct e1000_mac_info  mac;
 	struct e1000_fc_info   fc;
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index d266510..13f05f8 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -2535,6 +2535,7 @@ void e1000e_down(struct e1000_adapter ...
--

From: Jesse Brandeburg
Date: Wednesday, September 24, 2008 - 12:22 pm

Thanks Kyle!

Attached is a patch to dump the eeprom to dmesg (first 64 bytes) at
boot for e1000e, which kind of goes along with the AWOOGA part of
your patch.
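The attachment is not included here. For comparing dumps between machines, a minimal sketch of formatting the first bytes of an EEPROM image as offset-prefixed hex lines, 16 bytes per line; `format_eeprom_dump()` is a hypothetical helper that writes into a caller buffer instead of the kernel log:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format len bytes as lines like "0010: aa bb ...\n", 16 bytes per line.
 * Returns the number of characters written (excluding the NUL), or -1 if
 * 'out' is too small to hold the whole dump. */
static int format_eeprom_dump(const uint8_t *buf, size_t len,
			      char *out, size_t outsz)
{
	size_t pos = 0;

	assert(out != NULL);
	if (outsz == 0)
		return -1;
	out[0] = '\0';
	for (size_t i = 0; i < len; i++) {
		int n = 0;

		if (i % 16 == 0)  /* offset prefix at the start of each line */
			n = snprintf(out + pos, outsz - pos, "%04zx:", i);
		if (n < 0 || (size_t)n >= outsz - pos)
			return -1;
		pos += (size_t)n;

		/* newline after every 16th byte and after the last byte */
		n = snprintf(out + pos, outsz - pos, " %02x%s", buf[i],
			     (i % 16 == 15 || i + 1 == len) ? "\n" : "");
		if (n < 0 || (size_t)n >= outsz - pos)
			return -1;
		pos += (size_t)n;
	}
	return (int)pos;
}
```

For example, 18 bytes with values 0x00 to 0x11 format as "0000: 00 01 ... 0f" followed by "0010: 10 11".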
--

From: David Miller
Date: Wednesday, September 24, 2008 - 12:52 pm

From: Kyle McMartin <kyle@mcmartin.ca>

Looks interesting, I hope someone runs with it :-)

If the flash is seen as corrupt, we should print the current process
that is running at the time, and perhaps a pt_regs dump, as these
might provide the most important clues to diagnosing this.
--

From: Jiri Kosina
Date: Wednesday, September 24, 2008 - 3:37 pm

Thanks, this looks like an interesting e1000e hack that might possibly be of some help.

BUT! please have a look at

	http://lkml.org/lkml/2008/9/24/133

Looks like this device got a lot of 0xff written somewhere in its config 
space, right? But it isn't an Intel card at all.

-- 
Jiri Kosina
SUSE Labs
--

From: H. Peter Anvin
Date: Thursday, September 25, 2008 - 11:39 am

That looks like the device disappeared completely.
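hpa's reading relies on how PCI reports a missing device: on typical x86 chipsets a config read that nobody answers returns all ones, and 0xFFFF is never a valid vendor ID, so a dump whose vendor ID reads 0xFFFF points to a device that stopped responding rather than to rewritten registers. A minimal sketch of that triage over a raw config-space dump (for example bytes parsed from `lspci -xxx` output); the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Vendor ID is the little-endian 16-bit value at config offset 0x00. */
static uint16_t pci_cfg_vendor(const uint8_t *cfg)
{
	return (uint16_t)(cfg[0] | (cfg[1] << 8));
}

/* An unanswered read comes back as all ones, and 0xFFFF is not a valid
 * vendor ID, so treat that as "device not responding". */
static bool pci_cfg_device_absent(const uint8_t *cfg, size_t len)
{
	assert(len >= 2);
	return pci_cfg_vendor(cfg) == 0xFFFF;
}

/* Count 0xFF bytes, to tell "mostly ones" apart from real register data. */
static size_t pci_cfg_count_ff(const uint8_t *cfg, size_t len)
{
	size_t n = 0;

	for (size_t i = 0; i < len; i++)
		if (cfg[i] == 0xFF)
			n++;
	return n;
}
```

A dump that still shows a real vendor ID (0x8086 for Intel) but many 0xFF bytes elsewhere would point at something other than a vanished device.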

	-hpa
--

From: Kok, Auke
Date: Thursday, September 25, 2008 - 1:45 pm

I asked that person, and he said reverting to an older kernel made it work again.
Also, there is a fix out, as romieu pointed out, so this seems to be a
different bug for now.

Auke
--

From: Jiri Kosina
Date: Wednesday, September 24, 2008 - 4:15 pm

Absolutely. Or we can even do some dirty hackery in userspace, like 
LD_PRELOADing the X server and checking for mmap() calls that are close to MMIO regions.

Unfortunately, looking at the lspci outputs that are in 
https://bugzilla.novell.com/show_bug.cgi?id=425480 it seems to me that the 
MMIO regions are quite far away from each other.
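One way to measure "quite far away" is to compare the ranges the kernel exports for each device in sysfs. The sketch below is minimal. It assumes the `start end flags` hex layout of `/sys/bus/pci/devices/<dev>/resource`, with one line per resource and unused resources shown as all zeros. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* One line of /sys/bus/pci/devices/<dev>/resource: start end flags (hex). */
struct bar_range {
    unsigned long long start, end, flags;
};

/* Parse one line; rejects malformed lines and unused (all-zero) entries. */
static bool parse_resource_line(const char *line, struct bar_range *r)
{
    if (sscanf(line, "%llx %llx %llx", &r->start, &r->end, &r->flags) != 3)
        return false;
    return r->end != 0 && r->end >= r->start;
}

/*
 * Bytes between two ranges (inclusive ends, as in the sysfs file):
 * 0 means back to back, -1 means they overlap.
 */
static long long bar_gap(const struct bar_range *a, const struct bar_range *b)
{
    const struct bar_range *lo = (a->start <= b->start) ? a : b;
    const struct bar_range *hi = (lo == a) ? b : a;

    if (hi->start <= lo->end)
        return -1;
    return (long long)(hi->start - lo->end - 1);
}
```

A gap of 0 between the NIC's flash BAR and the VGA BAR would mean that an access running just past the end of one window lands in the other.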

-- 
Jiri Kosina
SUSE Labs
--

From: Dave Airlie
Date: Wednesday, September 24, 2008 - 5:22 pm

Yup, on my laptop these were far away, and I wondered what could mangle
things that badly.

Well, I'm out of the race: my attempts to re-write my EEPROM using an
EEPROM image from an equivalent laptop have totally failed and my BIOS
won't boot anymore, so my laptop is == a brick.

Dave.
--

From: Jiri Kosina
Date: Wednesday, September 24, 2008 - 6:27 pm

Uh oh. Shouldn't we put something like the patch below in Linus' tree 
until we get this sorted out? Otherwise more and more people who use -rc 
kernels will run into this and will get their hardware bricked [hopefully 
temporarily, but not all users are able to re-flash their network card 
EEPROMs, right].

I know that it is quite aggressive and is going to disable wired 
networking on a lot of systems that have been functioning properly, 
therefore RFC ...



From: Jiri Kosina <jkosina@suse.cz>
Subject: [PATCH] [RFC] E1000E: temporarily disable e1000e driver

E1000E: temporarily disable e1000e driver

There is a serious bug somewhere that renders e1000e network cards 
unusable on certain hardware configurations by overwriting the EEPROM 
with 0xff all over. Debugging this is not trivial, because:

- it is not yet even clear whether the bug is caused by userspace (new 
  version of xorg drivers, bad interaction with PAT, ...) or by some bug in 
  kernel code; it is not even certain yet which exact combination of 
  software versions and hardware configuration started triggering it
- you have only one attempt to test a potential fix. If the fix doesn't 
  work, the EEPROM of the card is hosed

and therefore fixing this has the potential to take some time.

The tool that will safely restore the previous contents of the EEPROM is 
currently being written, but even this is not trivial (Dave Airlie has 
turned his notebook into a brick while trying to restore the EEPROM 
contents).

Let's therefore mark this driver as broken (though it is very well 
possible that this particular driver is not at fault at all) until this 
gets resolved, so that users of -rc kernels don't end up with totally 
unusable network cards.

References (information about sw/hw configurations of affected systems 
might be found in the ...
From: Frans Pop
Date: Wednesday, September 24, 2008 - 7:01 pm

Something else to worry about is bisections. People seeing an unrelated 
issue with .27 after release may well be asked to do a bisection and 
could then run into the issue even if it is fixed before the release.

Guess we'll need to wait and see what the root cause is to know if that's 

Extra datapoint. As far as I've seen this problem has not yet been 
reported by any people running Debian. This could point to X.Org as 
Debian currently has 7.3 while I think the reports so far have been with 
7.4.

I have been running .27-rc kernels myself on an HP 2510p laptop running 
Debian/lenny, which does have the "bad" NIC (ICH9), but it's still working 
for me. I do have some vague resume from suspend problems, but for now 
I'm assuming those are unrelated.
I have been running the kernels both with and without PAT enabled.

Cheers,
FJP
--

From: Jiri Kosina
Date: Thursday, September 25, 2008 - 10:24 am

Yes, I think that xorg/xorg i915 driver/libdrm/GEM/whatever are the 
biggest suspect currently, according to the data that has been gathered so 
far.

Still, what confuses me a little bit -- the EEPROM of the card is set to 
all 0xff once the corruption happens. Isn't that quite a coincidence, 
that the bytes representing "nothing" in this context are used?

If it were set to 0 (it's so easy to call memset(0) on a bogus pointer, 
there are usually lots of them in the code) or to random garbage, that 
would seem much more understandable than 0xff.

-- 
Jiri Kosina
SUSE Labs

--

From: H. Peter Anvin
Date: Thursday, September 25, 2008 - 11:46 am

Typical card EEPROMs are serial - either I2C or SPI.  I believe the 
Intel cards use SPI EEPROMs, but I'm not sure.

[Disclaimer: I don't actually know SPI all that well; I know I2C better. 
  However, I'm pretty sure the following argument does apply to both.]

Consider a corruption which turns a read command into a write command -- 
often just a single bit difference.  Now, the EEPROM will expect data in 
to write, but nothing will be driving the data line, so it will 
typically be a 1.  As the host tries to read, it will therefore fill the 
EEPROM with all ones.
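The read-turned-write argument can be illustrated with a toy model. The model assumes the common 25xx-style SPI EEPROM command set, where READ = 0x03 and WRITE = 0x02 differ only in bit 0. Whether the parts on these NICs use exactly these opcodes is an assumption. Real parts also require a WREN command first and limit each write to one page; both are left out here. Following the argument above, the undriven data line is modeled as pulled high.

```c
#include <stdint.h>

#define EE_SIZE  64    /* toy capacity in bytes */
#define OP_WRITE 0x02  /* 25xx-style SPI opcodes (assumed) */
#define OP_READ  0x03  /* differs from OP_WRITE only in bit 0 */

/* Toy EEPROM: no WREN latch and no page-size limit (simplifications). */
struct toy_eeprom {
    uint8_t cells[EE_SIZE];
};

/*
 * The host sends `opcode` and then clocks `len` bytes starting at `addr`,
 * always believing it is reading. On READ the part returns cell contents.
 * On WRITE the part latches its data-in line, which is modeled as undriven
 * and pulled up, so every latched byte is 0xff. The part's data-out is
 * undriven too, so the host also sees 0xff.
 */
static void toy_transfer(struct toy_eeprom *ee, uint8_t opcode,
                         unsigned addr, uint8_t *out, unsigned len)
{
    const uint8_t floating = 0xff;  /* undriven, pulled-up line */

    for (unsigned i = 0; i < len; i++) {
        unsigned a = (addr + i) % EE_SIZE;

        if (opcode == OP_READ) {
            out[i] = ee->cells[a];
        } else if (opcode == OP_WRITE) {
            ee->cells[a] = floating;
            out[i] = floating;
        }
    }
}
```

A single flipped opcode bit is enough. The host's "read" of the whole part returns all 0xff, and it also leaves 0xff behind in every cell it clocked through.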

	-hpa
--

From: Jesse Barnes
Date: Thursday, September 25, 2008 - 11:56 am

We have confirmation that this isn't GEM related; according to the Novell bug 
at https://bugzilla.novell.com/show_bug.cgi?id=425480 people have hit the 
problem with kernels w/o GEM.

That doesn't rule out i915 (though I don't think any changes have gone in 
since 2.6.26 that would have caused this) or xf86-video-intel.  It's possible 
that X is getting confused about BAR mappings somehow, resulting in a 
clobbered e1000e NVRAM, but why would the kernel version matter in that case?  
The only thing that comes to mind would be PAT...

Recent versions of the X drivers (using recent libpciaccess code) will try to 
map the resourceN_wc file in sysfs.  It's possible that the map size we end 
up using is wrong, leading to the situation Dave described earlier where we 

Presumably one has to write all ones to the EEPROM BAR of the e1000 device to 
see that pattern?  Or is there some way of configuring the EEPROM such that 
it'll fail to respond to read cycles resulting in all ones for every read 
back (i.e. target abort)?
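sysfs sizes each resourceN file (and its _wc variant) to the length of the BAR. One defensive pattern on the userspace side is therefore to take the mapping length from the file itself rather than computing it separately. The sketch below is not libpciaccess code, and the helper name is hypothetical. On real hardware it would need root and a path such as `.../resource0_wc`. The test exercises it on an ordinary file.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a whole resource file, using the file's own size as the mapping
 * length so the mapping cannot extend past the BAR. Returns MAP_FAILED
 * on error; on success stores the mapped length in *len_out.
 */
static void *map_resource_file(const char *path, size_t *len_out)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return MAP_FAILED;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return MAP_FAILED;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, 0);
    close(fd);  /* an established mapping survives close() */
    if (p != MAP_FAILED)
        *len_out = (size_t)st.st_size;
    return p;
}
```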

Jesse
--

From: Jiri Kosina
Date: Thursday, September 25, 2008 - 1:22 pm

But the xorg intel driver shipped with xorg 7.4 already has support for 
GEM, right? So there could still be some bug in the GEM-aware driver 

Yes, booting with 'nopat' is on my list to try immediately after we are 

This we could catch easily even with strace, right?

-- 
Jiri Kosina
SUSE Labs
--

From: Jesse Barnes
Date: Thursday, September 25, 2008 - 12:36 pm

X.Org 7.4 came with xf86-video-intel 2.4.2, right?  That doesn't have any GEM 
bits in it either.

However, the "Factory" log at #425480 *does* indicate that a GEM aware 2D 
driver was loaded (the "[drm:i915_getparam] *ERROR* Unknown parameter 5" 
message indicates as much), but the kernel was definitely not GEM aware 
otherwise the call would have succeeded.  So that rules out GEM proper, but 
it could still be a bug in one of the non-GEM paths in the experimental 

Yep, that one's easy to catch.

-- 
Jesse Barnes, Intel Open Source Technology Center
--

From: Jiri Kosina
Date: Thursday, September 25, 2008 - 1:35 pm

That was exactly the point I was trying to make, that these error paths 
will probably also need auditing, once we rule out the possibility of 
NVRAM being overwritten from kernelspace.

Thanks,

-- 
Jiri Kosina
SUSE Labs
--

From: Dave Airlie
Date: Thursday, September 25, 2008 - 2:06 pm

Well, the non-GEM paths are really the old codepaths we used in the
older drivers.

So unless we do something really dumb...

I'd target three areas: PAT, pciaccess, and e1000e itself.

Dave.
--

From: Jesse Brandeburg
Date: Thursday, September 25, 2008 - 2:42 pm

Ubuntu has CONFIG_X86_PAT disabled for at least the i386 arch; maybe that
is relevant.
--

From: Dave Airlie
Date: Thursday, September 25, 2008 - 2:45 pm

On Fri, Sep 26, 2008 at 7:42 AM, Jesse Brandeburg

That rules out PAT, I suppose; they have seen the issue as well.

Dave.
--

From: Jiri Kosina
Date: Thursday, September 25, 2008 - 3:45 pm

I wasn't able to rule out PAT based on the SUSE bug reports, as we have PAT 
enabled for both 32-bit and 64-bit x86.

If Ubuntu has it disabled for 64-bit x86 as well (where do these guys keep 
their .config files to check?), I think we can definitely rule out PAT, as 
there has been at least one report from an Ubuntu user on this very issue.

-- 
Jiri Kosina
SUSE Labs
--

From: Alexey Rempel
Date: Friday, September 26, 2008 - 12:06 am

I'm also testing Ubuntu Intrepid, on an "affected system" but not an
affected card, and I use the e1000e driver (i945g + ICH7).
As far as I can see, Ubuntu, at least for now, does not use PAT on 32-bit systems.

Config is attached.
From: H. Peter Anvin
Date: Thursday, September 25, 2008 - 3:57 pm

Okay, I just had a scary and hopefully stupid thought.

Intel in particular often has backchannels between the chipset and the 
Ethernet controller for management functions -- anything from WoL to 
IPMI -- generally over some kind of low-speed serial bus.

We're not in a situation where the EEPROM can be touched from the 
chipset via the SMBus or some other non-CPU channel?

	-hpa
--

From: Krzysztof Halasa
Date: Friday, September 26, 2008 - 11:55 am

I know next to nothing about SMBus and especially those other
backchannels, but the 82566 product brief :-) lists support for:
- Intel Active Management Technology (AMT) with "System Defence"
  (whatever that means)
- ASF 2.0
I think ASF (Alert Standard Format) is somehow related to IPMI and
uses I^2C or something similar (SMBus).

The 8254x manual says that the EEPROM is divided into 4 parts: one for
E1000 hw initialization, one for ASF (Ethernet in ASF mode?), one for
external BMC (TCO) (loaded by external BMC from the SMBus) and one for
software only (not used by hardware). Some chips only support #1 (and
#4 of course).

I understand the driver reads the EEPROM using the EERD register (which,
according to the manual, requires no additional locking) or drives the
EEPROM directly, with a lock/unlock protocol (using the EECD register).
Now some devices lack the lock/unlock bits, but they lack ASF/BMC as
well.

I imagine chips other than 8254x may be different here.
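The two access paths can be sketched against a fake register block. The bit positions are an assumption, taken from how the e1000 driver lays out EERD and EECD for 8254x parts. e1000e parts and some later 8254x variants place these fields differently. `struct fake_nic` and the `fake_*` functions are hypothetical stand-ins for the hardware. The EERD path is a single start/poll-for-done exchange. The EECD path must obtain a grant before touching the EEPROM and must back off if another agent, such as ASF/BMC firmware, holds it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout assumed from the e1000 driver's 8254x definitions. */
#define EERD_START      0x00000001u
#define EERD_DONE       0x00000010u
#define EERD_ADDR_SHIFT 8
#define EERD_DATA_SHIFT 16
#define EECD_REQ        0x00000040u
#define EECD_GNT        0x00000080u

/* Hypothetical stand-in for BAR0 registers plus the EEPROM behind them. */
struct fake_nic {
    uint32_t eerd;
    uint32_t eecd;
    bool other_agent_owns;   /* e.g. management firmware holding the EEPROM */
    uint16_t eeprom[64];
};

/* Hardware model: a pending EERD read completes on the next register read. */
static uint32_t fake_read_eerd(struct fake_nic *nic)
{
    if ((nic->eerd & EERD_START) && !(nic->eerd & EERD_DONE)) {
        unsigned word = (nic->eerd >> EERD_ADDR_SHIFT) & 0xffu;
        nic->eerd = (nic->eerd & 0xffffu) | EERD_DONE |
                    ((uint32_t)nic->eeprom[word % 64] << EERD_DATA_SHIFT);
    }
    return nic->eerd;
}

/* Hardware model: GNT is given only when nobody else owns the EEPROM. */
static uint32_t fake_read_eecd(struct fake_nic *nic)
{
    if ((nic->eecd & EECD_REQ) && !nic->other_agent_owns)
        nic->eecd |= EECD_GNT;
    return nic->eecd;
}

/* Path 1: one-shot word read through EERD; no ownership handshake. */
static bool eerd_read_word(struct fake_nic *nic, unsigned word, uint16_t *val)
{
    nic->eerd = ((uint32_t)word << EERD_ADDR_SHIFT) | EERD_START;
    for (int tries = 0; tries < 1000; tries++) {
        uint32_t r = fake_read_eerd(nic);
        if (r & EERD_DONE) {
            *val = (uint16_t)(r >> EERD_DATA_SHIFT);
            return true;
        }
    }
    return false;
}

/* Path 2: request/grant before bit-banging the EEPROM through EECD. */
static bool eecd_acquire(struct fake_nic *nic)
{
    nic->eecd |= EECD_REQ;
    for (int tries = 0; tries < 1000; tries++)
        if (fake_read_eecd(nic) & EECD_GNT)
            return true;
    nic->eecd &= ~EECD_REQ;   /* give up; never clock the EEPROM without GNT */
    return false;
}

static void eecd_release(struct fake_nic *nic)
{
    nic->eecd &= ~(EECD_REQ | EECD_GNT);   /* model drops GNT together with REQ */
}
```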

Do we have some "master" bugzilla entry or something like that for
these problems?
-- 
Krzysztof Halasa
--

From: Alan Cox
Date: Friday, September 26, 2008 - 12:39 pm

We have had historic problems where a very non-standard EEPROM setup on
some ancient ThinkPads ended up with bad stuff happening due to SMBus
probing and the like. You would then, however, expect to see the bug occur
without loading the e1000* drivers (unless you needed an interaction
between the two).

--

From: Krzysztof Halasa
Date: Thursday, September 25, 2008 - 12:23 pm

Perhaps the entire chip has been erased with the "ERAL" (erase all)
command, which requires a previously issued EWEN (erase/write enable).
Each command seems to require several writes to the EEPROM control
register.
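The ERAL theory can be sketched with a toy Microwire (93Cxx-style) serial EEPROM, one of the EEPROM types the 8254x EECD bit-bang interface supports. The command encoding is assumed from common 93C46-style x16 parts: a start bit, a 2-bit opcode, and 6 address bits, with opcode 00 qualified by the top two address bits. The register-write count assumes one EECD write per DI change and per SK edge. The ICH9-attached parts in these reports may keep their NVM elsewhere entirely. Erased cells read back as all ones, which would produce an all-0xff image.

```c
#include <stdbool.h>
#include <stdint.h>

#define MW_WORDS 64   /* 93C46-style x16 part: 64 words, 6 address bits (assumed) */

struct mw_eeprom {
    uint16_t mem[MW_WORDS];
    bool write_enabled;   /* EWEN/EWDS latch, off after power-up */
};

/* Build a 9-bit command: start bit, 2-bit opcode, 6-bit address. */
static unsigned mw_cmd(unsigned opcode, unsigned addr)
{
    return (1u << 8) | ((opcode & 3u) << 6) | (addr & 0x3fu);
}

/* Opcode 00 commands, selected by the top two address bits. */
#define CMD_EWDS mw_cmd(0, 0x00)
#define CMD_ERAL mw_cmd(0, 0x20)
#define CMD_EWEN mw_cmd(0, 0x30)

/* Execute one command; ERAL sets every word to the erased state (all ones). */
static void mw_execute(struct mw_eeprom *ee, unsigned cmd)
{
    unsigned opcode = (cmd >> 6) & 3u;
    unsigned addr = cmd & 0x3fu;

    if (!(cmd & (1u << 8)) || opcode != 0)
        return;   /* no start bit, or READ/WRITE/ERASE: not modeled here */
    switch (addr >> 4) {
    case 3: ee->write_enabled = true; break;    /* EWEN */
    case 0: ee->write_enabled = false; break;   /* EWDS */
    case 2:                                     /* ERAL */
        if (ee->write_enabled)
            for (unsigned i = 0; i < MW_WORDS; i++)
                ee->mem[i] = 0xffff;
        break;
    default: break;                             /* WRAL: not modeled */
    }
}

/* Rough EECD write count: CS up, (DI, SK high, SK low) per bit, CS down. */
static unsigned mw_register_writes(unsigned cmd_bits)
{
    return 1 + 3 * cmd_bits + 1;
}
```

In this model an ERAL without a preceding EWEN does nothing. After EWEN it wipes the whole part, and each 9-bit command costs about 29 register writes. That matches the observation that a stray single write is unlikely to do this on its own.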
-- 
Krzysztof Halasa
--

From: David Miller
Date: Thursday, September 25, 2008 - 1:06 pm

From: Jiri Kosina <jkosina@suse.cz>

Setting framebuffer bytes to 0xff is pretty common, for example
for color keys and anti-aliasing pixel values.
--

From: Jeff Garzik
Date: Wednesday, September 24, 2008 - 7:28 pm

That seems a bit drastic, particularly when the debugging was beginning 
to point to another culprit.

We have an equally strong case at this point for disabling r8169 and i915_drm, no?

	Jeff




--

From: Dave Airlie
Date: Wednesday, September 24, 2008 - 8:51 pm

No, we are actually more likely to be unable to do anything from the kernel
if it's happening from userspace.

First, we need a reflash utility that is safe; otherwise people who
have the issue can't reproduce it, and people who don't have the issue
don't want to play with it.

I think e1000e may enable a BAR or something that lets the issue
break this hardware; I haven't seen it broken on any
machine where e1000e wasn't loaded yet. Again, the r8169 might be the
same issue, but it may be because the BAR was enabled.

Dave.
--

From: David Miller
Date: Wednesday, September 24, 2008 - 9:00 pm

From: "Dave Airlie" <airlied@gmail.com>

All PCI device drivers in the kernel first do pci_enable_device()
which essentially enables all BARs.

The flash lives in BAR 1 of the E1000E, for example.
--

From: Jesse Brandeburg
Date: Wednesday, September 24, 2008 - 9:25 pm

On my ICH9-based system the e1000e BAR1 regions are back to back with 
both the VGA memory map and the audio mem, either of which could be the 
mangler, but more likely the VGA device (say, X maybe) since it is mapped 

I'm really sorry to hear that, I wonder if the laptop has an "emergency 
bios update" mode like many PCs used to through a jumper.  Dave A., let 
us know if you make any recovery progress.

I plan to try some random writes tomorrow to my BAR1 space and see if my 
flash gets erased.

Jesse
--

From: Krzysztof Halasa
Date: Thursday, September 25, 2008 - 9:26 am

I guess it's more about the E1000's serial configuration EEPROM; the
registers seem to live in BAR0 (EECD, and for reading perhaps EERD).
Corrupted EEPROM (and thus PCI config registers) can easily result in
a dead machine.

I will be writing a tool for writing 82541PI EEPROMs on a custom
board soon (unless there is one available, for Linux, of course),

I'm not sure it's the flash that is corrupted. Anyway, booting the
laptop should be quite easy (physically disabling the EEPROM on boot
should do the trick), though it would require taking the machine
apart.
-- 
Krzysztof Halasa
--

From: Jesse Barnes
Date: Wednesday, September 24, 2008 - 5:26 pm

Moreover, we don't actually do any writing (that I know of) of the ROM image 
from the X drivers or the kernel.  In fact, in many cases X should be 
accessing the RAM copy of the ROM at 0xc0000 rather than via the ROM BAR.

That said, adding a check to the x86 code would be a good thing to do; I'll 
hack up a patch tomorrow unless someone beats me to it.

-- 
Jesse Barnes, Intel Open Source Technology Center
--

From: Jiri Kosina
Date: Wednesday, September 24, 2008 - 5:33 pm

The problem here is that what we desperately need first is a method to 
restore the original EEPROM contents after it gets corrupted (David Airlie 
has, sadly, apparently bricked his notebook while trying to do so). 
Without this, we can put a lot of debugging/protecting patches into the 
kernel, but we won't be able to successfully verify anything, because 
testing wouldn't be possible.

Added Jesse and Karsten to CC, as they are working on such a tool right 
now, as far as I know.

-- 
Jiri Kosina
SUSE Labs
--

From: Jesse Barnes
Date: Thursday, September 25, 2008 - 9:08 am

I should be able to test the mmap fix independently of the e1000 breakage at 
least... lemme try it out now...

-- 
Jesse Barnes, Intel Open Source Technology Center
--

From: Jesse Barnes
Date: Thursday, September 25, 2008 - 12:43 pm

Here's a patch that adds range checking to the sysfs mappings at least.  This 
patch should catch the case where X (or some other process) tries to map 
beyond the specific BAR it's (supposedly) trying to access, making things 
safer in general.  FWIW both my F9 and development versions of X start up 
fine with this patch applied.

DaveM, will this work for you on sparc?  It looked like your code was allowing 
bridge window mappings, but that behavior should be preserved as long as your 
bridge devices reflect their window sizes correctly in their pdev->resources?

If we add similar code to the procfs stuff we wouldn't need to do any checking 
in the arches.

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 9c71858..f4e8b4e 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -502,6 +502,8 @@ pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
 	struct resource *res = (struct resource *)attr->private;
 	enum pci_mmap_state mmap_type;
 	resource_size_t start, end;
+	unsigned long map_len = vma->vm_end - vma->vm_start;
+	unsigned long map_offset = vma->vm_pgoff << PAGE_SHIFT;
 	int i;
 
 	for (i = 0; i < PCI_ROM_RESOURCE; i++)
@@ -510,6 +512,13 @@ pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
 	if (i >= PCI_ROM_RESOURCE)
 		return -ENODEV;
 
+	/*
+	 * Make sure the range the user is trying to map falls within
+	 * the resource
+	 */
+	if (map_offset + map_len > pci_resource_len(pdev, i))
+		return -EINVAL;
+
 	/* pci_mmap_page_range() expects the same kind of entry as coming
 	 * from /proc/bus/pci/ which is a "user visible" value. If this is
 	 * different from the resource itself, arch will do necessary fixup.
--

From: Jiri Kosina
Date: Thursday, September 25, 2008 - 1:45 pm

Good. We will use this on affected machines after we start some real 

At least for debugging purposes, I'd propose putting a printk() there with 
the process name and the range it tries to map.
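The range check in the patch, together with the suggested diagnostic, can be modeled in userspace to see which requests would be rejected. This is a sketch, not the kernel code. It reorders the comparison so that offset plus length cannot wrap around. Where the kernel would printk() `current->comm`, it formats into a caller-supplied buffer. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Does [offset, offset + len) fit inside a BAR of bar_len bytes?
 * Equivalent to "offset + len <= bar_len" but immune to wrap-around.
 */
static bool mmap_range_ok(unsigned long offset, unsigned long len,
                          unsigned long bar_len)
{
    return len <= bar_len && offset <= bar_len - len;
}

/*
 * Diagnostic with the process name and the requested range. In the
 * kernel the name would come from current->comm; here it is passed in.
 * Returns snprintf()'s result.
 */
static int format_reject_msg(char *buf, size_t size, const char *comm,
                             int bar, unsigned long offset,
                             unsigned long len, unsigned long bar_len)
{
    return snprintf(buf, size,
                    "%s: mmap of BAR%d offset 0x%lx len 0x%lx "
                    "exceeds BAR length 0x%lx\n",
                    comm, bar, offset, len, bar_len);
}
```

For an offset near the top of the address space, the naive sum can wrap to a small value and pass the patch's comparison. The reordered test still rejects it.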

Thanks,

-- 
Jiri Kosina
SUSE Labs
--

From: Jiri Kosina
Date: Thursday, September 25, 2008 - 5:24 am

Actually, there is no way of not shipping GEM when shipping xorg 7.4, is 
there?

So GEM could definitely be a potential cause here, I think.

-- 
Jiri Kosina
SUSE Labs
--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11407
Subject		: suspend: unable to handle kernel paging request
Submitter	: Vegard Nossum <vegard.nossum@gmail.com>
Date		: 2008-08-21 17:28 (32 days old)
References	: http://marc.info/?l=linux-kernel&m=121933974928881&w=4
Handled-By	: Rafael J. Wysocki <rjw@sisk.pl>
		  Pekka Enberg <penberg@cs.helsinki.fi>
		  Pavel Machek <pavel@suse.cz>


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11439
Subject		: [2.6.27-rc4-git4] compilation warnings
Submitter	: Rufus & Azrael <rufus-azrael@numericable.fr>
Date		: 2008-08-26 9:37 (27 days old)
References	: http://marc.info/?l=linux-kernel&m=121974353815440&w=4
Handled-By	: Greg KH <gregkh@suse.de>
Patch		: http://marc.info/?l=linux-kernel&m=121976424221858&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11442
Subject		: btusb hibernation/suspend breakage in current -git
Submitter	: Rafael J. Wysocki <rjw@sisk.pl>
Date		: 2008-08-25 11:37 (28 days old)
References	: http://marc.info/?l=linux-bluetooth&m=121966402012074&w=4
Handled-By	: Oliver Neukum <oliver@neukum.org>
Patch		: http://marc.info/?l=linux-bluetooth&m=121967226027323&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11459
Subject		: kernel crash after wifi connection established
Submitter	: Alexey Kuznetsov <ak@axet.ru>
Date		: 2008-08-30 03:08 (23 days old)


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11501
Subject		: Failed to open destination file: Permission deniedihex2fw
Submitter	: Andrew Morton <akpm@linux-foundation.org>
Date		: 2008-09-04 18:34 (18 days old)
References	: http://marc.info/?l=linux-kernel&m=122055342419068&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11505
Subject		: oltp ~10% regression with 2.6.27-rc5 on stoakley machine
Submitter	: Lin Ming <ming.m.lin@intel.com>
Date		: 2008-09-04 7:06 (18 days old)
References	: http://marc.info/?l=linux-kernel&m=122051202202373&w=4
		  http://marc.info/?t=122089704700005&r=1&w=4
Handled-By	: Peter Zijlstra <a.p.zijlstra@chello.nl>
		  Gregory Haskins <ghaskins@novell.com>
		  Ingo Molnar <mingo@elte.hu>
Patch		: http://marc.info/?l=linux-kernel&m=122194673932703&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11512
Subject		: sort-of regression due to "kconfig: speed up all*config + randconfig"
Submitter	: Alexey Dobriyan <adobriyan@gmail.com>
Date		: 2008-09-05 22:50 (17 days old)
References	: http://marc.info/?l=linux-kernel&m=122065498013858&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11507
Subject		: usb: sometimes dead keyboard after boot
Submitter	: Frans Pop <elendil@planet.nl>
Date		: 2008-08-26 21:03 (27 days old)
References	: http://marc.info/?l=linux-kernel&m=121977815018224&w=2
Handled-By	: Alan Stern <stern@rowland.harvard.edu>
Patch		: http://www.spinics.net/lists/linux-usb/msg09735.html


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11476
Subject		: failure to associate after resume from suspend to ram
Submitter	: Michael S. Tsirkin <m.s.tsirkin@gmail.com>
Date		: 2008-09-01 13:33 (21 days old)
References	: http://marc.info/?l=linux-kernel&m=122028529415108&w=4
Handled-By	: Zhu Yi <yi.zhu@intel.com>
		  Dan Williams <dcbw@redhat.com>
		  Jouni Malinen <j@w1.fi>


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11465
Subject		: Linux-2.6.27-rc5, drm errors in log
Submitter	: Gene Heskett <gene.heskett@verizon.net>
Date		: 2008-08-30 18:52 (23 days old)
References	: http://marc.info/?l=linux-kernel&m=122012238925775&w=4
Handled-By	: Dave Airlie <airlied@gmail.com>


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11506
Subject		: oops during unmount - ext3? (2.6.27-rc5)
Submitter	: Marcin Slusarz <marcin.slusarz@gmail.com>
Date		: 2008-09-04 19:14 (18 days old)
References	: http://marc.info/?l=linux-kernel&m=122055573123449&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11516
Subject		: severe performance degradation on x86_64 going from 2.6.26-rc9 -> 2.6.27-rc5
Submitter	: Jason Vas Dias <jason.vas.dias@gmail.com>
Date		: 2008-09-07 13:59 (15 days old)


--

From: Jason Vas Dias
Date: Tuesday, September 23, 2008 - 2:49 am

Hi -
Yes, this bug is still a problem with both the latest 2.6.27-rc6 kernel (from Linus' tree 2008-09-21)
and the latest Fedora 10 kernel.

CPU frequency switching is completely disabled both when powernow-k8 (the correct cpufreq module for my
x86_64 AMD TL-64x2 2.2GHz CPU) is installed as a module and when it is built in, and the CPU frequency remains
at its lowest setting; attempts to modify /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
and /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed are not honored, even though
/sys/devices/system/cpu/cpu0/cpufreq/governor is "userspace"
and scaling_min_freq < scaling_setspeed < scaling_max_freq.

I see no messages from powernow-k8 indicating that it is aware it was unable to set the speed, though
I do see a message if I attempt to set an invalid speed (e.g. 600000).

With 2.6.26-rc9, I get a default CPU clock frequency of 2200000; with 2.6.27-rc6, it becomes 800000 and
is not switchable. For some reason, powernow-k8 does not autoload with udev, but I don't really need it if
the speed is already set to its highest level.

On 2.6.27-rc6, after it manages to boot, any low-latency drivers time out (e.g. USB, terminal, keyboard, network)
and the machine does not get through the boot-up sequence without becoming overloaded by the kernel's debugging
log messages; neither the network, the terminal, nor the keyboard is usable.

Building a kernel with USB completely disabled and turning off debug log messages allows the machine to boot
(after about 15 minutes), but the speed is still at its lowest setting and cannot be changed.

Also, 2.6.27-rc6 is unable to reboot the machine: it can put the machine into the "HALT" state, with nothing displayed
on the screen, but the machine does not power off until it is manually reset with the power button. Then, after the machine
has powered down, it cannot be powered up until the power-on button is depressed for at least two seconds and released ...
From: Thomas Gleixner
Date: Saturday, September 27, 2008 - 2:23 am

I have to admit that I'm confused. 

The dmesg output
[    0.000000] Linux version 2.6.27-rc6.jvd ...
....
[   26.204477] hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0000 

says 26 seconds up to the point where user space should start.  Also
USB is active in that log and I don't see timeout messages at all.

I have a hard time connecting this to your problem description
(timeouts, USB off, 15 minutes).

Len, any opinion on this: 

[    0.000000] ACPI Error (tbfadt-0453): 32/64X address mismatch in "Pm2ControlBlock": 
               [00008800] [0000000000008100], using 64X [20080609] 

Thanks,

	tglx
--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11543
Subject		: kernel panic: softlockup in tick_periodic() ???
Submitter	: Joshua Hoblitt <j_kernel@hoblitt.com>
Date		: 2008-09-11 16:46 (11 days old)
References	: http://marc.info/?l=linux-kernel&m=122117786124326&w=4
Handled-By	: Thomas Gleixner <tglx@linutronix.de>
		  Cyrill Gorcunov <gorcunov@gmail.com>
		  Ingo Molnar <mingo@elte.hu>


--

From: Cyrill Gorcunov
Date: Sunday, September 21, 2008 - 11:01 pm

[Rafael J. Wysocki - Sun, Sep 21, 2008 at 08:54:19PM +0200]

There are really multiple issues touched in report. nmi_watchdog
hangs, rtc device creation, NULL deref...

I've asked Joshua for more information. Since he has to use the
netdev tree for a while, maybe we could wait until the next merge
window closes and then check whether nmi_watchdog works.
So this is work in progress.

		- Cyrill -
--

From: Thomas Gleixner
Date: Tuesday, September 23, 2008 - 3:50 am

The softlockup issue itself is fixed, but there are issues with
nmi_watchdog. I think we should remove the regression and keep the bug
alive to chase the other issues.

Thanks,

	tglx

--

From: Rafael J. Wysocki
Date: Tuesday, September 23, 2008 - 6:52 am

Well, for the sake of documentation I'd prefer to close this bug and create a
new non-regression one for the other issues if that's not a problem.

Thanks,
Rafael
--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11549
Subject		: 2.6.27-rc5 acpi: EC Storm error message on bootup
Submitter	: <jmerkey@wolfmountaingroup.com>
Date		: 2008-09-02 21:27 (20 days old)
References	: http://marc.info/?l=linux-kernel&m=122039255517586&w=4
Handled-By	: Alexey Starikovskiy <astarikovskiy@suse.de>
Patch		: http://marc.info/?l=linux-kernel&m=122098180019264&w=4


--

From: jmerkey
Date: Sunday, September 21, 2008 - 2:07 pm

This bug is corrected by Alexey's patch and has passed all regression tests.

Jeff

--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11548
Subject		: kernel BUG at drivers/pci/intel-iommu.c:1373!
Submitter	: Chris Mason <chris.mason@oracle.com>
Date		: 2008-09-08 14:26 (14 days old)
References	: http://marc.info/?l=linux-kernel&m=122088566310440&w=4


--

From: Chris Mason
Date: Tuesday, September 23, 2008 - 6:18 pm

I'm unable to reproduce this on 2.6.27-rc7.  I don't think it has been
fixed, but I'm having a hard time finding a reliable way to trigger it
on newer kernels.

-chris


--

From: Rafael J. Wysocki
Date: Wednesday, September 24, 2008 - 11:23 am

Thanks for the update.

For now, I'll close it as 'unreproducible'.  Please reopen if it happens again.

Thanks,
Rafael
--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11550
Subject		: pnp: Huge number of "io resource overlap" messages
Submitter	: Frans Pop <elendil@planet.nl>
Date		: 2008-09-09 10:50 (13 days old)
References	: http://marc.info/?l=linux-kernel&m=122095745403793&w=4
Handled-By	: Rene Herman <rene.herman@keyaccess.nl>
		  Bjorn Helgaas <bjorn.helgaas@hp.com>
Patch		: http://marc.info/?l=linux-kernel&m=122098498125536&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11551
Subject		: Semi-repeatable hard lockup on 2.6.27-rc6
Submitter	: Steven Noonan <steven@uplinklabs.net>
Date		: 2008-09-10 18:07 (12 days old)
References	: http://marc.info/?l=linux-kernel&m=122107007407994&w=4


--

From: Steven Noonan
Date: Sunday, September 21, 2008 - 1:39 pm

The machine with these symptoms was sent in for service on Friday. I
suspect there may have been dodgy hardware involved on this one. I
think this bug should be closed for the time being. Once I get the
machine back, I'll reopen the bug if I can still reproduce it.

- Steven
--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11552
Subject		: Disabling IRQ #23
Submitter	: Justin Mattock <justinmattock@gmail.com>
Date		: 2008-09-09 19:08 (13 days old)
References	: http://marc.info/?l=linux-kernel&m=122098735230906&w=4
		  http://marc.info/?l=linux-kernel&m=122107367715361&w=4
Handled-By	: David Brownell <david-b@pacbell.net>
		  Alan Stern <stern@rowland.harvard.edu>
Patch		: http://marc.info/?l=linux-kernel&m=122187222705195&w=4


--

From: Justin Mattock
Date: Sunday, September 21, 2008 - 4:16 pm

Not sure if it should be.
On my end, I did a bad install
of isight-firmware-tools, causing hal and udev
to clash.  After making sure the package was using either
hal or udev, the "Disabling IRQ #23" message no longer appears.
If it's not too much trouble, is there a way to verify that this was
the case? E.g. if udev creates a device node and then hal creates the same
device, will this cause ehci_hcd to print messages of this kind? If so,
then that's what happened; if not, then something else is causing this.

-- 
Justin P. Mattock
--

From: Alan Stern
Date: Monday, September 22, 2008 - 3:53 am

You didn't read what I wrote earlier, did you?  The "HC died" message 
should NEVER occur!  It doesn't matter what games you play with hal and 
udev -- it should NEVER occur.  Not ever.

And since the "HC died" is what causes IRQ #23 to be disabled, that 
shouldn't happen either.

Alan Stern

--

From: Justin Mattock
Date: Monday, September 22, 2008 - 9:20 am

Apologies for not fully understanding;
I'm just confused about why this occurs and what is causing it.
The only reason for playing with hal and udev
is to make this message appear; if I leave them out of the picture,
the system runs fine.
Anyway, I'm willing to try anything at this point, and again,
apologies for causing any heat.

-- 
Justin P. Mattock
--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11568
Subject		: spontaneous reboot on resume with 2.6.27
Submitter	: Andy Wettstein <ajw1980@gmail.com>
Date		: 2008-09-14 20:00 (8 days old)


--

From: Andy Wettstein
Date: Monday, September 22, 2008 - 7:13 pm

Just verified it is still a problem with 2.6.27-rc7.

--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11569
Subject		: Don't complain about disabled irqs when the system has paniced
Submitter	: Andi Kleen <andi@firstfloor.org>
Date		: 2008-09-02 13:49 (20 days old)
References	: http://marc.info/?l=linux-kernel&m=122036356127282&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11590
Subject		: Nokia 5310 Xpress usb-storage not mounting
Submitter	: David Almaroad <dalmaroad@gmail.com>
Date		: 2008-09-18 21:35 (4 days old)


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11608
Subject		: 2.6.27-rc6 BUG: unable to handle kernel paging request
Submitter	: John Daiker <daikerjohn@gmail.com>
Date		: 2008-09-16 23:00 (6 days old)
References	: http://marc.info/?l=linux-kernel&m=122160611517267&w=4


--

From: Chuck Ebbert
Date: Wednesday, September 24, 2008 - 5:46 pm

On Sun, 21 Sep 2008 20:54:23 +0200 (CEST)

As I said in the bugzilla entry:

  Oops: 000b

  Bit 3 is set -- the processor detected 1's in reserved bits of the page directory.

That can't be good...
--

From: Nick Piggin
Date: Wednesday, September 24, 2008 - 8:03 pm

[54384.988151] BUG: unable to handle kernel paging request at ffff8800601dd000
[54384.992095] IP: [<ffffffff80375457>] clear_page_c+0x7/0x10
[54384.992095] PGD 202063 PUD 8067 PMD 65d54163 PTE 80002020601dd163
[54384.992095] Oops: 000b [1] SMP DEBUG_PAGEALLOC

I initially suspected PAT (maybe via DEBUG_PAGEALLOC)... but let's see if the
3rd line here is useful.

     xRRRRRRRRRRRRRRRRRRRRRRR|40b|<--MAXPHYS     PHYS-->|...RR.actuwp
PGD:                                         001000000010000001100011

     xRRRRRRRRRRRRRRRRRRRRRRR|40b|<--MAXPHYS     PHYS-->|...RR.actuwp
PUD:                                                 1000000001100111

     xRRRRRRRRRRRRRRRRRRRRRRR|40b|<--MAXPHYS     PHYS-->|...Rs.actuwp
PMD:                                 01100101110101010100000101100011

     xRRRRRRRRRRRRRRRRRRRRRRR|40b|<--MAXPHYS     PHYS-->|...gP.actuwp
PTE: 1000000000000000001000000010000001100000000111011101000101100011
     3210987654321098765432109876543210987654321098765432109876543210

Is this a 36-bit physical address CPU? In which case you have 2 bits in
the pte that are outside "maxphys". Or if it is a 40-bit CPU, then you
have just 1 bit outside maxphys, in which case I'd say it is memory
corruption (maybe a hardware bug, maybe a scribble from elsewhere). So
I'm wrong about PAT.

Interestingly, the PMD also has a 1 set in a reserved bit (page global),
but according to the Intel docs, the CPU doesn't check that bit, so it
is not faulting there.

Does the machine survive memtest? Is the bug reproducible? If the
answer is no to either of these, I think we can take it off the
regression list. Otherwise, is it possible to track down to a specific
commit?

Thanks,
Nick

--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11610
Subject		: Problem with kernel commit 664d080c41463570b95717b5ad86e79dc1be0877
Submitter	: Michal 'vorner' Vaner <vorner@ucw.cz>
Date		: 2008-09-21 17:35 (1 days old)
References	: http://marc.info/?l=linux-acpi&m=122201853409501&w=4


--

From: Michal 'vorner' Vaner
Date: Sunday, September 21, 2008 - 4:10 pm

Hello


Yes, it still does this with the newest kernel
(9824b8f11373b0df806c135a342da9319ef1d893). At least for me.

With regards

-- 
Please enter password:

Michal 'vorner' Vaner
--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11609
Subject		: oops in find_get_page
Submitter	: Marcin Slusarz <marcin.slusarz@gmail.com>
Date		: 2008-09-20 14:53 (2 days old)
References	: http://marc.info/?l=linux-kernel&m=122192251101892&w=4


--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 11:54 am

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11611
Subject		: Commit 2344abbcbdb82140050e8be29d3d55e4f6fe860b breaks resume on nx6325
Submitter	: Rafael J. Wysocki <rjw@sisk.pl>
Date		: 2008-09-20 23:24 (2 days old)
References	: http://marc.info/?l=linux-kernel&m=122195277606974&w=4


--

From: Alexey Starikovskiy
Date: Sunday, September 21, 2008 - 2:57 pm

Hi Rafael,
The correct patch is the one attached to the bugzilla entry,
not the one you mention.

Regards,
Alex.

--
