Repost: NMI error and Intel S5000PSL Motherboards

To: <unlisted-recipients@...>, <@...>
Cc: <rdunlap@...>, Jim Paris <jim@...>, <linas@...>, Alan Cox <alan@...>, linux-kernel <linux-kernel@...>
Date: Monday, October 1, 2007 - 12:09 am

This is a slightly edited repost of a note sent on Friday September 28, 
as we haven't heard back from anyone yet. (I know it was the weekend!) 
Sorry to post again but this issue caused great problems for us and I 
want to be sure we're choosing a decent solution.

Perhaps one of the people who so helpfully commented on this issue 
earlier last week can now give their opinion on what should be 
concluded from our discovery that "CONFIG_PCIEAER=y" -- introduced in 
the 2.6.19 kernel and set as the default -- leads to NMI errors on the 
Intel S5000PSL motherboard.

I'm told Intel people were closely involved in the development of this 
PCIEAER feature -- so it seems even weirder that it causes problems for 
this Intel motherboard. But we have confirmed the problem with multiple 
Linux distributions.

We are hoping to get some insights into the real cause. Please see below 
where I outlined what seem to be the 3 possibilities.

Although running "scanpci" provoked the NMI errors 100 percent on 
demand, the NMI errors would also occur randomly every few weeks on a 
given system without doing anything special. I don't want anybody to 
think we are just trying to prevent a problem from occurring because we 
like running "scanpci".  "Scanpci" just turned out to be a reliable way 
to reproduce an otherwise random problem.


So, looking for some closure here, what do you think is the "root 
cause"? Is it:

1)  a defect with Intel's S5000PSL motherboards that is not seen when 
running 2.6.18 and earlier kernels but that is exposed by this feature 
added in 2.6.19? In which case, shouldn't we work to get Intel to 
investigate?

2)  a problem with the PCIEAER feature? And maybe "CONFIG_PCIEAER=y"  
should NOT be the default setting?

3)  just a bad interaction between a good motherboard and a good Linux 
feature that don't play well together? (in which case isn't this a 
"feature" that anybody compiling a kernel to run on the Intel S5000PSL 
motherboard should know not to enable?)

And in general, is it a bad idea to set "CONFIG_PCIEAER" to "no"? Or is 
it something that we can really live without?
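
For anyone who wants to check whether a given kernel was built with this option before trying to reproduce the problem, here is a minimal sketch in Python. It assumes the kernel config is readable at /proc/config.gz (only present when the kernel was built with CONFIG_IKCONFIG_PROC) or at /boot/config-<release> (a common distribution convention, not guaranteed). The function names are hypothetical and are not part of any kernel tooling.

```python
import gzip
import os
import platform


def option_state(config_text, option="CONFIG_PCIEAER"):
    """Return 'y', 'm', or another value if the option is set,
    'n' if it is explicitly marked "is not set", or None if the
    option does not appear in the config text at all."""
    for raw in config_text.splitlines():
        line = raw.strip()
        # Disabled options appear as a comment in kernel .config files.
        if line == "# %s is not set" % option:
            return "n"
        # Match the exact option name, not a longer one sharing the prefix.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


def read_running_config():
    """Try the usual locations for the running kernel's config.
    Both paths are assumptions about the system's setup."""
    candidates = [
        "/proc/config.gz",
        "/boot/config-%s" % platform.release(),
    ]
    for path in candidates:
        if os.path.exists(path):
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as fh:
                return fh.read()
    return None


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("No kernel config found in the assumed locations")
    else:
        print("CONFIG_PCIEAER state:", option_state(text))
```

A result of "y" only says the feature is compiled in. It does not by itself show that the feature is what triggers the NMI errors.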




Andrew
-

Messages in current thread:
NMI error and Intel S5000PSL Motherboards, AndrewL733, (Wed Sep 26, 6:12 am)
Re: NMI error and Intel S5000PSL Motherboards, Jim Paris, (Wed Sep 26, 7:48 pm)
Re: NMI error and Intel S5000PSL Motherboards, Randy Dunlap, (Wed Sep 26, 8:03 pm)
Re: NMI error and Intel S5000PSL Motherboards, AndrewL733, (Fri Sep 28, 11:13 am)
Repost: NMI error and Intel S5000PSL Motherboards, AndrewL733, (Mon Oct 1, 12:09 am)
Re: NMI error and Intel S5000PSL Motherboards, AndrewL733, (Fri Sep 28, 11:11 am)
Re: NMI error and Intel S5000PSL Motherboards, Alan Cox, (Wed Sep 26, 7:16 am)
Re: NMI error and Intel S5000PSL Motherboards, Randy Dunlap, (Wed Sep 26, 12:58 am)
Re: NMI error and Intel S5000PSL Motherboards, Randy Dunlap, (Tue Sep 25, 10:59 pm)