Re: Why do so many machines need "noapic"?

Previous thread: [PATCH] prevent kswapd from freeing excessive amounts of lowmem by Rik van Riel on Wednesday, September 5, 2007 - 4:01 pm. (7 messages)

Next thread: use of asm/prom.h by Stephen Rothwell on Wednesday, September 5, 2007 - 4:39 pm. (1 message)
From: Chuck Ebbert
Date: Wednesday, September 5, 2007 - 4:30 pm

Some systems lock up without the noapic option. I found one
that will freeze while trying to set up the timer interrupt.
Passing 'nolapic' makes it freeze just after:

   Setting up timer through ExtINT... works

Sometimes it will boot up and then freeze during the startup
scripts. Passing the noapic option fixes all that, but it
then gets 1000 spurious interrupts per second on IRQ7 (which
only shows ehci using it.) Kernel version is 2.6.22.
-

From: Andi Kleen
Date: Thursday, September 6, 2007 - 4:31 am

Always boot with apic=debug

The message means the primary timer setup methods already didn't work.
ExtInt is really a crappy fallback that was originally only
needed for some early SMP systems where the timer was not wired
according to specs.

But the real problem is that the standard timer access method
through the local APIC didn't work.

I had a rewrite of the timer probing some time ago that tried
more combinations automatically. It had some problems so it 
never went in, but perhaps it's worth revisiting.

-Andi 
-

From: Chuck Ebbert
Date: Friday, September 7, 2007 - 12:34 pm

This is the first one I've actually had in front of me:

  HP TX1000 notebook
  Nvidia C51/MCP51 mobile chipset

Booting with "noapic" gives some very strange results. This is two
snapshots of /proc/interrupts taken one second apart. It almost looks
like timer interrupts are occurring on IRQ 0 and IRQ7 on different
CPUs:

           CPU0       CPU1       
  0:     446096       6224    XT-PIC-XT        timer
  1:        342          6    XT-PIC-XT        i8042
  2:          0          0    XT-PIC-XT        cascade
  5:       3099        865    XT-PIC-XT        sata_nv
  7:       8145     494718    XT-PIC-XT        ehci_hcd:usb2
  8:          0          0    XT-PIC-XT        rtc0
  9:        323          9    XT-PIC-XT        acpi
 10:        136         36    XT-PIC-XT        HDA Intel
 11:      43884       1091    XT-PIC-XT        ohci_hcd:usb1, eth0
 12:        104         19    XT-PIC-XT        i8042
 14:       1011         25    XT-PIC-XT        libata
 15:          0          0    XT-PIC-XT        libata
NMI:          0          0 
LOC:       6212     445951 
ERR:     403241
MIS:          0

           CPU0       CPU1       
  0:     447098       6233    XT-PIC-XT        timer
  1:        343          6    XT-PIC-XT        i8042
  2:          0          0    XT-PIC-XT        cascade
  5:       3100        865    XT-PIC-XT        sata_nv
  7:       8158     495847    XT-PIC-XT        ehci_hcd:usb2
  8:          0          0    XT-PIC-XT        rtc0
  9:        323          9    XT-PIC-XT        acpi
 10:        136         36    XT-PIC-XT        HDA Intel
 11:      43988       1094    XT-PIC-XT        ohci_hcd:usb1, eth0
 12:        104         19    XT-PIC-XT        i8042
 14:       1032         26    XT-PIC-XT        libata
 15:          0          0    XT-PIC-XT        libata
NMI:          0          0 
LOC:       6221     446953 
ERR:     404383

I can't capture the messages. Even when it boots it doesn't last
long enough to get them.
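
A minimal sketch, assuming Python is available, of how the two snapshots above can be compared mechanically. The helper names `parse_interrupts` and `deltas` are made up for this illustration, and only the rows that changed most are copied in; the parsing assumes the plain "label: counts... names" layout shown in this message.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: [count per CPU, ...]}."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # header row ("CPU0 CPU1") or blank line
        values = []
        for token in rest.split():
            if not token.isdigit():
                break  # reached the controller / device name columns
            values.append(int(token))
        counts[label.strip()] = values
    return counts


def deltas(before, after):
    """Per-CPU increase for every row that changed between two snapshots."""
    changed = {}
    for label, new in after.items():
        old = before.get(label)
        if old is None or len(old) != len(new):
            continue  # row missing or shaped differently in the first snapshot
        diff = [n - o for o, n in zip(old, new)]
        if any(diff):
            changed[label] = diff
    return changed


# Rows copied from the two snapshots in this message (other rows omitted).
FIRST = """\
           CPU0       CPU1
  0:     446096       6224    XT-PIC-XT        timer
  7:       8145     494718    XT-PIC-XT        ehci_hcd:usb2
LOC:       6212     445951
ERR:     403241
"""
SECOND = """\
           CPU0       CPU1
  0:     447098       6233    XT-PIC-XT        timer
  7:       8158     495847    XT-PIC-XT        ehci_hcd:usb2
LOC:       6221     446953
ERR:     404383
"""

if __name__ == "__main__":
    result = deltas(parse_interrupts(FIRST), parse_interrupts(SECOND))
    for label, diff in result.items():
        print(f"{label:>4}: {diff}")
```

On these rows the one-second increase is 1002 on IRQ 0 for CPU0 and 1002 on LOC for CPU1, 1129 on IRQ 7 for CPU1, and 1142 on ERR, which equals the combined IRQ 7 increase (13 + 1129).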

-

From: Prakash Punnoor
Date: Friday, September 7, 2007 - 10:17 pm

Do you have a hpet? If not, have you tried using acpi_use_timer_override with
apic?

bye,
-- 
(°=                 =°)
//\ Prakash Punnoor /\\
V_/                 \_V
-

From: Chuck Ebbert
Date: Monday, September 10, 2007 - 12:12 pm

Yes, it has an hpet. And I tried every combination of options I could
think of.

But, even stranger, x86_64 works (only i386 fails.)
-

From: Andi Kleen
Date: Monday, September 10, 2007 - 12:44 pm

x86-64 has quite different time code (at least until the dyntick patches
currently in mm) 

Obvious thing would be to diff the boot messages and see if anything
jumps out (e.g. in interrupt routing).  

Or check with mm and if x86-64 is broken there too then it's likely
the new time code.

-Andi
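
The "diff the boot messages" step can be sketched roughly as follows, assuming both boot logs have been saved as text. The function names and the timestamp pattern are assumptions made for this illustration, and the two sample logs are invented placeholders rather than real output from the machine.

```python
import difflib
import re

# Boot log lines may start with a "[   12.345678] " timestamp. This pattern is
# an assumption about the log format; it is stripped so that timing noise does
# not show up as a difference.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(log_text):
    """Split a log into lines with timestamps and trailing spaces removed."""
    return [TIMESTAMP.sub("", line.rstrip()) for line in log_text.splitlines()]


def diff_boot_logs(log_a, log_b, name_a="i386", name_b="x86_64"):
    """Return unified-diff lines containing only what differs between the logs."""
    return list(difflib.unified_diff(
        normalize(log_a), normalize(log_b),
        fromfile=name_a, tofile=name_b, lineterm="", n=0))


# Invented placeholder logs, for illustration only.
SAMPLE_A = (
    "[    0.000000] placeholder line common to both boots\n"
    "[    0.100000] Setting up timer through ExtINT... works\n"
)
SAMPLE_B = (
    "[    0.000000] placeholder line common to both boots\n"
)

if __name__ == "__main__":
    for line in diff_boot_logs(SAMPLE_A, SAMPLE_B):
        print(line)
```

With zero context lines (`n=0`) only the differing messages are printed, which keeps the output short enough to scan for interrupt-routing or timer-setup lines.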
-

From: Chuck Ebbert
Date: Monday, September 10, 2007 - 4:33 pm

This is Fedora 8 and it already has the highres-timers code in x86_64.
But I was still comparing 2.6.22 on i386 to 2.6.23-rc5-git1 + highres-timers
on x86_64. 2.6.23-rc5 on i386 seems okay too, so whatever is happening it
only occurs on 2.6.22 here.
-

From: Chuck Ebbert
Date: Thursday, September 13, 2007 - 9:38 am

I reported too soon that x86_64 works. It does not work, it just takes
a bit longer before it freezes. There are message threads all over the
place discussing this problem with the HP Pavilion tx 1000, and it seems
the best workaround is to use the "nolapic" option instead of "noapic".
Using that, it is totally stable _and_ there are no spurious interrupts
that would otherwise break USB. Interrupt setup is a bit strange, though:

           CPU0       CPU1       
  0:        241          0    XT-PIC-XT        timer
  1:          1        736   IO-APIC-edge      i8042
  2:          0          0    XT-PIC-XT        cascade
  5:         14      10028   IO-APIC-edge      sata_nv
  7:          0         57   IO-APIC-edge      ehci_hcd:usb1
  8:          0          0   IO-APIC-edge      rtc0
  9:          4       2463   IO-APIC-edge      acpi
 10:          2       2795   IO-APIC-edge      HDA Intel
 11:        740     478806   IO-APIC-edge      ohci_hcd:usb2, eth0
 12:         42      19911   IO-APIC-edge      i8042
 14:          5       7958   IO-APIC-edge      libata
 15:          0          0   IO-APIC-edge      libata
NMI:          0          0 
LOC:    4617310    4617213 
ERR:          0
-

From: Thomas Gleixner
Date: Tuesday, September 25, 2007 - 2:06 am

Chuck,


can you please send me 32 and 64 bit boot logs of mainline and fedora
kernels ?

	tglx


-

From: Andrew Morton
Date: Saturday, September 15, 2007 - 12:39 am

There are 48 bugs in bugzilla which mention "noapic"

http://bugzilla.kernel.org/buglist.cgi?query_format=advanced&short_desc_type=allwo...

And there are 173,000 on the internet ;)
http://www.google.com/search?hl=en&q=linux+noapic&btnG=Google+Search

We screwed this pooch a long time ago - years.  Perhaps if some of the many
noapic users could run a bisection search to work out when it broke we
could start fixing things.  But they all have a workaround so there's no
motivation.
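
For readers unfamiliar with the bisection search mentioned above, here is a minimal sketch of the idea. The function name `first_bad`, the version list, and the pretend breakage point are all hypothetical; in practice the `is_bad` check would mean booting that kernel without noapic and seeing whether it hangs, and `git bisect` does the same search at commit granularity.

```python
def first_bad(versions, is_bad):
    """Return the first version for which is_bad() is true.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    once a version is bad every later one is bad too.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # breakage is at mid or earlier
        else:
            lo = mid  # breakage is after mid
    return versions[hi]


if __name__ == "__main__":
    # Hypothetical range of releases to test.
    versions = ["2.6.%d" % n for n in range(9, 23)]

    # Stand-in for a real boot test; the 2.6.14 cut-off is arbitrary and only
    # here so the example runs.
    def pretend_needs_noapic(version):
        return int(version.split(".")[2]) >= 14

    print(first_bad(versions, pretend_needs_noapic))
```

Each step halves the remaining range, so the 14 releases in this made-up list need only four boot tests.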

-

From: Ingo Oeser
Date: Saturday, September 15, 2007 - 3:58 am

I have 2 SMP boards and both need noapic. One is from 2001 (ASUS CUR-DLS),
one is from June 2006 (Gigabyte M57SLI-S4).

There are many reasons:

1. Bugs which have such a simple workaround don't get much attention.

2. Usually SMP boards are used for machines, which just HAVE to work,
   since they have been expensive. These are not consumer boards.

3. I usually had only USB problems (no IRQ) when omitting noapic.
   USB technology is a consumer-grade technology and enterprise-grade
   developers don't have much interest in it (until now?).

4. IRQ routing setup is often a BIOS issue. You might be able
   to fix that by upgrading your BIOS. That often needs a Windows
   tool. Linux people do not always (want to) have access to Windows :-)

I reported all the problems (starting 2001), but no developer
seemed interested.

I can report them against the latest RC6 kernel tomorrow and put them
into bugzilla, if we now REALLY care.


Best Regards

Ingo Oeser
-

From: Andrew Morton
Date: Saturday, September 15, 2007 - 4:08 am

I believe that about two years ago we broke something which caused quite a
large number of people to need noapic.  Is that the case with any of your
machines?  Do you know if they run 2.6.ancient without noapic?

Thanks.

-

From: Matthew Garrett
Date: Saturday, September 15, 2007 - 5:08 am

My recollection is that we shifted from "Enable the apic even if the 
BIOS disabled it" to "Only use the apic if the BIOS didn't disable it" 
around that time, which meant that distributions could actually turn on 
apic-on-up support without breaking everything. That might correspond to 
what you're seeing.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
-

From: Dave Jones
Date: Monday, September 24, 2007 - 2:32 pm

On Sat, Sep 15, 2007 at 01:08:25PM +0100, Matthew Garrett wrote:
 > On Sat, Sep 15, 2007 at 04:08:02AM -0700, Andrew Morton wrote:
 > 
 > > I believe that about two years ago we broke something which caused quite a
 > > large number of people to need noapic.  Is that the case with any of your
 > > machines?  Do you know if they run 2.6.ancient without noapic?
 > 
 > My recollection is that we shifted from "Enable the apic even if the 
 > BIOS disabled it" to "Only use the apic if the BIOS didn't disable it" 
 > around that time, which meant that distributions could actually turn on 
 > apic-on-up support without breaking everything. That might correspond to 
 > what you're seeing.

If memory serves correctly, that was circa 2.6.10, back in these commits..

commit a068ea13d1db406e15c346e93530343f6e70184c
Author: Len Brown <len.brown@intel.com>
Date:   Sun Oct 10 05:21:08 2004 -0400

    [ACPI] If BIOS disabled the LAPIC, believe it by default.
    "lapic" is available to force enabling the LAPIC
    in the event you know more than your BIOS vendor.
    http://bugzilla.kernel.org/show_bug.cgi?id=3238

commit 2fcfece90db9643b6f30a7ad343898a2871e6a81
Author: Len Brown <len.brown@intel.com>
Date:   Sat Oct 9 20:12:45 2004 -0400

    [ACPI] Don't enable LAPIC when the BIOS disabled it.
    Doing so apparently breaks every Dell on Earth.
    http://bugzilla.kernel.org/show_bug.cgi?id=3238


But those changes relate to the local APIC, which 'noapic' shouldn't
have any effect on, should it?

	Dave

-- 
http://www.codemonkey.org.uk
-

From: Phillip Susi
Date: Thursday, September 27, 2007 - 3:03 pm

If the LAPIC is disabled, then you CAN'T use the IO-APIC, right?  So then
wouldn't the noapic option have no effect, since the apic is already
disabled?



-

From: Rafael J. Wysocki
Date: Saturday, September 15, 2007 - 11:42 am

Well, I think it broke soon after 2.6.9.

Please see http://bugzilla.kernel.org/show_bug.cgi?id=3639#c10
-
