Kernel Panic - 2.6.23-rc4-mm1 ia64 - was Re: Update: [Automatic] NUMA replicated pagecache ...

From: Lee Schermerhorn
Date: Wednesday, September 12, 2007 - 8:09 am

On Wed, 2007-09-12 at 19:38 +0530, Balbir Singh wrote:

System panics within a few seconds of starting the test.

NaT == Not a Thing.  The kernel reports a NULL pointer deref as such, so
I believe that NaT Consumption errors come from attempting to deref a
non-NULL pointer that points at non-existent memory.

I tried the workload again with an "unpatched kernel" -- i.e., no
automatic page migration nor replication, nor any other of my
experimental patches.  Still happens with memory controller configured
-- same stack trace.

Then I tried an unpatched 23-rc4-mm1 with memory controller NOT
configured.  It still panicked, but with a different symptom:  first a
soft lockup, then a NULL pointer deref--apparently in the soft lockup
detection code.  It panics because it OOPSes in an interrupt handler.

Tried again, same kernel--mem controller unconfig'd:  this time I got
the original stack trace--NaT Consumption in shrink_active_list()--and
then a soft lockup with a NULL pointer deref therein.  It's the NULL
pointer deref that causes the panic:  "Aiee, killing interrupt handler!"

So, maybe memory controller is "off the hook".

I guess I need to check the lists for 23-rc4-mm1 hot fixes, and try to
bisect rc4-mm1.
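
For reference, bisecting a patch series like -mm amounts to a binary
search for the first patch that turns a good kernel bad.  The following
is only a minimal sketch of that search, not a real tool:  the patch
names and the is_bad() callback are hypothetical stand-ins for "apply
the first n patches, build, boot, run the test".  It assumes the
unpatched base is good, the full series is bad, and each step
reproduces the failure reliably.

```python
from typing import Callable, Sequence


def first_bad_patch(patches: Sequence[str],
                    is_bad: Callable[[int], bool]) -> str:
    """Return the first patch in the series that makes the kernel bad.

    is_bad(n) is a hypothetical callback that tests a tree with the
    first n patches applied.  Assumes is_bad(0) is False and
    is_bad(len(patches)) is True.
    """
    lo, hi = 0, len(patches)   # lo patches: known good; hi patches: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid           # failure already present with mid patches
        else:
            lo = mid           # still good with mid patches applied
    return patches[hi - 1]     # the patch that took us from good to bad


# Hypothetical five-patch series where the third patch is the culprit.
series = ["patch-a", "patch-b", "patch-c", "patch-d", "patch-e"]
culprit = first_bad_patch(series, lambda n: n >= 3)
```

With N patches this needs about log2(N) build-and-boot cycles.  Since
the symptom here varies from run to run, a real is_bad() would probably
have to repeat the test a few times before calling a kernel good.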


Right.  I noticed that after I sent the mail.

Also, config available at:
http://free.linux.hp.com/~lts/Temp/config-2.6.23-rc4-mm1-gwydyr-nomemcont



Later,
Lee

