Re: 2.6.25.1: Kernel BUG at mm/rmap.c:669, General Protection Faults, and generic hard locks

To: Randy Johnson <theraptor2005@...>
Cc: <akpm@...>, <linux-mm@...>, <linux-kernel@...>
Date: Friday, May 16, 2008 - 7:09 am

On Mon, 12 May 2008, Randy Johnson wrote:

I expect your "mem=3200M" is just fine; I'm fond of "mem=" myself.
But be aware that you can get into trouble with it, and I've heard
"memmap=" recommended instead.  If you're unfamiliar with that,
try Documentation/kernel-parameters.txt or googling.


memtest86+ overnight was certainly the right thing to try;
but I'm not convinced by its success.  Maybe there's a pattern
in Matlab which is tickling a bad RAM issue more effectively
than memtest does (sometimes gcc hits problems which memtest
hasn't shown).  And since (sadly!) you have plenty of memory
to spare, it'd be well worth switching boards around: your
lowest bank does look suspect (and I'm guessing 2.6.25.1 just
places things differently from 2.6.22, some important data now
being placed on bad RAM where something unused went before).

I could perfectly well be wrong about all that: maybe you do have
a kernel bug corrupting your memory; but I've no idea where if so.


That's the most interesting line of it: page_mapcount(page) isn't
off by one or something like that; instead, its high byte has been
corrupted at some point from 0x00 to 0x8c.

(Unfortunately, what with all the printk'ing that's gone on, I'm not
at all confident whether or where the address of the page in question
is in the registers or stack displayed: the messages suit tracking
a relevant kernel bug rather than a random corruption.)

There it's doing the list_del(&page->lru) in buffered_rmqueue(),
and hitting a corrupted prev pointer: the top bit of the address has
been cleared, causing that and subsequent general protection faults
(same list pointer RCX and prev contents RDX each time).

But I'm afraid that tells me nothing about the cause of these
corruptions.  If you've gathered more crash logs during the week,
please do post them or send them to me privately: I'll try
to decipher what I can, but that may not help you much.

Hugh
--

Messages in current thread:
Re: 2.6.25.1: Kernel BUG at mm/rmap.c:669, General Protectio..., Hugh Dickins, (Fri May 16, 7:09 am)