Re: Config NO_BOOTMEM breaks my amd64 box

Previous thread: [GIT PULL v2] perf fixes by Frederic Weisbecker on Tuesday, March 30, 2010 - 8:58 pm. (5 messages)

Next thread: linux-next: manual merge of the driver-core tree with the sh tree by Stephen Rothwell on Tuesday, March 30, 2010 - 10:03 pm. (1 message)
From: James Morris
Date: Tuesday, March 30, 2010 - 9:49 pm

Please make NO_BOOTMEM default to n, at least for amd64, where I've found 
that it leads to all kinds of strange, undebuggable boot hangs and errors 
(with relatively current Fedora development userland).

Also, the help text for the item makes little sense to a non-expert in 
this area:


" ---help---
          Use early_res directly instead of bootmem before slab is ready.
                - allocator (buddy) [generic]
                - early allocator (bootmem) [generic]
                - very early allocator (reserve_early*()) [x86]
                - very very early allocator (early brk model) [x86]
          So reduce one layer between early allocator to final allocator."

I had no idea what all this meant, so trusted the default=y and then spent 
several hours wondering why everything was breaking, and would likely not 
have figured it out in linear time without a suggestion from Dave Airlie.


- James
-- 
James Morris
<jmorris@namei.org>
--

From: H. Peter Anvin
Date: Tuesday, March 30, 2010 - 11:26 pm

Have you tested it with the latest fixes that are now in Linus' tree (-rc3)?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--

From: James Morris
Date: Tuesday, March 30, 2010 - 11:47 pm

Yes, it was happening with -rc3.

-- 
James Morris
<jmorris@namei.org>
--

From: Yinghai Lu
Date: Wednesday, March 31, 2010 - 9:25 am

Please send out a bootlog if possible.

BTW please try
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-2.6-yinghai.git

Thanks

Yinghai
--

From: Ingo Molnar
Date: Wednesday, March 31, 2010 - 11:59 am

Could you please send the bootlog that Yinghai asked for, plus also one that 
you get with NO_BOOTMEM turned off (for comparison)?

Also, when did you first hit this bug? This code has been upstream for almost 
a month, and it was in linux-next before that - so you should have hit this 
much sooner. A rough timeframe would suffice. I suppose you were booting 
upstream kernels during the merge window as well?

We can flip the default around if there's no fix available based on the 
bootlogs. (Plus the help text should definitely be improved.)

Thanks,

	Ingo
--

From: Dave Airlie
Date: Wednesday, March 31, 2010 - 1:57 pm

A default y config option causing regressions still at rc3? and you guys
keep going? This is the sort of shit Linus would flame me for a day or two for,

Can we get some f'ing consistency here?

--

From: Linus Torvalds
Date: Wednesday, March 31, 2010 - 2:02 pm

Yeah. I think we need to remove the crap.

I thought the problems were known, and fixed in -rc3. Clearly they 
weren't. And by now it's not about changing the default any more - by now 
it's about removing the known-crap code.

		Linus
--

From: Ingo Molnar
Date: Wednesday, March 31, 2010 - 2:40 pm

Yeah.

It would still be nice to get the before/after bootlogs, because we'd like to 

Ok, we can certainly do that too.

Should we scrap the whole x86 bootmem conversion to begin with? I'm not sure 
there's any fundamentally less risky way to do it, so if we try this again in .35 
we might run into similar regressions, and I'd like to avoid that. I wouldn't 
mind not having to do that at all; it's been a lot of pain to pull it off, and 
the lmb conversion looks even more intrusive.

Thanks,

	Ingo
--

From: Ingo Molnar
Date: Wednesday, March 31, 2010 - 2:47 pm

Note, without trying to defend the bootmem conversion itself, which didn't work 
out well: this is not some optional new driver feature that was default-y 
randomly, but an infrastructure change that was to be made unconditional 
in .35.

The flag was basically a testing/debug flag to allow the old code to be used 
too, in case the new code was buggy. This is what helped James to report this 
today, instead of forcing James through a very difficult ~14-reboot bisection.

Thanks,

	Ingo
--

From: Dave Airlie
Date: Wednesday, March 31, 2010 - 2:14 pm

Are you testing this, btw, with initramfs/initrds? I suspect a lot of testing
is being done by people on monolithic kernels. This is just a guess, but
considering I couldn't boot with this option on, on a basic 32-bit install on
a dual-core 64-bit CPU, from when it landed until rc3, it suggests a hole of
some sort in the test coverage.

Dave
--

From: Yinghai Lu
Date: Wednesday, March 31, 2010 - 3:02 pm

so -rc3 is working on your setup?

Yinghai
--

From: H. Peter Anvin
Date: Wednesday, March 31, 2010 - 3:28 pm

Hi Dave,

The only bug report I remember getting from you had no details and was
in reply to another bug report which was, indeed, addressed, so we had
every reason to believe it was being dealt with with the patchset which
did indeed go into -rc3 (and does address a problem with initramfs in
particular cases.)

Clearly James Morris' problem is something unrelated, and regardless of
course of action we need to track it down.

If you also are having problems with -rc3 we would really appreciate as
much detail as possible -- boot logs at the very minimum -- so we have
a chance to at all track down the problems that do exist.

	-hpa
--

From: James Morris
Date: Wednesday, March 31, 2010 - 3:58 pm

I don't have the old boot logs, and have since upgraded the system 
further.  

IIRC, the boot was failing after not being able to find the root fs 
(ext3/lvm/raid0).  I thought it was a dracut issue, but it seemed to be 

In this case, in the last few days (also when I first saw or noticed the 
bootmem option).  I was booting relatively recent Linus kernels during the 
merge window, although my main work was being done on an older upstream 
kernel.


-- 
James Morris
<jmorris@namei.org>
--

From: Ingo Molnar
Date: Wednesday, March 31, 2010 - 4:02 pm

Please, could you send any bootlog then that we could work from? That way we 
could check the memory layout and guess the rough shape of the early 

Ok - initrd unpack failing or initial mount failing is consistent with the 
initrd getting corrupted by overlapping early reservations due to allocator 

Ok, so it's not an old regression but possibly a bug in one of the fixes. Not 
good.
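Ingo's theory above is that an early reservation was placed on top of the initrd. A minimal sketch of the overlap test involved, in C like the patches in this thread, treating each range as half-open [start, end) in the style of the "Subtract (N early reservations)" dumps. `struct phys_range`, `ranges_overlap()` and `find_initrd_clash()` are hypothetical names for illustration, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One physical address range, treated as half-open: [start, end). */
struct phys_range {
	uint64_t start;
	uint64_t end;
};

/* Two half-open ranges overlap iff each starts before the other ends. */
static bool ranges_overlap(struct phys_range a, struct phys_range b)
{
	assert(a.start <= a.end && b.start <= b.end);
	return a.start < b.end && b.start < a.end;
}

/*
 * Return the index of the first early reservation that overlaps the
 * initrd, or -1 if the initrd is untouched.
 */
static int find_initrd_clash(struct phys_range initrd,
			     const struct phys_range *res, int nr_res)
{
	for (int i = 0; i < nr_res; i++) {
		if (ranges_overlap(initrd, res[i]))
			return i;
	}
	return -1;
}
```

For example, taking reservations #027 and #028 from Yinghai's dump later in the thread as sample data, a made-up initrd placement of [0x7e000000, 0x7e400000) clashes with #027, while [0x7e800000, 0x7e9ca000) fits exactly in the gap between the two.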

	Ingo
--

From: H. Peter Anvin
Date: Wednesday, March 31, 2010 - 4:35 pm

This would rather match the problem that was addressed by the patch in
-rc3.  Any help in reproducing it would be great.

	-hpa
--

From: James Morris
Date: Wednesday, March 31, 2010 - 4:43 pm

Upgraded to the latest rawhide userland -- I have not since tested with 

-- 
James Morris
<jmorris@namei.org>
--

From: H. Peter Anvin
Date: Wednesday, March 31, 2010 - 4:48 pm

That would be great.  The sooner the better, obviously.

	-hpa
--

From: James Morris
Date: Wednesday, March 31, 2010 - 6:00 pm

I'm not seeing any problems now, with current Linus and rawhide.  I'll 
leave bootmem off and see if anything comes up again.


-- 
James Morris
<jmorris@namei.org>
--

From: Ingo Molnar
Date: Thursday, April 1, 2010 - 5:52 am

(a current bootlog would still be nice)

Dave, can you reproduce any of these problems with Linus's latest?

	Ingo
--

From: Ingo Molnar
Date: Wednesday, April 7, 2010 - 11:32 pm

ping? Can you or Dave reproduce the bug with -rc3 or later kernels? (If not 
then it probably means that the bug you triggered was already fixed at the 
time you reported it, as hpa suspected.)

Thanks,

	Ingo
--

From: Yinghai
Date: Thursday, April 8, 2010 - 12:00 am

James already reported that -rc3 fixed the problem for him.

Dave implied -rc3 fixed the problem for him.

Thanks

Yinghai
--

From: Ingo Molnar
Date: Thursday, April 8, 2010 - 12:27 am

Hm, I'm confused, does this mean that it was all fixed upstream already when 
Dave and James sent their complaints?

Would be nice to have a confirmation from Dave for that (beyond 'implying' 
it), to not keep this thread open-ended.

Thanks,

	Ingo
--

From: Dave Airlie
Date: Thursday, April 8, 2010 - 7:43 pm

Okay, I built a Linus head and it booted on the previously broken machine
with CONFIG_NO_BOOTMEM=y.

Dave.
--

From: James Morris
Date: Thursday, April 8, 2010 - 1:05 am

I haven't seen it since.


- James
-- 
James Morris
<jmorris@namei.org>
--

From: Ingo Molnar
Date: Thursday, April 8, 2010 - 1:22 am

Great, thanks!

	Ingo
--

From: Yinghai Lu
Date: Wednesday, March 31, 2010 - 3:05 pm

In case you have a 32-bit system without RAM installed on node 0, please check:

Thanks

Yinghai

Subject: [PATCH] x86: Fix 32bit system without RAM on Node0

When 32-bit NUMA is used, free_all_bootmem() still only goes over
node id 0.

If node 0 doesn't have RAM installed, we need to go with node 1,
because early_node_map still uses 1 for all ranges, and RAM from node 1
becomes low RAM.

Try to use MAX_NUMNODES like 64-bit NUMA does.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/init_32.c |    5 +++++
 1 file changed, 5 insertions(+)

Index: linux-2.6/arch/x86/mm/init_32.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_32.c
+++ linux-2.6/arch/x86/mm/init_32.c
@@ -875,7 +875,12 @@ void __init mem_init(void)
 	BUG_ON(!mem_map);
 #endif
 	/* this will put all low memory onto the freelists */
+#if defined(CONFIG_NO_BOOTMEM) && defined(MAX_NUMNODES)
+	/* In case some 32bit systems don't have RAM installed on node0 */
+        totalram_pages += free_all_memory_core_early(MAX_NUMNODES);
+#else
 	totalram_pages += free_all_bootmem();
+#endif
 
 	reservedpages = 0;
 	for (tmp = 0; tmp < max_low_pfn; tmp++)
--

From: Ingo Molnar
Date: Wednesday, March 31, 2010 - 3:13 pm

So we get into this branch if CONFIG_NO_BOOTMEM is enabled but MAX_NUMNODES is 

Btw., and I said this before, I absolutely hate the CONFIG_NO_BOOTMEM naming 
as well (a negative in the option), but it was what expresses the 'this is 
where we want to go' state better, and thus CONFIG_NO_BOOTMEM removal will be a 
straight removal instead of a removal of the inverse.

Thanks,

	Ingo
--

From: Yinghai Lu
Date: Wednesday, March 31, 2010 - 3:16 pm

yes. 

free_all_bootmem() will call
free_all_memory_core_early(NODE_DATA(0)->node_id);

Thanks

Yinghai Lu
--

From: Ingo Molnar
Date: Wednesday, March 31, 2010 - 3:41 pm

Well, and that whole #ifdeffery is disgusting as well - even if the goal was to 
remove CONFIG_NO_BOOTMEM ASAP.

Please learn to use proper intermediate helper functions, and at minimum put 
the conversion ugliness somewhere that doesn't intrude on our daily flow in .c 
files. The best rule is to _never ever_ put an #ifdef construct into a .c 
file. It doesn't matter what the goal of the #ifdef is - such ugliness in code 
is never justified.
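A minimal sketch of the helper approach described here (hpa suggests inlines for the same purpose): the config conditional lives once, in a header-style static inline, and the mem_init()-like call site carries no #ifdef. Everything prefixed `sketch_` is hypothetical, `CONFIG_NO_BOOTMEM` is defined by hand to stand in for Kconfig, and the stub page counts are arbitrary placeholders.

```c
#include <assert.h>

/* Stand-in for the Kconfig symbol; a real build gets this from .config. */
#define CONFIG_NO_BOOTMEM 1
/* Hypothetical stand-in for MAX_NUMNODES used as an "all nodes" sentinel. */
#define SKETCH_MAX_NUMNODES 64

/* Placeholder stubs for the two freeing paths; the counts are arbitrary. */
static inline unsigned long sketch_free_core_early(int nid)
{
	assert(nid >= 0);
	return 100;
}

static inline unsigned long sketch_free_bootmem(void)
{
	return 50;
}

/* ---- would live in a header: the only place the config choice appears ---- */
#ifdef CONFIG_NO_BOOTMEM
static inline unsigned long sketch_free_all_low_memory(void)
{
	return sketch_free_core_early(SKETCH_MAX_NUMNODES);
}
#else
static inline unsigned long sketch_free_all_low_memory(void)
{
	return sketch_free_bootmem();
}
#endif

/* ---- .c call site, modelled loosely on mem_init(): no #ifdef here ---- */
static unsigned long sketch_mem_init(void)
{
	unsigned long totalram_pages = 0;

	/* this will put all low memory onto the freelists */
	totalram_pages += sketch_free_all_low_memory();
	return totalram_pages;
}
```

Flipping the config only changes which inline the header provides; the call site compiles the same either way, which is the point of keeping the #ifdef out of the .c file.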

Thanks,

	Ingo
--

From: Yinghai Lu
Date: Wednesday, March 31, 2010 - 3:47 pm

if you agree that I can have one nobootmem.c in mm/

Thanks

Yinghai
--

From: Ingo Molnar
Date: Wednesday, March 31, 2010 - 3:56 pm

I think what we want is your lmb series, with CONFIG_NO_BOOTMEM eliminated 
altogether and x86 converted to pure (extended) lmb facilities, and without 
any traces of bootmem left in x86.

I.e. a really clean series with no CONFIG_NO_BOOTMEM kind of #ifdef crap left 
around. This means 'nobootmem.c' (albeit saner than an #ifdef jungle) would be 
moot as well.

We tried the dual model as it seemed prudent from a testing/conversion POV 
(and it certainly allowed people to turn the new code off), but it's rather 
ugly and we still have bugs left.

This means that if Linus likes that approach the conversion will be very 
binary and very painful. The other option would be to go back to bootmem and 
forget about the whole nobootmem and lmb thing.

	Ingo
--

From: Johannes Weiner
Date: Wednesday, March 31, 2010 - 5:01 pm

That does not make much sense as bootmem is not only used on the architecture
side but also in generic code.  So you either have to emulate the API on x86

I think this was an implementation thing rather than a problem with the model
per se.

As written above, you can hardly get away without emulating the bootmem API

I suppose it would be safest to replace early_res with lmb first to get
in sync with the other archs using it.

Step two would be to extend LMB and implement a bootmem emulation API on
top of it so that architectures can switch over to non-bootmem mode one
by one.  Then you can drop the real bootmem code and switch generic code
to use LMB natively, also site by site.  And finally, drop the emulation API.
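Step two above, a bootmem-style allocation call emulated on top of a range-based early allocator, can be sketched roughly as follows. This is a toy under stated assumptions: a single RAM window, a small fixed-size reservation table kept sorted by start address, power-of-two alignment, physical addresses returned instead of pointers, and 0 meaning failure. None of the `sketch_` names are LMB, early_res or bootmem API.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_RES 32

/* A reserved range, half-open: [start, end). */
struct sketch_res {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical allocator state: one RAM window plus a sorted reservation table. */
static uint64_t ram_start, ram_end;
static struct sketch_res res[SKETCH_MAX_RES];
static int nr_res;

/* Record a reservation, keeping the table ordered by start address. */
static void sketch_reserve(uint64_t start, uint64_t end)
{
	int i = nr_res;

	assert(nr_res < SKETCH_MAX_RES && start < end);
	while (i > 0 && res[i - 1].start > start) {
		res[i] = res[i - 1];	/* shift later entries up one slot */
		i--;
	}
	res[i].start = start;
	res[i].end = end;
	nr_res++;
}

/* First-fit search for an aligned free gap of `size` bytes; 0 on failure. */
static uint64_t sketch_find_free(uint64_t size, uint64_t align)
{
	uint64_t cand;

	assert(align != 0 && (align & (align - 1)) == 0);
	cand = (ram_start + align - 1) & ~(align - 1);
	for (int i = 0; i < nr_res; i++) {
		if (cand + size <= res[i].start)
			break;	/* the gap before this reservation is big enough */
		if (res[i].end > cand)	/* collision: skip past this reservation */
			cand = (res[i].end + align - 1) & ~(align - 1);
	}
	return cand + size <= ram_end ? cand : 0;
}

/* Bootmem-style wrapper: callers keep one "give me early memory" entry point. */
static uint64_t sketch_alloc_bootmem(uint64_t size, uint64_t align)
{
	uint64_t addr = sketch_find_free(size, align);

	if (addr)
		sketch_reserve(addr, addr + size);
	return addr;
}
```

For instance, with RAM at [0x1000, 0x100000) and reservations [0x1000, 0x2000) and [0x3000, 0x8000), two 4 KiB allocations land at 0x2000 (the gap) and then at 0x8000. Callers only ever see the wrapper, so the backing table can change, or the wrapper can be dropped as in the final step, without touching them.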

If other architectures object to removing bootmem, there really is no point
for x86 to even try it.

For step one to work out, it's probably easiest to fully revert to the
.33 state than having to replace early_res while in its current state?
--

From: H. Peter Anvin
Date: Wednesday, March 31, 2010 - 4:34 pm

That would be better, or more commonly, use inlines.

I'm still totally puzzled about this patch as well as the comment:

+#if defined(CONFIG_NO_BOOTMEM) && defined(MAX_NUMNODES)
+	/* In case some 32bit systems don't have RAM installed on node0 */
+        totalram_pages += free_all_memory_core_early(MAX_NUMNODES);
+#else
 	totalram_pages += free_all_bootmem();
+#endif


Why is that "32 bits" specific?  Second, MAX_NUMNODES is defined
whenever <linux/numa.h> is included, so what on Earth is this supposed
to signify?  Are you trying to say MAX_NUMNODES > 1?  Or are you trying
to say CONFIG_NUMA?

Furthermore, I really don't see the connection between this and James
Morris' reported problem, which he reports as "amd64", which presumably
is an x86-64 kernel and not 32 bits...  James, is that correct?  Any
more details you can give about the system?  I *really* don't want to go
into cargo cult programming mode, that would suck eggs no matter what.

	-hpa



--

From: Yinghai Lu
Date: Wednesday, March 31, 2010 - 4:54 pm

You are right, this one should be clearer.

Subject: [PATCH -v2] nobootmem, x86: Fix 32bit system without RAM on Node0

When 32-bit NUMA is used, free_all_bootmem() still only goes over
node id 0.

If node 0 doesn't have RAM installed, we need to go with node 1,
because early_node_map still uses 1 for all ranges, and RAM from node 1
becomes low RAM.

Try to use MAX_NUMNODES like 64-bit NUMA does.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 mm/bootmem.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -303,7 +303,7 @@ unsigned long __init free_all_bootmem_no
 unsigned long __init free_all_bootmem(void)
 {
 #ifdef CONFIG_NO_BOOTMEM
-	return free_all_memory_core_early(NODE_DATA(0)->node_id);
+	return free_all_memory_core_early(MAX_NUMNODES);
 #else
 	return free_all_bootmem_core(NODE_DATA(0)->bdata);

It happened on one of my test setups: node 0 RAM disappeared somehow,
and I found that 32-bit NUMA doesn't work on that.

Thanks

Yinghai
--

From: H. Peter Anvin
Date: Wednesday, March 31, 2010 - 5:35 pm

... which is useful and valid, but I still think this isn't related to
James' problem, if James' problem wasn't actually fixed in -rc3.  That's
the part that I'm afraid I have to be confused about... all the known
problems except the above are fixed in -rc3, and I'd at least like to
have a validated bug report of any sort before saying it should all be
tossed.

This patch looks a lot better.  The whole use of MAX_NUMNODES as a
sentinel (which appears inherited from mm/page_alloc.c, and as such is a
pre-existing convention which is also invoked here) really could use a
comment, though.
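To make the sentinel convention concrete: a walk over early_node_map-style ranges in which a node id equal to MAX_NUMNODES means "every node" and any other value filters by node. This is a minimal sketch, not the kernel's free_all_memory_core_early(); `SKETCH_MAX_NUMNODES` (its value of 64 is an assumption), `struct pfn_range` and `pages_for_node()` are hypothetical, early reservations are not subtracted, and the four ranges are copied from the early_node_map[4] dump Yinghai posted from the machine without RAM on node 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MAX_NUMNODES; the value 64 is an assumption. */
#define SKETCH_MAX_NUMNODES 64

/* One active PFN range, [start_pfn, end_pfn), owned by node nid. */
struct pfn_range {
	int nid;
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* The ranges from the early_node_map[4] dump: every one is tagged node 1. */
static const struct pfn_range early_ranges[] = {
	{ 1, 0x00000010UL, 0x00000099UL },
	{ 1, 0x00000100UL, 0x0007da00UL },
	{ 1, 0x0007e800UL, 0x0007ffa0UL },
	{ 1, 0x0007ffaeUL, 0x0007ffb0UL },
};

/*
 * Count the pages a free-all-memory walk would visit for node nid.
 * nid == SKETCH_MAX_NUMNODES is the sentinel for "all nodes"; any other
 * value skips ranges owned by other nodes.
 */
static unsigned long pages_for_node(int nid)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < sizeof(early_ranges) / sizeof(early_ranges[0]); i++) {
		const struct pfn_range *r = &early_ranges[i];

		assert(r->start_pfn <= r->end_pfn);
		if (nid != SKETCH_MAX_NUMNODES && r->nid != nid)
			continue;	/* range belongs to another node */
		total += r->end_pfn - r->start_pfn;
	}
	return total;
}
```

With these ranges `pages_for_node(0)` returns 0, consistent with the "(0 free memory ranges)" line in that dump, while `pages_for_node(SKETCH_MAX_NUMNODES)` returns all 520491 pages in the four ranges.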

	-hpa
--

From: Yinghai Lu
Date: Wednesday, March 31, 2010 - 6:07 pm

Sure. Will send an updated one with comments there.

Thanks

Yinghai
--

From: Yinghai Lu
Date: Wednesday, March 31, 2010 - 7:02 pm

On one system without RAM on node 0, I got the following dump with a 32-bit NUMA kernel:

early_node_map[4] active PFN ranges
    1: 0x00000010 -> 0x00000099
    1: 0x00000100 -> 0x0007da00
    1: 0x0007e800 -> 0x0007ffa0
    1: 0x0007ffae -> 0x0007ffb0

Subtract (29 early reservations)
  #000 [0000001000 - 0000002000]
  #001 [0000089000 - 000008f000]
  #002 [0000091000 - 0000093500]
  #003 [0000094000 - 0000099000]
  #004 [0000099400 - 0000100000]
  #005 [0000200000 - 0000eb7644]
  #006 [0000eb8000 - 0000ec327c]
  #007 [007c400000 - 007c40e000]
  #008 [007c440000 - 007c44e000]
  #009 [007c480000 - 007c48e000]
  #010 [007c4c0000 - 007c4ce000]
  #011 [007c500000 - 007c50e000]
  #012 [007c540000 - 007c54e000]
  #013 [007c580000 - 007c58e000]
  #014 [007c5c0000 - 007c5ce000]
  #015 [007c674000 - 007cbfe000]
  #016 [007cbfe500 - 007cbfe530]
  #017 [007cbfe540 - 007cbfe5d0]
  #018 [007cbfe600 - 007cbfe620]
  #019 [007cbfe640 - 007cbfe660]
  #020 [007cbfe680 - 007cbfe684]
  #021 [007cbfe6c0 - 007cbfe6c4]
  #022 [007cbfe700 - 007cbfe77e]
  #023 [007cbfe780 - 007cbfe7fe]
  #024 [007cbfe800 - 007cbfec54]
  #025 [007cbfec80 - 007cbfeede]
  #026 [007cbfef00 - 007cbfef2d]
  #027 [007cbfef40 - 007e800000]
  #028 [007e9ca000 - 007ff95000]
(0 free memory ranges)
Initializing HighMem for node 0 (00000000:00000000)
Initializing HighMem for node 1 (00000000:00000000)
Memory: 0k/2096832k available (6662k kernel code, 2096300k reserved, 4829k data, 484k init, 0k highmem)
virtual kernel memory layout:
    fixmap  : 0xff637000 - 0xfffff000   (10016 kB)
    pkmap   : 0xff200000 - 0xff400000   (2048 kB)
    vmalloc : 0xc07b0000 - 0xff1fe000   (1002 MB)
    lowmem  : 0x40000000 - 0xbffb0000   (2047 MB)
      .init : 0x40d39000 - 0x40db2000   ( 484 kB)
      .data : 0x40881924 - 0x40d38e1c   (4829 kB)
      .text : 0x40200000 - 0x40881924   (6662 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
swapper: page allocation failure. order:0, mode:0x0
Pid: 0, comm: ...
--

From: H. Peter Anvin
Date: Wednesday, March 31, 2010 - 8:18 pm

Please address the separate bug fix in a separate patch.


--
Sent from my mobile phone, pardon any lack of formatting.
--

From: Yinghai Lu
Date: Wednesday, March 31, 2010 - 8:30 pm

ok.
--

From: Yinghai Lu
Date: Wednesday, March 31, 2010 - 8:45 pm

When 32-bit NUMA is used, free_all_bootmem() still only goes over
node id 0.

If node 0 doesn't have RAM installed, we need to go with node 1,
because early_node_map still uses 1 for all ranges, and RAM from node 1
becomes low RAM.

This one fixes the BOOTMEM path by looping over bdata_list.
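The BOOTMEM-path change described here, walking every descriptor on bdata_list instead of only node 0's, reduces to the toy model below. `struct sketch_bdata` and the `sketch_` functions are hypothetical stand-ins for bootmem_data_t and free_all_bootmem_core(), and a plain singly linked list replaces the kernel's list_head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for bootmem_data_t: one entry per node with memory. */
struct sketch_bdata {
	int nid;
	unsigned long free_pages;	/* pages this node would release */
	struct sketch_bdata *next;
};

/* Stand-in for free_all_bootmem_core(): release one node's pages once. */
static unsigned long sketch_free_core(struct sketch_bdata *bdata)
{
	unsigned long pages;

	assert(bdata != NULL);
	pages = bdata->free_pages;
	bdata->free_pages = 0;	/* a second call frees nothing */
	return pages;
}

/* Old behaviour: only node 0's descriptor is ever freed. */
static unsigned long sketch_free_node0_only(struct sketch_bdata *list)
{
	for (struct sketch_bdata *b = list; b; b = b->next) {
		if (b->nid == 0)
			return sketch_free_core(b);
	}
	return 0;	/* no node 0 memory: nothing reaches the freelists */
}

/* Fixed behaviour: walk every descriptor on the list and sum the results. */
static unsigned long sketch_free_all(struct sketch_bdata *list)
{
	unsigned long total_pages = 0;

	for (struct sketch_bdata *b = list; b; b = b->next)
		total_pages += sketch_free_core(b);
	return total_pages;
}
```

With a list holding only a node 1 entry, `sketch_free_node0_only()` releases nothing, matching the failure described above, whereas `sketch_free_all()` releases node 1's pages.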

-v3: add more comments, and fix the bootmem path too.
-v4: separate from one big patch

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 mm/bootmem.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -312,7 +312,13 @@ unsigned long __init free_all_bootmem(vo
 	 */
 	return free_all_memory_core_early(MAX_NUMNODES);
 #else
-	return free_all_bootmem_core(NODE_DATA(0)->bdata);
+	unsigned long total_pages = 0;
+	bootmem_data_t *bdata;
+
+	list_for_each_entry(bdata, &bdata_list, list)
+		total_pages += free_all_bootmem_core(bdata);
+
+	return total_pages;
 #endif
 }
 
--

From: tip-bot for Yinghai Lu
Date: Thursday, April 1, 2010 - 3:57 pm

Commit-ID:  aa235fc712f379d4194cff9217f07026c452c141
Gitweb:     http://git.kernel.org/tip/aa235fc712f379d4194cff9217f07026c452c141
Author:     Yinghai Lu <yinghai@kernel.org>
AuthorDate: Wed, 31 Mar 2010 20:45:27 -0700
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Thu, 1 Apr 2010 14:41:19 -0700

bootmem, x86: Fix 32bit numa system without RAM on node 0

When 32bit numa is used, free_all_bootmem() will still only go over with
node id 0.

If node 0 doesn't have RAM installed, the lowest populated node
becomes low RAM.

This one fixes BOOTMEM path by iterating over the bdata_list.

-v3: add more comments, and fix bootmem path too.
-v4: seperate from one big patch

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4BB416D7.6090203@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 mm/bootmem.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/mm/bootmem.c b/mm/bootmem.c
index 2058cb7..ba37d62 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -312,7 +312,13 @@ unsigned long __init free_all_bootmem(void)
 	 */
 	return free_all_memory_core_early(MAX_NUMNODES);
 #else
-	return free_all_bootmem_core(NODE_DATA(0)->bdata);
+	unsigned long total_pages = 0;
+	bootmem_data_t *bdata;
+
+	list_for_each_entry(bdata, &bdata_list, list)
+		total_pages += free_all_bootmem_core(bdata);
+
+	return total_pages;
 #endif
 }
 
--

From: Yinghai Lu
Date: Wednesday, March 31, 2010 - 8:44 pm

On one system without RAM on node 0, I got the following dump with a 32-bit NUMA kernel:

early_node_map[4] active PFN ranges
    1: 0x00000010 -> 0x00000099
    1: 0x00000100 -> 0x0007da00
    1: 0x0007e800 -> 0x0007ffa0
    1: 0x0007ffae -> 0x0007ffb0
...
Subtract (29 early reservations)
  #000 [0000001000 - 0000002000]
  #001 [0000089000 - 000008f000]
  #002 [0000091000 - 0000093500]
...
  #027 [007cbfef40 - 007e800000]
  #028 [007e9ca000 - 007ff95000]
(0 free memory ranges)
Initializing HighMem for node 0 (00000000:00000000)
Initializing HighMem for node 1 (00000000:00000000)
Memory: 0k/2096832k available (6662k kernel code, 2096300k reserved, 4829k data, 484k init, 0k highmem)
...
Checking if this processor honours the WP bit even in supervisor mode...Ok.
swapper: page allocation failure. order:0, mode:0x0
Pid: 0, comm: swapper Not tainted 2.6.34-rc3-tip-03818-g4b1ea6c-dirty #35
Call Trace:
 [<4087a5dc>] ? printk+0xf/0x11
 [<40286728>] __alloc_pages_nodemask+0x417/0x487
 [<402a9ce1>] new_slab+0xe2/0x1fe
 [<402aa5b2>] kmem_cache_open+0x185/0x358
 [<402abbc0>] T.954+0x1c/0x60
 [<40d52a29>] kmem_cache_init+0x24/0x113
 [<40d39738>] start_kernel+0x166/0x2e4
 [<40d3940e>] ? unknown_bootoption+0x0/0x18e
 [<40d390ce>] i386_start_kernel+0xce/0xd5
Mem-Info:
Node 1 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Node 1 Normal per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
active_anon:0 inactive_anon:0 isolated_anon:0
 active_file:0 inactive_file:0 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 free:0 slab_reclaimable:0 slab_unreclaimable:0
 mapped:0 shmem:0 pagetables:0 bounce:0

When 32-bit NUMA is used, free_all_bootmem() still only goes over
node id 0.

If node 0 doesn't have RAM installed, we need to go with node 1,
because early_node_map still uses 1 for all ranges, and RAM from node 1
becomes low RAM.

Try to use MAX_NUMNODES like 64-bit NUMA does.

Note: the BOOTMEM path has the same problem.
      This bug existed before we had NO_BOOTMEM ...
--

From: tip-bot for Yinghai Lu
Date: Thursday, April 1, 2010 - 3:57 pm

Commit-ID:  337998587f802535896e9ed16d19f97915ccd368
Gitweb:     http://git.kernel.org/tip/337998587f802535896e9ed16d19f97915ccd368
Author:     Yinghai Lu <yinghai@kernel.org>
AuthorDate: Wed, 31 Mar 2010 20:44:09 -0700
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Thu, 1 Apr 2010 14:39:29 -0700

nobootmem, x86: Fix 32bit numa system without RAM on node 0

On one system without RAM on node0, got following boot dump with a 32
bit NUMA kernel:

early_node_map[4] active PFN ranges
    1: 0x00000010 -> 0x00000099
    1: 0x00000100 -> 0x0007da00
    1: 0x0007e800 -> 0x0007ffa0
    1: 0x0007ffae -> 0x0007ffb0
...
Subtract (29 early reservations)
  #000 [0000001000 - 0000002000]
  #001 [0000089000 - 000008f000]
  #002 [0000091000 - 0000093500]
...
  #027 [007cbfef40 - 007e800000]
  #028 [007e9ca000 - 007ff95000]
(0 free memory ranges)
Initializing HighMem for node 0 (00000000:00000000)
Initializing HighMem for node 1 (00000000:00000000)
Memory: 0k/2096832k available (6662k kernel code, 2096300k reserved, 4829k data, 484k init, 0k highmem)
...
Checking if this processor honours the WP bit even in supervisor mode...Ok.
swapper: page allocation failure. order:0, mode:0x0
Pid: 0, comm: swapper Not tainted 2.6.34-rc3-tip-03818-g4b1ea6c-dirty #35
Call Trace:
 [<4087a5dc>] ? printk+0xf/0x11
 [<40286728>] __alloc_pages_nodemask+0x417/0x487
 [<402a9ce1>] new_slab+0xe2/0x1fe
 [<402aa5b2>] kmem_cache_open+0x185/0x358
 [<402abbc0>] T.954+0x1c/0x60
 [<40d52a29>] kmem_cache_init+0x24/0x113
 [<40d39738>] start_kernel+0x166/0x2e4
 [<40d3940e>] ? unknown_bootoption+0x0/0x18e
 [<40d390ce>] i386_start_kernel+0xce/0xd5
Mem-Info:
Node 1 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Node 1 Normal per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
active_anon:0 inactive_anon:0 isolated_anon:0
 active_file:0 inactive_file:0 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 free:0 slab_reclaimable:0 slab_unreclaimable:0
 mapped:0 shmem:0 pagetables:0 ...
--

From: Stefan Richter
Date: Wednesday, March 31, 2010 - 3:51 am

I too noticed this absolutely catastrophic "help" text but forgot to
send a bug report.

Either this option can be explained and the text fixed, or it cannot be
explained and shouldn't be an option in the first place.
-- 
Stefan Richter
-=====-==-=- --== =====
http://arcgraph.de/sr/
--
