The old node-agnostic code tried allocating on all nodes starting from
the one with the lowest range. alloc_bootmem_core retried without the
goal if it could not satisfy it and so the goal was only respected at
all when it happened to be on the first (lowest page numbers) node (or
theoretically if allocations failed on all nodes before to the one
holding the goal).Introduce a non-panicking helper that starts allocating from the node
holding the goal and falls back only after all these tries failed.Make all other allocation helpers benefit from this new helper.
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
CC: Ingo Molnar <mingo@elte.hu>
CC: Yinghai Lu <yhlu.kernel@gmail.com>
CC: Andi Kleen <andi@firstfloor.org>
---mm/bootmem.c | 90 ++++++++++++++++++++++++++++++++++-------------------------
1 file changed, 52 insertions(+), 38 deletions(-)--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -408,6 +408,7 @@ static void * __init alloc_bootmem_core(
unsigned long size, unsigned long align,
unsigned long goal, unsigned long limit)
{
+ unsigned long fallback = 0;
unsigned long min, max, start, step;BUG_ON(!size);
@@ -441,10 +442,11 @@ static void * __init alloc_bootmem_core(max -= PFN_DOWN(bdata->node_boot_start);
start -= PFN_DOWN(bdata->node_boot_start);
+ fallback -= PFN_DOWN(bdata->node_boot_start);if (bdata->last_success > start) {
- /* Set goal here to trigger a retry on failure */
- start = goal = ALIGN(bdata->last_success, step);
+ fallback = start;
+ start = ALIGN(bdata->last_success, step);
}while (1) {
@@ -491,10 +493,39 @@ find_block:
return region;
}+ if (fallback) {
+ start = ALIGN(fallback, step);
+ fallback = 0;
+ goto find_block;
+ }
+
+ return NULL;
+}
+
+static void * __init ___alloc_bootmem_nopanic(unsigned long size,
+ unsigned long align,
+ unsigned long goal,
+ unsigned long limit)
+{
+ bootmem_data_t *bdata;
+
+re...
Hmm, my ia64 (NUMA) box can't boot up with this patch.
I'll chase its cause deeply tomorrow.--
Yasunori Goto--
Hi,
Changed this to break, because we don't need to search any further if
the current node already starts at/above the limit (remember, we walk a
list sorted by ->node_boot_start here).I also made the checks more intuitively understandable.
Could you try the following fix on top of this patch?
Hannes
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -513,10 +513,10 @@ restart:
list_for_each_entry(bdata, &bdata_list, list) {
void *region;- if (goal && goal < bdata->node_boot_start)
- continue;
- if (limit && limit < bdata->node_boot_start)
+ if (goal && bdata->node_low_pfn <= PFN_DOWN(goal))
continue;
+ if (limit && bdata->node_boot_start >= limit)
+ break;region = alloc_bootmem_core(bdata, size, align, goal, limit);
if (region)
--
I thought this fallback was wrong at first,
because fallback may point 0 at this time,
it doesn't point start_pfn of this node.But even if here is commented out, kernel can't boot up yet.
I'd like to straggle more, but may be need more time,
because, IA64 doesn't have early_printk, and console is not enable
at here.....P.S.
I was very confused by local variable namimng in alloc_bootmem_core.
I suppose start, max, and end, should be named like
sidx, eidx, and midx. They are not pfn, but index of bitmap.However, new_start and new_end should be named as new_start_offset and
new_end_offset. They are not index, but offset from start address of
the node.Probably, it will be easier to read, I think.
Bye.
--
Yasunori Goto--
Hi,
Oh, that should go out, sorry. It is a left-over from another way to do
Hm, just to make sure: this is the patch that breaks booting, right? If
you apply all patches in the series before this one, the machine boots
fine?Could you boot a working image with bootmem_debug in the command line?
Perhaps seeing the usual bootmem usage on this box gives a hint what isYes, that too. I would also rename last_offset to last_eidx and
last_success to last_sidx. What do you think?Hannes
--
Last_sidx is ok. But, last_offset seems to be used to manage some
allocated smaller chunks than one page. I'm not sure last_eidx is ok.Bye.
--
Yasunori Goto--
Hi,
Sorry, my fault.
How about last_offset -> last_end_off to reflect that it is the offset
of the last allocations end?And last_succes -> hint_idx to reflect that it is an index we start
searching from but it is not strict and we fall back if we find nothing
starting from there. Also free_bootmem* sets it as a hint from where we
could start searching.I also would set last_success/hint_idx to the _end_ of the successful
allocation (instead of the beginning of it) in alloc_bootmem_core
because we do not want to search for a new free block from the beginning
of the last allocation but rather right after it.What do you think?
Hannes
--
I gotcha! :-)
max -= PFN_DOWN(bdata->node_boot_start);
start -= PFN_DOWN(bdata->node_boot_start);
+ fallback -= PFN_DOWN(bdata->node_boot_start);if (bdata->last_success > start) {
- /* Set goal here to trigger a retry on failure */
- start = goal = ALIGN(bdata->last_success, step);
+ fallback = start; -------------------------------- (*)
+ start = ALIGN(bdata->last_success, step);
}(*) is root cause. "fallback" is set as 0, because start is index of bitmap
at here. When normal zone is allocated first, and DMA zone is required by
alloc_bootmem_low() later and first page is free yet, fallback is set as 0.+ if (fallback) {
+ start = ALIGN(fallback, step);
+ fallback = 0;
+ goto find_block;
+ }
+As a result, this retry code is skipped, and alloc_bootmem_low() fails.
So, when I change here from fallback to a retry_flag, my box can boot up.Thanks.
--
Yasunori Goto--
Hi,
Ah, you got there too :-)
Here is the original Email I wrote but waited to send out until after I
slept:---
Hi Yasunori,
here is hopefully the fix to your bootup problem. Let me explain:
This is the last but one bootmem allocation on your box:
Jun 5 11:50:43 localhost kernel: bootmem::alloc_bootmem_core nid=0 size=40000 [16 pages] align=80 goal=100000000 limit=0
Jun 5 11:50:43 localhost kernel: bootmem::__reserve nid=0 start=1020588 end=1020598 flags=1(->last_success is at 1020588 now)
And this is the last one:
Jun 5 11:50:43 localhost kernel: bootmem::alloc_bootmem_core nid=0 size=8000 [2 pages] align=80 goal=0 limit=ffffffff
Goal is zero. So sidx is set to zero as well. last_success is bigger
than sidx, so fallback gets the value of sidx and sidx is updated to
last_succes. But now sidx (1020588) is bigger than the limit (3ffff)
and we fail the first search iteration because of that. We should
fall back to the original sidx stored in fallback now but take a look
at what value fallback must have for this branch to be taken...I am pretty sure this is the bug you encountered. I spotted it in the
code but your logfile proves my theory.Hannes
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -444,7 +444,11 @@ static void * __init alloc_bootmem_core(
midx = max - PFN_DOWN(bdata->node_boot_start);if (bdata->last_success > sidx) {
- fallback = sidx;
+ /*
+ * Handle the valid case of sidx being zero and still
+ * catch the fallback below.
+ */
+ fallback = sidx + 1;
sidx = ALIGN(bdata->last_success, step);
}@@ -493,7 +497,7 @@ find_block:
}if (fallback) {
- sidx = ALIGN(fallback, step);
+ sidx = ALIGN(fallback - 1, step);
fallback = 0;
goto find_block;
}
--
Haha, I didn't notice your previous mail when I posted it. :-).
Ok. I'll test newer patch set.
Bye
--
Yasunori Goto--
| Greg Kroah-Hartman | [PATCH 004/196] Chinese: add translation of SubmittingPatches |
| Alan Stern | Re: 2.6.22-rc2-mm1 |
| Satyam Sharma | Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures |
| William Lee Irwin III | Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS] |
git: | |
| Dale Farnsworth | Re: [PATCH 03/39] mv643xx_eth: shorten reg names |
| Jarek Poplawski | Re: HTB accuracy for high speed |
| Gerrit Renker | [PATCH 15/37] dccp: Set per-connection CCIDs via socket options |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
