Boot initialisation has always been a bit of a mess with a number
of ugly points. While significant amounts of the initialisation
is architecture-independent, it trusts of the data received from the
architecture layer. This was a mistake in retrospect as it has resulted in
a number of difficult-to-diagnose bugs.This patchset is an RFC to add some validation and tracing to memory
initialisation. It also introduces a few basic defencive measures and
depending on a boot parameter, will perform additional tests for errors
"that should never occur". I think this would have reduced debugging time
for some boot-related problems. The last part of the patchset is a similar
fix for the patch "[patch] mm: sparsemem memory_present() memory corruption"
that corrects a few more areas where similar errors were made.I'm not looking to merge this as-is obviously but are there opinions on
whether this is a good idea in principal? Should it be done differently or
not at all?--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
very nice stuff!
Acked-by: Ingo Molnar <mingo@elte.hu>
or rather:
Very-Strongly-Acked-by: Ingo Molnar <mingo@elte.hu>
Ingo
--
There are a number of different views to how much memory is currently
active. There is the arch-independent zone-sizing view, the bootmem allocator
and SPARSEMEMs view. Architectures register this information at different
times and is not necessarily in sync particularly with view to some SPARSEMEM
limitations.This patch introduces mminit_validate_physlimits() which is able to validate
and correct PFN ranges with respect to SPARSEMEM limitations. Ordinarily
they will be fixed silently but if mminit_debug_level is MMINIT_VERIFY or
higher, a message will be printed to dmesg.This fixes the same problem as fixed by "[patch] mm: sparsemem
memory_present() memory corruption fix" in a slightly different way. This
patch would obviously be rebased on top of that fix.Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---mm/bootmem.c | 1 +
mm/internal.h | 9 +++++++++
mm/page_alloc.c | 2 ++
mm/sparse.c | 24 ++++++++++++++++++++++++
4 files changed, 36 insertions(+)diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-rc9-0030_display_zonelist/mm/bootmem.c linux-2.6.25-rc9-0040_defensive_pfn_checks/mm/bootmem.c
--- linux-2.6.25-rc9-0030_display_zonelist/mm/bootmem.c 2008-04-11 21:32:29.000000000 +0100
+++ linux-2.6.25-rc9-0040_defensive_pfn_checks/mm/bootmem.c 2008-04-16 14:45:08.000000000 +0100
@@ -91,6 +91,7 @@ static unsigned long __init init_bootmem
bootmem_data_t *bdata = pgdat->bdata;
unsigned long mapsize;+ mminit_validate_physlimits(&start, &end);
bdata->node_bootmem_map = phys_to_virt(PFN_PHYS(mapstart));
bdata->node_boot_start = PFN_PHYS(start);
bdata->node_low_pfn = end;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-rc9-0030_display_zonelist/mm/internal.h linux-2.6.25-rc9-0040_defensive_pfn_checks/mm/internal.h
--- linux-2.6.25-rc9-0030_display_zonelist/mm/internal.h 2008-04-16 14:44:46.000000000 +0100
+++ linux-2.6.25-rc9-0040_defensive_pfn_checks/mm/internal.h 2008-04-16 14:45:08.000000...
small request: please emit a WARN_ON_ONCE() as well, so that
ditto - all errors should be fixed up and we should try to continue as
far as possible, but emitting a WARN_ON_ONCE() will be very useful in
making sure people notice the warning.Ingo
--
This patch prints out the zonelists during boot for manual verification by
the user. This is useful for checking if the zonelists were somehow corrupt
during initialisation.Note that this patch will not work in -mm due to differences in how zonelists
are used. This is specific to how 2.6.25-rc9 works but a similar version for -mm
would be straight-forward enough.Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---mm/internal.h | 1 +
mm/mm_init.c | 40 ++++++++++++++++++++++++++++++++++++++++
mm/page_alloc.c | 1 +
3 files changed, 42 insertions(+)diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-rc9-0020_memmap_init_debug/mm/internal.h linux-2.6.25-rc9-0030_display_zonelist/mm/internal.h
--- linux-2.6.25-rc9-0020_memmap_init_debug/mm/internal.h 2008-04-16 14:44:32.000000000 +0100
+++ linux-2.6.25-rc9-0030_display_zonelist/mm/internal.h 2008-04-16 14:44:46.000000000 +0100
@@ -67,6 +67,7 @@ enum mminit_levels {
MMINIT_TRACE
};+extern void mminit_verify_zonelist(void);
extern void mminit_verify_pageflags(void);
extern void mminit_verify_page_links(struct page *page, enum zone_type zone,
unsigned long nid, unsigned long pfn);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-rc9-0020_memmap_init_debug/mm/mm_init.c linux-2.6.25-rc9-0030_display_zonelist/mm/mm_init.c
--- linux-2.6.25-rc9-0020_memmap_init_debug/mm/mm_init.c 2008-04-16 14:44:32.000000000 +0100
+++ linux-2.6.25-rc9-0030_display_zonelist/mm/mm_init.c 2008-04-16 14:44:46.000000000 +0100
@@ -10,6 +10,46 @@ int __initdata mminit_debug_level;#define MMINIT_BUF_LEN 256
+/* Note that the verification of correctness is required from the user */
+void mminit_verify_zonelist(void)
+{
+ int nid;
+
+ if (mminit_debug_level < MMINIT_VERIFY)
+ return;
+
+ for_each_online_node(nid) {
+ pg_data_t *pgdat = NODE_DATA(nid);
+ struct zone *zone;
+ struct zone **z;
+ int zoneid;
+
+ for (zoneid = 0; zoneid < MAX_ZONELISTS; zoneid++) {
+ zone = &...
This patch prints out information on how the page flags are being used and
verifies they are correct if mminit_debug_level is MMINIT_VERIFY or higher.
When the page flags are updated with section, node and zone information, an
additional check is made to ensure the values can be retrieved correctly. The
final check made with respect to pages is that pfn_to_page() and page_to_pfn()
are returning sensible values.Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---mm/internal.h | 3 ++
mm/mm_init.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++-
mm/page_alloc.c | 6 ++++
3 files changed, 73 insertions(+), 1 deletion(-)diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-rc9-0010_mminit_debug_framework/mm/internal.h linux-2.6.25-rc9-0020_memmap_init_debug/mm/internal.h
--- linux-2.6.25-rc9-0010_mminit_debug_framework/mm/internal.h 2008-04-16 14:44:19.000000000 +0100
+++ linux-2.6.25-rc9-0020_memmap_init_debug/mm/internal.h 2008-04-16 14:44:32.000000000 +0100
@@ -67,6 +67,9 @@ enum mminit_levels {
MMINIT_TRACE
};+extern void mminit_verify_pageflags(void);
+extern void mminit_verify_page_links(struct page *page, enum zone_type zone,
+ unsigned long nid, unsigned long pfn);
extern void mminit_debug_printk(unsigned int level, const char *prefix,
const char *fmt, ...);
#endif
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-rc9-0010_mminit_debug_framework/mm/mm_init.c linux-2.6.25-rc9-0020_memmap_init_debug/mm/mm_init.c
--- linux-2.6.25-rc9-0010_mminit_debug_framework/mm/mm_init.c 2008-04-16 14:44:19.000000000 +0100
+++ linux-2.6.25-rc9-0020_memmap_init_debug/mm/mm_init.c 2008-04-16 14:44:32.000000000 +0100
@@ -4,11 +4,74 @@
*/
#include <linux/kernel.h>
#include <linux/init.h>
+#include "internal.h"int __initdata mminit_debug_level;
#define MMINIT_BUF_LEN 256
+void __init mminit_verify_pageflags(void)
+{
+ unsigned long shift = 0;
+ if (mminit_debug_level < MMINIT_VERIFY)
+ return;...
FLAGS_RESERVED no longer exists in mm. Its dynamically calculated.
It may be useful to print out NR_PAGEFLAGS instead and show the area in
the middle of page flags that is left unused and that may be used by
arches such as sparc64.--
That's a good point. I'll do that on a version I rebase to -mm. V2 will
still be based on mainline.Thanks.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
This patch creates a new file mm/mm_init.c which memory initialisation should
be moved to over time to avoid further polluting page_alloc.c. This patch
introduces a simple mminit_debug_printk() function and an (undocumented)
mminit_debug_level commmand-line parameter for setting the level of tracing
and verification that should be done.Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---mm/Makefile | 2 +-
mm/internal.h | 9 +++++++++
mm/mm_init.c | 40 ++++++++++++++++++++++++++++++++++++++++
mm/page_alloc.c | 16 ++++++++++------
4 files changed, 60 insertions(+), 7 deletions(-)diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-rc9-clean/mm/internal.h linux-2.6.25-rc9-0010_mminit_debug_framework/mm/internal.h
--- linux-2.6.25-rc9-clean/mm/internal.h 2008-04-11 21:32:29.000000000 +0100
+++ linux-2.6.25-rc9-0010_mminit_debug_framework/mm/internal.h 2008-04-16 14:44:19.000000000 +0100
@@ -60,4 +60,13 @@ static inline unsigned long page_order(s
#define __paginginit __init
#endif+/* Memory initilisation debug and verification */
+enum mminit_levels {
+ MMINIT_NORMAL,
+ MMINIT_VERIFY,
+ MMINIT_TRACE
+};
+
+extern void mminit_debug_printk(unsigned int level, const char *prefix,
+ const char *fmt, ...);
#endif
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-rc9-clean/mm/Makefile linux-2.6.25-rc9-0010_mminit_debug_framework/mm/Makefile
--- linux-2.6.25-rc9-clean/mm/Makefile 2008-04-11 21:32:29.000000000 +0100
+++ linux-2.6.25-rc9-0010_mminit_debug_framework/mm/Makefile 2008-04-16 14:44:19.000000000 +0100
@@ -11,7 +11,7 @@ obj-y := bootmem.o filemap.o mempool.o
page_alloc.o page-writeback.o pdflush.o \
readahead.o swap.o truncate.o vmscan.o \
prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
- page_isolation.o $(mmu-y)
+ page_isolation.o mm_init.o $(mmu-y)obj-$(CONFIG_PROC_PAGE_MONITOR) += pagewalk.o
obj-$(CONFIG_BOUNCE) += bounce.o
diff -rup -X /usr/src/patchset-0.6/bin//dontd...
another small suggestion: could you please also add a Kconfig method of
enabling it, dependent on KERNEL_DEBUG, default-off (for now). The best
would be not a numeric switch but something that gets randomized by
"make randconfig". I.e. an on/off switch kind of things.Ingo
--
Makes sense. I've this and your other suggestions incorporated. It'll be
part of V2.--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
| Greg KH | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Greg KH | [GIT PATCH] driver core patches against 2.6.24 |
| Andrew Morton | Re: 2.6.23-rc6-mm1 |
| Luciano Rocha | usb hdd problems with 2.6.27.2 |
git: | |
| Gerrit Renker | [PATCH 15/37] dccp: Set per-connection CCIDs via socket options |
| Andrew Morton | Re: [BUG] New Kernel Bugs |
| David Miller | [GIT]: Networking |
| Jarek Poplawski | [PATCH take 2] pkt_sched: Protect gen estimators under est_lock. |
