Linux: Using 1GB of RAM Without HighMem

Submitted by Jeremy on January 13, 2006 - 6:47pm.
Linux news

Jens Axboe began an lkml thread saying, "it does annoy me that any 1G i386 machine will end up with 1/8th of the memory as highmem." He then provided a patch that adds a kernel configuration option to control how the virtual address space is divided between kernel space and user space [story]. The new option's help text explains that accessing high memory is a little more costly than accessing low memory, as high memory needs to be mapped into the kernel first. The patch had an informative and interesting evolution, receiving much feedback, including suggestions from both Linus Torvalds and Ingo Molnar [interview]. Toward the end of the thread, Jens explained in more detail what the kernel option provides:

"Basically the option boils down to how much virtual address space you want to assign to the kernel and user space. The kernel can always access all of memory, but in some cases part of that memory will be available as high memory that needs to be mapped in first (see references to kmap() and kmap_atomic() in the kernel). So whether changing the mapping or using highmem is the best option for you, depends entirely on what you run on that machine. If you require a huge user address space, then you don't want to change away from the 3/1 user/kernel default setting. However, if you don't need the full 3G of adress space to user apps, then you are better off increasing the kernel address space range to get rid of the high memory mapping."

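To put numbers on the "1/8th" figure, here is a minimal C sketch (not kernel code) of how a page offset divides physical RAM into low and high memory. It assumes the kernel holds back 128 MiB of its address range for vmalloc and similar mappings, which is what makes 896MB the direct-mapped limit under the default split. The names split_ram, mem_layout and RESERVE_MIB are made up for this illustration.

```c
#include <stdint.h>

/* Assumption: 128 MiB of the kernel's address range is held back for
 * vmalloc and other dynamic mappings, as on a default i386 build. */
#define RESERVE_MIB 128u

struct mem_layout {
    uint32_t lowmem_mib;  /* RAM the kernel maps permanently */
    uint32_t highmem_mib; /* RAM reachable only through temporary mappings */
};

/* Hypothetical helper: split ram_mib of physical RAM into low and high
 * memory for a given page offset (the first kernel virtual address).
 * Assumes the kernel's range is larger than the reserve. */
static struct mem_layout split_ram(uint32_t page_offset, uint32_t ram_mib)
{
    /* The kernel's range runs from page_offset to the 4 GiB boundary. */
    uint32_t kernel_mib = (uint32_t)((0x100000000ULL - page_offset) >> 20);
    uint32_t direct_mib = kernel_mib - RESERVE_MIB;
    struct mem_layout layout;

    layout.lowmem_mib = ram_mib < direct_mib ? ram_mib : direct_mib;
    layout.highmem_mib = ram_mib - layout.lowmem_mib;
    return layout;
}
```

Under these assumptions, split_ram(0xC0000000, 1024) gives 896 MiB of low memory and 128 MiB of high memory, one eighth of the machine's RAM, while split_ram(0xB0000000, 1024) leaves no high memory at all.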

From: Jens Axboe [email blocked]
To:  linux-kernel
Subject: 2G memory split
Date:	Tue, 10 Jan 2006 13:58:53 +0100

Hi,

It does annoy me that any 1G i386 machine will end up with 1/8th of the
memory as highmem. A patch like this one has been used in various places
since the early 2.4 days at least, is there a reason why it isn't merged
yet? Note I just hacked this one up, but similar patches abound I'm
sure. Bugs are mine.

Signed-off-by: Jens Axboe [email blocked]

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index d849c68..0b2457b 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -444,6 +464,24 @@ config HIGHMEM64G
 
 endchoice
 
+choice
+	depends on NOHIGHMEM
+	prompt "Memory split"
+	default DEFAULT_3G
+	help
+	  Select the wanted split between kernel and user memory. On a 1G
+	  machine, the 3G/1G default split will result in 128MiB of high
+	  memory. Selecting a 2G/2G split will make all of memory available
+	  as low memory. Note that this will make your kernel incompatible
+	  with binary only kernel modules.
+
+	config DEFAULT_3G
+		bool "3G/1G user/kernel split"
+	config DEFAULT_2G
+		bool "2G/2G user/kernel split"
+
+endchoice
+
 config HIGHMEM
 	bool
 	depends on HIGHMEM64G || HIGHMEM4G
diff --git a/include/asm-i386/page.h b/include/asm-i386/page.h
index 73296d9..be5f6b6 100644
--- a/include/asm-i386/page.h
+++ b/include/asm-i386/page.h
@@ -110,10 +110,22 @@ extern int page_is_ram(unsigned long pag
 #endif /* __ASSEMBLY__ */
 
 #ifdef __ASSEMBLY__
+#if defined(CONFIG_DEFAULT_3G)
 #define __PAGE_OFFSET		(0xC0000000)
+#elif defined(CONFIG_DEFAULT_2G)
+#define __PAGE_OFFSET		(0x80000000)
+#else
+#error "Bad memory split"
+#endif
 #define __PHYSICAL_START	CONFIG_PHYSICAL_START
 #else
+#if defined(CONFIG_DEFAULT_3G)
 #define __PAGE_OFFSET		(0xC0000000UL)
+#elif defined(CONFIG_DEFAULT_2G)
+#define __PAGE_OFFSET		(0x80000000UL)
+#else
+#error "Bad memory split"
+#endif
 #define __PHYSICAL_START	((unsigned long)CONFIG_PHYSICAL_START)
 #endif
 #define __KERNEL_START		(__PAGE_OFFSET + __PHYSICAL_START)

-- 
Jens Axboe


From: Ingo Molnar [email blocked]
Subject: Re: 2G memory split
Date: Tue, 10 Jan 2006 14:29:57 +0100

* Jens Axboe [email blocked] wrote:

> Hi,
>
> It does annoy me that any 1G i386 machine will end up with 1/8th of
> the memory as highmem. A patch like this one has been used in various
> places since the early 2.4 days at least, is there a reason why it
> isn't merged yet? Note I just hacked this one up, but similar patches
> abound I'm sure. Bugs are mine.

yes, i made it totally configurable in 2.4 days: 1:3, 2/2 and 3:1 splits
were possible. It was a larger patch to enable all this across x86, but
the Kconfig portion was removed a bit later because people _frequently_
misconfigured their kernels and then complained about the results.

so for now the trivial solution is to change the "C" to "8" in the
following line in include/asm-i386/page.h:

> #define __PAGE_OFFSET		(0xC0000000)

instead of editing your .config :-)

Maybe we could try the Kconfig solution again, but it'll need alot
better documentation, dependency on KERNEL_DEBUG and some heavy warnings
all around.

	Ingo

From: Jens Axboe [email blocked]
Subject: Re: 2G memory split
Date: Tue, 10 Jan 2006 14:37:29 +0100

On Tue, Jan 10 2006, Ingo Molnar wrote:
>
> * Jens Axboe [email blocked] wrote:
>
> > Hi,
> >
> > It does annoy me that any 1G i386 machine will end up with 1/8th of
> > the memory as highmem. A patch like this one has been used in various
> > places since the early 2.4 days at least, is there a reason why it
> > isn't merged yet? Note I just hacked this one up, but similar patches
> > abound I'm sure. Bugs are mine.
>
> yes, i made it totally configurable in 2.4 days: 1:3, 2/2 and 3:1 splits
> were possible. It was a larger patch to enable all this across x86, but
> the Kconfig portion was removed a bit later because people _frequently_
> misconfigured their kernels and then complained about the results.

How is this different than all other sorts of misconfigurations? As far
as I can tell, the biggest "problem" for some is if they depend on some
binary module that will of course break with a different page offset.

For simplicity, I didn't add more than the 2/2 split, where we could add
even a 3/1 kernel/user or a 0.5/3.5 (I think sles8 had this).

> so for now the trivial solution is to change the "C" to "8" in the
> following line in include/asm-i386/page.h:
>
> > #define __PAGE_OFFSET		(0xC0000000)
>
> instead of editing your .config :-)

:-) That is what I have been doing, but that requires me to carry this
patch along with me all the time. So it annoys me! I would have posted a
simple patch moving it to 0xB0000000 which would solve the problem for
me as well, but I didn't because I'm sure people would be screaming at
me...

> Maybe we could try the Kconfig solution again, but it'll need alot
> better documentation, dependency on KERNEL_DEBUG and some heavy warnings
> all around.

The help text could definitely be improved, it was a 30 second hackup.
Why would you want to make it depend on DEBUG?

-- 
Jens Axboe

From: Jens Axboe [email blocked]
Subject: Re: 2G memory split
Date: Tue, 10 Jan 2006 15:39:31 +0100

On Tue, Jan 10 2006, Byron Stanoszek wrote:
> On Tue, 10 Jan 2006, Jens Axboe wrote:
>
> >>yes, i made it totally configurable in 2.4 days: 1:3, 2/2 and 3:1 splits
> >>were possible. It was a larger patch to enable all this across x86, but
> >>the Kconfig portion was removed a bit later because people _frequently_
> >>misconfigured their kernels and then complained about the results.
> >
> >How is this different than all other sorts of misconfigurations? As far
> >as I can tell, the biggest "problem" for some is if they depend on some
> >binary module that will of course break with a different page offset.
> >
> >For simplicity, I didn't add more than the 2/2 split, where we could add
> >even a 3/1 kernel/user or a 0.5/3.5 (I think sles8 had this).
>
> I prefer setting __PAGE_OFFSET to (0x78000000) on machines with 2GB of RAM.
> This seems to let the kernel use the full 2GB of memory, rather than just
> 1920-1984 MB (at least back in 2.4 days).

A newer version, trying to cater to the various comments in here.
Changes:

- Add 1G_OPT split, meant for 1GiB machines. Uses 0xB0000000
- Add 1G/3G split
- Move the 2G/2G a little, so the full 2GiB of ram can be mapped.
- Improve help text (I hope :)
- Make option depend on EXPERIMENTAL.
- Make the page.h a lot more readable.

---

Add option for configuring the page offset, to better optimize the
kernel for higher memory machines. Enables users to get rid of high
memory for eg a 1GiB machine.

Signed-off-by: Jens Axboe [email blocked]

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index d849c68..fcad8f7 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -444,6 +464,32 @@ config HIGHMEM64G
 
 endchoice
 
+choice
+	depends on NOHIGHMEM && EXPERIMENTAL
+	prompt "Memory split"
+	default DEFAULT_3G
+	help
+	  Select the wanted split between kernel and user memory.
+
+	  If the address range available to the kernel is less than the
+	  physical memory installed, the remaining memory will be available
+	  as "high memory". Accessing high memory is a little more costly
+	  than low memory, as it needs to be mapped into the kernel first.
+
+	  Note that selecting anything but the default 3G/1G split will make
+	  your kernel incompatible with binary only modules.
+
+	config DEFAULT_3G
+		bool "3G/1G user/kernel split"
+	config DEFAULT_3G_OPT
+		bool "3G/1G user/kernel split (for full 1G low memory)"
+	config DEFAULT_2G
+		bool "2G/2G user/kernel split"
+	config DEFAULT_1G
+		bool "1G/3G user/kernel split"
+
+endchoice
+
 config HIGHMEM
 	bool
 	depends on HIGHMEM64G || HIGHMEM4G
diff --git a/include/asm-i386/page.h b/include/asm-i386/page.h
index 73296d9..7da50a1 100644
--- a/include/asm-i386/page.h
+++ b/include/asm-i386/page.h
@@ -109,11 +109,23 @@ extern int page_is_ram(unsigned long pag
 
 #endif /* __ASSEMBLY__ */
 
+#if defined(CONFIG_DEFAULT_3G)
+#define __PAGE_OFFSET_RAW	(0xC0000000)
+#elif defined(CONFIG_DEFAULT_3G_OPT)
+#define __PAGE_OFFSET_RAW	(0xB0000000)
+#elif defined(CONFIG_DEFAULT_2G)
+#define __PAGE_OFFSET_RAW	(0x78000000)
+#elif defined(CONFIG_DEFAULT_1G)
+#define __PAGE_OFFSET_RAW	(0x40000000)
+#else
+#error "Bad user/kernel offset"
+#endif
+
 #ifdef __ASSEMBLY__
-#define __PAGE_OFFSET		(0xC0000000)
+#define __PAGE_OFFSET		__PAGE_OFFSET_RAW
 #define __PHYSICAL_START	CONFIG_PHYSICAL_START
 #else
-#define __PAGE_OFFSET		(0xC0000000UL)
+#define __PAGE_OFFSET		((unsigned long)__PAGE_OFFSET_RAW)
 #define __PHYSICAL_START	((unsigned long)CONFIG_PHYSICAL_START)
 #endif
 #define __KERNEL_START		(__PAGE_OFFSET + __PHYSICAL_START)

-- 
Jens Axboe

From: Ingo Molnar [email blocked]
Subject: Re: 2G memory split
Date: Tue, 10 Jan 2006 15:44:12 +0100

* Jens Axboe [email blocked] wrote:

> +	  Select the wanted split between kernel and user memory.
> +
> +	  If the address range available to the kernel is less than the
> +	  physical memory installed, the remaining memory will be available
> +	  as "high memory". Accessing high memory is a little more costly
> +	  than low memory, as it needs to be mapped into the kernel first.

make it _ALOT_ more clear that mere mortals should not touch this
option! Also, you do not mention the userspace-VM fragmentation issues.
Plus, if a user uses a 2G/2G split with more than 2G of RAM, the kernel
should print a warning that it's running with a non-default split. Do
the same if the user uses a non-default split with less than 960MB of
RAM.

> +
> +	  Note that selecting anything but the default 3G/1G split will make
> +	  your kernel incompatible with binary only modules.

it's not 'will' but 'may', and even then, tons of .config things can
break bin-only modules, so just skip this paragraph.

looks good to me otherwise, with the text fixes it's:

Acked-by: Ingo Molnar [email blocked]

	Ingo

From: Jens Axboe [email blocked]
Subject: [PATCH] Address space split configuration
Date: Tue, 10 Jan 2006 16:03:31 +0100

On Tue, Jan 10 2006, Ingo Molnar wrote:
>
> * Jens Axboe [email blocked] wrote:
>
> > +	  Select the wanted split between kernel and user memory.
> > +
> > +	  If the address range available to the kernel is less than the
> > +	  physical memory installed, the remaining memory will be available
> > +	  as "high memory". Accessing high memory is a little more costly
> > +	  than low memory, as it needs to be mapped into the kernel first.
>
> make it _ALOT_ more clear that mere mortals should not touch this
> option! Also, you do not mention the userspace-VM fragmentation issues.
> Plus, if a user uses a 2G/2G split with more than 2G of RAM, the kernel
> should print a warning that it's running with a non-default split. Do
> the same if the user uses a non-default split with less than 960MB of
> RAM.

I added the < 960MiB warning, but not for the 2G/2G as the option
depends on NOHIGHMEM right now. I also changed the help text again, hope
you are happy with it now.

> > +
> > +	  Note that selecting anything but the default 3G/1G split will make
> > +	  your kernel incompatible with binary only modules.
>
> it's not 'will' but 'may', and even then, tons of .config things can
> break bin-only modules, so just skip this paragraph.

Killed.

> looks good to me otherwise, with the text fixes it's:
>
> Acked-by: Ingo Molnar [email blocked]

Thanks! Updated patch below.

---

Add option for configuring the page offset, to better optimize the
kernel for higher memory machines. Enables users to get rid of high
memory for eg a 1GiB machine.

Signed-off-by: Jens Axboe [email blocked]
Acked-by: Ingo Molnar [email blocked]

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index d849c68..20d1423 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -444,6 +464,35 @@ config HIGHMEM64G
 
 endchoice
 
+choice
+	depends on NOHIGHMEM && EXPERIMENTAL
+	prompt "Memory split"
+	default DEFAULT_3G
+	help
+	  Select the wanted split between kernel and user memory.
+
+	  If the address range available to the kernel is less than the
+	  physical memory installed, the remaining memory will be available
+	  as "high memory". Accessing high memory is a little more costly
+	  than low memory, as it needs to be mapped into the kernel first.
+	  Note that increasing the kernel address space limits the range
+	  available to user programs, making the address space there
+	  tighter.
+
+	  If you are not absolutely sure what you are doing, leave this
+	  option alone!
+
+	config DEFAULT_3G
+		bool "3G/1G user/kernel split"
+	config DEFAULT_3G_OPT
+		bool "3G/1G user/kernel split (for full 1G low memory)"
+	config DEFAULT_2G
+		bool "2G/2G user/kernel split"
+	config DEFAULT_1G
+		bool "1G/3G user/kernel split"
+
+endchoice
+
 config HIGHMEM
 	bool
 	depends on HIGHMEM64G || HIGHMEM4G
diff --git a/arch/i386/mm/init.c b/arch/i386/mm/init.c
index 7df494b..67f1da0 100644
--- a/arch/i386/mm/init.c
+++ b/arch/i386/mm/init.c
@@ -597,6 +597,12 @@ void __init mem_init(void)
 	high_memory = (void *) __va(max_low_pfn * PAGE_SIZE - 1) + 1;
 #endif
 
+#if !defined(CONFIG_DEFAULT_3G)
+	/* if the user has less than 960MB of RAM, he should use the default */
+	if (max_low_pfn < (960 * 1024 * 1024 / PAGE_SIZE))
+		printk(KERN_INFO "Memory: less than 960MiB of RAM, you should use the default memory split setting\n");
+#endif
+
 	/* this will put all low memory onto the freelists */
 	totalram_pages += free_all_bootmem();
 
diff --git a/include/asm-i386/page.h b/include/asm-i386/page.h
index 73296d9..7da50a1 100644
--- a/include/asm-i386/page.h
+++ b/include/asm-i386/page.h
@@ -109,11 +109,23 @@ extern int page_is_ram(unsigned long pag
 
 #endif /* __ASSEMBLY__ */
 
+#if defined(CONFIG_DEFAULT_3G)
+#define __PAGE_OFFSET_RAW	(0xC0000000)
+#elif defined(CONFIG_DEFAULT_3G_OPT)
+#define __PAGE_OFFSET_RAW	(0xB0000000)
+#elif defined(CONFIG_DEFAULT_2G)
+#define __PAGE_OFFSET_RAW	(0x78000000)
+#elif defined(CONFIG_DEFAULT_1G)
+#define __PAGE_OFFSET_RAW	(0x40000000)
+#else
+#error "Bad user/kernel offset"
+#endif
+
 #ifdef __ASSEMBLY__
-#define __PAGE_OFFSET		(0xC0000000)
+#define __PAGE_OFFSET		__PAGE_OFFSET_RAW
 #define __PHYSICAL_START	CONFIG_PHYSICAL_START
 #else
-#define __PAGE_OFFSET		(0xC0000000UL)
+#define __PAGE_OFFSET		((unsigned long)__PAGE_OFFSET_RAW)
 #define __PHYSICAL_START	((unsigned long)CONFIG_PHYSICAL_START)
 #endif
 #define __KERNEL_START		(__PAGE_OFFSET + __PHYSICAL_START)

-- 
Jens Axboe

From: Linus Torvalds [email blocked]
Subject: Re: 2G memory split
Date: Tue, 10 Jan 2006 08:14:47 -0800 (PST)

On Tue, 10 Jan 2006, Jens Axboe wrote:
>
> A newer version, trying to cater to the various comments in here.
> Changes:

Can we do one final cleanup? Do all the magic in _one_ place, namely the
x86 Kconfig file.

Also, I don't think the NOHIGHMEM dependency is necessarily correct. A
2G/2G split can be advantageous with a 16GB setup (you'll have more room
for dentries etc), but you obviously want to have HIGHMEM for that..

Do it something like this:

	choice
		depends on EXPERIMENTAL
		prompt "Memory split"
		default DEFAULT_3G
		help
		  Select the wanted split between kernel and user memory.

		  If the address range available to the kernel is less than the
		  physical memory installed, the remaining memory will be available
		  as "high memory". Accessing high memory is a little more costly
		  than low memory, as it needs to be mapped into the kernel first.

		  Note that selecting anything but the default 3G/1G split will make
		  your kernel incompatible with binary only modules.

		config DEFAULT_3G
			bool "3G/1G user/kernel split"
		config DEFAULT_3G_OPT
			bool "3G/1G user/kernel split (for full 1G low memory)"
		config DEFAULT_2G
			bool "2G/2G user/kernel split"
		config DEFAULT_1G
			bool "1G/3G user/kernel split"
	endchoice

	config PAGE_OFFSET
		hex
		default 0xC0000000
		default 0xB0000000 if DEFAULT_3G_OPT
		default 0x78000000 if DEFAULT_2G
		default 0x40000000 if DEFAULT_1G

and then asm-i386/page.h can just do

	#define __PAGE_OFFSET ((unsigned long)CONFIG_PAGE_OFFSET)

and you're done. If you ever want to change the offsets, you're only
changing the Kconfig file, and as you can tell, the syntax is actually
much _nicer_ that using the C preprocessor, since these kinds of choices
is exactly what the Kconfig language is all about.

Please?

		Linus

From: Linus Torvalds [email blocked]
Subject: Re: 2G memory split
Date: Tue, 10 Jan 2006 09:28:53 -0800 (PST)

On Tue, 10 Jan 2006, Mark Lord wrote:
>
> So, the patch would now look like this:

Yes, except I think we need to make the "depends on" include !X86_PAE:

> +choice
> +	depends on EXPERIMENTAL

	depends on EXPERIMENTAL && !X86_PAE

since PAE depends on the 3G/1G split (we could make it work for a pure
2G/2G split, but that's a separate issue, and then we'd need to change the
CONFIG_PAGE_OFFSET defaults to be something like

	default 0x80000000 if VMSPLIT_2G && X86_PAE

(but that's definitely not appropriate for now - that's a separate issue,
after somebody has verified that PAE and 2G:2G works)

Also, I think the arch/i386/mm/init.c snippet should just be removed. If
we make the split configurable, I don't see that we should warn about a
configuration where you have less memory than the point where the split
makes sense. A distribution (either something like Fedora _or_ just a
internal company "standard image") migth decide to use 2G:2G, but not all
machines might have lots of memory. Warning about it would be silly.

Anyway, this should go into -mm, and I'd rather have it stay there for a
while. I've got tons of stuff for 2.6.16 already, I'd prefer to not see
this kind of thing too..

		Linus

From: Mark Lord [email blocked]
Subject: [PATCH ] VMSPLIT config options (with default config fixed)
Date: Tue, 10 Jan 2006 14:16:19 -0500

Okay, fixed the ordering of the "default" lines
so that the Kconfig actually works correctly.

Best for Andrew to soak this one in -mm.

Signed-off-by: Mark Lord [email blocked]

diff -u --recursive --new-file --exclude='.*' linux-2.6.15/arch/i386/Kconfig linux/arch/i386/Kconfig
--- linux-2.6.15/arch/i386/Kconfig	2006-01-02 22:21:10.000000000 -0500
+++ linux/arch/i386/Kconfig	2006-01-10 12:02:40.000000000 -0500
@@ -448,6 +448,43 @@
 
 endchoice
 
+choice
+	depends on EXPERIMENTAL && !X86_PAE
+	prompt "Memory split"
+	default VMSPLIT_3G
+	help
+	  Select the desired split between kernel and user memory.
+
+	  If the address range available to the kernel is less than the
+	  physical memory installed, the remaining memory will be available
+	  as "high memory". Accessing high memory is a little more costly
+	  than low memory, as it needs to be mapped into the kernel first.
+	  Note that increasing the kernel address space limits the range
+	  available to user programs, making the address space there
+	  tighter.  Selecting anything other than the default 3G/1G split
+	  will also likely make your kernel incompatible with binary-only
+	  kernel modules.
+
+	  If you are not absolutely sure what you are doing, leave this
+	  option alone!
+
+	config VMSPLIT_3G
+		bool "3G/1G user/kernel split"
+	config VMSPLIT_3G_OPT
+		bool "3G/1G user/kernel split (for full 1G low memory)"
+	config VMSPLIT_2G
+		bool "2G/2G user/kernel split"
+	config VMSPLIT_1G
+		bool "1G/3G user/kernel split"
+endchoice
+
+config PAGE_OFFSET
+	hex
+	default 0xB0000000 if VMSPLIT_3G_OPT
+	default 0x78000000 if VMSPLIT_2G
+	default 0x40000000 if VMSPLIT_1G
+	default 0xC0000000
+
 config HIGHMEM
 	bool
 	depends on HIGHMEM64G || HIGHMEM4G
diff -u --recursive --new-file --exclude='.*' linux-2.6.15/include/asm-i386/page.h linux/include/asm-i386/page.h
--- linux-2.6.15/include/asm-i386/page.h	2006-01-02 22:21:10.000000000 -0500
+++ linux/include/asm-i386/page.h	2006-01-10 12:04:56.000000000 -0500
@@ -110,10 +110,10 @@
 #endif /* __ASSEMBLY__ */
 
 #ifdef __ASSEMBLY__
-#define __PAGE_OFFSET		(0xC0000000)
+#define __PAGE_OFFSET		CONFIG_PAGE_OFFSET
 #define __PHYSICAL_START	CONFIG_PHYSICAL_START
 #else
-#define __PAGE_OFFSET		(0xC0000000UL)
+#define __PAGE_OFFSET		((unsigned long)CONFIG_PAGE_OFFSET)
 #define __PHYSICAL_START	((unsigned long)CONFIG_PHYSICAL_START)
 #endif
 #define __KERNEL_START		(__PAGE_OFFSET + __PHYSICAL_START)

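Why the order of the "default" lines matters: as the Kconfig language documentation describes it, when several defaults are visible only the first one takes effect, so an unconditional default listed first shadows the conditional ones below it. The following C sketch models that first-match rule. It is a toy model, not the Kconfig implementation, and kdefault, pick_default and the two wrapper functions are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One "default VALUE if COND" line; cond == true models a line with
 * no "if" clause. */
struct kdefault {
    uint32_t value;
    bool cond;
};

/* Toy model of the rule: the first default whose condition holds wins. */
static uint32_t pick_default(const struct kdefault *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (d[i].cond)
            return d[i].value;
    return 0; /* no default applied */
}

/* Unconditional default listed first, as in the earlier sketch. */
static uint32_t offset_unconditional_first(bool opt, bool two_g, bool one_g)
{
    const struct kdefault d[] = {
        { 0xC0000000u, true },
        { 0xB0000000u, opt },
        { 0x78000000u, two_g },
        { 0x40000000u, one_g },
    };
    return pick_default(d, sizeof d / sizeof d[0]);
}

/* Unconditional default listed last, as in the fixed patch. */
static uint32_t offset_unconditional_last(bool opt, bool two_g, bool one_g)
{
    const struct kdefault d[] = {
        { 0xB0000000u, opt },
        { 0x78000000u, two_g },
        { 0x40000000u, one_g },
        { 0xC0000000u, true },
    };
    return pick_default(d, sizeof d / sizeof d[0]);
}
```

In this model, selecting the 2G/2G choice yields 0xC0000000 with the first ordering and 0x78000000 with the second, which is the behaviour the reordering is meant to restore.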
From: Jens Axboe [email blocked]
Subject: Re: [PATCH ] VMSPLIT config options (with default config fixed)
Date: Tue, 10 Jan 2006 20:27:09 +0100

On Tue, Jan 10 2006, Mark Lord wrote:
> Okay, fixed the ordering of the "default" lines
> so that the Kconfig actually works correctly.
>
> Best for Andrew to soak this one in -mm.
>
> Signed-off-by: Mark Lord [email blocked]

Signed-off-by: Jens Axboe [email blocked]

-- 
Jens Axboe

From: "J.A. Magallon" [email blocked]
Subject: Re: [PATCH ] VMSPLIT config options (with default config fixed)
Date: Wed, 11 Jan 2006 02:13:18 +0100

On Tue, 10 Jan 2006 14:16:19 -0500, Mark Lord [email blocked] wrote:

> Okay, fixed the ordering of the "default" lines
> so that the Kconfig actually works correctly.
>
> Best for Andrew to soak this one in -mm.
>
> Signed-off-by: Mark Lord [email blocked]
>

Working nice on top of 2.6.15-mm2. Even with the 'evil binary' nVidia
driver 8178 ;). In fact, I have been using the 1Gb-lowmem patch on -mm
and the nVidia driver since long ago, without problems.

I really like to see this in -mm, and finally in mainline.

My only objection is about the menu entry names and help. I think
people building a kernel would not exactly understand what all this is
about (even I think I don't have it realle clear).

Is there any doc which states clearly somthing like:

- no highmem is the fastest
- 4Gb introduces one indirection, so it is slower...(really ?)
- 64Gb introduces two (PAE ?)

mixed with

- 3G/1G standard maping:
    - nor user nor kernel can use any memory above 860 Mb
    - user processes (my numbercruncher) can not allocate more than XGb
- 2G/2G: idem:
    - max memory seen by my linux system (not kernel, but kernel+userspace,
    - how much can I allocate for a single process (how big my problem
      can be ?)

If there is already a doc like that, it would be very interesting to
have pointer/link to it in the help text.

For example, when I read this:

+	  If the address range available to the kernel is less than the
+	  physical memory installed, the remaining memory will be available
+	  as "high memory". Accessing high memory is a little more costly
+	  than low memory, as it needs to be mapped into the kernel first.

Does this mean that with 3/1 standard split, I still can use the lost
128 Mb for something ? I though I can't.

Don't be too hard with me, just anxious to finally understand this...

--
J.A. Magallon <jamagallon()able!es>     \               Software is like sex:
werewolf!able!es                         \         It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.15-jam2 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))

From: Jens Axboe [email blocked]
Subject: Re: [PATCH ] VMSPLIT config options (with default config fixed)
Date: Wed, 11 Jan 2006 11:15:24 +0100

On Wed, Jan 11 2006, J.A. Magallon wrote:
> I really like to see this in -mm, and finally in mainline.

It's in -mm now.

> My only objection is about the menu entry names and help. I think
> people building a kernel would not exactly understand what all this is
> about (even I think I don't have it realle clear).

If they don't, they should not touch the option...

> Is there any doc which states clearly somthing like:
>
> - no highmem is the fastest
> - 4Gb introduces one indirection, so it is slower...(really ?)
> - 64Gb introduces two (PAE ?)
>
> mixed with
>
> - 3G/1G standard maping:
>     - nor user nor kernel can use any memory above 860 Mb
>     - user processes (my numbercruncher) can not allocate more than XGb
> - 2G/2G: idem:
>     - max memory seen by my linux system (not kernel, but kernel+userspace,
>     - how much can I allocate for a single process (how big my problem
>       can be ?)
>
> If there is already a doc like that, it would be very interesting to
> have pointer/link to it in the help text.

I think the help text is good enough, but it would definitely be nice
with a fuller description of what exactly low and high memory is and the
implications of the various settings.

> For example, when I read this:
>
> +	  If the address range available to the kernel is less than the
> +	  physical memory installed, the remaining memory will be available
> +	  as "high memory". Accessing high memory is a little more costly
> +	  than low memory, as it needs to be mapped into the kernel first.
>
> Does this mean that with 3/1 standard split, I still can use the lost
> 128 Mb for something ? I though I can't.

It tells you that the remaining memory is available as high memory, so
it's not lost of course. It also tells you that accessing this high
memory is indeed possible, but it's a little more costly since it needs
to be mapped temporarily into the kernel address space.

> Don't be too hard with me, just anxious to finally understand this...

No worries, perhaps you will be the one writing the Documentation/ bit
to accompany this then :-)

Basically the option boils down to how much virtual address space you
want to assign to the kernel and user space. The kernel can always
access all of memory, but in some cases part of that memory will be
available as high memory that needs to be mapped in first (see
references to kmap() and kmap_atomic() in the kernel). So whether
changing the mapping or using highmem is the best option for you,
depends entirely on what you run on that machine. If you require a huge
user address space, then you don't want to change away from the 3/1
user/kernel default setting. However, if you don't need the full 3G of
adress space to user apps, then you are better off increasing the kernel
address space range to get rid of the high memory mapping.

For the "typical" case of 1GB machine, using the _OPT setting to just
move the offset slightly is a really good choice as it only removes a
little bit of the user address range.

-- 
Jens Axboe




ck-sources has this option al

January 13, 2006 - 10:34pm
Anonymous (not verified)

ck-sources has this option already.
http://members.optusnet.com.au/ckolivas/kernel/

Huh?

January 13, 2006 - 10:37pm
Anonymous (not verified)

Ingo: "...make it _ALOT_ more clear that mere mortals should not touch this option!"

Jens: "For the "typical" case of 1GB machine, using the _OPT setting to just move the offset slightly is a really good choice as it only removes a little bit of the user address range."

Huh? So let's say I buy a 1GB machine tomorrow, and install my GNOME desktop with all the average desktop apps on it. Should I go with the default or change the VM split per this patch? What are the performance numbers? What is the implication of a "slightly smaller" user address range?

Distributions are the key

January 13, 2006 - 11:57pm
Anonymous (not verified)

This option is great, yes, but distributions like SuSE are not going to use it. Why? Because they have to take into account machines with more memory. Is there any chance we can have features that they WILL use? They don't even use preemption or 4K stacks as it is. We desktop users need more help here.

Why, because they have to tak

January 14, 2006 - 6:48am

> Why, because they have to take into account for machines with more memory

Changing the memory split doesn't mean you have to disable highmem. You can have a 1/3, 2/2 or 3/1 split and still enable highmem. It just changes the amount of lowmem you have. My 1GB lowmem patch disables the highmem option simply because the audience I'm targeting has 1GB of RAM and wants to get all their memory available without enabling highmem.

VMSPLIT comments from -mm commit

January 18, 2006 - 12:03am
Mark Lord (not verified)

Here (below) are the supplementary comments I gave Andrew for the code submit to -mm. On my own 2GB machine, I use the 2GB:2GB split with HIGHMEM disabled, which gives optimal performance for me. These options are most useful for personal machines, not servers: on large servers, database s/w sometimes needs a large virtual address space to avoid cumulative memory fragmentation over time -- so best to leave it with the 3GB default split.

-ml

- - - - - - -

Enable selection of different user/kernel VM splits for i386, including an optimized mode for 1GB physical RAM, which gives the kernel a direct (non HIGHMEM) mapping to the entire 1GB rather than just the first 896MB.

There is a similarly optimized mode for machines with exactly 2GB of physical RAM.

This can speed up the kernel by avoiding having to create/destroy temporary HIGHMEM mappings, and by not having to include HIGHMEM support at all on such machines. The flip side is that there's less virtual addressing left for userspace in these alternatives, and some binary-only kernel modules may misbehave unless rebuilt with the same VMSPLIT option as the main kernel image.

Original idea/patch from Jens Axboe, modified based on suggestions from Linus et al.

Non-Issue...

January 14, 2006 - 4:01am
Anonymous (not verified)

Just buy a 64-bit machine. Highmem is for wussies... ;)

Shouldn't 32bit be able to ha

January 17, 2006 - 8:20pm
Anonymous (not verified)

Shouldn't 32bit be able to handle 4GB as well?

Yes, but there are some subtleties.

January 18, 2006 - 12:46am

First: Caveat emptor. I'm not a VM hacker. I'm describing my understanding, and it may be flawed. It also only applies to x86. Other architectures may have different strategies.

The issue arises because the kernel must (in the absence of highmem or patches like 4G/4G) have direct mappings for all of the physical RAM at all times. This is in contrast to user space, which need not.

The user/kernel split divides the 4G address space into kernel-only addresses and user-only addresses. Rather than use segmented addressing, the kernel prefers a flat addressing model that uses the same overall address map for userspace and kernel. Where the split gets put affects how much virtual memory userspace can use, and how much physical memory kernel space can access directly.

Throw these two things together and you get into the situation we're in today: If there's too much physical memory, then we must resort to highmem. If the user/kernel split doesn't leave enough space for the kernel's physical mappings, then it needs to multiplex.

What highmem does is break the traditional kernel assumption that the kernel has direct mappings to all of the physical RAM. This is where the temporary mapping comes in. The kernel can still use all of the RAM, but it must jump through an extra step if it comes from the highmem pool. Thus, my understanding is that highmem mostly gets used for user-mode pages that aren't needed--basically as cache space. Ick.
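A minimal sketch of the direct mapping described above, written as ordinary C rather than actual kernel code: for low memory, a kernel virtual address is simply the physical address plus the page offset, so translation is a single addition or subtraction. The 896MB limit assumes the default 3G/1G split with 128MB of kernel address space held in reserve, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET  0xC0000000u  /* default 3G/1G split */
#define LOWMEM_BYTES (896u << 20) /* assumption: 1 GiB minus a 128 MiB reserve */

/* True if the physical address is covered by the permanent direct
 * mapping; anything at or above the limit would be high memory. */
static bool is_lowmem(uint32_t phys)
{
    return phys < LOWMEM_BYTES;
}

/* Physical to kernel virtual address; valid only for low memory. */
static uint32_t lowmem_virt(uint32_t phys)
{
    return phys + PAGE_OFFSET;
}

/* Kernel virtual to physical address; valid only inside the direct map. */
static uint32_t lowmem_phys(uint32_t virt)
{
    return virt - PAGE_OFFSET;
}
```

Physical addresses for which is_lowmem() is false have no such fixed virtual address, which is why the kernel has to set up a temporary mapping before it can touch them.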

The kernel doesn't use a 2G/2G split by default, as that would preclude processes from using more than 2G worth of virtual address space.

What I don't get is why the split isn't 2.9G/1.1G, giving an extra 128M to the kernel, so that 1GB machines (which are growing in numbers) don't run into this. Is there something magical about 1GB boundaries?

On a separate note: x86 can actually handle up to 64G, but that's by using highmem. Also, patches like Ingo Molnar's 4G/4G give userland and kernel completely disjoint address spaces, allowing the system to use, theoretically, up to 4G physical RAM without highmem. This comes at some cost, though, as all communication between user and kernel space needs to happen through temporary mappings, if I'm not mistaken.

(And, as I noted above, I could be well mistaken. VM hackers, if you care, please correct me!)

kernel auto-config

January 23, 2006 - 8:45am

> What I don't get is why the split isn't 2.9G/1.1G, giving an extra 128M to the kernel, so that 1GB machines (which are growing in numbers) don't run into this. Is there something magical about 1GB boundaries?

Why not just make another Kconfig option to autoconfigure the kernel depending on the amount of available RAM?

A few reasons I can think of.

January 23, 2006 - 10:43am

1) It potentially breaks compatibility for those binary-only kernel modules we love to hate but so far haven't successfully eliminated, and

2) Except for a couple of obvious corner cases--1GB RAM comes to mind--it doesn't provide enough obvious benefit as compared to the huge can of worms it opens with respect to problem reproducibility. The fewer configurations you have to choose from, the easier it is to make VM and related bugs reproducible.

2.9G/1.1G split

January 25, 2006 - 5:29pm

I think this is what the VMSPLIT_3G_OPT option does. It changes the PAGE_OFFSET from 0xC0000000 to 0xB0000000 - this changes the user/kernel split to 2.75G/1.25G, allowing the kernel to see the whole 1GiB of physical RAM.

The CONFIG_2G does basically the same thing (maybe it should be called CONFIG_2G_OPT for consistency?), setting the PAGE_OFFSET to 0x78000000, which changes the user/kernel split to 1.875G/2.125G, allowing the kernel to see a whole 2GiB of physical RAM.
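
As a quick check of those numbers, here is a small C sketch that turns a PAGE_OFFSET value into a user/kernel split in MiB. The struct and function names are made up for illustration. The only assumption is a 32-bit, 4 GiB address space with user space below PAGE_OFFSET and the kernel above it.

```c
#include <stdint.h>

/* User/kernel split of the 4 GiB i386 address space, in MiB. */
struct vm_split {
    uint32_t user_mib;    /* addresses below PAGE_OFFSET */
    uint32_t kernel_mib;  /* addresses from PAGE_OFFSET up to 4 GiB */
};

struct vm_split split_from_page_offset(uint32_t page_offset)
{
    struct vm_split s;
    s.user_mib = page_offset >> 20;     /* bytes -> MiB */
    s.kernel_mib = 4096u - s.user_mib;  /* the rest of the 4096 MiB */
    return s;
}
```

With this, 0xC0000000 comes out as 3072/1024 MiB, 0xB0000000 as 2816/1280 MiB (2.75G/1.25G) and 0x78000000 as 1920/2176 MiB (1.875G/2.125G), all in user/kernel order.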

Please correct me if I'm wrong.

Benchmarks?

January 14, 2006 - 7:01am

Does anyone know any good benchmarks about using vs. not using highmem? What kind of overheads you should expect? Just thinking about the 128MB highmem that I have.

The right thing.

January 14, 2006 - 8:24am
Anonymous (not verified)

Let me expose my ignorance.

Sometimes you do things not because of performance, but because it is the right way to do it. Kinda like rewriting a program in a clearer way, even without any efficiency gains. This is akin to the general practice of not using goto.

MHO.

The "no goto rule"

January 14, 2006 - 9:57am
Anonymous (not verified)

That no one follows in the Linux kernel... (Because of efficiency concerns)

Actually, no!

January 14, 2006 - 1:46pm
Anonymous (not verified)

It's for readability reasons. The compiler would produce the same code with nested if()s.

if you had more than just 2-3

January 14, 2006 - 5:32pm
Anonymous (not verified)

if you had more than just 2-3 goto's in a function, it becomes a lot easier to read with a "goto" than having to have a bunch of "if" "else" statements.
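
One common reading of this point is the error-unwinding idiom, where each failure jumps to a label that releases only what was acquired so far. The following is a minimal sketch in plain userspace C. The struct bufs type and both functions are hypothetical and are not taken from the kernel.

```c
#include <stdlib.h>

/* Hypothetical resource set: three buffers, allocated all-or-nothing. */
struct bufs { char *a, *b, *c; };

int bufs_init(struct bufs *p, size_t n)
{
    p->a = malloc(n);
    if (!p->a)
        goto fail;
    p->b = malloc(n);
    if (!p->b)
        goto free_a;
    p->c = malloc(n);
    if (!p->c)
        goto free_b;
    return 0;               /* success: caller owns all three buffers */

    /* Unwind in reverse order of acquisition. */
free_b:
    free(p->b);
free_a:
    free(p->a);
fail:
    p->a = p->b = p->c = NULL;
    return -1;
}

void bufs_destroy(struct bufs *p)
{
    free(p->c);
    free(p->b);
    free(p->a);
    p->a = p->b = p->c = NULL;
}
```

Writing the same thing with nested if/else blocks adds one indentation level per resource, which is the readability cost being described.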

Yes, that was exactly my poin

January 14, 2006 - 7:17pm
Anonymous (not verified)

Yes, that was exactly my point...

They apparently do it for both reasons.

January 14, 2006 - 9:40pm

...or at least they did it at one time. I remember Alan Cox saying that on certain paths they had gotos in order to coax GCC into producing the code they wanted.

I can't imagine that being stable across a wide range of GCC releases, especially 2.x to EGCS to 2.9x to 3.x to 4.x. Each of those jumps brought some pretty major shifts in the code GCC generates.

Interesting. Nowadays you pro

January 15, 2006 - 8:38am
Anonymous (not verified)

Interesting. Nowadays you probably can pretty much control code generation with likely() and unlikely(), can't you?

Yes and no.

January 18, 2006 - 12:58am

likely() and unlikely() will bias code generation around conditional branches (aka. if-statements). For some of the funky state machines you see, especially those in TCP, there's no amount of likely() that will cause GCC to lay out the switch-case-statements-from-hell the way you'd like it to. Or, at least, that's the claim.
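
For readers who haven't seen them, here is a minimal sketch of how such hints are commonly built on the GCC/Clang builtin __builtin_expect. The macros mirror the usual kernel-style definitions. The sum_nonnegative() function is a made-up example, not kernel code.

```c
/* Branch hints in the style of likely()/unlikely().
 * __builtin_expect is a GCC/Clang builtin; it only biases code layout
 * and never changes the value of the condition. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sum an array, treating a negative entry as a
 * rare error that the compiler may move off the hot path. */
long sum_nonnegative(const int *v, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        if (unlikely(v[i] < 0))
            return -1;      /* cold path */
        total += v[i];      /* hot path */
    }
    return total;
}
```

The hint applies to one two-way branch at a time, which is why it offers little control over how a large switch statement gets laid out.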

Honestly, I think such microoptimizations are too brittle to go in code as complex as some of that. My coworkers would think it's a tad ironic I'd say such a thing, given I'm known for optimizing loops down to crazy small numbers, from C code, but that's a different story altogether.

(My day job is as a DSP architect for Texas Instruments. And no, don't ask me for WiFi driver support. I'm disappointed in my company for not helping that effort more, but it doesn't surprise me. Besides, I don't know the particular team involved personally, and it's really an issue between the program manager and TI Legal as opposed to the engineers.)

What about the alternative?

January 14, 2006 - 4:06pm

My understanding is that the alternative is to give up 64K, and tell the kernel to only use 960K of 1024K. Sure, you're giving up about 6% of your memory, but are you *really* going to notice? Except for pathological loads, I can't imagine that 64K will make a noticeable, measurable difference. Or am I smoking something?

You are. They're talking abou

January 14, 2006 - 7:47pm
Anonymous (not verified)

You are. They're talking about megabytes (MiB), not kilobytes (KiB).
Do you mind sharing some? ;)

Quibble, correction and suggestion.

January 14, 2006 - 9:36pm

First, a quibble. If you're going to use those fancy new binary prefixes, use them correctly.

GiB, MiB and KiB are gibibytes (pronounced roughly "jibba bytes"), mebibytes ("mebba bytes") and kibibytes ("kibba bytes"). They refer to 2^30, 2^20 and 2^10 bytes respectively.

GB, MB and KB are gigabytes (usually pronounced "gigga bytes" (hard G), but "jigga bytes" is also correct), megabytes and kilobytes. When referring to RAM, they most often refer to the same quantities that GiB, MiB and KiB refer to. In most other situations (esp. mass storage), they more often refer to 10^9, 10^6 and 10^3 bytes, respectively.

I personally find the GiB/MiB/KiB system annoying because the acronym form is ugly and the prefixes make you sound like an imbecile when spoken aloud. When there's no risk of ambiguity, I use the older GB, MB and KB to refer to 2^30, 2^20 and 2^10. Just don't call MiB "megabytes" again, ok? It's mebibytes, as stupid as that word sounds.

Anyway, that little rant aside, I must say you are correct. You do have to give up 64M, not 64K. Oops.

Still, there's a different reason I brought it up. Why is it necessary to align the user-kernel split on a 1GB boundary? If you're within 64MB of not having highmem, why not just move the split by 64MB then? Why do you have to shift all the way to 2G/2G? It seems to me you could make the split 1.1G/2.9G, and solve this highmem issue for everyone with 1GB or less of RAM. Make it the default in the mainline kernel, and this whole debate largely evaporates.

Or am I missing something fundamental?

You have your prefixes and th

January 15, 2006 - 3:00am
Anonymous (not verified)

You have your prefixes and their bases reversed. KB, MB and GB refer to powers of 2, and KiB, MiB and GiB refer to powers of 10.

Oh my god, just read it up p

January 15, 2006 - 6:01am
Righteous preacher (not verified)

Oh my god,
just read it up plz:
http://en.wikipedia.org/wiki/Mebibyte
http://en.wikipedia.org/wiki/Binary_prefix

kilo (= greek "thousand") = 10^3, prefix "k"
mega (= greek "million") = 10^6, prefix "M"
giga (= greek "billion") = 10^9, prefix "G"
....etc...etc...

kibi (= nuenglish "kilobinary") = 2^10, prefix "Ki"
mebi = 2^20, prefix "Mi"
gibi = 2^30, prefix "Gi"
.... etc. etc.

KiB = To smoke. MiB = Men in

January 17, 2006 - 10:02am
Anonymous (not verified)

KiB = To smoke.
MiB = Men in Black.
GiB = The small chunks of meat/blood that fly away from a game character or monster when hit or killed.

KB = 1024 bytes
MB = 1024 KB
GB = 1024 MB

kB = 1000 bytes
mB = 1000 kB
gB = 1000 mB

Kb = 1000 bits
Mb = 1000 Kb
Gb = 1000 Mb

> kB = 1000 bytes > mB = 100

January 17, 2006 - 11:01am
Anonymous (not verified)

> kB = 1000 bytes
> mB = 1000 kB
> gB = 1000 mB
LOL

Why do you laugh? Sounds sens

January 17, 2006 - 12:28pm
Anonymous (not verified)

Why do you laugh? Sounds sensible, IMHO.

it does, until....

January 19, 2006 - 1:08pm
Anonymous (not verified)

It works, until you realize that lowercase m is used for milli (10^-3) and engineers will use lowercase k for kilo when the unit (i.e. meters) is also in lowercase...

Either way, this is ridiculous! :-)

Sorry

January 15, 2006 - 9:16pm
Anonymous (not verified)

No, sorry, YOU have your prefixes and their bases reversed.

Technical Info.

January 15, 2006 - 9:04am
Anonymous (not verified)
  • A DDR400 module consumes approx. 5 W per GiB.
  • A DDR2-533 module consumes approx. 2.5 W per GiB (higher frequency but more latency).
  • With the typical 1G/3G option on IA32, around 960 MiB is available and around 64 MiB is unavailable forever because of the big mystery hole.
  • If your IA32 has 1 GiB of RAM (1x1GiB,2x512MiB) then a good option will be 1.1G/2.9G.
  • If your IA32 has 2 GiB of RAM (1x2GiB,2x1GiB,4x512MiB) then a good option will be 1.9G/2.0G (the hole of unavailable RAM is unevitable).
  • If your IA32 has 3 GiB of RAM (3x1GiB) then a good option will be 4.0G/4.0G (the slower spooling of pages is forced; there is no better alternative).
  • If your IA32 has 4 GiB of RAM (2x2GiB,4x1GiB) then a good option will be 4.0G/4.0G (the slower spooling of pages is forced; there is no better alternative).

Go to Athlon64!!!

  • If your A64 has 1, 2 or 4 GiB of RAM (1 GiB modules) then a good option will be 131072G/131072G, using x86-64 only mode!!!
  • And please, remove the 2 GB limit of the GNU/Linux's old swapd.
  • Imagine, 300 GB of virtual memory used by GCJ-4.x and GCC-4.x!!! (remember, TreeSSA is good).

Bugfix.

January 15, 2006 - 11:14am
Anonymous (not verified)

1.9G/2.0G (the hole of unavailable RAM is unevitable)

Yes, it's evitable! No hole! No spooling! It's fine!!!

1.9G/2.1G for a 2 GiB machine:

  • 1.9GiB+-0.0x for userspace.
  • 2.1GiB+-0.0x for kernelspace.

Its performance is the BEST!!!

Be careful, not fine, there are more details.

January 15, 2006 - 11:20am
Anonymous (not verified)
  • +-0.0x GiB for SVGA hole? SLI dual SVGA holes? ...
  • +-0.0x GiB for BIOSes holes?
  • +-0.0x GiB for NICs holes?
  • +-0.0x GiB for Chipsets holes? (IDE, USB-2.0, SATA, FW, PCI, PCI-E, ...)

Need a definitive "map" for GNU/Linux!!!

January 15, 2006 - 11:35am
Anonymous (not verified)

We need an explanation of the exhaustive map of the RAM physical addresses of the peripherals.

We need to reserve the hardware zones for a full 2 or 4 GiB generic machine (i386 and x86-64), so there won't be problems with System.map addressing in the future.

Hardware zones:

  • 0x00000000 .. 0x00000FFF: the reserved physical 4 KiB page of the NULL pointer (to catch kernel exceptions).
  • 0x000A0000 .. 0x000AFFFF: the reserved 64 KiB VGA zone.
  • 0x000B0000 .. 0x000BFFFF: the reserved 32+32 KiB VGA MonoText & ColorText zone.
  • ...

Free zones:

  • 0x00001000 .. 0x0000FFFF
  • ...

Enjoy ;)

confused...

January 24, 2006 - 6:50pm
tom (not verified)

now, i'm so confused...

my machine has.. 1,25 GB RAM. now, what to do with it?
3G/1G split + highmem?
2G/2G without highmem?
throw the 256MB out the window and use -ck's true 1G lowmem?

Use 2/2 without highmem

January 24, 2006 - 7:04pm

Use 2/2 without highmem

re: Use 2/2 without highmem

January 24, 2006 - 9:32pm
tom (not verified)

> Use 2/2 without highmem

OK. thx Con!
After 1 1/2 hours reading, i also understand why :P

All the same to me

February 26, 2006 - 6:31am
Anonymous (not verified)

KB, MB, GB... all the same to me :-)

for me too :)) KB, MB, GB -

April 10, 2007 - 3:11am
thomas editor (not verified)

for me too :)) KB, MB, GB - There is no difference
