Re: 2.6.32-rc3: low mem - only 378MB on x86_32 with 64GB. Why?

Previous thread: Linux 2.6.32-rc3 build failure by Morten P.D. Stevens on Monday, October 5, 2009 - 8:22 am. (1 message)

Next thread: [BISECTED] "conservative" cpufreq governor broken by Steven Noonan on Monday, October 5, 2009 - 9:32 am. (23 messages)
From: Jeff Chua
Date: Monday, October 5, 2009 - 8:57 am

I have 3 systems with 4GB, 16GB and 64GB of RAM, all running 32-bit kernels with these options set:

 	CONFIG_X86_32=y
 	CONFIG_X86=y
 	CONFIG_ZONE_DMA=y
 	# CONFIG_ZONE_DMA32 is not set
 	CONFIG_SPARSEMEM_MANUAL=y
 	CONFIG_SPARSEMEM=y
 	CONFIG_HIGHMEM64G=y
 	CONFIG_HIGHMEM=y
 	CONFIG_X86_PAE=y
 	CONFIG_KSM=y
 	CONFIG_HIGHPTE=y


# using "free -lm" ...

# with 4GB
              total       used       free     shared    buffers     cached
Mem:          3983       3862        120          0        112       3542
Low:           850        738        112
High:         3132       3123          8
-/+ buffers/cache:        207       3775
Swap:         8008          0       8008


# with 16GB
              total       used       free     shared    buffers     cached
Mem:         16244       6570       9673          0        320       5559
Low:           750        717         33
High:        15493       5853       9640
-/+ buffers/cache:        690      15554
Swap:        26709          0      26709


# with 64GB
              total       used       free     shared    buffers     cached
Mem:         63995        524      63470          0          4        365
Low:           378         32        345
High:        63616        492      63124
-/+ buffers/cache:        154      63840
Swap:        28003          0      28003


Question is ... is there any way to increase "low mem" without resorting to 
migrating to 64-bit? (Look... it only has 378MB total low mem vs 850MB on 
the 4GB system). I have Oracle installed on the 64GB system and it keeps 
getting OOMs!

I thought CONFIG_HIGHPTE (Allocate 3rd-level pagetables from highmem) is 
supposed to help with low mem as stated here ...

  CONFIG_HIGHPTE:
 	The VM uses one page table entry for each page of physical memory.
 	For systems with a lot of RAM, this can be wasteful of precious
 	low memory.  Setting this option will put user-space page table
 	entries in high memory.


Anything I should do? Slow death using sysctl like this ...
 ...
--

From: Linus Torvalds
Date: Monday, October 5, 2009 - 11:12 am

Don't.


All your low memory is used for the 'struct page' arrays that describe 

Use a 64-bit CPU and kernel or limit your memory to 8GB or so. No ifs, 
buts and maybe's.

			Linus
--

From: Linus Torvalds
Date: Monday, October 5, 2009 - 5:30 pm

As Byron said, it really should be sufficient to just upgrade the kernel 
(I presume that you already have a CPU that is 64-bit capable: not very 
many boards with old CPU's can even fit 64GB of ram).

In fact, I wish more people did that, just so that we'd get better 
coverage of the 32-bit compat code. We occasionally find issues there, 
although I think it's getting rarer.

			Linus
--

From: Tvrtko Ursulin
Date: Tuesday, October 6, 2009 - 3:06 am

Maybe convince some distro to offer this setup as an option at least? I always 
wondered why no one has, or maybe I missed it.

Tvrtko

--

From: Frans Pop
Date: Tuesday, October 6, 2009 - 4:01 am

Debian has a 64-bit kernel for its 32-bit "i386" architecture (and also for 
its 64-bit "amd64" architecture obviously):
- for stable: http://packages.debian.org/lenny/linux-image-2.6.26-2-amd64
- for unstable: http://packages.debian.org/sid/linux-image-2.6.30-2-amd64

Cheers,
FJP
--

From: Lennart Sorensen
Date: Wednesday, October 14, 2009 - 2:32 pm

I have been running that Debian setup for 5 years now.  It's great.
Other than 32 bit iptables not liking the 64bit kernel, I can't currently
recall any issues at any point.  Well and configure scripts often have to
be run with the 'linux32' wrapper to make them not try something stupid.
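The configure-script problem mentioned above most likely comes from scripts that guess the build target from `uname -m`: under a 64-bit kernel that reports `x86_64` even when every userspace binary is 32-bit, and the `linux32` wrapper changes what `uname` reports for the wrapped process. A minimal Python sketch of how such a mismatch can be spotted; the helper name `looks_mismatched` is made up for illustration:

```python
import os
import struct


def looks_mismatched(machine, pointer_bits):
    """True when a 32-bit process sees a 64-bit machine name from uname."""
    return pointer_bits == 32 and machine in ("x86_64", "amd64")


pointer_bits = struct.calcsize("P") * 8   # width of a pointer in this process
machine = os.uname().machine              # the same string `uname -m` reports
print(machine, pointer_bits, looks_mismatched(machine, pointer_bits))
```

When the check is true, running the build as `linux32 ./configure` is the workaround described in the message.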

-- 
Len Sorensen
--

From: David Woodhouse
Date: Tuesday, October 6, 2009 - 5:59 am

We've been running PPC64 with 32-bit userspace for a long time, so most
of the issues with the 32-bit compat code ought to be dealt with.

There may be a few arch-specific issues for x86_64/i386, but not many.

-- 
dwmw2

--

From: Linus Torvalds
Date: Tuesday, October 6, 2009 - 7:26 am

The Intel Xorg guys used to do everything in 32-bit kernels because they 
were also testing that they didn't break compatibility (which is a big 
deal with that whole crazy DRM-in-kernel/direct-rendering-in-user-space 
thing) and seemingly didn't realize that the compat layer was _supposed_ 
to mean that they could run a 64-bit kernel and still have a working 
32-bit land.

It's driver interfaces like that that tend to break. ioctl's etc. But I 
have heard less noise about it lately, so I do think it's mostly working.

		Linus
--

From: Benjamin Herrenschmidt
Date: Sunday, October 11, 2009 - 2:08 am

I'm running a 32-bit Ubuntu karmic distro with a 64-bit kernel on my
thinkpad and so far, I have yet to encounter a single obvious problem
due to the compat layer. Everything seems to work fine including 3D,
compiz fancyness in X etc... :-)

Cheers,
Ben.



--

From: Linus Torvalds
Date: Tuesday, October 6, 2009 - 7:35 am

It's supposed to be easier these days, but I guess distros don't compile 
in support for 64-bit mode by default. Just using another machine may be 
the simplest approach, if you have any 64-bit distro around.

		Linus
--

From: Dave Hansen
Date: Thursday, October 8, 2009 - 9:35 am

Take a quick look in your .config in the "Executable file formats /
Emulations" section.  Make sure that you have ia32 emulation enabled:

CONFIG_IA32_EMULATION=y

That might not have been turned on if you used a .config from an old
32-bit kernel.  
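A quick way to run that check without opening the menus is to search the `.config` text directly. The sketch below is only an illustration: `config_value` is a hypothetical helper, and the sample text is an invented fragment standing in for a `.config` carried over from a 32-bit build.

```python
import re


def config_value(config_text, option):
    """Return the value of `option` in kernel .config text, or None if unset."""
    pattern = re.compile(rf"^{re.escape(option)}=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


# Hypothetical fragment of a .config carried over from a 32-bit build.
sample = """\
CONFIG_X86_64=y
# CONFIG_IA32_EMULATION is not set
"""

if config_value(sample, "CONFIG_IA32_EMULATION") != "y":
    print("CONFIG_IA32_EMULATION is not enabled; 32-bit binaries will not run")
```

For a real tree, pass in the contents of the `.config` file from the kernel build directory instead of `sample`.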

-- Dave

--

From: Valdis.Kletnieks
Date: Saturday, October 10, 2009 - 11:10 am

When the MIPS, PowerPC, and Sparc architectures went from 32 to 64 bits,
they *did* take a bit of a performance hit because it basically doubled
the memory bandwidth usage.  However, they all had a reasonably large
number of registers in 32-bit mode.  When the x86 went 64-bit, the register
pressure relief from the additional registers usually more than outweighs
the additional memory bandwidth (basically, if you're spending twice as
much time on each load/store, but only doing it 40% as often, you come out
ahead...)
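The parenthetical above is simple multiplication, sketched here in Python. The 2x cost and the 40% frequency are the figures from the message and are illustrative, not measurements; `relative_memory_time` is a made-up name.

```python
# Back-of-the-envelope version of the trade-off described above.
def relative_memory_time(cost_per_access, access_fraction):
    """Memory time relative to the 32-bit baseline (1.0 means unchanged)."""
    return cost_per_access * access_fraction


# x86: each access costs twice as much, but only 40% as many are needed.
x86_64 = relative_memory_time(cost_per_access=2.0, access_fraction=0.4)
# An architecture that gains no registers: same number of accesses as before.
other = relative_memory_time(cost_per_access=2.0, access_fraction=1.0)

print(f"x86 going 64-bit: {x86_64:.1f}x the baseline memory time")
print(f"no extra registers: {other:.1f}x the baseline memory time")
```

Under these assumed figures x86 ends up at 0.8x of its 32-bit memory time, while an architecture that keeps the same access count pays the full 2x.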
--

From: Linus Torvalds
Date: Saturday, October 10, 2009 - 11:37 am

That's mainly stack traffic, and x86 has always been good at it. More 
registers makes for simpler (and fewer) instructions due to less reloads, 
but for kernel loads, it's not the biggest advantage.

If you have 8GB of RAM or more, the biggest advantage _by_far_ for the 
kernel is that you don't spend 25% of your system time playing with 
k[un]map() and the TLB flushing that goes along with it. You also have 
much more freedom to allocate (and thus cache) inodes, dentries and 
various other fundamental kernel data structures.

Also, the reason MIPS and Sparc had a slowdown for 64-bit code was only 
partially the bigger cache footprint (and that depends a lot on the app 
anyway: many applications aren't that pointer-intensive. The kernel is 
_very_ pointer-intensive, but even for something like that, most data 
structures tend to blow up by 50%, not 100%).
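To see why a pointer-heavy structure grows by something like 50% rather than 100%, consider a toy layout calculation in Python. The structure is hypothetical (two pointers plus two 4-byte ints, not any real kernel structure), and natural alignment is assumed.

```python
# Toy layout calculator: each field is "ptr" (pointer), "int" (4 bytes)
# or "long" (pointer-sized on Linux). Natural alignment is assumed.
def struct_size(fields, pointer_bytes):
    sizes = {"ptr": pointer_bytes, "long": pointer_bytes, "int": 4}
    offset = 0
    max_align = 1
    for kind in fields:
        size = sizes[kind]
        offset = (offset + size - 1) // size * size   # pad to field alignment
        offset += size
        max_align = max(max_align, size)
    return (offset + max_align - 1) // max_align * max_align  # tail padding


# Hypothetical structure: two pointers and two ints (say, list links plus
# a couple of counters).
fields = ["ptr", "ptr", "int", "int"]
size32 = struct_size(fields, pointer_bytes=4)
size64 = struct_size(fields, pointer_bytes=8)
print(size32, size64, f"{(size64 - size32) / size32:.0%} larger")
```

Only the pointer fields double; the ints stay at 4 bytes, so this example goes from 16 to 24 bytes.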

The other reason for slowdown is that generating those pointers (for 
function calls in particular) is more complex, and x86-64 is better at 
that than MIPS and Sparc. That complex instruction encoding with 
variable-size instructions means that you don't have to try to fit all 
constants in the instruction stream either in the fixed-sized instruction, 
or by doing indirect data access to memory through a GP register. 

So x86-64 not only had the register expansion advantage, it had less of a 
code generation downside to 64-bit mode to begin with. Want to have large 
constants in the code? No problem. Sure, it makes your code bigger, but 
you can still have them predecoded in the instruction stream rather than 
have to load them from memory. Much nicer for everybody.

And for the kernel, the bigger virtual address space really is a _huge_ 
deal. HIGHMEM accesses really are very slow.  You don't see that in user 
space, but I really have seen 25% performance differences between 
non-highmem builds and CONFIG_HIGHMEM4G enabled for things that try to put 
a lot of data in highmem (and the 64G one is even more expensive). ...
--

From: Benjamin Herrenschmidt
Date: Sunday, October 11, 2009 - 2:06 am

My experience is that most distros have a compiler capable of generating
a 64-bit kernel, if not 64-bit userspace (the latter depends on whether
the "other" bits such as libgcc, glibc, etc... are there for 64-bit,
which is also generally available, though optional).

I regularly compile 64-bit kernels with a 32-bit Ubuntu or Debian on i386.

Cheers,
Ben.

--

From: Linus Torvalds
Date: Sunday, October 11, 2009 - 10:34 am

At least not Fedora x86. Doing "gcc -m64" results in

	sorry, unimplemented: 64-bit mode not compiled in

on my laptop.

			Linus
--

From: Byron Stanoszek
Date: Monday, October 5, 2009 - 11:42 am

I have some installations using a 64-bit kernel and a 32-bit OS (including
Oracle). This setup works really well and lets you use all available memory.
You don't even have to upgrade your OS; just swap out the kernel.

  -Byron

--

From: Dave Hansen
Date: Monday, October 5, 2009 - 12:15 pm

Heh.  You've really squeezed yourself into a bad situation.  Go get a
64-bit kernel... please.  You should be able to run 32-bit userspace
with a 64-bit kernel.  Do you have some 32-bit kernel component that you
are relying on?

The kernel has a structure called 'struct page'.  We allocate one of
those for each 4k page of physical memory on x86.  But, each 'struct
page' is/was 32 bytes (is it still??).  That means that on a 64GB
system, you've used at *least* 512MB of your 896MB of lowmem before
you're even out of early boot.  That's just one structure. 
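The arithmetic in the paragraph above can be checked with a short Python sketch. It assumes the figures given there (4 KiB pages, 32 bytes per 'struct page', 896 MiB of lowmem) and ignores every other lowmem consumer; `struct_page_overhead_mib` is a name invented for the example.

```python
# Rough estimate of lowmem consumed by the 'struct page' array.
# Assumptions taken from the message above: 4 KiB pages, 32 bytes per
# 'struct page', and 896 MiB of lowmem on a 32-bit x86 kernel.
PAGE_SIZE = 4 * 1024
STRUCT_PAGE_BYTES = 32
LOWMEM_MIB = 896
MIB = 1024 * 1024
GIB = 1024 * MIB


def struct_page_overhead_mib(ram_gib):
    """Return MiB needed to describe ram_gib GiB of RAM with struct pages."""
    pages = ram_gib * GIB // PAGE_SIZE
    return pages * STRUCT_PAGE_BYTES // MIB


for ram_gib in (4, 16, 64):
    overhead = struct_page_overhead_mib(ram_gib)
    print(f"{ram_gib:>2} GiB RAM: {overhead:>3} MiB of struct pages, "
          f"{LOWMEM_MIB - overhead:>3} MiB of lowmem left")
```

The remainders work out to 864, 768 and 384 MiB, which is in the same ballpark as the Low totals of 850, 750 and 378 MB in the `free -lm` output at the top of the thread.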

The practical options are to use a different VMSPLIT or to use the RHEL
4/4 kernel.  The VMSPLIT option is in mainline and it will let you chop up
the user/kernel virtual address boundary in different ways.  Looking at
arch/x86/Kconfig it doesn't look like mainline's code works with PAE.
It's theoretically possible, but not very practical.  I think I hacked
up a custom kernel for a customer to do this once a long time ago, but
it was painful.

The RHEL 4/4 kernel is a big fat hack.  I think they called it "hugemem"
or something.  It gives the kernel (and userspace) ~4GB of vaddr
space each, but costs you some extra context switch time.  It lived in
-mm for a while and never made it to mainline.

I see it mentioned here:

	http://www.redhat.com/rhel/previous_versions/rhel3/

and I don't know if it was continued in other RHEL releases.

You can get around the 896MB limit, but it's painful.  You'll almost
certainly need a hacked kernel.

-- Dave

--

From: Yuhong Bao
Date: Tuesday, October 6, 2009 - 9:50 am

Even Windows limits the amount of RAM to 16 GB when both PAE and its 3G/1G
split mode are enabled, for precisely the same reason. (It defaults to 2G/2G
split mode.)
See this for more info:
http://blogs.msdn.com/oldnewthing/archive/2004/08/18/216492.aspx

Yuhong Bao
--
