Re: 2.6.20 OOM with 8Gb RAM

Previous thread: [KJ][PATCH]ROUND_UP macro cleanup in arch/sparc,sparc64 by Milind Arun Choudhary on Thursday, April 12, 2007 - 10:14 am. (1 message)

Next thread: Please pull from 'for_linus' branch by Kumar Gala on Thursday, April 12, 2007 - 10:46 am. (1 message)
From: Cameron Schaus
Date: Thursday, April 12, 2007 - 10:38 am

I am running the latest FC5-i686-smp kernel, 2.6.20, on a machine with
8Gb of RAM, and 2 Xeon processors.  The system has a 750Mb ramdisk,
and one process allocating and deallocating memory that is also
writing lots of files to the ramdisk.  The process also reads and
writes from the network.  After the process runs for a while, the
linux OOM killer starts killing processes, even though there is lots
of memory available.

The system does not ordinarily use swap space. I added swap to see
whether it makes a difference, but it only defers the problem.

The OOM dump below shows that memory in ZONE_NORMAL is exhausted,
but there is still plenty of memory (6Gb+) in the HighMem Zone.  I can
provide .config and dmesg data if these would be helpful.

Why is the OOM killer being invoked when there is still memory
available for use?

java invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
java invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
 [<c0455f84>] out_of_memory+0x69/0x191
 [<c0457460>] __alloc_pages+0x220/0x2aa
 [<c046c80a>] cache_alloc_refill+0x26f/0x468
 [<c046ca76>] __kmalloc+0x73/0x7d
 [<c05bb4ce>] __alloc_skb+0x49/0xf7
 [<c05e483d>] tcp_sendmsg+0x169/0xa04
 [<c05fd76d>] inet_sendmsg+0x3b/0x45
 [<c05b57d5>] sock_aio_write+0xf9/0x105
 [<c0455708>] generic_file_aio_read+0x173/0x1a3
 [<c046fd11>] do_sync_write+0xc7/0x10a
 [<c04379fd>] autoremove_wake_function+0x0/0x35
 [<c05e413e>] tcp_ioctl+0x10a/0x115
 [<c05e4034>] tcp_ioctl+0x0/0x115
 [<c05fd406>] inet_ioctl+0x8d/0x91
 [<c0470564>] vfs_write+0xbc/0x154
 [<c0470b62>] sys_write+0x41/0x67
 [<c0403ef6>] sysenter_past_esp+0x5f/0x85
 =======================
DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
Normal ...
From: Andrew Morton
Date: Thursday, April 12, 2007 - 12:15 pm

On Thu, 12 Apr 2007 11:38:30 -0600

All of ZONE_NORMAL got used by ramdisk, and networking wants to
allocate a page from ZONE_NORMAL.  An oom-killing is the correct
response, although probably not effective.
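
The gfp_mask=0xd0 in the OOM report can be decoded to see why HighMem
was not eligible for this allocation. A minimal sketch, assuming the
flag values from include/linux/gfp.h around 2.6.20 (they differ in
other kernel versions); decode_gfp is a hypothetical helper, not a
kernel interface:

```python
# Flag bits as defined in include/linux/gfp.h around 2.6.20
# (assumption: these values are not stable across kernel versions).
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

def decode_gfp(mask):
    """Return the names of the known flag bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(GFP_FLAGS.items()) if mask & bit]

flags = decode_gfp(0xd0)
print(flags)
print("may use HighMem:", "__GFP_HIGHMEM" in flags)
```

Under those assumed values, 0xd0 is __GFP_WAIT | __GFP_IO | __GFP_FS,
i.e. GFP_KERNEL. The __GFP_HIGHMEM bit is not set, so the free pages in
the HighMem zone cannot satisfy this request.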

ramdisk is a nasty thing - cannot you use ramfs or tmpfs?
-

From: Cameron Schaus
Date: Thursday, April 12, 2007 - 2:30 pm

Sure enough, changing the ramdisk to a tmpfs did the trick.  No more OOM 
(at least for now).

Thanks!
Cam


-

From: Jason Lunz
Date: Friday, April 13, 2007 - 3:39 pm

What do you mean by "nasty thing"? I've heard that about loopback too.

If I want to run a system entirely from ram with a compressed filesystem
image mounted on /, is it better to store that image in a ramdisk, or on
a tmpfs and mount it via loopback?

Jason
-

From: Andrew Morton
Date: Friday, April 13, 2007 - 3:46 pm

On Fri, 13 Apr 2007 18:39:36 -0400

It's just weird - it exploits internal knowledge of VFS behaviour, diddles
with pagecache within a fake disk strategy handler, etc.

Furthermore, because it pretends to be a block device, the VFS will not use
highmem pages when accessing the ramdisk.  So the 8GB machine will go splat
with only 800MB of ramdisk.
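
The arithmetic behind that can be sketched roughly. The figures below
are assumptions, not taken from the report: the default 3G/1G split on
32-bit x86, a 128MB vmalloc reserve, and about 32 bytes of struct page
per 4k page frame. Only the 8GB of RAM and the 750MB ramdisk come from
the original post.

```python
MB = 1024 * 1024

kernel_space = 1024 * MB      # assumed 3G/1G user/kernel split
vmalloc_reserve = 128 * MB    # assumed default vmalloc reserve
lowmem = kernel_space - vmalloc_reserve   # directly mapped memory (DMA + Normal)

ram = 8 * 1024 * MB           # from the original report
page_size = 4096
struct_page = 32              # assumed bytes of struct page per page frame
mem_map = ram // page_size * struct_page  # page array, itself kept in lowmem

ramdisk = 750 * MB            # from the original report
left = lowmem - mem_map - ramdisk

print(f"lowmem:  {lowmem // MB} MB")
print(f"mem_map: {mem_map // MB} MB")
print(f"left for everything else in lowmem: {left // MB} MB")
```

With these assumed numbers, a full 750MB ramdisk leaves well under
100MB of lowmem for slab, socket buffers and every other kernel
allocation, no matter how much HighMem is free.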


loopback does some pretty weird things too, but it has more of an excuse:
it is a specialised layering thing, whereas ramdisk is, umm, just a

Store it all in ramfs, no loopback needed?
-

From: William Lee Irwin III
Date: Friday, April 13, 2007 - 3:54 pm

After all this time, bdevs are still lowmem etc. Crying shame.


-- wli
-

From: Andrew Morton
Date: Friday, April 13, 2007 - 4:32 pm

On Fri, 13 Apr 2007 15:54:33 -0700

One would need to hunt down every use of b_data in filesystems and switch
them to kmap the page.

Possibly it could be done on a per-fs basis.
-

From: William Lee Irwin III
Date: Friday, April 13, 2007 - 4:40 pm

Queued right behind a dozen other massive sweeps for me to do.


-- wli
-

From: Jason Lunz
Date: Friday, April 13, 2007 - 4:01 pm

I used to put everything in a tmpfs on /, and that certainly works. But
most files in a typical image are rarely used and it's a pity to have
lots of little files taking up a 4k page each.
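
The page-granularity cost can be illustrated with a small sketch. The
file sizes are hypothetical, and the model is a simplification that
counts only whole 4k data pages and ignores inode and dentry overhead:

```python
PAGE = 4096

def pages_used(size):
    """Bytes of page-sized memory needed to hold a file of `size` bytes."""
    if size == 0:
        return 0
    return -(-size // PAGE) * PAGE   # round up to a whole number of pages

# Hypothetical file sizes in bytes, for illustration only.
sizes = [120, 900, 3000, 4097, 20000]

data = sum(sizes)
held = sum(pages_used(s) for s in sizes)
print(f"file data: {data} bytes, memory held: {held} bytes")
print(f"overhead: {held - data} bytes")
```

The smaller the typical file, the larger the share of each page that is
wasted, which is what makes packing the files into one compressed image
attractive.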

You get pretty big savings by compressing the system into a squashfs and
mounting that, so the question becomes: where to put the squashfs?
ramdisk or loopback mount it from tmpfs/ramfs?

iirc, the problems with loopback have to do with writeout, which isn't a
problem here since squashfs is readonly.

Jason
-
