Sorry about the double post, but it looks to fit better in the kernel section rather than the general section.
I have a question about swap. Will swap be used if the process is active and using all of the physical RAM?
It was my understanding that swap will _not_ be used if the process is active. We have an application that uses all of the physical RAM, then gets an OOM error and crashes, even though we have 4GB of swap. I just wanted to check whether I am understanding this right.
Brian
other limits?
yes, the memory is paged, i.e. some pages of your process can be in ram and some on disk. even if the process uses every piece of its memory in turn, the kernel will just do what it has to do to support this behavior: swap long unused pages out and swap the needed pages in, constantly. the system will just get very slow, and other apps will be completely swapped out.
how much ram do you have? is your app compiled for 32 bit, so that it runs up against the 32 bit address space limit? which error do you get: does the application print 'out of memory' and die, or does the kernel's special OOM killer kill the app?
We have a 32 bit version of
We have a 32 bit version of Red Hat Enterprise with a 32 bit app and 6GB of RAM. When the program is running without HA it uses 5.7GB of RAM. Once we turn on HA it starts to replicate the database, and in the program logs we get "could not allocate memory" and the program crashes. Watching the program, we have about 50-70MB of free RAM when it crashes and 4GB of swap, but the swap is very rarely used: normally about 100-200MB of swap is in use. I thought that if your program is actively using all the memory then it would not go to swap. We have another server that runs the program most of the time; it has 16GB of RAM and about 12GB is used all the time. The server that we are having problems with is our backup server with limited resources.
-
(a) your program just eats too much memory in general and should be fixed
(b) the "free ram" number is often misinterpreted by users (even more so when the system is idle and no bigtime memory-eating programs are active). But when things start to swap, i.e. you see the disk LED going, then memory is probably full.
Yes the program does use way
Yes, the program does use way too much memory, but there is nothing I can do about it: it is closed-source and we have to use it. But I am trying to find out if I am correct when saying the program will not use swap if all the memory being used is active.
not use swap
if all your program's memory is used constantly that only lowers its probability of being swapped out, swapping still takes place, only the just swapped out memory has to be swapped in on use. but no program uses exactly 100% of its memory.
if you didn't enable memory overcommit, memory can run out if it is logically allocated but not physically used completely. a program might allocate 1GB memory but only touch .3GB of the memory pages, then the unused .7GB take space in the swap space to have a guarantee that they are available once the program starts to use the rest. in this case you have to add more swap space even if the existing swap is only used virtually.
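To put rough numbers on this, here is a small sketch of the commit limit the kernel documents for strict accounting (vm.overcommit_memory = 2). The function name and the example figures (6GB of ram and 4GB of swap from this thread, the default overcommit_ratio of 50, no huge pages) are assumptions for illustration only; whether the box in question actually runs in strict mode is not known.

```python
# Sketch: approximate CommitLimit under strict overcommit accounting.
# Documented formula: (RAM - huge pages) * overcommit_ratio / 100 + swap.
# The default ratio of 50 and the sizes below are assumptions.

KB_PER_GB = 1024 * 1024


def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=50, hugetlb_kb=0):
    """Return the approximate amount of memory (in kB) that can be committed."""
    return (ram_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_kb


limit = commit_limit_kb(6 * KB_PER_GB, 4 * KB_PER_GB)
print(limit / KB_PER_GB)  # 7.0
```

With these assumed values, allocations would start failing once about 7GB is committed, even if much of it has never been touched. On a live system the CommitLimit and Committed_AS lines in /proc/meminfo show the real figures.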
how does your app manage to use 6GB of ram? 32 bit processes only have 4GB of theoretical address space, of which only about 2-3GB are usable by the process, depending on how the kernel splits the address space. if a process tries to allocate more than that, the allocation fails of course.
Didn't see the double post
Didn't see the double post in time ;P
From my understanding, the out_of_memory() function
is invoked by __alloc_pages() when free memory is low and no pages can be reclaimed.
The system selects a process that:
- owns a lot of pages
- has a low static priority
- doesn't have root privileges
calculates if it's a good candidate to free memory, and kills it if it is.
Your app may be the one selected and killed (?)
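One way to see how the kernel currently ranks a process is the oom_score file under /proc. This is only a sketch: the helper name is made up for illustration, and it returns None where the file is missing or unreadable.

```python
# Sketch: read the kernel's current OOM "badness" score for a process.
# read_oom_score is a hypothetical helper name; /proc/<pid>/oom_score is
# provided by Linux. A higher score means a more likely OOM-killer victim.

def read_oom_score(pid="self"):
    """Return the oom_score of the given pid, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/oom_score") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


print(read_oom_score())
```

Comparing the score of the app's processes with the rest of the system would show whether they are the likely candidates.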
I realise it's a proprietary product and you do not have source.
Maybe setting its priority higher?
I have never heard that "swap will _not_ be used if the process is active",
but could be wrong :)
jy
no oom killer
according to the op 'in the program logs we get "could not allocate memory"', i.e. the program manages to print a message into its logfile. that means that the program isn't rudely killed by the kernel's oom killer. presumably just malloc() returned NULL.
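The difference is also visible from outside the process. A minimal sketch (the describe_exit helper is a made-up name, and the two child processes are stand-ins, not the op's program): a process that reports an error and exits on its own has a normal exit status, while one killed by the oom killer dies from SIGKILL.

```python
# Sketch: tell "exited by itself" apart from "killed by a signal".
# Python's subprocess reports death by signal N as returncode -N (POSIX).
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Describe how a child process ended, based on its return code."""
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# Stand-in for a program that logs an allocation failure and exits itself.
own_exit = subprocess.run([sys.executable, "-c", "import sys; sys.exit(1)"])

# Stand-in for a program that is killed the way the oom killer would do it.
killed = subprocess.run(
    [sys.executable, "-c",
     "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
)

print(describe_exit(own_exit.returncode))  # exited with status 1
print(describe_exit(killed.returncode))    # killed by SIGKILL
```

The oom killer also logs its victims in the kernel log, so that is another place to check.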
The program uses 12GB of ram
The program uses 12GB of RAM on a server with 16GB. The same program doing the same thing is used as a secondary, but that server only has 6GB of RAM. There is not one single process that uses it all; the program just spawns multiple sessions. It has a 7GB database and watches 6k+ VPNs. It is poorly written Java, and when we try to replicate the database on top of normal use of the program, we keep getting the "could not allocate memory" error.
The process itself is not using 6GB of RAM, but the group of 6 or so processes that the program spawns uses up the 6GB. From the looks of it, very little (100-200MB) is swapped, and looking at meminfo only about 200MB is cached. The rest is all active. BTW the server has 8 processors and they all stay at about 60% utilization.
Brian
sorry about that. Let me go
Sorry about that. Let me go into a little more detail. The program spawns multiple processes that all do different things: logging 700+ devices, polling 6k+ VPN tunnels, writing to the database, backing up any device any time there is a change, managing users, etc. In short, the program does a hell of a lot of things. The main server we run it on has 8 processors and 16GB of RAM, and the program uses 12GB or so. The server we have the issue with is the backup server. The program can run fine for months as long as we don't replicate the database. When it is running we have about 150MB of free RAM and about 100MB cached in meminfo (not including the swap). Once we turn on the other server and they begin to talk, it starts to copy the database, and about halfway through we get the "could not allocate memory" error. The database is about 9GB.
hope this helps a little more
Sounds like hitting the 32-bit limit to me
From what you're describing, there's no memory shortage (swap + buffers + cache + free is much greater than 0). I'd suspect that the replication process is trying to use more than 3GB of address space (a limit you can easily hit by allocating 2GB in chunks via malloc). The only fix for this is to rewrite the program to not allocate huge amounts of address space, or to go to 64-bit processes (not just kernel, but user process as well).
Assuming the process doesn't die when it hits this limit, you can check by looking at the virtual memory allocated by it (top will show you this in the VIRT column); if it's more than 1GB (1024MB), you're probably not out of memory total, but only out of address space for the process.
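If you'd rather log this than watch top, the same number is available as the VmSize line in /proc/<pid>/status. A rough sketch follows; the function names and the sample text are invented for illustration.

```python
# Sketch: read a process's virtual memory size (what top shows as VIRT)
# from /proc/<pid>/status. Helper names and sample text are illustrative.

def vm_size_kb(status_text):
    """Extract the VmSize value (in kB) from the text of a status file."""
    for line in status_text.splitlines():
        if line.startswith("VmSize:"):
            return int(line.split()[1])
    return None


def process_vm_size_kb(pid="self"):
    """Return VmSize in kB for a pid, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return vm_size_kb(f.read())
    except OSError:
        return None


sample = "Name:\tjava\nVmPeak:\t 2950000 kB\nVmSize:\t 2900000 kB\n"
print(vm_size_kb(sample))  # 2900000
print(process_vm_size_kb())
```

Sampling this for each of the program's processes while the replication runs would show whether any single one of them is approaching the per-process limit.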
Linux is a paged system, so as long as there is free address space and memory, allocations will succeed (for user programs - kernel has other limits, such as needing contiguous physical pages).
You need to go to the program author, and get them to fix things, or to supply you a 64-bit version, which can allocate lots of memory before running out of address space (on this laptop CPU, I have 48 bits available for virtual addresses, so even with the way Linux splits that, I'm still able to allocate terabytes in one process).
Overcommit
It sounds like a difference in overcommit modes, really. I don't know specifically what HA is or implies (some sort of high availability mode?), but I imagine enabling it disables overcommit.
(I guess it makes sense to disable overcommit in an HA mode, the idea being that once you've qualified a system with overcommit disabled, you know you won't get a surprise OOM because a task decided to fault in a larger portion of its allocation.)
If it is indeed the case that HA disables overcommit, then probably what's happening is that there's a bunch of allocated memory that's not being used and therefore shows up as "free." When the DB asks for more, it gets told no, leading to problems. If the DB is scaling its memory requests based on what it perceives to be the available memory, then you'll always have a problem no matter what, since it seems it's scaling its requests in a manner that assumes overcommit.
I believe there's a /proc entry that indicates and controls whether the kernel permits overcommit. I'd go look for it and check it. Perhaps you can disable strict allocations (no overcommit) while replicating the database and reenable it afterwards?
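For what it's worth, on Linux the entry is /proc/sys/vm/overcommit_memory. Here is a small sketch for checking it; the helper names are made up, and the descriptions paraphrase the kernel's documented meanings of the values 0, 1 and 2.

```python
# Sketch: report the kernel's overcommit policy.
# Helper names are illustrative; the path and the 0/1/2 values are the
# kernel's documented vm.overcommit_memory setting.

OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting (no overcommit)",
}


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the overcommit mode as an int, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def describe_overcommit(mode):
    """Map a mode number to a short description."""
    return OVERCOMMIT_MODES.get(mode, "unknown")


print(describe_overcommit(read_overcommit_mode()))
```

Changing the value needs root (writing a new number into the same file), so reading it before and during the replication is the safer first step.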
--
Program Intellivision and play Space Patrol!