I am seeking assistance with this memory problem in Linux:
My distro: MontaVista version 2.6.10 (mvl401-ep852)
In "steady state", I execute the "free" command to display info about free and used memory on my system:
> free
              total         used         free       shared      buffers
  Mem:       127396        42884        84512            0            0
 Swap:            0            0            0
Total:       127396        42884        84512
I noted I have 84512 KB free. I then telnet into the kernel (via telnet ) and note the following:
Two new processes are created for my telnet:
> ps
  PID  Uid     VmSize Stat Command
  250 root        884 S    in.telne
  251 root       1536 S    -bash
and memory is consumed:
> free
              total         used         free       shared      buffers
  Mem:       127396        43380        84016            0            0
 Swap:            0            0            0
Total:       127396        43380        84016
Then, I "exit" (terminate) my telnet session. Process 250 is removed but 251 has become a Zombie process:
> ps
  PID  Uid     VmSize Stat Command
  251 root            Z    [bash]
Also, all but 8K is freed (84512 - 84504 = 8 KB):
> free
              total         used         free       shared      buffers
  Mem:       127396        42892        84504            0            0
 Swap:            0            0            0
Total:       127396        42892        84504
I can repeat this indefinitely, and each time another 8 KB is used up, never to be freed again (until I reboot the system). I have read up on zombie processes and understand that a zombie is a terminated process that has not been reaped by its parent. So one cannot kill a zombie process, since it is already dead, and I cannot kill the parent, as it has already exited. I also read that it then becomes the responsibility of the "init" process to watch over zombie processes and eventually reap them if the parent has not done so. But this doesn't seem to be happening on my system (the zombies are still here after almost a full day).
The end user is concerned that memory is being eaten away every time someone logs in (the amount is small, but it adds up).
I have applied a recent kernel patch tarball, with no luck. Is this a known issue? I have tried this on another system and don't get this problem (memory is freed cleanly). I'm open to any resolutions or opinions. Many thanks.
user space
don't you think this problem lies in user space? are there patches for init or is this a known bug of the version of init you are using? you can try this with your own programs: the parent process has to wait() or waitpid() for the zombie, or ignore SIGCHLD altogether. if it doesn't, the zombie won't go away.
telnet daemon
I have been thinking that this might be a "telnet" issue (thereby making it a kernel issue, not userspace...but that's just my humble opinion). Looking at the differences in the telnet daemon on my system:
In /etc/inetd.conf:
On another system where things are fine (no memory leak), there is a difference:
I tried telnetd instead of in.telnetd, but then telnet fails on my system (with no error logs). In both systems, in.telnetd/telnetd is supplied by BusyBox. However, the system that works uses BusyBox v1.11.1, whereas I am using BusyBox v1.01. Does anyone know of a problem in v1.01 with respect to telnetd? Is it worth it for me to upgrade to v1.11.1 (or later)?
user space
of course init, telnetd, tcpd, busybox, etc. are parts of the user space, not parts of the kernel. you can't 'telnet into the kernel'. it may well be that newer busybox fixes problems of a former version. the telnetd lines in inetd.conf don't tell much, in.telnetd vs. telnetd is just a rename away; your inetd.conf implements address based access control by wrapping the telnetd with tcpd, while the other inetd.conf implements the access control (or no access control by ip address at all) in telnetd itself.
if a process remains in zombie state, that is never a problem with the program itself but with its parent that doesn't wait() for it. as the parent process comes with busybox, too, you could indeed try if booting with a newer image fixes your problem. are there other hung processes? are there error messages (oopses) in dmesg or on the console or in log files, if you have any?
Thanks for your comments and clarification, strcmp. To answer your questions: no, there aren't any other hung processes or oopses. Everything seems fine except for the Zombie'd "bash" process after telnet exits.
I think I will have to go and build the new BusyBox. I hope this isn't hard to do (I know there is a lot of configuration for it). I'll have to dig up the documentation.