the victim should not directly access hardware devices like Xorg server, because the hardware could be left in an unpredictable state, although user-application can set /proc/pid/oom_score_adj to protect it. so i think those processes should get 3% bonus for protection. Signed-off-by: Figo.zhang <figo1802@gmail.com> --- mm/oom_kill.c | 8 +++++--- 1 files changed, 5 insertions(+), 3 deletions(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 4029583..df6a9da 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -195,10 +195,12 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem, task_unlock(p); /* - * Root processes get 3% bonus, just like the __vm_enough_memory() - * implementation used by LSMs. + * Root and direct hardware access processes get 3% bonus, just like the + * __vm_enough_memory() implementation used by LSMs. */ - if (has_capability_noaudit(p, CAP_SYS_ADMIN)) + if (has_capability_noaudit(p, CAP_SYS_ADMIN) || + has_capability_noaudit(p, CAP_SYS_RESOURCE) || + has_capability_noaudit(p, CAP_SYS_RAWIO)) points -= 30; /* --
Which applications are you referring to that cannot gracefully exit if LSM's have this bonus for CAP_SYS_ADMIN, but not for CAP_SYS_RAWIO, so CAP_SYS_RAWIO had a much more dramatic impact in the previous heuristic to such a point that it would often allow memory hogging tasks to elude the oom killer at the expense of innocent tasks. I'm not sure this is the best way to go. --
like Xorg server, if xorg server be killed, the gnome desktop will be is it some experiments for demonstration the CAP_SYS_RAWIO will elude the oom killer? --
Right, but you didn't explicitly prohibit such applications from being killed, so that suggests that doing so may be inconvenient but doesn't incur something like corruption or data loss, which is what I would consider "unstable" or "inconsistent" state. We're trying to avoid any additional heuristics from being introduced for specific usecases, even for Xorg. That ensures that the heuristic remains as predictable as possible and frees a large amount of memory. If Xorg is being killed first instead of a true memory hogger, then it seems like a forkbomb scenario instead; could you please post your kernel log so that The old heuristic would allow it to elude the oom killer because it would divide the score by four if a task had the capability, which is a much more drastic "bonus" than you suggest here. That would reduce the score for the memory hogging task significantly enough that we killed tons of innocent tasks instead before eventually killing the task that was leaking memory but failed to be identified because it had CAP_SYS_RAWIO. I'm trying to avoid any such repeats. --
CAP_SYS_RESOURCE also had better get 3% bonus for protection. Signed-off-by: Figo.zhang <figo1802@gmail.com> --- mm/oom_kill.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 4029583..30b24b9 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -198,7 +198,8 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem, * Root processes get 3% bonus, just like the __vm_enough_memory() * implementation used by LSMs. */ - if (has_capability_noaudit(p, CAP_SYS_ADMIN)) + if (has_capability_noaudit(p, CAP_SYS_ADMIN) || + has_capability_noaudit(p, CAP_SYS_RESOURCE)) points -= 30; /* --
process with CAP_SYS_RESOURCE capibility which have system resource limits, like journaling resource on ext3/4 filesystem, RTC clock. so it also the same treatment as process with CAP_SYS_ADMIN. Best, Figo.zhang --
NACK, there's no justification that these tasks should be given a 3% memory bonus in the oom killer heuristic; in fact, since they can allocate without limits it is more important to target these tasks if they are using an egregious amount of memory. CAP_SYS_RESOURCE threads have the ability to lower their own oom_score_adj values, thus, they should protect themselves if necessary like everything else. --
In your new heuristic, you also get CAP_SYS_RESOURCE to protection.
see fs/proc/base.c, line 1167:
if (oom_score_adj < task->signal->oom_score_adj &&
!capable(CAP_SYS_RESOURCE)) {
err = -EACCES;
goto err_sighand;
}
so i want to protect some process like normal process not
CAP_SYS_RESOUCE, i set a small oom_score_adj , if new oom_score_adj is
small than now and it is not limited resource, it will not adjust, that
seems not right?
--
Tasks without CAP_SYS_RESOURCE cannot lower their own oom_score_adj, otherwise it can trivially kill other tasks. They can, however, increase their own oom_score_adj so the oom killer prefers to kill it first. I think you may be confused: CAP_SYS_RESOURCE override resource limits. --
CAP_SYS_RESOURCE == 1 means without resource limits just like a superuser, CAP_SYS_RESOURCE == 0 means hold resource limits, like normal user, right? a new lower oom_score_adj will protect the process, right? Tasks without CAP_SYS_RESOURCE, means that it is not a superuser, why user canot protect it by oom_score_adj? like i want to protect my program such as gnome-terminal which is without CAP_SYS_RESOURCE (have resource limits), [figo@myhost ~]$ ps -ax | grep gnome-ter Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html 2280 ? Sl 0:01 gnome-terminal 8839 pts/0 S+ 0:00 grep gnome-ter [figo@myhost ~]$ cat /proc/2280/oom_adj 3 [figo@myhost ~]$ echo -17 > /proc/2280/oom_adj bash: echo: write error: Permission denied [figo@myhost ~]$ --
Because, as I said, it would be trivial for a user program to deplete all memory (either intentionally or unintentioally) and cause every other task on the system to be oom killed as a result. That's an undesired result of If this is your system, you can either give yourself CAP_SYS_RESOURCE or do it through the superuser. This isn't exactly new, it's been the case for the past four years. I'm still struggling to find out the problem that you're trying to address with your various patches, perhaps because you haven't said what it is. --
David, Stupid are YOU. you removed CAP_SYS_RESOURCE condition with ZERO explanation and Figo reported a regression. That's enough the reason to undo. YOU have a guilty to explain why do you want to change and why do you think it has justification. Don't blame bug reporter. That's completely wrong. --
Can people stop throwing things at each other and worry about the facts - If it's a regression it should get reverted or fixed. But is it actually a regression ? Has the underlying behaviour changed in a problematic way? "CAP_SYS_RESOURCE threads have the ability to lower their own oom_score_adj values, thus, they should protect themselves if necessary like everything else." The reverse can be argued equally - that they can unprotect themselves if necessary. In fact it seems to be a "point of view" sort of question which way you deal with CAP_SYS_RESOURCE, and that to me argues that changing from old expected behaviour to a new behaviour is a regression. --
I didn't check earlier, but CAP_SYS_RESOURCE hasn't had a place in the oom killer's heuristic in over five years, so what regression are we referring to in this thread? These tasks already have full control over oom_score_adj to modify its oom killing priority in either direction. And, as I said, giving these threads a bonus to be less preferred doesn't seem appropriate since (1) it's not a defined or expected behavior of CAP_SYS_RESOURCE like it is for sysadmin tasks, and (2) these threads are not bound by resource limits and thus have a higher liklihood of consuming larger amounts of memory. That's why I nack'd the patch in the first place and still do, there's no regression here and it's not in the best interest of freeing a large amount of memory which is the sole purpose of the oom killer. Futhermore, the heuristic was entirely rewritten, but I wouldn't consider all the old factors such as cputime and nice level being removed as "regressions" since the aim was to make it more predictable and more likely to kill a large consumer of memory such that we don't have to kill more tasks in the near future. --
Yes, CAP_SYS_RESOURCE was a part of the heuristic in 2.6.25 along with CAP_SYS_ADMIN and was removed with the rewrite; when I said it "hasn't had a place in the oom killer's heuristic," I meant it's an unnecessary extention to CAP_SYS_ADMIN and allows for killing innocent tasks when a CAP_SYS_RESOURCE task is using too much memory. The fundamental issue here is whether or not we should give a bonus to CAP_SYS_RESOURCE tasks because they are, by definition, allowed to access extra resources and we're willing to sacrifice other tasks for that. This is antagonist to the oom killer's sole goal, however, which is to kill the task consuming the largest amount of memory unless protected by userspace (which CAP_SYS_RESOURCE has completely control in doing). Since these threads have complete ability to give themselves this bonus (echo -30 > /proc/self/oom_score_adj), I don't think this needs to be a part of the core heuristic nor with such an arbitrary value of 3% (the old heuristic divided its badness score by 4, another arbitrary value). --
yes, it can control by user, but is it all system administrators will adjust all of the processes by each one and one in real word? suppose if the goal of oom_killer is to find out the best process to kill, the one should be: 1. it is a most memory comsuming process in all processes 2. and it was a proper process to kill, which will not be let system into unpredictable state as possible. if a user process and a process such email cleint "evolution" with ditecly hareware access such as "Xorg", they have eat the equal memory, so which process are you want to kill? --
Yes, the kernel can't possibly know the oom killing priorities of your task so if you have such requirements then you must use the userspace There are four types of tasks that are improper to kill and this is relatively unchanged in the past five years of the oom killer: - init, - kthreads, - tasks that are bound to a disjoint set of cpuset mems or mempolicy nodes that are not oom, and - those disabled from oom killing by userspace. That does not include CAP_SYS_RESOURCE, nor CAP_SYS_ADMIN. Your argument about killing some tasks that have CAP_SYS_RESOURCE leaving hardware in an unpredictable state isn't even addressed by your own patch, you only give them a 3% memory bonus so they are still eligible. As mentioned previously, for this patch to make sense, you would need to show that CAP_SYS_RESOURCE equates to 3% of the available memory's capacity for a task. I don't believe that evidence has been presented. This has nothing to do with preventing these threads from being killed (at the risk of possibly panicking the machine) since your patch doesn't do Both have equal oom killing priority according to the heuristic if they are not run by root. If you would like to protect Xorg, then you need to use the userspace tunable to protect it just like everything else does. This is completely unchanged from the oom killer rewrite. If you actually have a problem that you're reporting, however, it would probably be better to show the oom killer log from that event and let us address it instead of introducing arbitrary heuristics into something which aims to be as predictable as possible. --
I was surprised this issue is still there. This was pointed out half year But yes. OOM need to care both CAP_SYS_RESOURCE and CAP_SYS_RAWIO. Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> --
the victim should not directly access hardware devices like Xorg server, because the hardware could be left in an unpredictable state, although user-application can set /proc/pid/oom_score_adj to protect it. so i think those processes should get 3% bonus for protection. in v2, fix the incorrect comment. Signed-off-by: Figo.zhang <figo1802@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> --- mm/oom_kill.c | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 4029583..9b06f56 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -196,9 +196,12 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem, /* * Root processes get 3% bonus, just like the __vm_enough_memory() - * implementation used by LSMs. + * implementation used by LSMs. And direct hardware access processes + * also get 3% bonus. */ - if (has_capability_noaudit(p, CAP_SYS_ADMIN)) + if (has_capability_noaudit(p, CAP_SYS_ADMIN) || + has_capability_noaudit(p, CAP_SYS_RESOURCE) || + has_capability_noaudit(p, CAP_SYS_RAWIO)) points -= 30; /* --
The logic here is wrong: if killing these tasks can leave hardware in an unpredictable state (and that state is presumably harmful), then they should be completely immune from oom killing since you're still leaving them exposed here to be killed. So the question that needs to be answered is: why do these threads deserve to use 3% more memory (not >4%) than others without getting killed? If there was some evidence that these threads have a certain quantity of memory they require as a fundamental attribute of CAP_SYS_RAWIO, then I have no objection, but that's going to be expressed in a memory quantity not a percentage as you have here. The CAP_SYS_ADMIN heuristic has a background: it is used in the oom killer because we have used the same 3% in __vm_enough_memory() for a long time and we want consistency amongst the heuristics. Adding additional bonuses with arbitrary values like 3% of memory for things like CAP_SYS_RAWIO makes the heuristic less predictable and moves us back toward the old heuristic which was almost entirely arbitrary. Now before KOSAKI-san comes out and says the old heuristic considered CAP_SYS_RAWIO and the new one does not so it _must_ be a regression: the old heuristic also divided the badness score by 4 for that capability as a completely arbitrary value (just like 3% is here). Other traits like runtime and nice levels were also removed from the heuristic. What needs to be shown is that CAP_SYS_RAWIO requires additional memory just to run or we should neglect to free 3% of memory, which could be gigabytes, because it has this trait. --
we let the processes with hardware access get bonus for protection. the yes, i think it is be better those processes which be protection maybe divided the badness score by 4, like old heuristic. --
That's bogus. __vm_enough_memory() does track virtual adress space. oom-killer Old background is very simple and cleaner. CAP_SYS_RESOURCE mean the process has a privilege of using more resource. then, oom-killer gave it additonal bonus. CAP_SYS_RAWIO mean the process has a direct hardware access privilege (eg X.org, RDB). and then, killing it might makes system crash. In another story, somebody doubt 4x bonus is good or not. but 3% has the same problem. --
No, 3% was chosen in __vm_enough_memory() for LSMs as the comment in the
oom killer shows:
/*
* Root processes get 3% bonus, just like the __vm_enough_memory()
* implementation used by LSMs.
*/
and is described in Documentation/filesystems/proc.txt.
I think in cases of heuristics like this where we obviously want to give
some bonus to CAP_SYS_ADMIN that there is consistency with other bonuses
The old heuristic divided the arbitrary badness score by 4 with
CAP_SYS_RESOURCE. The new heuristic doesn't consider it.
As a side-effect of being given more resources to allocate, those
applications are relatively unbounded in terms of memory consumption to
other tasks. Thus, it's possible that these applications are using a
massive amount of memory (say, 75%) and now with the proposed change a
task using 25% of memory would be killed instead. This increases the
liklihood that the CAP_SYS_RESOURCE thread will have to be killed
eventually, anyway, and the goal is to kill as few tasks as possible to
free sufficient amount of memory.
Since threads having CAP_SYS_RESOURCE have full control over their
oom_score_adj, they can take the additional precautions to protect
themselves if necessary. It doesn't need to be a part of the heuristic to
bias these tasks which will lead to the undesired result described above
Then you would want to explicitly filter these tasks from oom kill just as
OOM_SCORE_ADJ_MIN works rather than giving them a memory quantity bonus.
--
Keep comparision apple to apple. vm_enough_memory() account _virtual_ memory. You are talking two difference at once. 3% vs 4x and CAP_SYS_RESOURCE and CAP_SYS_ADMIN. No. Why does userland recover your mistake? --
It's not unrelated, the LSM function gives an arbitrary 3% bonus to CAP_SYS_ADMIN. Such threads should also be preferred in the oom killer over other threads since they tend to be more important but not an overly drastic bias such that they don't get killed when using an egregious amount of memory. So in selecting a small percentage of memory that tends to be a significant bias but not overwhelming, I went with the 3% found elsewhere in the kernel. __vm_enough_memory() doesn't have that preference for any scientifically calculated reason, it's a heuristic just You just said killing any CAP_SYS_RAWIO task may make the system crash, so presuming that you don't want the system to crash, you are suggesting we should make these threads completely immune? That's never been the case (and isn't for oom_kill_allocating_task, either), so there's no history you can draw from to support your argument. --
__vm_enough_memory() only gurard to memory overcommiting. And it doesn't have any recover way. We expect admin should recover their HAND. In the other hand, oom-killer _is_ automatic recover way. It's no need admin's No. I only require YOU have to investigate userland usecase BEFORE making change. --
I needed a small bias for CAP_SYS_ADMIN tasks so I chose 3% since it's the same proportion used elsewhere in the kernel and works nicely since the badness score is now a proportion. If you'd like to propose a different percentage or suggest removing the bias for root tasks altogether, feel free to propose a patch. Thanks! --
I only need to revert bad change. Thanks. --
We have always preferred to break ties between applications by not preferring the root task over the user task in the oom killer. If you'd like to remove this bonus for CAP_SYS_ADMIN, please propose a patch. Thanks! --
the victim should not directly access hardware devices like Xorg server, because the hardware could be left in an unpredictable state, although user-application can set /proc/pid/oom_score_adj to protect it. so i think those processes should get bonus for protection. in v2, fix the incorrect comment. in v3, change the divided the badness score by 4, like old heuristic for protection. we just want the oom_killer don't select Root/RESOURCE/RAWIO process as possible. suppose that if a user process A such as email cleint "evolution" and a process B with ditecly hareware access such as "Xorg", they have eat the equal memory (the badness score is the same),so which process are you want to kill? so in new heuristic, it will kill the process B. but in reality, we want to kill process A. Signed-off-by: Figo.zhang <figo1802@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> mm/oom_kill.c | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 4029583..f43d759 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -202,6 +202,15 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem, points -= 30; /* + * Root and direct hareware access processor are usually more + * important, so them should get bonus for protection. + */ + if (has_capability_noaudit(p, CAP_SYS_ADMIN) || + has_capability_noaudit(p, CAP_SYS_RESOURCE) || + has_capability_noaudit(p, CAP_SYS_RAWIO)) + points /= 4; + + /* * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may * either completely disable oom killing or always prefer a certain * task. --
the victim should not directly access hardware devices like Xorg server, because the hardware could be left in an unpredictable state, although user-application can set /proc/pid/oom_score_adj to protect it. so i think those processes should get bonus for protection. in v2, fix the incorrect comment. in v3, change the divided the badness score by 4, like old heuristic for protection. we just want the oom_killer don't select Root/RESOURCE/RAWIO process as possible. suppose that if a user process A such as email cleint "evolution" and a process B with ditecly hareware access such as "Xorg", they have eat the equal memory (the badness score is the same),so which process are you want to kill? so in new heuristic, it will kill the process B. but in reality, we want to kill process A. Signed-off-by: Figo.zhang <figo1802@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> --- mm/oom_kill.c | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 4029583..f43d759 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -202,6 +202,15 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem, points -= 30; /* + * Root and direct hareware access processes are usually more + * important, so they should get bonus for protection. + */ + if (has_capability_noaudit(p, CAP_SYS_ADMIN) || + has_capability_noaudit(p, CAP_SYS_RESOURCE) || + has_capability_noaudit(p, CAP_SYS_RAWIO)) + points /= 4; + + /* * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may * either completely disable oom killing or always prefer a certain * task. --
Again, this argument doesn't work: if killing the task leaves hardware in an unpredictable state (and that's presumably harmful), then they shouldn't be killed at all. Please show why CAP_SYS_RESOURCE equates to 3% additional memory for such tasks. CAP_SYS_RESOURCE allows those threads to override resource limits, so these have potentially unbounded amounts of memory usage. Thus, they may have the highest memory usage on the machine and now your patch has caused other innocent tasks to be killed before this is actually targeted. That's a bad result. Why do we need this type of hack in the oom killer when these threads have the privilege to modify oom killing priorities for all tasks on the system? Laziness, at the cost of a less predictable heuristic? Then you need to protect process B accordingly and since it has CAP_SYS_RESOURCE it can easily do that on its own or the admin can protect Unless you did this in private, I didn't see KOSAKI-san's reviewed-by line for this change and it is drastically different from what you've proposed What on earth? So now CAP_SYS_ADMIN gets a 3% bonus in the if-clause above this, then we divide a percentage of memory use by 4? What does that mean AT ALL? And now you've thrown CAP_SYS_RAWIO in there without any mention in the changelog? Are you just trying to introduce all the old arbitrary heuristics from before the rewrite back into the oom killer like this? Do you actually have a log from an event where the oom killer targeted the incorrect task? --
Sorry for the delay. I've sent completely revert patch to linus. It will disappear your headache, I believe. I'm sorry that our development caused your harm. We really don't want it. --
Oh please, your dramatics are getting better and better. Figo.zhang never described a problem that was being addressed but rather proposed several different variants of a patch (some with CAP_SYS_ADMIN, some with CAP_SYS_RESOURCE, some with CAP_SYS_RAWIO, some with a combination, some with a 3% bonus, some with a order-of-2 bonus, etc) to return the same heuristic used in the old oom killer. I asked several times to show the oom killer log from the problematic behavior and none were presented. --
>Nothing to say, really. Seems each time we're told about a bug or a >regression, David either fixes the bug or points out why it wasn't a >bug or why it wasn't a regression or how it was a deliberate behaviour >change for the better. >I just haven't seen any solid reason to be concerned about the state of >the current oom-killer, sorry. >I'm concerned that you're concerned! A lot. When someone such as >yourself is unhappy with part of MM then I sit up and pay attention. >But after all this time I simply don't understand the technical issues >which you're seeing here. we just talk about oom-killer technical issues. i am doubt that a new rewrite but the athor canot provide some evidence and experiment result, why did you do that? what is the prominent change for your new algorithm? as KOSAKI Motohiro said, "you removed CAP_SYS_RESOURCE condition with ZERO explanation". David just said that pls use userspace tunable for protection by oom_score_adj. but may i ask question: 1. what is your innovation for your new algorithm, the old one have the same way for user tunable oom_adj. 2. if server like db-server/financial-server have huge import processes (such as root/hardware access processes)want to be protection, you let the administrator to find out which processes should be protection. you will let the financial-server administrator huge crazy!! and lose so many money!! ^~^ 3. i see your email in LKML, you just said "I have repeatedly said that the oom killer no longer kills KDE when run on my desktop in the presence of a memory hogging task that was written specifically to oom the machine." http://thread.gmane.org/gmane.linux.kernel.mm/48998 so you just test your new oom_killer algorithm on your desktop with KDE, so have you provide the detail how you do the test? is it do the experiment again for anyone and got the same result as your comment ? as KOSAKI Motohiro said, in reality word, it we makes 5-6 brain simulation, embedded, ...
The goal was to make the oom killer heuristic as predictable as possible and to kill the most memory-hogging task to avoid having to recall it and needlessly kill several tasks. The goal behind oom_score_adj vs. oom_adj was for several reasons, as pointed out before: - give it a unit (proportion of available memory), oom_adj had no unit, - allow it to work on a linear scale for more control over prioritization, oom_adj had an exponential scale, - give it a much higher resolution so it can be fine-tuned, it works with a granularity of 0.1% of memory (~128M on a 128G machine), and - allow it to describe the oom killing priority of a task regardless of its cpuset attachment, mempolicy, or memcg, or when their respective You have full control over disabling a task from being considered with oom_score_adj just like you did with oom_adj. Since oom_adj is Xorg tends to be killed less because of the change to the heuristic's baseline, which is now based on rss and swap instead of total_vm. This is seperate from the issues you list above, but is a benefit to the oom killer that desktop users especially will notice. I, personally, am interested more in the server market and that's why I looked for a more robust userspace tunable that would still be applicable when things like cpusets have a node added or removed. --
Meta question - why is that a good thing. In a desktop environment it's frequently wrong, in a server environment it is often wrong. We had this before where people spend months fiddling with the vm and make it work slightly differently and it suits their workload, then other workloads go Which changeset added it to the Documentation directory as deprecated ? Alan --
Most of the arbitrary heuristics were removed from oom_badness(), things like nice level, runtime, CAP_SYS_RESOURCE, etc., so that we only consider the rss and swap usage of each application in comparison to each other when deciding which task to kill. We give root tasks a 3% bonus since they tend to be more important to the productivity or uptime of the machine, which did exist -- albeit with a more dramatic impact -- in the old heursitic. You'll find that the new heuristic always kills the task consuming the most amount of rss unless influenced by userspace via the tunables (or within 3% of root tasks). We always want to kill the most memory-hogging task because it avoids needlessly killing additional tasks when we must immediately recall the oom killer because we continue to allocate memory. If that task happens to be of vital importance to userspace, then the user has full control 51b1bd2a was the actual change that deprecated it, which was a direct follow-up to a63d83f4 which actually obsoleted it. --
It's insufficient. a63d83f427fbce97a6cea0db2e64b0eb8435cd10 (oom: badness heuristic rewrite) introduced a lot of incompatibility to oom_adj and oom_score. Theresore I would sugestted full revert and resubmit some patches which cherry pick no pain piece. --
i had send the patch to protect the hardware access processes for oom-killer before, but rientjes have not agree with me. but today i catch log from my desktop. oom-killer have kill my "minicom" and "Xorg". so i think it should add protection about it. my desktop run on linux-2.6.36. [figo@figo-desktop android]$ uname -a Linux figo-desktop 2.6.36-ARCH #1 SMP PREEMPT Fri Dec 10 20:01:53 UTC 2010 i686 Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz GenuineIntel GNU/Linux [figo@figo-desktop android]$ --
On Tue, 04 Jan 2011 15:51:44 +0800
Off topic.
... This means total_swap_pages = 0 while pages are read-in at swapoff.
Let's see 'points' for oom
==
points = (get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS)) * 1000 /
totalpages;
==
Here, totalpages = total_ram + total_swap but totalswap is 0 here.
So, points can be > 1000, easily.
(This seems not to be related to the Xorg's death itself)
Thanks,
-Kame
--
total_swap is 0, so totalpages = total_ram, get_mm_counter(p->mm, MM_SWAPENTS) = 0, so points = (get_mm_rss(p->mm)) * 1000 / totalpages; --
