Re: [PATCH] exec argument expansion can inappropriately trigger OOM-killer

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Roland McGrath
Date: Monday, August 30, 2010 - 3:06 am

> IIRC, prior to that fix, I was able to cause the kernel to loop for tens

And I say, if your userland process could really allocate another 200GB,
then more power to you, you can do it with an exec too.  If you could do
the same with a userland stack allocation, and spend all that time in
strlen calls and then memcpy, you can do it inside execve too.  If it
takes days, that's what you asked for, and it's your process.  It just
ought to be every bit (or near enough) as preemptible and interruptible
as that normal userland activity ought to be.

So, perhaps we want this (count already has a cond_resched in its loop):

diff --git a/fs/exec.c b/fs/exec.c
index 2d94552..0000000 100644  
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -369,6 +369,9 @@ static int count(const char __user * con
 		for (;;) {
 			const char __user * p;
 
+			if (signal_pending(current))
+				return -ERESTARTNOINTR;
+
 			if (get_user(p, argv))
 				return -EFAULT;
 			if (!p)
@@ -400,6 +403,10 @@ static int copy_strings(int argc, const 
 		int len;
 		unsigned long pos;
 
+		if (signal_pending(current))
+			return -ERESTARTNOINTR;
+		cond_resched();
+
 		if (get_user(str, argv+argc) ||
 				!(len = strnlen_user(str, MAX_ARG_STRLEN))) {
 			ret = -EFAULT;


I really don't think we need that stuff back.  I think we can get rid of
it and fix the real problems, and be happier overall.


I don't agree.  If all of the implicit allocation done inside execve is
accounted and controlled as well as normal userland allocations so that
the execve fails when userland allocation would fail, then there is no
reason for special-case arbitrary limits.


That change seems like it might be reasonable, but I haven't really
looked at shift_arg_pages before.  Has someone reported this BUG_ON
failure mode with a reproducer?


Agreed.  But IMHO the missing arbitrary limits on arg/env size are not
among them.  I don't know about shift_arg_pages.  The core fix I think
makes sense is making the nascent mm get accounted to the user process
normally.  Rather than better enabling OOM killing, I think what really
makes sense is for the nascent mm to be marked such that allocations in
it (they'll be from get_arg_page->get_user_pages->handle_mm_fault) just
fail with ENOMEM before it resorts to the OOM killer (or perhaps even to
very aggressive pageout).  That should percolate back to the execve just
failing with ENOMEM, which is nicer than OOM kill even if the OOM killer
actually does pick exactly and only the right target.


Thanks,
Roland
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Re: [PATCH] exec argument expansion can inappropriately tr ..., Roland McGrath, (Mon Aug 30, 3:06 am)
[PATCH 0/3] execve argument-copying fixes, Roland McGrath, (Tue Sep 7, 7:34 pm)
Re: [PATCH 0/3] execve argument-copying fixes, KOSAKI Motohiro, (Tue Sep 7, 8:00 pm)
[PATCH 0/2] execve memory exhaust of argument-copying fixes, KOSAKI Motohiro, (Wed Sep 8, 10:01 pm)
[PATCH 1/2] oom: don't ignore rss in nascent mm, KOSAKI Motohiro, (Wed Sep 8, 10:03 pm)
Re: [PATCH 1/2] oom: don't ignore rss in nascent mm, Oleg Nesterov, (Thu Sep 9, 3:05 pm)
Re: [PATCH 1/2] oom: don't ignore rss in nascent mm, Roland McGrath, (Fri Sep 10, 2:39 am)
Re: [PATCH 2/2] execve: check the VM has enough memory at ..., KOSAKI Motohiro, (Wed Sep 15, 10:51 pm)