> IIRC, prior to that fix, I was able to cause the kernel to loop for tens
And I say, if your userland process could really allocate another 200GB,
then more power to you, you can do it with an exec too. If you could do
the same with a userland stack allocation, and spend all that time in
strlen calls and then memcpy, you can do it inside execve too. If it
takes days, that's what you asked for, and it's your process. It just
ought to be every bit (or near enough) as preemptible and interruptible
as that normal userland activity ought to be.
So, perhaps we want this (count already has a cond_resched in its loop):
diff --git a/fs/exec.c b/fs/exec.c
index 2d94552..0000000 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -369,6 +369,9 @@ static int count(const char __user * con
for (;;) {
const char __user * p;
+ if (signal_pending(current))
+ return -ERESTARTNOINTR;
+
if (get_user(p, argv))
return -EFAULT;
if (!p)
@@ -400,6 +403,10 @@ static int copy_strings(int argc, const
int len;
unsigned long pos;
+ if (signal_pending(current))
+ return -ERESTARTNOINTR;
+ cond_resched();
+
if (get_user(str, argv+argc) ||
!(len = strnlen_user(str, MAX_ARG_STRLEN))) {
ret = -EFAULT;
I really don't think we need that stuff back. I think we can get rid of
it and fix the real problems, and be happier overall.
I don't agree. If all of the implicit allocation done inside execve is
accounted and controlled as well as normal userland allocations so that
the execve fails when userland allocation would fail, then there is no
reason for special-case arbitrary limits.
That change seems like it might be reasonable, but I haven't really
looked at shift_arg_pages before. Has someone reported this BUG_ON
failure mode with a reproducer?
Agreed. But IMHO the missing arbitrary limits on arg/env size are not
among them. I don't know about shift_arg_pages. The core fix I think
makes sense is making the nascent mm get accounted to the user process
normally. Rather than better enabling OOM killing, I think what really
makes sense is for the nascent mm to be marked such that allocations in
it (they'll be from get_arg_page->get_user_pages->handle_mm_fault) just
fail with ENOMEM before it resorts to the OOM killer (or perhaps even to
very aggressive pageout). That should percolate back to the execve just
failing with ENOMEM, which is nicer than OOM kill even if the OOM killer
actually does pick exactly and only the right target.
Thanks,
Roland
--