Hello,
please consider the following repository for 2.6.36. It introduces a new
syscall for arch-independent resource limit handling. It also adds
support for changing limits at runtime. This feature is needed mostly by
daemons serving databases and similar services, where limits need
to be changed on production systems without restarting the services.
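
As a rough illustration of the use case above, here is a minimal userspace sketch that changes the soft open-files limit of an already running process. It assumes a C library that exposes the `prlimit()` wrapper (glibc does, behind `_GNU_SOURCE`); the helper name `set_nofile_soft_limit` is hypothetical and not part of this series.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for the prlimit() declaration */
#endif
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>

/*
 * Hypothetical helper: set the soft RLIMIT_NOFILE of process `pid`
 * without restarting it. The hard limit is left as it was.
 * Returns 0 on success, -1 on error (errno is set by prlimit()).
 */
int set_nofile_soft_limit(pid_t pid, rlim_t new_soft)
{
    struct rlimit old_lim, new_lim;

    /* A NULL new_limit only reads the current values. */
    if (prlimit(pid, RLIMIT_NOFILE, NULL, &old_lim) == -1)
        return -1;

    new_lim.rlim_cur = new_soft;
    new_lim.rlim_max = old_lim.rlim_max;   /* keep the hard limit */

    /* The kernel rejects a soft limit above the hard limit. */
    return prlimit(pid, RLIMIT_NOFILE, &new_lim, NULL);
}
```

Passing a pid of 0 targets the calling process; changing another process's limits is subject to the usual permission checks.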
The following changes since commit 2f7989efd4398d92b8adffce2e07dd043a0895fe:
Merge master.kernel.org:/home/rmk/linux-2.6-arm (2010-07-14 17:28:13 -0700)
are available in the git repository at:
git://decibel.fi.muni.cz/~xslaby/linux writable_limits
Jiri Slaby (10):
rlimits: security, add task_struct to setrlimit
rlimits: add task_struct to update_rlimit_cpu
rlimits: split sys_setrlimit
rlimits: allow setrlimit to non-current tasks
rlimits: do security check under task_lock
rlimits: add rlimit64 structure
rlimits: redo do_setrlimit to more generic do_prlimit
rlimits: switch more rlimit syscalls to do_prlimit
rlimits: implement prlimit64 syscall
unistd: add __NR_prlimit64 syscall numbers
Oleg Nesterov (2):
rlimits: make sure ->rlim_max never grows in sys_setrlimit
rlimits: selinux, do rlimits changes under task_lock
arch/x86/ia32/ia32entry.S | 1 +
arch/x86/include/asm/unistd_32.h | 3 +-
arch/x86/include/asm/unistd_64.h | 2 +
arch/x86/kernel/syscall_table_32.S | 1 +
include/asm-generic/unistd.h | 4 +-
include/linux/posix-timers.h | 2 +-
include/linux/resource.h | 9 ++
include/linux/security.h | 9 +-
include/linux/syscalls.h | 4 +
kernel/compat.c | 17 +---
kernel/posix-cpu-timers.c | 8 +-
kernel/sys.c | 202 ++++++++++++++++++++++++++++--------
security/capability.c | 3 +-
security/security.c | 5 +-
security/selinux/hooks.c | 12 ++-
...Ok, so the code looks fine, and I don't have any real objections any
more. I don't know how much use this will get, but it doesn't appear
to be "wrong" in any way. So I was going to pull it.
However, in the meantime we have commit 5360bd776f73 ("Fix up the
"generic" unistd.h ABI to be more useful") that clashes with it. Now,
the conflict is trivial to resolve, and I could do that easily - it's
not a technical problem. But that commit's code comments say
+ * Architectures may provide up to 16 syscalls of their own
+ * starting with this value.
+ */
+#define __NR_arch_specific_syscall 244
and the new writable rlimits syscall is obviously 244.
Now, looking at it all, I think that commit was badly done - not
leaving any room for new generic system calls is pretty iffy. And if I
had happened to take the Tilera merge later, I'd have had no problems
with just changing it. As is, though, I want to check with Arnd and
Chris first.
Arnd, Chris - how about making the "arch-specific" system calls start
at 256 or something? Or even higher, like 512? Yes, it makes the
system call array bigger, but is that really a problem? Especially as
we start the "deprecated" system calls at 1024, it would seem to make
sense to raise it to 512, and leave the low numbers for the "regular"
system calls.
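
To put a rough number on the "bigger array" cost, here is a small sketch. It assumes a flat table of function pointers indexed by syscall number, with the 16 arch-specific slots sitting at the top; the helper names are hypothetical and this is not how any particular architecture lays out its table.

```c
#include <stddef.h>

#define ARCH_SYSCALL_SLOTS 16   /* "up to 16 syscalls of their own" */

/* Entries needed so that every arch-specific slot is addressable. */
static size_t table_entries_for_arch_base(unsigned int arch_base)
{
    return (size_t)arch_base + ARCH_SYSCALL_SLOTS;
}

/* Table size in bytes, one function pointer per entry. */
static size_t table_bytes(size_t entries)
{
    return entries * sizeof(void (*)(void));
}
```

Under these assumptions a base of 244 needs 260 entries and a base of 512 needs 528, so the difference is 268 pointers, or 2144 bytes with 8-byte pointers.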
[ I'm leaving the quoted email for the edification of Chris/Arnd that
I added to the discussion ]
Linus
--
Jiri and I actually discussed this back on July 20th on LKML when it
first conflicted in linux-next, and at the time he said he'd move
prlimit64 to 261 in <asm-generic/unistd.h>. It looks like what actually
stuck in linux-next was different, however. It's partly my fault for

In any case, obviously the larger question is how many
architecture-specific syscalls are appropriate, and where they should be
located in the syscall number space. To be clear, the model for new
generic system calls is that they just continue on after the 16
architecture-specific ones, and in fact __NR_wait4 is already an example
of just this -- done that way to avoid making trouble for the "score"
architecture, since it was deprecated and then later un-deprecated. So
new generic syscalls are not a problem.

There is definitely some tension between allowing architectures free
rein with their own set of unlimited additional syscalls on the one
hand, and having a contiguous and small array of syscalls on the other
hand. I suspect it's slightly nicer to have a contiguous and small
array, as long as we've provided enough room for architectures to add
extra syscalls, but I'm not strongly married to this position.

For what it's worth, from Tilera's point of view we can certainly
tolerate changes in this area; we have not released any of this new
syscall ABI stuff to customers yet, so thrashing this just involves an

-- Chris Metcalf, Tilera Corp. http://www.tilera.com
--
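
The numbering model described in this message can be sketched as follows. The base value 244 and the window of 16 come from the comment quoted earlier in the thread; the macro and function names are hypothetical and only illustrate the arithmetic, they are not kernel code.

```c
#include <stdbool.h>

/* Values quoted in this thread for asm-generic/unistd.h. */
#define NR_ARCH_SPECIFIC_BASE  244   /* __NR_arch_specific_syscall */
#define NR_ARCH_SPECIFIC_SLOTS 16

/* True if nr falls inside the window reserved for architectures. */
static bool nr_is_arch_specific(unsigned int nr)
{
    return nr >= NR_ARCH_SPECIFIC_BASE &&
           nr < NR_ARCH_SPECIFIC_BASE + NR_ARCH_SPECIFIC_SLOTS;
}

/* Next free generic number after last_generic, skipping the window. */
static unsigned int next_generic_nr(unsigned int last_generic)
{
    unsigned int nr = last_generic + 1;

    if (nr_is_arch_specific(nr))
        nr = NR_ARCH_SPECIFIC_BASE + NR_ARCH_SPECIFIC_SLOTS;
    return nr;
}
```

With these values the window covers 244 through 259, so the first generic number after it is 260, and the one after that is 261, the number mentioned above for prlimit64.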
I would have done that if the tree had reached Linus's tree earlier, so
that I could rebase my tree on top of it. Otherwise there was not much
I could do about it. The merge resolution in -next is done by Stephen,
so he probably misunderstood us.

(Oh, I could have had a for-next branch where I merged your tree to
resolve the -next merge done by Stephen, but that wouldn't solve the
situation we are in now.)

thanks,
-- js suse labs
--
Right. The writable_rlimits syscall should just go after wait4 at 262.

In retrospect, it would have been nicer to have the
architecture-specific syscalls start at zero, but it's too late for
that. Since we don't have an architecture with more than a handful of
arch-specific calls, I think 16 will get us a very long way, while
trying to leave "enough" space between the generic and the arch-specific
calls would result either in wasting space in the table or choosing a
too small value.

Arnd
--
.. and in the meantime I added the notify tree too, so now the
x86(-64) numbers also clashed.
So I just moved the prlimit64() system call, both on x86[-64] and in
asm-generic/unistd.h
Pushed out. Guys, please verify that it looks ok.
Linus
--
It looks good in asm-generic/unistd.h; thanks.

-- Chris Metcalf, Tilera Corp. http://www.tilera.com
--
To me too, except the nits below.
---
arch/x86/ia32/ia32entry.S | 2 +-
arch/x86/kernel/syscall_table_32.S | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 91dc4bb..b9472ec 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -844,5 +844,5 @@ ia32_sys_call_table:
.quad compat_sys_recvmmsg
.quad sys_fanotify_init
.quad sys32_fanotify_mark
- .quad sys_prlimit64
+ .quad sys_prlimit64 /* 340 */
ia32_syscall_end:
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index 4802acc..b35786d 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -339,4 +339,4 @@ ENTRY(sys_call_table)
.long sys_recvmmsg
.long sys_fanotify_init
.long sys_fanotify_mark
- .long sys_prlimit64
+ .long sys_prlimit64 /* 340 */
--
Looks good, thanks!

Arnd
--
On Tue, Aug 10, 2010 at 9:01 AM, Linus Torvalds
I should have clarified that. The new asm-generic prlimit64 system
call was added at the end (as 244), not in general. Only tilera and
score use that "generic" unistd.h file currently, and score doesn't do
any other system calls, which is why it's really only arch/tile that
is affected by this. Of course, new architectures are likely to use
that model, but we don't care about those yet.
I still think that starting the arch-specific ones at 512 is likely
the right model. I just wanted to clarify in case somebody thought
that x86 put a new system call at 244.
Linus
--
