Re: [GIT] writable_limits for 2.6.36

Previous thread: [patch] apparmor: issue with ns name without a following profile by Dan Carpenter on Saturday, August 7, 2010 - 4:50 am. (2 messages)

Next thread: Building kernel 2.0.40 with GCC 4.4 by Alex Buell on Saturday, August 7, 2010 - 5:31 am. (1 message)
From: Jiri Slaby
Date: Saturday, August 7, 2010 - 5:15 am

Hello,

please consider pulling the following repository for 2.6.36. It
introduces a new syscall for arch-independent resource limit handling.
It also adds support for changing limits at runtime. This feature is
needed mostly by daemons servicing databases and similar services, where
limits need to be changed on production systems without restarting the
services.
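
For illustration, adjusting a limit of an already running process from
user space could look roughly like the sketch below. It is not part of
the pull request. It assumes the glibc prlimit() wrapper, which only
became available in glibc 2.13, after this thread, and set_nofile_soft()
is a hypothetical helper name. Changing another process's limits also
requires matching credentials or CAP_SYS_RESOURCE.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>

/*
 * Set the soft RLIMIT_NOFILE of process `pid` to `new_soft`, clamped to
 * the current hard limit, which is left unchanged. pid 0 means the
 * calling process. Returns 0 on success, -1 on error (errno is set by
 * prlimit()). Hypothetical helper, for illustration only.
 */
int set_nofile_soft(pid_t pid, rlim_t new_soft)
{
	struct rlimit cur, next;

	/* Read the current limits without changing anything. */
	if (prlimit(pid, RLIMIT_NOFILE, NULL, &cur) == -1)
		return -1;

	/* Never ask for a soft limit above the hard limit. */
	next.rlim_cur = new_soft < cur.rlim_max ? new_soft : cur.rlim_max;
	next.rlim_max = cur.rlim_max;

	/* Install the new limits; the target process keeps running. */
	return prlimit(pid, RLIMIT_NOFILE, &next, NULL);
}
```

Calling set_nofile_soft(pid, 4096) on a database daemon's pid would
change its open-file soft limit in place, with no restart.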

The following changes since commit 2f7989efd4398d92b8adffce2e07dd043a0895fe:

  Merge master.kernel.org:/home/rmk/linux-2.6-arm (2010-07-14 17:28:13 -0700)

are available in the git repository at:

  git://decibel.fi.muni.cz/~xslaby/linux writable_limits

Jiri Slaby (10):
      rlimits: security, add task_struct to setrlimit
      rlimits: add task_struct to update_rlimit_cpu
      rlimits: split sys_setrlimit
      rlimits: allow setrlimit to non-current tasks
      rlimits: do security check under task_lock
      rlimits: add rlimit64 structure
      rlimits: redo do_setrlimit to more generic do_prlimit
      rlimits: switch more rlimit syscalls to do_prlimit
      rlimits: implement prlimit64 syscall
      unistd: add __NR_prlimit64 syscall numbers

Oleg Nesterov (2):
      rlimits: make sure ->rlim_max never grows in sys_setrlimit
      rlimits: selinux, do rlimits changes under task_lock

 arch/x86/ia32/ia32entry.S          |    1 +
 arch/x86/include/asm/unistd_32.h   |    3 +-
 arch/x86/include/asm/unistd_64.h   |    2 +
 arch/x86/kernel/syscall_table_32.S |    1 +
 include/asm-generic/unistd.h       |    4 +-
 include/linux/posix-timers.h       |    2 +-
 include/linux/resource.h           |    9 ++
 include/linux/security.h           |    9 +-
 include/linux/syscalls.h           |    4 +
 kernel/compat.c                    |   17 +---
 kernel/posix-cpu-timers.c          |    8 +-
 kernel/sys.c                       |  202 ++++++++++++++++++++++++++++--------
 security/capability.c              |    3 +-
 security/security.c                |    5 +-
 security/selinux/hooks.c           |   12 ++-
 ...
From: Linus Torvalds
Date: Tuesday, August 10, 2010 - 9:01 am

Ok, so the code looks fine, and I don't have any real objections any
more. I don't know how much use this will get, but it doesn't appear
to be "wrong" in any way. So I was going to pull it.

However, in the meantime we have commit 5360bd776f73 ("Fix up the
"generic" unistd.h ABI to be more useful") that clashes with it. Now,
the conflict is trivial to resolve, and I could do that easily - it's
not a technical problem. But that commit's code comments say

  + * Architectures may provide up to 16 syscalls of their own
  + * starting with this value.
  + */
  +#define __NR_arch_specific_syscall 244

and the new writable rlimits syscall is obviously 244.

Now, looking at it all, I think that commit was badly done - not
leaving any room for new generic system calls is pretty iffy. And if I
had happened to take the Tilera merge later, I'd have had no problems
with just changing it. As is, though, I want to check with Arnd and
Chris first.

Arnd, Chris - how about making the "arch-specific" system calls start
at 256 or something? Or even higher, like 512? Yes, it makes the
system call array bigger, but is that really a problem? Especially as
we start the "deprecated" system calls at 1024, it would seem to make
sense to raise it to 512, and leave the low numbers for the "regular"
system calls.

[ I'm leaving the quoted email for the edification of Chris/Arnd that
I added to the discussion ]

                                   Linus

--

From: Chris Metcalf
Date: Tuesday, August 10, 2010 - 9:21 am

Jiri and I actually discussed this back on July 20th on LKML when it
first conflicted in linux-next, and at the time he said he'd move
prlimit64 to 261 in <asm-generic/unistd.h>.  It looks like what actually
stuck in linux-next was different, however.  It's partly my fault for

In any case, obviously the larger question is how many
architecture-specific syscalls are appropriate, and where they should be
located in the syscall number space.  To be clear, the model for new
generic system calls is that they just continue on after the 16
architecture-specific ones, and in fact __NR_wait4 is already an example
of just this -- done that way to avoid making trouble for the "score"
architecture, since it was deprecated and then later un-deprecated.  So
new generic syscalls are not a problem.
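
The numbering model described above can be sketched as follows. This is
only an illustration: 244 and 16 are the values quoted earlier in the
thread, while the macro and function names are hypothetical and chosen
so they do not collide with the real __NR_* definitions.

```c
#include <stdbool.h>

/* Start of the arch-specific block, as quoted from commit 5360bd776f73. */
#define NR_ARCH_SPECIFIC_SYSCALL 244
/* "up to 16 syscalls of their own", per the same comment. */
#define NR_ARCH_SPECIFIC_COUNT   16
/* First number a generic syscall can take after the reserved block. */
#define NR_FIRST_GENERIC_AFTER_ARCH \
	(NR_ARCH_SPECIFIC_SYSCALL + NR_ARCH_SPECIFIC_COUNT)

/* True if `nr` falls inside the block reserved for the architecture. */
bool nr_is_arch_specific(int nr)
{
	return nr >= NR_ARCH_SPECIFIC_SYSCALL &&
	       nr < NR_FIRST_GENERIC_AFTER_ARCH;
}
```

With these values the reserved block is 244..259 and generic numbering
resumes at 260. That is consistent with prlimit64 landing at 261 if
wait4 occupies the first slot after the block, which is an assumption
here since the thread does not state wait4's number.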

There is definitely some tension between allowing architectures free
rein with their own set of unlimited additional syscalls on the one
hand, and having a contiguous and small array of syscalls on the other
hand.  I suspect it's slightly nicer to have a contiguous and small
array, as long as we've provided enough room for architectures to add
extra syscalls, but I'm not strongly married to this position.

For what it's worth, from Tilera's point of view we can certainly
tolerate changes in this area; we have not released any of this new
syscall ABI stuff to customers yet, so thrashing this just involves an

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com

--

From: Jiri Slaby
Date: Tuesday, August 10, 2010 - 9:43 am

I would have done that if the tree had reached Linus's tree earlier, so
that I could rebase my tree on top of it. Otherwise there was not much I
could do about it.

The resolving (merge) in -next is done by Stephen, so he probably
misunderstood us. (Oh, I could have a for-next branch where I would
merge your tree to solve the -next merging done by Stephen, but it
wouldn't solve the situation we got into now.)

thanks,
-- 
js
suse labs
--

From: Arnd Bergmann
Date: Tuesday, August 10, 2010 - 11:50 am

Right. The writable_rlimits syscall should just go after wait4 at 262.

In retrospect, it would have been nicer to have the architecture specific
syscalls start at zero, but it's too late for that. Since we don't have
an architecture with more than a handful of arch specific calls, I think
16 will get us a very long way, while trying to leave "enough" space
between the generic and the arch specific calls would result either
in wasting space in the table or choosing too small a value.
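
The space trade-off can be put into numbers with a small sketch. It
assumes a flat table with one function pointer per syscall number, and
the helper names are hypothetical.

```c
#include <stddef.h>

/*
 * Number of unused table slots between the last generic syscall in use
 * and the start of the arch-specific block. Returns 0 when the block
 * starts directly after the last generic number (or overlaps it).
 */
size_t table_gap_slots(size_t last_generic_nr, size_t arch_start_nr)
{
	if (arch_start_nr <= last_generic_nr + 1)
		return 0;
	return arch_start_nr - (last_generic_nr + 1);
}

/* Bytes spent on those empty slots, one function pointer per slot. */
size_t table_gap_bytes(size_t last_generic_nr, size_t arch_start_nr)
{
	return table_gap_slots(last_generic_nr, arch_start_nr) *
	       sizeof(void (*)(void));
}
```

Taking 243 as the last generic number for illustration, starting the
arch-specific block at 244 leaves no gap, while starting it at 512
leaves 268 empty slots, which is 2144 bytes with 8-byte pointers.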

	Arnd
--

From: Linus Torvalds
Date: Tuesday, August 10, 2010 - 12:12 pm

.. and in the meantime I added the notify tree too, so now the
x86(-64) numbers also clashed.

So I just moved the prlimit64() system call, both on x86[-64] and in
asm-generic/unistd.h

Pushed out. Guys, please verify that it looks ok.

                    Linus
--

From: Chris Metcalf
Date: Tuesday, August 10, 2010 - 12:43 pm

It looks good in asm-generic/unistd.h; thanks.

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com

--

From: Jiri Slaby
Date: Tuesday, August 10, 2010 - 2:44 pm

To me too, except the nits below.

---
 arch/x86/ia32/ia32entry.S          |    2 +-
 arch/x86/kernel/syscall_table_32.S |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 91dc4bb..b9472ec 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -844,5 +844,5 @@ ia32_sys_call_table:
        .quad compat_sys_recvmmsg
        .quad sys_fanotify_init
        .quad sys32_fanotify_mark
-       .quad sys_prlimit64
+       .quad sys_prlimit64                     /* 340 */
 ia32_syscall_end:
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index 4802acc..b35786d 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -339,4 +339,4 @@ ENTRY(sys_call_table)
        .long sys_recvmmsg
        .long sys_fanotify_init
        .long sys_fanotify_mark
-       .long sys_prlimit64
+       .long sys_prlimit64             /* 340 */
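
The table slot annotated above is what a raw syscall by number ends up
dispatching to. A minimal user-space sketch of reaching it without a
libc wrapper is shown below. It assumes SYS_prlimit64 is provided by
<sys/syscall.h> for the build architecture. struct krlimit64 and
read_limit_raw() are hypothetical names, with the struct mirroring the
two unsigned 64-bit fields of the kernel's struct rlimit64.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Same layout as the kernel's struct rlimit64 (illustrative name). */
struct krlimit64 {
	uint64_t rlim_cur;
	uint64_t rlim_max;
};

/*
 * Read one resource limit of `pid` (0 = calling process) through the
 * raw syscall number. Passing NULL as the new limit makes this a pure
 * query. Returns 0 on success, -1 on error with errno set.
 */
long read_limit_raw(pid_t pid, int resource, struct krlimit64 *out)
{
	return syscall(SYS_prlimit64, pid, resource, NULL, out);
}
```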

--

From: Arnd Bergmann
Date: Tuesday, August 10, 2010 - 7:39 pm

Looks good, thanks!

	Arnd
--

From: Linus Torvalds
Date: Tuesday, August 10, 2010 - 9:24 am

On Tue, Aug 10, 2010 at 9:01 AM, Linus Torvalds

I should have clarified that. The new asm-generic prlimit64 system
call was added at the end (as 244), not in general. Only tilera and
score use that "generic" unistd.h file currently, and score doesn't do
any other system calls, which is why it's really only arch/tile that
is affected by this. Of course, new architectures are likely to use
that model, but we don't care about those yet.

I still think that starting the arch-specific ones at 512 is likely
the right model. I just wanted to clarify in case somebody thought
that x86 put a new system call at 244.

                        Linus
--
