(changed the Subject line)
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
the difference to NFS is that for PID namespaces we do have a single
trusted kernel that fully controls all the domains so there's no obvious
"hard barrier of trust" that people could perceive as a showstopper.
We've got a global kernel and unlike other namespaces there's (almost)
no "directed allocation" done of specific PIDs (unlike files, socket
addresses or fds). So the PID is a cookie that is 99.9% shaped _by the
kernel already_. [there are a few exceptions but those are much less
problematic than the lack of global PIDs is] So we might as well shape
the cookies in a way that keeps them global. What is the technological
reason for not keeping PIDs globally unique? We've cited a good number
of reasons why it's desirable - it's a pretty damn useful cookie for
identifying tasks. (it's also very scalable - PID -> task lookup is
completely lockless.)
I.e. keep the namespace functionality but use a modulo 1.000.000 base
for the PIDs so that it all looks nicer to the user. Minimal visibility
difference but maximum compatibility. (The resulting limits are
reasonable: 1 million tasks per container and 4 million containers on a
single 32-bit box.) We could still restrict cross-namespace API use but
all the cases where a global PID is desirable would still all work. I
might be missing something obvious though.
The reason why i bring this up now is because 2.6.24 is an
all-or-nothing flag day for this detail. Once it's out there we wont
realistically be able to change any of these details. (And in general
i'm very supportive of the containers concept - a year ago at the KS i
was one of the very few proponents of quickly merging containers into
the kernel.)
Ingo
-