Re: [patch] PID namespace design bug, workaround

To: Linus Torvalds <torvalds@...>
Cc: Dave Hansen <haveblue@...>, Andrew Morton <akpm@...>, Pavel Emelyanov <xemul@...>, Ulrich Drepper <drepper@...>, <linux-kernel@...>, Dinakar Guniguntala [imap] <dino@...>, Sripathi Kodi <sripathik@...>
Date: Saturday, November 3, 2007 - 4:12 pm

* Linus Torvalds <torvalds@linux-foundation.org> wrote:


i see two main categories of problems:

- one problem is that this condition is 'invisible'. If two namespaces 
  happen to access the same robust futex (say a yum update from two 
  PID namespaces sharing the same read-mostly filesystem) there's silent
  breakage and data corruption due to PID overlap. The other
  namespaces have no such problems. I think the "don't do that" answer is
  lame because most apps _will_ work across PID namespaces because
  things like fcntl-based locking do work. And there's no valid
  technical excuse why futexes shouldn't work: it's all controlled by the
  same native kernel, there's no untrusted network separating the nodes,
  etc.

- so via this we isolate an important category of syscalls from
  cross-namespace use perhaps forever. Pick just about any other kernel
  resource and they can be shared between namespaces. But not futexes -
  which happen to be the most scalable locking primitive and people will
  almost certainly want to use them across namespaces. A
  completely new breed of futexes has to be introduced and trickled
  through userspace and all the architectures to make it work again
  across namespaces. Who will do that work? Generally the people who
  introduce a new concept are the ones who should do that. But in this
  case they are apparently not interested in making it generic enough
  (they are concentrating on their 'isolate it all' aspect) so
  nobody else will do it and we are stuck with an incomplete concept.
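
To make the PID overlap in the first point concrete, here is a toy model of
a robust futex word. It is only a sketch: the three bit masks are the ones
defined in <linux/futex.h>, but the helper functions and the two-namespace
scenario (both tasks numbered 42) are hypothetical and only model the
arithmetic, they do not call the kernel.

```python
# Bit layout of a robust futex word, as defined in <linux/futex.h>.
FUTEX_WAITERS = 0x80000000
FUTEX_OWNER_DIED = 0x40000000
FUTEX_TID_MASK = 0x3FFFFFFF


def lock_word(tid, waiters=False):
    """Value a locker stores in the shared word: its own TID plus flags."""
    word = tid & FUTEX_TID_MASK
    if waiters:
        word |= FUTEX_WAITERS
    return word


def owner_tid(word):
    """TID recorded in the futex word (0 means unlocked)."""
    return word & FUTEX_TID_MASK


def looks_like_mine(word, my_tid):
    """Hypothetical helper: the only check a task can make with its own TID."""
    return owner_tid(word) == my_tid


# Assumed scenario: task A (namespace 1) and task B (namespace 2) both see
# themselves as TID 42, because each PID namespace numbers independently.
tid_a_in_ns1 = 42
tid_b_in_ns2 = 42

shared_word = lock_word(tid_a_in_ns1)  # A takes the lock in shared memory

# B reads the same word and cannot tell that a different task owns it.
b_is_confused = looks_like_mine(shared_word, tid_b_in_ns2)
print(b_is_confused)
```

The word carries a bare number with no namespace attached, so any
owner check based on the TID alone gives the wrong answer once two
namespaces hand out the same number.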

The answer of user-space/apps is predictable: they'll gravitate towards 
the path of least resistance, and that will be "don't use futexes". PID 
namespaces basically single out an important API category and use the 
natural pressure of the other 300 syscalls and tens of thousands of apps 
against this category. Linux is basically used against itself. The 
counter-force is relatively weak and there's no solution available _at 
all_ presently so it's not even the fight of patches against each other, 
it's the sheer lack of a feature which has an obvious end-result.

We've already got way too many incomplete concepts and APIs in the 
kernel. Maybe i'm over-worrying, but i fear we'll end up as we did with 
capabilities or sendfile - code merged too soon and never completed for 
many years - perhaps never completed at all. VMS and WNT did those 
things a bit better i think - their API frameworks were/are pervasive 
and complete, even in the corner cases.

Whether it's the right approach to force reasonable perfection of 
frameworks like this from the get go is another question - but in 
practice even for relatively popular new APIs like epoll we see a way 
too slow movement towards the 'completion of the API', and that hinders 
adoption of new APIs very much. (With splice being a notable exception - 
there the central concept was so strong that it quickly pushed itself to 
total completion - combined with a capable maintainer of the API.) But 
it's not that easy for futexes and we put another roadblock in the path 
of futexes.

	Ingo