Re: [PATCH 0/4] futex: get_user_pages_fast() for shared futexes

Previous thread: [GIT PULL] ALSA fixes by Takashi Iwai on Friday, September 26, 2008 - 9:13 am. (2 messages)

Next thread: [PATCH 1/4] futex: rely on get_user_pages() for shared futexes by Peter Zijlstra on Friday, September 26, 2008 - 10:32 am. (1 message)
From: Peter Zijlstra
Date: Friday, September 26, 2008 - 10:32 am

Since get_user_pages_fast() made it in, I thought to give this another try.
Lightly tested by disabling the private futexes and running some java proglets.


--

From: Ingo Molnar
Date: Saturday, September 27, 2008 - 9:17 am

hm, very interesting. Since this is an important futex usecase i started 
testing it in tip/core/futexes:

 cd33272: futex: cleanup fshared
 a135356: futex: use fast_gup()
 39ce77b: futex: reduce mmap_sem usage
 0d7a336: futex: rely on get_user_pages() for shared futexes

Nick, it would be nice to get an Acked-by/Reviewed-by from you, before 
we think about whether it should go upstream.

	Ingo
--

From: Nick Piggin
Date: Tuesday, September 30, 2008 - 12:21 am

Yeah, these all look pretty good. It's nice to get rid of mmap sem here.

Which reminds me, we need to put a might_lock mmap_sem into
get_user_pages_fast...

But these patches look good to me (last time we discussed them I thought
there was a race with page truncate, but it looks like you've closed that
by holding page lock over the whole operation...)

Nice work, Peter.

BTW. what kinds of things use inter-process futexes as of now?
--

From: Peter Zijlstra
Date: Tuesday, September 30, 2008 - 1:51 am

Just to be sure, I only hold the page lock over the get_futex_key() op,
and drop it after getting a ref on the futex key.

I then drop the futex key ref after the futex op is complete.

This assumes the futex key ref is suffucient to guarantee whatever is
needed - which is the point I'm still not quite sure about myself.

The futex key ref was used between futex ops, with I assume the intent
to ensure the futex backing stays valid. However, the key ref only takes
a ref on either the inode or the mm, neither which avoid the specific
address of the futex to get unmapped between ops.

So in that respect we're not worse off than before, and any application
doing: futex_wait(), munmap(), futex_wake() is going to suffer. And as
far as I understand it get the waiting task stuck in D state for
ever-more or somesuch.

By now not holding the mmap_sem over the full futex op, but only over
the get_futex_key(), that munmap() race gets larger and the actual futex
could disappear while we're working on it, but in all cases I looked at
that will make the futex op return -EFAULT, so we should be good there.

Gah, now that I look at it, it looks like I made get_futex_key()
asymetric wrt private futexes, they don't take a ref on the key, but


On a regular modern Linux system, not much. But I've been told there are
applications out there that do indeed make heavy use of them - as
they're part of POSIX etc.. blah blah :-)

Also some legacy stuff that's stuck on an ancient glibc (but somehow did
manage to upgrade the kernel) might benefit.


---
Subject: futex: fixup get_futex_key() for private futexes
From: Peter Zijlstra <a.p.zijlstra@chello.nl>

With the get_user_pages_fast() patches we made get_futex_key() obtain a
reference on the returned key, but failed to do so for private futexes.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/kernel/futex.c b/kernel/futex.c
index 197fdab..beee9af 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -227,6 ...
From: Ingo Molnar
Date: Tuesday, September 30, 2008 - 3:39 am

applied to tip/core/futexes, thanks Peter!

	Ingo
--

From: Nick Piggin
Date: Tuesday, September 30, 2008 - 3:42 am

It is enough to guarantee enough to function normally I guess. Actually
futex syscalls are interesting in that they logically perform 2 totally
different operations and actually want 2 keys but it manages to mash
them into one (the user address).

They need a futex identifier, on which to wait/wake/etc. For anonymous
futexes, this is basically an arbitrary number which is taken from the
user address (but note that if that address is subsequently mremap()ed
for example, then the futex identifier does not follow the address). For
shared futexes, the pagecache address is used for the futex, which is
derived from the address.

Then they also need a memory address which is the target of the
particular operation requested. This doesn't actually logically have to
be directly related to the futex identifier...

I guess for all practical purposes, this is fine. "if you mremap a live
mutex, you lose" and similar probably doesn't constrain too many people
(although I don't think any of that is actually documented). But anyway,
just to point out that we do already have constraints on how existing

Yeah, I think you're fine there (the sleep is interruptible, so it should
not be a DoS or anything). Just so long as you always pin things like the
inode correctly while taking a ref on it...

--

From: Eric Dumazet
Date: Tuesday, September 30, 2008 - 3:55 am

inter-process futexes are still used for pthread creation/join 
(aka clear_child_tid / CLONE_CHILD_CLEARTID)

kernel/fork.c, functions mm_release() & sys_set_tid_address()

I am not sure how it could be converted to private futexes, since

Sorry I am lost...




--

From: Peter Zijlstra
Date: Tuesday, September 30, 2008 - 4:16 am

Ah, right - its a NOP, that's why it didn't show up in testing.

The thing is, I changed the semantics of get_futex_key() to return a key
with reference taken. And then noticed I didn't take one in the private
futex path, and failed to notice the ref ops are nops for private
futexes.

So yeah, the below patch is basically a NOP, but we might consider


--

From: Ulrich Drepper
Date: Tuesday, September 30, 2008 - 8:13 pm

We considered this back when but any effort seems too much.  We'd
either need a clone flag (a scarce resource) or replace the set_tid
_address syscall.  Given that the futex is woken once per thread
lifetime it shouldn't be an issue.  If the semaphore shows up even
after this patch feel free to introduce a new set_tid_address syscall.
--

From: Ingo Molnar
Date: Tuesday, September 30, 2008 - 3:39 am

great - i've added your Acked-by to the patches and have activated the 
tip/core/futexes branch for tip/auto-core-next linux-next integration.

here are the commits:

 42569c3: futex: fixup get_futex_key() for private futexes
 c2f9f20: futex: cleanup fshared
 734b05b: futex: use fast_gup()
 6127070: futex: reduce mmap_sem usage
 38d47c1: futex: rely on get_user_pages() for shared futexes

Thanks,

	Ingo
--

Previous thread: [GIT PULL] ALSA fixes by Takashi Iwai on Friday, September 26, 2008 - 9:13 am. (2 messages)

Next thread: [PATCH 1/4] futex: rely on get_user_pages() for shared futexes by Peter Zijlstra on Friday, September 26, 2008 - 10:32 am. (1 message)