Linux: Native POSIX Thread Library (NPTL)

Submitted by Jeremy on September 20, 2002 - 4:45pm.
Linux news

Ulrich Drepper recently announced the first public release of the Red Hat-sponsored "Native POSIX Thread Library" (NPTL). He explained, "Unless major flaws in the design are found this code is intended to become the standard POSIX thread library on Linux systems and it will be included in the GNU C library distribution."

One test mentioned in Ulrich's email, running 100,000 concurrent threads on IA-32, generated some interesting discussion. Ingo Molnar explained that with the current stock 2.5 kernel such a test requires roughly 1 GB of RAM, and that starting and stopping all 100,000 threads in parallel takes less than 2 seconds. In comparison, with the 2.5.31 kernel (prior to Ingo's recent threading work), the same test would have taken around 15 minutes.

Ingo provides further details:

"With the default split and kernel stack we can start up 94,000 threads on x86. With Ben's/Dave's patch we can have up to 188,000 threads. With a 2:2 GB VM split configured we can start 376,000 threads. If someone's that desperate then with a 1:3 split we can start up 564,000 threads."

As for the logical follow-up question of why anyone would want so many threads, Ingo replies: "the answer is because we can :)". Much of the discussion follows, and it is well worth the time it takes to read.


From: Ulrich Drepper
To: linux-kernel mailing list
Subject: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Thu, 19 Sep 2002 17:41:37 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

We are pleased to announce the first publicly available source
release of a new POSIX thread library for Linux. As part of the
continuous effort to improve Linux's capabilities as a client, server,
and computing platform Red Hat sponsored the development of this
completely new implementation of a POSIX thread library, called Native
POSIX Thread Library, NPTL.

Unless major flaws in the design are found this code is intended to
become the standard POSIX thread library on Linux systems and it will
be included in the GNU C library distribution.

The work visible here is the result of close collaboration of kernel
and runtime developers. The collaboration proceeded by developing the
kernel changes while writing the appropriate parts of the thread
library. Whenever something couldn't be implemented optimally some
interface was changed to eliminate the issue. The result is this
thread library which is, unlike previous attempts, a very thin layer
on top of the kernel. This helps to achieve a maximum of performance
for a minimal price.


A white paper (still in its draft stage, though) describing the design
is available at

http://people.redhat.com/drepper/nptl-design.pdf

It provides a larger number of details on the design and insight into
the design process. At this point we want to repeat only a few
important points:

- - the new library is based on a 1-on-1 model. Earlier design
documents stated that an M-on-N implementation was necessary to
support a scalable thread library. This was especially true for
the IA-32 and x86-64 platforms since the ABI with respect to threads
forces the use of segment registers and the only way to use those
registers was with the Local Descriptor Table (LDT) data structure
of the processor.

The kernel limitations the earlier designs were based on have been
eliminated as part of this project, opening the road to a 1-on-1
implementation which has many advantages such as

+ less complex implementation;
+ avoidance of two-level scheduling, enabling the kernel to make all
scheduling decisions;
+ direct interaction between kernel and user-level code (e.g., when
delivering signals);
+ and more and more.

It is not generally accepted that a 1-on-1 model is superior but our
tests showed the viability of this approach and by comparing it with
the overhead added by existing M-on-N implementations we became
convinced that 1-on-1 is the right approach.

Initial confirmations were test runs with huge numbers of threads.
Even on IA-32 with its limited address space and memory handling
running 100,000 concurrent threads was no problem at all, creating
and destroying the threads did not take more than two seconds. This
all was made possible by the kernel work performed as part of this
project.

The only limiting factors on the number of threads today are
resource availability (RAM and processor resources) and architecture
limitations. Since every thread needs at least a stack and data
structures describing the thread the number is capped. On 64-bit
machines the architecture does not add any limitations anymore (at
least for the moment) and with enough resources the number of
threads can be grown arbitrarily.

This does not mean that using hundreds of thousands of threads is a
desirable design for the majority of applications. At least not
unless the number of processors matches the number of threads. But
it is important to note that the design of the library does not have
a fixed limit.

The kernel work to optimize for a high thread count is still
ongoing. Some places in which the kernel iterates over process and
threads remain and other places need to be cleaned up. But it has
already been shown that given sufficient resources and a reasonable
architecture an order of magnitude more threads can be created than
in our tests on IA-32.


- - The futex system call is used extensively in all synchronization
primitives and other places which need some kind of
synchronization. The futex mechanism is generic enough to support
the standard POSIX synchronization mechanisms with very little
effort.

The fact that this is possible is also essential for the selection
of the 1-on-1 model since only with the kernel seeing all the
waiters and knowing that they are blocked for synchronization
purposes will allow the scheduler to make decisions as good as a
thread library would be able to in an M-on-N model implementation.

Futexes also allow the implementation of inter-process
synchronization primitives, a sorely missed feature in the old
LinuxThreads implementation (Hi jbj!).


- - Substantial effort went into making the thread creation and
destruction as fast as possible. Extensions to the clone(2) system
call were introduced to eliminate the need for a helper thread in
either creation or destruction. The exit process in the kernel was
optimized (previously not a high priority). The library itself
optimizes the memory allocation so that in many cases the creation
of a new thread can be achieved with one single system call.

On an old IA-32 dual 450MHz PII Xeon system 100,000 threads can be
created and destroyed in 2.3 secs (with up to 50 threads running at
any one time).


- - Programs indirectly linked against the thread library had problems
with the old implementation because of the way symbols are looked
up. This should not be a problem anymore.
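The throttled creation test mentioned above (100,000 threads in total, at most 50 alive at any one time) can be sketched as follows. This is a minimal illustration in Python, not the benchmark Ulrich ran: the function name `run_throttled` and its parameters are made up for this example, and Python's interpreter overhead means the timings are not comparable to the 2.3 seconds quoted. On Linux, CPython creates each `threading.Thread` through the system's pthread library, so every thread here is still a 1-on-1 kernel thread.

```python
import threading
import time


def run_throttled(total, max_live=50):
    """Start `total` short-lived threads, never more than `max_live` at once.

    Returns (completed, peak_live, elapsed_seconds).
    """
    slots = threading.BoundedSemaphore(max_live)  # caps the number of live threads
    lock = threading.Lock()
    stats = {"live": 0, "peak": 0, "done": 0}

    def worker():
        with lock:
            stats["live"] += 1
            stats["peak"] = max(stats["peak"], stats["live"])
        # The benchmark thread has no work to do; it exits right away.
        with lock:
            stats["live"] -= 1
            stats["done"] += 1
        slots.release()  # hand the slot to the next thread to be created

    start = time.perf_counter()
    threads = []
    for _ in range(total):
        slots.acquire()  # the creator blocks while max_live threads exist
        t = threading.Thread(target=worker)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return stats["done"], stats["peak"], elapsed
```

Calling `run_throttled(10_000)` returns the number of threads that ran, the highest number that were alive together, and the wall-clock time. The semaphore is what turns this into a creation-speed test rather than a concurrency test, which is the distinction Linus and Ingo draw later in the thread.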


The thread library is designed to be binary compatible with the old
LinuxThreads implementation. This compatibility obviously has some
limitations. In places where the LinuxThreads implementation diverged
from the POSIX standard incompatibilities exist. Users of the old
library have been warned from day one that this day will come and code
which added work-arounds for the POSIX non-compliance better be
prepared to remove that code. The visible changes of the library
include:


- - The signal handling changes from per-thread signal handling to the
POSIX process signal handling. This change will require changes in
programs which exploit the non-conformance of the old implementation.

One consequence of this is that SIGSTOP works on the process. Job
control in the shell and stopping the whole process in a debugger
work now.

- - getpid() now returns the same value in all threads

- - the exec functions are implemented correctly: the exec'ed process gets
the PID of the process. The parent of the multi-threaded application
is only notified when the exec'ed process terminates.

- - thread handlers registered with pthread_atfork are not anymore run
if vfork is used. This isn't required by the standard (which does
not define vfork) and all which is allowed in the child is calling
exit() or an exec function. A user of vfork better knows what s/he
does.

- - libpthread should now be much more resistant to linking problems: even
if the application doesn't list libpthread as a direct dependency
functions which are extended by libpthread should work correctly.

- - no manager thread

- - inter-process mutex, read-write lock, condition variable, and barrier
implementations are available

- - the pthread_kill_other_threads_np function is not available. It was
needed to work around the broken signal handling. If somebody shows
some existing code which makes legitimate use of this function we
might add it back.

- - requires a kernel with the threading capabilities of Linux 2.5.36.
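The getpid() change in the list above is easy to observe. The sketch below is an illustration only: `collect_ids` is a name invented for this example, and it relies on Python's `threading.get_native_id()` (available since Python 3.8) to obtain the kernel's per-thread ID. With POSIX process semantics every thread reports the same PID while each keeps its own kernel thread ID.

```python
import os
import threading


def collect_ids(n_threads=4):
    """Return one (pid, native_thread_id) pair per thread."""
    results = []
    lock = threading.Lock()
    # The barrier keeps all threads alive at the same moment, so the
    # kernel cannot hand a finished thread's ID to a later one.
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()
        with lock:
            results.append((os.getpid(), threading.get_native_id()))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

On a system with POSIX-conforming threads, the set of PIDs in the result has exactly one element and the set of native thread IDs has `n_threads` elements. Under the old LinuxThreads behaviour described in this announcement, the PIDs would have differed from thread to thread.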



The sources for the new library are for the time being available at

ftp://people.redhat.com/drepper/nptl/

The current sources contain support only for IA-32 but this will
change very quickly. The thread library is built as part of glibc so
the complete set of glibc sources is available as well. The current
snapshot for glibc 2.3 (or glibc 2.3 when released) is necessary. You
can find it at

ftp://sources.redhat.com/pub/glibc/snapshots

Final releases will be available on ftp.gnu.org and its mirrors.


Building glibc with the new thread library is demanding on the
compilation environment.

- - The 2.5.36 kernel or above must be installed and used. To compile
glibc it is necessary to create the symbolic link

/lib/modules/$(uname -r)/build

to point to the build directory.

- - The general compiler requirement for glibc is at least gcc 3.2. For
the new thread code it is even necessary to have working support for
the __thread keyword.

Similarly, binutils with functioning TLS support are needed.

The (Null) beta release of the upcoming Red Hat Linux product is
known to have the necessary tools available after updating from the
latest binaries on the FTP site. This is no ploy to force everybody
to use Red Hat Linux, it's just the only environment known to date
which works. If alternatives are known they can be announced on the
mailing list.

- - To configure glibc it is necessary to run in the build directory
(which always should be separate from the source directory):

/path/to/glibc/configure --prefix=/usr --enable-add-ons=linuxthreads2 \
    --enable-kernel=current --with-tls

The --enable-kernel parameter requires that the 2.5.36+ kernel is
running. It is not strictly necessary but helps to avoid mistakes.
It might also be a good idea to add --disable-profile, just to speed
up the compilation.

When configured as above the library must not be installed since it
would overwrite the system's library. If you want to install the
resulting library choose a different --prefix parameter value.
Otherwise the new code can be used without installation. Running
existing binaries is possible with

elf/ld.so --library-path .:linuxthreads2:dlfcn:math ...

Alternatively the binary could be built to find the dynamic linker
and DSO by itself. This is a much easier way to debug the code
since gdb can start the binary. Compiling is a bit more complicated
in this case:

gcc -nostdlib -nostartfiles -o csu/crt1.o csu/crti.o \
    $(gcc --print-file-name=crtbegin.o) \
    -Wl,-rpath,$PWD,-dynamic-linker,$PWD/ld-linux.so.2 \
    linuxthreads2/libpthread.so.0 ./libc.so.6 ./libc_nonshared.a \
    elf/ld-linux.so.2 $(gcc --print-file-name=crtend.o) csu/crtn.o

This command assumes that it is run in the build directory. Correct
the paths if necessary. The compilation will use the system's
headers which is a good test but might lead to strange effects if
there are compatibility bugs left.


Once all these prerequisites are met compiling glibc should be easy.
But there are some tests which will flunk. For good reasons we aren't
officially releasing the code yet. The bugs are either in the TLS
code which is not enabled in the standard glibc build, or obviously in
the thread library itself. To run the tests for the thread library
run

make subdirs=linuxthreads2 check

One word on the name 'linuxthreads2' of the directory. This is only a
convenience thing so that the glibc configure scripts don't complain
about missing thread support. It will be changed to reflect the real
name of the library ASAP.


What can you expect?

This is a very early version of the code so the obvious answer is:
some problems. The test suite for the new thread code should pass but
beside that and some performance measurement tool we haven't run much
code. Ideally we would get people to write many more of these small
test programs which are included in the sources. Compiling big
programs would mean not being able to locate problems easily. But I
certainly won't object to people running and debugging bigger
applications. Please report successes and failures to the mailing
list.

People who are interested in contributing must be aware that for any
non-trivial change we need an assignment of the code to the FSF. The
process is unfortunately necessary in today's world.

People who are contaminated by having worked on proprietary thread
library implementations should not participate in discussions on the
mailing list unless they willfully disclose the information. Every
bit of information is publicly available from the mailing list
archive.


Which brings us to the final point: the mailing list for *all*
discussions related to this thread library implementation is

phil-list AT redhat.com

Go to

https://listman.redhat.com/mailman/listinfo/phil-list

to subscribe, unsubscribe, or review the archive.

- --
- ---------------. ,-. 1325 Chesapeake Terrace
Ulrich Drepper ,-------------------' Sunnyvale, CA 94089 USA
Red Hat `--' drepper at redhat.com `------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE9im7E2ijCOnn/RHQRApe9AKCN20A8A5ITi3DUq+3IRZ0gsSVHTQCeKqEu
fA5OFtNuzYqltxSMoL8Ambw=
=4pb4
-----END PGP SIGNATURE-----



From: Rik van Riel
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Thu, 19 Sep 2002 23:01:33 -0300 (BRT)

On Thu, 19 Sep 2002, Ulrich Drepper wrote:

> Initial confirmations were test runs with huge numbers of threads.
> Even on IA-32 with its limited address space and memory handling
> running 100,000 concurrent threads was no problem at all,

So, where did you put those 800 MB of kernel stacks needed for
100,000 threads ?

If you used the standard 3:1 user/kernel split you'd be using
all of ZONE_NORMAL for kernel stacks, but if you use a 2:2 split
you'll end up with a lot less user space (bad if you want to
have many threads in the same address space).

Do you have some special solutions up your sleeve or is this
in the category of as-of-yet unsolved problems ?

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: september@surriel.com trac@trac.org



From: Larry McVoy
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Thu, 19 Sep 2002 19:17:39 -0700

On Thu, Sep 19, 2002 at 11:01:33PM -0300, Rik van Riel wrote:
> On Thu, 19 Sep 2002, Ulrich Drepper wrote:
>
> > Initial confirmations were test runs with huge numbers of threads.
> > Even on IA-32 with its limited address space and memory handling
> > running 100,000 concurrent threads was no problem at all,
>
> So, where did you put those 800 MB of kernel stacks needed for
> 100,000 threads ?

Come on, you and I normally agree, but 100,000 threads? Where is the need
for that? More importantly, is there any realistic application that can
use 100,000 threads where the kernel stack is 0 but the user level stack
doesn't have exactly the same problem? The kernel can be perfect, i.e.,
cost zero, and you still have a problem.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm



From: Rik van Riel
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Thu, 19 Sep 2002 23:24:58 -0300 (BRT)

On Thu, 19 Sep 2002, Larry McVoy wrote:
> On Thu, Sep 19, 2002 at 11:01:33PM -0300, Rik van Riel wrote:

> > So, where did you put those 800 MB of kernel stacks needed for
> > 100,000 threads ?
>
> Come on, you and I normally agree, but 100,000 threads? Where is the
> need for that?

I agree, it's pretty silly. But still, I was curious how they
managed to achieve it ;)

OTOH, some applications are known for silliness ...

cheers,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/



From: Ulrich Drepper
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Thu, 19 Sep 2002 19:32:20 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Rik van Riel wrote:

> I agree, it's pretty silly. But still, I was curious how they
> managed to achieve it ;)

Ingo will be able to tell you when he gets up. This is not my area of
expertise. AFAIK there were no special changes involved; Ben's irq
stack patch would add to this number (I think Ingo said something about
188,000 threads or so).

- --
- ---------------. ,-. 1325 Chesapeake Terrace
Ulrich Drepper ,-------------------' Sunnyvale, CA 94089 USA
Red Hat `--' drepper at redhat.com `------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE9ioi02ijCOnn/RHQRAnw+AJ9fFu36D8ZIk2Y3NC8Rpekb5EXwPwCePCBL
Z/u1XIdgB2F/UuixLkIpNvI=
=Ldzx
-----END PGP SIGNATURE-----



From: Linus Torvalds
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Fri, 20 Sep 2002 06:01:47 +0000 (UTC)

Rik van Riel wrote:

>I agree, it's pretty silly. But still, I was curious how they
>managed to achieve it ;)

You didn't read the post carefully.

They started and waited for 100,000 threads.

They did not have them all running at the same time. I think the
original post said something like "up to 50 at a time".

Basically, the benchmark was how _fast_ thread creation is, not how many
you can run at the same time. 100k threads at once is crazy, but you can
do it now on 64-bit architectures if you really want to.

Linus



From: Ingo Molnar
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Fri, 20 Sep 2002 10:02:37 +0200 (CEST)

On Fri, 20 Sep 2002, Linus Torvalds wrote:

> They did not have them all running at the same time. I think the
> original post said something like "up to 50 at a time".

actually, that was Ulrich's other test, which tests the serial starting of
100,000 threads.

the test i did started up 100,000 concurrent threads which shot up the
load-average to a couple of thousands. [the default timeslice the parent
has is enough to start more than 50,000 parallel threads a pop or so.]

> Basically, the benchmark was how _fast_ thread creation is, not how many
> you can run at the same time. 100k threads at once is crazy, but you can
> do it now on 64-bit architectures if you really want to.

we did both, and on the dual-P4 testbox i have started and stopped 100,000
*parallel* threads in less than 2 seconds. Ie. starting up 100,000 threads
without any throttling, waiting for all of them to start up, then killing
them all. It needs roughly 1 GB of RAM to do this test on the default x86
kernel, it needs roughly 500 MB of RAM to do this test with the IRQ-stacks
patch applied.

with 2.5.31 this test would have taken roughly 15 minutes, on the same
box, provided the NMI watchdog is turned off.

with 100,000 threads started up and idling silently the system is
completely usable - all the critical for_each_task loops have been fixed.
Obviously with 100,000 threads running at once there's some shortage in
CPU power :-) [ I will perhaps try that once, at SCHED_BATCH priority,
just for kicks. Not that it makes much sense - they will get a 3 seconds
worth of timeslice every 3 days. ]

Ingo



From: Ingo Molnar
Subject: Re: 100,000 threads? [was: [ANNOUNCE] Native POSIX Thread Library 0.1]
Date: Fri, 20 Sep 2002 09:52:39 +0200 (CEST)

On Thu, 19 Sep 2002, Rik van Riel wrote:

> So, where did you put those 800 MB of kernel stacks needed for 100,000
> threads ?

With the default split and kernel stack we can start up 94,000 threads on
x86. With Ben's/Dave's patch we can have up to 188,000 threads. With a 2:2
GB VM split configured we can start 376,000 threads. If someone's that
desperate then with a 1:3 split we can start up 564,000 threads.

Anton tested 1 million concurrent threads on one of his bigger PowerPC
boxes, which started up in around 30 seconds. I think he saw a load
average of around 200 thousand. [ie. the runqueue was probably a few
hundred thousand entries long at times.]

> If you used the standard 3:1 user/kernel split you'd be using all of
> ZONE_NORMAL for kernel stacks, but if you use a 2:2 split you'll end up
> with a lot less user space (bad if you want to have many threads in the
> same address space).

the extreme high-end of threading typically uses very controlled
applications and very small user level stacks.

as to the question of why so many threads, the answer is because we can :)
This, besides demonstrating some of the recent scalability advances, gives
us the warm fuzzy feeling that things are right in this area. I mean,
there are architectures where Linux could map a petabyte of RAM just fine,
even though that might not be something we desperately need today.

Ingo
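The figures in this exchange can be checked with a little arithmetic. The sketch below assumes an 8 KiB kernel stack per thread, which is what Rik's 800 MB for 100,000 threads implies, and reads Ingo's limits as scaling linearly with the kernel's share of the address space and inversely with the kernel stack size. The function names, the 4 KiB stack attributed to Ben's/Dave's patch, and the reading of the 2:2 and 1:3 splits as 2 GB and 3 GB of kernel address space are assumptions made for illustration; they are not taken from the kernel sources.

```python
KIB = 1024
MIB = 1024 * KIB


def kernel_stack_mib(threads, stack_kib=8):
    """Memory consumed by kernel stacks alone, in MiB."""
    return threads * stack_kib * KIB / MIB


def scaled_limit(base_threads, stack_shrink=1, kernel_space_factor=1):
    """Scale a thread limit by a stack-size reduction and a larger kernel space.

    stack_shrink=2 means stacks half as large; kernel_space_factor=2
    means twice as much kernel address space as the default split.
    """
    return base_threads * stack_shrink * kernel_space_factor


# Rik's question: 100,000 threads x 8 KiB of kernel stack each.
print(kernel_stack_mib(100_000))                # 781.25 MiB, roughly "800 MB"

# Ingo's limits, starting from 94,000 threads on the default kernel.
print(scaled_limit(94_000))                     # default split and stack
print(scaled_limit(94_000, stack_shrink=2))     # with the smaller stacks
print(scaled_limit(94_000, 2, 2))               # plus a 2:2 split
print(scaled_limit(94_000, 2, 3))               # plus a 1:3 split
```

Under these assumptions the four limits come out as 94,000, 188,000, 376,000 and 564,000, matching the numbers Ingo quotes. The roughly 1 GB and 500 MB of RAM he reports for the 100,000-thread run are larger than the stack figures alone because each thread also needs its other kernel data structures and a user-level stack.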




From: Adrian Bunk
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Fri, 20 Sep 2002 11:54:20 +0200 (CEST)

On Thu, 19 Sep 2002, Ulrich Drepper wrote:

>...
> Unless major flaws in the design are found this code is intended to
> become the standard POSIX thread library on Linux systems and it will
> be included in the GNU C library distribution.
>...
> - - requires a kernel with the threading capabilities of Linux 2.5.36.
>...


My personal estimation is that Debian will support kernel 2.4 in its
stable distribution until 2006 or 2007 (this is based on the experience
that Debian usually supports two stable kernel series and the time between
stable releases of Debian is > 1 year). What is the proposed way for
distributions to deal with this?


cu
Adrian
--

You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
Alan Cox


From: Ingo Molnar
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Fri, 20 Sep 2002 12:53:49 +0200 (CEST)

On Fri, 20 Sep 2002, Adrian Bunk wrote:

> > - - requires a kernel with the threading capabilities of Linux 2.5.36.
> >...
>
> My personal estimation is that Debian will support kernel 2.4 in its
> stable distribution until 2006 or 2007 (this is based on the experience
> that Debian usually supports two stable kernel series and the time
> between stable releases of Debian is > 1 year). What is the proposed way
> for distributions to deal with this?

Ulrich will give a fuller reply i guess, but the new threading code in 2.5
does not disable (or in any way obsolete) the old glibc threading library.
So by doing boot-time kernel version checks glibc can decide whether it
wants to provide the new library or the old library.

Ingo
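The version check Ingo describes could look roughly like the sketch below. It only illustrates the idea of choosing a thread library from the running kernel's release string; how glibc would actually make this decision is not described in the thread, and the names `parse_release`, `pick_thread_library` and `NPTL_MIN_KERNEL` are hypothetical. The 2.5.36 threshold is the requirement stated in the announcement.

```python
import platform
import re

# Minimum kernel named in the announcement ("Linux 2.5.36").
NPTL_MIN_KERNEL = (2, 5, 36)


def parse_release(release):
    """Turn a release string such as '2.5.36' or '2.6.0-test1' into a tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (for example '6.1') is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def pick_thread_library(release=None):
    """Return 'nptl' on a new enough kernel, otherwise 'linuxthreads'."""
    if release is None:
        release = platform.release()  # same string that `uname -r` prints
    if parse_release(release) >= NPTL_MIN_KERNEL:
        return "nptl"
    return "linuxthreads"
```

For example, `pick_thread_library("2.4.19")` selects the old library and `pick_thread_library("2.5.36")` selects the new one, so a distribution shipping both could keep supporting 2.4 kernels.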




From: Bill Huey (Hui)
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Fri, 20 Sep 2002 03:20:31 -0700

On Thu, Sep 19, 2002 at 05:41:37PM -0700, Ulrich Drepper wrote:
> It is not generally accepted that a 1-on-1 model is superior but our
> tests showed the viability of this approach and by comparing it with
> the overhead added by existing M-on-N implementations we became
> convinced that 1-on-1 is the right approach.

Maybe not but...

You might like to try a context switching/thread wakeup performance
measurement against FreeBSD's libc_r. I'd imagine that it's difficult
to beat a system like that since they keep all of that stuff in
userspace since it's just 2 context switches and a call to their
thread-kernel.

I'm curious as to the rough numbers you got doing the 1:1 and M:N
comparison.

bill



From: Ingo Molnar
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Fri, 20 Sep 2002 12:47:12 +0200 (CEST)

On Fri, 20 Sep 2002, Bill Huey wrote:

> You might like to try a context switching/thread wakeup performance
> measurement against FreeBSD's libc_r. I'd imagine that it's difficult to
> beat a system like that since they keep all of that stuff in userspace
> since it's just 2 context switches and a call to their thread-kernel.

our kernel thread context switch latency is below 1 usec on a typical P4
box, so our NPT library should compare pretty favorably even in such
benchmarks. We get from the pthread_create() call to the first user
instruction of the specified thread-function code in less than 2 usecs,
and we get from pthread_exit() to the thread that does the pthread_join()
in less than 2 usecs as well - all of these operations are done via a
single system-call and a single context switch.

also consider the fact that the true cost of M:N threading does not show
up with just one or two threads running. The true cost comes when
thousands of threads are running, each of them doing nontrivial work that
matters, ie. IO. The true cost of M:N shows up when threading is actually
used for what it's intended to be used :-) And basically nothing offloads
work to threads for them to just do userspace synchronization - real,
useful work always involves some sort of IO and kernel calls. At which
point M:N loses out badly.

M:N's big mistake is that it concentrates on what matters the least:
user<->user context switches. Nothing really wants to do that. And if it
does, it's contended on some userspace locking object, at which point it
doesn't really matter whether the cost of switching is 1 usec or 0.5 usecs,
the main application cost is the lost parallelism and increased cache
thrashing due to the serialization - independently of what kind of
threading abstraction is used.

and since our NPT library uses futexes for *all* userspace synchronization
primitives (including internal glibc locks), all uncontended
synchronization is done purely in user-space. [and for the contended case
we *want* to switch into the kernel.]

Ingo



From: Bill Huey (Hui)
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Fri, 20 Sep 2002 05:06:06 -0700

On Fri, Sep 20, 2002 at 12:47:12PM +0200, Ingo Molnar wrote:
> our kernel thread context switch latency is below 1 usec on a typical P4
> box, so our NPT library should compare pretty favorably even in such
> benchmarks. We get from the pthread_create() call to the first user
> instruction of the specified thread-function code in less than 2 usecs,
> and we get from pthread_exit() to the thread that does the pthread_join()
> in less than 2 usecs as well - all of these operations are done via a
> single system-call and a single context switch.

That's outstanding...

> also consider the fact that the true cost of M:N threading does not show
> up with just one or two threads running. The true cost comes when
> thousands of threads are running, each of them doing nontrivial work that
> matters, ie. IO. The true cost of M:N shows up when threading is actually
> used for what it's intended to be used :-) And basically nothing offloads
> work to threads for them to just do userspace synchronization - real,
> useful work always involves some sort of IO and kernel calls. At which
> point M:N loses out badly.

It can. Certainly, if IO upcall overhead is greater than just running the
thread that's blocked inside the kernel, then yes. Not sure how this is all
going to play out...

> M:N's big mistake is that it concentrates on what matters the least:
> user<->user context switches. Nothing really wants to do that. And if it
> does, it's contended on some userspace locking object, at which point it
> doesn't really matter whether the cost of switching is 1 usec or 0.5 usecs,
> the main application cost is the lost parallelism and increased cache
> thrashing due to the serialization - independently of what kind of
> threading abstraction is used.

Yeah, that's not a new argument and is a solid criticism...

Hmmm, random thoughts... This is probably outside the scope of lkml,
but...

I'm trying to think up a possible problem with how the JVM does threading that
might be able to exploit this kind of situation...Hmm, there's locks on
the method dictionary, but that's not something that's generally changing a
lot of the time... I'll give it some thought.

The JVM needs a couple of pretty critical things that are a bit off from
the normal Posix threading standard. One of them is very fast thread
suspension for both individual threads and all threads except the
currently running one...

In the Solaris threads implementation of JVM/HotSpot it has two methods of
getting a ucontext for doing GC and weird exception/signal handling via
safepoints (a JIT compiler goody) and it would be nice to have...

1) Slow Version. Throw a SIGUSR1 at a thread and read/write the ucontext on
the signal frame itself.

2) Fast Version. The thread state and ucontext is examined directly to determine
the validity of the stored thread context, whether it's blocked on
a syscall (ignore it) or was doing a CPU intensive operation (use it).

That ucontext is used for various things:

a) Proper GC so that registers that might contain valid references are
taken into account properly to maintain the correctness of the
mark/sweep algorithms.

b) The thread's program counter value is altered to deal with safepoints.

(2) above being the most desirable since it's a kind of fast path for
(a) and (b).

So userspace exposure to the thread's ucontext would be a good thing.
I'm not sure how this is dealt with in the current implementation of
what you folks are doing at this moment.

> primitives (including internal glibc locks), all uncontended
> synchronization is done purely in user-space. [and for the contended case
> we *want* to switch into the kernel.]

If there's any thing on this planet that's going to stress a threading
system, it's going to be the JVM. I'll give what you've said some
thought. My bias has been to FreeBSD's KSE project for the most part
over this last threading/development run.

/me thinks...

bill



From: Ingo Molnar
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Fri, 20 Sep 2002 18:20:10 +0200 (CEST)

On Fri, 20 Sep 2002, Bill Huey wrote:

> The JVM needs a couple of pretty critical things that are a bit off from
> the normal Posix threading standard. One of them is very fast thread
> suspension for both individual threads and all threads except the
> currently running one...

the user contexts for active but preempted threads are stored in the
kernel stack. To support GC safepoints we need fast access to the current
state of every not voluntarily preempted thread. This is admittedly easier
if threads are abstracted in user-space [in which case the context is
stored in user-space], but the question is, what is more important, an
occasional pass of garbage collection, or the cost of doing IO?

until then it can be done via sending SIGSTOP/SIGCONT to the process PID
from the garbage collection thread, which should stop all threads pretty
efficiently in 2.5.35+ kernels. Then all threads that are not voluntarily
sleeping can be fixed up via ptrace calls.

and it can be further improved by tracking preempted user contexts in the
scheduler and giving fast access to them via a syscall. (all voluntarily
sleeping contexts can properly prepare their suspension state in
userspace.) So it's possible to do it efficiently.

how frequently does the GC thread run?

Ingo




From: Jim Nance
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Fri, 20 Sep 2002 08:37:36 -0400

On Thu, Sep 19, 2002 at 05:41:37PM -0700, Ulrich Drepper wrote:

> We are pleased to announce the first publically available source
> release of a new POSIX thread library for Linux. As part of the
> continuous effort to improve Linux's capabilities as a client, server,
> and computing platform Red Hat sponsored the development of this
> completely new implementation of a POSIX thread library, called Native
> POSIX Thread Library, NPTL.

Is this related to the thread library work that IBM was doing
or was this independently developed?

Thanks,

Jim



From: Ingo Molnar
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Fri, 20 Sep 2002 18:42:48 +0200 (CEST)

> Is this related to the thread library work that IBM was doing or was
> this independently developed?

independently developed.

Ingo




Interviews With Members Of Above Thread:

  • Larry McVoy

    some shortage ;-)

    September 20, 2002 - 6:53pm

    In particular I liked Ingo's comment about running 100,000 threads at once:

     Obviously with 100,000 threads running at once there's some shortage in
     CPU power :-) [ I will perhaps try that once, at SCHED_BATCH priority,
     just for kicks. Not that it makes much sense - they will get a 3 seconds
     worth of timeslice every 3 days. ]

    Makes one second a day for every thread ... ;-)

    86400 seconds in a day.

    September 21, 2002 - 2:25pm
    Anonymous

    Actually, if you want to get pedantic, 100000 runnable threads would each get, on average, 0.864 seconds of CPU time a day if nothing else was using the CPU. A patch that raised that to 1 second per day on average (on a single CPU machine) would be interesting indeed!
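
The arithmetic in the comment above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch that assumes a perfectly even split of CPU time and no scheduling overhead; the function name `cpu_seconds_per_thread` and its parameters are made up for the illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400, as noted in the thread


def cpu_seconds_per_thread(runnable_threads, days=1, cpus=1):
    """Average CPU seconds each thread gets under a perfectly even split."""
    return SECONDS_PER_DAY * days * cpus / runnable_threads


if __name__ == "__main__":
    # 100,000 runnable threads on one CPU, over one day and over three days.
    print(cpu_seconds_per_thread(100_000))
    print(cpu_seconds_per_thread(100_000, days=3))
```

With 100,000 threads on one CPU this gives the 0.864 seconds per day quoted above, or roughly 2.6 seconds over three days, which is close to Ingo's "3 seconds every 3 days".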

    You forgot about leap years.

    September 22, 2002 - 12:43pm
    Anonymous

    You twat. ;-)

    <pre> tags

    September 20, 2002 - 7:09pm

    I noticed this story uses <pre> tags for the mails. Should all new submissions follow this?

    re: <pre> tags

    September 20, 2002 - 8:14pm

    It's a good idea. The only reason I haven't is a formatting issue with Drupal and <pre> tags. It's something I've never dug into, but I'll try and resolve it soon.

    Good stuff but...

    September 20, 2002 - 7:59pm
    Anonymous

    I am very happy to see this developing. How does this threading compare to OSes that supposedly already have "world class" threading, such as Solaris and BeOS?

    Unfortunately, I don't see myself installing a glibc2.3 snapshot any time soon so I can't test this out on my dual p3 800 system... but maybe, just maybe, I might put myself through it to at least see what happens.

    I built a 2.5.36 kernel last night only to have my system crash right at the beginning of the boot process (screen scrolls too fast so I can't really see what the problem is, just that it crapped out) and unless I have more luck with 2.5.37, I can't see myself giving this a shot any time too soon.

    Is there any increased threading performance even without using NPTL?

    When is Linux 2.6 due? I've heard about a november-december range.

    Whether that's true or not, it seems too soon to me, hopefully development will last somewhat longer so more things are accomplished and more new toys make it in!

    -Hiryu

    re: Good stuff but...

    September 21, 2002 - 3:45am

    I built a 2.5.36 kernel last night only to have my system crash right at the beginning of the boot process
    I get that as well. But then, it IS the development kernel.

    Is there any increased threading performance even without using NPTL?
    I don't think so. [not positive, I read something about it being compatible with both pthread implementations.]

    When is Linux 2.6 due? I've heard about a november-december range.
    Not quite, 2.5 goes into freeze sometime in October, then it will take probably at least 6 months to get it stable. A lot has happened in 2.5...

    You will see increased

    September 22, 2002 - 1:15pm
    Anonymous

    You will see increased performance, even with the old thread library. Most of the speed improvements are in the kernel alone. Ingo has sped up the old system calls.

    The user-space changes are:

    - Changes to make threading more POSIX compliant.

    - Switching to futexes, which are a faster synchronization primitive.

    And?

    September 21, 2002 - 12:03pm

    unless I have more luck with 2.5.37, I can't see myself giving this a shot any time too soon

    Unless you're a kernel hacker and want to help development, you really SHOULDN'T be using 2.5.

    You're funny.

    September 22, 2002 - 4:15am
    Anonymous

    Um, you would have answered your own question if you took the time to read up on how this works.

    If you don't know what you're doing, don't run odd-minor number kernels. Wait for your distribution to catch up.

    Don't suffer. You don't know what you're doing. Wait until those who do fix it for you.

    If this comment doesn't make sense, or seems harsh, please look at http://www.kernel.org/. Read through things, this time.

    Ha!

    September 23, 2002 - 5:45pm
    Anonymous

    Well,

    I've been using UNIX since 1996 and Linux since 1998... I think it's safe to say that I know how it works by now.

    I am also somewhat proficient with C... I've already fixed several compile time errors I've come across for the 2.4 series on
    x86, ppc, and sparc based systems (usually something that's not too big of a deal).

    I figured I could at least try 2.5 (I've tried previous versions of 2.5 already) and see what's going on.

    There's not much I can do, however, if the system locks up right after completely scrolling past the point where the actual
    error is displayed, right?

    So I am sorry if I came off like a complete newbie.

    Boot panic fixed in 2.5.38

    September 22, 2002 - 6:07pm

    >I built a 2.5.36 kernel last night only to have my system crash right at the
    >beginning of the boot process (screen scrolls too fast so I can't really
    >see what the problem is, just that it crapped out) and unless I have more
    >luck with 2.5.37, I can't see myself giving this a shot any time too soon.

    This problem has been fixed in kernel 2.5.38

    growing up now...

    September 20, 2002 - 9:13pm
    Anonymous

    hi,

    from the bsd's point of view... linux seems to grow up now... finally... it was time... for years linux lacked such important features... now... maybe something changes... linux-2.5 is growing up including many important new features, added in the last releases... and I'm amazed by the speed this all happens... the whole kernel seems to be under a major rewrite ;-) hey... if only linux would be a unix :-(

    Eugene

    I find that pretty rich, BSD

    September 21, 2002 - 10:01am
    Anonymous

    I find that pretty rich, BSD isn't exactly stellar at threading itself.

    heh

    September 22, 2002 - 4:10pm
    Anonymous

    Where's the "*BSD is dying" troll when you need him... :-)

    (Tongue planted firmly in cheek.)

    I heard that

    September 22, 2002 - 8:48pm
    Anonymous

    Indeed. The user-level threading in BSD pretty much sucks ass. There simply is no kernel threading.

    My group's main application is heavily threaded, and works fine on Solaris (sparc & x86) and Linux, but chokes on FreeBSD.

    FreeBSD has great plans for their threading, but I wouldn't hold my breath, since apparently the kernel developers don't really believe in threading. And it sounds like it will have a hard time surpassing Linux, since there are two separate groups working on improving the threading stuff (RedHat and IBM).

    my no cents...

    September 20, 2002 - 10:55pm

    Hiryu:
    > When is Linux 2.6 due?

    As Stallman would say: It will be released when it's ready and
    it will be ready sooner if you help.

    Seriously, there are no deadlines. Feature freeze is _planned_
    for the end of October. 2.4 took nearly a year to be released
    after feature freeze was declared. 2.5 contains many more
    big changes than 2.3 did. I don't think 2.4 is a great kernel,
    I reckon 2.6 will kick ass but it might take a long time to
    reach a releasable state.

    Eugene:
    > if only linux would be a unix :-(

    huh?
    Linux is not a unix, it's a unix-style kernel.
    GNU is not unix, it's a unix-style OS.
    In my opinion, GNU/Linux is the best "unix" around.

    Ciaran O'Riordan

    heh

    September 21, 2002 - 11:36pm
    Anonymous

    >As Stallman would say: It will be released when it's ready and
    >it will be ready sooner if you help.

    Well, of course, that goes without saying, I was just wondering what the developers were looking towards is all, an exact date is out of the question.

    Where can I see all the planned features for 2.6? Are we getting acl's and attributes?

    Freezing the kernel means no more new stuff, that they just finish the stuff they have now, correct? Not in the Debian sense where they only look for bugs, because it sounds like a fair amount of things in 2.5 aren't really complete even if you don't count the lack of adequate testing.

    As for linux "finally growing up", I believe Linux has been better at threading, Matt Dillon himself has said linux is a year ahead in SMP, I think Linux finally began growing up with 2.2, and 2.4 is only a continuation of that (when 2.4 works, it works a lot better than 2.2... when it works), 2.6 looks like it will be a lot of fun!

    heh is right!

    September 22, 2002 - 4:17am

    > Where can I see all the planned features for 2.6?

    A good list of what's done and what's to do is maintained at:
    www.kernelnewbies.org/status/latest.html

    > Are we getting acl's and attributes?

    I'm not too well up on this but I believe the ACL core code has
    already been merged. i.e. the VFS work is done but the filesystems
    don't take advantage of this yet. Getting everyone to agree on a
    good filesystem interface will take a while but it looks like ACLs
    will be in 2.6. (I'm not sure where attributes are.)

    > Freezing the kernel means no more new stuff [...] correct?

    More of a stiff sludge than a freeze. New stuff that doesn't
    affect the core kernel and is viewed as removing bad code will
    still be accepted. That's my understanding anyway. Some of the
    grey notes (post Halloween plans) on kernelnewbies seem to
    confirm this.

    > looks like it will be a lot of fun!

    Definitely. If there are no serious data corruption problems, I
    hope to start using it as my desktop kernel in Jan 03.

    Ciaran O'Riordan

    sooner if I help?

    September 23, 2002 - 11:14am
    Anonymous

    I don't think so. It might not help to complain, but I don't know how I could help either.

    sooner if YOU help

    September 23, 2002 - 12:26pm

    Everyone can help. There is easy work as well as hard work to
    be done in Linux. Someone has to do the easy work.

    Try the kernel-janitor mailing list for a list of simple tasks
    and how you can help:
    http://kernel-janitor.sourceforge.net/

    Ciaran O'Riordan

    As Stallman would say: It wil

    November 5, 2003 - 4:29pm
    Anonymous

    As Stallman would say: It will be released when it's ready and
    it will be ready sooner if you help.

    Bit rich him saying that, he's not even writing 2.6.

    Ingo interview

    September 21, 2002 - 11:55am

    With all the threading work in 2.5 I think it would only be fair to feature an interview with the man, the legend... the threading god.. Ingo..

    Jeremy - do me a huge personal favor and convince that guy to answer a few questions and give us some insight in the work he's been doing

    re: Ingo interview

    September 21, 2002 - 12:53pm

    I hate to spoil a surprise, so let me just reply: check back soon... ;)

    Lemme guess

    September 21, 2002 - 1:29pm

    He'll feature as this month's Kerneltrap centerfold kernel hacker?

    Ingo The Centerfold

    September 22, 2002 - 11:22pm
    Anonymous

    > He'll feature as this month's Kerneltrap centerfold kernel hacker?

    I'm sure glad Jeremy runs this site and not David. I really appreciate the interviews. Now, David's hope of a Hacker Centerfold is the most disturbing thing I've ever heard.

    Fight Hacker Porn!!!

    j/k

    September 23, 2002 - 9:25am

    I was of course just kidding - although my nickname on certain boards is Lovechild... :)

    The interviews were what drew me to this site in the first place - now I get most of my kernel news here - I consider this one of the few trustworthy places for that kind of information.

    Good work Jeremy..

    [phil-list] first NPT vs. NGPT vs. LinuxThreads benchmark result

    September 22, 2002 - 12:55pm

    Very very promising results.
    Guess I'll unsubscribe from the NGPT-list, now that NPTL is tremendously faster :D

    first NPT vs. NGPT vs. LinuxThreads benchmark results

    PS: I know that the subject says "NPT" instead of "NPTL"; Ulrich Drepper used that subject to post to the list, so don't ask.

    --
    I used to have a sig until the great Kahuna of FOOness
    told me to dump it and use /dev/urandom instead.

    what about the nptl vs win32

    April 25, 2004 - 12:28pm
    Anonymous

    What about NPTL vs. Win32 threads? How does the speed or overhead of spawning a thread on Linux (with NPTL, obviously) compare with spawning one on Windows?

    Has nobody compared them?

    NPTL Test & Trace project

    May 11, 2004 - 7:48am
    Anonymous

    Hello,
    We are adding new tests about NPTL (compliance, stress) to the LTP.
    We also are working on a NPTL trace mechanism.

    Everyone willing to provide help (writing test cases or providing
    comments about the NPTL trace we're designing/writing) can contact me
    or look at our site:
    http://nptl.bullopensource.org/

    Writing POSIX threads tests is not easy at all. Since we have spent
    time reading and understanding the POSIX Thread standard, we are able
    to provide a clear description of the tests to be written. That way,
    you should need only a limited knowledge about POSIX details.

    Regards,

    Tony Reix ( Tony . Reix AT bull . net )
    Carpe Diem

