Neal Walfield announced the first release of RMGPT, which "is (or rather, aspires to one day be) a complete, portable implementation of IEEE Std 1003.1-2001 threads [also] known as POSIX threads." I was excited to read Neal's announcement email, as this is a big step forward for the GNU/Hurd project. With this new pthreads library, it will soon be possible to run complex software packages on the Hurd, including the GNOME and KDE desktops, the OpenOffice suite, and the Mozilla web browser.
Regarding the name, RMGPT, Neal explains, "Most new program names are a bunch of letters stuck together. Only later does it become an acronym and the words become bound. This is boring; each new release of RMGPT will offer a fresh, new and exciting expansion of the 'acronym'." For this first release, RMGPT stands for "Rubbish, I asked for mine with Minced Garlic, Please Take this back".
Neal was kind enough to answer a few questions about his pthread efforts. Read on to learn more...
Neal Walfield: Beyond adding another important, commonly used interface, I think that a pthread implementation represents a large step forward in the public eye: we should soon have many more major software packages, including GNOME, KDE, OpenOffice and Mozilla. The perceived lack of support for complex software was often assumed to be a result of a general lack of maturity on the part of the Hurd itself. In certain respects, this is correct: until now, there was no pthread implementation; there are still limits on the maximum size of file systems; and Mach only supports a limited amount of hardware. On the other hand, the Hurd was not designed to just clone an existing interface; the goal was to study what was available, explore the flaws and then redesign it. From this perspective, I think that the Hurd has been very successful: the translator concept is incredibly powerful and flexible, and security-wise, Unix just cannot compete.
JA: You say that RMGPT aspires to one day be a complete, portable implementation of IEEE Std 1003.1-2001 threads, also known as pthreads. How complete is it today?
Neal Walfield: With respect to the pthread interface, all of the prototypes are present; implementation-wise, I think that we are about ninety percent done. The last ten percent consists of advanced scheduling features (e.g. mutex priority ceilings) and process-shared resources (the ability to share, for instance, a mutex between multiple processes using just shared memory). Neither of these is terribly important from a usability perspective, as not many applications take advantage of them; however, I am interested in implementing them. I think that the ABI should remain stable: I am relatively confident that the data structures are flexible and expandable enough to cover most future changes.
There are also bugs; however, the implementation seems to be relatively stable under normal application load. Several people have compiled a number of different packages over the past few days, and when they crash, they seem to be crashing of their own accord, not because of pthreads.
JA: How long before you expect RMGPT to be fully completed?
Neal Walfield: The goal right now is to stabilize and get some people to test the code. Then we can concentrate on finishing the scheduling and process shared attributes and worry about optimizations. It should be integrated into the Debian unstable system some time this week. Applications will follow.
JA: How did you come up with the ever changing acronym RMGPT?
Neal Walfield: Take, for instance, Perl and UVM: the authors are victims of their own genius. Even though they stated that the name did not mean anything in particular, people have tried to guess what their real intentions were; thus, de facto interpretations have come into use. I am blatantly telling everyone that RMGPT will have a new meaning every release: life does not get any simpler. Plus, it will be less stress for the users.
JA: How close now would you estimate the GNU/Hurd is for another official release?
Neal Walfield: Getting closer everyday. In fact, I hope that by this time in October, we will be a whole month closer.
From: Neal H. Walfield
Subject: libpthread
Date: 28 Sep 2002 01:38:41 -0400
I am extremely pleased to announce the first release of RMGPT, the
"Rubbish, I asked for mine with Minced Garlic, Please Take this back"
release [1]. RMGPT is (or rather, aspires to one day be) a complete,
portable implementation of IEEE Std 1003.1-2001 threads, also known as
POSIX threads.
A project of this sort does not write and test itself. First, I would
like to thank Roland: without his encouragement this past July to
complete and port my minimal pthread implementation to Mach, RMGPT
would never have become a reality. I am also thankful for the
feedback I have received since the first alpha release: it has been
nothing but positive and encouraging.
Overview
========
This library can happily coexist with cthreads. It provides all of
the callbacks that glibc expects from the current implementation of
cthreads, including the initialization code, mutexes, stream locking
and thread specific data in an ABI compatible way. This library
requires *no* changes to glibc. This should make the migration path
from cthreads to pthread fairly painless and permit users to start to
take advantage of libpthread in the very near future.
All functions--both optional and required by POSIX threads--are
present in at least stub form. Nearly all are implemented and tested.
This does not mean that they are bug free: we still need testers and
more test cases. If you are looking to help, this is an excellent way
to get started.
There are several deficiencies that I hope to correct in the near
future. The following list details all of the problems that I am
aware of; I am certain that there are more.
Interfaces
----------
All of the definitions are either in or the appropriate
"bits" file. There are two definitions which are present in
which do not belong there: pthread_kill and
pthread_sigmask. POSIX states that these function definitions shall
be made available by including . Without actually changing
the installed , this is difficult. This should not pose a
real problem for general usage--at worst there may be a warning or two.
pthread_atfork
--------------
This is currently unimplemented. It either requires hooks in glibc
(cf. libc/sysdeps/mach/hurd/fork.c) or a custom fork implementation
that wraps the fork in the underlying libc using dlopen et al. The
latter is clearly less intrusive and more portable; however, I am not
sure of its effects on static linking. I will implement this later
once I understand the problem a bit better. As the stub for
pthread_atfork is already present, a future implementation should have
no impact on the ABI.
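For readers unfamiliar with pthread_atfork, the following minimal sketch shows what the stub will eventually have to honor from the application side. It is not the proposed glibc hook or the dlopen wrapper. The prepare handler takes a lock just before fork, and the parent and child handlers release it afterwards, so the child never inherits a lock held by a thread that no longer exists. The names state_lock, prepare, parent, child and run_atfork_demo are hypothetical. The sketch assumes a POSIX system with a working pthread_atfork.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical piece of library state guarded by a mutex. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Parent, just before fork(): hold the lock so no other thread can be
   halfway through an update when the address space is copied. */
static void prepare(void) { pthread_mutex_lock(&state_lock); }

/* Parent, after fork() returns: release the lock again. */
static void parent(void) { pthread_mutex_unlock(&state_lock); }

/* Child, after fork(): the lock was inherited locked, so release it. */
static void child(void) { pthread_mutex_unlock(&state_lock); }

/* Returns 1 if both parent and child can take state_lock after fork(). */
int run_atfork_demo(void)
{
    static int registered;   /* handlers must be registered only once */
    int status;

    if (!registered) {
        if (pthread_atfork(prepare, parent, child) != 0)
            return 0;
        registered = 1;
    }

    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0) {
        /* Without the child handler this trylock would fail with EBUSY. */
        _exit(pthread_mutex_trylock(&state_lock) == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    if (pthread_mutex_trylock(&state_lock) != 0)   /* parent handler ran? */
        return 0;
    pthread_mutex_unlock(&state_lock);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```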
Static Linking
--------------
Because most functions are implemented in individual files, there must
be an explicit dependency to pull a given function in. Since
`-lpthread' normally appears before `-lc' on the link line, the
initialization routines and cthread compatibility are not added to the
final executable. To get around this, `-u_cthread_init_routine' et
al. can be added; however, there must be a more elegant way to do
this, perhaps using a linker script that undefines the appropriate
symbols and then pulls in libpthread.
Versioning
----------
I have not yet written a versioning script. I feel that using
HURD_CTHREADS_0.3 is wrong. Additionally, the symbol we choose may
depend on where libpthread will be included.
Scheduling and Thread Priorities
--------------------------------
There is no support for any of this. All of the following functions
are stubs:
pthread_attr_getinheritsched
pthread_attr_setinheritsched
pthread_attr_getschedparam
pthread_attr_setschedparam
pthread_attr_getschedpolicy
pthread_attr_setschedpolicy
pthread_attr_getscope
pthread_attr_setscope
pthread_mutexattr_getprioceiling
pthread_mutexattr_setprioceiling
pthread_mutexattr_getprotocol
pthread_mutexattr_setprotocol
pthread_mutex_getprioceiling
pthread_mutex_setprioceiling
pthread_setschedprio
pthread_getschedparam
pthread_setschedparam
I really need some hints on how to proceed with the implementation of
these functions. At the moment, it seems that we either have to
maintain the wait queues in priority order (which means a lot of
possible shuffling if the priorities change) or multiple queues. Can
the kernel help in any way here? This is further complicated as we do
not implement most scheduling functions at the process level; for
instance, sched_setscheduler, sched_setparam and sched_getparam are
stubs that return ENOSYS.
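To make the first option concrete, here is a minimal sketch of a wait queue kept in priority order, FIFO among equal priorities. The struct waiter layout and the function names are hypothetical and do not come from libpthread. A priority change costs an unlink followed by a fresh ordered insert, which is the shuffling referred to above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter record; a real one would carry a wakeup port. */
struct waiter {
    int prio;              /* higher value = more urgent */
    int id;
    struct waiter *next;
};

/* Insert W after every waiter whose priority is >= W->prio, keeping the
   queue sorted by priority and FIFO among equals. O(n). */
static void enqueue_by_prio(struct waiter **head, struct waiter *w)
{
    struct waiter **pp = head;
    while (*pp != NULL && (*pp)->prio >= w->prio)
        pp = &(*pp)->next;
    w->next = *pp;
    *pp = w;
}

/* Remove and return the most urgent waiter, or NULL if none. */
static struct waiter *dequeue(struct waiter **head)
{
    struct waiter *w = *head;
    if (w != NULL)
        *head = w->next;
    return w;
}

/* A priority change means unlink + re-insert: the "shuffling" cost. */
static void change_prio(struct waiter **head, struct waiter *w, int prio)
{
    struct waiter **pp = head;
    while (*pp != NULL && *pp != w)
        pp = &(*pp)->next;
    if (*pp == NULL)
        return;            /* not queued */
    *pp = w->next;
    w->prio = prio;
    enqueue_by_prio(head, w);
}
```

Keeping one queue per priority level (the second option) would make insertion O(1), at the cost of scanning the levels on each wakeup.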
Stacks
------
Stacks default to two megabytes; we may want to reduce this to 64 KB,
which is what cthreads currently uses. This value cannot be changed,
as we use the Hurd TSD implementation, which requires fixed-size
stacks to function correctly. Moving away from this model would
require reorganization in glibc (cf. ). Despite
this, user stacks can be used (by calling pthread_attr_getstackaddr,
pthread_attr_setstackaddr, pthread_attr_getstack,
pthread_attr_setstack, pthread_attr_getstacksize and
pthread_attr_setstacksize) as long as the supplied stack is of the
correct size and has the appropriate alignment.
Stack guards are also supported. They are tested and appear to work.
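Fixed-size stacks can matter for a scheme like this one. The sketch below assumes a TSD layout in which each stack is aligned to its own size. Under that assumption, the base of the current stack, where per-thread data could live, is found by masking any address on that stack. The sketch also supplies such a stack through pthread_attr_setstack, which shows why a user stack must have the expected size and alignment. STACK_SIZE, stack_base_of and run_stack_demo are hypothetical, and this is an illustration under that assumption, not the Hurd's TSD code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed stack size: the two-megabyte default from the text. */
#define STACK_SIZE ((size_t)2 * 1024 * 1024)

/* With stacks aligned to STACK_SIZE, masking any address inside a stack
   yields that stack's lowest address. */
static void *stack_base_of(const void *addr)
{
    return (void *)((uintptr_t)addr & ~((uintptr_t)STACK_SIZE - 1));
}

static void *thread_fn(void *arg)
{
    int local = 0;                   /* lives on this thread's stack */
    *(void **)arg = stack_base_of(&local);
    return NULL;
}

/* Run one thread on a user-supplied, size-aligned stack; return 1 if
   masking a local variable's address recovers the start of that stack. */
int run_stack_demo(void)
{
    void *stack = NULL, *found = NULL;
    pthread_attr_t attr;
    pthread_t t;
    int ok;

    if (posix_memalign(&stack, STACK_SIZE, STACK_SIZE) != 0)
        return 0;
    pthread_attr_init(&attr);
    ok = pthread_attr_setstack(&attr, stack, STACK_SIZE) == 0
         && pthread_create(&t, &attr, thread_fn, &found) == 0;
    if (ok)
        pthread_join(t, NULL);       /* stack is unused after join */
    pthread_attr_destroy(&attr);
    free(stack);
    return ok && found == stack;
}
```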
Process Shared Attribute
------------------------
Currently, there is no real support for the process shared attribute.
Spin locks should, however, work, as we just use a test-and-set/yield
loop. On the other hand, barriers, conditions, mutexes and rwlocks
signal wakeups by queuing messages on ports whose names are process
local.
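A test-and-set/yield spin lock can already work across processes because all of its state is a single flag in shared memory, with no process-local port involved. The minimal sketch below uses C11 atomics. A parent and a child increment a counter in an anonymous shared mapping. struct shared_counter and the function names are hypothetical. The sketch assumes MAP_ANONYMOUS is available, as it is on Linux and the BSDs.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared_counter {
    atomic_flag lock;   /* the spin lock: set means held */
    long value;         /* data protected by the lock */
};

/* Test-and-set; if the flag was already set, yield and try again. */
static void spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        sched_yield();
}

static void spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Parent and child each add ITERS to a counter in shared memory under the
   spin lock; returns the final value, or -1 on error. */
long run_shared_spin_demo(long iters)
{
    struct shared_counter *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED)
        return -1;
    atomic_flag_clear(&s->lock);
    s->value = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(s, sizeof *s);
        return -1;
    }
    for (long i = 0; i < iters; i++) {
        spin_lock(&s->lock);
        s->value++;
        spin_unlock(&s->lock);
    }
    if (pid == 0)
        _exit(0);                    /* child is done */

    waitpid(pid, NULL, 0);
    spin_lock(&s->lock);
    long result = s->value;
    spin_unlock(&s->lock);
    munmap(s, sizeof *s);
    return result;
}
```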
One solution I have considered in passing is to hash into local data using
the address of the shared data structure as the key. The first thread
that blocks per process would spin on the shared memory area; all
others would block as normal. When the resource became available, the
first thread would signal the other local threads as necessary.
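The hashing idea needs a process-local table keyed by the address of the shared object. A minimal sketch of that lookup follows. struct local_wait, its fields and local_wait_lookup are hypothetical, and the spinning and signalling logic described above is left out. Because the table is private to one process, it only needs to be keyed by the object's address in that process's own mapping, even if other processes map the object elsewhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-process record for one process-shared object. */
struct local_wait {
    const void *shared_addr;    /* key: address of the shared object */
    pthread_cond_t local_cond;  /* where non-first local waiters block */
    int spinner_present;        /* a local thread spins on shared memory */
    struct local_wait *next;    /* bucket chain */
};

#define NBUCKETS 64

static struct local_wait *buckets[NBUCKETS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

static size_t bucket_of(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    return (size_t)((a >> 4) ^ (a >> 12)) % NBUCKETS;
}

/* Find, or create, the local record for a shared object. Returns NULL
   only if allocation fails. */
struct local_wait *local_wait_lookup(const void *shared_addr)
{
    assert(shared_addr != NULL);
    pthread_mutex_lock(&table_lock);
    size_t b = bucket_of(shared_addr);
    struct local_wait *w;
    for (w = buckets[b]; w != NULL; w = w->next)
        if (w->shared_addr == shared_addr)
            break;
    if (w == NULL && (w = calloc(1, sizeof *w)) != NULL) {
        w->shared_addr = shared_addr;
        pthread_cond_init(&w->local_cond, NULL);
        w->next = buckets[b];
        buckets[b] = w;
    }
    pthread_mutex_unlock(&table_lock);
    return w;
}
```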
Alternatively, there could be an external server. This may, however,
open a large security hole; I need to consider it more.
Cancelation
-----------
The only cancelation points are in pthread_join, pthread_cond_wait,
pthread_cond_timedwait and pthread_testcancel. I need to do some more
research to determine if attaching a function to
hurd_sigstate->cancel_hook (cf. ) will provide the
desired semantics. If not, we must either wrap some functions using
dlopen or integrate with glibc.
The implementation of asynchronous cancelation injects a new IP into
the canceled thread which runs the cancelation handlers in its
context (cf. sysdeps/mach/hurd/pt-docancel.c) and then calls
pthread_exit. The handlers need to have access to the stack as they
may use local variables. I think that the current method may leave
the frame pointer in a corrupted state if the thread was in, for
instance, the middle of a function call. I would like third party
confirmation that the implementation is in fact robust.
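For reference, here is the application-visible behavior at stake. A cleanup handler pushed with pthread_cleanup_push runs when the thread is cancelled at a cancellation point, and pthread_join then reports PTHREAD_CANCELED. The minimal sketch below uses deferred cancellation via pthread_testcancel, not the asynchronous IP-injection path. Its handler reads a local variable of the cancelled thread through a pointer, which is why that thread's frame must still be intact when the handler runs. The names worker, record_progress and run_cancel_demo are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static int observed = -1;         /* written by the cleanup handler */
static atomic_int handler_ready;  /* set once the handler is pushed */

/* Reads a local variable living in the cancelled thread's stack frame. */
static void record_progress(void *arg)
{
    observed = *(int *)arg;
}

static void *worker(void *arg)
{
    (void)arg;
    int progress = 42;            /* hypothetical per-thread state */

    pthread_cleanup_push(record_progress, &progress);
    atomic_store(&handler_ready, 1);
    for (;;) {
        pthread_testcancel();     /* explicit cancellation point */
        sched_yield();
    }
    pthread_cleanup_pop(0);       /* unreachable, but must pair with push */
    return NULL;
}

/* Returns 1 if cancellation ran the handler and join saw PTHREAD_CANCELED. */
int run_cancel_demo(void)
{
    pthread_t t;
    void *ret = NULL;

    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return 0;
    while (!atomic_load(&handler_ready))   /* wait until handler is pushed */
        sched_yield();
    pthread_cancel(t);
    pthread_join(t, &ret);
    return ret == PTHREAD_CANCELED && observed == 42;
}
```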
Waking Up
---------
When a thread blocks, it adds itself to a queue attached to the
particular resource it is interested in. It then waits for a message
on a thread local port. To wake it up, a thread queues a message on
the waiter's port. If the wakeup is a broadcast wakeup
(e.g. pthread_cond_broadcast, pthread_barrier_wait and
pthread_rwlock_unlock), the waker thread must send N messages where N
is the number of waiting threads on the queue. If all the threads
instead receive on a lock-local (as opposed to thread-local) port,
then the thread which eventually does the wakeup needs to do just one
operation, mach_port_destroy, to wake up all of the waiting threads
(they would get MACH_RCV_PORT_DIED back from mach_msg). Note that
there is a trade-off: the port must be recreated. This needs to be
implemented, tested and benchmarked.
This approach sounds nice until we consider scheduling priorities:
there may be a preference for certain threads to wake up before others
(especially if we are not doing a broadcast, for instance,
pthread_mutex_unlock and pthread_cond_signal). If the outlined
approach is taken, the kernel chooses which threads are awakened. If
we find that the kernel makes the wrong choices, we can still beat it
by merging the two algorithms and having a list of ports sorted in
priority order. The waker could then mach_port_destroy or queue a
message on a port as appropriate.
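The lock-local port idea has a rough POSIX analogue that can be tried without Mach. Every waiter blocks in read() on one shared pipe. Closing the only write end makes every read() return 0 (end of file) in a single operation, much as destroying the port is described above as making each receiver get MACH_RCV_PORT_DIED. This is an analogy, not the proposed implementation. struct bcast, MAX_WAITERS and the function names are hypothetical. As with the port, the pipe is single-use and would have to be recreated for the next round.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

#define MAX_WAITERS 32

/* All waiters share one wakeup channel (the "lock-local port"). */
struct bcast {
    int rd;              /* read end: every waiter blocks here */
    int wr;              /* write end: closing it is the broadcast */
    atomic_int woken;    /* waiters that observed the wakeup */
};

static void *bcast_waiter(void *arg)
{
    struct bcast *b = arg;
    char c;
    /* Blocks until data arrives or every write end is closed;
       read() returning 0 (end of file) is the broadcast. */
    if (read(b->rd, &c, 1) == 0)
        atomic_fetch_add(&b->woken, 1);
    return NULL;
}

/* Start n waiters, wake all of them with one close(), and return how
   many saw the wakeup (-1 on bad arguments or pipe failure). */
int run_bcast_demo(int n)
{
    struct bcast b;
    pthread_t t[MAX_WAITERS];
    int fds[2];

    if (n < 1 || n > MAX_WAITERS || pipe(fds) != 0)
        return -1;
    b.rd = fds[0];
    b.wr = fds[1];
    atomic_init(&b.woken, 0);

    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&t[i], NULL, bcast_waiter, &b);
        assert(rc == 0);
        (void)rc;
    }

    close(b.wr);                     /* the single wakeup operation */
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);

    /* The channel is now spent; a real lock would call pipe() again
       before the next waiter blocks (the recreation cost). */
    close(b.rd);
    return atomic_load(&b.woken);
}
```

A waiter that only reaches read() after the close still sees end of file, so no wakeup is lost.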
Barriers
--------
Barriers may be slow as the contention can be very high. The waiting
algorithm presented above may offer substantial gains. The
improvement may be further augmented with an initial number of spins
and yields: it is expected that all of the threads reach the barrier
in close succession, so just queuing a message may turn out to
be more expensive. This needs to be implemented and benchmarked.
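The following is a minimal sketch of the spin-then-block idea, built on a mutex and a condition variable rather than Mach messages. Each arriving thread yields up to SPIN_LIMIT times while watching a generation counter, and only then goes to sleep. SPIN_LIMIT, NROUNDS, struct hybrid_barrier and the hb_* and demo functions are hypothetical. The spin count is a placeholder that would have to come from benchmarking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define SPIN_LIMIT 100   /* hypothetical: yields before falling back to sleep */
#define NROUNDS 50       /* rounds used by the demo */

struct hybrid_barrier {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    unsigned count;          /* threads per round */
    unsigned arrived;        /* arrivals this round, protected by lock */
    atomic_uint generation;  /* bumped each time a round completes */
};

void hb_init(struct hybrid_barrier *b, unsigned count)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->count = count;
    b->arrived = 0;
    atomic_init(&b->generation, 0);
}

/* Returns 1 in exactly one thread per round (the last to arrive), else 0. */
int hb_wait(struct hybrid_barrier *b)
{
    pthread_mutex_lock(&b->lock);
    unsigned gen = atomic_load(&b->generation);
    if (++b->arrived == b->count) {
        b->arrived = 0;
        atomic_fetch_add(&b->generation, 1);
        pthread_cond_broadcast(&b->cond);
        pthread_mutex_unlock(&b->lock);
        return 1;
    }
    pthread_mutex_unlock(&b->lock);

    /* Phase 1: spin and yield, expecting the others to arrive soon. */
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (atomic_load(&b->generation) != gen)
            return 0;
        sched_yield();
    }

    /* Phase 2: sleep; re-checking under the lock avoids lost wakeups. */
    pthread_mutex_lock(&b->lock);
    while (atomic_load(&b->generation) == gen)
        pthread_cond_wait(&b->cond, &b->lock);
    pthread_mutex_unlock(&b->lock);
    return 0;
}

struct demo {
    struct hybrid_barrier bar;
    unsigned n;
    atomic_uint arrivals, serial, errors;
};

static void *demo_thread(void *arg)
{
    struct demo *d = arg;
    for (unsigned r = 0; r < NROUNDS; r++) {
        atomic_fetch_add(&d->arrivals, 1);
        if (hb_wait(&d->bar))
            atomic_fetch_add(&d->serial, 1);
        /* Nobody may pass round r before all n threads have arrived. */
        if (atomic_load(&d->arrivals) < d->n * (r + 1))
            atomic_fetch_add(&d->errors, 1);
    }
    return NULL;
}

/* Runs n threads (1..16) through NROUNDS rounds; returns 1 on success. */
int run_barrier_demo(unsigned n)
{
    struct demo d;
    pthread_t t[16];

    if (n == 0 || n > 16)
        return 0;
    hb_init(&d.bar, n);
    d.n = n;
    atomic_init(&d.arrivals, 0);
    atomic_init(&d.serial, 0);
    atomic_init(&d.errors, 0);
    for (unsigned i = 0; i < n; i++) {
        int rc = pthread_create(&t[i], NULL, demo_thread, &d);
        assert(rc == 0);
        (void)rc;
    }
    for (unsigned i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&d.errors) == 0 && atomic_load(&d.serial) == NROUNDS;
}
```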
Clocks
------
pthread_condattr_setclock permits a process to specify a clock for use
with pthread_cond_timedwait. What is the correct default for this?
Right now, it uses CLOCK_REALTIME, however, the underlying
implementation is really using the system clock (gettimeofday) which,
if I understand correctly, is completely different.
An important question I have is: can we even use other clocks?
mach_msg uses a relative time against the system clock. I am not
aware of a way to override this.
pthread_getcpuclockid just returns CLOCK_THREAD_CPUTIME_ID if defined.
Is this the correct behavior?
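From the application side, the clock attribute works as follows. The condition variable is bound to a clock with pthread_condattr_setclock, and the absolute deadline passed to pthread_cond_timedwait must be computed from that same clock. POSIX specifies that the default clock attribute refers to the system clock. The sketch assumes CLOCK_MONOTONIC is supported, and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Wait on a condition variable bound to CLOCK_MONOTONIC with a deadline
   MS milliseconds from now; nobody signals, so this should time out.
   Returns pthread_cond_timedwait's final result. */
int timed_wait_monotonic(long ms)
{
    pthread_condattr_t attr;
    pthread_cond_t cond;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    struct timespec deadline;
    int rc;

    pthread_condattr_init(&attr);
    pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    pthread_cond_init(&cond, &attr);
    pthread_condattr_destroy(&attr);

    /* The absolute deadline must come from the clock bound above. */
    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&mutex);
    do
        rc = pthread_cond_timedwait(&cond, &mutex, &deadline);
    while (rc == 0);                 /* skip spurious wakeups */
    pthread_mutex_unlock(&mutex);
    pthread_cond_destroy(&cond);
    return rc;
}

/* Report which clock a default-initialized condattr uses. */
clockid_t default_cond_clock(void)
{
    pthread_condattr_t attr;
    clockid_t id;
    pthread_condattr_init(&attr);
    pthread_condattr_getclock(&attr, &id);
    pthread_condattr_destroy(&attr);
    return id;
}
```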
Timed Waiting
-------------
pthread_cond_timedwait, pthread_mutex_timedlock,
pthread_rwlock_timedrdlock and pthread_rwlock_timedwrlock all take
absolute times. We need to convert them to relative times for
mach_msg. Is there a way around this? How will clock skew affect us?
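The conversion itself is simple. The sketch below assumes a relative timeout in milliseconds, which is what mach_msg's timeout argument takes. It rounds up so that a waiter never wakes before the deadline, and returns 0 once the deadline has passed. abs_to_rel_ms and rel_ms_until are hypothetical helpers. The clock is read only once, when the timeout is computed, so a clock step after that point is not reflected in the wait. That is where clock skew would affect the result.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert an absolute deadline into a relative timeout in milliseconds.
   Rounds up so the waiter never wakes early; 0 if already expired. */
uint32_t abs_to_rel_ms(const struct timespec *abstime,
                       const struct timespec *now)
{
    int64_t sec = (int64_t)abstime->tv_sec - (int64_t)now->tv_sec;
    int64_t nsec = (int64_t)abstime->tv_nsec - (int64_t)now->tv_nsec;

    if (sec < 0)
        return 0;                          /* deadline is in the past */
    if (sec > (int64_t)(UINT32_MAX / 1000))
        return UINT32_MAX;                 /* clamp very distant deadlines */

    int64_t total_ns = sec * 1000000000LL + nsec;
    if (total_ns <= 0)
        return 0;
    int64_t ms = (total_ns + 999999) / 1000000;    /* round up */
    return ms > UINT32_MAX ? UINT32_MAX : (uint32_t)ms;
}

/* Samples CLOCK_REALTIME once; later clock steps are not seen. */
uint32_t rel_ms_until(const struct timespec *abstime)
{
    struct timespec now;
    clock_gettime(CLOCK_REALTIME, &now);
    return abs_to_rel_ms(abstime, &now);
}
```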
Weak Aliases
ROFL!
I love it! I shared this gem with the group here. The ultimate quote on fuzzy schedules. :-)
--Joe
Re: ROFL!
Well, actually it would be a good achievement to be a whole month closer. I've seen several projects ("right on track, thank you Sir") where, one month later, the project was actually three months delayed... Ah, the joys of software development!
RMGPT is not an acronym...
It is an abbreviation. An acronym is an abbreviation that forms a word. RMGPT is not a word, it is a string of letters.
RMGPT on the J2 CDs?
Is there any chance of RMGPT being on the J2 CDs?
Probably not. I'll download it when I have my Hurd box
set up. Great Work Neal (and Roland). I'll give feedback
as soon as I can.
Good for HURD
The lack of ease of installation and ease of use is the main reason i'm not already playing with HURD; having things like Gnome and Mozilla will go a long way to eliminating this. (And no, it's not that i can't cope with tricky installs and interfaces - i managed to install and learn slackware from a set of 5.25 in. floppies back in the kernel 1.0.x days - i just don't have time anymore, i've got a life now!)
Hopefully some day fairly soon i'll be able to burn a CD or three from downloaded images, pop one into a spare pentium, and fiddle around with a basic Hurd system (hopefully complete with online documentation and a decent reader for it) within an hour or two. By the time it's that accessible, it'll be in practical reach for me, and i'd definitely be right into it.
We have lives too
By saying you have a life now, you're implying other geeks playing with their shells don't. We create what you play with when you don't have a life.
Ghazan Haider
your life, your call
no, i'm not implying any such thing. i was speaking about my life, which i judge on my own grounds by my own standards - you may very well have different standards for what constitutes "having a life", but that would not have any impact on my judgement call.
info pages, probably. *sigh*
I hate to sound like a troll here, but I'm sure they'll have plenty of 'info' pages. Great docs, horrible reader. Maybe it makes sense if you're an emacs user, but I'm not.
Try pinfo
Try pinfo if you want an easier-to-learn info viewer. It's an info-file viewer, but it uses familiar lynx-like key commands.
http://zeus.polsl.gliwice.pl/~pborys/
wow, thanks!
THANKS!
I just discovered that it came on my RedHat 7.2 and 7.3 machines.
(My main machine is a RH6.0 box, but my wife's computer and my webserver are a little more up-to-date.) The versions of pinfo that come in RH7.2 and 7.3 don't read the GCC info pages correctly, so I updated to 0.6.6p1.
It's much nicer than straight info. THANKYOUTHANKYOUTHANKYOU THANK YOU! No more wacky keybindings, such as "Backspace" for "Help".... Info itself would be much nicer if it just had decent keybindings.
--Joe
Performance
How does this Hurd threading library compare to Linux's new NPTL in terms of speed and efficiency? I've traditionally been skeptical of Hurd, but if it performs significantly better than Linux I may give it a spin.
Re: Performance
Ah, the tin god efficiency.
This is the first RMGPT release, and the focus right now is naturally to kill any bugs, and have a complete implementation; it is an error to worry about optimization before you have something stable and correct.
As Knuth put it, "Premature optimization is the root of all evil."
Also, Walfield states in the mini-interview: "The goal right now is to stabilize and get some people to test the code. Then we can concentrate on finishing the scheduling and process shared attributes and worry about optimizations."
Dog slow
The Hurd, right now, is dog slow. And I can't see the first testing release of a threads library being anything other than dog slow, can you? The Hurd's underlying design has a significant overhead compared with monolithic Linux, and on today's processors, though this design *theoretically* speeds up certain things, this overhead more than wipes out any speed increase you might see.
If you're after speed, stay with your Linux; if you are looking to learn a lot about kernel workings and OS design concepts, take a look at the Hurd. It is a fundamentally different thing, with a lot of fundamentally different ideas. It is not Unix (as in GNU is not UNIX), it is the Hurd, a POSIX-compatible OS. It's the POSIX compatibility (which means source code compatibility with UNIX and UNIX-like OSes) that makes people think it's UNIX-like in the same way as *BSD, Solaris and even Linux are.
Dog slow... for now
I've been following the Hurd's development for a few months now, and I can speak to your "dog slow" comment.
Firstly, yes, the Hurd's performance on a PC is nowhere near that of Linux.
However, the Hurd is at version 0.2. This is *not* the time to optimize the code.
Furthermore, much of the Hurd's slowness comes from the fact that it is currently implemented on top of the Mach microkernel. Neal is leading plans to port Hurd to L4, which, if you look at papers (www.l4ka.org), is *significantly* faster than Mach.
Also, I'll remind you that we're comparing a multiserver microkernel (Hurd) to monolithic kernels (Linux and *BSD) on single-processor machines, where monolithic kernels are right at home. It's not until you hit multiprocessor machines that the performance benefits of a microkernel really become apparent (this is why scalability hurts GNU/Linux... NT-based Windows systems can scale better on "Big Iron").
NT Scaling?
> It's not until you hit multiprocessor machines that the performance
> benefits of a microkernel really become apparent (this is why
> scalability hurts GNU/Linux... NT-based Windows systems can scale
> better on "Big Iron").
You're saying the Windows NT scales better than Linux on large
systems? Do you have any evidence of this? How large are you
talking here? Does Windows even run on 32 or 64 processor
systems? Linux does (32 at least, I haven't heard about 64).
I find this quite surprising. Even if it is a micro-kernel based,
there are still contention issues (when two or more CPUs try and
execute the same chunk of code) - this occurs - micro-kernel or not.
The scheduler is the classic case: all CPUs have to run the scheduler,
so it's a natural bottleneck. The VM system is almost certainly
another such bottleneck.
In the past year or so, the Linux developers have focused a LOT on
improving scalability, especially in scheduling and memory management.
I'd be *highly* surprised if Windows NT comes anywhere close.
Now, once it's been optimized, maybe the Hurd can scale better than
Linux, but IMO, that's *years* away (if ever).
Also, extreme scaling is not the be-all/end-all of OS design. As
Linux is showing, scaling down into embedded space is probably more
important than scaling up.
IMO, "One OS to rule them all" is a fantasy. Each OS needs to be
tailored for its target environment. If you want extreme high-end
scaling, then Solaris, IRIX and other heavyweight Unix systems are
probably what you want. They've been there and have been tuned for
that environment for years now.
But if you want very small systems, then there are numerous RTOS and
other very very small OSes that fit your needs. Linux tries to fit
in the middle - it scales down to reasonably small systems and up to
reasonably large ones. There is almost a constant friction between
the two, but Linus has been doing a good job of managing that and
keeping Linux in the middle ground.
That being said, there is significant evidence that systems with 32
or more CPUs on them are dead - it's much more efficient (and cheaper)
to set up a cluster with 32 (or 3200) stand-alone systems and scale
in that manner. This favors Linux even more, since it is tuned for
single or small-multiple CPU systems, which are perfect for clusters.
The costs associated with 32 CPUs in one system are rarely justified -
there are only a very few specialized apps that need that level of
system. And the failure modes are just too complex.
Anyway, enough conjecture and ranting for now...
Pete
devices & big iron
I agree that 2.6 will beat WinNT on 32 processor machines but
I think the reason NT _used_ to beat Linux was because NT has
a microkernel-ish design.
Right now the HURD doesn't even support 2 cpus. When it does,
I think it will start to scale very well very quickly. The
microkernel design is optimised for highly threaded environments.
I also agree that all this is years off but when it comes to
doing embedded systems I can see the HURD having an easy time
there too. The kernel is very modular, the bare minimum is
very, well, micro.
HURD based GNU systems, IMO, will be One OS To Rule Them All.
Reason? a highly modular design and freedom for anyone to
adapt it to an environment (and copyleft so that everyone
benefits).
You say you believe 32 cpu systems may be dead. Alan Cox
would disagree with you and say that the future is cards
of cpus.
Anyway, enough rebuttal and chatting for now...:p
Ciaran O'Riordan
SMP
I'll try the hurd as soon as it does support at least 2 cpus (because that's how many I have).
These threads, will they "immediately" support SMP when the Hurd itself does? I guess what I'm really asking might be "Are these kernel-level threads or userspace?". Is that the right question?
Hurd seems to be making steady progress, I can't wait to try it!
NT beats Linux on 32 CPUs? How about on 2 or 4 CPUs? Are there any benchmarks I can run? I have both win2k and Linux on my personal system. Linux feels a lot faster to me on my system.. But Linux also feels a lot faster to me on a single CPU system, so that's pretty inconclusive.
Thanks.
-Hiryu
Re: SMP
pthreads are user-space threads, so they don't take advantage of SMP.
Sure they do.
Who says pthreads can't take advantage of SMP? That's the whole point of the M:N model. M is number of user-space threads, N is the number of top level processes. These N processes can get scheduled on N CPUs.
Indeed, that's what IBM's NGPT does.
--Joe
re: Sure they do
Anyplace i can read up on that? They don't have any papers on the NGPT site.
No good docs, unfortunately, but there is the code itself
The only reference I've found so far was a mailing list message from one of the NGPT developers indicating as much as what I said.
The docs in NGPT are largely unmodified from the original GNU Pth
documentation. GNU's Pth implementation explicitly did NOT support
SMP, as you noted.
The following code in pth_lib.c in NGPT 2.0.3 seems to corroborate that IBM's work on Pthreads added M:N scheduling that supports multiple CPUs. Note the 'begin ibm' comment, which seems to indicate this is exactly where NGPT deviates from the original GNU Pth.
```c
/*begin ibm*/
pth_threads_per_native = 1;
pth_max_native_threads = 0;
pth_number_of_natives = 1;

/* determine the number of native threads per cpu. */
c_ratio = getenv("MAXTHREADPERCPU");
if (c_ratio != NULL) {
    long ratio = strtol(c_ratio, (char **)NULL, 10);
    if (errno != ERANGE)
        pth_threads_per_native = (int)ratio;
}

/*
 * See if the MAXNATIVETHREADS environment variable is set.
 * We'll use this instead of the number of cpus if this
 * is set since the user wants to override the default behavior
 * which is based on the number of CPUs in the host.
 */
c_numcpus = getenv("MAXNATIVETHREADS");
if (c_numcpus != NULL) {
    long numcpus = strtol(c_numcpus, (char **)NULL, 10);
    if (errno != ERANGE)
        pth_max_native_threads = (int)numcpus;
}

/*
 * We check to see if we've gotten an override...
 * If not, we'll base it off of CPU and set a
 * max number of threads per cpu to 1.
 */
if (pth_max_native_threads == 0) {
    pth_max_native_threads = sysconf(_SC_NPROCESSORS_CONF);
    pth_threads_per_native = 1;
    cpu_based = 1;
}
if (pth_max_native_threads > 1) {
    pth_main->boundnative = &pth_first_native;
    pth_max_native_threads++;
}
```

That code snippet seems to pretty clearly indicate that NGPT is setting up multiple native threads. Basically, as long as you have more than one native thread, you can likely take advantage of SMP. (Unless, of course, you only let one native thread run at a time. ;-) )
--Joe
Hurd Scalability
Remember that the Hurd is a set of servers, it is not a kernel or a microkernel.
The Hurd scales to as many processors as the microkernel on which it is running. Currently, the Hurd is only implemented for GNU Mach, so it scales to... 1 IA32 CPU.
GNU isn't doing any work on GNU Mach. The plan is to move to the University of Utah's OSKIT Mach once it is completed. Neal Walfield (same guy who's doing these pthreads) is in charge of plans to port the Hurd to the Pistachio version of the L4 microkernel once 1. Pistachio is officially released, and 2. the Hurd has stabilized on Mach.
server
I understand what you mean by servers, but my point is that at some
point, a *single* "something" has to make some decision (e.g.
scheduling). If it's running on a single CPU, as a single thread,
then that's all well and good.
However, if it's running on multiple CPUs, then the scheduling
"server" is either single threaded or not. If it's single threaded,
then it is the blocking factor and scaling sucks. If it's multiple
threaded, then it still has data structures which must be protected
by mutex, and scaling sucks.
Unless I'm missing something micro-kernel or monolithic-kernel
doesn't matter in these situations. The only thing that matters
is the ability to make these decisions as efficiently as possible,
with the minimum amount of contention among CPUs (separate per-CPU
data structures is obviously the best as that minimizes the CPU's
fighting over the data). But per-CPU data structures are not
practical for all data (again, scheduling is, at some level,
"global" activity).
As the number of CPUs scale, these problems become worse, till
you get to the point at which the locking traffic on the CPUs
dominates any work getting done.
If you see systems with extreme numbers of CPUs (i.e. IMO, anything
over about 64), then they are almost certainly NUMA type systems,
or some other esoteric design (hyper-cube style), but in any case,
they are not "mainstream" and never will be as they are too expensive
and limited in usefulness.
Anyway, no one's gonna read this comment anyway :-/.
Pete
Cards full of CPUs
Indeed, "cards full of CPUs" are already out there. Cellular basestations have cards with ~50 DSPs on them, each running their own local RTOS instance, and all of them communicating to some master controller. They're very good at the job they do. Just because your devices are all on one card doesn't mean you're going to run only one instance of the OS.
--Joe