I have a question about the semantics of wait()/waitpid().
My understanding is, as soon as wait() returns, the process is gone from the
process table, and therefore another fork() on the system could immediately
re-use the same PID. Is that correct?
Now let's suppose I have a program which forks children when it needs them.
It maintains a data structure which is a hash of { pid => info }.
Let's say there's a separate thread which blocks on a wait() call, and once
it has the pid, it updates this data structure to remove the entry for
<pid>.
Now, it seems to me there is a race condition here: between wait() returning
and the <pid> entry being removed from the data structure, the main program
may have forked off another child with the same <pid>.
Protecting the 'wait' and 'fork' threads with a mutex doesn't help. If I
lock the mutex before calling wait() then I prevent all forks for an
indefinite period of time; if I lock the mutex after calling wait() then the
race still exists, as the forking thread may already have the mutex and be
in the process of forking another child with the same pid.
So, what's the best way to handle this? Options I can think of are:
(1) Polling.
- lock mutex
- call waitpid(-1, 0, WNOHANG)
- update the data structure
- unlock mutex
- sleep 100ms
- go back to start
This seems rather icky.
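
For reference, the polling steps above could be sketched in C roughly as follows. This is only an illustration under stated assumptions: the fixed-size `children` array stands in for the { pid => info } hash, and `spawn_child`, `reap_once`, `reaper_thread` and `table_lock` are hypothetical names. It also drains every exited child on each pass rather than just one, which is a small variation on the steps listed.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64

/* Hypothetical stand-in for the { pid => info } hash; 0 marks a free slot. */
static pid_t children[MAX_CHILDREN];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fork while holding the lock, so the new pid is recorded before any
 * reaping pass can run. Returns the child's pid, or -1 on failure
 * (fork failed or the table is full). */
pid_t spawn_child(void)
{
    pid_t pid = -1;

    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < MAX_CHILDREN; i++) {
        if (children[i] != 0)
            continue;
        pid = fork();
        if (pid == 0)
            _exit(0);            /* child: real work or exec would go here */
        if (pid > 0)
            children[i] = pid;
        break;
    }
    pthread_mutex_unlock(&table_lock);
    return pid;
}

/* One polling pass: reap every child that has already exited and drop
 * its table entry, all under the lock. Returns the number reaped. */
int reap_once(void)
{
    int reaped = 0;

    pthread_mutex_lock(&table_lock);
    for (;;) {
        pid_t pid = waitpid(-1, NULL, WNOHANG);
        if (pid <= 0)
            break;               /* 0: nothing ready; -1: no children or error */
        for (int i = 0; i < MAX_CHILDREN; i++) {
            if (children[i] == pid) {
                children[i] = 0;
                break;
            }
        }
        reaped++;
    }
    pthread_mutex_unlock(&table_lock);
    return reaped;
}

/* Body of the reaper thread: poll, sleep 100ms, repeat. */
void *reaper_thread(void *arg)
{
    (void)arg;
    for (;;) {
        reap_once();
        usleep(100 * 1000);
    }
    return NULL;
}
```

The reaper would be started with pthread_create(&tid, NULL, reaper_thread, NULL). Because waitpid() never blocks here, the lock is only held briefly, at the cost of up to 100ms of latency before an exit is noticed.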
(2) Modify the data structure to allow for the unlikely, but possible,
situation of having two processes with the same PID: one which has just been
reaped, and one which has just been forked. The reap process then removes
the first entry for the PID returned from wait().
This gives a messy data structure just for handling this edge case.
(3) If there were an option to waitpid() which could tell you the pid of a
terminated process *without* reaping it, then it becomes easy:
- waitpid(-1, 0, WNOWAIT)
- update the data structure to remove the entry for this pid
- waitpid(pid, 0, 0) to remove it from the process table
It looks like Linux has a waitid() call with a WNOWAIT option, but I can't
see anything in the wait manpage for OpenBSD (4.0) which works this way.
Any other suggestions as to the best way to avoid this problem? I'm sure
this must be old ground :-)
Thanks,
Brian.
