Andrew, The timerfd() syscall went into 2.6.22. While writing the man page for this syscall I've found some notable limitations of the interface, and I am wondering whether you and Linus would consider having this interface fixed for 2.6.23. On the one hand, these fixes would be an ABI change, which is of course bad. (However, as noted below, you have already accepted one of the ABI changes that I suggested into -mm, after Davide submitted a patch.) On the other hand, the interface has not yet made its way into a glibc release, and the change will not break applications. (The 2.6.22 version of the interface would just be "broken".) Details of my suggested changes are below. A complication in all of this is that on Friday, while I was part way though discussing this with Davide, he went on vacation for a month and is likely to have only limited email access during that time. (See my further thoughts about what to do while Davide is away at the end of this mail message.) Our last communication, after Davide had expressed reluctance about making some of the interface changes, was a more extensive note from me describing the problems of the interface. The problems of the 2.6.22 timerfd() interface are as follows: Problem 1 --------- The value returned by read(2)ing from a timerfd file descriptor is the number of timer overruns. In 2.6.22, this value is 4 bytes, limiting the overrun count to 2^32. Consider an application where the timer frequency was 100 kHz (feasible in the not-too-distant future, I would guess), then the overrun counter would cycle after ~40000 seconds (~11 hours). Furthermore returning 4 bytes from the read() is inconsistent with eventfd file descriptors, which return 8 byte integers from a read(). Davide has already submitted a patch to you to make read() from a timerfd file descriptor return an 8 byte integer, and I understand it to have been accepted into -mm. Problem 2 --------- Existing timer APIs (Unix interval timers -- ...
I think if the need is sufficient we can do this: fix it in 2.6.23 and in 2.6.22.x. That means that there will be a few broken-on-new-glibc kernels argh. Nobody told me it was an ABI change! We'll need to consider merging make-timerfd-return-a-u64-and-fix-the-__put_user.patch into 2.6.22.x as well. -
So I'm trying to write a halfway respectable description of that patch and I'm stuck when it comes to describing what will happen if someone tries to run a future timerfd-enabled glibc on 2.2.22 base. In what manner will it misbehave? What are the consequences of this decision? -
The difference would be that read() will only return 4 bytes, while the application will expect 8. If the application is checking the size of returned value, as it should, then it will be able to detect the problem (it could even be sophisticated enough to know that if this is a 4-byte return, then it is running on an old 2.6.22 kernel). If the application is not checking the return from read(), then its 8-byte buffer will not be filled -- the contents of the last 4 bytes will be undefined, so the u64 value as a whole will be junk. Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages/ read the HOWTOHELP file and grep the source files for 'FIXME'. -
Andrew, So I'm still not quite clear. Can I take it from your statement above that the proposed ABI changes would be admissible, as long as Davide is okay with them? Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages/ read the HOWTOHELP file and grep the source files for 'FIXME'. -
On Wed, 25 Jul 2007 20:18:51 +0200 yup, I'll send that diff into Linus and -stable and see what happens. -
Andrew, I'm working on the changes to timerfd(), but must admit I am struggling to understand some of the kernel code for working with userspace timers (e.g., in kernel/posix-timers.c). Can you suggest anyone who could provide assistance? Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages/ read the HOWTOHELP file and grep the source files for 'FIXME'. -
The code was originally contributed by George Anzinger. He has moved on and Thomas Gleixner now does most work in there. I'd expect that Thomas is your guy, but he is nearing the end of a two-week vacation and will presumably return to an akpm-sized inbox, so I'd give him a little time ;) -
Yes, I pinged George a week or so back, but got no response. Thanks -- Thomas CCed on this reply. Unfortunately I myself go on holiday on Saturday for two weeks. All of these holidays (I guess that Davide is only back in another 1.5 weeks or so) start to make the chances of getting change in during the .23-rc window look tight... Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages , read the HOWTOHELP file and grep the source files for 'FIXME'. -
Andrew,
Here's my first shot at changing the timerfd() interface; patch
is against 2.6.23-rc1-mm2.
I think I got to the bottom of understanding the timer code in
the end, but I may still have missed some things...
This is my first kernel patch of any substance. Perhaps Randy
or Christoph can give my toes a light toasting for the various
boobs I've likely made.
Thomas, would you please look over this patch to verify my
timer code?
Davide: would you please bless this interface change ;-).
It makes it possible to:
1) Fetch the previous timer settings (i.e., interval, and time
remaining until next expiration) while installing new settings
2) Retrieve the settings of an existing timer without changing
the timer.
Both of these features are supported by the older Unix timer APIs
(setitimer/getitimer and timer_create/timer_settime/timer_gettime).
The changes are as follows:
a) A further argument, outmr, can be used to retrieve the interval
and time remaining before the expiration of a previously created
timer. Thus the call signature is now:
timerfd(int utmr, int clockid, int flags,
struct itimerspec *utmr,
struct itimerspec *outmr)
The outmr argument:
1) must be NULL for a new timer (ufd == -1).
2) can be NULL if we are updating an existing
timer (i.e., utmr != NULL), but we are not
interested in the previous timer value.
3) can be used to retrieve the time until next timer
expiration, without changing the timer.
(In this case, utmr == NULL and ufd >= 0.)
b) Make the 'clockid' immutable: it can only be set
if 'ufd' is -1 -- that is, on the timerfd() call that
first creates a timer. (This is effectively
the limitation that is imposed by POSIX timers: the
clockid is specified when the timer is created with
timer_create(), and can't be changed.)
[In the 2.6.22 interface, the clockid of an existing
timer can be changed with a further call to timerfd()
...Hi, OK, I'll make a few comments. 1. Provide a full changelog, including reasons why the patch is needed. Don't assume that we have read the previous email threads. put_compat_itimerspec(old_utmr, &t)) IOW, I think that the parameters are reversed (at least this way --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** -
Sorry for the late commentary...as I looked this over, one thing popped timerfd() is looking increasingly like a multiplexor system call in disguise. There is no form of invocation which uses all of the arguments. Are we sure we wouldn't rather have something like: timerfd_open(clock); timerfd_adjust(fd, new_time, old_time); ? jon -
Hey there Michael, all, I'm feeling slow, and think I'm missing something. Why is this an issue? Wouldn't userspace just keep track of the last overrun count, and subtract the two to get the overruns-since-last-read? That makes it oblivious to rollovers, unless it can't manage to do a read once every 11 hours. (This is the same sort of thing we already have to deal with in certain situations, such as network stat counters on 32 bit platforms.) Ray -
The value returned by read() is the number of overruns since But userspace can't deal with the condition accurately, so why require userspace to worry about this when we could just use a 64-bit value instead? Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages , read the HOWTOHELP file and grep the source files for 'FIXME'. -
Once every 11 hours isn't a terrible price to pay. Once every 11 seconds would be another matter. It doesn't seem unreasonable to expect an application that is already actively dealing with timers to go and take a look at things every hour to see if there have been more Okay, perhaps this is where I'm missing something? If userspace wakes up once every hour, checks the overrun counter against the previous <shrug> I don't have strong feelings either way. It just seemed like something that could already be taken care of with today's interface. Given that the discussion was about an API change between 2.6.22 and 2.6.23, I was looking for options to avoid that. -
In fact I don't have strong feelings on this part of the interface design either. I pointed out the limitation to Davide, and pointed out that the related eventfd interface read()s an 8-byte integer, and Davide then just fired off a patch to Andrew which went into --mm. It's the other problems with the interface that bother me more (inability to retrieve previous setting when changing the timer; inability to retrieve time until next expiration without changing the current setting). Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages , read the HOWTOHELP file and grep the source files for 'FIXME'. -
