Re: [PATCH] Revised timerfd() interface

Previous thread: PROBLEM: Caught SIGFPE exceptions aren't reset by Clark Cooper on Friday, August 24, 2007 - 9:40 pm. (2 messages)

Next thread: [git patches] net driver fixes by Jeff Garzik on Saturday, August 25, 2007 - 12:51 am. (1 message)
From: Michael Kerrisk
Date: Friday, August 24, 2007 - 11:41 pm

[resend, since LKML didn't like last send.]

Randy,

Thanks for the input.  I will try to send a revised patch after I get back
from holiday.  Some comments below.

Davide -- ping!  Can you please offer your comments about this change, and
also thoughts on Jon's and my comments about a more radical API change
later  in this thread.

Cheers,

Michael


Cheers,

Michael



Okay -- will do for a future revision of this patch.

In the meantime, the most relevant earlier mail thread is this:

http://article.gmane.org/gmane.linux.kernel/559193

The following are also of some relevance:

http://article.gmane.org/gmane.linux.kernel/556043



Yes, that was just a marker to myself  for comments that should eventually




-

From: Davide Libenzi
Date: Thursday, August 30, 2007 - 5:01 am

IMO the complexity of the resulting API (and resulting patch), and the ABI 
change, is not justified by the added value.



- Davide


-

From: Michael Kerrisk
Date: Tuesday, September 4, 2007 - 1:03 am

Neither of the proposed APIs (either my multiplexed version of timerfd()
or Jon's/my idea of using three system calls (like POSIX timers), or
the notion of timerfd() integrated with POSIX timers) is more
complicated than the existing POSIX timers API.

The ABI change doesn't really matter, since timerfd() was broken in 2.6.22
anyway.

Both previous APIs provided the features I have described provide:

* the ability to fetch the old timer value when applying
  a new setting

* the ability to non-destructively fetch the amount of time remaining
  on a timer.

This is clearly useful for timers -- but you have not explained why
you think this is not necessary for timerfd timers.

Please -- let's do timerfd() better.  Either three syscalls like:

timerfd_create()
timerfd_settime()
timer_gettime()

(the analogs of timer_create(), timer_settime(), timer_gettime()).

Or (if possible, and even better) timerfd() integrated with POSIX timers.

Then we have a good API for the coming decades.  I'm prepared to help
out with patches (for what my help is worth ;-)).

Cheers,

Michael

-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?  Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages/
read the HOWTOHELP file and grep the source files for 'FIXME'.
-

From: Andrew Morton
Date: Tuesday, September 4, 2007 - 1:18 am

<wakes up>

I'd have thought that the existing stuff would be near-useless without the
capabilities which you describe?


-

From: Michael Kerrisk
Date: Tuesday, September 4, 2007 - 1:24 am

Not useless, but certainly less functional than it can/should be
(and with not too much effort on the kernel implementation side).

Cheers,

Michael
-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages , 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.

-

From: Davide Libenzi
Date: Tuesday, September 4, 2007 - 8:25 am

Useless like it'd be a motorcycle w/out a cup-holder :)
Seriously, the ability to get the previous values from "something" could 
have a meaning if this something is a shared global resource (like signals 
for example). In the timerfd case this makes little sense, since you can 
create as many timerfd as you like and you do not need to share a single 
one by changing/restoring the original context.
On top of that, the cup-holder addition would cost in terms of API clarity 
(or in terms of two additional system calls in the other case), and in 
terms of kernel code footprint. Costs that IMO are not balanced by the 
added values.



- Davide


-

From: Michael Kerrisk
Date: Tuesday, September 4, 2007 - 8:39 am

However, one can have multipe POSIX timers, just as you can 
have multiple timerfd timers; nevertheless POSIX timers provide

I agree my proposed API is not as clean as it could be, that's why

Or better still, have timerfd() integrated with POSIX tiemrs (if
this is feasible).  This givesus a simple API, exactly one new
syscall, and all of the functionality of the existing POSIX

Not sure what your concern is here.  The total amount of 
new code for all of these options is pretty small.

Cheers,

Michael
-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages , 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.

-

From: Davide Libenzi
Date: Tuesday, September 4, 2007 - 3:41 pm

The fact that POSIX defined a certain API in a given way, does not 
automatically mean that every other API has to look exactly like that.

fs/compat.c              |   34 ++++++++--
fs/timerfd.c             |  147 +++++++++++++++++++++++++++++++++++++++--------
include/linux/compat.h   |    3 
include/linux/syscalls.h |    3 
4 files changed, 153 insertions(+), 34 deletions(-)

And the API definition becomes pretty messy. The other way is to add new 
system calls. 120+ lines of code more of new system calls wouldn't even be 
a problem in itself, if the added value was there.
IMO, as I already said, the added value does not justify them.



- Davide


-

From: Michael Kerrisk
Date: Tuesday, September 4, 2007 - 1:49 pm

Davide,

As I think about this more, I see more problems with
your argument.  timerfd needs the ability to get and 
get-while-setting just as much as the earlier APIs.
Consider a library that creates a timerfd file descriptor that
is handed off to an application: that library may want
to modify the timer settings without having to create a
new file descriptor (the app mey not be able to be told about
the new fd).  Your argument just doesn't hold, AFAICS.

Cheers,

Michael
-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages , 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.

-

From: Davide Libenzi
Date: Tuesday, September 4, 2007 - 3:44 pm

Such hypotethical library, in case it really wanted to offer such 
functionality, could simply return an handle instead of the raw fd, and 
take care of all that stuff in userspace.
Again, mimicking POSIX APIs doesn't always take you in the right place.



- Davide


-

From: Michael Kerrisk
Date: Tuesday, September 4, 2007 - 5:08 pm

Did I miss something?  Is it not the case that as soon as the
library returns a handle, rather than an fd, then the whole
advantage of timerfd() (being able to select/poll/epoll on 

POSIX may goof in places, but in general it is the result of
many smart people thinking about how to design/standardize APIs.
So the onus is on us to explain why they got this point wrong.
And it is not merely POSIX that did things things in the
way I've described: so did the earlier setitimer()/getitimer().

Cheers,

Michael
-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages , 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.

-

From: Andrew Morton
Date: Wednesday, September 5, 2007 - 5:02 am

<wakes up even more>

Seems that we need to pay some more attention to this, as time is pressing
and we should get this stuff finalised in 2.6.23.

Which means that most of us have some catching-up to do.

Michael, could you please refresh our memories with a brief, from-scratch
summary of what the current interface is, followed by a summary of what you
believe to be the shortcomings to be?  
-

From: Davide Libenzi
Date: Wednesday, September 5, 2007 - 9:14 am

Why? The handle would simply be a little struct where the timerfd fd is 
stored, and a XXX_getfd() would return it.
So my point is, I doubt such functionalities are really needed, and I also 
argue that the kernel is the best place for such wrapper code to go.



- Davide


-

From: Michael Kerrisk
Date: Wednesday, September 5, 2007 - 9:23 am

So what happens if one thread (via the library) wants modify
a timer's settings at the same timer as another thread is 
select()ing on it?  The first thread can't do this by creating
a new timerfd timer, since it wants to affect the select()
in the other thread?

Cheers,

Michael
-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages , 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.

-

From: Davide Libenzi
Date: Wednesday, September 5, 2007 - 12:57 pm

It can be done w/out any problems. The select thread will be notified 
whenever the new timer setting expires.


- Davide


-

From: Michael Kerrisk
Date: Wednesday, September 5, 2007 - 3:50 pm

We are going in circles here.  I think you are missing my point.
Consider the following

[[
Thread A: calls library function which creates a timerfd file
descriptor.

Thread B: calls select() on the timerfd file descriptor.

Thread A: calls library function which wants to:
   a) modify timer settings, and retrieve copy of current timer
      settings, and later
   b) restore old timer settings.
]]

This seems a quite reasonable use-case to me, and the existing
interface simply can't support it.

Cheers,

Michael
-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages , 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.

-

From: Davide Libenzi
Date: Wednesday, September 5, 2007 - 4:45 pm

"Quite reasonable"? :)
I honestly doubt it, but anyway. Modulo error checking:

struct tfd {
	int fd, clockid;
	struct itimerspec ts;
};

struct tfd *tfd_create(int clockid, int flags, const struct itimerspec *ts) {
	struct tfd *th;
	th = malloc(sizeof(*th));
	th->clockid = clockid;
	th->ts = *ts;
	th->fd = timerfd(-1, clockid, flags, ts);
	return th;
}

void tfd_close(struct tfd *th) {
	close(th->fd);
	free(th);
}

int tfd_getfd(const struct tfd *th) {
	return th->fd;
}

int tfd_gettime(const struct tfd *th, int *clockid, struct itimerspec *ts) {
	*clockid = th->clockid;
	*ts = th->ts;
	return 0;
}

int tfd_settime(struct tfd *th, int clockid, int flags,
		const struct itimerspec *ts) {
	th->fd = timerfd(th->fd, clockid, flags, ts);
	th->clockid = clockid;
	th->ts = *ts;
	return 0;
}

Wrap the get/set with a mutex in case you plan to shoot yourself in a foot 
by doing get/set from multiple threads ;)
So, once again:

- I sincerly doubt the above is common usage/design patters for timerfds

  * timerfds are not a common global resource, ala signals, that requires 
    get+set+restore pattern - you can have many of them set to different 
    times

- Those IMO *very* special use cases can be handled in userspace with few 
  lines of code, *if* really needed



- Davide


-

From: Michael Kerrisk
Date: Wednesday, September 5, 2007 - 11:58 pm

^
(By the way, I assume that back there, there was a missing "not",

You are asserting this in the face of two previous APIs designed 
by people who (at least in the case of POSIX timers) probably 

I'm not sure what problem this code is supposed to solve.  It

This function is *not at all* equivalent to the "get"
functionality of the previous APIs.  The "get" functionality
of POSIX timers (for example) returns a structure that contains
the timer interval and the *time until the next expiration of
the timer* (not the initial timer string, as your code above
does).

So come up with a reliable, race-free way of doing that in
userspace.  Then make it work for both CLOCK_MONOTONIC and
CLOCK_REALTIME timers.  (You'll certainly need to be making
some additional system calls, by the way: at least some

See my comments above in this message.  You may doubt it, but

No!  In the example I described the library is able to create and
control exactly *one* timerfd file descriptor.  It wants to hand
that fd back to the application and then perform arbitrary
manipulations of the timer's settings.  Meanwhile the application
needs to (repeatedly) monitor that one file descriptor in a
select()/poll()/epoll() (and so the library can't just arbitrarily

You have not demonstrated this ;-).  Your userspace code does
not solve the problem.

Best regards,

Michael
-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages , 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.

-

From: Davide Libenzi
Date: Thursday, September 6, 2007 - 4:37 pm

No, I don't think I will. Since 1) it's so trivial it's not even 
challenging 2) you know it can be done (I assume) 3) it solves a non-case 
made up to justify an API/kernel-code bloating.
But you don't need *my* signoff. Cook a working patch, make a case for it, 
and gather support and signoffs. I won't be acking it because, as I said 
many times, I think it's useless bloating.



- Davide


-

From: Michael Kerrisk
Date: Sunday, September 9, 2007 - 8:15 pm

Well, it seems to me that the proposed library wrapper code
could start to get rather verbose by the time one adds in

Sorry -- I was probably speaking too rhetorically.  In fact, it's
not completely clear to me that it can (always) be done.  For
example, the answer depends on knowing the clockid.  But
what if the file descriptor has been handed off to some code
that doesn't know the clock type (i.e., isn't accessing the fd
via your suggested library)?

Cheers,


-

From: Denys Vlasenko
Date: Wednesday, September 5, 2007 - 5:12 am

I think at least ability to read remaining time from a timerfd is needed.
--
vda
-

From: Michael Kerrisk
Subject: timerfd redux
Date: Wednesday, September 5, 2007 - 8:32 am

Andrew,

I'll break this up into parts:

1. the existing timerfd interface
2. timerfd limitations
3. possible solutions
     a) Add an argument
     b) Create an interface similar to POSIX timers
     c) Integrate timerfd with POSIX timers

Cheers,

Michael


1: the existing timerfd interface
=================================

In 2.6.22, Davide added timerfd() with the following interface:

returned_fd = timerfd(int fd, int clockid, int flags,
                      struct itimerspec *utimer);

If fd is -1, a new timer is created and started.  The syscall
returns a file descriptor for the timer. 'utimer' specifies
the initial expiration and interval of the timer.
'clockid' is CLOCK_REALTIME or CLOCK_REALTIME.  The 'utimer'
value is relative, unless TFD_TIMER_ABSTIME is specified in
'flags', in which case the initial expiration is specified
absolutely.

If 'fd' is not -1, then the call modifies the existing timer
referred to by the file descriptor 'fd'.  The 'clockid', 'flags',
and 'utimer' can all be modified.  The return value is 'fd'.

The key feature of timerfd() is that the caller can use
select/poll/epoll to wait on traditional file descriptors and
one or more timers.

read() from a timerfd file descriptor (should) return a 4-byte
integer that is the number of timer expirations since the last
read.  (If no expiration has so far occurred, read() will block.)

IMPORTANT POINT: as implemented in 2.6.22, timerfd was broken:
only a single byte of info was returned by read().  I regard
this as a virtue: it gives us something closer to a blank slate
for fixing the problems described below; furthermore,
arguably at this point we could buy ourselves time by
pulling timerfd() from 2.6.23, and taking more time to get
things right in 2.6.24.

(More details on timerfd() can be found here: 
http://lwn.net/Articles/245533/)

2. timerfd limitations
======================

Unix has two older timer interfaces:

* setitimer/getitimer and

* POSIX timers ...
From: Andrew Morton
Date: Wednesday, September 12, 2007 - 7:39 pm

Sure.  If you're implementing a timeout and you want to reset it, you might
indeed want to know how close the old one was to expiring.

Davide's proposal sounds like an awkward workaround for missing
functionality.



I don't think we'll have this settled and coded in time for 2.6.23.  So I
think the prudent thing to do is to push this back to 2.6.24 and not offer
sys_timerfd() in 2.6.23.

-

From: Michael Kerrisk
Date: Wednesday, September 12, 2007 - 11:14 pm

In the other thread I commented that the userspace solution
starts to look pretty complex, and I doubt that it can


I think this would be wise.  I'd like to talk with Davide some
more about the possibilities.

Cheers,

Michael
-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages , 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.

-

From: Michael Kerrisk
Date: Thursday, September 13, 2007 - 1:13 am

Did you want a patch to remove the syscall number for now,
or will you do that?

Cheers,

Michael
-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages , 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.

-

From: Andrew Morton
Date: Thursday, September 13, 2007 - 1:20 am

Please send one over sometime.
-

Previous thread: PROBLEM: Caught SIGFPE exceptions aren't reset by Clark Cooper on Friday, August 24, 2007 - 9:40 pm. (2 messages)

Next thread: [git patches] net driver fixes by Jeff Garzik on Saturday, August 25, 2007 - 12:51 am. (1 message)