Re: [RFC] support for IEEE 1588

To: <netdev@...>
Date: Friday, July 4, 2008 - 9:37 am

Hello Tavi,

Interesting initiative. I'm employed by Intel and had the chance to do
some exploratory work on software PTP support for Intel's new 82576
Gigabit Ethernet Controller [1], which introduces hardware time stamping
for PTP packets. I modified the open source PTPd so that it uses the
more accurate hardware time stamps instead of time stamps generated by
the Linux IP stack. The advantage was 50x higher accuracy under load.
You can read more about that in a paper [2].

[1] http://download.intel.com/design/network/ProdBrf/320025.pdf
[2] http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf

In order to get these time stamps and read the clock inside the NIC
which generates these time stamps, we had to add ioctl() calls to the
igb driver - not nice and certainly not a suitable long-term solution.
If there is a consensus on a better user space API and the Linux IP
stack gets a general framework for PTP, then perhaps it could also be
used with Intel's new NICs. Note that I'm not speaking in any official
capacity for Intel here, just expressing my own opinion (and hope). I'm
not even on the networking team.

I cannot release the PTPd and igb patches right now because that would
require legal approval, but if there is interest I can get that process
started. There's no reason not to do that.

So, let's move on to Tavi's proposal:

On Fri, 2008-07-04 at 01:47 +0300, Octavian Purdila wrote:

I agree.

Currently there is something similar with SO_TIMESTAMP and
SCM_TIMESTAMP, but the problem with those is that only a timeval is
returned, i.e., accuracy is limited to microseconds. To make full use of
hardware time stamps we'll want a timespec with nanoseconds.
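
To make the limitation concrete, here is a minimal sketch of how a user
space program gets at the SCM_TIMESTAMP control message today. The
function names (enable_rx_timestamps, extract_rx_timestamp) are made up
for illustration. The widening to a timespec at the end only changes the
unit; the low three decimal digits are always zero because the kernel
handed us a timeval.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>

#ifndef SCM_TIMESTAMP
#define SCM_TIMESTAMP SO_TIMESTAMP  /* same value on Linux */
#endif

/* Ask the kernel to attach a software RX time stamp to each datagram. */
int enable_rx_timestamps(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(on));
}

/*
 * Search the control data filled in by recvmsg() for SCM_TIMESTAMP and
 * widen the timeval to a timespec. Returns 0 on success and -1 if the
 * message carries no time stamp.
 */
int extract_rx_timestamp(struct msghdr *msg, struct timespec *ts)
{
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_TIMESTAMP) {
            struct timeval tv;

            memcpy(&tv, CMSG_DATA(cm), sizeof(tv));
            ts->tv_sec = tv.tv_sec;
            ts->tv_nsec = tv.tv_usec * 1000L;  /* resolution stays 1 us */
            return 0;
        }
    }
    return -1;
}
```

A caller would pass the msghdr it just handed to recvmsg(), with
msg_control pointing at a buffer of at least
CMSG_SPACE(sizeof(struct timeval)) bytes.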

We also need something more flexible than SO_TIMESTAMP. Depending on
what the user space program wants to measure, it would be useful to time
stamp
      * the various flavors of PTP packets (v1/v2/802.1as,
        SYNC/DELAY_REQUEST) selectively
      * all packets

The hardware might not be capable of supporting all modes, but at least
the API should support them and provide room for future extensions.

It would be possible to fall back to time stamping using system time if
the hardware is incapable of implementing the requested operation.
Depending on how that fallback is implemented, PTPd's accuracy might be
improved even without any hardware support.


Forgive my ignorance, but can you provide more details on how that would
work?

How about adding a new flag for send/sendto/sendmsg() instead of a new
control message?


Sounds a bit complicated to me. The trick currently used by PTPd might
be more elegant and/or require fewer changes: it enables looping of
outgoing packets with IP_MULTICAST_LOOP. The RX time stamp of the looped
packet is then used as an approximation of the TX time stamp of the
original outgoing packet. Clearly this is inaccurate, in particular
under load, but it is very easy to use.
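
For reference, the socket setup behind that trick is roughly the
following. This is a sketch from memory, not a copy of the PTPd source,
and the function name is invented. Both options are standard Linux
socket options.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Prepare a UDP multicast socket so that every outgoing packet is also
 * delivered back to the local host, carrying a software RX time stamp.
 * Returns 0 on success, -1 if either option could not be set.
 */
int setup_looped_tx_stamps(int fd)
{
    int on = 1;

    /* Attach an SCM_TIMESTAMP control message to received packets. */
    if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(on)) < 0)
        return -1;

    /* Loop our own multicast transmissions back to this host. */
    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &on, sizeof(on)) < 0)
        return -1;

    return 0;
}
```

The sender then reads its own packet back with recvmsg() and treats the
attached time stamp as the approximate send time.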

When a driver gets a skb with the request to generate a TX time stamp,
it could send the packet, upon completion obtain the time stamp from the
hardware and feed the packet and the time stamp back to the upper layers
as if it had just been received. Would that work?

The user space then obtains TX time stamps just like RX time stamps and
can use the payload to determine what kind of time stamp it got. That
also avoids the need for special cookies to detect packet loss or
reordering.
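
As an illustration of "use the payload", here is a sketch of how user
space could classify a looped-back packet and match it to the original
send. It assumes the PTPv2 common header layout from IEEE 1588-2008
(messageType in the low nibble of byte 0, versionPTP in the low nibble
of byte 1, sequenceId in bytes 30-31); PTPv1 and 802.1AS would need
their own handling. The names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define PTP_V2_HEADER_LEN 34

enum ptp_kind {
    PTP_UNKNOWN   = -1,
    PTP_SYNC      = 0x0,
    PTP_DELAY_REQ = 0x1
};

/*
 * Classify a PTPv2 payload. For Sync and Delay_Req the sequence id is
 * stored in *seq so the caller can pair the time stamp with the packet
 * it sent earlier. Anything else yields PTP_UNKNOWN and leaves *seq
 * untouched.
 */
int ptp_v2_classify(const uint8_t *buf, size_t len, uint16_t *seq)
{
    int kind;

    if (len < PTP_V2_HEADER_LEN)
        return PTP_UNKNOWN;
    if ((buf[1] & 0x0F) != 2)          /* versionPTP must be 2 */
        return PTP_UNKNOWN;

    switch (buf[0] & 0x0F) {           /* messageType */
    case 0x0:
        kind = PTP_SYNC;
        break;
    case 0x1:
        kind = PTP_DELAY_REQ;
        break;
    default:
        return PTP_UNKNOWN;
    }

    /* sequenceId is a big-endian 16 bit field at offset 30. */
    *seq = (uint16_t)((buf[30] << 8) | buf[31]);
    return kind;
}
```

Because the sequence id travels with the packet, a lost or reordered
looped copy shows up as a missing or out-of-order id.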

So far all that we get out of this is access to the raw time stamps.
There may be some use for that, as Tavi said, but it would be a lot more
interesting if the kernel transformed the raw time stamps into system
time stamps whenever the user space process asks for that. A modified
PTPd could then use them to synchronize the system time inside a cluster
a lot more accurately than is currently possible with NTP (think
sub-microsecond accuracy instead of milliseconds).

On Fri, 2008-07-04 at 03:42 +0300, Octavian Purdila wrote:

For the paper I tried out two different ways of synchronizing the system
time with the NIC time. The one called "Assisted System Time" could be
implemented relatively easily inside the IP stack: the driver only has
to provide access to the NIC's hardware clock. Then the layer above it
can sample the system time/NIC time offset at regular intervals; when
they drift apart, that drift rate can be tracked as part of the
measurements and be taken into account when transforming from one time
base into the other. The other method ("Two-Level PTP") is more
complicated and didn't bring much benefit.
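
To sketch the arithmetic behind the "Assisted System Time" idea: each
sample pairs a NIC clock reading with the system time taken at the same
moment. The drift rate is estimated from the two most recent samples and
used to extrapolate the offset for a given NIC time stamp. This is a
simplified illustration, not the code from the paper; the names are
invented, and a real implementation would probably filter over more
than two samples.

```c
#include <stdint.h>

struct clock_map {
    int64_t nic_ns;     /* NIC time of the latest sample */
    int64_t offset_ns;  /* system time minus NIC time at that sample */
    double  drift;      /* change in offset per ns of NIC time */
    int     samples;    /* number of samples seen so far */
};

/* Record one (NIC time, system time) pair taken at the same moment. */
void clock_map_update(struct clock_map *m, int64_t nic_ns, int64_t sys_ns)
{
    int64_t offset = sys_ns - nic_ns;

    /* From the second sample on, estimate how fast the clocks diverge. */
    if (m->samples > 0 && nic_ns != m->nic_ns)
        m->drift = (double)(offset - m->offset_ns) /
                   (double)(nic_ns - m->nic_ns);

    m->nic_ns = nic_ns;
    m->offset_ns = offset;
    m->samples++;
}

/* Transform a raw NIC time stamp into system time. */
int64_t clock_map_nic_to_sys(const struct clock_map *m, int64_t nic_ns)
{
    /* Offset change accumulated since the latest sample. */
    double corr = m->drift * (double)(nic_ns - m->nic_ns);

    /* Round to the nearest nanosecond. */
    return nic_ns + m->offset_ns +
           (int64_t)(corr >= 0.0 ? corr + 0.5 : corr - 0.5);
}
```

A zero-initialized struct clock_map is a valid starting state. With only
one sample the drift stays at zero and the transformation is a plain
offset.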

Bye, Patrick

