[RFC PATCH 01/11] put_cmsg_compat + SO_TIMESTAMP[NS]: use same name for value as caller

Previous thread: ixp4xx_hss MAX_CHAN_DEVICES by Miguel Ángel Álvarez on Wednesday, November 19, 2008 - 4:17 am. (12 messages)

Next thread: [RFC PATCH 08/11] clocksource: allow usage independent of timekeeping.c by Patrick Ohly on Wednesday, November 19, 2008 - 5:08 am. (6 messages)
From: Patrick Ohly
Date: Wednesday, November 19, 2008 - 5:08 am

This patch series was discussed before on linux-netdev ("hardware
time stamping + igb example implementation"). Since then I have
rebased against net-next and addressed all comments sent so far,
except Octavian's suggestion to include more information in the
packet which is bounced back to the application.

As suggested by David, I'm now also including linux-kernel because:
 * patch 2 adds a new user space API (documentation and example
   program included, but no man page patch yet)
 * patch 8 extends the clocksource struct (mostly adds code, but
   also adds one branch to reading the clock, which may affect
   gettimeofday)
 * patch 10 adds generic code for time synchronization (not
   network specific, so people not subscribed to linux-netdev
   might have comments)

The open question on linux-netdev was whether struct sk_buff
should be extended to hold the additional hardware time
stamp. The previous patch avoided that at the cost of much more
complicated code and side effects on normal time stamping.

This patch now adds a 8 byte field unconditionally. The
implementation is a lot more straight-forward. The user space
API was already designed to cover this case, so it remained
unchanged.

There's one unsolved problem, though: time synchronization with
PTP (the use case I'm working on) requires a transformation of
raw hardware time stamps into system time. Currently this is done
at the socket level by finding the device which created the time
stamp and letting it do the transformation. This fails for
incoming packets because skb->rt points to the "lo" device.

Perhaps the interface number can be used to find the real
hardware device. Alternatively the conversion could be done when
generating the time stamp (might also be more accurate), but then
another 8 byte field is needed. Delta encoding won't help because
one cannot assume that hardware time stamps track system time
closely enough.

I'm posting the patch despite this problem so that the discussion
can move ...
From: Patrick Ohly
Date: Wednesday, November 19, 2008 - 5:08 am

In __sock_recv_timestamp() the additional SCM_TIMESTAMP[NS] is used. This
has the same value as SO_TIMESTAMP[NS], so this is a purely cosmetic change.
---
 net/compat.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/compat.c b/net/compat.c
index 67fb6a3..6ce1a1c 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -226,14 +226,14 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
 		return 0; /* XXX: return error? check spec. */
 	}
 
-	if (level == SOL_SOCKET && type == SO_TIMESTAMP) {
+	if (level == SOL_SOCKET && type == SCM_TIMESTAMP) {
 		struct timeval *tv = (struct timeval *)data;
 		ctv.tv_sec = tv->tv_sec;
 		ctv.tv_usec = tv->tv_usec;
 		data = &ctv;
 		len = sizeof(ctv);
 	}
-	if (level == SOL_SOCKET && type == SO_TIMESTAMPNS) {
+	if (level == SOL_SOCKET && type == SCM_TIMESTAMPNS) {
 		struct timespec *ts = (struct timespec *)data;
 		cts.tv_sec = ts->tv_sec;
 		cts.tv_nsec = ts->tv_nsec;
-- 
1.6.0.4

--

From: Patrick Ohly
Date: Wednesday, November 19, 2008 - 5:08 am

User space can request hardware and/or software time stamping.
Reporting of the result(s) via a new control message is enabled
separately for each field in the message because some of the
fields may require additional computation and thus cause overhead.

When a TX timestamp operation is requested, the TX skb will be cloned
and the clone will be time stamped (in hardware or software) and added
to the socket error queue of the skb, if the skb has a socket
associated with it.

The actual TX timestamp will reach userspace as a RX timestamp on the
cloned packet. If timestamping is requested and no timestamping is
done in the device driver (potentially this may use hardware
timestamping), it will be done in software after the device's
start_hard_xmit routine.

TODO: add SO_TIMESTAMPING define also to other platforms
---
 Documentation/networking/timestamping.txt          |  178 ++++++++
 .../networking/timestamping/timestamping.c         |  469 ++++++++++++++++++++
 arch/x86/include/asm/socket.h                      |    3 +
 include/linux/errqueue.h                           |    1 +
 include/linux/sockios.h                            |    3 +
 include/net/timestamping.h                         |   95 ++++
 6 files changed, 749 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/networking/timestamping.txt
 create mode 100644 Documentation/networking/timestamping/timestamping.c
 create mode 100644 include/net/timestamping.h

diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
new file mode 100644
index 0000000..3ae60b5
--- /dev/null
+++ b/Documentation/networking/timestamping.txt
@@ -0,0 +1,178 @@
+The existing interfaces for getting network packages time stamped are:
+
+* SO_TIMESTAMP
+  Generate time stamp for each incoming packet using the (not necessarily
+  monotonous!) system time. Result is returned via recv_msg() in a
+  control message as timeval (usec resolution).
+
+* SO_TIMESTAMPNS
+  Same time ...
From: Patrick Ohly
Date: Wednesday, November 19, 2008 - 5:08 am

The new sk_buff->hwtstamp is used to transport time stamping
instructions to the device driver (outgoing packets) and to
return raw hardware time stamps back to user space (incoming
or sent packets).

Implements TX time stamping in software if the device driver
doesn't support hardware time stamping.

The new semantic for hardware/software time stamping around
net_device->hard_start_xmit() is based on two assumptions about
existing network device drivers which don't support hardware
time stamping and know nothing about it:
- they leave the skb->hwtstamp field unmodified
- the keep the connection to the originating socket in skb->sk
  alive, i.e., don't call skb_orphan()

Given that hwtstamp is a new field, the first assumption is safe.
The second is only true for some drivers. As a result, software
TX time stamping currently works with the bnx2 driver, but not
with the unmodified igb driver (the two drivers this patch series
was tested with).
---
 include/linux/netdevice.h |   11 ++++
 include/linux/skbuff.h    |  136 ++++++++++++++++++++++++++++++++++++++++++++-
 net/core/dev.c            |   23 +++++++-
 net/core/skbuff.c         |   72 ++++++++++++++++++++++++
 4 files changed, 239 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 12d7f44..24bea0c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -763,6 +763,17 @@ struct net_device
 	/* for setting kernel sock attribute on TCP connection setup */
 #define GSO_MAX_SIZE		65536
 	unsigned int		gso_max_size;
+
+#define HAVE_HW_TIME_STAMP
+	/* Transforms original raw hardware time stamp to
+	 * system time base. Always required when supporting
+	 * hardware time stamping.
+	 *
+	 * Returns empty stamp (= all zero) if conversion wasn't
+	 * possible.
+	 */
+	ktime_t             (*hwtstamp_raw2sys)(struct net_device *dev,
+						ktime_t hwstamp);
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git ...
From: Patrick Ohly
Date: Wednesday, November 19, 2008 - 5:08 am

The overlap with the old SO_TIMESTAMP[NS] options is handled so
that time stamping in software (net_enable_timestamp()) is
enabled when SO_TIMESTAMP[NS] and/or SO_TIMESTAMPING_RX_SOFTWARE
is set.  It's disabled if all of these are off.
---
 include/net/sock.h |   37 ++++++++++++++++++++++--
 net/compat.c       |   19 ++++++++----
 net/core/sock.c    |   79 ++++++++++++++++++++++++++++++++++++++++++++--------
 net/socket.c       |   75 ++++++++++++++++++++++++++++++++++++-------------
 4 files changed, 168 insertions(+), 42 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 0a63894..0f71951 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -158,7 +158,7 @@ struct sock_common {
   *	@sk_allocation: allocation mode
   *	@sk_sndbuf: size of send buffer in bytes
   *	@sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
-  *		   %SO_OOBINLINE settings
+  *		   %SO_OOBINLINE settings, %SO_TIMESTAMPING settings
   *	@sk_no_check: %SO_NO_CHECK setting, wether or not checkup packets
   *	@sk_route_caps: route capabilities (e.g. %NETIF_F_TSO)
   *	@sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
@@ -488,6 +488,13 @@ enum sock_flags {
 	SOCK_RCVTSTAMPNS, /* %SO_TIMESTAMPNS setting */
 	SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
 	SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
+	SOCK_TIMESTAMPING_TX_HARDWARE, /* %SO_TIMESTAMPING %SOF_TIMESTAMPING_TX_HARDWARE */
+	SOCK_TIMESTAMPING_TX_SOFTWARE, /* %SO_TIMESTAMPING %SOF_TIMESTAMPING_TX_SOFTWARE */
+	SOCK_TIMESTAMPING_RX_HARDWARE, /* %SO_TIMESTAMPING %SOF_TIMESTAMPING_RX_HARDWARE */
+	SOCK_TIMESTAMPING_RX_SOFTWARE, /* %SO_TIMESTAMPING %SOF_TIMESTAMPING_RX_SOFTWARE */
+	SOCK_TIMESTAMPING_SOFTWARE,    /* %SO_TIMESTAMPING %SOF_TIMESTAMPING_SOFTWARE */
+	SOCK_TIMESTAMPING_RAW_HARDWARE, /* %SO_TIMESTAMPING %SOF_TIMESTAMPING_RAW_HARDWARE */
+	SOCK_TIMESTAMPING_SYS_HARDWARE, /* %SO_TIMESTAMPING %SOF_TIMESTAMPING_SYS_HARDWARE */
 };
 
 static inline void ...
From: Patrick Ohly
Date: Wednesday, November 19, 2008 - 5:08 am

Instructions for time stamping outgoing packets are take from the
socket layer and later copied into the new skb.
---
 include/net/ip.h     |    1 +
 net/can/raw.c        |    6 ++++++
 net/ipv4/icmp.c      |    2 ++
 net/ipv4/ip_output.c |    2 ++
 net/ipv4/raw.c       |    1 +
 net/ipv4/udp.c       |    4 ++++
 6 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index bc026ec..9bc2b65 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -55,6 +55,7 @@ struct ipcm_cookie
 	__be32			addr;
 	int			oif;
 	struct ip_options	*opt;
+	union sk_buff_hwtstamp	tstamp_tx;
 };
 
 #define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
diff --git a/net/can/raw.c b/net/can/raw.c
index 6e0663f..d4a38e3 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -618,6 +618,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct socket *sock,
 	struct raw_sock *ro = raw_sk(sk);
 	struct sk_buff *skb;
 	struct net_device *dev;
+	union sk_buff_hwtstamp tstamp_tx;
 	int ifindex;
 	int err;
 
@@ -639,6 +640,10 @@ static int raw_sendmsg(struct kiocb *iocb, struct socket *sock,
 	if (!dev)
 		return -ENXIO;
 
+	err = sock_tx_timestamp(msg, sk, &tstamp_tx);
+	if (err < 0)
+		return err;
+
 	skb = sock_alloc_send_skb(sk, size, msg->msg_flags & MSG_DONTWAIT,
 				  &err);
 	if (!skb) {
@@ -654,6 +659,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct socket *sock,
 	}
 	skb->dev = dev;
 	skb->sk  = sk;
+	skb->hwtstamp = tstamp_tx;
 
 	err = can_send(skb, ro->loopback);
 
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 21e497e..ba739f4 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -375,6 +375,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	inet->tos = ip_hdr(skb)->tos;
 	daddr = ipc.addr = rt->rt_src;
 	ipc.opt = NULL;
+	ipc.tstamp_tx.hwtstamp.tv64 = 0;
 	if (icmp_param->replyopts.optlen) {
 		ipc.opt = &icmp_param->replyopts;
 		if (ipc.opt->srr)
@@ -532,6 +533,7 @@ void ...
From: Patrick Ohly
Date: Wednesday, November 19, 2008 - 5:08 am

---
 fs/compat_ioctl.c |    1 +
 net/core/dev.c    |    2 ++
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index 5235c67..a5001a6 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -2555,6 +2555,7 @@ HANDLE_IOCTL(SIOCSIFMAP, dev_ifsioc)
 HANDLE_IOCTL(SIOCGIFADDR, dev_ifsioc)
 HANDLE_IOCTL(SIOCSIFADDR, dev_ifsioc)
 HANDLE_IOCTL(SIOCSIFHWBROADCAST, dev_ifsioc)
+HANDLE_IOCTL(SIOCSHWTSTAMP, dev_ifsioc)
 
 /* ioctls used by appletalk ddp.c */
 HANDLE_IOCTL(SIOCATALKDIFADDR, dev_ifsioc)
diff --git a/net/core/dev.c b/net/core/dev.c
index b4b8eb8..4f61f5e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3650,6 +3650,7 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, unsigned int cmd)
 			    cmd == SIOCSMIIREG ||
 			    cmd == SIOCBRADDIF ||
 			    cmd == SIOCBRDELIF ||
+			    cmd == SIOCSHWTSTAMP ||
 			    cmd == SIOCWANDEV) {
 				err = -EOPNOTSUPP;
 				if (dev->do_ioctl) {
@@ -3805,6 +3806,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 		case SIOCBONDCHANGEACTIVE:
 		case SIOCBRADDIF:
 		case SIOCBRDELIF:
+		case SIOCSHWTSTAMP:
 			if (!capable(CAP_NET_ADMIN))
 				return -EPERM;
 			/* fall through */
-- 
1.6.0.4

--

From: Patrick Ohly
Date: Wednesday, November 19, 2008 - 5:08 am

---
 drivers/net/igb/igb_main.c |   30 ++++++++++++++++++++++++++++++
 1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 89ffc07..be8e2b8 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -34,6 +34,7 @@
 #include <linux/ipv6.h>
 #include <net/checksum.h>
 #include <net/ip6_checksum.h>
+#include <net/timestamping.h>
 #include <linux/mii.h>
 #include <linux/ethtool.h>
 #include <linux/if_vlan.h>
@@ -4115,6 +4116,33 @@ static int igb_mii_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
 }
 
 /**
+ * igb_hwtstamp_ioctl - control hardware time stamping
+ * @netdev:
+ * @ifreq:
+ * @cmd:
+ *
+ * Currently cannot enable any kind of hardware time stamping, but
+ * supports SIOCSHWTSTAMP in general.
+ **/
+static int igb_hwtstamp_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
+{
+	struct hwtstamp_config config;
+
+	if (copy_from_user(&config, ifr->ifr_data, sizeof(config)))
+		return -EFAULT;
+
+	/* reserved for future extensions */
+	if (config.flags)
+		return -EINVAL;
+
+	if (config.tx_type == HWTSTAMP_TX_OFF &&
+		config.rx_filter_type == HWTSTAMP_FILTER_NONE)
+		return 0;
+
+	return -ERANGE;
+}
+
+/**
  * igb_ioctl -
  * @netdev:
  * @ifreq:
@@ -4127,6 +4155,8 @@ static int igb_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
 	case SIOCGMIIREG:
 	case SIOCSMIIREG:
 		return igb_mii_ioctl(netdev, ifr, cmd);
+	case SIOCSHWTSTAMP:
+		return igb_hwtstamp_ioctl(netdev, ifr, cmd);
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
1.6.0.4

--

From: Patrick Ohly
Date: Wednesday, November 19, 2008 - 8:21 am

Before someone else mentions it: this was meant to be "sizeof(*serr)" of
course.

-- 
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.

--

From: Oliver Hartkopp
Date: Wednesday, November 26, 2008 - 11:14 pm

Hello Patrick,



I still have two questions of understanding regarding these (offset) 
transformations that make it still difficult:

1. As no one has an insight of how the specific hardware generates the 
hw time stamp anyway, why can't you put the already transformed values 
into the hw timestamp field? I won't stick on the fact that the hw time 
stamp is the value that has been read from specific controller registers 
that are very hardware depended.

2. From what i've read in the discussion, i understood that the hardware 
clock and the system clock skew. Assuming both to be strong 
monotonic(?), is there a linear skew observable from your testing 
experiences?

If so i would suggest to have some kind of sysfs entry like e.g.

/sys/class/net/eth0/hwstamp_skew_ns

that gives something like

-86234765@12277656012584334095

as output. In general:

<skew_in_ns>@<systemtime>

So if you would check this sysfs entry two or more times, you would get 
an impression about the hw time stamp behaviour of your hardware by 
simply calculating the linear (timedepended) offset based on several 
'skew-sample-points', right?

I hope these suggestions are not completely damaged ;-)

Best regards,
Oliver

--

From: Patrick Ohly
Date: Thursday, November 27, 2008 - 3:07 am

It certainly would be sufficient for PTP with the "assisted" PTP mode of
synchronizing system time. But my understanding is that Octavian and
possibly others want to have access to the raw, unmodified hardware time
stamps. Octavian, perhaps you can confirm/elaborate what your use case
is?

I should point out that currently the "raw" hardware time stamp is
already "cooked" a bit in the sense that the new field is spec'ed to
contain nanoseconds, not some opaque 64 bit blob. This is intentional
because even though the time when the clock started to run is unknown,
at least deltas can be calculated. The expectation is that this delta is
close to world and system time, with some inaccuracies caused by using
different crystals to drive the underlying hardware and thus varying
frequencies.

The "two-way" PTP mode could use these values to synchronize NIC clocks
(using a yet to be defined ioctl call which speeds up or slows down NIC
clocks, like adjtimex does for system time) and system time on top of
that NIC time. At the NIC level synchronization is going to be more
accurate than at the system level because it removes the NIC/system time
comparison from the equation. I believe this is similar to what Octavian

The clock drift changes over time, so on a big scale the skew is not
linear. For short periods (minute range) it is reasonable to assume that
the drift is constant, so a linear interpolation works reasonably well.
For your proposal that means that the user application needs to reread
the sysfs file at least once in a while, without really knowing how
often that should be.


This would work, but it exposes internal mechanisms (skew measurement)
to user space. If a driver developer has a better method that is more
suitable for his hardware (perhaps something non-linear or based on more
frequent skew measurements), then the application doesn't benefit from
it.

To summarize, I see the following options at this time:

1  only store transformed time stamp...
1a ... ...
From: Octavian Purdila
Date: Thursday, November 27, 2008 - 7:02 am

From: Patrick Ohly <patrick.ohly@intel.com>

We use the hardware timestamp to determine one-way delay:
- hardware synchronize the NIC clocks (20ns resolution)
- insert the hw timestamp in the packet as its going out the wire (this is 
done in hw)
- received packets will be tagged with a hardware timestamp

One way delay is computed by subtracting the TX hw timestamp from the RX hw 

I also vote for 3 (storing hw timestamps in the skb). 

Let me throw in another idea: when enabling hw timestamps could we allocate a 
bigger skb and store the hw timestamp somewhere in the skb data buffer? 

We can then modify sock_recv_timestamp to call a new netdev method which 
should return the hw timestamp. This should take care of RX hw timestamps.

For TX, we can we get a new skb with skb_copy_expand  to make room for the TX 
timestamp, and queue that in the error socket. 

There is still the problem of requesting TX timestamps per packets. At this 
point it seems that on the TX path the tstamp field is not used, so we could 
use that space. 

Or, maybe we can use the same dynamic approach: can we modify the 
hard_header_len after device registration (e.g. when TX timestamps are 
enabled)?

Thanks,
tavi





--

From: Patrick Ohly
Date: Thursday, November 27, 2008 - 8:31 am

How does the socket layer detect that the HW timestamp is available in

Finding the netdev is non-trivial, see David's comment about the current
hacky approach via the route. Besides, if the time stamp is in the skb

Agreed in general, but there was one corner case where the tstamp field

I don't know.

-- 
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.

--

From: Octavian Purdila
Date: Thursday, November 27, 2008 - 11:53 am

From: Patrick Ohly <patrick.ohly@intel.com>

It doesn't know, thats why we need help from the device (the new netdev 

OK I see now, it seems that we can't get to the device in a generic way. 

How about this twist: we add a new option at the socket level, to get the 
whole skb->head - skb->end data into a user buffer. Then, we call an device 
ioctl and pass this buffer. The device will extract the hw timestamp and give 
it to the user. 

use the convention that the device driver must put the timestamp below the mac 
header.

One potential problem I see with this approach is leaking sensitive 
information into userspace, which means we will have to restrict this to 
privileged processes only.

tavi

--

From: Oliver Hartkopp
Date: Thursday, November 27, 2008 - 3:13 pm

4 would be mine.

I assume, we would get kicked somewhere when we are trying to push 

Ugh.

Not every protocol that uses skbuffs, has a mac header (e.g. the CAN 
protocol doesn't have mac addresses). This twist does not look very 
maintainable to me ...

One additional question for Patrick:
As you wrote that your hw timestamp contained in the new skbuff-field is 
already cocked ... is there any identifier that tells the userspace 
application about the type of hw timestamp he gets (e.g. cocked, raw 
registers, offset to whatever, etc.) ?

Regards,
Oliver

--

From: Octavian Purdila
Date: Friday, November 28, 2008 - 5:55 am

From: Oliver Hartkopp <oliver@hartkopp.net>

OK, then what about this: we use a device ioctl to get the number of bytes to 
copy from skb->head. We then pass this to the socket level option.

This is more complicated than option 3 or 4, but it should address the 
concerns raised here - no performance impact when not using this feature. The 

Could you elaborate on the maintainability issues, they are not clear to me.


Thanks,
tavi


--

From: Oliver Hartkopp
Date: Friday, November 28, 2008 - 8:38 am

Do you know what all the people are doing with skbuffs in the kernel? 
I'm not aware of it.
So whenever we touch a vital data structure like the skbuff we must be 
sure that our handling is a wise approach for all.

Btw. your answer brought me to a completely different approach:

What about just creating a new pointer in the struct skbuff that points 
to a struct hwstamp when it is available OR the pointer is NULL when no 
hwstamps are available.

This struct hwstamp may consist of 'everthing we really need for 
timestamping' and is added somewhere at the head or the tail of the 
skbuff data section by the device driver.
And if the socket sees this pointer to be not NULL it knows where to 
look for our fancy struct hwstamp ...

I don't know, where this hwstamp data could be pasted into the data 
section in the best way - but surely others know. Especially this 
approach meets the requirement that the additional hwstamp data is only 
allocated (per interface!), when it's really needed - and not for 
everyone by default.

What do you thing about this idea?

Regards,
Oliver
--

From: Octavian Purdila
Date: Friday, November 28, 2008 - 9:00 am

From: Oliver Hartkopp <oliver@hartkopp.net>

I agree. But what in this approach do you think its not wise? :)

The skb is not touched at all, we just need to allocate a larger skb when 

That is still adding a member into the skb. For what I am proposing we don't 
have to touch the skb and it has all of your approach benefits. 

I would rather use this approach since it allows us to "extend" the skb in 
this manner int the future for other hardware features which requires passing 
per-skb information to userspace.

Thanks,
tavi




--

From: Patrick Ohly
Date: Monday, December 1, 2008 - 3:37 am

That just pushes the problem into the device driver: when it gets passed
a skb pointer, how can it tell reliably that the data buffer contains
the HW timestamping information?

I agree with Oliver, this approach doesn't look maintainable to me. I
don't know enough about the kernel to tell whether it works reliably at
the moment, much less whether it will work in the future.


In the proposed API the userspace application gets three time stamps:
software, "cooked" hardware time stamp (converted to nanoseconds by the
driver, but not tampered with in any other way), hardware time stamp
converted to system time. Each of these may be missing (not available,
couldn't be calculated). So yes, the userspace application knows what it
got and can pick the value that it needs.


I like this better than tampering with the data buffer pointers
implicitly because it enables usages of the additional information
inside the kernel itself. It's similar to skb_shared_info, except that
it is not allocated for all skbs.

The skb_shared_info is always at the end of the data buffer. Assume that
we have a new __netdev_alloc_hw_skb() which increases the allocated data
buffer to make room for the additional struct hwtstamp (either before
skb_shared_info or after). I cannot think of a way how the rest of the
kernel can tell that this additional data is available by just looking
at the existing head/data/end fields in a skb - if I missed something,
please let me know.

So it seems to me that we need the additional 32 bit offset (or pointer,
on 32 bit architectures) in skb which points towards the struct
hwtstamp. But that's actually less than the additional 64 bit which hold
the time stamp value, as in the current patch. I'll give it a few more
days for further debate, then try out this approach.

-- 
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the ...
From: Patrick Ohly
Date: Monday, December 1, 2008 - 9:31 am

It doesn't even need a 32 bit offset. If it is always at the same
location in the buffer (e.g., directly after skb_shared_info), then a
single bit is sufficient. The same mechanism could be used to also store
other optional structs/fields.

-- 
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.

--

From: Oliver Hartkopp
Date: Monday, December 1, 2008 - 9:45 am

That's really fine. Sorry that i did not go that deep into your code 

I'm not very familiar with skb fields but hiding this with a new 
__netdev_alloc_hw_skb() looks very good to me and creates a proper way 
to add hw-specific netdevice information in a generic way and - that's 

I'll be on a business trip until Thursday, so from my side you would get 
a 'go ahead' right now ;-)

Best regards,
Oliver

--

Previous thread: ixp4xx_hss MAX_CHAN_DEVICES by Miguel Ángel Álvarez on Wednesday, November 19, 2008 - 4:17 am. (12 messages)

Next thread: [RFC PATCH 08/11] clocksource: allow usage independent of timekeeping.c by Patrick Ohly on Wednesday, November 19, 2008 - 5:08 am. (6 messages)