FreeBSD: Zero Copy Sockets

Submitted by Jeremy
on June 25, 2002 - 6:21pm

Kenneth Merry recently announced, "I'm planning on checking in the zero copy sockets code Tuesday evening, MDT." He has set up a web page with quite a bit of information for the curious, including a full changelog. The main zero copy patch was written by Drew Gallatin, who is mentioned in several of these documents.

The "Zero Copy" patch removes the copying of buffers from the user process into the kernel when sending packets, and the copying of buffers from the kernel into the user process when receiving packets, offering a performance gain. Details on how this is accomplished can be found in the FAQ on that page. The FAQ also explains that one copy still remains: "The DMA or copy from the kernel into the NIC, or from the NIC into the kernel is not the copy that is being eliminated. In fact you can't eliminate that copy without taking packet processing out of the kernel altogether. (i.e. the kernel has to see the packet headers in order to determine what to do with the payload)"
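
As a rough illustration of what this means for an application, the sketch below prepares a send buffer that starts on a page boundary and spans whole pages, then hands it to an ordinary socket. It assumes that page alignment and page-sized writes are what make a buffer eligible for the zero copy send path; that requirement is not stated in this article, so check zero_copy(9) before relying on it. The helper names page_aligned_buffer and send_whole_pages are hypothetical, and whether any copy is actually avoided depends on the kernel, not on this code.

```python
import mmap
import socket

PAGE = mmap.PAGESIZE  # page size of the running system


def page_aligned_buffer(payload):
    """Copy payload into anonymous memory that starts on a page boundary
    and whose length is rounded up to a whole number of pages."""
    pages = max(1, -(-len(payload) // PAGE))  # ceiling division
    buf = mmap.mmap(-1, pages * PAGE)         # anonymous mappings are page-aligned
    buf[:len(payload)] = payload
    return buf


def send_whole_pages(sock, buf):
    """Send the entire buffer and return the number of bytes handed over.

    Assumption: on a zero copy kernel the caller should leave buf untouched
    until the data has gone out, since writing to it earlier would force
    the kernel to fall back to copying the page.
    """
    view = memoryview(buf)
    try:
        sock.sendall(view)   # passes the mapping's address straight to send(2)
    finally:
        view.release()       # allow buf.close() later
    return len(buf)
```

On a kernel without zero copy support the same code still works; the data is simply copied into the kernel as usual.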

On the same web page you can also find a couple of benchmarks showing as much as "986Mbps throughput over gigabit ethernet with the patches". Unfortunately, there is no comparison benchmark without the patches applied. At the very bottom of the page is a discussion on improving the zero copy implementations.


From: Kenneth D. Merry
To: current AT FreeBSD.ORG
Subject: zero copy code checkin in 2 days, new snapshot
Date: Sun, 23 Jun 2002 23:36:28 -0600

I'm planning on checking in the zero copy sockets code Tuesday evening,
MDT. If there are any concerns, I'm more than willing to delay it.

I've also released a new snapshot, based on -current from June 23rd, 2002:

http://people.freebsd.org/~ken/zero_copy/

The following changes went into this snapshot:

- Added a zero_copy(9) man page that describes the general characteristics
of the zero copy send and receive code, and what an application author
should do to take advantage of the code.

- Update the ti(4) man page to include information on the ioctl interface
and the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options.

- Added information to NOTES about the ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
TI_JUMBO_HDRSPLIT, MSIZE and MCLSHIFT kernel config options.

- ti(4) driver cleanup: cleaned up some unused code, commented out some
stray diagnostic printfs, and added a problem describing the transmit
flow control problem for posterity.

- Added a new jumbo(9) man page that describes the jumbo allocator.

I haven't run this through the usual array of regression tests just yet,
but that will be done before checkin.

(I used the time to run through a buildworld and a LINT build instead.)

Feedback and comments are welcome.

Again, if there are any concerns, I'm more than willing to delay the
checkin.

Ken
--
Kenneth Merry

From: Mike Silbersack
Subject: Re: zero copy code checkin in 2 days, new snapshot
Date: Mon, 24 Jun 2002 01:17:03 -0500 (CDT)

On Sun, 23 Jun 2002, Kenneth D. Merry wrote:

> I'm planning on checking in the zero copy sockets code Tuesday evening,
> MDT. If there are any concerns, I'm more than willing to delay it.

Out of curiosity, what happens when the page being write()n is a mmap'd
page shared by multiple processes? Will the page be shared? That could
be a big reduction in mbuf cluster usage on some http/ftp systems, I'd
guess.

Mike "Silby" Silbersack

From: Kenneth D. Merry
Subject: Re: zero copy code checkin in 2 days, new snapshot
Date: Mon, 24 Jun 2002 00:21:06 -0600

On Mon, Jun 24, 2002 at 01:17:03 -0500, Mike Silbersack wrote:
> Out of curiosity, what happens when the page being write()n is a mmap'd
> page shared by multiple processes? Will the page be shared? That could
> be a big reduction in mbuf cluster usage on some http/ftp systems, I'd
> guess.

The page would be shared, until one of the processes decides to write to it
while it is still referenced in the kernel. If that happens, it'll get
copied.

Ken
--
Kenneth Merry
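
The behaviour Ken describes, a page that stays shared until someone writes to it and is copied only then, is copy-on-write. The sketch below is a user-space analogy only: it uses a private (copy-on-write) file mapping next to a read-only shared mapping of the same page, and does not exercise the zero copy socket code itself. The function name cow_demo is hypothetical.

```python
import mmap
import os
import tempfile


def cow_demo():
    """Return (private view, shared view, file contents) after writing
    through a copy-on-write mapping of a one-page file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * mmap.PAGESIZE)
        # Two mappings of the same page: one read-only and shared,
        # one private with copy-on-write semantics.
        shared = mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_READ)
        private = mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_COPY)
        # The write below lands in a private copy of the page;
        # the shared mapping and the file keep the original data.
        private[0:1] = b"B"
        with open(path, "rb") as f:
            on_disk = f.read(4)
        result = (private[:4], shared[:4], on_disk)
        private.close()
        shared.close()
        return result
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling cow_demo() returns (b"BAAA", b"AAAA", b"AAAA"): only the writer sees its change, while the other holder of the page still sees the original contents, much as the kernel's reference to an in-flight page would.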