Re: [RFC v2: Patch 1/3] net: hand off skb list to other cpu to submit to upper layer

From: David Miller
Date: Friday, March 13, 2009 - 2:02 pm

From: Tom Herbert <therbert@google.com>

If the hash is good it will distribute the load properly.

If the NIC is sophisticated enough (Sun's Neptune chipset is)
you can even group interrupt distribution by traffic type
and even bind specific ports to interrupt groups.
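
To make the two ideas above concrete, here is a minimal software sketch of
hash-based queue selection with per-port overrides.  It is only an
illustration: zlib.crc32 stands in for whatever hash the NIC computes, and
NUM_QUEUES, PORT_BINDINGS and the helper names are hypothetical, not
anything taken from Neptune or from the patches under discussion.

```python
import zlib
from collections import Counter

NUM_QUEUES = 4                    # assumed number of RX queues
PORT_BINDINGS = {80: 0, 443: 0}   # hypothetical: pin these dst ports to queue 0


def flow_key(src_ip, dst_ip, src_port, dst_port):
    """Serialize the 4-tuple so every packet of a flow hashes identically."""
    return f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()


def select_queue(src_ip, dst_ip, src_port, dst_port):
    """Explicit port bindings win; all other flows are spread by the hash."""
    if dst_port in PORT_BINDINGS:
        return PORT_BINDINGS[dst_port]
    key = flow_key(src_ip, dst_ip, src_port, dst_port)
    return zlib.crc32(key) % NUM_QUEUES   # crc32 is a stand-in hash


def queue_load(flows):
    """Count how many flows land on each queue."""
    return Counter(select_queue(*flow) for flow in flows)


if __name__ == "__main__":
    flows = [("10.0.0.1", "10.0.0.2", 1024 + i, 5001) for i in range(4000)]
    print(sorted(queue_load(flows).items()))
```

Because the queue is a pure function of the 4-tuple, all packets of one
flow stay on one queue, which avoids reordering within a flow.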

I really detest all of these software hacks that add overhead
to solve problems the hardware can solve for us.
--

From: Tom Herbert
Date: Friday, March 13, 2009 - 2:59 pm

I appreciate this philosophy, but unfortunately I don't have the
luxury of working with a NIC that solves these problems.  The reality
may be that we're trying to squeeze performance out of crappy hardware
to scale on multi-core.  Left alone we couldn't get the stack to
scale, but with these "detestable hacks" we've gotten 3X or so
improvement in packets per second across both our dumb 1G and 10G
NICs.  These gains have translated into tangible application
performance gains, so we'll probably continue to have interest in this
area of development at least for the foreseeable future.
--

From: David Miller
Date: Friday, March 13, 2009 - 3:19 pm

From: Tom Herbert <therbert@google.com>
                         ^^^^^^^^


Do these NICs at least support multiqueue?
--

From: Herbert Xu
Date: Friday, March 13, 2009 - 4:58 pm

I don't think they do.  See the last paragraph in Tom's first
email.

I think we all agree that hacks such as these are only useful
for NICs that either don't support mq or have too few rx
queues.

The question is how much do we love these NICs :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Tom Herbert
Date: Friday, March 13, 2009 - 5:24 pm

Yes, we are using a 10G NIC that supports multi-queue.  The number of
RX queues supported is half the number of cores on our platform, so
that is going to limit the parallelism.  With multi-queue turned on we
do see about 4X improvement in pps over just using a single queue;
this is about the same improvement we see using a single queue with
our software steering techniques (this particular device provides the
Toeplitz hash).  Enabling HW multi-queue has somewhat higher CPU
utilization, though; the extra device interrupt load does not come for
free.  We actually use the HW multi-queue in conjunction with our
software steering to get maximum pps (about 20% more).
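
For readers unfamiliar with it, the Toeplitz hash mentioned above can be
sketched as follows.  This is a minimal illustration, not code from the
patch series: the key bytes are just an example 40-byte key, and
flow_tuple(), steer_cpu() and the modulo mapping from hash to CPU are
assumptions made for the example.

```python
import socket
import struct

# Example 40-byte key (an assumption; a real device is programmed with its own).
KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcbae7b30b4"
    "77cb2da38030f20c6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """XOR together the 32-bit key window at bit i for every set input bit i."""
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least 4 bytes longer than the input")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):      # input bit i, MSB first
            shift = key_bits - 32 - i             # window starts at key bit i
            result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def flow_tuple(src_ip, dst_ip, src_port, dst_port) -> bytes:
    """Pack an IPv4 4-tuple in network byte order (hypothetical helper)."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))


def steer_cpu(hash_value: int, num_cpus: int) -> int:
    """Pick a target CPU from the hash; plain modulo is an assumption."""
    return hash_value % num_cpus


if __name__ == "__main__":
    h = toeplitz_hash(KEY, flow_tuple("10.0.0.1", "10.0.0.2", 40000, 5001))
    print(hex(h), "-> cpu", steer_cpu(h, 8))
```

When the device already supplies the hash in its RX descriptor, the
software side only needs the final step of mapping that value to a CPU.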
--

From: Andi Kleen
Date: Friday, March 13, 2009 - 6:53 pm

The standard wisdom is that you don't necessarily need to transmit
to each core, but rather to each shared mid- or last-level cache.
Once the data is cache hot (or cache near), distributing it further
in software is comparatively cheap.

So this means you don't necessarily need as many queues as cores,
but rather as many as there are big caches.
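
As a rough sketch of that sizing rule, assuming the CPU-to-cache mapping
is already known (on Linux it could be read from the shared_cpu_list
files under /sys/devices/system/cpu/cpuN/cache/), one queue per shared
cache might be planned like this.  The 8-core topology and the function
names are hypothetical.

```python
from collections import defaultdict


def cache_domains(cpu_to_llc):
    """Group CPUs by the id of the shared cache they sit under."""
    domains = defaultdict(list)
    for cpu, llc in sorted(cpu_to_llc.items()):
        domains[llc].append(cpu)
    return dict(domains)


def plan_queues(cpu_to_llc):
    """One RX queue per shared cache, serviced by its lowest-numbered CPU."""
    domains = cache_domains(cpu_to_llc)
    return {llc: cpus[0] for llc, cpus in sorted(domains.items())}


if __name__ == "__main__":
    # Hypothetical machine: 8 cores, each pair sharing one cache.
    topology = {cpu: cpu // 2 for cpu in range(8)}
    for llc, cpu in plan_queues(topology).items():
        print(f"queue for cache {llc}: interrupt on cpu {cpu}, "
              f"fan out in software to {cache_domains(topology)[llc]}")
```

With this layout the hardware needs only 4 queues for 8 cores, and any
further spreading happens between CPUs that share a cache.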

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
--

From: David Miller
Date: Friday, March 13, 2009 - 7:19 pm

From: Tom Herbert <therbert@google.com>

This is a non-intuitive observation.  Using HW multiqueue should be
cheaper than doing it in software, right?
--

From: Herbert Xu
Date: Saturday, March 14, 2009 - 6:19 am

Shared caches can play games with the numbers; we need to look
at this a bit more.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Tom Herbert
Date: Saturday, March 14, 2009 - 11:15 am

I suppose it may be counter-intuitive, but I am not making a general
claim.  I would only suggest that these software hacks could be a very
good approximation or substitute for hardware functionality.  This is
a generic way to get more performance out of deficient or lower end
NICs.
--

From: David Miller
Date: Saturday, March 14, 2009 - 11:45 am

From: Tom Herbert <therbert@google.com>

They certainly could.  Why don't you post the current version
of your patches so we have something concrete to discuss?
--

From: Tom Herbert
Date: Monday, March 16, 2009 - 9:53 am

I'll do that.
--
