Re: Network multiqueue question

From: George B.
Date: Thursday, April 15, 2010 - 9:58 am

I am in need of a little education on multiqueue and was wondering if
someone here might be able to help me.

Given intel igb network driver, it appears I can do something like:

 tc qdisc add dev eth0 root handle 1: multiq

which works and reports 4 bands:  dev eth0 root refcnt 4 bands 4/4

But our network is a little more complicated.  Above the ethernet we
have the bonding driver which is using mode 2 bonding with two
ethernet slaves.  Then we have vlans on the bond interface.  Our
production traffic is on a vlan and resource contention is an issue as
these are busy machines.
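
For readers unfamiliar with mode 2 (balance-xor) bonding, the following is
a minimal userspace sketch in C of how a slave can be chosen per frame. It
assumes the layer2 transmit hash described in the kernel bonding
documentation (source MAC XOR destination MAC, modulo the slave count); the
thread does not say which hash policy is configured here. The function name
toy_xor_slave is hypothetical and only the last byte of each MAC is hashed,
which is a simplification.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the bonding driver's code: pick a slave index
 * from the last byte of the source and destination MAC addresses.
 */
static unsigned toy_xor_slave(const uint8_t src[6], const uint8_t dst[6],
                              unsigned active_slaves)
{
    assert(active_slaves > 0);
    return (unsigned)(src[5] ^ dst[5]) % active_slaves;
}
```

A given pair of MAC addresses always maps to the same slave, so both slaves
carry traffic as long as there are enough distinct peers. If one slave goes
away, the same hash taken modulo 1 sends everything to the remaining slave.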

It is my understanding that the vlan driver became multiqueue aware in
2.6.32 (we are currently using 2.6.31).

It would seem that the first thing the kernel would encounter with
traffic headed out would be the vlan interface, and then the bond
interface, and then the physical ethernet interface.  Is that correct?
 So with my kernel, I would seem to get no utility from multiq on the
ethernet interface if the vlan interface is going to be a
single-threaded bottleneck.  What about the bond driver?  Is it
currently multiqueue aware?

I am trying to get some sort of logical picture of how all these things
interact with each other to get things a little more efficient and
reduce resource contention in the application while still trying to be
efficient in use of network ports/interfaces.

If someone feels up to the task of sending a little education my way,
I would be most appreciative.  There doesn't seem to be a whole lot of
documentation floating around about multiqueue other than a blurb of
text in the kernel and David's presentation of last year.

Thanks!

George
--

From: Eric Dumazet
Date: Thursday, April 15, 2010 - 10:47 am

Hi George

Vlan is multiqueue aware, but bonding unfortunately is not at this
moment.

We could make it 'multiqueue' (a patch was submitted by Oleg A.
Arkhangelsky a while ago), but the bonding xmit routine needs to take a
central lock, shared by all queues, so it won't be very efficient...

Since this bothers me a bit, I will probably work on this in the near
future (adding real multiqueue capability and RCU to the bonding fast
paths).

Ref: http://permalink.gmane.org/gmane.linux.network/152987


--

From: Jay Vosburgh
Date: Thursday, April 15, 2010 - 11:09 am

The lock is a read lock, so theoretically it should be possible
to enter the bonding transmit function on multiple CPUs at the same
time.

	The question I have about it (and the above patch), is: what
does multi-queue "awareness" really mean for a bonding device?  How does
allocating a bunch of TX queues help, given that the determination of
the transmitting device hasn't necessarily been made?

	I haven't had the chance to acquire some multi-queue network
cards and check things out with bonding, so I'm not really sure how it
should work.  Should the bond look, from a multi-queue perspective, like
the largest slave, or should it look like the sum of the slaves?  Some
of this may be mode-specific, as well.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
--

From: Eric Dumazet
Date: Thursday, April 15, 2010 - 11:41 am

Yes, and with 10Gb cards, this is a limiting factor, if you want to send
14 million packets per second ;)

read_lock() is one atomic op, dirtying cacheline
read_unlock() is one atomic op, dirtying cache line again (if contended)
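
To make that cost concrete, here is a minimal userspace sketch in C11. It
is not kernel code: the toy_bond and toy_slave types and both function
names are hypothetical, and the shared atomic counter only models the
reader count of an rwlock. The lock-style path performs two atomic
read-modify-write operations on memory shared by every CPU, while the
RCU-style path performs a single load. Real RCU also needs a grace period
before old state is freed, which this sketch leaves out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's bonding structures. */
struct toy_slave {
    int ifindex;
};

struct toy_bond {
    atomic_int readers;               /* models the rwlock's shared reader count */
    struct toy_slave *_Atomic active; /* models the currently active slave */
};

/* Lock-style lookup: two atomic read-modify-write ops on shared memory. */
static int toy_xmit_read_lock(struct toy_bond *b)
{
    struct toy_slave *s;
    int ifindex;

    /* like read_lock(): every CPU writes the same cache line */
    atomic_fetch_add_explicit(&b->readers, 1, memory_order_acquire);
    s = atomic_load_explicit(&b->active, memory_order_relaxed);
    assert(s != NULL);
    ifindex = s->ifindex;
    /* like read_unlock(): the same cache line is written again */
    atomic_fetch_sub_explicit(&b->readers, 1, memory_order_release);
    return ifindex;
}

/* RCU-style lookup: one pointer load, no store to shared memory. */
static int toy_xmit_rcu_style(struct toy_bond *b)
{
    struct toy_slave *s = atomic_load_explicit(&b->active, memory_order_acquire);

    assert(s != NULL);
    return s->ifindex;
}
```

Both functions return the same answer; the difference is only in how much
shared state each transmitting CPU has to write on the way.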

in active-passive mode, RCU use should be really easy, given netdevices
are already RCU compatible. This way, each cpu only reads bonding state,

Well, it is a problem that was also taken into account with vlan, you
might take a look at this commit :

commit 669d3e0babb40018dd6e78f4093c13a2eac73866
Author: Vasu Dev <vasu.dev@intel.com>
Date:   Tue Mar 23 14:41:45 2010 +0000

    vlan: adds vlan_dev_select_queue
    
    This is required to correctly select vlan tx queue for a driver
    supporting multi tx queue with ndo_select_queue implemented since
    currently selected vlan tx queue is unaligned to selected queue by
    real net_devce ndo_select_queue.
    
    Unaligned vlan tx queue selection causes thrash with higher vlan
    tx lock contention for least fcoe traffic and wrong socket tx
    queue_mapping for ixgbe having ndo_select_queue implemented.
    
    -v2
    
    As per Eric Dumazet<eric.dumazet@gmail.com> comments, mirrored
    vlan net_device_ops to have them with and without
    vlan_dev_select_queue and then select according to real dev
    ndo_select_queue present or not for a vlan net_device. This is to
    completely skip vlan_dev_select_queue calling for real net_device
    not supporting ndo_select_queue.
    
    Signed-off-by: Vasu Dev <vasu.dev@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
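
The idea in that commit message can be sketched in userspace C as follows.
This is a toy model under stated assumptions, not the kernel
implementation: struct toy_dev, the function-pointer hook, the modulo
fallback and every toy_* name are hypothetical. It shows a stacked device
asking the real device which queue to use, and installing that hook only
when the real device has one of its own.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev;
typedef unsigned (*toy_select_fn)(const struct toy_dev *dev, unsigned flow_hash);

/* Hypothetical stand-in for a net_device. */
struct toy_dev {
    unsigned num_tx_queues;
    toy_select_fn select_queue;     /* NULL when the driver has no hook */
    const struct toy_dev *real_dev; /* lower device for a vlan, else NULL */
};

/* Fallback used when a device has no hook: spread flows by hash. */
static unsigned toy_pick_tx_queue(const struct toy_dev *dev, unsigned flow_hash)
{
    assert(dev->num_tx_queues > 0);
    if (dev->select_queue)
        return dev->select_queue(dev, flow_hash);
    return flow_hash % dev->num_tx_queues;
}

/* Example driver hook (assumption): pin all traffic to the last queue. */
static unsigned toy_pin_last_queue(const struct toy_dev *dev, unsigned flow_hash)
{
    (void)flow_hash;
    return dev->num_tx_queues - 1;
}

/* vlan hook: delegate, so the vlan queue matches the real device's choice. */
static unsigned toy_vlan_select_queue(const struct toy_dev *vlan, unsigned flow_hash)
{
    assert(vlan->real_dev != NULL);
    return toy_pick_tx_queue(vlan->real_dev, flow_hash);
}

/* Install the vlan hook only if the real device has its own hook. */
static void toy_vlan_setup(struct toy_dev *vlan, const struct toy_dev *real)
{
    vlan->real_dev = real;
    vlan->num_tx_queues = real->num_tx_queues;
    vlan->select_queue = real->select_queue ? toy_vlan_select_queue : NULL;
}
```

With the hook in place, a packet lands on the same queue index at the vlan
layer and at the real device, which is the "aligned" selection the commit
message asks for.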




--

From: George B.
Date: Thursday, April 15, 2010 - 8:54 pm

I would say that having the number of bands be either the number of
cores or 4, whichever is the smaller would be a good start.  That is
probably fine for GigE.  Of the network cards we have that support
multiqueue, they are either 4 or 8 bands.  In an optimal world, you
would have the number of bands that you have available at the physical
ethernet level but changing those on the fly in case of a change in
available interfaces might be more trouble than it is worth.

Four or eight would seem to be a good number to start with as I don't
think I have seen an ethernet card with less than 4.  If you have
fewer than 4 CPUs there probably isn't much utility in having more
bands than processors, or maybe that utility rapidly diminishes as the
number of bands increases beyond the number of CPUs.  At that point
you have probably just spent a lot of work building a bigger buffer.

I would be happy with 4 bands.  I guess it just depends on where you
want the bottleneck.  If you have 8 bands on the bond driver (another
reasonable alternative) and only 4 bands available for output, you
have just moved the contention down a layer to between the bond and
the ethernet driver.  But I am a fan of moving the point of contention
as far away from the application interface as possible.  If I have one
big lock around the bond driver and have 6 things waiting to talk to
the network, those are six things that can't be doing anything else.
I would rather have the application handle its network task and get
back to other things.  Now if you have 8 bands of bond and only 4
bands of ethernet, or even one band of ethernet, oh well.  Maybe have
1 to 8 bands configurable by an option to the driver that could be set
explicitly and defaults to, say, 4?
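
One way to read that proposal, as a small C sketch: with no explicit
setting the band count is the smaller of the CPU count and 4, and an
explicit setting is clamped to the range 1 to 8. The function name, the
constants, and the convention that 0 means "not set" are all assumptions
made for illustration; no such bonding option is described in this thread.

```c
#include <assert.h>

#define TOY_DEFAULT_BANDS 4u
#define TOY_MAX_BANDS     8u

/*
 * Hypothetical policy helper.
 * requested == 0 means the administrator did not set the option.
 */
static unsigned toy_bond_bands(unsigned requested, unsigned online_cpus)
{
    assert(online_cpus > 0);

    if (requested == 0) {
        /* default: number of cores or 4, whichever is smaller */
        return online_cpus < TOY_DEFAULT_BANDS ? online_cpus : TOY_DEFAULT_BANDS;
    }

    /* explicit setting: allow 1 to 8 bands */
    return requested > TOY_MAX_BANDS ? TOY_MAX_BANDS : requested;
}
```

For example, a 16-core machine with nothing configured would get 4 bands,
and a 2-core machine would get 2.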

Thanks for taking the time to answer.

George
--

From: George B.
Date: Thursday, April 15, 2010 - 9:00 pm

That would be great and you would have my sincere thanks.  And if
anyone is interested, what we do is take a pair of "top of rack"
switches and cluster them together so they appear as one switch.
Configure a LAG consisting of a port on each physical switch to a pair
of bonded interfaces on the server and use mode 2 bonding.  In normal
operation, both interfaces are active.  Should one switch experience a
power or interface failure, the server sees one of the interfaces fail
but just keeps working on the remaining interface.  There is no
"failover" event going on.

Thanks,

George
--

From: Eric Dumazet
Date: Thursday, April 15, 2010 - 9:53 pm

What kind of traffic do your machines manage exactly ?

On server, you use two ports of the same kind (same number of queues) ?


--

From: George B.
Date: Friday, April 16, 2010 - 12:28 am

Yes, same kind.  We try to make everything identical.  Fewer problems that way.

George
--
