Re: [PATCH 4/5] virtio_net: Add a MAC filter table

Previous thread: [PATCH] iproute2: 0 out memory for tc_skbedit struct prior to using it. by Jeff Kirsher on Monday, January 26, 2009 - 8:20 pm. (3 messages)

Next thread: Re: [PATCH 2/5] virtio_net: Add a virtqueue for outbound control commands by Alex Williamson on Monday, January 26, 2009 - 9:00 pm. (4 messages)
From: Alex Williamson
Date: Monday, January 26, 2009 - 8:38 pm

Hi Rusty,


I suppose it's just a matter of where do you want to add the smarts and
the tune-ability.  Since we can't actually have an infinite table and an
array implementation seems to make sense from an efficiency standpoint,
it needs to be defined by someone to be a fixed size before we start
using it.  I was hoping the guest driver might have a better idea how it
plans to use the filter table and that there'd be some benefit to having
that handshake happen between the driver and the backend.  The module
parameter fell out of this and seems rather convenient.

I could pursue this is you like, but I'm not sure of the benefit,
particularly if we want to give the user some control of the size of the
actual table.  Thoughts?  Thanks for the comments,

Alex 

-- 
Alex Williamson                             HP Open Source & Linux Org.

--

From: Rusty Russell
Date: Wednesday, January 28, 2009 - 3:45 am

I don't think the either-or case is real.  Say the user decides they want a table of 1000000 entries.  And the backend says "no way, I have a 16 array"?  Currently you get nothing.

I guess your qemu code does dynamic allocation.  But I'm sure you put a limit in there, right? :)

We don't want some complex negotiation, and I don't think the guest has any more clue than the host, nor can do much about it.

Cheers,
Rusty.
--

From: Alex Williamson
Date: Wednesday, January 28, 2009 - 10:48 am

Hi Rusty,


Ok, I think I see what you're getting at.  Linux can handle that, but

Right, it allocates the table only when the alloc call is made.  The
limit is currently what will fit in a host page size, which is perhaps a
little arbitrary, but I had some FUD about multi-page sg lists.  The

For a typical Linux guest, I agree, and I would guess that most users
won't know or care about the module option to specify the size, nor
should qemu ever balk at allocating the default size Linux requests.
However, I also know of a more embedded, proprietary OS that lives in a
fairly rigid network environment and does have a specific number of
entries they're looking for.  Other OSes might have different numbers.

Here's what I believe to be the parameters around which I've designed
the current interface:

     A. Receiving unwanted packets is a waste of resources.
     B. Proprietary OSes may have dependencies on hardware filtering and
        may not be able to handle unwanted packets.
     C. Different guests may want different filter table sizes.
     D. The guest can make better decisions than the host about what
table, needs to be able to specify the size of the filter table, and is
in the best position to program the filter table.  A pseudo-infinite
table violates the condition of not sending the guest unwanted packets.
We could also go down the path of deciding what a "big enough" table is,
and making it part of the ABI to avoid the alloc, but then we may have
to revise the ABI if we underestimate (whereas the current limit is an
implementation detail).

Weaving in your comments from other parts of this series, would it help
at all to distinguish EINVAL from ENOMEM for the alloc call?  Maybe the
caller could make some kind of intelligent decision based on that (ex.
unimplemented feature vs try something smaller).  Thanks for the
comments,

Alex

-- 
Alex Williamson                             HP Open Source & Linux Org.

--

From: Rusty Russell
Date: Wednesday, January 28, 2009 - 4:55 pm

Hi Alex,


A general OS has to handle the "unwanted packets" case if the NIC's table isn't big enough.  A guest may want arbitrary filter table sizes, but you're not giving it to them anyway, so that argument is bogus too.

The host *has* to decide the max table size, the guest *doesn't*.  Your implementation is the worst of both worlds.  The host doesn't tell the guest what the limit is, but does fail it for exceeding it.  Your guest implementation limits itself arbitrarily, but you feel better because you've invoked yet-another party (the admin of the user box) to fix it!



OK, say the host bonds a NIC directly to the virtio nic for the guest.  That NIC uses a hash to filter MAC addresses.  In your scheme, you have numerous problems:
(1) we've defined filtering to be perfect, so we need to filter in the host,
(2) the guest will stop giving us filtering information after 16 entries, and

None of the currently defined calls should fail.  Feature bits should guarantee the existence of features, and failure should be greeted with horror by the guest, which should then try to limp along.

So, this conversation has convinced me that the host should accept arbitrary filtering entries, and the guest should accept that it is best effort.

Cheers,
Rusty.
--

From: Herbert Xu
Date: Wednesday, January 28, 2009 - 5:34 pm

I agree with this completely.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: David Stevens
Date: Wednesday, January 28, 2009 - 11:17 pm

I haven't been following this closely, so apologies if the point's been 
made, or
if you're talking about unicast addresses here too, but just to be clear:

For multicasting, false positives are ok, false negatives are not 
(non-functional),
and if the fixed-size address filter is exceeded, _multicast_promiscuous_
(but not all unicasts, so not promiscuous mode) is the "good" thing to do.
So "best effort" still shouldn't lead to an address you previously joined 
not
being passed because a new one is added.

Also, if you can't keep all the MAC multicast addresses (ie,
the limit is memory and not look-up speed), then getting
out of multicast-promiscuous mode correctly isn't easy
since you don't know what groups you "forgot". You could
rebuild from the protocol memberships, if you know when
you've left enough groups to fit, but otherwise the MAC
multicast addresses you didn't keep of course won't work if you
leave multicast-promiscuous mode and the filter doesn't
have them.

So, if you're talking about not being able to fit all
the address (vs. not wanting to search that many), then
I'd suggest either staying in MP mode until ifdown, or
making the join a hard failure at the limit in the first
place.

                                                        +-DLS

--

From: Rusty Russell
Date: Friday, January 30, 2009 - 12:03 am

The command is simply "set the filter", not "add" and "delete", so the host doesn't have this problem.

VLAN has more of an issue: that has add and del.  So the host can do a counting hash, or say remember 16 and then a count of overflows, but both will be suboptimal over time.

Cheers,
Rusty.
--

Previous thread: [PATCH] iproute2: 0 out memory for tc_skbedit struct prior to using it. by Jeff Kirsher on Monday, January 26, 2009 - 8:20 pm. (3 messages)

Next thread: Re: [PATCH 2/5] virtio_net: Add a virtqueue for outbound control commands by Alex Williamson on Monday, January 26, 2009 - 9:00 pm. (4 messages)