Re: PATCH: Multicast: Filter multicast traffic per socket mc_list

From: Christoph Lameter
Date: Thursday, April 16, 2009 - 7:38 am

Do what David Stevens suggests: add a per-socket option.



Subject: Multicast: Filter Multicast traffic per socket mc_list

If two processes open the same port as a multicast socket and then
join two different multicast groups, then traffic for both multicast groups
is delivered to each process. This means that applications will receive
data that they did not ask for. Applications have to filter this traffic out in
order to work correctly if multiple apps run on the same system.

These are pretty strange semantics but they have been around since the
beginning of multicast support on Unix systems. Most of the other operating
systems supporting Multicast have since changed to only supplying multicast
traffic to a socket that was selected through multicast join operations.

This patch changes Linux to behave in the same way. But there may be
applications that rely on the old behavior. Therefore we provide a means
to switch back to the old behavior using a new multicast socket option:

	IP_MULTICAST_ALL

If set then all multicast traffic to the port is forwarded to the socket
(additional constraints are the SSM inclusion and exclusion lists!).
If not set (default) then only traffic for multicast groups that were
joined by the socket is received.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 include/linux/in.h      |    1 +
 include/net/inet_sock.h |    3 ++-
 net/ipv4/igmp.c         |    4 ++--
 net/ipv4/ip_sockglue.c  |   11 +++++++++++
 4 files changed, 16 insertions(+), 3 deletions(-)

Index: linux-2.6/include/net/inet_sock.h
===================================================================
--- linux-2.6.orig/include/net/inet_sock.h	2009-04-16 08:59:20.000000000 -0500
+++ linux-2.6/include/net/inet_sock.h	2009-04-16 09:04:47.000000000 -0500
@@ -130,7 +130,8 @@ struct inet_sock {
 				freebind:1,
 				hdrincl:1,
 				mc_loop:1,
-				transparent:1;
+				transparent:1,
+				mc_all:1;
 	int			mc_index;
 	__be32			mc_addr;
 	struct ...
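
For illustration, this is roughly how an application could toggle the per-socket option named in the patch description from user space. It is a minimal sketch, assuming a kernel and headers that provide IP_MULTICAST_ALL; the fallback value 49 and the helper names set_multicast_all() and get_multicast_all() are assumptions made for this example and are not part of the patch.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Assumption: older libc headers may lack the option; 49 is the value
 * used by Linux's include/linux/in.h. */
#ifndef IP_MULTICAST_ALL
#define IP_MULTICAST_ALL 49
#endif

/* Hypothetical helper: set the per-socket flag.
 * on = 1: deliver all multicast traffic arriving on the bound port.
 * on = 0: deliver only traffic for groups joined on this socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_multicast_all(int fd, int on)
{
    assert(fd >= 0);
    int val = on ? 1 : 0;
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_ALL, &val, sizeof(val));
}

/* Hypothetical helper: read the flag back.
 * Returns 0 or 1, or -1 on error. */
int get_multicast_all(int fd)
{
    assert(fd >= 0);
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(fd, IPPROTO_IP, IP_MULTICAST_ALL, &val, &len) < 0)
        return -1;
    return val ? 1 : 0;
}
```

An application that wants only its joined groups would call set_multicast_all(fd, 0) after creating the socket and before receiving.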
From: David Stevens
Date: Thursday, April 16, 2009 - 8:09 am

This isn't what I suggested-- you have the default backwards. It must
default to current behavior, or it's pointless.

The text you have with it is overstated, too. Of course applications using
your model can still receive unexpected data-- it does not reserve the
port or multicast address to just your sender or to multicast traffic
alone.

My suggestion is to do nothing. :-) But if that's too difficult, an
alternative would be a socket option that delivers traffic for joined
groups only and defaults off. In fact, it'd probably be most useful if it
also prevents unicast traffic for sockets using that port, too. None of
these things have the magic effect of preventing unwanted data delivery,
but it'd allow you to receive multiple, specific groups on a single socket
with just the joins to indicate which.

                                                +-DLS



--

From: Christoph Lameter
Date: Thursday, April 16, 2009 - 8:36 am

If it would default to the current behavior then it would be incompatible
with the behavior of other operating systems and the surprising behavior
of the Linux multicast stack would continue to exist. The unusual behavior

The application will no longer receive traffic from multicast groups that
it did not subscribe to. Yes unicast can still result in unexpected
traffic.
--

From: David Miller
Date: Thursday, April 16, 2009 - 3:15 pm

From: Christoph Lameter <cl@linux.com>

Umm, no.

We don't break existing applications "by default".

You're being entirely selfish here, you want your application to work
without having to specify the socket option to get the new behavior.

Well guess what?  Under Linux you will have to!
--

From: Neil Horman
Date: Thursday, April 16, 2009 - 8:15 am

I think your comment is reversed here, isn't it? The default you have below is
that mc_all is set, which defaults you to the existing behavior, rather than the
new behavior introduced by this patch.


Ack to the patch though
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Neil


--

From: Christoph Lameter
Date: Thursday, April 16, 2009 - 8:36 am

Thanks.

--

From: Neil Horman
Date: Thursday, April 16, 2009 - 10:44 am

I'm sorry, I misread it (confused the definition of a bitfield with its default
value). As Dave noted, the default needs to be the current behavior, not your
new behavior. Until that's changed, I rescind my Ack.
--

From: Christoph Lameter
Date: Thursday, April 16, 2009 - 12:12 pm

Well, I guess we then need the global proc setting after all. With the
current misbehavior as the default, applications need to be rebuilt, and
source code that now runs on multiple OSes would have to be customized
to special-case Linux.

So add a global proc setting to determine the initial setting of IP_MULTICAST_ALL?

--

From: David Stevens
Date: Thursday, April 16, 2009 - 1:56 pm

The current behavior, as both your and Vlad's RFC quotes pointed
out, and as the history that goes with it shows, has been exactly the
expected behavior for decades. I think it is not misbehavior so much as
your misconception,

        No, actually. If you write it for the current behavior, it'll work
fine on an OS like Solaris that has departed from the original socket
behavior. If you're sloppy and don't handle unexpected traffic, it'll be
wrong on both-- you just won't know it until someone runs something with
IP_MULTICAST_ALL?

        This breaks unknown existing applications that are correctly
written. I think it's clearly wrong to change the behavior of someone
else's socket to match your idea of how it should've been done 25 years
too late. An option that enables new behavior for your own socket, which
must be a new app, is fine. Adding a socket option as part of a port
is no great hurdle, and I'm guessing you aren't trying to run a Solaris
binary on Linux. So what's the problem?

                                                                +-DLS

--

From: Christoph Lameter
Date: Thursday, April 16, 2009 - 2:04 pm

Guess it's the obvious: Software should run on multiple OSes without
too much special casing. Linux is the only special case that I am aware of
that misbehaves.

Adding a socket option is no easy thing given the architecture of the software
(and of other software) that did not consider that Linux would faithfully
replicate bugs from 25 years ago that no longer exist in other OSes.

Cannot imagine there to be too much software out there that relies on this
strange behavior. Otherwise the software would not work on various other
platforms.

Can you give us a list of products that verifiably rely on the current
behavior?
--

From: David Stevens
Date: Thursday, April 16, 2009 - 2:54 pm

All flavors of UNIX did it this way originally. I never tried
it on Windows. I heard years ago when Solaris changed their behavior
and it's been reported in this thread that current BSD does, too.
But, again, this is not in the least misbehavior. It simply doesn't
follow your model of how you thought it behaved. Linux does exactly
what Steve Deering wanted multicasting to do when he wrote the RFC
for it. It adds an address on the interface, and the binding determines
whether it's delivered to a particular socket or not. That is the
"ANY" in INADDR_ANY, just like unicasting. If you want particular
addresses only, the bind system call does that already. It makes

        I don't have any say in what other OSes do, but I'd call it a bug

        I don't know the extent of your survey, but Linux legacy is the
problem with changing the default behavior for sockets other than your
app. You don't need any special code at all-- write them all to assume
they may receive packets not for them, because they are broken if they

        I don't do app surveys any more than you do OS surveys. But
I don't want to change the semantics of multicast sockets and you do.
Can you guarantee nothing will break from this change?

                                                                +-DLS

--

From: David Miller
Date: Thursday, April 16, 2009 - 3:19 pm

From: Christoph Lameter <cl@linux.com>

Christoph, just drop this, we're not creating a system-wide default
selection that backs away from 15+ years of precedent.

Maybe Solaris has so few users that it's OK for them to go down
that path, but for us it's unacceptable to do things like this.

Fix your application.  And as David noted, it will be not only
more robust, but also still work on those "other systems."

So even your "works on all systems" argument is groundless.  If
you make it work under Linux it will in fact work on all systems,
and be more robust in the case of other applications using the
same multicast address and port.
--

From: Vlad Yasevich
Date: Thursday, April 16, 2009 - 2:19 pm

What seems to be happening though, is that there is an expectation that
this behavior would change with the advent of IGMPv3, which adds the additional
filtering text.  Now, we could point out that there is no normative text
that requires this filtering on groups, only on sources, but the expectation

I'd have to reluctantly agree here.  Any application that expects original
multicast behavior will be broken by a system-wide change.  I think existing

I wonder how BSD and Solaris got away with it?  They both filter on multicast
groups and source addresses.  This is not meant as rhetorical or provocative,
just genuinely wondering.

-vlad
--

From: David Miller
Date: Thursday, April 16, 2009 - 3:20 pm

From: Vlad Yasevich <vladislav.yasevich@hp.com>

Smaller user base.
--

From: David Stevens
Date: Thursday, April 16, 2009 - 3:22 pm

I have no such expectation. :-) The additional filters are (already)
applied per-socket, but existing apps not using source filters behave as
they did before IGMPv3. That's what I'd expect.
        The RFC you quoted for SSM applies to only the SSM address space,
mentions this behavior explicitly as the norm for outside of that space,
and Linux doesn't support that RFC. If it did, it would include an

        I think in practice, it doesn't come up much. That's why people
seem so surprised to learn it works this way, and not the way they
thought it did after using it, sometimes for years. But the documentation
doesn't say a join limits what you receive on a socket, or that it
has to be the same socket you're doing I/O on; people simply assume it.

                                                                +-DLS

--

From: Stephen Hemminger
Date: Thursday, April 16, 2009 - 4:30 pm

On Thu, 16 Apr 2009 15:22:49 -0700

You could always use packet/socket filter to keep the packets from
coming out to user space.
--

From: Vlad Yasevich
Date: Thursday, April 16, 2009 - 5:01 pm

Yes, after reading more of SSM spec, it definitely only applies to SSM
addresses that we don't support yet.  Just to clear this one item up,
I think the expectation comes from the IGMPv3 spec:

     Filtering of packets based upon a socket's multicast reception
     state is a new feature of this service interface.  The previous
     service interface [RFC1112] described no filtering based upon
     multicast join state; rather, a join on a socket simply caused the
     host to join a group on the given interface, and packets destined
     for that group could be delivered to all sockets whether they had
     joined or not.

It could be inferred from this rather vague text that in addition to source
filtering, group filters should be done.  Thus the expectation that we've
been dealing with.

That's the last I'll mention this, since most salient points have been
agreed on.

Thanks

--
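
The per-socket source filtering discussed in this part of the thread is what the source-specific join calls expose to applications. As a rough sketch, assuming the glibc definitions of struct ip_mreq_source and IP_ADD_SOURCE_MEMBERSHIP, a join limited to one sender could be written as below. The helper names make_source_join() and join_group_from_source() are hypothetical, and INADDR_ANY is used to let the kernel pick the interface.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for struct ip_mreq_source with glibc */
#endif
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: build a (group, source) join request from
 * dotted-quad strings. Returns 0 on success, -1 on a parse error. */
int make_source_join(const char *group, const char *source,
                     struct ip_mreq_source *req)
{
    assert(group != NULL && source != NULL && req != NULL);
    memset(req, 0, sizeof(*req));
    req->imr_interface.s_addr = htonl(INADDR_ANY); /* kernel picks interface */
    if (inet_pton(AF_INET, group, &req->imr_multiaddr) != 1)
        return -1;
    if (inet_pton(AF_INET, source, &req->imr_sourceaddr) != 1)
        return -1;
    return 0;
}

/* Hypothetical helper: join the group on this socket, accepting traffic
 * from the given source only. Returns 0 on success, -1 on error. */
int join_group_from_source(int fd, const char *group, const char *source)
{
    struct ip_mreq_source req;
    if (make_source_join(group, source, &req) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_ADD_SOURCE_MEMBERSHIP,
                      &req, sizeof(req));
}
```

This restricts sources for a group on the socket; as the thread notes, it does not by itself restrict which groups are delivered to the socket.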

From: David Miller
Date: Thursday, April 16, 2009 - 3:16 pm

From: Christoph Lameter <cl@linux.com>

No Christoph, do this right.

Linux by default will behave the way it has for 15+ years.  And if an
application wants new behavior, you have to ask for it.

End of story.
--

From: Christoph Lameter
Date: Friday, April 17, 2009 - 6:56 am

This is not right. All other OSes filter multicast traffic according to
the multicast groups subscribed to (and that includes the evil one).
There is no requirement of asking for "new" behavior. Why should multicast
applications have to add special code to request something that comes by
default on other platforms?

The old behavior does not seem to be usable anyway, and it certainly
looks buggy if multicast packets are duplicated by the kernel and sent to
applications that never asked for them. An OS should do the sane thing
by default and not only if someone asks for it.


--

From: Nivedita Singhvi
Date: Friday, April 17, 2009 - 8:37 am

I need the current behaviour to not change, as it would
break some people I support.  DaveM is making the right
decision here, and I fully support this.

And I'm one of those people working on low latency and
hoping messaging clients get better in their multicast
usage; it's just that this is not one of those ways.

Ideally, you could tweak an OS environment configuration
setting, if you don't want a per-socket option. But it cannot
be the default.


thanks,
Nivedita

--

From: Christoph Lameter
Date: Friday, April 17, 2009 - 9:02 am

People or applications? Are there applications that only run on Linux and
fail on other OSes? How does this work? Special casing depending on the OS

Would you support an additional OS config variable that would set the
default for socket operations? Then we could have a per socket option that
would allow overriding the OS config variable?
--

From: Nivedita Singhvi
Date: Friday, April 17, 2009 - 9:28 am

That would be my choice personally, because it would be
easier than scripting some solution to modify potentially
hundreds of sockets on a system...

Does that sound acceptable?

thanks,
Nivedita
--

From: David Miller
Date: Friday, April 17, 2009 - 3:24 pm

From: Christoph Lameter <cl@linux.com>

Christoph, I just want to let you know that I'm totally ignoring
everything further you say on this issue, because you're way out of
line and totally ignoring the real issues here.

What's next?  Tomorrow, if you think Linux's open() system call
behavior doesn't suit your needs, I want you to send a sysctl patch to
Al Viro that changes the system wide behavior and we'll see how far
you get with that.

The fact is, you cannot just say "oops we didn't mean to do that" when
something has behaved a certain way, visible to users, for more than
15 years.

And the fact is, WE DID MEAN to do things this way.

As David Stevens explained, the original creator of multicasting, the
original BSD code, and the RFCs, INTENDED this behavior from the very
beginning.

You want to ignore all of this, as if none of it matters and that what
you want to achieve is so much more important.
--

From: Christoph Lameter
Date: Monday, April 20, 2009 - 11:10 am

I am not ignoring it. It just seems that other OSes have moved away from this
and we are one of the last holdouts. It's not only Solaris but also BSD and
Windoze. Best to have a solution that is consistent across multiple OSes.


--

From: David Stevens
Date: Friday, April 17, 2009 - 2:31 pm

Linux is not Solaris. I think Solaris is wrong to change the
behavior from the original BSD behavior, but it should be no surprise
that there are other differences in the APIs, too. It's not difficult
to write code that works as intended on both, and the case Solaris is
trying to avoid is not really avoided since you can still receive
unicast traffic, or totally unrelated multicast traffic on the shared
port and multicast address space. If the app doesn't use the port to
distinguish it, it simply should bind the multicast address it wants,
use PKTINFO, SO_BINDTODEVICE or the like as well. In your case, multiple
sockets or filtering based on the "to" address are possibilities that
work on Solaris too, and fix more unintended traffic problems than
just a different group.
        A per-socket option is a more trivial way to do this, but
turning it on for sockets that want the existing, intended and
long-standing behavior is obviously wrong.

                                                                +-DLS

--
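
Filtering on the "to" address, as mentioned above, can be sketched with IP_PKTINFO: once the option is enabled, recvmsg() returns the destination address of each datagram as ancillary data, and the application can discard packets for groups it does not want. The helper names enable_pktinfo() and pktinfo_dest() are hypothetical; the sketch assumes Linux with glibc, where struct in_pktinfo carries the header destination in ipi_addr.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for struct in_pktinfo with glibc */
#endif
#include <assert.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel to attach IP_PKTINFO ancillary data
 * to every datagram received on fd. Returns 0 on success, -1 on error. */
int enable_pktinfo(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/* Hypothetical helper: extract the destination ("to") address from a
 * message filled in by recvmsg(). Returns 0 and stores the address in
 * *dst, or -1 if no IP_PKTINFO control message is present. */
int pktinfo_dest(struct msghdr *msg, struct in_addr *dst)
{
    assert(msg != NULL && dst != NULL);
    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c != NULL;
         c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            struct in_pktinfo pi;
            /* Copy out instead of casting: CMSG_DATA may be unaligned. */
            memcpy(&pi, CMSG_DATA(c), sizeof(pi));
            *dst = pi.ipi_addr;
            return 0;
        }
    }
    return -1;
}
```

A receive loop would call recvmsg() with a control buffer of at least CMSG_SPACE(sizeof(struct in_pktinfo)) bytes, then compare the address from pktinfo_dest() against the groups it joined and drop anything else.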

From: Christoph Lameter
Date: Monday, April 20, 2009 - 9:43 am

By that you mean unrelated multicast traffic destined to the same
multicast address and port?

--

From: David Stevens
Date: Monday, April 20, 2009 - 11:46 am

Yes. If neither the port nor the multicast address are
registered then anyone on your network can use them for anything.
Even if they are registered, someone may still use it; sending
requires no special privilege, and neither does joining groups or
binding to ports above 1024. Anyone on your network, or within
your multicast routing domain, may reuse both (even if they
intend it for a different machine) and your app will receive
them.

        I think generally the best approach is to bind to the
particular multicast address and use SO_BINDTODEVICE if it
matters to the app. But the app still has to handle receiving
data from a different source or totally unrelated data;
it certainly can receive those, because anyone can send those.

        I can see the value of a per-socket, default-off option
in the case where you want multiple groups on a single socket,
and I encourage you to submit that as a patch. It reduces the
work the receiver has to do, but doesn't eliminate it. The
way I'd do that is to use multiple sockets, one bound to each
group, but ok. As long as it doesn't change the existing
behavior out from under existing, unknown apps.

                                                                +-DLS

--
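
The "one socket bound to each group" approach described above can be sketched as follows. Binding to the group address instead of INADDR_ANY means the socket only matches datagrams sent to that group and port; the join then tells the host to receive the group at all. The helper names make_group_addr() and open_group_socket() are hypothetical, and the sketch joins on the default interface (INADDR_ANY) without SO_BINDTODEVICE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for struct ip_mreq with glibc */
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: parse a dotted-quad group and a port into *sin.
 * Returns 0 on success, -1 if the string is not an IPv4 multicast address. */
int make_group_addr(const char *group, uint16_t port, struct sockaddr_in *sin)
{
    assert(group != NULL && sin != NULL);
    memset(sin, 0, sizeof(*sin));
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
    if (inet_pton(AF_INET, group, &sin->sin_addr) != 1)
        return -1;
    if (!IN_MULTICAST(ntohl(sin->sin_addr.s_addr)))
        return -1;
    return 0;
}

/* Hypothetical helper: open a UDP socket bound to group:port and join the
 * group. Returns the descriptor, or -1 on error. */
int open_group_socket(const char *group, uint16_t port)
{
    struct sockaddr_in sin;
    if (make_group_addr(group, port, &sin) < 0)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    struct ip_mreq mreq;
    memset(&mreq, 0, sizeof(mreq));
    mreq.imr_multiaddr = sin.sin_addr;
    mreq.imr_interface.s_addr = htonl(INADDR_ANY); /* default interface */

    /* SO_REUSEADDR lets several sockets share the port, one per group. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

As the message above points out, the application still has to cope with unrelated senders using the same group and port; binding only removes traffic addressed to other groups and to unicast addresses.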

From: Vlad Yasevich
Date: Thursday, April 16, 2009 - 8:24 am

I don't think this change is needed.  ipv4_is_lbcast() checks if the
address is 255.255.255.255.  That address is already !ipv4_is_multicast().


You might need to set inet->mc_all to 1 in inet_create() since I am not sure if
we want to change the default behavior.  The knowledge that some apps have
a very "unique" way of doing multicast makes me a little hesitant.

-vlad
--

From: Christoph Lameter
Date: Thursday, April 16, 2009 - 8:39 am

Those "unique" applications would only be able to run on Linux.
Applications are mostly written for multiple Unix variants. Since the
other Unix variants have changed their default behavior it is reasonable
to also change the default under Linux.

--
