Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0

From: Felix von Leitner
Date: Monday, March 16, 2009 - 4:48 pm

Here's an strace:

socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET6, sin6_port=htons(6969), inet_pton(AF_INET6, "::ffff:0.0.0.0", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)

This is supposed to work, and it works on other operating systems, even
on Mac OS X.

I think it used to work on Linux, too.

I'm using 2.6.29-rc7 right now, but others have reported this not
working on distro kernels, too.
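
A minimal sketch of a reproducer for the strace above, assuming a POSIX sockets environment. The helper names fill_mapped_any and try_bind_mapped_any are invented for illustration and are not taken from the original program; port 6969 is the one shown in the strace.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill *sa with ::ffff:0.0.0.0 and the given port (host byte order).
 * Returns 0 on success, -1 if inet_pton() rejects the literal. */
int fill_mapped_any(struct sockaddr_in6 *sa, unsigned short port)
{
    assert(sa != NULL);
    memset(sa, 0, sizeof(*sa));
    sa->sin6_family = AF_INET6;
    sa->sin6_port = htons(port);
    return inet_pton(AF_INET6, "::ffff:0.0.0.0", &sa->sin6_addr) == 1 ? 0 : -1;
}

/* Repeat the socket/setsockopt/bind sequence from the strace.
 * Returns 0 if bind() succeeds, otherwise the errno of the failing call
 * (the strace shows EADDRNOTAVAIL on the reporter's kernel). */
int try_bind_mapped_any(unsigned short port)
{
    struct sockaddr_in6 sa;
    int one = 1, err = 0;
    int fd;

    if (fill_mapped_any(&sa, port) != 0)
        return EINVAL;
    fd = socket(AF_INET6, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0)
        err = errno;          /* save before close() can overwrite it */
    close(fd);
    return err;
}
```

Calling try_bind_mapped_any(6969) and printing strerror() of a non-zero result would show whether a given system accepts the address.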

Felix
--

From: Stephen Hemminger
Date: Monday, March 16, 2009 - 5:00 pm

On Tue, 17 Mar 2009 00:48:10 +0100

Most likely you already have the same port open on IPv4, and unless
you set IPV6_V6ONLY on the socket, the bind will fail. The standard way
of doing servers is to bind only for IPv6 and handle IPv4
clients via IPv4-mapped addresses.
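
A minimal sketch of the single-socket server setup described here, assuming a dual-stack host. The function name open_dual_stack_listener and its backlog parameter are hypothetical. Clearing IPV6_V6ONLY explicitly is a defensive choice, since the default for that option can differ between systems.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listener on [::]:port that also accepts IPv4 clients,
 * which then show up as IPv4-mapped peers (::ffff:a.b.c.d).
 * Returns the socket on success, -1 on failure with errno set. */
int open_dual_stack_listener(unsigned short port, int backlog)
{
    struct sockaddr_in6 sa;
    int off = 0, on = 1, saved;
    int fd;

    assert(backlog > 0);
    fd = socket(AF_INET6, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    /* IPV6_V6ONLY = 0: accept IPv4 traffic on this IPv6 socket too. */
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off)) != 0 ||
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) != 0)
        goto fail;

    memset(&sa, 0, sizeof(sa));
    sa.sin6_family = AF_INET6;
    sa.sin6_port = htons(port);
    sa.sin6_addr = in6addr_any;           /* the "::" wildcard */

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0 ||
        listen(fd, backlog) != 0)
        goto fail;
    return fd;

fail:
    saved = errno;                        /* close() may change errno */
    close(fd);
    errno = saved;
    return -1;
}
```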
--

From: Felix von Leitner
Date: Monday, March 16, 2009 - 5:18 pm

No I don't have anything else on that port.

BTW, just for the record, binding to ::ffff:10.0.0.3 (my eth0 address at
the moment) still works, so the mechanism is not completely broken.

Felix
--

From: Brian Haley
Date: Monday, March 16, 2009 - 7:26 pm

I don't think this ever worked on Linux, from the very beginning of inet6_bind():

        /* Check if the address belongs to the host. */
        if (addr_type == IPV6_ADDR_MAPPED) {
                v4addr = addr->sin6_addr.s6_addr32[3];
                if (inet_addr_type(net, v4addr) != RTN_LOCAL) {
                        err = -EADDRNOTAVAIL;
                        goto out;
                }
        } else {

So if it's a mapped address, the lower 32 bits must contain a local address.
RFC 3493 doesn't specifically mention what to do with ::ffff:0.0.0.0, so this
looks like a gray area to me.
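
To make the cases concrete, here is a userspace sketch (not kernel code) that sorts a bind address into the categories this discussion distinguishes. The enum and the function classify_bind_addr are invented for illustration. Per the excerpt above, only the "mapped, specific" case can pass the local-address check; ::ffff:0.0.0.0 is the case that fails in the reported strace.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

enum bind_addr_kind {
    ADDR_V6_ANY,          /* ::               IPv6 wildcard            */
    ADDR_MAPPED_ANY,      /* ::ffff:0.0.0.0   the case in this thread  */
    ADDR_MAPPED_SPECIFIC, /* ::ffff:a.b.c.d   embedded IPv4 address    */
    ADDR_V6_OTHER         /* any other IPv6 address                    */
};

enum bind_addr_kind classify_bind_addr(const struct in6_addr *a)
{
    static const unsigned char zero_v4[4] = {0, 0, 0, 0};

    assert(a != NULL);
    if (IN6_IS_ADDR_UNSPECIFIED(a))
        return ADDR_V6_ANY;
    if (IN6_IS_ADDR_V4MAPPED(a)) {
        /* Bytes 12..15 hold the embedded IPv4 address. */
        if (memcmp(&a->s6_addr[12], zero_v4, sizeof(zero_v4)) == 0)
            return ADDR_MAPPED_ANY;
        return ADDR_MAPPED_SPECIFIC;
    }
    return ADDR_V6_OTHER;
}
```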

So are you trying to get IPv4-only behavior out of this socket?  Seems like the
wrong way to go about it.

-Brian
--

From: Eric Dumazet
Date: Monday, March 16, 2009 - 7:47 pm

To me, section 3.7 of RFC 3493 is not gray. It refers only to interoperating
with IPv4 applications,
i.e. *sending* UDP messages to IPv4 nodes, or *connecting* to TCP IPv4 nodes.

So "::ffff:0.0.0.0" has no meaning for contacting an IPv4 node, since 0.0.0.0 is not
a valid IPv4 address.

RFC 2373 is also clear.

Part of RFC 3493 :

   Applications may use AF_INET6 sockets to open TCP connections to IPv4
   nodes, or send UDP packets to IPv4 nodes, by simply encoding the
   destination's IPv4 address as an IPv4-mapped IPv6 address, and
   passing that address, within a sockaddr_in6 structure, in the
   connect() or sendto() call.  When applications use AF_INET6 sockets
   to accept TCP connections from IPv4 nodes, or receive UDP packets
   from IPv4 nodes, the system returns the peer's address to the
   application in the accept(), recvfrom(), or getpeername() call using
   a sockaddr_in6 structure encoded this way.
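
The encoding this passage describes can be sketched byte by byte: 80 zero bits, 16 one bits, then the 32-bit IPv4 address. The function names v4_to_mapped and mapped_destination are hypothetical; the result is the kind of sockaddr_in6 one would hand to connect() or sendto().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Encode an IPv4 address (network byte order, as stored in
 * struct in_addr) as an IPv4-mapped IPv6 address. */
struct in6_addr v4_to_mapped(struct in_addr v4)
{
    struct in6_addr out;

    assert(sizeof(v4.s_addr) == 4);          /* guards the memcpy below */
    memset(&out, 0, sizeof(out));            /* bytes 0..9:   zero      */
    out.s6_addr[10] = 0xff;                  /* bytes 10..11: 0xffff    */
    out.s6_addr[11] = 0xff;
    memcpy(&out.s6_addr[12], &v4.s_addr, 4); /* bytes 12..15: IPv4      */
    return out;
}

/* Build the destination address for an IPv4 node reached through an
 * AF_INET6 socket. The port is given in host byte order. */
struct sockaddr_in6 mapped_destination(struct in_addr v4, unsigned short port)
{
    struct sockaddr_in6 sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin6_family = AF_INET6;
    sa.sin6_port = htons(port);
    sa.sin6_addr = v4_to_mapped(v4);
    return sa;
}
```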



RFC 2373 states :

 The IPv6 transition mechanisms [TRAN] include a technique for hosts
 and routers to dynamically tunnel IPv6 packets over IPv4 routing
 infrastructure.  IPv6 nodes that utilize this technique are assigned
 special IPv6 unicast addresses that carry an IPv4 address in the low-
 order 32-bits.  This type of address is termed an "IPv4-compatible
 IPv6 address" and has the format:

   |                80 bits               | 16 |      32 bits        |
   +--------------------------------------+--------------------------+
   |0000..............................0000|0000|    IPv4 address     |
   +--------------------------------------+----+---------------------+

 A second type of IPv6 address which holds an embedded IPv4 address is
 also defined.  This address is used to represent the addresses of
 IPv4-only nodes (those that *do not* support IPv6) as IPv6 addresses.
 This type of address is termed an "IPv4-mapped IPv6 address" and has
 the format:

   |                80 bits               | 16 |      32 bits        |
   ...
--

From: Brian Haley
Date: Tuesday, March 17, 2009 - 9:00 am

I agree with you Eric :)  I was simply referring to the fact that RFC 3493
doesn't distinguish between valid and invalid use of mapped addresses:

  IPv4-mapped addresses are written as follows:

      ::FFFF:<IPv4-address>

<IPv4-address> could be interpreted as 0.0.0.0 if you take that little section
out of context.

-Brian
--

From: Felix von Leitner
Date: Tuesday, March 17, 2009 - 5:58 am

What is the harm in allowing this?  That way an application ported to
IPv6 can still bind IPv4-only.  Why would it be legal to bind to a
specific IPv4 address but not to all IPv4 addresses?

The specific case is a bittorrent tracker.  The code was ported to IPv6,
but since there is so much overhead in storing IPv6 addresses you are
supposed to run two processes, one on the IPv6 address and one on the
IPv4 address (the IPv4 one then does not have overhead).  The sane way
to do this is to bind the IPv6 socket to ::ffff:0.0.0.0 then.  Otherwise
you would need some kind of giant abstraction layer in the application.
And we specifically added the ipv4 mapped addresses so applications
would not need to have a giant abstraction layer.


Why would you say that?

Felix
--

From: Vlad Yasevich
Date: Tuesday, March 17, 2009 - 6:47 am

Sorry, I just don't buy this.  You imply that you don't want the overhead
of storing IPv6 addresses, but you still get this with ::ffff:0.0.0.0.
In fact, now your overhead is even worse since every IPv4 address will be
stored and interpreted as a 128-bit IPv6 address.

If you really care about overhead, run 2 services.  Your IPv6 service
will only track real IPv6 addresses and will reduce your total overhead.

If you don't care about overhead, just bind a single socket to :: and
you will get behavior identical to the ::ffff:0.0.0.0 case, but with
the added benefit of tracking real IPv6 addresses as well.

Having written support for ::ffff:0.0.0.0, I've always thought it was
a bastardized case that didn't provide any benefits.  It was like saying:
"I've got IPv6 on my system, but I don't really support it, even though

Because that case doesn't provide any benefits.  It only has the drawback that
you have to deal with IPv4-mapped IPv6 addresses, which is the overhead of
the whole thing.

If you are prepared to deal with it, you might as well deal with real ipv6 addresses
at the same time and mitigate your overhead somewhat.


--

From: Felix von Leitner
Date: Tuesday, March 17, 2009 - 7:14 am

I am worried about the overhead of storing the IPv6 addresses.
I am not storing them in the IPv4 case.

But the socket code has been rewritten to use IPv6 addresses only,

You probably mean well but please stick to the problem at hand and don't

The app has a command line option to specify which address to bind to.
The app understands IPv4 addresses and converts them to ipv4 mapped
addresses so it can only deal with sockaddr_in6 when talking to the
kernel and does not need to store info on what kind of socket family it
is dealing with.

If someone specifies 0.0.0.0, it does not work.  It's that easy.
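
A sketch of the kind of option parsing described here, assuming the usual inet_pton() behaviour. The function name parse_bind_address is invented and this is not the application's actual code. An IPv4 literal is validated first and then re-parsed with a "::ffff:" prefix, so the caller only ever sees an in6_addr.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>

/* Parse the text given to a "-i" style option into an IPv6 address.
 * IPv6 literals are used as-is; IPv4 literals become ::ffff:a.b.c.d.
 * Returns 0 on success, -1 if the text is neither kind of literal. */
int parse_bind_address(const char *text, struct in6_addr *out)
{
    struct in_addr v4;
    char mapped[sizeof("::ffff:") + INET_ADDRSTRLEN];

    assert(text != NULL && out != NULL);
    if (inet_pton(AF_INET6, text, out) == 1)
        return 0;                         /* already an IPv6 literal */
    if (inet_pton(AF_INET, text, &v4) != 1)
        return -1;                        /* not an address at all */
    /* text is now known to be a dotted quad, so it fits the buffer. */
    snprintf(mapped, sizeof(mapped), "::ffff:%s", text);
    return inet_pton(AF_INET6, mapped, out) == 1 ? 0 : -1;
}
```

With this, "-i 0.0.0.0" yields ::ffff:0.0.0.0, which is exactly the address the bind() in the strace rejects.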

Now it may be a fascinating side discussion on whether you think IPv4
mapped 0.0.0.0 is useful or not, but rest assured: it is useful to at


That is not a drawback.  On the contrary.  It greatly simplifies how the

You are currently proving all the snide remarks by the BSD people about
the Linux IP stack true, and the "professionalism" snide remarks of the
Solaris people.  Great work, man.

Felix
--

From: Vlad Yasevich
Date: Tuesday, March 17, 2009 - 7:57 am

So, what you want to do is provide IPv4 only service on a fully
configured dual-stacked machine by running an IPv6 enabled application?

Why do you not want to provide IPv6 side of the same service?

You mentioned overhead (and I am guessing that's the answer to the above question),
but is the number of IPv6 clients so high that your service would
not be able to handle it?

As I've already mentioned, your overhead of tracking IPv6 clients is actually
lower than tracking all the IPv4 clients using mapped addresses.

One way of preventing the tracking of IPv6 clients is by disallowing IPv6 traffic
or even not configuring any IPv6 addresses.  That could get what you want

In this case, you are making a trade-off of application complexity against
kernel complexity.  You are making your application much simpler, while demanding
more complexity from the kernel.

It is your right as an application developer, and it is our right as kernel developers

This is really a great way to convince someone to do the work... :/

-vlad

--

From: Felix von Leitner
Date: Tuesday, March 17, 2009 - 10:51 am

Yes.
Actually, I want to provide IPv6 and IPv4 service, but it turns out the

As I said, in this particular case, you run two processes.
One for IPv6 and one for IPv4.

The reason is that

  a) it's P2P, so you don't want to provide IPv6 addresses of peers to
  IPv4 users anyway, because if they supported IPv6, they'd be
  connecting via IPv6.

  b) IPv4 users outnumber IPv6 users by a wide margin.  For the IPv4
  case it does not make sense to waste 12 bytes per IP address to even

The overhead is the memory overhead needed to store the IP addresses of
the peers.  For some popular files we are talking about a five digit
number of peers, and we don't want to store the full IPv6 address for
those.  We do want to use IPv6 sockets so we don't have to add code to
differentiate and make it work, because the kernel already has that code
in the form of the ipv4-mapped address handling code.  And it works,
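
The memory argument can be illustrated with a sketch: the socket layer sees only sockaddr_in6, but an IPv4-only process can store each peer in a compact record by dropping the 12 constant prefix bytes of the mapped address. The struct peer4 and the function peer4_from_mapped are hypothetical names, not the tracker's real data structures.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Compact record for an IPv4 peer: 4 address bytes and 2 port bytes,
 * both kept in network byte order. */
struct peer4 {
    uint8_t addr[4];
    uint8_t port[2];
};

/* Store a peer reported by accept() or recvfrom() on an AF_INET6 socket.
 * Returns 0 if the peer was IPv4-mapped and was stored, -1 otherwise. */
int peer4_from_mapped(const struct sockaddr_in6 *sa, struct peer4 *out)
{
    assert(sa != NULL && out != NULL);
    if (sa->sin6_family != AF_INET6 ||
        !IN6_IS_ADDR_V4MAPPED(&sa->sin6_addr))
        return -1;
    /* Keep only bytes 12..15; the ::ffff: prefix carries no information. */
    memcpy(out->addr, &sa->sin6_addr.s6_addr[12], sizeof(out->addr));
    memcpy(out->port, &sa->sin6_port, sizeof(out->port));
    return 0;
}
```

The saving per address is sizeof(struct in6_addr) minus sizeof(struct in_addr), which is the 12 bytes mentioned above.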

As I said, this is not _me_ who wants to bind there.  It's the user who
uses "-i 0.0.0.0" to get a process that runs only in IPv4 mode.  It took
me a while to see the point in that, too.

But again, it's not my place to argue with the customers on how they
want to use the software.  It's my place to provide software that does

You did not understand the problem then.

We do have IPv6, and we have it enabled, and we run a copy of the
software on the IPv6 address, too.

Now we could bind to the specific address of the PC, but that happens to
interfere with the load balancing and failover installation we have.  In
the case of one failing node, we configure that IP address on one of the

In fact it's the other way around.

I waited for the kernel to support v4 mapped addresses.
Then I wrote the socket layer on top of it.

You already committed on providing the complexity.  Now I just want you

Hey, I'm just saying.  My middleware runs on Linux, BSD, OSX and
Solaris.  I'm just writing the middleware.  Previously, users of my
middleware switched from BSD to Linux because ...
--

From: Eric Dumazet
Date: Tuesday, March 17, 2009 - 8:21 am

Trying to understand why you seem furious, let's try to be pragmatic.

Most users of your great program won't have a fix for this until next year.

I am afraid you have no choice but to change your program, or lose users.

Still, I don't get your point. TCPv6 sockets are much more expensive
at kernel level (same for UDP), and bittorrent is known to stress the network a bit, so
having the application use an IPv4 socket where it can is a win: your
program gets more users, and computers spend less power.

grep TCP /proc/slabinfo

tw_sock_TCPv6          0      0    192   21    1 : tunables    0    0    0 : slabdata      0      0      0
TCPv6                140    140   1600   20    8 : tunables    0    0    0 : slabdata      7      7      0
tw_sock_TCP          256    256    128   32    1 : tunables    0    0    0 : slabdata      8      8      0
TCP                  197    198   1472   22    8 : tunables    0    0    0 : slabdata      9      9      0
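
A small sketch for reading these numbers, assuming the usual /proc/slabinfo column order (name, active_objs, num_objs, objsize, ...). The struct and function names are invented. With the lines above, the per-object size is 1600 bytes for TCPv6 against 1472 for TCP, a difference of 128 bytes, and 192 against 128 for the time-wait sockets.

```c
#include <assert.h>
#include <stdio.h>

struct slab_line {
    char name[32];
    unsigned long active_objs;
    unsigned long num_objs;
    unsigned long objsize;     /* bytes per object */
};

/* Parse the leading columns of one /proc/slabinfo line.
 * Returns 0 on success, -1 if the line does not match. */
int parse_slab_line(const char *line, struct slab_line *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%31s %lu %lu %lu", out->name,
                  &out->active_objs, &out->num_objs, &out->objsize) == 4
               ? 0 : -1;
}
```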


Gasp, OSX having this "::ffff:0.0.0.0" right is probably the reason why more computers
run OSX than Linux. Sometimes, don't implement RFCs too literally :)

--

From: Felix von Leitner
Date: Tuesday, March 17, 2009 - 11:01 am

> Trying to understand why you seem furious, let's try to be pragmatic.

I'm not furious.  I just get angry when people I submit a bug report to
tell me they don't want to fix the bug.

Some people think that if I submit a bug to them, they are doing me a
service if they fix the bug.  In fact it's the opposite of that.  If I
submit a bug, I am doing them a service, because I am telling them in

You underestimate my users.  The few ones that run into this kind of
problem are not above patching their kernels to make it work.

But I am not willing to provide a kernel patch and do the customer

No I will not.  My program works.  Just not on Linux.
If my users see that "the Linux people" don't consider running high
profile high throughput messaging systems important enough to remove one
if clause of dubious merit, then they go switch to Solaris or FreeBSD

There are three things to say to that:

  1. IPv6 is the future.  If I implement IPv4 code because the IPv6 code
  is slower, there will never be an incentive for the kernel people to
  tune the IPv6 code, and it will continue to suck.

  2. IPv4 users won't ever switch to IPv6 if they hear it's so slow that
  people like me had to provide a legacy code path for performance
  reasons.  That is exactly the wrong message to send.

  3. In my benchmarks the performance difference was negligible.  It was

Your target audience is not the RFCs, it's the people.
And the people just told you that you implemented this part of the code
wrong.

Please listen to your users and don't berate them.

Even if we assume that the RFCs can be read so that the current
implementation is technically not illegal, note that the other operating
systems interpreted it differently.  So you miss the main goal of the
RFCs, providing a fertile ground for interoperability.

Just forget all I said.  Just look at the facts.

The RFCs are unclear.
All the other major IPv6 stacks do it the other way.
Maybe they are right?

Felix
--

From: Brian Haley
Date: Tuesday, March 17, 2009 - 8:59 am

Please show me a porting guide that even mentions supporting IPv4-only mode

That was their decision, and it doesn't mean it's the right thing to do.  It
doesn't mean Linux shouldn't change either, but name-calling isn't going to get
you anywhere on this list.

Compare your bittorrent server to Apache, which is probably the most widely-used
server application in the world.  It doesn't do what you're trying to do.  See

Because if you want IPv4-only you open an AF_INET socket.  There is no
IPv4 equivalent of IPv6-only mode, which you get, for example, when you open
an AF_INET6 socket and set IPV6_V6ONLY on it.
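
One way an application could follow this advice while still keeping sockaddr_in6 internally is to pick the socket family at bind time. This is only a sketch under that assumption; the function name bind_family_for is hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* If *in6 holds an IPv4-mapped address (including ::ffff:0.0.0.0),
 * write the equivalent sockaddr_in to *out and return AF_INET.
 * Otherwise leave *out untouched and return AF_INET6. */
int bind_family_for(const struct sockaddr_in6 *in6, struct sockaddr_in *out)
{
    assert(in6 != NULL && out != NULL);
    if (!IN6_IS_ADDR_V4MAPPED(&in6->sin6_addr))
        return AF_INET6;
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = in6->sin6_port;       /* already network byte order */
    memcpy(&out->sin_addr, &in6->sin6_addr.s6_addr[12],
           sizeof(out->sin_addr));
    return AF_INET;
}
```

The caller would create the socket with the returned family and pass the matching structure to bind(); for the AF_INET6 case it can additionally set IPV6_V6ONLY if IPv6-only service is wanted.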

-Brian
--