Re: Scaling Max IP address limitation

From: David Jones
Date: Sunday, June 24, 2007 - 10:20 am

Hi,
I am trying to add multiple IP addresses (v6) to my FC7 box on eth0.
But I am hitting a max limit of 4000 IP addresses. It seems there is a
limiting variable in the Linux kernel (which one?) that prevents
adding more than 4096 IP addresses. What do I need to change in the Linux
kernel (and then recompile) to be able to add more than
4K addresses per system?
Thanks,
-d
-

From: Andrew Morton
Date: Sunday, June 24, 2007 - 11:02 am

(cc netdev)
-

From: Robert Iakobashvili
Date: Sunday, June 24, 2007 - 12:59 pm

We are adding tens of thousands of IPv4 addresses using the netlink interface
without any problems. The maximum we added was 60K secondary
IPv4 addresses. It consumes some memory, however.

We have also added thousands of IPv6 addresses. I will try to test whether
there is any limit on doing so.

-- 
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...........................................................
http://curl-loader.sourceforge.net
A web testing and traffic generation tool.
-

From: David Jones
Date: Sunday, June 24, 2007 - 5:03 pm

I am using the "ip add" command, looping sequentially until RTNETLINK
starts refusing to add more IP addresses. I am using a simple shell
script to do the trick. One quick fact: if I exhaust 4K addresses on
one port, then I cannot add more IPs (v4/v6 alike) on any port on
the system. So it seems like it's a system-wide limitation. I tried digging
through the kernel source code but no luck so far, so I definitely need
pointers in this regard.
How are you adding via the Netlink interface?
Thanks,
-d
-
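
A minimal sketch of the kind of shell loop described above, for anyone wanting to reproduce the behaviour. The helper names (gen_addrs, add_addrs), the IP_CMD override and the 2001:db8:fff5:1 documentation prefix are assumptions made for illustration, not the original script.

```shell
#!/bin/sh
# Hypothetical helper: print COUNT sequential IPv6 addresses under a /64.
# Only valid for COUNT <= 65535, since the counter fills the last hextet.
gen_addrs() {
    count=$1
    prefix=${2:-2001:db8:fff5:1}   # documentation prefix, an assumption
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s::%x/64\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# Add each address to a device, stop at the first refusal, and report
# how many were accepted. Set IP_CMD="echo ip" for a dry run.
add_addrs() {
    dev=$1
    count=$2
    added=0
    for addr in $(gen_addrs "$count"); do
        ${IP_CMD:-ip} -6 addr add "$addr" dev "$dev" || break
        added=$((added + 1))
    done
    echo "added $added of $count"
}

# Example (needs root): add_addrs eth0 5000
```

Running it with IP_CMD="echo ip" prints the commands instead of executing them, which is a cheap way to check the generated range before touching a real interface.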

From: Robert Iakobashvili
Date: Monday, June 25, 2007 - 1:47 am

David,


Yes.

OK. Now it looks like I am reproducing something.

Running curl-loader with 60K.conf (edit the name of interface) configuration:
#ulimit -n 80000
#curl-loader -f ./conf-examples/60K.conf -w
 it adds successfully 60 000 secondary IPv4 addresses as seen by
#ip addr | wc -l

When I tried adding IPv6 addresses, using ipv6.conf with the address
range edited:
IP_ADDR_MIN=  2001:db8:fff5:1::1
IP_ADDR_MAX= 2001:db8:fff5:ffff::1

After some initial successes I get errors:
"rtnl_talk(): RTNETLINK answers: Cannot allocate memory"
and
#ip addr | wc -l is 8194.

8K addresses added and no more? It might be a memory issue. You can dig
into the code and look into the allocation process and limits on the
kernel memory for IPv6.

The physical memory on my computer is 480 MB.
kernel is vanilla 2.6.20.7.

Try to see what happens when you increase the memory on your computer,
if that is an option.

-- 
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...........................................................
http://curl-loader.sourceforge.net
A web testing and traffic generation tool.
-
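
A caveat on the measurement: "ip addr | wc -l" counts output lines, not addresses. On iproute2 versions that print a second valid_lft line for each IPv6 address, 8194 lines would correspond to roughly 4096 addresses rather than 8K; whether that applies to the setup above is an assumption. A small sketch that counts address lines directly (the function names are hypothetical):

```shell
#!/bin/sh
# Count IPv6 addresses in `ip addr` output read from stdin by matching
# only the lines that start with "inet6".
count_inet6() {
    grep -c '^[[:space:]]*inet6 '
}

# Same for IPv4; the trailing space keeps "inet6" lines from matching.
count_inet4() {
    grep -c '^[[:space:]]*inet '
}

# Example: ip addr show dev eth0 | count_inet6
```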

From: Jan Engelhardt
Date: Monday, June 25, 2007 - 2:30 am

I'd be surprised if it was 4096 on x86 and 8192 on x86_64...


	Jan
-- 
-

From: Robert Iakobashvili
Date: Monday, June 25, 2007 - 2:41 am

Hi


I forgot to mention: the CPU is a Pentium-4.


-- 
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...........................................................
http://curl-loader.sourceforge.net
A web testing and traffic generation tool.
-

From: Jan Engelhardt
Date: Monday, June 25, 2007 - 5:38 am

That's like saying you've got a SPARC. Or a MIPS. Or a PPC.
(I can't infer from your answer whether that is running 32 or 64-bit
kernel, because there are P4s with and without 64-bit extensions.)


	Jan
-- 
-

From: Robert Iakobashvili
Date: Monday, June 25, 2007 - 5:44 am

-- 
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...........................................................
http://curl-loader.sourceforge.net
A web testing and traffic generation tool.
-

From: David Jones
Date: Monday, June 25, 2007 - 10:26 am

OK, I have tried it on a Pentium-M (32-bit) with 512 MB RAM and a Core
2 Duo with 1 GB RAM (running an SMP kernel, 2 CPUs) with the same results.
I can't go beyond ~4K addresses. I have tried with vanilla and
custom kernels, all 2.6.19+ versions. Results are the same on both systems,
so that's the reason I am thinking that there is some limit in the kernel
source tree which I can't seem to find. I would really appreciate your help in
this regard.
Thanks,
-d


-
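
Since the cap looks system-wide and independent of RAM, one place worth inspecting is the IPv6 routing sysctls under /proc/sys/net/ipv6/route, which include max_size and gc_thresh. Whether either of these is the limit being hit here is an assumption; nothing in the thread confirms it. A sketch that just prints the current values (show_limits is a hypothetical helper):

```shell
#!/bin/sh
# Print "name = value" for selected IPv6 route sysctl files.
# The directory argument exists so the function can be pointed elsewhere.
show_limits() {
    dir=${1:-/proc/sys/net/ipv6/route}
    for f in "$dir"/max_size "$dir"/gc_thresh; do
        if [ -r "$f" ]; then
            printf '%s = %s\n' "${f##*/}" "$(cat "$f")"
        else
            printf '%s = (not readable)\n' "${f##*/}"
        fi
    done
}

# Example: show_limits
```

If a value turns out to sit near the observed ceiling, raising it with sysctl -w and repeating the test would show quickly whether it matters, without a kernel rebuild.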

From: Robert Iakobashvili
Date: Sunday, June 24, 2007 - 1:19 pm

How are you doing this?

Could it be some IPv6 issue like scope?


-- 
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...........................................................
http://curl-loader.sourceforge.net
A web testing and traffic generation tool.
-

From: Kyle Moffett
Date: Sunday, June 24, 2007 - 12:08 pm

Do you really need that many IP addresses?  When somebody finally  
gets around to implementing REDIRECT support for ip6tables then you  
could just redirect them all to the same port on the local system.   
Then with a happy little getsockopt() you can find out the original  
IP address for use in whatever application you are running.  That's  
likely to be a thousand times more efficient than binary searching  
through 5000+ mostly-sequential IP addresses per received packet.
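
For IPv4, where iptables already has a REDIRECT target, the idea can be sketched roughly as follows; the application then recovers the original destination with getsockopt() and SO_ORIGINAL_DST. The helper name, the 192.0.2.0/24 documentation range and port 8080 are assumptions for illustration, and the function only prints the rule rather than installing it.

```shell
#!/bin/sh
# Print (do not run) an IPv4 NAT rule that redirects TCP traffic for a
# whole destination range to a single local port.
redirect_rule() {
    range=$1   # destination range, e.g. 192.0.2.0/24
    port=$2    # local port the application listens on
    echo "iptables -t nat -A PREROUTING -p tcp -d $range -j REDIRECT --to-ports $port"
}

# Example: redirect_rule 192.0.2.0/24 8080
# To apply it (needs root): eval "$(redirect_rule 192.0.2.0/24 8080)"
```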

<Unrelated wishful thinking>
I keep having hopeful dreams that one day netfilter will grow support  
for cross-protocol NAT (IE: NAT a TCPv4 connection over TCPv6 to the  
IPv6-only local web server, or vice versa).  It would seem that would  
require a merged "xtables" program.

Having routing table operations, IPsec transformations, etc just be  
another step in the firewall rules would also be useful.  It would be  
handy to be able to "-j ROUTE", then "-j IPSEC", then "-j ROUTE"  
again, to re-route the now-encapsulated IPsec packets to their proper  
destination.  That would also eliminate the sort-of-hacky problems  
with destination network interface in the bridging code: "-j BRIDGE"  
might be another step in the process, and conceivably you could have  
independent bridge MAC tables too.  You'd probably also want "-j  
BRIDGE_TEST" and "-j ROUTE_TEST" to compute the output network  
interface without actually modifying the addresses.

That would also appear to get rid of the need for all tables other  
than "filter" and all predefined chains other than "INPUT" and  
"OUTPUT".  Default rules would be these:
nettables -A INPUT -j CONNTRACK
nettables -A INPUT -j LOCALMATCH
nettables -A INPUT --for-this-host -j ACCEPT
nettables -A INPUT -j OUTPUT
nettables -A OUTPUT -j ROUTE
nettables -A OUTPUT -j TRANSMIT

Forwarded packets would be sent right into the OUTPUT chain from the  
INPUT chain by appropriate rules.  Instead of turning off  
ip_forwarding in /proc/sys, you could just change the ...
From: David Stevens
Date: Sunday, June 24, 2007 - 12:54 pm

You don't actually need it (at least for easy cases like that),
a v6 socket.
        Unless you're using v6only binding (a sysctl option), you can
connect to v6-only servers using a v4 network and a v4 address of the
server. The peer address on those connections will show up as a v4
mapped address, and all the traffic will be v4, but the socket layer
is all v6.

                                                                +-DLS


-
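
To make the v4-mapped form concrete: a v4 peer seen through a v6 socket appears as ::ffff: followed by the dotted-quad address, and the v6only default is exposed as /proc/sys/net/ipv6/bindv6only. A small sketch, with hypothetical helper names:

```shell
#!/bin/sh
# Print the v4-mapped IPv6 form of a dotted-quad IPv4 address.
v4_mapped() {
    printf '::ffff:%s\n' "$1"
}

# Print the bindv6only setting (0 means v6 sockets also accept v4),
# or "unknown" if the file cannot be read.
bindv6only() {
    cat "${1:-/proc/sys/net/ipv6/bindv6only}" 2>/dev/null || echo unknown
}

# Example: v4_mapped 192.0.2.10
```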

From: Jan Engelhardt
Date: Sunday, June 24, 2007 - 12:58 pm

The way I see it, it's: "if someone gets around to implement *IPv6 NAT*"


Where's the hack? iptables operates on what it sees, and it sees br0.

Whether a packet goes out a bridge (was that the intention of -j BRIDGE?)
is determined by the routing table, which, in most cases, is just a matter


pkttables it is!

But this idea may have its benefit: by not restricting rules to certain
positions as is currently done, better throughput could be achieved. "Evil packets",
for example, could be dropped really early. (Well, you could also drop them early



	Jan
-- 
-

From: david
Date: Sunday, June 24, 2007 - 1:44 pm

True, but back in the real world it's sometimes desirable to hide _which_
specific machine something comes from, so I expect that an implementation of
NAT is going to happen at some point before it's widely deployed.

David Lang
-

From: Jan Engelhardt
Date: Sunday, June 24, 2007 - 1:52 pm

Client-transparent SOCKS5 proxy. It already exists today! ;-)
(Not as performant as an in-kernel NAT, though.)


	Jan
-- 
-

From: Kyle Moffett
Date: Sunday, June 24, 2007 - 2:51 pm

I totally agree.  On the other hand, you need REDIRECT for things  
like transparent proxies which by definition aren't visible as  

The problem is this:
I want to be able to filter bridged network traffic *both* based on  
the IP layer *and* the physical device it's going to be routed out.   
Due to fundamental problems with a statically-ordered architecture,  
it's impossible to get both, see commit  
68df071a201f06b08cdc07111c6d4af918e64edd (found here: http:// 
lists.netfilter.org/pipermail/netfilter-devel/2006-December/ 
026388.html).  Basically if you want such cross-layer hooks, right  
now your *ONLY* choice is to use marks with 2 drawbacks:  (1)  There  
are a very small number of marks which must be carefully allocated by  
your firewall-setup script  (2)  Marks are inherently extremely  

No, the intent of "-j BRIDGE" would be _after_ "-j ROUTE" and some  
kind of "-j ARP", to actually compute which physical port a given  

Well the problem is this:  Do you want the packet accepted locally or  
forwarded.  If forwarded, how do you want it routed, and which  
physical port do you want it to go out?  Without a statically-coded  
ordering for all those things there is no way to say what the  
"default" is.  You would need some way to switch between iptables/ 
ip6tables (for compatibility) and pkttables/nettables (for advanced  

It does give you a million more ways to shoot yourself in the foot.   
Some things would have constraints like "output device must be  
set" (BRIDGE/ARP, for example).  If you accidentally stuck non- 
constrained things in the wrong order you could get totally-non-IP- 
compliant behavior.  On the other hand, it does give you many choices  
about IPsec before or after ROUTING (or after one routing step and  
before another), etc.

Cheers,
Kyle Moffett

-

From: Patrick McHardy
Date: Monday, June 25, 2007 - 2:36 am

You don't necessarily need NAT for REDIRECT, so we might actually have
an ip6tables REDIRECT some day. Check out the current TPROXY patches
for an example how to do it without NAT in case you're interested.

-
