Re: [PATCH] ipv4: remove ip_rt_secret timer (v2)

Previous thread: Re: [net-next-2.6 V5 PATCH 2/3] Add ndo_{set|get}_vf_port_profile op support for enic dynamic vnics by Scott Feldman on Thursday, May 6, 2010 - 9:25 am. (2 messages)

Next thread: [PATCH] net: deliver skbs on inactive slaves to exact matches by John Fastabend on Thursday, May 6, 2010 - 10:24 am. (4 messages)
From: Neil Horman
Date: Thursday, May 6, 2010 - 10:16 am

A while back there was a discussion regarding the rt_secret_interval timer.
Given that we've had the ability to do emergency route cache rebuilds for awhile
now, based on a statistical analysis of the various hash chain lengths in the
cache, the use of the flush timer is somewhat redundant.  This patch removes the
rt_secret_interval sysctl, allowing us to rely solely on the statistical
analysis mechanism to determine the need for route cache flushes.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 include/net/netns/ipv4.h |    1 
 net/ipv4/route.c         |  108 -----------------------------------------------
 2 files changed, 2 insertions(+), 107 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index ae07fee..d68c3f1 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -55,7 +55,6 @@ struct netns_ipv4 {
 	int sysctl_rt_cache_rebuild_count;
 	int current_rt_cache_rebuild_count;
 
-	struct timer_list rt_secret_timer;
 	atomic_t rt_genid;
 
 #ifdef CONFIG_IP_MROUTE
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a947428..ffd3da1 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -129,7 +129,6 @@ static int ip_rt_gc_elasticity __read_mostly	= 8;
 static int ip_rt_mtu_expires __read_mostly	= 10 * 60 * HZ;
 static int ip_rt_min_pmtu __read_mostly		= 512 + 20 + 20;
 static int ip_rt_min_advmss __read_mostly	= 256;
-static int ip_rt_secret_interval __read_mostly	= 10 * 60 * HZ;
 static int rt_chain_length_max __read_mostly	= 20;
 
 static struct delayed_work expires_work;
@@ -918,32 +917,11 @@ void rt_cache_flush_batch(void)
 	rt_do_flush(!in_softirq());
 }
 
-/*
- * We change rt_genid and let gc do the cleanup
- */
-static void rt_secret_rebuild(unsigned long __net)
-{
-	struct net *net = (struct net *)__net;
-	rt_cache_invalidate(net);
-	mod_timer(&net->ipv4.rt_secret_timer, jiffies + ip_rt_secret_interval);
-}
-
-static void rt_secret_rebuild_oneshot(struct net ...
From: Eric Dumazet
Date: Thursday, May 6, 2010 - 10:32 am

Nice cleanup try Neil, but this gives to attackers more time to hit the
cache (infinite time should be enough as a matter of fact ;) )

Hints : 

- What is the initial value of rt_genid ?

- How/When is it changed (full 32 bits are changed or small
perturbations ? check rt_cache_invalidate())



--

From: Neil Horman
Date: Thursday, May 6, 2010 - 11:02 am

Not sure I follow what your complaint is.  I get that this gives attackers
plenty of time to try to attack the cache, but thats rather the point of the
statistics gathering for the cache, and why I don't think we need the secret
timer any more.   With the statistical analysis we do on the route cache every
gc cycle, we can tell if an attacker has guessed our rt_genid value, and is
making any chains in the cache abnormally long.  Thats when we do the rebuild,
modifying the rt_genid, forcing the attacker to re-discover it (which should be
difficult).  Theres no need to change this periodically if you're not being
/*
 * Pertubation of rt_genid by a small quantity [1..256]
 * Using 8 bits of shuffling ensure we can call rt_cache_invalidate()
 * many times (2^24) without giving recent rt_genid.
 * Jenkins hash is strong enough that litle changes of rt_genid are OK.
 */
static void rt_cache_invalidate(struct net *net)
{
        unsigned char shuffle;

        get_random_bytes(&shuffle, sizeof(shuffle));
        atomic_add(shuffle + 1U, &net->ipv4.rt_genid);
}

Clearly, its small changes.  To paraphrase the comment, Changes to rt_genid are
small enough to be confident that we don't repetatively use a gen_id often, but
sufficiently random that attackers cannot easily guess the next gen_id based on
the current value.  Both the timer and the statistics code use this invalidation
technique previously, and the latter continues to do so.

I've not changed anything regarding how we
invalidate, only when we choose to invalidate.  Invalidation can lead to
slowdowns during routing, since it send frames through the slow path after an
invalidation.  It behooves us to avoid preforming this invalidation without
need, and since we have a mechanism in place to do that invalidation specfically
when we need to, lets get rid of the code that handles that, and make it a bit
cleaner.  If there are users that feel strongly that they need to defend against
potential attacks by periodically changing their ...
From: Eric Dumazet
Date: Thursday, May 6, 2010 - 11:33 am

I have some customers that will simply kill me if their routing cache is
disabled by a smart attack, slowing down their server by a 4x factor.

I know its possible, it has been done.

For a quiet machine possible rt_genid values range are known from
attacker, and hash size is also known. Thats really too easy for the bad
guys...

Neil, I think your cleanup should stay a cleanup for the moment, or you
must make sure rt_genid initial value is not 0 (read your patch
again...)

I agree we dont need anymore the complex timer logic. We could keep the
secret_interval (default to 0 if you really want) and force a
rt_cache_invalidate() call once in a while from the periodic
rt_check_expire() for example.



--

From: Neil Horman
Date: Thursday, May 6, 2010 - 12:54 pm

Ok, I can understand that, but I don't think a single user profile needs to
dictate policy here.  the statistics code has a configurable threshold for where
to disable caching, so I would think you can just set that to be high.  And if
you need even more than that, you can do as I suggested, adding a:
echo -1 > /proc/sys/net/ipv4/route/flush
to a cron job on a short interval.  That will preform the _exact_ same operation
that the in-kernel timer did previously.  And the other 99% of our users don't
Well, ok, I can buy your argument, but I think the problem you are describing is
orthogonoal to what my change here does.  If its too easy for attackers to guess
our genid secret, then we need to make it more complex to guess, not force
Yeah, I get that we start the gen_id at 0, that doesn't come from this change,
its always that way in the net_alloc initalization.  I certainly don't have a
problem during inialization starting with a randomization of that value.  I'll
Doing that doesn't solve my aim however, which is to avoid performing rt_genid
updates when no one is attacking you at all.  I completely agree that we can
start the gen_id at some random value (by forcing an initial invalidation),
however.  Beyond that however, if someone is managing to guess our secret value,
then we need to make our secret value more complex to determine.  Perhaps given
the reduction in the number of times we need to iterate our gen_id with the
timer gone, we can use something more heavyweight to determine the the hash
secret (the cprng perhaps?).

Regards
--

From: Eric Dumazet
Date: Thursday, May 6, 2010 - 1:10 pm

Secrets that dont change are known to be honey pots for hackers.

I just dont see why we want to risk security regressions for something
that proved to work well.

Cache invalidation is just a genid change nowadays, and dont have side
effects.

Considering we do cache invalidation when routes are changed anyway, I
dont get why we should avoid the invalidation once every xxx seconds...

If you believe this cache invalidation has problems, maybe we should
address them and not hide them ?



--

From: Neil Horman
Date: Thursday, May 6, 2010 - 1:25 pm

Because we have two ways of doing the same thing now, and I don't see why we
should maintain code for both.  I get that a timer based invalidation works
I disagree with this, changing a genid in and of itself is fast, yes, but it
creates a need for the cache to get repopulated, sending packets through the
slow routing path.  On high volume systems this causes a performance
degradation.  The timer approach makes that a periodic degradation, one that I
would like to avoid if possible.

I get that hackers like secrets to stay unchanged so that they can figure out
what they are.  Its not like we're leaving ourselves vulnerable here, we're just
rebuilding only when we need it, not every X seconds.  And if someone is
_really_ in need of a periodic rebuild, and can cope with the performance hit,
then they can still do that from user space, as I've pointed out.  We just don't
Who says routes are going to change that often?  I know you dont believe that a
former is a substitute for the latter.  As for why we should avoid periodic
Now you're just being intentionally obtuse.  Eric, you know perfectly good and
well what my reasons are for wanting to remove the rt_secret timer.  Its why we
did the statistical analysis code in the first place.  There just not a large
need for it.  If you want to do periodic invalidation, fine, do it.  Just do it
in user space.  We have an on-demand strategy in the kernel that has been
working well for quite some time, and is superior in performance for 99% of the
use cases out there.  So lets lighten the maintenence workload for the code
thats not strictly needed anymore by getting rid of it.

--

From: Neil Horman
Date: Thursday, May 6, 2010 - 1:29 pm

Version 2 of this patch, taking Erics comment about making the rt_genid non-zero
when a netns is created.  This makes sense, and helps prevent attackers from
guessing our initial secret value



A while back there was a discussion regarding the rt_secret_interval timer.
Given that we've had the ability to do emergency route cache rebuilds for awhile
now, based on a statistical analysis of the various hash chain lengths in the
cache, the use of the flush timer is somewhat redundant.  This patch removes the
rt_secret_interval sysctl, allowing us to rely solely on the statistical
analysis mechanism to determine the need for route cache flushes.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 include/net/netns/ipv4.h |    1 
 net/ipv4/route.c         |  111 ++++-------------------------------------------
 2 files changed, 11 insertions(+), 101 deletions(-)


diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index ae07fee..d68c3f1 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -55,7 +55,6 @@ struct netns_ipv4 {
 	int sysctl_rt_cache_rebuild_count;
 	int current_rt_cache_rebuild_count;
 
-	struct timer_list rt_secret_timer;
 	atomic_t rt_genid;
 
 #ifdef CONFIG_IP_MROUTE
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a947428..e55a066 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -129,7 +129,6 @@ static int ip_rt_gc_elasticity __read_mostly	= 8;
 static int ip_rt_mtu_expires __read_mostly	= 10 * 60 * HZ;
 static int ip_rt_min_pmtu __read_mostly		= 512 + 20 + 20;
 static int ip_rt_min_advmss __read_mostly	= 256;
-static int ip_rt_secret_interval __read_mostly	= 10 * 60 * HZ;
 static int rt_chain_length_max __read_mostly	= 20;
 
 static struct delayed_work expires_work;
@@ -918,32 +917,11 @@ void rt_cache_flush_batch(void)
 	rt_do_flush(!in_softirq());
 }
 
-/*
- * We change rt_genid and let gc do the cleanup
- */
-static void rt_secret_rebuild(unsigned long __net)
-{
-	struct net *net = (struct net ...
From: Eric Dumazet
Date: Thursday, May 6, 2010 - 2:08 pm

I am _sorry_ to be such a paranoiac guy.

Could you please feed more than 8 bits here ?

like :

get_random_bytes(&net->ipv4.rt_genid, sizeof(net->ipv4.rt_genid));

There is no need to comment this in the code, this kind of rnd init is
very common in net tree.





--

From: Neil Horman
Date: Friday, May 7, 2010 - 12:55 pm

Hey-
	Sorry for the delay, but I got interested in Erics suggestion of
changing how we update rt_genid, and had a few thoughts.  Heres version 3 of
this patch:

Change Notes:

1) Removed the secret_interval binary interface entry in the list (forgot to do
that before)

2) Took Erics Suggestion to change the update for net->ipv4.rt_genid.  Now
instead of doing a small incremental change, we simply grab 32 new random bits.

3) The change in (2) got me thinking that part of the reason we used the Jenkins
hash in rt_genid was to ensure non-repetion of id's in a short time period
(which is important, so as not to inadvertently reuse route cache entries that
should be getting expired).  While extra randomness makes it harder for
attackers to guess the secret, it makes it possible to return to previously
guessed values as well (if they're lucky).  As such, I created a configurable
option, CONFIG_GENID_DUPCHECK.  With this option on, the low order 8 bits of the
genid are replaced with a rolling counter, that increments on each new genid.
This creates in effect, a 256 deep list of previously used genid values.  In
rt_drop we compare the genids for duplicates.  If we find that 2 genid values
have different indexes, but idential remaining bits, they are noted as a repeat
genid, and we call rt_cache_invalidate to generate a new genid and avoid the
duplication problem.


I've done some testing with this patch here, and it seems to work well.

Summary

A while back there was a discussion regarding the rt_secret_interval timer.
Given that we've had the ability to do emergency route cache rebuilds for awhile
now, based on a statistical analysis of the various hash chain lengths in the
cache, the use of the flush timer is somewhat redundant.  This patch removes the
rt_secret_interval sysctl, allowing us to rely solely on the statistical
analysis mechanism to determine the need for route cache flushes.


Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 include/net/netns/ipv4.h |  ...
From: Eric Dumazet
Date: Friday, May 7, 2010 - 2:04 pm

My suggestion was to initialize _once_ at boot time, the _full_ 32bits.

Not to change the perturbations, they are very fine, and need no extra
CONFIG_SOME_MAGICAL_SWITCH.

We have a guarantee that no duplicates are delivered unless you perform
2^24 generations in a short period of time.

But because you want to change full 32bits, you need a complex dupcheck

This is not necessary and over engineering if you ask me.

You now rely on probabilistic rules, and depends on get_random_bytes()
be really random, or a new CONFIG setting...

What exact problem do you want to solve Neil ?

Please submit your initial patch, with the small changes :

1) Remove the secret_interval binary interface entry in the list 

2) Initialize full 32bits at struct net init time.

Thanks



--

From: Neil Horman
Date: Friday, May 7, 2010 - 4:15 pm

Apologies, my read of your statement was that you wanted to randomize the genid
every iteration, not just the first, to avoid the genid n+1 being within 256 of
I can't say I disagree, but I was looking at this change based on your
You know good and well what I'm trying to do here, don't be thick.  The only
reason I was making changes to the genid in the first place was because you were
asking for them.  I'm more than happy to make a simpler version of this.
Yeah, ok.  I'll repost on monday.


Thanks
Neil



--

From: Neil Horman
Date: Friday, May 7, 2010 - 6:01 pm

Hey, Had a moment tonight so I spun version 4 of the patch.

Change notes:
1) Redid the initialization, in light of Erics clarification.  We randomly seed
all 32 bits of the rt_genid for a netns, but still just do 256 byte
modifications on cache invalidations

2) got rid of the dup checking crap.


Summary:

A while back there was a discussion regarding the rt_secret_interval timer.
Given that we've had the ability to do emergency route cache rebuilds for awhile
now, based on a statistical analysis of the various hash chain lengths in the
cache, the use of the flush timer is somewhat redundant.  This patch removes the
rt_secret_interval sysctl, allowing us to rely solely on the statistical
analysis mechanism to determine the need for route cache flushes.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 include/net/netns/ipv4.h |    1 
 kernel/sysctl_binary.c   |    1 
 net/ipv4/route.c         |  108 +++--------------------------------------------
 3 files changed, 8 insertions(+), 102 deletions(-)


diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index ae07fee..d68c3f1 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -55,7 +55,6 @@ struct netns_ipv4 {
 	int sysctl_rt_cache_rebuild_count;
 	int current_rt_cache_rebuild_count;
 
-	struct timer_list rt_secret_timer;
 	atomic_t rt_genid;
 
 #ifdef CONFIG_IP_MROUTE
diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
index 5903057..937d31d 100644
--- a/kernel/sysctl_binary.c
+++ b/kernel/sysctl_binary.c
@@ -224,7 +224,6 @@ static const struct bin_table bin_net_ipv4_route_table[] = {
 	{ CTL_INT,	NET_IPV4_ROUTE_MTU_EXPIRES,		"mtu_expires" },
 	{ CTL_INT,	NET_IPV4_ROUTE_MIN_PMTU,		"min_pmtu" },
 	{ CTL_INT,	NET_IPV4_ROUTE_MIN_ADVMSS,		"min_adv_mss" },
-	{ CTL_INT,	NET_IPV4_ROUTE_SECRET_INTERVAL,		"secret_interval" },
 	{}
 };
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a947428..dea3f92 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ ...
From: Eric Dumazet
Date: Friday, May 7, 2010 - 11:36 pm

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Sorry for the confusion Neil, thanks !


--

From: David Miller
Date: Saturday, May 8, 2010 - 1:58 am

From: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks guys!
--

From: Neil Horman
Date: Saturday, May 8, 2010 - 5:54 am

Its my fault, it was more clear to me on a second reading.  Apologies!
--

Previous thread: Re: [net-next-2.6 V5 PATCH 2/3] Add ndo_{set|get}_vf_port_profile op support for enic dynamic vnics by Scott Feldman on Thursday, May 6, 2010 - 9:25 am. (2 messages)

Next thread: [PATCH] net: deliver skbs on inactive slaves to exact matches by John Fastabend on Thursday, May 6, 2010 - 10:24 am. (4 messages)