[PATCH] net: use a deferred timer in rt_check_expire

Previous thread: PIM-SM namespace changes by David Miller on Monday, May 18, 2009 - 10:24 pm. (6 messages)

Next thread: [PATCH] IPv6: set RTPROT_KERNEL to initial route by Jean-Mickael Guerin on Tuesday, May 19, 2009 - 12:56 am. (2 messages)
From: Tero.Kristo
Date: Tuesday, May 19, 2009 - 1:13 am

Hi,

I have been looking at network stack timer optimization for
power saving in an embedded ARM environment, basically trying to
avoid as many wakeups as possible. I have changed several
timers in the network stack into deferred ones, i.e. they do
not wake up the device from low-power modes but are instead
deferred until the next wakeup from some other source, such as
another (non-deferred) timer or some I/O. Attached is a patch
with the changes I've made. Is something like this safe to do?

-Tero

From: Eric Dumazet
Date: Tuesday, May 19, 2009 - 2:04 am

Hi Tero


When TCP communications are active, we set up a timer for *every* frame
we receive or send. These timers won't be deferrable anyway.

Delaying one wakeup every 60 seconds (if I take your net/ipv4/route.c change)
won't change power savings that much, or did I miss something?

On big routers, we need to lower ip_rt_gc_interval from 60 seconds to one second
in order to perform effective garbage collection.

So, if we use a deferred timer and:

schedule_delayed_work(&expires_work, HZ);

how many times will the worker be started every minute?


--

From: Tero.Kristo
Date: Tuesday, May 19, 2009 - 2:46 am

Hi Eric,



I think big routers do not enter any low-power states, because heavy network traffic keeps them busy. Even if they do enter a low-power mode, I guess the network hardware will wake them up rather quickly, causing the delayed work to be executed at approximately (or probably exactly) the time it was scheduled. I might be wrong here, as I do not really know much about network router power management.

Here is some data I grabbed from /proc/timer_stats before making these changes (I kept only the network stack entries and calculated the expiry rates). I had already changed most of the timers to deferrable in this sample, and also made a hack to show workqueues properly. The device was basically just idling during this measurement, not doing any frequent network communication.

Timer Stats Version: v0.2
Sample period: 59672.362 s
7455D,     1 swapper          neigh_table_init_no_netlink (neigh_periodic_timer) [neighbour.c, rate 1 per 8s]
  995,     0 workqueue        queue_delayed_work (rt_worker_func) [route.c, rate 1 per min]
 498D,     1 swapper          inet_initpeers (peer_check_expire) [inetpeer.c, rate 1 per 2 min]
  99D,     1 swapper          flow_cache_init (flow_cache_new_hashrnd) [flow.c, rate 1 per 10 min]
  99D,     1 swapper          inet_frags_init (inet_frag_secret_rebuild) [inet_fragment.c, rate 1 per 10 min]

-Tero

From: Eric Dumazet
Date: Tuesday, May 19, 2009 - 11:56 am

Here is the patch I cooked up and tested on a machine where ip_rt_gc_interval
is set to the minimal value (1 second), and where equilibrium depends on garbage
collection being done in time.

I found that deferred timers can be *really* delayed, so I think we must take
into account the elapsed time (in jiffies) between two rt_check_expire()
calls, to "guarantee" a full scan of the rt cache in an ip_rt_gc_timeout period.


Not for inclusion, as ongoing work is happening in this function
for a bug fix. I'll redo the patch once that work has stabilized.

[PATCH] net: use a deferred timer in rt_check_expire

For the sake of power saver lovers, use a deferrable timer to fire rt_check_expire().

As the cache equilibrium of some big routers depends on garbage collection being
done in time, we take into account the elapsed time between two rt_check_expire()
invocations to adjust the number of slots we have to check.

Based on an initial idea and patch from Tero Kristo.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Tero Kristo <tero.kristo@nokia.com>
---
 net/ipv4/route.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c4c60e9..b2c6793 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -131,8 +131,8 @@ static int ip_rt_min_advmss __read_mostly	= 256;
 static int ip_rt_secret_interval __read_mostly	= 10 * 60 * HZ;
 static int rt_chain_length_max __read_mostly	= 20;
 
-static void rt_worker_func(struct work_struct *work);
-static DECLARE_DELAYED_WORK(expires_work, rt_worker_func);
+static struct delayed_work expires_work;
+static unsigned long expires_ljiffies;
 
 /*
  *	Interface to generic destination cache.
@@ -787,9 +787,12 @@ static void rt_check_expire(void)
 	struct rtable *rth, **rthp;
 	unsigned long length = 0, samples = 0;
 	unsigned long sum = 0, sum2 = 0;
+	unsigned long delta;
 	u64 mult;
 
-	mult = ((u64)ip_rt_gc_interval) << rt_hash_log;
+	delta = jiffies - ...