Hi,

I have been looking at network stack timer optimization for power saving in an embedded ARM environment, basically trying to avoid as many wakeups as possible. I have changed several timers in the network stack into deferrable ones, i.e. they do not wake the device from low power modes but are instead deferred until the next wakeup from some other source, such as another (non-deferrable) timer or some I/O. Attached is a patch with the changes I've made. Is something like this safe to do?

-Tero
Hi Tero,

When TCP communications are active, we set up a timer for *every* frame we receive or send. These timers won't be deferrable anyway.

Delaying one wakeup every 60 seconds (if I take your net/ipv4/route.c change) won't save much power, or did I miss something?

On big routers, we need to lower ip_rt_gc_interval from 60 seconds to one second in order to perform effective garbage collection. So, if we use a deferred timer and:

schedule_delayed_work(&expires_work, HZ);

how many times will the worker be started every minute?
Hi Eric,

I think big routers do not enter any low power states, since heavy network traffic keeps them busy. Even if they do enter a low power mode, I guess the network hardware will wake them up rather quickly, so the delayed work will be executed at approximately (or probably exactly) the time it was scheduled. I might be wrong here, as I do not know much about network router power management.

Here is some data I grabbed from /proc/timer_stats before making these changes (I kept only the network stack entries and calculated the expiry rates). In this sample I had already changed most of the timers to deferrable (a "D" after the count marks a deferrable timer), and I also made a hack to show workqueues properly. The device was basically idling during this measurement, without any frequent network communication.

Timer Stats Version: v0.2
Sample period: 59672.362 s
 7455D, 1 swapper   neigh_table_init_no_netlink (neigh_periodic_timer)   [neighbour.c, rate 1 per 8s]
  995,  0 workqueue queue_delayed_work (rt_worker_func)                   [route.c, rate 1 per min]
  498D, 1 swapper   inet_initpeers (peer_check_expire)                    [inetpeer.c, rate 1 per 2 min]
   99D, 1 swapper   flow_cache_init (flow_cache_new_hashrnd)              [flow.c, rate 1 per 10 min]
   99D, 1 swapper   inet_frags_init (inet_frag_secret_rebuild)            [inet_fragment.c, rate 1 per 10 min]

-Tero
Here is the patch I cooked up and tested on a machine where ip_rt_gc_interval is set to its minimal value (1 second), and where the cache equilibrium depends on garbage collection being done in time.

I found that deferrable timers can be *really* delayed, so I think we must take into account the elapsed time (in jiffies) between two rt_check_expire() calls, to "guarantee" a full scan of the rt cache in an ip_rt_gc_timeout period.

Not for inclusion, as ongoing work in this function is fixing a bug. I'll redo the patch later, once that has stabilized.

[PATCH] net: use a deferred timer in rt_check_expire

For the sake of power saver lovers, use a deferrable timer to fire rt_check_expire().

As the cache equilibrium of some big routers depends on garbage collection done in time, we take into account the elapsed time between two rt_check_expire() invocations to adjust the number of slots we have to check.

Based on an initial idea and patch from Tero Kristo

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Tero Kristo <tero.kristo@nokia.com>
---
 net/ipv4/route.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c4c60e9..b2c6793 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -131,8 +131,8 @@ static int ip_rt_min_advmss __read_mostly = 256;
 static int ip_rt_secret_interval __read_mostly = 10 * 60 * HZ;
 static int rt_chain_length_max __read_mostly = 20;
 
-static void rt_worker_func(struct work_struct *work);
-static DECLARE_DELAYED_WORK(expires_work, rt_worker_func);
+static struct delayed_work expires_work;
+static unsigned long expires_ljiffies;
 
 /*
  * Interface to generic destination cache.
@@ -787,9 +787,12 @@ static void rt_check_expire(void)
 	struct rtable *rth, **rthp;
 	unsigned long length = 0, samples = 0;
 	unsigned long sum = 0, sum2 = 0;
+	unsigned long delta;
 	u64 mult;
 
-	mult = ((u64)ip_rt_gc_interval) << rt_hash_log;
+	delta = jiffies - ...
