On Wed, Aug 25, 2010 at 6:17 PM, Chetan Ahuja <chetan.ahuja@gmail.com> wrote:
There is another potential bug that may cause zero power.
I actually saw a similar problem a while back, while I was playing
around with the rt_avg stuff. But I have only seen the problem with my
modified kernel (playing with the softirq accounting part) and not with
a vanilla kernel. Maybe your workload includes some RT tasks that
trigger this in a vanilla kernel as well.
This is what I saw happening in my case:
scale_rt_power()
- In some corner case total ends up being less than rq->rt_avg.
- As a result, available is negative
- scale_rt_power returns negative value
update_cpu_power()
- returns negative power
When we later sum up power across groups, a group with negative power
can make the summed power of a parent group zero, causing a
divide-by-zero.
I now have this change in my tree along with my softirq+rt_avg changes
and haven't seen the failure since.
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index ae2a225..8c31e38 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2858,6 +2858,9 @@ unsigned long scale_rt_power(int cpu)
 	u64 total, available;
 
 	total = sched_avg_period() + (rq->clock - rq->age_stamp);
+	if (unlikely(total < rq->rt_avg))
+		return 0;
+
 	available = total - rq->rt_avg;
 
 	if (unlikely((s64)total < SCHED_LOAD_SCALE))
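The effect of the check in the patch above can be sketched in user space. This is only an illustration under assumptions, not the kernel code: available_unguarded(), available_guarded() and group_power_sum() are hypothetical helpers, and plain uint64_t / unsigned long stand in for the kernel's u64 and power fields. It shows the unsigned subtraction wrapping when rt_avg exceeds total, and how one "negative" power value can cancel another so that a group sum becomes zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Unguarded subtraction: the corner case described in this mail. */
static uint64_t available_unguarded(uint64_t total, uint64_t rt_avg)
{
	/* Wraps modulo 2^64 when rt_avg > total, i.e. "negative" as s64. */
	return total - rt_avg;
}

/* Same computation with an early return of 0, as in the patch. */
static uint64_t available_guarded(uint64_t total, uint64_t rt_avg)
{
	if (total < rt_avg)
		return 0;
	return total - rt_avg;
}

/* Hypothetical stand-in for summing per-CPU power into a group. */
static unsigned long group_power_sum(const unsigned long *power, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += power[i];	/* unsigned arithmetic wraps silently */
	return sum;
}
```

For example, available_unguarded(100, 101) yields a value that reads as -1 when cast to a signed 64-bit integer, while available_guarded(100, 101) yields 0. Summing the two powers (unsigned long)-1024 and 1024 with group_power_sum() gives 0, which is the kind of value that would later be used as a divisor.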
My theory on why total can be less than rq->rt_avg:
- (Time T0) update_curr() calls sched_rt_avg_update() and delta accumulation starts.
- (Time T1) The load balancer's scale_rt_power() calls sched_avg_update(),
and say aging happens at this point (while rt_avg doesn't have the delta
added yet).
- (Time T2) Timer interrupt -> update_curr() calls sched_rt_avg_update() with
no aging of the timestamp, but with the delta added to rt_avg.
- Time passes
- (Time T3) The load balancer calls scale_rt_power() again, and at this point
total = period + (T3 - T1)
rt_avg = halved at T1 (which can be ~period) + T3 - T0
which can be slightly greater than total, spoiling the math from there on.
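To make the arithmetic in this theory concrete, here is a small sketch that just evaluates the two expressions above for one set of made-up timestamps. Everything in it is an assumption for illustration: theory_at_t3() and struct snapshot are hypothetical names, time is in arbitrary units, and the "halved at T1" value is passed in directly instead of being derived from real scheduler state.

```c
#include <stdint.h>

/* Values the theory predicts at time T3. */
struct snapshot {
	uint64_t total;
	uint64_t rt_avg;
};

/*
 * period:          averaging period
 * t0:              time at which delta accumulation started
 * t1:              time at which aging happened (rt_avg halved)
 * t3:              time of the second scale_rt_power() call
 * halved_rt_avg:   rt_avg right after halving at t1 (can be ~period)
 */
static struct snapshot theory_at_t3(uint64_t period, uint64_t t0,
				    uint64_t t1, uint64_t t3,
				    uint64_t halved_rt_avg)
{
	struct snapshot s;

	s.total = period + (t3 - t1);		/* total = period + (T3 - T1) */
	s.rt_avg = halved_rt_avg + (t3 - t0);	/* delta covers T0..T3 */
	return s;
}
```

With period = 1000, T0 = 100, T1 = 150, T3 = 600 and a halved rt_avg of 990, this gives total = 1450 and rt_avg = 1490, so total - rt_avg comes out as -40 when read as a signed value. The gap is exactly the T1 - T0 window plus however close the halved rt_avg was to the period.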
I tried reproducing the problem with a vanilla kernel and couldn't.
Maybe your workload is triggering this somehow?
Also, Suresh made recent fixes to rt_avg and this may not be a problem
after that. I haven't looked at this case closely since that recent
change.
Thanks,
Venki