I was running some tests with the "fairtest" testcase and noticed that
successive runs could give wildly different results.
I was originally using the tip/master tree as of Sep 16, but I also
confirmed the behaviour with Linus' tree as of Sep 14 (with the
__load_balance_iterator() fix applied). The same behaviour is present
in both cases.
I'm using the test config listed at the bottom. It's pretty
straightforward.
The first run gave the following results. As expected, the system
picked a static task distribution and didn't migrate tasks during the test.
group  actual(%)            expected(%)  avg latency(ms)  max_latency(ms)
1      33.31(33.33/33.2     30.00        23/23            37/37
2      36.29                40.00        5                25
3      30.40(27.40/33.40)   30.00        22/23            60/40
On the second run the task distribution was almost perfect, but the
system only used one of the two cpus, as seen by the difference
between actual and expected cpu time.
Warning, actual cpu time different than expected. actual: 10033.011108,
expected: 20000.000000
group  actual(%)            expected(%)  avg latency(ms)  max_latency(ms)
1      30.24(30.59/29.88)   30.00        26/27            68/58
2      39.87                40.00        20               36
3      29.89(29.87/29.91)   30.00        28/27            47/60
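
The warning above is what points at an idle cpu: with two cpus and a 10 second run the expected total is 20000, and the measured 10033 is about half of that. A rough sketch of the arithmetic, assuming both numbers in the warning are milliseconds of cpu time summed over all cpus (the function name is made up for illustration and is not part of fairtest):

```python
def busy_cpus(actual_ms, duration_s, ncpus):
    """Return (expected_ms, equivalent number of fully busy cpus)."""
    # Assumption: expected cpu time is wall-clock duration times cpu count.
    expected_ms = duration_s * 1000.0 * ncpus
    return expected_ms, ncpus * actual_ms / expected_ms

# Values taken from the warning and the test config (10 s run, 2 cpus).
expected_ms, busy = busy_cpus(actual_ms=10033.011108, duration_s=10, ncpus=2)
print(f"expected {expected_ms:.0f} ms, about {busy:.2f} cpus busy")
```

A result close to 1 on a two-cpu box is consistent with all hogs sharing a single cpu.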
Any ideas what's going on?
Chris
test config file:
#delay (secs)
1
#duration (secs)
10
#groupname,share,numhogs
1,750,n
2,1000,1
3,750,n
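
For reference, the expected(%) column in the results matches each group's share divided by the sum of all shares in this config. A minimal sketch of that arithmetic, assuming this is how the expected values are derived (the function name is hypothetical):

```python
def expected_percentages(shares):
    """Map group name -> expected cpu percentage, proportional to its share."""
    total = sum(shares.values())
    return {group: 100.0 * share / total for group, share in shares.items()}

# Share values copied from the config above (groupname -> share).
shares = {"1": 750, "2": 1000, "3": 750}
expected = expected_percentages(shares)
for group in sorted(expected):
    print(f"group {group}: {expected[group]:.2f}%")
```

With shares of 750, 1000 and 750 this gives 30%, 40% and 30%, which agrees with the expected(%) column.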
--
This behaviour (that load balancing is messed up) is now almost
continuous with both current tip/master and current Linus git.

On the first test after booting, it seems to work okay (although there
are still issues with fairness). On every subsequent test, fairness is
good but it only uses one of the two cpus.

Also, building a kernel with "-j10" results in one cpu being mostly idle
while the other one is 100% busy. It used to be both 100% busy--if I get
time today I may try bisecting it.

Chris
--
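
One way to put numbers on "one cpu mostly idle while the other is 100% busy" is to sample the per-cpu counters in /proc/stat twice and compare. The sketch below is only an illustration: the helper names are made up, and the two samples are invented figures, not measurements from this report.

```python
def parse_cpu_times(stat_text):
    """Parse /proc/stat text into {cpu_name: (busy_jiffies, total_jiffies)}."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep only per-cpu lines ("cpu0", "cpu1", ...); skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or not fields[0][3:].isdigit():
            continue
        # Use at most the first 8 counters: user nice system idle iowait irq softirq steal.
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values)
        times[fields[0]] = (total - idle, total)
    return times

def busy_fraction(before, after):
    """Fraction of time each cpu was busy between two parsed samples."""
    result = {}
    for cpu in after:
        busy_delta = after[cpu][0] - before[cpu][0]
        total_delta = after[cpu][1] - before[cpu][1]
        result[cpu] = busy_delta / total_delta if total_delta else 0.0
    return result

# Invented samples standing in for two reads of /proc/stat.
before = parse_cpu_times("cpu 200 0 100 1700 0 0 0 0\n"
                         "cpu0 100 0 50 850 0 0 0 0\n"
                         "cpu1 100 0 50 850 0 0 0 0\n")
after = parse_cpu_times("cpu0 1050 0 100 850 0 0 0 0\n"
                        "cpu1 105 0 55 1840 0 0 0 0\n")
busy = busy_fraction(before, after)
for cpu in sorted(busy):
    print(f"{cpu}: {busy[cpu]:.0%} busy")
```

On a live system the two strings would come from reading /proc/stat a few seconds apart while the build is running.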
It turns out that disabling CONFIG_DYNAMIC_FTRACE makes the load
balancing problem go away and causes all cpus to be used. With this
option enabled, the problem seems to be present as far back as
2.6.27-rc2. (2.6.27-rc1 doesn't compile on my machine, and 2.6.26
doesn't have ftrace).

I have no idea why turning on dynamic ftrace would affect load balancing
behaviour, but it's very repeatable. The very first test run after
booting works fine, and all successive runs fail to balance properly.

Chris
--
very weird. Would be very nice to figure it out.

and in tip/master we don't have the 'ftraced' kernel-patching kernel
thread anymore, so ftrace should be passive by all means.

OTOH, what exactly does 'turning on dftrace' mean? Just enabling it in
the .config, or also activating it via /debug/tracing/current_tracer?

Ingo
--
Just enabling it in the .config is enough to trigger the behaviour
change. I'm not explicitly activating any traces.

Chris
--
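
Since the trigger is the build-time option alone, it helps to confirm its state in the .config before comparing runs. A small sketch, assuming the usual kernel .config conventions (`CONFIG_FOO=y` when set, `# CONFIG_FOO is not set` when disabled); the function name, the sample text and `CONFIG_EXAMPLE_OPTION` are hypothetical and not taken from the config used in this report:

```python
def config_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...) or None if unset or absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return None
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None

# Illustrative .config fragment only.
sample = """\
CONFIG_DYNAMIC_FTRACE=y
# CONFIG_EXAMPLE_OPTION is not set
"""

state = config_option_state(sample, "CONFIG_DYNAMIC_FTRACE")
print("CONFIG_DYNAMIC_FTRACE:", state if state is not None else "not set")
```

Running the same check against the .config of the good and the bad kernel makes it easy to verify that the option is the only relevant difference between them.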
ok, that would be a clear ftrace bug i guess?

Ingo
--
It's either an ftrace bug or a fragile load balancer bug. I wonder if
it's related somehow to the stop_machine() call in
ftrace_dynamic_init()?

Chris
--
