Re: Scheduler(?) regression from 2.6.22 to 2.6.24 for short-lived threads

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Willy Tarreau <w@...>
Cc: Mike Galbraith <efault@...>, <linux-kernel@...>, Peter Zijlstra <a.p.zijlstra@...>, Ingo Molnar <mingo@...>
Date: Sunday, February 10, 2008 - 1:29 am

On Sat, Feb 09, 2008 at 05:19:57PM +0100, Willy Tarreau wrote:

Right. I should have tinkered a bit more with it before I posted it, the
version posted had too little going on in the first loop and thus got
hung up on the second busywait loop instead.

I did a bunch of runs with various loop sizes. Basically, what seems to
happen is that the older kernels are quicker at rebalancing a new thread
over to the other cpu, while newer kernels let them share the same cpu
longer (and thus increases wall clock runtime).

All of these are built with gcc without optimization, larger loop size
and an added sched_yield() in the busy-wait loop at the end to take that
out as a factor. As you've seen yourself, runtimes can be quite noisy
but the trends are quite clear anyway. All of these numbers were
collected with default scheduler runtime options, same kernels and
configs as previously posted.

Loop to 1M:
2.6.22		time 4015 ms
2.6.23		time 4581 ms
2.6.24		time 10765 ms
2.6.24-git19	time 8286 ms

2M:
2.6.22		time 7574 ms
2.6.23		time 9031 ms
2.6.24		time 12844 ms
2.6.24-git19	time 10959 ms

3M:
2.6.22		time 8015 ms
2.6.23		time 13053 ms
2.6.24		time 16204 ms
2.6.24-git19	time 14984 ms

4M:
2.6.22		time 10045 ms
2.6.23		time 16642 ms
2.6.24		time 16910 ms
2.6.24-git19	time 16468 ms

5M:
2.6.22		time 12055 ms
2.6.23		time 21024 ms
<missed 2.6.24 here>
2.6.24-git19	time 16040 ms

10M:
2.6.22		time 24030 ms
2.6.23		time 33082 ms
2.6.24		time 34139 ms
2.6.24-git19	time 33724 ms

20M:
2.6.22		time 50015 ms
2.6.23		time 63963 ms
2.6.24		time 65100 ms
2.6.24-git19	time 63092 ms

40M:
2.6.22		time 94315 ms
2.6.23		time 107930 ms
2.6.24		time 113291 ms
2.6.24-git19	time 110360 ms

So with more work per thread, the differences become less but they're
still there. At the 40M loop, with 500 threads it's quite a bit of
runtime per thread.


It seems generally unfortunate that it takes longer for a new thread to
move over to the second cpu even when the first is busy with the original
thread. I can certainly see cases where this causes suboptimal overall
system behaviour.

I agree that the testcase is highly artificial. Unfortunately, it's
not uncommon to see these kind of weird testcases from customers tring
to evaluate new hardware. :( They tend to be pared-down versions of
whatever their real workload is (the real workload is doing things more
appropriately, but the smaller version is used for testing). I was lucky
enough to get source snippets to base a standalone reproduction case on
for this, normally we wouldn't even get copies of their binaries.


-Olof
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Re: Scheduler(?) regression from 2.6.22 to 2.6.24 for short-..., Olof Johansson, (Sun Feb 10, 1:29 am)