Re: Scheduler(?) regression from 2.6.22 to 2.6.24 for short-lived threads

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Willy Tarreau <w@...>
Cc: Mike Galbraith <efault@...>, <linux-kernel@...>, Peter Zijlstra <a.p.zijlstra@...>, Ingo Molnar <mingo@...>
Date: Sunday, February 10, 2008 - 3:00 am

On Sun, Feb 10, 2008 at 07:15:58AM +0100, Willy Tarreau wrote:

Hmm? I get that to be 227ms per loop, which is way more than a full
timeslice. Running the program took in the range of 2 minutes, so it's
110000 milliseconds, not microseconds.


The math doesn't add up for me. Even if it rebalanced at the end of
the first slice (i.e. after 1ms), that would be a 1ms penalty per
iteration. With 500 threads that'd be a total penalty of 500ms.


Right, likely because the long-running cases got stuck at the busy loop
at the end, which would end up aborting quicker if the other thread got
scheduled for just a bit. It was a mistake to post that variant of the
testcase, it's not as relevant and doesn't mimic the original workload I
was trying to mimic as well as if the first loop was made larger.


There's always the tradeoff of losing cache warmth whenever a thread is
moved, so I'm not sure if it's a good idea to always migrate it at
creation time. It's not a simple problem, really.


I believe it started out as a simple attempt to parallelize a workload
that sliced the problem too low, instead of slicing it in larger chunks
and have each thread do more work at a time. It did well on 2.6.22 with
almost a 2x speedup, but did worse than the single-treaded testcase on a
2.6.24 kernel.

So yes, it can clearly be handled through explanations and education
and fixen the broken testcase, but I'm still not sure the new behaviour
is desired.


-Olof
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Re: Scheduler(?) regression from 2.6.22 to 2.6.24 for short-..., Olof Johansson, (Sun Feb 10, 3:00 am)