On Sat, Sep 11, 2010 at 2:20 AM, Peter Zijlstra <peterz@infradead.org> wrote:
quoted text > On Fri, 2010-09-10 at 17:28 -0700, Andrew Morton wrote:
>> Where is this scheduler bug? Did it occur because someone didn't
>> understand __smp_call_function_single()? Or did it occur because the
>> scheduler code is doing something which its implementors did not expect
>> or intend?
>
>
> It comes from 83cd4fe2 (sched: Change nohz idle load balancing logic to
> push model), where nohz_balance_kick() simply needs to kick the
> designated driver into action.
>
> I take it Venki assumed __smp_call_function_single() works like
> smp_call_function_single() where you can use it for the local cpu as
> well.
Yes. This was an oversight while moving from using send_remote_softirq
to using __smp_call_function_single.
Also, as we don't have rq lock around this point, it seems possible
that the CPU that was busy and wants to kick idle load balance on
remote CPU, could have become idle and nominated itself as idle load
balancer.
Below patch looks good to me.
Acked-by: Venkatesh Pallipadi <venki@google.com>
I guess, we also need a WARN_ON_ONCE for (cpu == smp_processor_id())
in __smp_call_function_single(), as the eventual result of this bug
that Heiko saw was a deadlock
Thanks,
Venki
quoted text >
> I guess we could do something like the below as well, which would be
> slightly faster since we don't actually need to call raise_softirq()
> since we already set it for self a bit earlier in order to have it do
> the regular load-balance actions.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> kernel/sched_fair.c | 10 ++++++----
> 1 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 9b5b4f8..c8ca1cb 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -3404,11 +3404,13 @@ static void nohz_balancer_kick(int cpu)
> }
>
> if (!cpu_rq(ilb_cpu)->nohz_balance_kick) {
> - struct call_single_data *cp;
> -
> cpu_rq(ilb_cpu)->nohz_balance_kick = 1;
> - cp = &per_cpu(remote_sched_softirq_cb, cpu);
> - __smp_call_function_single(ilb_cpu, cp, 0);
> +
> + if (ilb_cpu != cpu) {
> + struct call_single_data *cp;
> + cp = &per_cpu(remote_sched_softirq_cb, cpu);
> + __smp_call_function_single(ilb_cpu, cp, 0);
> + }
> }
> return;
> }
>
>
--
unsubscribe notice To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
majordomo@vger.kernel.org
More majordomo info at
http://vger.kernel.org/majordomo-info.html
Please read the FAQ at
http://www.tux.org/lkml/
Messages in current thread:
Re: [PATCH] generic-ipi: fix deadlock in __smp_call_functi ... , Venkatesh Pallipadi , (Sat Sep 11, 9:42 am)