Re: [patch] softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds

From: David Miller
Date: Wednesday, April 23, 2008 - 5:29 am

From: David Miller <davem@davemloft.net>
Date: Wed, 23 Apr 2008 03:55:44 -0700 (PDT)


Ok, Ingo, none of your patches fix even the initial buggy
changeset, for reference:

commit 27ec4407790d075c325e1f4da0a19c56953cce23
Author: Ingo Molnar <mingo@elte.hu>
Date:   Thu Feb 28 21:00:21 2008 +0100

    sched: make cpu_clock() globally synchronous
    
    Alexey Zaytsev reported (and bisected) that the introduction of
    cpu_clock() in printk made the timestamps jump back and forth.
    
    Make cpu_clock() more reliable while still keeping it fast when it's
    called frequently.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

I checked out a tree to the changeset before this one, just
to double check, and there are no problems.

I add that changeset and I get softlockup warnings like crazy
in my logs.

I added your "move touch_softlockup_watchdog() earlier in
tick_nohz_update_jiffies()" patch:

--------------------
Index: linux/kernel/time/tick-sched.c
===================================================================
--- linux.orig/kernel/time/tick-sched.c
+++ linux/kernel/time/tick-sched.c
@@ -133,8 +133,6 @@ void tick_nohz_update_jiffies(void)
 	if (!ts->tick_stopped)
 		return;
 
-	touch_softlockup_watchdog();
-
 	cpu_clear(cpu, nohz_cpu_mask);
 	now = ktime_get();
 	ts->idle_waketime = now;
@@ -142,6 +140,8 @@ void tick_nohz_update_jiffies(void)
 	local_irq_save(flags);
 	tick_do_update_jiffies64(now);
 	local_irq_restore(flags);
+
+	touch_softlockup_watchdog();
 }
 
 void tick_nohz_stop_idle(int cpu)
--------------------

and still I get mountains of softlockup messages, see first
attachment, below.
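
For readers following the thread, here is a rough user-space sketch of why a jumping clock can make a watchdog of this kind warn on a perfectly healthy CPU. It is a toy model, not the kernel's softlockup code: the `toy_watchdog` struct, the `toy_touch()` and `toy_check()` helpers, and the threshold values are all hypothetical, and it assumes only that the watchdog compares "now" against the timestamp of the last touch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; these names are hypothetical, not kernel APIs. */
struct toy_watchdog {
	uint64_t touch_ts;   /* clock value at the last touch, in seconds */
	uint64_t threshold;  /* seconds without a touch before warning */
};

/* Record that the CPU made progress at clock value 'now'. */
static void toy_touch(struct toy_watchdog *wd, uint64_t now)
{
	assert(wd != NULL);
	wd->touch_ts = now;
}

/* Return true if the model would report a soft lockup at 'now'. */
static bool toy_check(const struct toy_watchdog *wd, uint64_t now)
{
	assert(wd != NULL);

	/* A reading older than the touch is treated as "no time elapsed". */
	if (now < wd->touch_ts)
		return false;

	return now - wd->touch_ts > wd->threshold;
}
```

In this model, a touch at 100 followed by a check at 105 with a threshold of 10 stays quiet, while a clock reading that leaps to 170 trips the check even though nothing was stuck. That is the shape of a false positive; whether it matches the actual failure here is exactly what is still unknown.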

I then added your patch, just to make sure, which adds the
missing prev_cpu_time assignment, specifically:

--------------------
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1001,6 +1001,8 @@ unsigned long long notrace cpu_clock(int
 	if (unlikely(delta_time > time_sync_thresh))
 		time = __sync_cpu_clock(time, cpu);
 
+	per_cpu(prev_cpu_time, cpu) = time;
+
 	return time;
 }
 EXPORT_SYMBOL_GPL(cpu_clock);

--------------------

Same problem, see second attachment, below.
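
As a hedged illustration of what the added assignment is meant to do, the sketch below models the threshold check from the hunk above in user space. Everything here is an assumption made for illustration: the `toy_clock` struct and `toy_cpu_clock()` are hypothetical, the delta is assumed to be the current time minus the previously recorded time, and a counter stands in for the real resynchronization path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; not the kernel's cpu_clock() implementation. */
struct toy_clock {
	uint64_t prev_time;    /* last value recorded for this CPU */
	uint64_t sync_thresh;  /* delta above which the slow path runs */
	unsigned sync_calls;   /* how often the slow path was taken */
};

/*
 * 'raw' is the local clock reading. 'record_prev' selects whether the
 * previous-time assignment (the line the patch adds) is performed.
 */
static uint64_t toy_cpu_clock(struct toy_clock *c, uint64_t raw,
			      bool record_prev)
{
	assert(c != NULL);

	uint64_t time = raw;
	uint64_t delta = time - c->prev_time;  /* assumed definition */

	if (delta > c->sync_thresh)
		c->sync_calls++;  /* stand-in for the resynchronization */

	if (record_prev)
		c->prev_time = time;  /* the assignment the patch adds */

	return time;
}
```

With a threshold of 5 and readings 1 through 20, the variant that records the previous time never takes the slow path, while the variant that leaves it stale takes it on every call from reading 6 onward. The sketch only shows the intent of the patch; as noted, applying it did not make the warnings go away.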

But, to be honest, this is starting to become an exercise in futility.
None of your patches fix anything.  Something is buggy about how your
new cpu_clock() stuff works.  I'm trying to figure out when you're
going to finally at least go: "I can't figure out the problem, let's
revert until I have a better idea."

FWIW, I have a perfect globally synchronized TICK source on this
system.

And even with this fix there are so many other regressions that cause
similar spurious softlockup reports and even full-on cpu hangs, all
seemingly added by the sched tree.

In my opinion this sched tree merge the other day is one of THE WORST
merges in recent memory.  Linus's tree is currently a sizzling pile of
poo, I can't get any of my own merge work done, and I'm stuck here
hunting down regressions you've added because of it.  :-/

We can't even get past one of the regressions added by that tree, and
it's been two days of my working on this non-stop.

Messages in current thread:
Soft lockup regression from today's sched.git merge., David Miller, (Tue Apr 22, 1:59 am)
Re: Soft lockup regression from today's sched.git merge., Peter Zijlstra, (Tue Apr 22, 5:45 am)
Re: Soft lockup regression from today's sched.git merge., David Miller, (Tue Apr 22, 10:42 pm)
Re: Soft lockup regression from today's sched.git merge., Dhaval Giani, (Wed Apr 23, 12:32 am)
Re: [patch] softlockup: fix false positives on nohz if CPU ..., David Miller, (Wed Apr 23, 5:29 am)
Re: Soft lockup regression from today's sched.git merge., Rafael J. Wysocki, (Tue May 6, 3:41 pm)
Re: Soft lockup regression from today's sched.git merge., Rafael J. Wysocki, (Wed May 7, 11:56 am)