[REGRESSION] High, likely incorrect process cpu usage counters with kvm and 2.6.2[67]

Previous thread: [PATCH] sunrpc - fixup userspace buffer possible overrun v3 by Cyrill Gorcunov on Sunday, August 31, 2008 - 8:25 am. (3 messages)

Next thread: [RFC: 2.6 patch] remove the broken i82443bxgx_edac driver by Adrian Bunk on Sunday, August 31, 2008 - 9:10 am. (6 messages)
From: Avi Kivity
Date: Sunday, August 31, 2008 - 8:43 am

Running an idle Windows VM on Linux 2.6.26+ with kvm, one sees high 
values for the kvm process in top (30%-70% cpu), where one would 
normally expect 0%-1%.  Surprisingly, the per-cpu system counters show 
almost 100% idle, leading me to believe this is an accounting error and 
that the process does not actually consume this much cpu.

I bisected this to a scheduler change, namely

commit 3e51f33fcc7f55e6df25d15b55ed10c8b4da84cd
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Sat May 3 18:29:28 2008 +0200

    sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
   
    this replaces the rq->clock stuff (and possibly cpu_clock()).
   
     - architectures that have an 'imperfect' hardware clock can set
       CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
   
     - the 'jiffie' window might be superfulous when we update tick_gtod
       before the __update_sched_clock() call in sched_clock_tick()
   
     - cpu_clock() might be implemented as:
   
         sched_clock_cpu(smp_processor_id())
   
       if the accuracy proves good enough - how far can TSC drift in a
       single jiffie when considering the filtering and idle hooks?
   
    [ mingo@elte.hu: various fixes and cleanups ]
   
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

Which is a bit too complex for me to work out.

Further information:
- the kvm thread which has the incorrect counter is the one that 
actually executes guest code
- this thread mostly sleeps in schedule(), as one would expect
- it is periodically woken up by a timer; perhaps the problem is that 
the process is sampled using the same timer, so it always shows as 
running (though I'd expect it to report 100% cpu in that case).

Any help will be appreciated (or provided).

-- 
error compiling committee.c: too many arguments to function

--

From: Parag Warudkar
Date: Sunday, August 31, 2008 - 11:09 am

Busted process accounting - This looks the same as
http://bugzilla.kernel.org/show_bug.cgi?id=11209 .
Please verify. Peter's patch in latest git stops showing "incorrect
looking" CPU usage but at least the process times are still wrong,
horribly.
In fact the CPU usage thing in -rc5 is likely also incorrect but I
need to analyze that bit a little more.

From Today's Git -

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND


That patch sounds like it had open questions?
Really giving this is a long standing bad regression, all the
offending patches should be reverted in absence of a fix, no?

Parag
--

From: Avi Kivity
Date: Monday, September 1, 2008 - 2:16 am

Yes, it looks very similar.  In my tests, %CPU is consistent with TIME; 
it's just not consistent with what's actually happening and with the 
global statistics.


-- 
error compiling committee.c: too many arguments to function

--

From: Peter Zijlstra
Date: Monday, September 1, 2008 - 7:58 am

Avi, the below fixes it for me..


---
Subject: sched_clock: fix NOHZ interaction
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon Sep 01 16:44:23 CEST 2008

If HLT stops the TSC, we'll fail to account idle time, thereby inflating the
actual process times. Fix this by re-calibrating the clock against GTOD when 
leaving nohz mode.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/time/tick-sched.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -162,6 +162,8 @@ void tick_nohz_stop_idle(int cpu)
 		ts->idle_lastupdate = now;
 		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
 		ts->idle_active = 0;
+
+		sched_clock_idle_wakeup_event(0);
 	}
 }
 
@@ -177,6 +179,7 @@ static ktime_t tick_nohz_start_idle(stru
 	}
 	ts->idle_entrytime = now;
 	ts->idle_active = 1;
+	sched_clock_idle_sleep_event();
 	return now;
 }
 


--

From: Avi Kivity
Date: Monday, September 1, 2008 - 9:17 am

For me too.  Thanks!

-- 
error compiling committee.c: too many arguments to function

--

From: Ingo Molnar
Date: Friday, September 5, 2008 - 9:13 am

applied to tip/sched/urgent - thanks!

	Ingo
--

Previous thread: [PATCH] sunrpc - fixup userspace buffer possible overrun v3 by Cyrill Gorcunov on Sunday, August 31, 2008 - 8:25 am. (3 messages)

Next thread: [RFC: 2.6 patch] remove the broken i82443bxgx_edac driver by Adrian Bunk on Sunday, August 31, 2008 - 9:10 am. (6 messages)