I have a simple way to reproduce this:
1. Boot the system, run top, confirm everything is normal
2. Run latencytop, and quit (I used version 0.5)
3. Run top, see 5124095h in the TIME column
For example:
6649 daemon 20 0 74500 8892 3796 S 6 0.2 0:00.03 debsecan
4255 root 20 0 8908 360 260 S 2 0.0 0:00.04 irqbalance
1 root 20 0 10332 692 580 S 0 0.0 5124095h init
The processes that get the 5124095h seem random, but there are plenty of
them (if you sort by 'T', the entire top display is filled with
processes running since 5124095h).
Best regards,
--Edwin
--
Indeed, and I don't even have CONFIG_LATENCYTOP set. It bisected to...
761b1d26df542fd5eb348837351e4d2f3bc7bffe is the first bad commit
commit 761b1d26df542fd5eb348837351e4d2f3bc7bffe
Author: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Date: Thu Nov 12 13:33:45 2009 +0900
sched: Fix granularity of task_u/stime()
Originally task_s/utime() were designed to return clock_t but
later changed to return cputime_t by following commit:
commit efe567fc8281661524ffa75477a7c4ca9b466c63
Author: Christian Borntraeger <borntraeger@de.ibm.com>
Date: Thu Aug 23 15:18:02 2007 +0200
It only changed the type of return value, but not the
implementation. As the result the granularity of task_s/utime()
is still that of clock_t, not that of cputime_t.
So using task_s/utime() in __exit_signal() makes values
accumulated to the signal struct to be rounded and coarse
grained.
This patch removes casts to clock_t in task_u/stime(), to keep
granularity of cputime_t over the calculation.
v2:
Use div_u64() to avoid error "undefined reference to `__udivdi3`"
on some 32bit systems.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: xiyou.wangcong@gmail.com
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
LKML-Reference: <4AFB9029.9000208@jp.fujitsu.com>
git bisect start
# good: [7f5e918e62cbc9ac27c2f47d3c3dd4b86f67ff0e] Linux 2.6.32.9
git bisect good 7f5e918e62cbc9ac27c2f47d3c3dd4b86f67ff0e
# bad: [baac35c4155a8aa826c70acee6553368ca5243a2] security: fix error return path in ima_inode_alloc
git bisect bad baac35c4155a8aa826c70acee6553368ca5243a2
# bad: [baac35c4155a8aa826c70acee6553368ca5243a2] security: fix error return path in ima_inode_alloc
git bisect bad baac35c4155a8aa826c70acee6553368ca5243a2
# bad: ...Oh lovely, and reverting that cures the issue? --
Haven't done so yet, will do later this afternoon. -Mike --
Wow, it's easy to reproduce. I'll check it later... Thanks, H.Seto --
Quick report:
The reason why this commit have bisected is because it changed
the type of time values from signed clock_t to unsigned cputime_t,
so that the following if-block become to be always taken:
> - stime = nsec_to_clock_t(p->se.sum_exec_runtime) -
> - cputime_to_clock_t(task_utime(p));
> + stime = nsecs_to_cputime(p->se.sum_exec_runtime) - task_utime(p);
> - p->prev_stime = max(p->prev_stime, clock_t_to_cputime(stime));
> + p->prev_stime = max(p->prev_stime, stime);
>
> return p->prev_stime;
From strace of latancytop, it does write to /proc/<pid>/sched:
5891 open("/proc/1/sched", O_RDWR) = 5
5891 fstat(5, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
5891 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7fc6668f3000
5891 read(5, "init (1, #threads: 1)\n----------"..., 1024) = 776
5891 close(5) = 0
It results in:
[kernel/sched_debug.c]
void proc_sched_set_task(struct task_struct *p)
{
:
p->se.sum_exec_runtime = 0;
p->se.prev_sum_exec_runtime = 0;
p->nvcsw = 0;
p->nivcsw = 0;
}
So soon some task will have great (in fact negative) stime.
There would be no doubt that this initialize in sched_debug.c
will break monotonicity of sum_exec_runtime. I confirmed that
the issue is disappeared by comment-out of lines above.
Reverting the bisected commit is wrong solution, because it
will bring another issue, i.e. lost of runtime, and u/stime
seems to be frozen because these values restart from 0 so
prev_* is used for a while.
How to fix? Is this a bug of latencytop? Kernel?
Please comment.
Thanks,
H.Seto
--
I looked at it in tip this morning (where task_stime doesn't exist), and did the below to swat the underflow. Seems ok, but I haven't yet enabled CONFIG_LATENCYTOP. diff --git a/kernel/sched.c b/kernel/sched.c index 81575f1..cdf4fa4 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -3366,6 +3366,7 @@ void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st) * Compare with previous values, to keep monotonicity: */ p->prev_utime = max(p->prev_utime, utime); + rtime = max(rtime, p->prev_utime); p->prev_stime = max(p->prev_stime, cputime_sub(rtime, p->prev_utime)); *ut = p->prev_utime; @@ -3396,6 +3397,7 @@ void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st) utime = rtime; sig->prev_utime = max(sig->prev_utime, utime); + rtime = max(rtime, sig->prev_utime); sig->prev_stime = max(sig->prev_stime, cputime_sub(rtime, sig->prev_utime)); --
I think the correct fix is to not allow resetting of those 4 fields, as they're used elsewhere in kernel. Latencytop doesn't seem to miss being allowed to scribble. Dunno why it wants to reset any of these though. sched: fix proc_sched_set_task() Latencytop clearing sum_exec_runtime via proc_sched_set_task() breaks task_times(). Other places in kernel use nvcsw and nivcsw, which are being cleared as well, Clear task statistics only. Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Reported-by: Török Edwin <edwintorok@gmail.com> LKML-Reference: <new-submission> diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c index 8a46a71..ea2c690 100644 --- a/kernel/sched_debug.c +++ b/kernel/sched_debug.c @@ -491,8 +491,4 @@ void proc_sched_set_task(struct task_struct *p) #ifdef CONFIG_SCHEDSTATS memset(&p->se.statistics, 0, sizeof(p->se.statistics)); #endif - p->se.sum_exec_runtime = 0; - p->se.prev_sum_exec_runtime = 0; - p->nvcsw = 0; - p->nivcsw = 0; } --
Urgh, that code should really die, bugger that latencytop relies on that though. Arjan anything you can do to cure latencytop of this madness? --
I killed those 4 fields without _seeming_ to upset it. -Mike --
OK, if Arjan doesn't complain I'll take it ;-) --
