"The aim of these four patches is to introduce Virtual Machine time accounting," began Laurent Vivier. He described the first two patches as:
"1) As recent CPUs introduce a third running state, after 'user' and 'system', we need a new field, 'guest', in cpustat to store the time used by the CPU to run virtual CPU. Modify /proc/stat to display this new field.
"2) Like for cpustat, introduce the 'gtime' (guest time of the task) and 'cgtime' (guest time of the task children) fields for the tasks. Modify signal_struct and task_struct. Modify /proc/<pid>/stat to display these new fields."
Both Ingo Molnar and Rik van Riel responded favorably to the patches. Ingo replied, "the concept certainly looks sane to me," adding, "I'd suggest inclusion into 2.6.24." When Ingo asked whether the new field at the end of the line could break utilities such as top or ps, Rik answered that it should not: "We have added numbers to the cpu lines in /proc/stat since early 2.6. All the programs parsing /proc/stat should just scan for a number of numbers from the start of the line, without trying to scan for the terminating newline."
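As a rough illustration of the parsing style Rik describes, here is a minimal sketch in C of a reader that scans numbers from the start of the aggregate "cpu" line and ignores whatever follows. The struct and function names (cpu_times, parse_cpu_line) are hypothetical and are not taken from procps or from the patches. The field order is the usual /proc/stat order, with "guest" assumed to be appended last, as the patch description states.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container; members follow the order of the "cpu" line. */
struct cpu_times {
    unsigned long long user, nice, system, idle;
    unsigned long long iowait, irq, softirq, steal, guest;
};

/*
 * Parse the aggregate "cpu" line of /proc/stat.
 * Returns the number of numeric fields read, or 0 if this is not that line.
 * Fields an older kernel does not print stay at zero; fields a newer kernel
 * appends after "guest" are simply never scanned.
 */
int parse_cpu_line(const char *line, struct cpu_times *t)
{
    int n;

    memset(t, 0, sizeof(*t));

    /* Require "cpu " so that per-CPU lines such as "cpu0" are rejected. */
    if (strncmp(line, "cpu ", 4) != 0)
        return 0;

    n = sscanf(line + 4, "%llu %llu %llu %llu %llu %llu %llu %llu %llu",
               &t->user, &t->nice, &t->system, &t->idle,
               &t->iowait, &t->irq, &t->softirq, &t->steal, &t->guest);

    return n < 0 ? 0 : n;
}
```

Because the scan stops once its own conversions are satisfied, a parser written this way reads the same values whether or not the kernel has added a field at the end of the line.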
From: Laurent Vivier [email blocked]
Subject: [RESEND][PATCH 0/4] Virtual Machine Time Accounting
Date: Mon, 10 Sep 2007 14:02:37 +0200
Ingo, please, could you have a look to these patches ?
The aim of these four patches is to introduce Virtual Machine time accounting.
[PATCH 1/4] as recent CPUs introduce a third running state, after "user" and
"system", we need a new field, "guest", in cpustat to store the time used by
the CPU to run virtual CPU. Modify /proc/stat to display this new field.
[PATCH 2/4] like for cpustat, introduce the "gtime" (guest time of the task) and
"cgtime" (guest time of the task children) fields for the
tasks. Modify signal_struct and task_struct. Modify /proc/<pid>/stat to display
these new fields.
[PATCH 3/4] modify account_system_time() to add cputime to cpustat->guest if we
are running a VCPU. We add this cputime to cpustat->user instead of
cpustat->system because this part of KVM code is in fact user code although it
is executed in the kernel. We duplicate VCPU time between guest and user to
allow an unmodified "top(1)" to display correct value. A modified "top(1)" is
able to display good cpu user time and cpu guest time by subtracting cpu guest
time from cpu user time. Update "gtime" and "cgtime" in signal_struct and
task_struct accordingly.
[PATCH 4/4] Modify KVM to update guest time accounting.
Signed-off-by: Laurent Vivier [email blocked]
--
------------- Laurent.Vivier --------------
"Software is hard" - Donald Knuth
From: Ingo Molnar [email blocked]
Subject: Re: [RESEND][PATCH 0/4] Virtual Machine Time Accounting
Date: Mon, 10 Sep 2007 14:07:58 +0200
* Laurent Vivier wrote:
> Ingo, please, could you have a look to these patches ?
>
> The aim of these four patches is to introduce Virtual Machine time
> accounting.
>
> [PATCH 1/4] as recent CPUs introduce a third running state, after
> "user" and "system", we need a new field, "guest", in cpustat to store
> the time used by the CPU to run virtual CPU. Modify /proc/stat to
> display this new field.
the concept certainly looks sane to me.
The heavy-handed use of #ifdefs uglifies the code to a large degree, but
this is not a fundamental problem: since basically all distros have KVM
enabled (and lguest benefits from this too), could you just make all
this new code unconditional?
Ingo
From: Laurent Vivier [email blocked]
Subject: [RESEND 2][PATCH 0/4] Virtual Machine Time Accounting
Date: Mon, 10 Sep 2007 16:12:39 +0200
This new version remove conditional compilation on GUEST_ACCOUNTING.
----------
The aim of these four patches is to introduce Virtual Machine time accounting.
[PATCH 1/4] as recent CPUs introduce a third running state, after "user" and
"system", we need a new field, "guest", in cpustat to store the time used by
the CPU to run virtual CPU. Modify /proc/stat to display this new field.
[PATCH 2/4] like for cpustat, introduce the "gtime" (guest time of the task) and
"cgtime" (guest time of the task children) fields for the
tasks. Modify signal_struct and task_struct. Modify /proc/<pid>/stat to display
these new fields.
[PATCH 3/4] modify account_system_time() to add cputime to cpustat->guest if we
are running a VCPU. We add this cputime to cpustat->user instead of
cpustat->system because this part of KVM code is in fact user code although it
is executed in the kernel. We duplicate VCPU time between guest and user to
allow an unmodified "top(1)" to display correct value. A modified "top(1)" is
able to display good cpu user time and cpu guest time by subtracting cpu guest
time from cpu user time. Update "gtime" and "cgtime" in signal_struct and
task_struct accordingly.
[PATCH 4/4] Modify KVM to update guest time accounting.
Signed-off-by: Laurent Vivier [email blocked]
--
------------- Laurent.Vivier --------------
"Software is hard" - Donald Knuth
From: Ingo Molnar [email blocked]
Subject: Re: [RESEND 2][PATCH 0/4] Virtual Machine Time Accounting
Date: Mon, 10 Sep 2007 21:41:43 +0200
* Laurent Vivier wrote:
> This new version remove conditional compilation on GUEST_ACCOUNTING.
excellent! For all 4 patches:
Acked-by: Ingo Molnar [email blocked]
i'd suggest inclusion into 2.6.24.
can the /proc change break anything? Any old procps version perhaps?
Ingo
From: Laurent Vivier [email blocked]
Subject: Re: [RESEND 2][PATCH 0/4] Virtual Machine Time Accounting
Date: Tue, 11 Sep 2007 11:38:48 +0200
Ingo Molnar wrote:
> * Laurent Vivier wrote:
>
>> This new version remove conditional compilation on GUEST_ACCOUNTING.
>
> excellent! For all 4 patches:
>
> Acked-by: Ingo Molnar [email blocked]
>
> i'd suggest inclusion into 2.6.24.
Thank you.
> can the /proc change break anything? Any old procps version perhaps?
I've tested top and ps from procps 3.0.5, 3.1.8, 3.1.14, 3.2.1 and 3.2.7 without
any problem.
And as values are read with a sscanf() by procps, I think adding a field at the
end of the line is not a problem.
For those who want to play, I've attached a patch to procps-3.2.7 to display
guest time in top.
Regards,
Laurent
--
------------- Laurent.Vivier --------------
"Software is hard" - Donald Knuth
From: Rik van Riel [email blocked]
Subject: Re: [RESEND 2][PATCH 0/4] Virtual Machine Time Accounting
Date: Tue, 11 Sep 2007 10:05:11 -0400
Ingo Molnar wrote:
> * Laurent Vivier wrote:
>
>> This new version remove conditional compilation on GUEST_ACCOUNTING.
>
> excellent! For all 4 patches:
>
> Acked-by: Ingo Molnar [email blocked]
>
> i'd suggest inclusion into 2.6.24.
>
> can the /proc change break anything? Any old procps version perhaps?
We have added numbers to the cpu lines in /proc/stat since
early 2.6. All the programs parsing /proc/stat should just
scan for a number of numbers from the start of the line, without
trying to scan for the terminating newline.
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
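The accounting rule in the PATCH 3/4 description (VCPU time is counted in both "user" and "guest", and a guest-aware tool subtracts one from the other) can be modelled in a few lines of userspace C. This is only a sketch of the arithmetic, not the kernel code: cpustat_model, time_kind, account_ticks and user_excluding_guest are hypothetical names chosen for illustration.

```c
/* Hypothetical userspace model of the three counters discussed above. */
struct cpustat_model {
    unsigned long long user;
    unsigned long long system;
    unsigned long long guest;
};

enum time_kind { TIME_USER, TIME_SYSTEM, TIME_VCPU };

/* Charge `ticks` to the model according to what the CPU was doing. */
void account_ticks(struct cpustat_model *s, enum time_kind kind,
                   unsigned long long ticks)
{
    switch (kind) {
    case TIME_USER:
        s->user += ticks;
        break;
    case TIME_SYSTEM:
        s->system += ticks;
        break;
    case TIME_VCPU:
        /* Duplicated on purpose: an unmodified tool still sees it as user time. */
        s->user += ticks;
        s->guest += ticks;
        break;
    }
}

/* What a guest-aware tool would show as user time with guest time removed. */
unsigned long long user_excluding_guest(const struct cpustat_model *s)
{
    return s->user >= s->guest ? s->user - s->guest : 0;
}
```

With 5 ticks of ordinary user time, 3 of system time and 7 of VCPU time, the model holds user = 12, system = 3 and guest = 7. An unmodified tool would report 12 ticks of user time, while a guest-aware one would report 5 user and 7 guest.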
Bloat?
From Ingo Molnar:
>>since basically all distros have KVM enabled (and lguest benefits from this too), could you just make all this new code unconditional?<<
Uh? IMO embedded distros won't use virtualisation, not only to reduce the size of the kernel but also because less code --> fewer bugs.
I hope that the virtualisation craziness will not bloat the kernel too much.
also noted in original lkml discussion
Avi Kivity highlighted the same concern. Laurent Vivier followed up and summarized the size increase from the patch, noting, "I'm going to repost patches without #ifdefs for readability. Then we could discuss if we should introduce #ifdefs and how."
What's the distinction?
I am not particularly familiar with this, but I have one basic question. Why is it necessary to distinguish between guest time and the existing fields?
The virtual machine appears to the supervisor kernel as a single process, right? Aren't the user time and system time adequate to store the execution time for that virtual machine process? What makes "guest time" so special?
Because a guest kernel is a
Because a guest kernel is a *kernel* in itself. It has its own scheduler, timers and a lot of pseudo stuff which a real kernel will have. Missing timing information can be dangerous, just as it can be for a normal kernel. Therefore it is necessary that the guest timing is consistent within the guest too, though the actual timing info can be a lot different from what it appears to be to anything inside a guest kernel.
Guest time
I can understand the use of internal variables to make sure the guest kernel operates correctly. What I do not understand is the external distinction between guest time and any other execution time the guest might consume.
Making such an arbitrary distinction visible in /proc would seem to cause more problems than it solves, if it solves any problem at all.