v2:
Rebased off of Thomas Renninger's patch for cgroups_cpuacct refactoring,
which is based off of Linus's tree. Thomas, it might be easier to merge our
patches if you take these patches and put them in a series on top of your
original patches (presuming there are no objections to your patch).
This patch series introduces cpu frequency and power tracking for cpuacct
cgroups. A similar patch set was discussed a while back and it was concluded
that due to varying architectures (ppc, x86 with overboot) you cannot account
for frequencies and their power consumption generically in sched.c, thus we
have platform specific hooks that cpuacct can call into (if available).
This patch series is now 3 patches instead of 4. I have left out the power
implementation for OMAP due to implementation conflicts in linux-next.
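For readers unfamiliar with the "call into (if available)" pattern, here is a
minimal userspace sketch of it. This is not the kernel code from the patches:
struct charge_hooks, hooks_register(), account_cputime(), report_power() and
the demo_* names are all hypothetical stand-ins, loosely modelled on
struct cpuacct_charge_calls and cpuacct_charge_register().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mock of an optional platform callback table. */
struct charge_hooks {
	void (*charge)(void *data, uint64_t cputime, unsigned int cpu);
	uint64_t (*power_usage)(void *data);	/* may be NULL */
};

/* NULL until a platform registers its hooks. */
static const struct charge_hooks *registered_hooks;

/* Register the platform hooks; returns 0 on success, -1 on a NULL table. */
int hooks_register(const struct charge_hooks *h)
{
	if (!h)
		return -1;
	registered_hooks = h;
	return 0;
}

/* Generic side: only calls into the platform if a hook is available. */
void account_cputime(void *data, uint64_t cputime, unsigned int cpu)
{
	if (registered_hooks && registered_hooks->charge)
		registered_hooks->charge(data, cputime, cpu);
}

/* Reports 0 when the platform provides no power callback. */
uint64_t report_power(void *data)
{
	if (registered_hooks && registered_hooks->power_usage)
		return registered_hooks->power_usage(data);
	return 0;
}

/* Example platform side: accumulates cputime into a caller-owned counter. */
static void demo_charge(void *data, uint64_t cputime, unsigned int cpu)
{
	(void)cpu;	/* a real platform would look up this cpu's frequency */
	*(uint64_t *)data += cputime;
}

/* This demo platform implements charge only and leaves power_usage unset. */
const struct charge_hooks demo_hooks = {
	.charge = demo_charge,
	.power_usage = NULL,
};
```

The generic code never assumes a hook exists, so a platform that implements
only frequency tracking still works and simply reports no power.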
Mike Chan (3):
scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU
frequencies
scheduler: cpuacct: Enable platform callbacks for cpuacct power
tracking
omap: cpu: Implement callbacks for cpu frequency tracking in cpuacct
Documentation/cgroups/cpuacct.txt | 7 ++++
arch/arm/plat-omap/cpu-omap.c | 67 +++++++++++++++++++++++++++++++++++-
include/linux/cpuacct.h | 43 +++++++++++++++++++++++
kernel/cgroup_cpuaccount.c | 69 +++++++++++++++++++++++++++++++++++++
4 files changed, 185 insertions(+), 1 deletions(-)
create mode 100644 include/linux/cpuacct.h
Signed-off-by: Mike Chan <mike@android.com>
--
V2:
- Rebased off Thomas Renninger's cgroup_cpuacct refactoring
Platform must register a cpu power function that returns power in
milliWatt seconds.
New file:
cpuacct.power reports the power consumed in milliWatt seconds
Signed-off-by: Mike Chan <mike@android.com>
---
Documentation/cgroups/cpuacct.txt | 3 +++
include/linux/cpuacct.h | 4 +++-
kernel/cgroup_cpuaccount.c | 24 ++++++++++++++++++++++--
3 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/Documentation/cgroups/cpuacct.txt b/Documentation/cgroups/cpuacct.txt
index 600d2d0..84e471b 100644
--- a/Documentation/cgroups/cpuacct.txt
+++ b/Documentation/cgroups/cpuacct.txt
@@ -44,6 +44,9 @@ cpuacct.cpufreq file gives CPU time (in nanoseconds) spent at each CPU
frequency. Platform hooks must be implemented inorder to properly track
time at each CPU frequency.
+cpuacct.power file gives CPU power consumed (in milliWatt seconds). Platform
+must provide and implement power callback functions.
+
cpuacct controller uses percpu_counter interface to collect user and
system times. This has two side effects:
diff --git a/include/linux/cpuacct.h b/include/linux/cpuacct.h
index 6205d29..c17a634 100644
--- a/include/linux/cpuacct.h
+++ b/include/linux/cpuacct.h
@@ -31,7 +31,9 @@ struct cpuacct_charge_calls {
*/
void (*init) (void **cpuacct_data);
void (*charge) (void *cpuacct_data, u64 cputime, unsigned int cpu);
- void (*show) (void *cpuacct_data, struct cgroup_map_cb *cb);
+ void (*cpufreq_show) (void *cpuacct_data, struct cgroup_map_cb *cb);
+ /* Returns power consumed in milliWatt seconds */
+ u64 (*power_usage) (void *cpuacct_data);
};
int cpuacct_charge_register(struct cpuacct_charge_calls *fn);
diff --git a/kernel/cgroup_cpuaccount.c b/kernel/cgroup_cpuaccount.c
index 11799a7..d9bf889 100644
--- a/kernel/cgroup_cpuaccount.c
+++ b/kernel/cgroup_cpuaccount.c
@@ -226,12 +226,28 @@ static int cpuacct_cpufreq_show(struct cgroup *cgrp, struct cftype *cft,
...
Implement OMAP platform specific scheduler callbacks for tracking cpu
frequencies per cpuacct cgroup.
Signed-off-by: Mike Chan <mike@android.com>
---
arch/arm/plat-omap/cpu-omap.c | 67 ++++++++++++++++++++++++++++++++++++++++-
1 files changed, 66 insertions(+), 1 deletions(-)
diff --git a/arch/arm/plat-omap/cpu-omap.c b/arch/arm/plat-omap/cpu-omap.c
index 6d3d333..176417a 100644
--- a/arch/arm/plat-omap/cpu-omap.c
+++ b/arch/arm/plat-omap/cpu-omap.c
@@ -21,6 +21,8 @@
#include <linux/err.h>
#include <linux/clk.h>
#include <linux/io.h>
+#include <linux/slab.h>
+#include <linux/cpuacct.h>
#include <mach/hardware.h>
#include <plat/clock.h>
@@ -38,6 +40,10 @@ static struct cpufreq_frequency_table *freq_table;
static struct clk *mpu_clk;
+#ifdef CONFIG_CGROUP_CPUACCT
+static int freq_index;
+#endif
+
/* TODO: Add support for SDRAM timing changes */
int omap_verify_speed(struct cpufreq_policy *policy)
@@ -96,6 +102,11 @@ static int omap_target(struct cpufreq_policy *policy,
freqs.old, freqs.new);
#endif
ret = clk_set_rate(mpu_clk, freqs.new * 1000);
+#ifdef CONFIG_CGROUP_CPUACCT
+ /* Update freq_index before cpufreq transition post notification. */
+ cpufreq_frequency_table_target(policy, freq_table, freqs.new,
+ CPUFREQ_RELATION_L, &freq_index);
+#endif
cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
return ret;
@@ -125,7 +136,14 @@ static int __init omap_cpu_init(struct cpufreq_policy *policy)
policy->cpuinfo.max_freq = clk_round_rate(mpu_clk,
VERY_HI_RATE) / 1000;
}
-
+#ifdef CONFIG_CGROUP_CPUACCT
+ /*
+ * Update freq_index, since we are using the extact frequency
+ * we can use any relation.
+ */
+ cpufreq_frequency_table_target(policy, freq_table, policy->cur,
+ CPUFREQ_RELATION_L, &freq_index);
+#endif
/* FIXME: what's the actual transition time? */
policy->cpuinfo.transition_latency = 300 * 1000;
@@ -169,3 +187,50 @@ arch_initcall(omap_cpufreq_init);
* ...
Hi Mike,
thanks.
A general comment:
I don't know much about the cgroup stuff.
I am also not sure how exactly power can be measured on this arch based on
frequency accounting (there also were some threads I was not aware of?)
A Signed-off-by or Reviewed-by from someone who is more involved in this omap
stuff would probably not be that bad.
Still I like this interface and I could imagine others hooking into it as
well, for whatever has to be cpu cgroup accounted.
My two cents...,
Thomas
--
If you know how much time was spent at each frequency executing code, you
can calculate how much power was consumed if the platform (with hooks)
provides power numbers (in milliWatts) for the power at frequency X.
I did some initial testing on Motorola Droid comparing to a power OMAP was
the closest with mainline support I could provide an example how to use
these hooks. I'm hoping for some blessing from some people on the linux-omap
list for that.
However, can we possibly just stack the first two patches to get the API in?
This will make it easier to fix up the omap hooks if they don't get in.
--
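To make the arithmetic above concrete, here is a minimal userspace sketch of
the calculation: multiply the time spent at each frequency by the power drawn
at that frequency and sum the results. struct freq_sample and energy_mws() are
hypothetical names for illustration, not part of the patches, and any numbers
fed to it are made up rather than measured.

```c
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical per-frequency record: time spent there and power drawn. */
struct freq_sample {
	unsigned int freq_khz;	/* CPU frequency, informational only */
	uint64_t time_ns;	/* time spent at this frequency, in ns */
	uint32_t power_mw;	/* platform-provided power at this frequency */
};

/*
 * Sum power * time over all frequencies and return milliWatt seconds.
 * The sum is kept in mW * ns and divided once at the end to limit
 * rounding loss. At 1000 mW a u64 accumulator lasts roughly 200 days
 * before overflowing, so a long-running counter would need more care.
 */
uint64_t energy_mws(const struct freq_sample *s, size_t n)
{
	uint64_t total_mw_ns = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total_mw_ns += (uint64_t)s[i].power_mw * s[i].time_ns;

	return total_mw_ns / NSEC_PER_SEC;
}
```

For example, 2 s at 100 mW plus 0.5 s at 400 mW gives 200 + 200 = 400
milliWatt seconds.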
This looks like a great enhancement to me.
Speaking for OMAP PM... I'd suggest getting the generic stuff upstream (or
into -next) soon and then work out the OMAP specifics after.
Since the OMAP OPP layer is going through some churn (but stabilizing and
will be submitted for 2.6.36), I'd suggest we queue the OMAP-specific parts
of this along with the OPP layer changes.
Kevin
--
On Fri, May 21, 2010 at 10:05 AM, Kevin Hilman wrote:
So it looks like there are no objections to this API and I'm OK with
dropping the omap hooks for now until things are settled in 2.6.36.
So are things good with Thomas' re-factoring patch for cpuacct as well as
the first 2 patches?
--
