Re: [PATCH v2] pm: Add runtime PM statistics

From: Arjan van de Ven
Date: Saturday, July 10, 2010 - 9:52 am

From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH v2] pm: Add runtime PM statistics

In order for PowerTOP to be able to report how well the new runtime PM is working
for the various drivers, the kernel needs to export some basic statistics in sysfs.

This patch adds two sysfs files in the runtime PM domain that expose the
total time a device has been active, and the time a device has been suspended.

With this PowerTOP can compute the activity percentage

Active %age = 100 * (delta active) / (delta active + delta suspended)

and present the information to the user.

I've written the PowerTOP code (slated for version 1.12) already, and the output looks
like this:

Runtime Device Power Management statistics
Active  Device name
 10.0%	06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller 



[version 2: fix stat update bugs noticed by Alan Stern]

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index b0ec0e9..b78c401 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -123,6 +123,45 @@ int pm_runtime_idle(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(pm_runtime_idle);
 
+
+/**
+ * update_pm_runtime_accounting - Update the time accounting of power states
+ * @dev: Device to update the accounting for
+ *
+ * In order to be able to have time accounting of the various power states
+ * (as used by programs such as PowerTOP to show the effectiveness of runtime
+ * PM), we need to track the time spent in each state.
+ * update_pm_runtime_accounting must be called each time before the
+ * runtime_status field is updated, to account the time in the old state
+ * correctly.
+ */
+void update_pm_runtime_accounting(struct device *dev)
+{
+	unsigned long now = jiffies;
+	int delta;
+
+	delta = now - dev->power.accounting_timestamp;
+
+	if (delta < 0)
+		delta = ...
From: Rafael J. Wysocki
Date: Sunday, July 11, 2010 - 2:26 pm

On second thought, "active_time" and "suspended_time" should be sufficient
(i.e. the "runtime_" prefix is not really necessary).

Rafael
--

From: Arjan van de Ven
Date: Sunday, July 11, 2010 - 10:16 pm

On Sun, 11 Jul 2010 23:26:07 +0200

it's not necessary but it's consistent with the others... so yes
I can change it but then it's no longer consistent naming.. are you sure
you want this changed?


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Rafael J. Wysocki
Date: Tuesday, July 13, 2010 - 2:28 pm

No, you're right, sorry.

But can you rebase your patch on top of linux-next, please, and move the
definitions of the new attributes next to 'control' and 'runtime_status' (so
that they don't depend on 'debug')?

Rafael
--

From: Arjan van de Ven
Date: Thursday, July 15, 2010 - 8:44 am

From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH v3] pm: Add runtime PM statistics

In order for PowerTOP to be able to report how well the new runtime PM is
working for the various drivers, the kernel needs to export some basic
statistics in sysfs.

This patch adds two sysfs files in the runtime PM domain that expose the
total time a device has been active, and the time a device has been
suspended.

With this PowerTOP can compute the activity percentage

Active %age = 100 * (delta active) / (delta active + delta suspended)

and present the information to the user.

I've written the PowerTOP code (slated for version 1.12) already, and the
output looks like this:

Runtime Device Power Management statistics
Active  Device name
  10.0%    06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller


[version 2: fix stat update bugs noticed by Alan Stern]
[version 3: rebase to -next and move the sysfs declaration]

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index b0ec0e9..b78c401 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -123,6 +123,45 @@ int pm_runtime_idle(struct device *dev)
  }
  EXPORT_SYMBOL_GPL(pm_runtime_idle);

+
+/**
+ * update_pm_runtime_accounting - Update the time accounting of power states
+ * @dev: Device to update the accounting for
+ *
+ * In order to be able to have time accounting of the various power states
+ * (as used by programs such as PowerTOP to show the effectiveness of runtime
+ * PM), we need to track the time spent in each state.
+ * update_pm_runtime_accounting must be called each time before the
+ * runtime_status field is updated, to account the time in the old state
+ * correctly.
+ */
+void update_pm_runtime_accounting(struct device *dev)
+{
+    unsigned long now = jiffies;
+    int delta;
+
+    delta = now - ...
From: Kevin Hilman
Date: Thursday, August 5, 2010 - 4:20 pm

By using jiffies, I think we might miss events in drivers that are doing
runtime PM transitions in short bursts.  On embedded systems with slow
HZ, there could potentially be lots of transitions between ticks.

It would be nicer to use clocksource-based time so transitions between
jiffies could still be factored into the accounting.

Kevin


--

From: Rafael J. Wysocki
Date: Thursday, August 5, 2010 - 4:45 pm

Patch please?

Rafael
--

From: Arjan van de Ven
Date: Thursday, August 5, 2010 - 4:59 pm

you're absolutely right that the current mechanism is more "sampling
accuracy" (similar to most /proc info that shows up with top and such).

on the "slow HZ".. there is no longer a valid reason not to set HZ to
1000... so we'll get a 1 msec sampling rate basically.

the problem with a more accurate clocksource is that it's expensive. And 
more... the path to such clocksource itself might be subject to power 
management ;-)

--

From: Kevin Hilman
Date: Friday, August 6, 2010 - 4:37 pm

Probably, especially with tickless idle, but not so sure there is total

What about using read_persistent_clock() then?  Then the arch/platform
definition of this will determine the max sampling rate.

Kevin



--
