Re: [linux-pm] [PATCH] cpuidle: avoid singing capacitors

Previous thread: Re: 100% C0 with 2.6.25-rc by Venki Pallipadi on Friday, February 29, 2008 - 2:24 pm. (2 messages)

Next thread: 2008/3/1 ∵◆■加入成為我的 by 謝雅嵐 on Friday, February 29, 2008 - 2:54 pm. (1 message)
To: Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>
Cc: <linux-pm@...>, LKML <linux-kernel@...>
Date: Friday, February 29, 2008 - 2:38 pm

Many devices today are of a less than stellar quality, and singing
transistors are a common problem. A high-pitch noise is created, caused
by power fluctuations as the processor enters and leaves deep sleep at
a high frequency.

Instead of just disabling the deep sleep (which wastes power). This
patch merely reduces the number of times it is entered so that the
frequency doesn't exceed 500 Hz. That should make sure the problem is
inaudible.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
--

The basic idea is above, but the implementation doesn't quite work. It seems jiffies is not a good time source in this scenario. The time spent in C3 takes a much bigger hit than I expect. The fact that powertop says that it has an average residency of 200 ms in C2 tells me that those should have been spent in C3.

So, pointers on what else to do?

(Patch also needs an on/off switch, but that's a later problem)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 78d77c5..171d838 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -16,6 +16,12 @@

#define BREAK_FUZZ 4 /* 4 us */

+/*
+ * The minimum number of ticks needed to not oscillate faster than
+ * 500 Hz.
+ */
+#define MIN_DEEP_INTERVAL (HZ / 500)
+
struct menu_device {
int last_state_idx;

@@ -23,6 +29,8 @@ struct menu_device {
unsigned int predicted_us;
unsigned int last_measured_us;
unsigned int elapsed_us;
+
+ unsigned long last_deep_jif;
};

static DEFINE_PER_CPU(struct menu_device, menu_devices);
@@ -50,9 +58,16 @@ static int menu_select(struct cpuidle_device *dev)
break;
if (s->exit_latency > pm_qos_requirement(PM_QOS_CPU_DMA_LATENCY))
break;
+ if ((dev->states[i].flags & CPUIDLE_FLAG_DEEP) &&
+ time_before_eq(jiffies, data->last_deep_ji...

To: Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>
Cc: <linux-pm@...>, LKML <linux-kernel@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Monday, March 3, 2008 - 4:18 pm

Many devices today are of a less than stellar quality, and singing
capacitors are a common problem. A high-pitch noise is created, caused
by power fluctuations as the processor enters and leaves deep sleep at
a high frequency.

Instead of just disabling the deep sleep (which wastes power). This
patch merely reduces the number of times it is entered so that the
frequency doesn't exceed 500 Hz. That should make sure the problem is
inaudible.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
--

I've been playing around with this for a bit, and the jiffies approach
just had too many gotchas. So I tried using the information that was
already present in the menu governor. This is what I came up with.

It solves the primary problem of getting rid of the noise. I've also
tried some crude power measurements and I couldn't get any difference
with and without the patch. Powertop also nicely shows that the CPU is
still spending almost all of its time in C3. Therefore, I'm letting the
effects of it be enabled by default.

So I'm taking the [RFC] off it. Please still test and see that it
solves the issue on other machines, and the it does not cause any big
power surges.

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 78d77c5..d9c43e3 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -16,6 +16,8 @@

#define BREAK_FUZZ 4 /* 4 us */

+static unsigned int min_deep_sleep = 2000;
+
struct menu_device {
int last_state_idx;

@@ -50,6 +52,19 @@ static int menu_select(struct cpuidle_device *dev)
break;
if (s->exit_latency > pm_qos_requirement(PM_QOS_CPU_DMA_LATENCY))
break;
+
+ /*
+ * In order to avoid the problem of "singing capacitors",
+ * don't enter a deep sleep for short durations (by
+ * default, nothing shorter than 2 ms). This will,
+ ...

To: Pierre Ossman <drzeus-list@...>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>, <linux-pm@...>, LKML <linux-kernel@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Monday, March 3, 2008 - 4:46 pm

Well, why not, but I believe we should default to old behaviour... not
all machines are cheaply-build.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

To: Pavel Machek <pavel@...>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>, <linux-pm@...>, LKML <linux-kernel@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Monday, March 3, 2008 - 5:03 pm

On Mon, 3 Mar 2008 21:46:03 +0100

One would hope. ;)

But the problem is that most people will not be able to find this
option (or even know such an option exists). I'd guess the distros will
just end up having this on by default anyway. And since I could not
measure any extra power drain, I believe it's hard to justify having it
off by default (more than by pure principle).

(Then there's also the whole "But Windows doesn't have this problem!"
line of reasoning...)

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Pierre Ossman <drzeus-list@...>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>, <linux-pm@...>, LKML <linux-kernel@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Monday, March 3, 2008 - 5:08 pm

So just leave it off by default, and let distros break their own
kernels ;-).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

To: Pavel Machek <pavel@...>, Pierre Ossman <drzeus-list@...>
Cc: Adam Belay <abelay@...>, <linux-pm@...>, LKML <linux-kernel@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Monday, March 3, 2008 - 5:14 pm

I prefer leaving it off my default and enabling it on faulty hardware by
some blacklist.

Thanks,
Venki
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Pavel Machek <pavel@...>, Adam Belay <abelay@...>, <linux-pm@...>, LKML <linux-kernel@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Monday, March 3, 2008 - 5:17 pm

On Mon, 3 Mar 2008 13:14:28 -0800

But are these kind of shoddy components really trackable by model? I'd suspect it has more to do with the shipment of parts the day when the computer was manufactured.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Pierre Ossman <drzeus-list@...>
Cc: Pavel Machek <pavel@...>, Adam Belay <abelay@...>, <linux-pm@...>, LKML <linux-kernel@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Monday, March 3, 2008 - 6:04 pm

But, with this patch:
- we are penalizing good hardware and making them less power efficient
to match the bad ones.
- There may also be server systems which first may not have these sort
of power fluctuations and even when buggy and have this noise, system
may be in some corner of some lab with fans making more noise than the
capacitors.

Thanks,
Venki
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Pierre Ossman <drzeus-list@...>, Pavel Machek <pavel@...>, Adam Belay <abelay@...>, <linux-pm@...>, LKML <linux-kernel@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Monday, March 3, 2008 - 7:09 pm

The question is how much it would be penalized. Most likely the penalty
is very little. I would agree with Pierre's reasoning for enabling
it by default: distributions will enable it anyways and it's better
to have the same defaults in the mainline kernel as with distros.

It might be worth researching a good default value though. Perhaps
it can be a lot shorter than the current one, lessing the penalty
even more?

-Andi
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Pierre Ossman <drzeus-list@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>, <linux-pm@...>
Date: Monday, March 3, 2008 - 7:05 pm

Can you make it configurable through sysfs? Default to disabled, but
allow the user to turn it on if the machine makes too much noise.

Alan Stern

--

To: Alan Stern <stern@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Pierre Ossman <drzeus-list@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>, <linux-pm@...>
Date: Monday, March 3, 2008 - 7:10 pm

99+% of the users wouldn't be able to figure that out.

-Andi
--

To: Andi Kleen <andi@...>
Cc: Alan Stern <stern@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Pierre Ossman <drzeus-list@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>
Date: Tuesday, March 4, 2008 - 12:00 am

On Tue, Mar 04, 2008 at 12:10:33AM +0100, Andi Kleen wrote:
> On Mon, Mar 03, 2008 at 06:05:35PM -0500, Alan Stern wrote:
> > On Mon, 3 Mar 2008, Pallipadi, Venkatesh wrote:
> >
> > > But, with this patch:
> > > - we are penalizing good hardware and making them less power efficient
> > > to match the bad ones.
> > > - There may also be server systems which first may not have these sort
> > > of power fluctuations and even when buggy and have this noise, system
> > > may be in some corner of some lab with fans making more noise than the
> > > capacitors.
> >
> > Can you make it configurable through sysfs?
>
> It already is, through a writable module_parm()
>
> > Default to disabled, but
> > allow the user to turn it on if the machine makes too much noise.
>
> 99+% of the users wouldn't be able to figure that out.

99+% of users don't have singing capacitors. (Or don't care enough to complain)
For those that do can't figure out what to do from google,
we have a documentation problem.

Dave

--
http://www.codemonkey.org.uk
--

To: Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Pierre Ossman <drzeus-list@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>
Date: Tuesday, March 4, 2008 - 5:40 am

The big question is if there is a measurable difference in power consumption
from a reasonable default value. I doubt it actually. The normal rule
of thumb is that if you average sleeps are long enough (and they still
stay this way even with the patch enabled) you're doing ok. If there is
no or not significant penalty there is no reason to not enable it by default.

-Andi
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Tuesday, March 4, 2008 - 2:14 am

Many devices today are of a less than stellar quality, and singing
capacitors are a common problem. A high-pitch noise is created, caused
by power fluctuations as the processor enters and leaves deep sleep at
a high frequency.

Instead of just disabling the deep sleep (which wastes power), this
patch allows you to reduces the number of times it is entered so that
the frequency can be kept inaudible.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
--

I'm not religious about the default value, and since Dave Jones piped
in I guess one of my major arguments are gone. :)

Here's the same patch with the default set to 0 (effectively disabling
the patch).

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 78d77c5..d9c43e3 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -16,6 +16,8 @@

#define BREAK_FUZZ 4 /* 4 us */

+static unsigned int min_deep_sleep = 0;
+
struct menu_device {
int last_state_idx;

@@ -50,6 +52,19 @@ static int menu_select(struct cpuidle_device *dev)
break;
if (s->exit_latency > pm_qos_requirement(PM_QOS_CPU_DMA_LATENCY))
break;
+
+ /*
+ * In order to avoid the problem of "singing capacitors",
+ * don't enter a deep sleep for short durations (a value
+ * of 2 ms is usually sufficient). This will, hopefully,
+ * keep the problem inaudible.
+ */
+ if (s->flags & CPUIDLE_FLAG_DEEP) {
+ if (min_deep_sleep > data->expected_us)
+ break;
+ if (min_deep_sleep > data->predicted_us)
+ break;
+ }
}

data->last_state_idx = i - 1;
@@ -132,6 +147,9 @@ static void __exit exit_menu(void)
cpuidle_unregister_governo...

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Tuesday, March 4, 2008 - 1:19 pm

On Tue, 4 Mar 2008 07:14:23 +0100

Hold off on this. It turns out it still has major problems. The menu algorithm now and then gets really bad at predicting sleep times, completely killing this avoidance scheme.

(On that subject, does anyone except Adam understand that algorithm? Some comments wouldn't hurt...)

So for now, I'm back to thinking that measuring the interval between deep sleeps is the better approach. I could use some ideas for a good clock source though. I haven't dug much deeper than jiffies when it comes to kernel timekeeping.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Pierre Ossman <drzeus-list@...>
Cc: Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Tuesday, March 4, 2008 - 3:01 pm

Prediction is based on cumulative time till "non expected wakeup". So,
prediction will come into play only when there are very short wakeups

I think best solution is to use get_last_residency that is already
there. If the last residency or expected_us is very low, you can avoid
deep idle states. That way you don't have to depend on jiffies being
updated at the time you are checking it.

Thanks,
Venki
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Wednesday, March 5, 2008 - 2:02 am

On Tue, 4 Mar 2008 11:01:28 -0800

I tried using predicted_us and last_measured_us, and those didn't work (see the #if 0 code in my last patch). And since cpuidle_get_last_residency() is part of predicted_us, I don't think it is reporting useful values.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Pierre Ossman <drzeus-list@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Wednesday, March 5, 2008 - 4:40 am

On Wed, 5 Mar 2008 07:02:01 +0100

I take this back. They might be working just fine. It seems I've been looking at a too small piece of the puzzle. This machine has a dual core processor, and the governors control each core independently. Unfortunately it's the power fluctuations of the entire socket that causes noise, not just each processor.

So I need to build some global algorithm instead of one per core. Ideas are welcome.

From what I can tell, disabling one core makes the noise go away. So I guess both cores need to go into C3 (or perhaps one C2 and one C3) at the same time to cause the problem. I'm not 100% sure of this as the damn noise comes and goes, but I've been running for an hour or so now with one core disabled and without my anti-noise patch.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: <linux-pm@...>
Cc: Pierre Ossman <drzeus-list@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Wednesday, March 12, 2008 - 4:31 pm

[Empty message]
To: <unlisted-recipients@...>, <@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Thursday, March 6, 2008 - 4:27 am

On Wed, 5 Mar 2008 09:40:23 +0100

I'm currently trying this variant. No noise yet, but the time spent in C3 seems to take a massive hit. I'll see if I can get around to some kind of power measurement (ideas for doing that in a sane way are still welcome btw).

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 78d77c5..960aa39 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -16,6 +16,11 @@

#define BREAK_FUZZ 4 /* 4 us */

+static unsigned int min_deep_sleep = 0;
+static unsigned int failed_deep = 0;
+
+static unsigned long global_last_deep;
+
struct menu_device {
int last_state_idx;

@@ -23,6 +28,8 @@ struct menu_device {
unsigned int predicted_us;
unsigned int last_measured_us;
unsigned int elapsed_us;
+
+ unsigned long last_deep;
};

static DEFINE_PER_CPU(struct menu_device, menu_devices);
@@ -50,9 +57,51 @@ static int menu_select(struct cpuidle_device *dev)
break;
if (s->exit_latency > pm_qos_requirement(PM_QOS_CPU_DMA_LATENCY))
break;
+
+ /*
+ * In order to avoid the problem of "singing capacitors",
+ * don't enter a deep sleep for short durations (2 ms seems
+ * to do the trick). This will, hopefully, keep the problem
+ * inaudible.
+ */
+ if (min_deep_sleep && (s->flags & CPUIDLE_FLAG_DEEP)) {
+ if (time_before_eq(jiffies, data->last_deep +
+ HZ * min_deep_sleep / USEC_PER_SEC))
+ break;
+
+ rmb();
+
+ if (time_before_eq(jiffies, global_last_deep +
+ HZ * min_deep_sleep / USEC_PER_SEC))
+ break;
+
+#if 0
+ if (min_deep_sleep > data->expected_us)
+ break;
+ if (min_deep_sleep > cpuidle_get_last_residency(dev))
+ break;
+#endif
+#if 0
+ if (min_deep_sleep > data->predicted_us)
+ break;
+ if (min_deep_sleep > data->last_measured_us)
+ break;
+#endif
+ }
}

data->last_state_idx = i - 1;
+
+ if (dev->states[i - 1].flags & CPUIDLE_FL...

To: LKML <linux-kernel@...>, <linux-pm@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, Pavel Machek <pavel@...>
Date: Sunday, March 9, 2008 - 10:16 am

I'm beginning to think this is a lost cause. I've tried several variants, all without satisfactory results.

In case anyone else has any more ideas, I'll detail what I've found influences the noise:

1. C state

This is the big one. There is no noise as long as C3 is avoided (processor.max_cstate).

2. uhci_hcd driver

USB is somehow involved in this problem. Unloading the uhci_hcd driver almost entirely kills the noise on a 1000 HZ NO_HZ kernel. On a 100 HZ, no NO_HZ kernel, the effect is very small, but still there.

3. Low speed USB devices

Related, the noise goes away if I insert a USB mouse (low speed). A high-speed device does not effect the noise, neither does the two built-in low speed devices (a fingerprint reader and a bluetooth host).

4. Battery and AC

The noise increases with the battery present and also when the AC supply is removed.

5. Second core

Disabling the second core makes the noise go away. This might be a subset of 1., as I've been told that a stopped core enters C1.

Changing HZ and NO_HZ has no noticeable effect on the problem (except the odd behaviour in 2.). This is further supported by the fact that Windows also has the problem (which should behave close to 100 HZ without NO_HZ).

So for now, the only viable workaround is processor.max_cstate....

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: LKML <linux-kernel@...>, <linux-pm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, Pavel Machek <pavel@...>
Date: Tuesday, March 11, 2008 - 3:51 am

I had a sudden surge of inspiration and decided to follow up on this

I have now found a very hacky workaround that is slightly better than
disabling C3 altogether; making C3 exclusive to one core at a time. It
seems to kill the noise and the system now spends 50% in C3, instead of
0%.

I read somewhere that the Intel Duo:s have an extra low power mode that
is entered if both cores are suspended at the same time (disabling the
shared cache or something like that I guess). So perhaps that mode is
the culprit. I'm hoping there's a way to disable just that feature. Do
any of you know a way? Venkatesh perhaps?

Here's the patch I'm currently running at least:

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 78d77c5..9a859fb 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -16,6 +16,8 @@

#define BREAK_FUZZ 4 /* 4 us */

+static atomic_t cur_deep_cpu = ATOMIC_INIT(-1);
+
struct menu_device {
int last_state_idx;

@@ -50,6 +52,13 @@ static int menu_select(struct cpuidle_device *dev)
break;
if (s->exit_latency > pm_qos_requirement(PM_QOS_CPU_DMA_LATENCY))
break;
+
+ if (s->flags & CPUIDLE_FLAG_DEEP) {
+ rmb();
+ if (atomic_cmpxchg(&cur_deep_cpu, -1, dev->cpu) != -1)
+ break;
+ wmb();
+ }
}

data->last_state_idx = i - 1;
@@ -80,6 +89,11 @@ static void menu_reflect(struct cpuidle_device *dev)
if (!(target->flags & CPUIDLE_FLAG_TIME_VALID))
measured_us = USEC_PER_SEC / HZ;

+ if (atomic_read(&cur_deep_cpu) == dev->cpu) {
+ atomic_set(&cur_deep_cpu, -1);
+ wmb();
+ }
+
/* Predict time remaining until next break event */
if (measured_us + BREA...

To: Pierre Ossman <drzeus-list@...>
Cc: LKML <linux-kernel@...>, <linux-pm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, Pavel Machek <pavel@...>
Date: Tuesday, March 11, 2008 - 6:48 am

I suspect it won't do much of C3 if only one core is idle this way.
Most likely you just disabled C3 this way in a way invisible to software.

-Andi

--

To: <linux-pm@...>
Cc: Andi Kleen <andi@...>, Pierre Ossman <drzeus-list@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>
Date: Wednesday, March 12, 2008 - 3:17 pm

Andi is correct.
C-states are coordinated in hardware.
ie. if both cores enter core-C3, then the hardware allows the package
to enter package C3.

Package C-states are a big deal -- that is where most of the power savings is.

So preventing all the cores from entering C3 at the same time is
the opposite of what we want to be doing.

-Len
--

To: Andi Kleen <andi@...>
Cc: LKML <linux-kernel@...>, <linux-pm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, Pavel Machek <pavel@...>
Date: Tuesday, March 11, 2008 - 11:20 am

On Tue, 11 Mar 2008 11:48:22 +0100

What a letdown... I'll pull out the old watt meter again and see if I can confirm that.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Andi Kleen <andi@...>
Cc: LKML <linux-kernel@...>, <linux-pm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Alan Stern <stern@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, Pavel Machek <pavel@...>
Date: Tuesday, March 11, 2008 - 1:31 pm

On Tue, 11 Mar 2008 16:20:26 +0100

It seems that you are correct Andi. Power usage is in line with C3 completely disabled. Bummer :/

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Pierre Ossman <drzeus-list@...>
Cc: LKML <linux-kernel@...>, <linux-pm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>
Date: Monday, March 10, 2008 - 6:00 am

On battery, some machines swap C3 for "secret" C4, which is slower but
saves power.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

To: Pavel Machek <pavel@...>
Cc: LKML <linux-kernel@...>, <linux-pm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>
Date: Monday, March 10, 2008 - 8:49 am

On Mon, 10 Mar 2008 11:00:08 +0100

I guess the only way to find out is to disassemble the DSDT. Exposing myself to such concentrations of ACPI is not an appealing project. :/

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: <linux-pm@...>
Cc: Pierre Ossman <drzeus-list@...>, Pavel Machek <pavel@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Wednesday, March 12, 2008 - 3:11 pm

Modern Intel mobile processors have a feature called "C2 popup"
that allows the processor to retire DMA from C3 without
breaking into C0. Instead the processor pops up to C2
where the cache snoop can allow the DMA to retire --
then it returns to C3, all transparent to software.

ak> Normally such things are not visible in the DSDT, but hidden in SMM traps.

no, this mechanism is totally visible to the OS via a _CST re-evaluation on AC/DC transition.

This is very commmon, as the reference BIOS code does it.
It isn't a secret. There are two common techniques -- either

the latest cpuidle code actually publishes the details of the C-states being used.

$ pwd
/sys/devices/system/cpu/cpu0/cpuidle/state3
$ grep . *
desc:ACPI FFH INTEL MWAIT 0x20
latency:17
name:C3
power:250
time:1862590932
usage:9028970

if you disable it at run-time, Linux puts it in C1.
If you never boot it in the first place (eg. maxcpusp=1), the BIOS leaves it in the deepest available C-sate.

The former will prevent the package from entering a deep package C-state,
as c-states are coordinated in hardware. The later will allow the package

yup, that's why we put it in. Is it insufficient?

-Len

--

To: Len Brown <lenb@...>
Cc: <linux-pm@...>, Pavel Machek <pavel@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Thursday, March 13, 2008 - 12:34 pm

On Wed, 12 Mar 2008 15:11:17 -0400

It does not. But my C3 desc looks like this:

state3/desc:ACPI FFH INTEL MWAIT 0x50

What's the meaning of the last number?

I also have different latency and power:

state3/latency:162

For some value of insufficient. It is sub-optimal. I'm hoping there is a way to enter C3 in a pattern that avoids the noise and still gives a reduced power usage.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Len Brown <lenb@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: <linux-pm@...>, Pavel Machek <pavel@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Friday, March 14, 2008 - 3:40 pm

On Thu, 13 Mar 2008 17:34:37 +0100

I must have done something wrong. I now see a switch between C6 and C3
when I play with the AC cord.

On that theme, I've tested fiddling with the real C-states. I've added
a new max_hwcstate that makes ACPI downgrade MWAIT hints. I also made
some odd discoveries:

C3: More or less completely silent (I haven't tested it in a really
quiet environment yet).
C4: Constant noise
C5: Constant noise
C6: Intermittent noise

When I say constant, I mean that the noise is not generated as a result
of switching between modes (to any extent I can see at least). The
average time spent in C3 (as reported by Powertop) is over 200 ms. So
that would give a frequency of around 5 Hz, not a consistent tone of
several kHz.

Here's said patch. Please comment as I hope this can be merged:

diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index 8ca3557..389ea8b 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -47,6 +47,9 @@ EXPORT_SYMBOL(acpi_processor_power_init_bm_check);

/* The code below handles cstate entry with monitor-mwait pair on Intel*/

+static unsigned int max_hwcstate __read_mostly = -1;
+module_param(max_hwcstate, uint, 0644);
+
struct cstate_entry {
struct {
unsigned int eax;
@@ -80,6 +83,7 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
unsigned int edx_part;
unsigned int cstate_type; /* C-state type and not ACPI C-state type */
unsigned int num_cstate_subtype;
+ unsigned int hint;

if (!cpu_cstate_entry || c->cpuid_level < CPUID_MWAIT_LEAF )
return -1;
@@ -100,16 +104,40 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);

/* Check whether this particular cx_type (in CST) is supported or not */
- cstate_type = (cx->address >> MWAIT_SUBSTATE_SIZE) + 1;
- edx_part = edx >> (cstate_type * MWAIT_SUBSTATE_SIZE);
- num_cstate_subtype = edx_part & MWAIT_SU...

To: Pierre Ossman <drzeus-list@...>, Len Brown <lenb@...>
Cc: <linux-pm@...>, Pavel Machek <pavel@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Friday, March 14, 2008 - 5:15 pm

Why go only one C-state down. Why not directly drop to max_hwcstate? We
don't need to loop through that way.

There are few other concerns which make me feel that the patch will be
not as simple as this.
1) BIOS may already be exporting the lower C-states as separate states.
In which case we just have to ignore this deep C-state and return. I
mean, on your system if BIOS exports C1, C3 and C6 hardware C-states and
you give max_hwcstate as C3, then we don't want to have C1, C3 and C3 in
our list.

2) On same lines the other information in ACPI like (low power of 100
and higher latency for C6 on your system) corresponds to hardware C6
state. If we change the hardware C-state here to C3, then continue to
use latency info from ACPI for hw C6, that may lead to inefficient state
selection in the governor. Also, ther are assumptions related DMA,
lapic, TSC etc in upper level ACPI based on "ACPI C-state" and changing
underlying hardware C-state to C1 for example will change some of those
behavior.

Thanks,
Venki
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Len Brown <lenb@...>, <linux-pm@...>, Pavel Machek <pavel@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Friday, March 14, 2008 - 8:41 pm

On Fri, 14 Mar 2008 14:15:46 -0700

It is still way better going to C3 less than possible than not at all (i.e. the previous workaround of processor.max_cstate). As far as I can tell from the documentation, every C-state includes all of the negative side effects of the ones preceding it. So it should just be a matter of sub-optimal selection by the governor. And I don't see how we can fix that without a big table for each processor and possibly chipset.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Len Brown <lenb@...>
Cc: <linux-pm@...>, Pavel Machek <pavel@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Thursday, March 13, 2008 - 1:49 pm

On Thu, 13 Mar 2008 17:34:37 +0100

I've surfed Intel's web site for a bit, and found the MWAIT instruction in the arch docs. Unfortunately, it does not explain the value 0x50. According to the docs, that would be C6, which is not listed in the regular set of states for the Core 2. I did find this however:

http://www.trustedreviews.com/cpu-memory/review/2007/03/30/Intel-Process...

It says that C6 is some ultra low power mode where more or less the entire chip is powered down.

Len, Venkatesh, do you have some insight? And is it harmless to move to more power hungry modes (i.e. C4 or C3).

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Pierre Ossman <drzeus-list@...>, Len Brown <lenb@...>
Cc: <linux-pm@...>, Pavel Machek <pavel@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Thursday, March 13, 2008 - 12:47 pm

Can you try using /dev/cpu_dma_latency from an user space application.
Refer Documentation/pm_qos_interface.txt for usage. With this interface
you can change menu governor behavior from userspace, by dynamically
allowing it to use C3 or not. I am thinking that something like allowing
C3 for few seconds and blocking it for few seconds may help.

Thanks,
Venki
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Len Brown <lenb@...>, <linux-pm@...>, Pavel Machek <pavel@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Thursday, March 13, 2008 - 1:44 pm

On Thu, 13 Mar 2008 09:47:30 -0700

I don't quite follow. Is the theory that you get a window of a few seconds where the system stops making a noise, even as it toggles C3?

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Len Brown <lenb@...>
Cc: <linux-pm@...>, Pierre Ossman <drzeus-list@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Thursday, March 13, 2008 - 4:10 am

Does that mean we should go to C3 on modern intels, even with

It would be nice to fix this BTW. offlined cpu in C1 eats too much
power, and makes measurements on the second core impossible...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

To: Pavel Machek <pavel@...>
Cc: Len Brown <lenb@...>, <linux-pm@...>, Pierre Ossman <drzeus-list@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Andi Kleen <andi@...>, Lee Revell <rlrevell@...>
Date: Thursday, March 13, 2008 - 6:42 am

C3 is still more expensive power wise to enter, so entering C3 just
to let it immediately go back to C2 for bus mastering would be likely
still a loss over staying at C2.

-Andi
--

To: Andi Kleen <andi@...>
Cc: Pavel Machek <pavel@...>, <linux-pm@...>, Pierre Ossman <drzeus-list@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>
Date: Friday, March 14, 2008 - 12:13 am

That decision has already been made for us.
BM_STS has been made a no-op on recent processors.
It reports bus activity only for a small sub-set of
south-bridge devices. Otherwise it tells us there

The newer the processor, the less exposed we area to this scenario.

cheers,
-Len
--

To: Pierre Ossman <drzeus-list@...>
Cc: Pavel Machek <pavel@...>, LKML <linux-kernel@...>, <linux-pm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Alan Stern <stern@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>
Date: Monday, March 10, 2008 - 9:04 am

Normally such things are not visible in the DSDT, but hidden in SMM traps.

-Andi

--

To: Andi Kleen <andi@...>
Cc: Pavel Machek <pavel@...>, LKML <linux-kernel@...>, <linux-pm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Alan Stern <stern@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>
Date: Monday, March 10, 2008 - 9:29 am

On Mon, 10 Mar 2008 14:04:15 +0100

Delightful. There is something about the lower layers of PC hardware that just makes me want to cry...

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Pierre Ossman <drzeus-list@...>
Cc: LKML <linux-kernel@...>, <linux-pm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, Pavel Machek <pavel@...>
Date: Sunday, March 9, 2008 - 3:30 pm

THIS will work:
http://www.nordichardware.com/image3.php?id=1446

ThinkPad owners have known this for a while...

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
--

To: Henrique de Moraes Holschuh <hmh@...>
Cc: LKML <linux-kernel@...>, <linux-pm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, Pavel Machek <pavel@...>
Date: Sunday, March 9, 2008 - 4:14 pm

On Sun, 9 Mar 2008 16:30:09 -0300

Odd. The thinkpad sites I've been looking at all agree that glue doesn't help at all. Do you have some more info about that "fix"?

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Pierre Ossman <drzeus-list@...>
Cc: LKML <linux-kernel@...>, <linux-pm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, Pavel Machek <pavel@...>
Date: Sunday, March 9, 2008 - 4:41 pm

I asked for some pictures for thinkwiki, but nobody sent them in yet :(

You also need to know what glue to use (it needs to be one that is *not* too
hard when cured or it will just propagate the high frequencies instead of
dampening them), and where to apply it (it may be the chip, the capacitor,
or one of the inductors, etc).

As soon as the warranty on my T43 is over, I will do some heavy research on
the subject and hardware-mod it with Arcric ice, a PLL/ICH heatsink and
noise dampening.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
--

To: Pierre Ossman <drzeus-list@...>
Cc: LKML <linux-kernel@...>, <linux-pm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, Pavel Machek <pavel@...>
Date: Sunday, March 9, 2008 - 4:54 pm

I mean Arctic Silver, the thermal-transfer paste (must be the heat here
playing tricks with my mind :p).

It is better than what Lenovo used on the T43 (but not by much, unless you
did get a defective part to begin with).

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
--

To: Pierre Ossman <drzeus-list@...>
Cc: LKML <linux-kernel@...>, <linux-pm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, Pavel Machek <pavel@...>
Date: Sunday, March 9, 2008 - 2:50 pm

The relation to UHCI probably has to do with ongoing DMA. The amount
of DMA varies according to whether or not the USB devices are
suspended, whether or not they have drivers, and the speed at which
they run.

You can test whether suspending the onboard USB devices changes
anything. See Documentation/usb/power-management.txt.

Alan Stern

--

To: Pierre Ossman <drzeus-list@...>
Cc: LKML <linux-kernel@...>, <linux-pm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, Pavel Machek <pavel@...>
Date: Sunday, March 9, 2008 - 2:19 pm

Well, there may be some users willing to trade some battery life for having a
quiet box. :-)

Perhaps it's worth documenting?

Thanks,
Rafael
--

To: Pierre Ossman <drzeus-list@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>
Date: Wednesday, March 5, 2008 - 5:03 am

Actually, there are more uncertaininties. Suspecting some machines
seemed to produce noise whenever they were in low-power state. Not
with fluctuations: when I forced them to low-power, it just produced
steady beep.

(Thinkpad 560X and toshiba 4030cdt, IIRC).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

To: Pavel Machek <pavel@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>
Date: Wednesday, March 5, 2008 - 9:42 am

On Wed, 5 Mar 2008 10:03:04 +0100

Ouch. How do I quickly test that here?

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Pierre Ossman <drzeus-list@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>
Date: Wednesday, March 5, 2008 - 9:47 am

Setting cpufreq to lowest frequency reproduced it for me, but I guess
notebooks are different.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

To: Pavel Machek <pavel@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>
Date: Wednesday, March 5, 2008 - 9:52 am

On Wed, 5 Mar 2008 14:47:38 +0100

That has no effect here unfortunately.

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Pierre Ossman <drzeus-list@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Tuesday, March 4, 2008 - 1:29 pm

> So for now, I'm back to thinking that measuring the interval between deep sleeps is the better approach. I could use some ideas for a good clock source though. I haven't dug much deeper than jiffies when it comes to kernel timekeeping.

jiffies should work, you just need to make sure you measure them at the
right place. In particular there is some code in dyntick that catches
up on jiffies after the deep sleep when the normal timer handler didn't run
and jiffies are only usable again after that code executed.

-ANdi
--

To: Andi Kleen <andi@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Andi Kleen <andi@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Tuesday, March 4, 2008 - 1:30 pm

On Tue, 4 Mar 2008 18:29:18 +0100

And how can I tell if this handler has run? Not that it really matters I guess, as I don't think running it from the cpuidle governor is very sane.

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Pierre Ossman <drzeus-list@...>
Cc: Andi Kleen <andi@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Tuesday, March 4, 2008 - 1:43 pm

> And how can I tell if this handler has run? Not that it really matters I guess, as I don't think running it from the cpuidle governor is very sane.

After irq_enter->tick_nohz_update_jiffies()

You might need a new callback into the idle governours for this though.

-Andi
--

To: Andi Kleen <andi@...>
Cc: Andi Kleen <andi@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Tuesday, March 4, 2008 - 2:04 pm

On Tue, 4 Mar 2008 18:43:15 +0100

I don't think I quite see the point. If we can't force an update, then
it doesn't really help us knowing if the jiffies value is stale or not.

Anyway, I'm running the following patch for now. I'll keep it on my
machine for a few days and see if I still hear crickets.

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 78d77c5..73f954d 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -16,6 +16,9 @@

#define BREAK_FUZZ 4 /* 4 us */

+static unsigned int min_deep_sleep = 0;
+static unsigned int failed_deep = 0;
+
struct menu_device {
int last_state_idx;

@@ -23,6 +26,8 @@ struct menu_device {
unsigned int predicted_us;
unsigned int last_measured_us;
unsigned int elapsed_us;
+
+ unsigned long last_deep;
};

static DEFINE_PER_CPU(struct menu_device, menu_devices);
@@ -50,9 +55,33 @@ static int menu_select(struct cpuidle_device *dev)
break;
if (s->exit_latency > pm_qos_requirement(PM_QOS_CPU_DMA_LATENCY))
break;
+
+ /*
+ * In order to avoid the problem of "singing capacitors",
+ * don't enter a deep sleep for short durations (2 ms seems
+ * to do the trick). This will, hopefully, keep the problem
+ * inaudible.
+ */
+ if (min_deep_sleep && (s->flags & CPUIDLE_FLAG_DEEP)) {
+ if (time_before_eq(jiffies, data->last_deep +
+ HZ * min_deep_sleep / USEC_PER_SEC))
+ break;
+#if 0
+ if (min_deep_sleep > data->expected_us)
+ break;
+ if (min_deep_sleep > data->predicted_us)
+ break;
+ if (min_deep_sleep > data->last_measured_us)
+ break;
+#endif
+ }
}

data->last_state_idx = i - 1;
+
+ if (dev->states[i - 1].flags & CPUIDLE_FLAG_DEEP)
+ data->last_deep = jiffies;
+
return i - 1;
}

@@ -80,6 +109,10 @@ static void menu_reflect(struct cpuidle_device *dev)
if (!(target->flags & CPUIDLE_FLAG_TIME_VALID))
measured_us =...

To: Pierre Ossman <drzeus-list@...>
Cc: Andi Kleen <andi@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Tuesday, March 4, 2008 - 2:34 pm

What do you mean with force an update? The system always does an
jiffies update automatically after idle wakeup. It's just that if you add
code to the code path that uses jiffies you have to make sure it is after the
standard update, otherwise you'll see a stale value.

-Andi
--

To: Andi Kleen <andi@...>
Cc: Andi Kleen <andi@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Wednesday, March 5, 2008 - 2:04 am

On Tue, 4 Mar 2008 19:34:06 +0100

I still don't follow. Perhaps you can express it in pseudo code? If I have a stale value that I cannot refresh, knowing that it is stale doesn't change anything.

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Pierre Ossman <drzeus-list@...>
Cc: Andi Kleen <andi@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Wednesday, March 5, 2008 - 11:48 am

Start: You discovered at some point where you currently have code a variable
is not updated yet.
Fact: You have some new code that runs before that point
Information: The variable is updated later eventually in the idle exit path.
Fact II: You require the variable to be updated in your new code
Possible solutions:
(1) you move your new code in a point of the idle exit path after the variable
is updated
(2) you move the code that updates the variable earlier before your code
Solution: I described the first variant which is likely easier.
How: I told you where it is updated, so that shouldn't be too difficult.
Action: Implement solution (1) or (2)
Action: Test if it works
Check: If test succeeded exit
Otherwise: Restart at Start

-Andi
--

To: Andi Kleen <andi@...>
Cc: Andi Kleen <andi@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Wednesday, March 5, 2008 - 12:53 pm

On Wed, 5 Mar 2008 16:48:56 +0100

Now you're just being a smart-ass. :)

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Pierre Ossman <drzeus-list@...>
Cc: Andi Kleen <andi@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Dave Jones <davej@...>, Alan Stern <stern@...>, LKML <linux-kernel@...>, Adam Belay <abelay@...>, Lee Revell <rlrevell@...>, <linux-pm@...>, Pavel Machek <pavel@...>
Date: Wednesday, March 5, 2008 - 1:32 pm

> But I don't use jiffies in the idle exit path, only the entry path.

On entry jiffies should be always uptodate. Only on exit there

Well you asked for pseudo code ...

-Andi
--

To: Pierre Ossman <drzeus-list@...>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>, <linux-pm@...>, LKML <linux-kernel@...>
Date: Monday, March 3, 2008 - 8:36 am

I like the patch in principle, although I think the threshold should
be configurable, not hard coded.

I haven't checked in detail if that is the case, but you need to make
sure that you only reference jiffies on idle exit after the clock
source has caught up with lost jiffies after idle. That might be
the cause of your problem.

-Andi
--

To: Pierre Ossman <drzeus-list@...>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>, <linux-pm@...>, LKML <linux-kernel@...>
Date: Saturday, March 1, 2008 - 10:27 pm

Should there be a comment stating the motivation for the change?

Thanks for trying to address this problem, I know many users who are afflicted.

Lee
--

To: Lee Revell <rlrevell@...>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>, <linux-pm@...>, LKML <linux-kernel@...>
Date: Sunday, March 2, 2008 - 10:17 am

On Sat, 1 Mar 2008 21:27:09 -0500

As always, the quickest way to get things done is to make sure a kernel developer suffers from the same issue. ;)

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>
Cc: <linux-pm@...>, LKML <linux-kernel@...>
Date: Saturday, March 1, 2008 - 9:40 am

On Fri, 29 Feb 2008 19:38:12 +0100

I guess another approach would be to refuse to enter deep sleep if the sleep time is less than 2 ms. That would mean we would not lose the long sleeps, but if it is just doing short sleeps then we would never enter C3...

Is there a decent way of testing which approach is actually doing the least damage?

Rgds

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

To: Pierre Ossman <drzeus-list@...>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>, <linux-pm@...>, LKML <linux-kernel@...>
Date: Friday, February 29, 2008 - 5:44 pm

What happens here if HZ < 500? Or does the fact that you have less than
500HZ jiffies automatically imply that you can't go to sleep more than
the jiffy rate times per second?

--
Len Sorensen
--

To: Lennart Sorensen <lsorense@...>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@...>, Adam Belay <abelay@...>, <linux-pm@...>, LKML <linux-kernel@...>
Date: Saturday, March 1, 2008 - 8:31 am

On Fri, 29 Feb 2008 16:44:07 -0500

A low HZ will still go to sleep very often (provided NO_HZ is in effect). But a HZ < 500 makes that number up there turn to zero. But the check further down makes sure that at least 1 tick passes. So that means it will not enter C3 more often than min(HZ, 500) Hz. Another reason to stop using jiffies.

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--

Previous thread: Re: 100% C0 with 2.6.25-rc by Venki Pallipadi on Friday, February 29, 2008 - 2:24 pm. (2 messages)

Next thread: 2008/3/1 ∵◆■加入成為我的 by 謝雅嵐 on Friday, February 29, 2008 - 2:54 pm. (1 message)