I noticed this with 2.6.25-rc2 (if not before), and the problem is still there with 2.6.25-rc6-git (as of this AM). System is an Athlon64 single CPU laptop, and instead of reading a few dozen wakeups per second, it says a many tens of thousands... clearly wrong. In previous kernels it gave more plausible counts; unfortunately high because of various un-evolved desktop tools in this Ubuntu system (Feisty). Possibly more truthful, it says that the system never enters C1 or C2, and spends all its time in C0. Though if I look at /sys/devices/system/cpu/cpu0/cpuidle/state[01]/usage, that seems to tell a different story ... it's C0 that's never used. In previous kernels it reported time in both C0 and C2. ISTR some patch to avoid C2, which would explain part of this. Comments or fixes, anyone? - Dave --
This is likely to be an acpi regression, isn't it? A git-bisect would be nice, please. --
There are patches in #9998, which fix irq storm in ACPI EC GPE. It looks like this storm was already present at least in .22 kernel. I was able to trace it down to HW failure to clear status bit of corresponding GPE, so as soon as we return from serving one interrupt, we get another. It would be great if we find why we can't clear this bit. It does not seem to be IO access issue, as enable bit is in adjacent 8-bit register and write to it succeeds. I've seen patch for calling _PSW for all possible wake devices, as it might be constantly waking us even in runtime, Might be long too, if it was present in .22... --
A 2.6.24 kernel I still had stashed away didn't act odd; the The git-bisect says the 120K wakeups/second comes from a patch which unfortunately can't be directly reverted, so I didn't verify that reverting it resolves that problem. I've not tried to do anything with the other C0/C1/C2 stuff. $ git bisect good bc71bec91f9875ef825d12104acf3bf4ca215fa4 is first bad commit commit bc71bec91f9875ef825d12104acf3bf4ca215fa4 Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> Date: Thu Jan 31 17:35:04 2008 -0800 ACPI: enable MWAIT for C1 idle Add MWAIT idle for C1 state instead of halt, on platforms that support C1 state with MWAIT. Renames cx->space_id to something more appropriate. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> $ --
There was a fix that went in recently (one or two days back) that should have fixed this. Commit 8e92b6605da989 Can you check with latest git to see whether this is still a problem? Thanks, Venki --
The problem is still there in GIT as of this morning, which includes 8e92b6605da989 ... Comments on why bc71bec91f9875ef825d12104acf3bf4ca215fa4 seems to be causing this? I think it's just bad reports getting to userspace, rather than an actual 120K wakeups/second. (The report is of course not always 120K, but it's usually in that range.) - Dave --
No. If powertop is reporting that many wakeups, there should be that many as
powertop gets wakeups from usage in /proc/acpi/processor/CPU*/power
Can you send me the output of
# grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*
with upstream kernel.
Below is a test patch which should effectively revert commit bc71bec91f987
Can you check with this patch on latest git and see whether things come back
to normal.
Also, it will help if you can send the output of acpidump from this system.
Thanks,
Venki
---
drivers/acpi/processor_idle.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
Index: linux-2.6/drivers/acpi/processor_idle.c
===================================================================
--- linux-2.6.orig/drivers/acpi/processor_idle.c 2008-03-28 12:07:23.000000000 -0700
+++ linux-2.6/drivers/acpi/processor_idle.c 2008-03-28 13:17:31.000000000 -0700
@@ -943,16 +943,15 @@ static int acpi_processor_get_power_info
if (acpi_processor_ffh_cstate_probe
(pr->id, &cx, reg) == 0) {
cx.entry_method = ACPI_CSTATE_FFH;
- } else if (cx.type == ACPI_STATE_C1) {
+ } else if (cx.type != ACPI_STATE_C1) {
/*
* C1 is a special case where FIXED_HARDWARE
* can be handled in non-MWAIT way as well.
* In that case, save this _CST entry info.
+ * That is, we retain space_id of SYSTEM_IO for
+ * halt based C1.
* Otherwise, ignore this info and continue.
*/
- cx.entry_method = ACPI_CSTATE_HALT;
- snprintf(cx.desc, ACPI_CX_DESC_LEN, "ACPI HLT");
- } else {
continue;
}
} else {
@@ -1398,8 +1397,6 @@ static inline void acpi_idle_do_entry(st
if (cx->entry_method == ACPI_CSTATE_FFH) {
/* Call into architectural FFH based C-state */
acpi_processor_ffh_cstate_enter(cx);
- } else if (cx->entry_method == ACPI_CSTATE_HALT) {
- acpi_safe_halt();
} else {
int unused;
/* IO port based C-state */
@@ -1443,7 +1440,7 @@ static int acpi_idle_enter_c1(struct cpu
...This is "upstream" to 3085354de635179d70c240e6d942bcbd1d93056c: /sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE /sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0 /sys/devices/system/cpu/cpu0/cpuidle/state0/name:C0 /sys/devices/system/cpu/cpu0/cpuidle/state0/power:4294967295 /sys/devices/system/cpu/cpu0/cpuidle/state0/time:0 /sys/devices/system/cpu/cpu0/cpuidle/state0/usage:0 /sys/devices/system/cpu/cpu0/cpuidle/state1/desc:<null> /sys/devices/system/cpu/cpu0/cpuidle/state1/latency:0 /sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1 /sys/devices/system/cpu/cpu0/cpuidle/state1/power:0 /sys/devices/system/cpu/cpu0/cpuidle/state1/time:2781727265 /sys/devices/system/cpu/cpu0/cpuidle/state1/usage:927242422 ... as I mentioned, powertop says C2 doesn't get used any more, moreover that only C0 ever gets used. The "usage" seems to be Sent off-list. - Dave --
Looks like C2 state detection got broken somehow.. There was new state0 (polling) that was added recently which should never get used. state1 is actually C1 which for some reason is not staying idle and waking up immediately (that is the reason for high usage). And C2 is missing. I do see, in acpidump that you sent, there is a C2 state info there. You should have a dmesg line which looks like ACPI: CPU0 (power states: C1[C1] C2[C2] Do you see C2 in such line? Thanks, Venki --
Yes: ACPI: CPU0 (power states: C1[C1] C2[C2]) --
David, I think I figured out the bug... Can you try the below patch and confirm that it works (over upstream - ignore the earlier revert patch I sent to you). Thanks, Venki ---- Patch to fix huge number of wakeups reported due to recent changes in processor_idle.c. The problem was that the entry_method determination was broken due to one of the recent commits (bc71bec91f987) causing C1 entry to not to go to halt. This should also fix the hang reported here. http://bugzilla.kernel.org/show_bug.cgi?id=10093 Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> --- drivers/acpi/processor_idle.c | 4 ++++ 1 file changed, 4 insertions(+) Index: linux-2.6/drivers/acpi/processor_idle.c =================================================================== --- linux-2.6.orig/drivers/acpi/processor_idle.c 2008-03-28 15:31:13.000000000 -0700 +++ linux-2.6/drivers/acpi/processor_idle.c 2008-03-28 15:40:50.000000000 -0700 @@ -848,6 +848,7 @@ static int acpi_processor_get_power_info /* all processors need to support C1 */ pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1; pr->power.states[ACPI_STATE_C1].valid = 1; + pr->power.states[ACPI_STATE_C1].entry_method = ACPI_CSTATE_HALT; } /* the C0 state only exists as a filler in our array */ pr->power.states[ACPI_STATE_C0].valid = 1; @@ -960,6 +961,9 @@ static int acpi_processor_get_power_info cx.address); } + if (cx.type == ACPI_STATE_C1) { + cx.valid = 1; + } obj = &(element->package.elements[2]); if (obj->type != ACPI_TYPE_INTEGER) --
Ah, thanks for figuring that out. As a regression fix, it should go upstream --
Lets just wait for confirmation from either David or in bug #10093. Yes. Once we get that confirmation this should go upstream. Thanks, Venki --
.. Would this have any applicability to 2.6.24 as well? I have seen/reported a similar bug there many times in the past, with no resolution. There's even a bugzilla entry for it somewhere. --
No. This patch is for a regression that happened post .24. pre .24 will be a different problem. Thanks, Venki --
That seems to do the job ... thanks! One more regression fixed, it feels like RC7 is almost done. :) It still says odd things about C0 vs C1 though: powertop says 100% of the time is in C0, but this patch would seem to be irrelevant if that were true. - Dave --
Great! 100% C0 is not real reading. The problem behind that is there is no wat to measure exact C1 idle time with halt based C1s. So, we always used to report 0 time in acpi and that's what is reported by powertop. This should be fixed in future, as we now export approx time (even though not exact) in cpuidle and powertop is about to start using it. Thanks, Venki --
I just pulled the latest powertop SVN and see it's smarter now. It says over 90% in C1 (doing normal desktop stuff), with nasty IRQ rates but that's the fault of silly desktop code. ;) - Dave --
