Re: 2.6.25 regression: powertop says 120K wakeups/sec

Previous thread: [GIT pull] x86 revert dma32 gart by Thomas Gleixner on Saturday, March 22, 2008 - 12:03 pm. (1 message)

Next thread: hi by Elena on Saturday, March 22, 2008 - 9:24 am. (1 message)
From: David Brownell
Date: Saturday, March 22, 2008 - 1:24 pm

I noticed this with 2.6.25-rc2 (if not before), and the problem
is still there with 2.6.25-rc6-git (as of this AM).

System is an Athlon64 single CPU laptop, and instead of reading a
few dozen wakeups per second, it says a many tens of thousands...
clearly wrong.  In previous kernels it gave more plausible counts;
unfortunately high because of various un-evolved desktop tools in
this Ubuntu system (Feisty).

Possibly more truthful, it says that the system never enters
C1 or C2, and spends all its time in C0.  Though if I look at
/sys/devices/system/cpu/cpu0/cpuidle/state[01]/usage, that
seems to tell a different story ... it's C0 that's never used.
In previous kernels it reported time in both C0 and C2.  ISTR
some patch to avoid C2, which would explain part of this.

Comments or fixes, anyone?

- Dave

--

From: Andrew Morton
Date: Saturday, March 22, 2008 - 5:35 pm

This is likely to be an acpi regression, isn't it?

A git-bisect would be nice, please.
--

From: Alexey Starikovskiy
Date: Sunday, March 23, 2008 - 11:04 am

There are patches in #9998, which fix irq storm in ACPI EC GPE.
It looks like this storm was already present at least in .22 kernel.
I was able to trace it down to HW failure to clear status bit of
corresponding GPE, so as soon as we return from serving one interrupt,
we get another. It would be great if we find why we can't clear this bit.
It does not seem to be IO access issue, as enable bit is in adjacent 8-bit
register and write to it succeeds. I've seen patch for calling _PSW for all
possible wake devices, as it might be constantly waking us even in runtime,
Might be long too, if it was present in .22...

--

From: David Brownell
Date: Friday, March 28, 2008 - 12:01 pm

A 2.6.24 kernel I still had stashed away didn't act odd; the

The git-bisect says the 120K wakeups/second comes from a patch
which unfortunately can't be directly reverted, so I didn't
verify that reverting it resolves that problem.  I've not tried
to do anything with the other C0/C1/C2 stuff.


$ git bisect good
bc71bec91f9875ef825d12104acf3bf4ca215fa4 is first bad commit
commit bc71bec91f9875ef825d12104acf3bf4ca215fa4
Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
Date:   Thu Jan 31 17:35:04 2008 -0800

    ACPI: enable MWAIT for C1 idle
    
    Add MWAIT idle for C1 state instead of halt, on platforms that support
    C1 state with MWAIT.
    
    Renames cx->space_id to something more appropriate.
    
    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
$



--

From: Pallipadi, Venkatesh
Date: Friday, March 28, 2008 - 12:13 pm

There was a fix that went in recently (one or two days back) that should
have fixed this.
Commit 8e92b6605da989

Can you check with latest git to see whether this is still a problem?

Thanks,
Venki
--

From: David Brownell
Date: Friday, March 28, 2008 - 12:44 pm

The problem is still there in GIT as of this morning, which
includes 8e92b6605da989 ...

Comments on why bc71bec91f9875ef825d12104acf3bf4ca215fa4 seems
to be causing this?  I think it's just bad reports getting to
userspace, rather than an actual 120K wakeups/second.  (The
report is of course not always 120K, but it's usually in that
range.)

- Dave
--

From: Venki Pallipadi
Date: Friday, March 28, 2008 - 1:30 pm

No. If powertop is reporting that many wakeups, there should be that many as
powertop gets wakeups from usage in /proc/acpi/processor/CPU*/power

Can you send me the output of
# grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*
with upstream kernel.

Below is a test patch which should effectively revert commit bc71bec91f987
Can you check with this patch on latest git and see whether things come back
to normal.

Also, it will help if you can send the output of acpidump from this system.

Thanks,
Venki


---
 drivers/acpi/processor_idle.c |   11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

Index: linux-2.6/drivers/acpi/processor_idle.c
===================================================================
--- linux-2.6.orig/drivers/acpi/processor_idle.c	2008-03-28 12:07:23.000000000 -0700
+++ linux-2.6/drivers/acpi/processor_idle.c	2008-03-28 13:17:31.000000000 -0700
@@ -943,16 +943,15 @@ static int acpi_processor_get_power_info
 			if (acpi_processor_ffh_cstate_probe
 					(pr->id, &cx, reg) == 0) {
 				cx.entry_method = ACPI_CSTATE_FFH;
-			} else if (cx.type == ACPI_STATE_C1) {
+			} else if (cx.type != ACPI_STATE_C1) {
 				/*
 				 * C1 is a special case where FIXED_HARDWARE
 				 * can be handled in non-MWAIT way as well.
 				 * In that case, save this _CST entry info.
+				 * That is, we retain space_id of SYSTEM_IO for
+				 * halt based C1.
 				 * Otherwise, ignore this info and continue.
 				 */
-				cx.entry_method = ACPI_CSTATE_HALT;
-				snprintf(cx.desc, ACPI_CX_DESC_LEN, "ACPI HLT");
-			} else {
 				continue;
 			}
 		} else {
@@ -1398,8 +1397,6 @@ static inline void acpi_idle_do_entry(st
 	if (cx->entry_method == ACPI_CSTATE_FFH) {
 		/* Call into architectural FFH based C-state */
 		acpi_processor_ffh_cstate_enter(cx);
-	} else if (cx->entry_method == ACPI_CSTATE_HALT) {
-		acpi_safe_halt();
 	} else {
 		int unused;
 		/* IO port based C-state */
@@ -1443,7 +1440,7 @@ static int acpi_idle_enter_c1(struct cpu
 ...
From: David Brownell
Date: Friday, March 28, 2008 - 2:09 pm

This is "upstream" to 3085354de635179d70c240e6d942bcbd1d93056c:

/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:C0
/sys/devices/system/cpu/cpu0/cpuidle/state0/power:4294967295
/sys/devices/system/cpu/cpu0/cpuidle/state0/time:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/usage:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:<null>
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1
/sys/devices/system/cpu/cpu0/cpuidle/state1/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/time:2781727265
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:927242422

... as I mentioned, powertop says C2 doesn't get used any more,
moreover that only C0 ever gets used.  The "usage" seems to be


Sent off-list.

- Dave

--

From: Pallipadi, Venkatesh
Date: Friday, March 28, 2008 - 2:55 pm

Looks like C2 state detection got broken somehow..
There was new state0 (polling) that was added recently which should
never get used.
state1 is actually C1 which for some reason is not staying idle and
waking up immediately (that is the reason for high usage). And C2 is
missing. I do see, in acpidump that you sent, there is a C2 state info
there.
You should have a dmesg line which looks like
ACPI: CPU0 (power states: C1[C1] C2[C2]
Do you see C2 in such line?

Thanks,
Venki
--

From: David Brownell
Date: Friday, March 28, 2008 - 3:09 pm

Yes:

ACPI: CPU0 (power states: C1[C1] C2[C2])



--

From: Venki Pallipadi
Date: Friday, March 28, 2008 - 3:56 pm

David,

I think I figured out the bug...

Can you try the below patch and confirm that it works (over upstream - ignore
the earlier revert patch I sent to you).

Thanks,
Venki

----


Patch to fix huge number of wakeups reported due to recent changes in
processor_idle.c. The problem was that the entry_method determination was
broken due to one of the recent commits (bc71bec91f987) causing
C1 entry to not to go to halt. This should also fix the hang reported here.
http://bugzilla.kernel.org/show_bug.cgi?id=10093


Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

---
 drivers/acpi/processor_idle.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux-2.6/drivers/acpi/processor_idle.c
===================================================================
--- linux-2.6.orig/drivers/acpi/processor_idle.c	2008-03-28 15:31:13.000000000 -0700
+++ linux-2.6/drivers/acpi/processor_idle.c	2008-03-28 15:40:50.000000000 -0700
@@ -848,6 +848,7 @@ static int acpi_processor_get_power_info
 		/* all processors need to support C1 */
 		pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1;
 		pr->power.states[ACPI_STATE_C1].valid = 1;
+		pr->power.states[ACPI_STATE_C1].entry_method = ACPI_CSTATE_HALT;
 	}
 	/* the C0 state only exists as a filler in our array */
 	pr->power.states[ACPI_STATE_C0].valid = 1;
@@ -960,6 +961,9 @@ static int acpi_processor_get_power_info
 				 cx.address);
 		}
 
+		if (cx.type == ACPI_STATE_C1) {
+			cx.valid = 1;
+		}
 
 		obj = &(element->package.elements[2]);
 		if (obj->type != ACPI_TYPE_INTEGER)
--

From: Rafael J. Wysocki
Date: Friday, March 28, 2008 - 4:01 pm

Ah, thanks for figuring that out.  As a regression fix, it should go upstream
--

From: Pallipadi, Venkatesh
Date: Friday, March 28, 2008 - 4:07 pm

Lets just wait for confirmation from either David or in bug #10093. Yes.
Once we get that confirmation this should go upstream.

Thanks,
Venki
--

From: Mark Lord
Date: Monday, March 31, 2008 - 10:42 am

..

Would this have any applicability to 2.6.24 as well?

I have seen/reported a similar bug there many times in the past,
with no resolution.  There's even a bugzilla entry for it somewhere.

--

From: Pallipadi, Venkatesh
Date: Monday, March 31, 2008 - 11:34 am

No. This patch is for a regression that happened post .24.
pre .24 will be a different problem.

Thanks,
Venki
--

From: David Brownell
Date: Friday, March 28, 2008 - 4:36 pm

That seems to do the job ... thanks!  One more regression
fixed, it feels like RC7 is almost done. :)

It still says odd things about C0 vs C1 though:  powertop
says 100% of the time is in C0, but this patch would seem
to be irrelevant if that were true.

- Dave

--

From: Pallipadi, Venkatesh
Date: Friday, March 28, 2008 - 4:51 pm

Great!

100% C0 is not real reading. The problem behind that is there is no wat
to measure exact C1 idle time with halt based C1s. So, we always used to
report 0 time in acpi and that's what is reported by powertop.
This should be fixed in future, as we now export approx time (even
though not exact) in cpuidle and powertop is about to start using it.

Thanks,
Venki
--

From: David Brownell
Date: Friday, March 28, 2008 - 5:15 pm

I just pulled the latest powertop SVN and see it's smarter now.
It says over 90% in C1 (doing normal desktop stuff), with nasty
IRQ rates but that's the fault of silly desktop code.  ;)

- Dave

--

Previous thread: [GIT pull] x86 revert dma32 gart by Thomas Gleixner on Saturday, March 22, 2008 - 12:03 pm. (1 message)

Next thread: hi by Elena on Saturday, March 22, 2008 - 9:24 am. (1 message)