Re: Reading EeePC900 battery info causes stalls (was Re: How how latent should non-preemptive scheduling be?)

Previous thread: [PATCH] leds-pca9532: Mark pca9532_event() static by Sven Wegener on Saturday, September 20, 2008 - 2:58 am. (1 message)

Next thread: [PATCH 1/2] MAINTAINERS: Trivial whitespace cleanups by Jean Delvare on Saturday, September 20, 2008 - 3:33 am. (1 message)
From: Sitsofe Wheeler
Date: Saturday, September 20, 2008 - 3:10 am

I switched the tracer to ftrace and waited for the problem to occur.
When stall did happen it was far worse than usual with the tracing on
(instead of looping sound of < 0.5 second it looped it for about 2-3
seconds). Looking atthe trace it was filled with hald events. Killing 
off all of hald made the 30 second stalls go away.

A quick strace showed that what hal was doing every 30 seconds was
reading various battery stats from /sys. Doing a simple
cat /sys/class/power_supply/BAT0/manufacturer
is enough to provoke a stall. The stalls only occur if you are on 
battery or are on AC and the battery is not reporting that it is 100% 
charged (but in the latter case the stalls are less pronounced).

I can reliably hear the stalls at runlevel 1 by running
speaker-test -b75000
and
watch --interval=1 cat /proc/acpi/battery/BAT0/info
within separate terminals within screen.

A 5 second ftrace of the stall being provoked is provided on 
<http://sucs.org/~sits/test/eeepc-ftrace.txt.gz> (it's 6Mbytes 
uncompressed).

Putting the alsa buffer size up to 100000 allows you to still hear
stalls but far less frequently. Putting to 150000 will stop you from
hearing stalls. Even though xruns is set to 2 alsa not report any buffer
underruns to syslog.

Another way of seeing the stalls is to run glxgears and every second the
gears' spinning will jump (even on an unloaded system).

(I guess this works as the stalls are over 100ms)

The xandros provided 2.6.21.4 kernel does not exhibit this problem at 
all but my hand compiled 2.6.24something and ubuntu 2.6.24 kernels do.

For some reason latencytop did not really finger ACPI as a cause of
stalls (although some acpi stuff does show up but never in the top

I rebuilt my kernel after a make clean and wakeup was still not there. 
It might be a good idea to modify the kernel documentation currently
provided with 2.6.27 if it has gone away for now (or document the extra
switches needed to turn it on if that's why it didn't show up for ...
From: Alexey Starikovskiy
Date: Saturday, September 20, 2008 - 3:25 am

Hi Sitsofe,

eeePC is known to be affected by EC GPE storm bug, and there is a patch for 
2.6.27 to workaround the problem. please look at
http://bugzilla.kernel.org/show_bug.cgi?id=11549
for latest patch and bugs 10724/9998 for problem discussion.

Thanks,
Alex.


--

From: Sitsofe Wheeler
Date: Saturday, September 20, 2008 - 3:51 am

Hi Alexey,


Even with http://bugzilla.kernel.org/attachment.cgi?id=17845&action=view 
(fast transaction) applied the stalls are still present (and seemingly 
just as bad as before).
--

From: Sitsofe Wheeler
Date: Saturday, September 20, 2008 - 3:55 am

I should note that the patch DOES fix the slow volume hotkey repeat rate 
that I was seeing.
--

From: Alexey Starikovskiy
Date: Saturday, September 20, 2008 - 4:14 am

Could you cat /proc/interrupts ?
--

From: Sitsofe Wheeler
Date: Saturday, September 20, 2008 - 4:23 am

CPU0
   0:     106942   IO-APIC-edge      timer
   1:        426   IO-APIC-edge      i8042
   8:         41   IO-APIC-edge      rtc0
   9:      27346   IO-APIC-fasteoi   acpi
  12:       6840   IO-APIC-edge      i8042
  14:          0   IO-APIC-edge      ata_piix
  15:        203   IO-APIC-edge      ata_piix
  16:      92357   IO-APIC-fasteoi   uhci_hcd:usb5, HDA Intel, 
i915@pci:0000:00:02.0
  18:      38195   IO-APIC-fasteoi   ath, uhci_hcd:usb4
  19:          0   IO-APIC-fasteoi   uhci_hcd:usb3
  23:      25150   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
NMI:          0   Non-maskable interrupts
LOC:     142112   Local timer interrupts
TRM:          0   Thermal event interrupts
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0
--

From: Sitsofe Wheeler
Date: Saturday, September 20, 2008 - 4:41 am

and here are the interrupts from the (seemingly working) EeePC's Xandros 
2.6.21.4 install:

            CPU0
   0:       6486    XT-PIC-XT        timer
   1:        107    XT-PIC-XT        i8042
   2:          0    XT-PIC-XT        cascade
   3:          0    XT-PIC-XT        uhci_hcd:usb2
   5:        735    XT-PIC-XT        uhci_hcd:usb1, ehci_hcd:usb5
   9:        219    XT-PIC-XT        acpi
  10:       8171    XT-PIC-XT        uhci_hcd:usb3, pciehp, wifi0
  11:        107    XT-PIC-XT        HDA Intel, i915@pci:0000:00:02.0, 
uhci_hcd:usb4, pciehp
  12:        236    XT-PIC-XT        i8042
  14:          0    XT-PIC-XT        libata
  15:       3475    XT-PIC-XT        libata
NMI:          0
ERR:          0

and here is the kernel cmdline that the Xandros install uses:
quiet rw vga=785 irqpoll root=/dev/sda1

I've got a feeling I know what direction this is going in...

-- 
Sitsofe | http://sucs.org/~sits/
--

From: Alexey Starikovskiy
Date: Saturday, September 20, 2008 - 5:34 am

Right, could you please check if irqpoll is the key to success?

Thanks,
Alex.
--

From: Sitsofe Wheeler
Date: Saturday, September 20, 2008 - 5:50 am

So I tried just irqpoll (along with nolapic irqpool which changed the 
output in /proc/interrupts) but it made no difference to the stalling. 
I'm baffled.

-- 
Sitsofe | http://sucs.org/~sits/
--

From: Alexey Starikovskiy
Date: Saturday, September 20, 2008 - 6:07 am

Could you please try following patch on top?
From: Sitsofe Wheeler
Date: Saturday, September 20, 2008 - 6:30 am

Unfortunately it still stalls (hotkeys still seem to be working 
correctly though).

-- 
Sitsofe | http://sucs.org/~sits/
--

From: Steven Rostedt
Date: Saturday, September 20, 2008 - 6:59 am

The wake up tracing is CONFIG_SCHED_TRACER. If you do not see 
"wakeup" as one of the available tracers in available_tracers, then check
your dmesg and see if you have CONFIG_FTRACE_STARTUP_TEST enabled. There 
was a known bug that would cause the wakeup tracing to fail the test. When 
a trace fails the startup test, it is disabled.

-- Steve

--

From: Sitsofe Wheeler
Date: Saturday, September 20, 2008 - 7:11 am

This is exactly the case (dmesg reports the ftrace wakeup startup test 
failed). That's what I get for not reading my dmesg more carefully...

-- 
Sitsofe | http://sucs.org/~sits/
--

From: Steven Rostedt
Date: Saturday, September 20, 2008 - 7:37 am

OK, that is probably the known bug you are hitting. Simply disable the 
CONFIG_FTRACE_STARTUP_TEST and you should have the wakeup tracer. The bug 
is with the test, not the tracer, so it should not hurt you.

-- Steve

--

From: Sitsofe Wheeler
Date: Saturday, September 20, 2008 - 10:16 am

Thanks - this made the wakeup tracer appear a you said. I have put two 
wakeup traces up:

http://sucs.org/~sits/test/eeepc-debug/20080920/latency_trace.txt.gz
http://sucs.org/~sits/test/eeepc-debug/20080920/trace.txt.gz
(each file is around 6Mbytes uncompressed)


Is it intentional that the last event has a time earlier closer to that 
of the first event?

-- 
Sitsofe | http://sucs.org/~sits/
--

From: Steven Rostedt
Date: Saturday, September 20, 2008 - 2:53 pm

Peter, these times are crazy, mainly due to the cpu_clock. He probably 
wants to use the sched_clock. Below is a patch to use it instead.

Sitsofe, I notice that the trace states "desktop". This means that you
are running with CONFIG_PREEMPT_VOLUNTARY. You want 
CONFIG_PREEMPT.


Change the config, and see what you get with this patch:

Note this is not compiled tested:

---
 kernel/trace/trace.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linus.git/kernel/trace/trace.c
===================================================================
--- linus.git.orig/kernel/trace/trace.c	2008-09-20 17:49:00.000000000 -0400
+++ linus.git/kernel/trace/trace.c	2008-09-20 17:49:35.000000000 -0400
@@ -60,7 +60,7 @@ ns2usecs(cycle_t nsec)
 
 cycle_t ftrace_now(int cpu)
 {
-	return cpu_clock(cpu);
+	return sched_clock();
 }
 
 /*


-- Steve

--

From: Sitsofe Wheeler
Date: Saturday, September 20, 2008 - 4:18 pm

Dagnabit I keep confusing people. This was actually intentional because 
I wanted to know whether the latency I was seeing should have been 
present in a non-preempt (but voluntary) kernel. You can see the subject 
at that start of this thread: http://tinyurl.com/4akxa5 (How how latent 
should non-preemptive scheduling be?). I'm not running a sound studio 
where I need the lowest possible latency at all costs. Further my 
current understanding is that the desktop distros don't tend to ship 
"regular desktop kernels" with preemption (I know Ubuntu 8.04 and Fedora 
9 didn't).

Basically I have the following queries: Do you have to have preemption 
on if you are listening to music (without noticeable skips) and playing 
the odd game (without noticeable pauses) on a desktop? What's the 
allowed highest latency going to be over a few minutes in such kernels? 
Is it simply the case that if it's a non-preemptive kernel latency no 
longer matters?

Just for the record I just tested a preempt kernel I had lying around 
and the speakter-test -b75000 while looking at battery did not stall. 

I'll see if I can test the patch tomorrow. Does the config change also 
have to be made or the timestamps to be "narrower"?

-- 
Sitsofe | http://sucs.org/~sits/
--

From: Ingo Molnar
Date: Sunday, September 21, 2008 - 12:09 am

milliseconds of stalls is definitely excessive under a non-preempt 
kernel - and you have up to 500 msecs of stalls, right?

The simplest way you can fix such latencies is to look at the function 
trace, figure out which loop in the kernel takes so long to execute, and 
add a cond_resched() call to it. [there are other situations where a 
more complicated fix is needed, but this seems like a simpler scenario.]

	Ingo
--

From: Sitsofe Wheeler
Date: Sunday, September 21, 2008 - 11:24 pm

OK I've made the changes you suggested. Without preempt enabled the last 
event will have a stamp closer to the first event and the times are very 
high. With preempt enabled that beahviour has gone. Here are results 
with preempt enabled:

latency: 19657 us, #3268/3268, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0):
http://sucs.org/~sits/test/eeepc-debug/20080922/preempt/ath5k/latency_trace.txt 


latency: 104 us, #182/182, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0)
http://sucs.org/~sits/test/eeepc-debug/20080922/preempt/unplug/latency_trace.txt

latency: 95 us, #134/134, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0):
http://sucs.org/~sits/test/eeepc-debug/20080922/preempt/acpi/latency_trace.txt

latency: 91 us, #152/152, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0)
http://sucs.org/~sits/test/eeepc-debug/20080922/preempt/touchpad/latency_trace.txt

Are these the type of latencies to be expected with preemption on?

-- 
Sitsofe | http://sucs.org/~sits/
--

Previous thread: [PATCH] leds-pca9532: Mark pca9532_event() static by Sven Wegener on Saturday, September 20, 2008 - 2:58 am. (1 message)

Next thread: [PATCH 1/2] MAINTAINERS: Trivial whitespace cleanups by Jean Delvare on Saturday, September 20, 2008 - 3:33 am. (1 message)