I switched the tracer to ftrace and waited for the problem to occur. When stall did happen it was far worse than usual with the tracing on (instead of looping sound of < 0.5 second it looped it for about 2-3 seconds). Looking atthe trace it was filled with hald events. Killing off all of hald made the 30 second stalls go away. A quick strace showed that what hal was doing every 30 seconds was reading various battery stats from /sys. Doing a simple cat /sys/class/power_supply/BAT0/manufacturer is enough to provoke a stall. The stalls only occur if you are on battery or are on AC and the battery is not reporting that it is 100% charged (but in the latter case the stalls are less pronounced). I can reliably hear the stalls at runlevel 1 by running speaker-test -b75000 and watch --interval=1 cat /proc/acpi/battery/BAT0/info within separate terminals within screen. A 5 second ftrace of the stall being provoked is provided on <http://sucs.org/~sits/test/eeepc-ftrace.txt.gz> (it's 6Mbytes uncompressed). Putting the alsa buffer size up to 100000 allows you to still hear stalls but far less frequently. Putting to 150000 will stop you from hearing stalls. Even though xruns is set to 2 alsa not report any buffer underruns to syslog. Another way of seeing the stalls is to run glxgears and every second the gears' spinning will jump (even on an unloaded system). (I guess this works as the stalls are over 100ms) The xandros provided 2.6.21.4 kernel does not exhibit this problem at all but my hand compiled 2.6.24something and ubuntu 2.6.24 kernels do. For some reason latencytop did not really finger ACPI as a cause of stalls (although some acpi stuff does show up but never in the top I rebuilt my kernel after a make clean and wakeup was still not there. It might be a good idea to modify the kernel documentation currently provided with 2.6.27 if it has gone away for now (or document the extra switches needed to turn it on if that's why it didn't show up for ...
Hi Sitsofe, eeePC is known to be affected by EC GPE storm bug, and there is a patch for 2.6.27 to workaround the problem. please look at http://bugzilla.kernel.org/show_bug.cgi?id=11549 for latest patch and bugs 10724/9998 for problem discussion. Thanks, Alex. --
Hi Alexey, Even with http://bugzilla.kernel.org/attachment.cgi?id=17845&action=view (fast transaction) applied the stalls are still present (and seemingly just as bad as before). --
I should note that the patch DOES fix the slow volume hotkey repeat rate that I was seeing. --
Could you cat /proc/interrupts ? --
CPU0 0: 106942 IO-APIC-edge timer 1: 426 IO-APIC-edge i8042 8: 41 IO-APIC-edge rtc0 9: 27346 IO-APIC-fasteoi acpi 12: 6840 IO-APIC-edge i8042 14: 0 IO-APIC-edge ata_piix 15: 203 IO-APIC-edge ata_piix 16: 92357 IO-APIC-fasteoi uhci_hcd:usb5, HDA Intel, i915@pci:0000:00:02.0 18: 38195 IO-APIC-fasteoi ath, uhci_hcd:usb4 19: 0 IO-APIC-fasteoi uhci_hcd:usb3 23: 25150 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb2 NMI: 0 Non-maskable interrupts LOC: 142112 Local timer interrupts TRM: 0 Thermal event interrupts SPU: 0 Spurious interrupts ERR: 0 MIS: 0 --
and here are the interrupts from the (seemingly working) EeePC's Xandros
2.6.21.4 install:
CPU0
0: 6486 XT-PIC-XT timer
1: 107 XT-PIC-XT i8042
2: 0 XT-PIC-XT cascade
3: 0 XT-PIC-XT uhci_hcd:usb2
5: 735 XT-PIC-XT uhci_hcd:usb1, ehci_hcd:usb5
9: 219 XT-PIC-XT acpi
10: 8171 XT-PIC-XT uhci_hcd:usb3, pciehp, wifi0
11: 107 XT-PIC-XT HDA Intel, i915@pci:0000:00:02.0,
uhci_hcd:usb4, pciehp
12: 236 XT-PIC-XT i8042
14: 0 XT-PIC-XT libata
15: 3475 XT-PIC-XT libata
NMI: 0
ERR: 0
and here is the kernel cmdline that the Xandros install uses:
quiet rw vga=785 irqpoll root=/dev/sda1
I've got a feeling I know what direction this is going in...
--
Sitsofe | http://sucs.org/~sits/
--
Right, could you please check if irqpoll is the key to success? Thanks, Alex. --
So I tried just irqpoll (along with nolapic irqpool which changed the output in /proc/interrupts) but it made no difference to the stalling. I'm baffled. -- Sitsofe | http://sucs.org/~sits/ --
Could you please try following patch on top?
Unfortunately it still stalls (hotkeys still seem to be working correctly though). -- Sitsofe | http://sucs.org/~sits/ --
The wake up tracing is CONFIG_SCHED_TRACER. If you do not see "wakeup" as one of the available tracers in available_tracers, then check your dmesg and see if you have CONFIG_FTRACE_STARTUP_TEST enabled. There was a known bug that would cause the wakeup tracing to fail the test. When a trace fails the startup test, it is disabled. -- Steve --
This is exactly the case (dmesg reports the ftrace wakeup startup test failed). That's what I get for not reading my dmesg more carefully... -- Sitsofe | http://sucs.org/~sits/ --
OK, that is probably the known bug you are hitting. Simply disable the CONFIG_FTRACE_STARTUP_TEST and you should have the wakeup tracer. The bug is with the test, not the tracer, so it should not hurt you. -- Steve --
Thanks - this made the wakeup tracer appear a you said. I have put two wakeup traces up: http://sucs.org/~sits/test/eeepc-debug/20080920/latency_trace.txt.gz http://sucs.org/~sits/test/eeepc-debug/20080920/trace.txt.gz (each file is around 6Mbytes uncompressed) Is it intentional that the last event has a time earlier closer to that of the first event? -- Sitsofe | http://sucs.org/~sits/ --
Peter, these times are crazy, mainly due to the cpu_clock. He probably
wants to use the sched_clock. Below is a patch to use it instead.
Sitsofe, I notice that the trace states "desktop". This means that you
are running with CONFIG_PREEMPT_VOLUNTARY. You want
CONFIG_PREEMPT.
Change the config, and see what you get with this patch:
Note this is not compiled tested:
---
kernel/trace/trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linus.git/kernel/trace/trace.c
===================================================================
--- linus.git.orig/kernel/trace/trace.c 2008-09-20 17:49:00.000000000 -0400
+++ linus.git/kernel/trace/trace.c 2008-09-20 17:49:35.000000000 -0400
@@ -60,7 +60,7 @@ ns2usecs(cycle_t nsec)
cycle_t ftrace_now(int cpu)
{
- return cpu_clock(cpu);
+ return sched_clock();
}
/*
-- Steve
--
Dagnabit I keep confusing people. This was actually intentional because I wanted to know whether the latency I was seeing should have been present in a non-preempt (but voluntary) kernel. You can see the subject at that start of this thread: http://tinyurl.com/4akxa5 (How how latent should non-preemptive scheduling be?). I'm not running a sound studio where I need the lowest possible latency at all costs. Further my current understanding is that the desktop distros don't tend to ship "regular desktop kernels" with preemption (I know Ubuntu 8.04 and Fedora 9 didn't). Basically I have the following queries: Do you have to have preemption on if you are listening to music (without noticeable skips) and playing the odd game (without noticeable pauses) on a desktop? What's the allowed highest latency going to be over a few minutes in such kernels? Is it simply the case that if it's a non-preemptive kernel latency no longer matters? Just for the record I just tested a preempt kernel I had lying around and the speakter-test -b75000 while looking at battery did not stall. I'll see if I can test the patch tomorrow. Does the config change also have to be made or the timestamps to be "narrower"? -- Sitsofe | http://sucs.org/~sits/ --
milliseconds of stalls is definitely excessive under a non-preempt kernel - and you have up to 500 msecs of stalls, right? The simplest way you can fix such latencies is to look at the function trace, figure out which loop in the kernel takes so long to execute, and add a cond_resched() call to it. [there are other situations where a more complicated fix is needed, but this seems like a simpler scenario.] Ingo --
OK I've made the changes you suggested. Without preempt enabled the last event will have a stamp closer to the first event and the times are very high. With preempt enabled that beahviour has gone. Here are results with preempt enabled: latency: 19657 us, #3268/3268, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0): http://sucs.org/~sits/test/eeepc-debug/20080922/preempt/ath5k/latency_trace.txt latency: 104 us, #182/182, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0) http://sucs.org/~sits/test/eeepc-debug/20080922/preempt/unplug/latency_trace.txt latency: 95 us, #134/134, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0): http://sucs.org/~sits/test/eeepc-debug/20080922/preempt/acpi/latency_trace.txt latency: 91 us, #152/152, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0) http://sucs.org/~sits/test/eeepc-debug/20080922/preempt/touchpad/latency_trace.txt Are these the type of latencies to be expected with preemption on? -- Sitsofe | http://sucs.org/~sits/ --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian | [RFC/PATCH] RE: git calls SSH_ASKPASS even if DISPLAY is not set |
| Junio C Hamano | Re: Rss produced by git is not valid xml? |
| Linux Kernel Mailing List | iSeries: fix section mismatch in iseries_veth |
| Linux Kernel Mailing List | ixbge: remove TX lock and redo TX accounting. |
