Hi!
My computer suffers from high load average when the system is idle,
introduced by commit 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 .Long story:
2.6.20 and all later versions I've tested, including 2.6.21 and
2.6.22, make the load average high. Even when the computer is totally
idle (I've tested in single user mode), the load average end up
at ~0.30. The computer is still responsive, and the only fault seems
to be the too high load average. All versions up to and including
2.6.19.7 is fine, and don't suffer from the problem.I git bisect between 2.6.19 and 2.6.20 gave me
44d306e1508fef6fa7a6eb15a1aba86ef68389a6 "[PATCH] user of the jiffies
rounding code: JBD" as the first patch with the
problem. 2.6.20 with 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 reverted
works fine. 2.6.23-rc8 with 44d306e1508fef6fa7a6eb15a1aba86ef68389a6
reverted also works fine.This fixes the problem:
-------------------------- fs/jbd/transaction.c -----------------------------
index cceaf57..d38e0d5 100644
@@ -55,7 +55,7 @@ get_transaction(journal_t *journal, transaction_t *transaction)
spin_lock_init(&transaction->t_handle_lock);/* Set up the commit timer for the new transaction. */
- journal->j_commit_timer.expires = round_jiffies(transaction->t_expires);
+ journal->j_commit_timer.expires = transaction->t_expires;
add_timer(&journal->j_commit_timer);J_ASSERT(journal->j_running_transaction == NULL);
I've only seen this problem on my home desktop computer. My work
desktop computer and several other computers at work don't suffer from
this problem. However, all other computers I've tested on is using
AMD64 as architecture, and not i386 as my home desktop computer.Please let me know how I can assist in further debugging of this, if
needed.System info:
A Debian stable system with ABIT KV7 MB, VIA KT600 chipset,
Athlon XP 1500+ CPU, GeForce DDR and Atheros AR5212 wlan
board. Details below.I've tested without nvidia and the madwifi modu...
Another datapoint: I observe a similar effect on both of my alphas:
top - 09:30:43 up 13 min, 18 users, load average: 0.65, 0.64, 0.44
Tasks: 76 total, 1 running, 75 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1% us, 0.5% sy, 0.0% ni, 99.1% id, 0.2% wa, 0.1% hi, 0.0% si
Mem: 2067792k total, 55792k used, 2012000k free, 4160k buffers
Swap: 1048560k total, 0k used, 1048560k free, 18752k cachedPID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
637 root 15 0 2904 1552 1192 R 1 0.1 0:01.35 top
556 root 15 0 2008 528 432 S 0 0.0 0:00.01 gpm
1 root 15 0 1960 800 680 S 0 0.0 0:01.43 init
2 root 10 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0
4 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0
5 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/0
6 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/1
7 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1This is the dual-ev6 one, currently 2.6.22-rc5.
I'll check this evening.
Bye,
Thorsten--
| Thorsten Kranzkowski Internet: dl8bcu@dl8bcu.de |
| Mobile: ++49 170 1876134 Snail: Kiebitzstr. 14, 49324 Melle, Germany |
| Ampr: dl8bcu@db0lj.#rpl.deu.eu, dl8bcu@marvin.dl8bcu.ampr.org [44.130.8.19] |
-
can you tell me if you're tuning ext3 in any way to have a much
shorter timeout than the standard 5 seconds?
-
>>>>> "AvdV" == Arjan van de Ven <arjan@linux.intel.com> writes:
AvdV> Anders Boström wrote:
>> Hi!
>>
>> My computer suffers from high load average when the system is idle,
>> introduced by commit 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 .
>>
>> Long story:
>>
>> 2.6.20 and all later versions I've tested, including 2.6.21 and
>> 2.6.22, make the load average high. Even when the computer is totally
>> idle (I've tested in single user mode), the load average end up
>> at ~0.30. The computer is still responsive, and the only fault seems
>> to be the too high load average. All versions up to and including
>> 2.6.19.7 is fine, and don't suffer from the problem.AvdV> can you tell me if you're tuning ext3 in any way to have a much
AvdV> shorter timeout than the standard 5 seconds?No, I'm using the standard 5 seconds timeout.
/ Anders
-
On Tue, 02 Oct 2007 23:37:31 +0200 (CEST)
This is unexpected. High load average is due to either a task chewing a
lot of CPU time or a task stuck in uninterruptible sleep.Can you please work out which of these is happening? Run `top' on an idle
system. Is the CPU less than 1% loaded?Run
ps aux | grep " D"
or something like that on an idle system, see if you can spot a task which
is spending time in D state.If there's a task whcih is spending time in D state, try running
echo w > /proc/sysrq-trigger ; dmesg -c > foo
the check "foo" to see if it has a task in D state (search foo for " D ").
If it's not there, do the sysrq again, repeat until you've managed to
capture a trace of the blocked task.If it turns out that the CPU really is spending excess amounts of time
being busy then a kernel profile would be a good way of finding out where
it is spinning. Or run sysrq-P from the keyboard a few times.-
>>>>> "AM" == Andrew Morton <akpm@linux-foundation.org> writes:
AM> On Tue, 02 Oct 2007 23:37:31 +0200 (CEST)
AM> Anders Bostr__m <anders@bostrom.dyndns.org> wrote:>> My computer suffers from high load average when the system is idle,
>> introduced by commit 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 .
>>
>> Long story:
>>
>> 2.6.20 and all later versions I've tested, including 2.6.21 and
>> 2.6.22, make the load average high. Even when the computer is totally
>> idle (I've tested in single user mode), the load average end up
>> at ~0.30. The computer is still responsive, and the only fault seems
>> to be the too high load average. All versions up to and including
>> 2.6.19.7 is fine, and don't suffer from the problem.
>>
>> I git bisect between 2.6.19 and 2.6.20 gave me
>> 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 "[PATCH] user of the jiffies
>> rounding code: JBD" as the first patch with the
>> problem. 2.6.20 with 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 reverted
>> works fine. 2.6.23-rc8 with 44d306e1508fef6fa7a6eb15a1aba86ef68389a6
>> reverted also works fine.
>>
>> This fixes the problem:
>>
>> -------------------------- fs/jbd/transaction.c -----------------------------
>> index cceaf57..d38e0d5 100644
>> @@ -55,7 +55,7 @@ get_transaction(journal_t *journal, transaction_t *transaction)
>> spin_lock_init(&transaction->t_handle_lock);
>>
>> /* Set up the commit timer for the new transaction. */
>> - journal->j_commit_timer.expires = round_jiffies(transaction->t_expires);
>> + journal->j_commit_timer.expires = transaction->t_expires;
>> add_timer(&journal->j_commit_timer);
>>
>> J_ASSERT(journal->j_running_transaction == NULL);
>>
>>
>> I've only seen this pr...
Or, everybody wakes up at once right when we are taking a sample. :)
-
nice try but we sample every timer tick; this code being timer driven
makes it what you say it is regardless of *which* timer tick it
happens at ;)
-
But we reduce the number of samples because some ticks just never
happen when the timers get rounded:No rounding:
tick ............... tick
1 running 1 runningRounded:
tick
2 runningIn the first case the average is 1, but it's 2 in the second.
-
In fact, I think this is it!
The load average is not calculated every tick, because that's not just
expensive, but we also want to have some time-based decay. So it's
calculated every LOAD_FREQ ticks.And guess what: LOAD_FREQ is defined to be exactly five seconds.
So imagine if the timer gets to be in sync with another event that happens
every five seconds - let's pick at random a 5-second JBD transaction
thing?Anders - does this idiotic patch make a difference for you?
Without this, I can easily imagine that the rounding code tends to try to
round to an even second, and the load-average code generally also runs at
even seconds!Linus
---
include/linux/sched.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)diff --git a/include/linux/sched.h b/include/linux/sched.h
index a01ac6d..643de0f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -113,7 +113,7 @@ extern unsigned long avenrun[]; /* Load averages */#define FSHIFT 11 /* nr of bits of precision */
#define FIXED_1 (1<<FSHIFT) /* 1.0 as fixed-point */
-#define LOAD_FREQ (5*HZ) /* 5 sec intervals */
+#define LOAD_FREQ (5*HZ+1) /* ~5 sec intervals */
#define EXP_1 1884 /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5 2014 /* 1/exp(5sec/5min) */
#define EXP_15 2037 /* 1/exp(5sec/15min) */
-
>>>>> "LT" == Linus Torvalds <torvalds@linux-foundation.org> writes:
LT> On Wed, 3 Oct 2007, Chuck Ebbert wrote:
>>
>> But we reduce the number of samples because some ticks just never
>> happen when the timers get rounded:
>>
>> No rounding:
>>
>> tick ............... tick
>> 1 running 1 running
>>
>> Rounded:
>>
>> tick
>> 2 running
>>
>> In the first case the average is 1, but it's 2 in the second.LT> In fact, I think this is it!
LT> The load average is not calculated every tick, because that's not just
LT> expensive, but we also want to have some time-based decay. So it's
LT> calculated every LOAD_FREQ ticks.LT> And guess what: LOAD_FREQ is defined to be exactly five seconds.
LT> So imagine if the timer gets to be in sync with another event that happens
LT> every five seconds - let's pick at random a 5-second JBD transaction
LT> thing?LT> Anders - does this idiotic patch make a difference for you?
Yes, it does, it fixes the load average!!! I guess we have something
here!Why does this problem only show up on my computer? Any idea?
/ Anders
LT> Without this, I can easily imagine that the rounding code tends to try to
LT> round to an even second, and the load-average code generally also runs at
LT> even seconds!LT> Linus
LT> ---
LT> include/linux/sched.h | 2 +-
LT> 1 files changed, 1 insertions(+), 1 deletions(-)LT> diff --git a/include/linux/sched.h b/include/linux/sched.h
LT> index a01ac6d..643de0f 100644
LT> --- a/include/linux/sched.h
LT> +++ b/include/linux/sched.h
LT> @@ -113,7 +113,7 @@ extern unsigned long avenrun[]; /* Load averages */LT> #define FSHIFT 11 /* nr of bits of precision */
LT> #define FIXED_1 (1<<FSHIFT) /* 1.0 as fixed-point */
LT> -#define LOAD_FR...
not sure this is going to help; I mean, the load gets only updated in
actual timer interrupts... and on a tickless system there's very few
of those around..... and usually at places round_jiffies() already put
a timer on.(also.. one thing that might make Chuck's theory wrong is that the
sampling code doesn't sample timer activity since that's run just
after the sampler in the same irq)
-
Yeah, you're right. Although in practice, at least on a system running
X, I'd expect that there still is lots of other timers going on, hiding
the issue.Hmm. Maybe Anders' problem stems partly from the fact that he really is
using the tweaks to make that tickless theory more true than it tends to
be on most systems?Linus
-
eh not really; on a normal distro desktop you maybe have 10
we fixed a TON of stuff over the last months.. standard desktops (F8 /
next Ubuntu) will be around 10 wakeups/sec, in a lab environment you
can get below 2 ;)
-
Not necessarily.
We saw high loadaverages with the timer bogosity with "gettimeofday()" and
"select()" not agreeing, so they would do things likedate = time(..)
select(.. , timeout = <time + 1> )and when "date" wasn't taking the jiffies offset into account, and thus
mixing these kinds of different time sources, the select ended up
returning immediately because they effectively used different clocks, and
suddenly we had some applications chewing up 30% CPU time, because they
were in a loop that *tried* to sleep.And I wonder if the same kind thing is effectively happening here: the
code is written so that it *tries* to sleep, but the rounding of the clock
basically means that it's trying to sleep using a different clock than the
one we're using to wake things up with, so some percentage of the time it
doesn't sleep at all!I wonder if the whole "round_jiffies()" thing should be written so that it
never rounds down, or at least never rounds down to before the current
second!I have to say, I also think it's a bit iffy to do "round_jiffies()" at all
in that per-CPU kind of way. The "per-cpu" thing is quite possibly going
to change by the time we actually add the timer, so the goal of trying to
get wakeups to happen in "bunches" per CPU should really be done by
setting a flag on the timer itself - so that we could do that rounding
when the timer is actually added to the per-cpu queues!Now, I think the JBD "t_expires" field should never be "near" in seconds,
so I do find it a bit surprising that this rounding can have any effect,
but on the other hand it clearly *does* have some effect, so.. It migt
just be interacting with some other use, of course.Linus
-
we're talking about a timer that (normally) is 5 seconds.
-
...
On a related note, {set/get}itimer() currently are buggy (since 2.6.11 or so),
also due to this round_jiffies() thing I believe.If one sets ITIMER_PROF to, say, 5.000000 seconds, and then reads it back
very shortly thereafter, it will give 5.200000 seconds as the value (HZ==1000).AFAIK, this should *never* be possible --> any read of get_itimer should never
return a value higher than the starting value. This makes ITIMER_PROF not very
useful for measuring one's own CPU usage, for example.Cheers
-
I very much believe that it is totally unrelated... most of all since
round_jiffies() wasn't in the kernel then an also isn't used anywhere
near these timers.
-
Ah, yes, you're correct. The itimer routines do their *own* rounding.
-ml
-
that's what it is supposed to do already...
166
167 if (j <= jiffies) /* rounding ate our timeout entirely; */
168 return original;
169 return j;
170 }it's pretty much the same thing though
-
| H. Peter Anvin | Re: [RFC 00/15] x86_64: Optimize percpu accesses |
| Tarkan Erimer | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Eric W. Biederman | Remaining straight forward kthread API conversions... |
| Greg Kroah-Hartman | [PATCH 001/196] Chinese: Add the known_regression URI to the HOWTO |
| David Miller | [GIT]: Networking |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Frans Pop | svc: failed to register lockdv1 RPC service (errno 97). |
git: | |
