i'm pleased to announce release -v18 of the CFS scheduler patchset.

The rolled-up CFS patch against today's -git kernel, v2.6.22-rc5,
v2.6.22-rc4-mm2, v2.6.21.5 or v2.6.20.14 can be downloaded from the
usual place:

  http://people.redhat.com/mingo/cfs-scheduler/

The biggest changes in -v18 are various performance-related
improvements. Thomas Gleixner has eliminated expensive 64-bit divisions
by converting the arithmetic to scaled math (without impacting the
quality of calculations). Srivatsa Vaddagiri and Dmitry Adamushko have
continued the abstraction and cleanup work. Srivatsa Vaddagiri and
Christoph Lameter fixed the NUMA balancing bug reported by Paul
McKenney. There were also a good number of other refinements to the CFS
code. (No reproducible behavioral regressions were reported against -v17
so far, so the 'behavioral' bits are mostly unchanged.)

Changes since -v17:

 - implement scaled math speedups for CFS. (Thomas Gleixner)

 - lots of core code updates, cleanups and streamlining.
   (Srivatsa Vaddagiri, Dmitry Adamushko, me.)

 - bugfix: fix NUMA balancing. (Srivatsa Vaddagiri, Christoph Lameter,
   Paul E. McKenney)

 - feature: SCHED_IDLE now also implies block-scheduler (CFQ)
   idle-IO-priority. (suggested by Thomas Sattler, picked up from -ck)

 - build fix for ppc32. (reported, tested and confirmed fixed by
   Art Haas)

 - ARM fix. (reported and debugged by Thomas Gleixner)

 - cleanup: implemented idle_sched_class in kernel/sched_idletask.c as a
   way to separate rq->idle handling out of the core scheduler. This
   made a good deal of idle-task related special-cases go away.

 - debug: make the sysctls safer by introducing high and low limits.

 - cleanup: move some of the debug counters to under CONFIG_SCHEDSTATS.

 - speedup: various micro-optimizations

 - various other small updates.

As usual, any sort of feedback, bugreport, fix and suggestion is more
than welcome!

Ingo
-
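The announcement does not show the scaled-math conversion itself. As a rough sketch of the general technique only (not Thomas Gleixner's actual code): a division by a rarely-changing weight can be replaced by a multiplication with a precomputed fixed-point inverse followed by a shift. The helper names calc_inv_weight() and scaled_div() are made up for illustration; WMULT_SHIFT is borrowed from the sched_fair.c hunk quoted later in this thread, and its value of 32 is an assumption.

```c
#include <stdint.h>

/* Assumed fixed-point precision: inverse weights are scaled by 2^32. */
#define WMULT_SHIFT 32

/*
 * Hypothetical helper: precompute floor(2^32 / weight) once, whenever the
 * weight changes. weight must be nonzero.
 */
static inline uint64_t calc_inv_weight(uint32_t weight)
{
	return (UINT64_C(1) << WMULT_SHIFT) / weight;
}

/*
 * Hypothetical helper: approximate delta / weight with a multiply and a
 * shift, so the hot path contains no 64-bit division. Because the inverse
 * is rounded down, the result can be one less than the exact quotient.
 * The caller must keep delta * inv_weight below 2^64.
 */
static inline uint64_t scaled_div(uint64_t delta, uint64_t inv_weight)
{
	return (delta * inv_weight) >> WMULT_SHIFT;
}
```

With a power-of-two weight the result is exact, e.g. scaled_div(1 << 20, calc_inv_weight(1024)) gives 1024. With a weight of 3, scaled_div(3000, ...) gives 999 rather than 1000, which is the kind of small rounding error this approach trades for speed.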
Hi Ingo;

On Saturday, 23 June 2007, Ingo Molnar wrote:

caglar@zangetsu linux-2.6 $ LC_ALL=C make
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
  CHK     include/linux/compile.h
  CC      kernel/sched.o
kernel/sched.c:745:28: sched_idletask.c: No such file or directory
kernel/sched.c: In function `init_idle_bootup_task':
kernel/sched.c:4659: error: `idle_sched_class' undeclared (first use in this function)
kernel/sched.c:4659: error: (Each undeclared identifier is reported only once
kernel/sched.c:4659: error: for each function it appears in.)
kernel/sched.c: In function `init_idle':
kernel/sched.c:4698: error: `idle_sched_class' undeclared (first use in this function)
kernel/sched.c: In function `sched_init':
kernel/sched.c:6196: error: `idle_sched_class' undeclared (first use in this function)
make[1]: *** [kernel/sched.o] Error 1
make: *** [kernel] Error 2

Cheers
-- 
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
On Saturday, 23 June 2007, S.Çağlar Onur wrote:
Ahh and this happens with [1], grabbing sched_idletask.c from .18 one solves
the problem...
Index: linux/kernel/sched_idletask.c
===================================================================
--- /dev/null
+++ linux/kernel/sched_idletask.c
@@ -0,0 +1,68 @@
+/*
+ * idle-task scheduling class.
+ *
+ * (NOTE: these are not related to SCHED_IDLE tasks which are
+ * handled in sched_fair.c)
+ */
+
+/*
+ * Idle tasks are unconditionally rescheduled:
+ */
+static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p)
+{
+ resched_task(rq->idle);
+}
+
+static struct task_struct *pick_next_task_idle(struct rq *rq, u64 now)
+{
+ schedstat_inc(rq, sched_goidle);
+
+ return rq->idle;
+}
+
+/*
+ * It is not legal to sleep in the idle task - print a warning
+ * message if some code attempts to do it:
+ */
+static void
+dequeue_task_idle(struct rq *rq, struct task_struct *p, int sleep, u64 now)
+{
+ spin_unlock_irq(&rq->lock);
+ printk(KERN_ERR "bad: scheduling from the idle thread!\n");
+ dump_stack();
+ spin_lock_irq(&rq->lock);
+}
+
+static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, u64 now)
+{
+}
+
+static struct task_struct *load_balance_start_idle(struct rq *rq)
+{
+ return NULL;
+}
+
+static void task_tick_idle(struct rq *rq, struct task_struct *curr)
+{
+}
+
+/*
+ * Simple, special scheduling class for the per-CPU idle tasks:
+ */
+struct sched_class idle_sched_class __read_mostly = {
+ /* no enqueue/yield_task for idle tasks */
+
+ /* dequeue is not valid, we print a debug message there: */
+ .dequeue_task = dequeue_task_idle,
+
+ .check_preempt_curr = check_preempt_curr_idle,
+
+ .pick_next_task = pick_next_task_idle,
+ .put_prev_task = ...

oops, indeed - i've fixed up the -git patch:

  http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-v2.6.22-git-v18.patch

Ingo
-
So I locally generated the diff to take -mm up to the above version of CFS.
- sys_sched_yield_to() went away? I guess I missed that.
- Curious. the simplification of task_tick_rt() seems to go only
halfway. Could do
if (p->policy != SCHED_RR)
return;
if (--p->time_slice)
return;
/* stuff goes here */
- dud macro:
#define is_rt_policy(p) ((p) == SCHED_FIFO || (p) == SCHED_RR)
It evaluates its arg twice and could and should be coded in C.
There are a bunch of other don't-need-to-be-implemented-as-a-macro
macros around there too. Generally, I suggest you review all the
patchset for macros-which-don't-need-to-be-macros.
- Extraneous newline:
enum cpu_idle_type
{
- Style thing:
struct sched_entity {
struct load_weight load; /* for nice- load-balancing purposes */
int on_rq;
struct rb_node run_node;
unsigned long delta_exec;
s64 delta_fair;
u64 wait_start_fair;
u64 wait_start;
u64 exec_start;
u64 sleep_start, sleep_start_fair;
u64 block_start;
u64 sleep_max;
u64 block_max;
u64 exec_max;
u64 wait_max;
u64 last_ran;
s64 wait_runtime;
u64 sum_exec_runtime;
s64 fair_key;
s64 sum_wait_runtime, sum_sleep_runtime;
unsigned long wait_runtime_overruns, wait_runtime_underruns;
};
I think the one-definition-per-line style is better than the `unsigned
long foo,bar,zot,zit;' thing. Easier to read, easier to read subsequent
patches and it leaves more room for a comment describing what the field
does.
- None of these fields have comments describing what they do ;)
- __exit_signal() does apparently-unlocked 64-bit arith. Is there some
implicit locking here or do we not care about the occasional race-induced
inaccuracy?
(ditto, lots of places, I expect)
(Gee, there's shitloads of 64-bit stuff in there. Does it all _really_
need to be 64-bit on 32-bit?)
- weight_s64() (what does this do?) looks too big to inline on 32-bit.
- update_stats_enqueue() looks too big to ...

thx. I released a diff against mm2:

  http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-v2.6.22-rc4-mm2-v18.patch

but indeed the -git diff serves you better if you updated -mm to Linus'
latest.

yep. Nobody tried it and sent any feedback on it, it was causing
patch-logistical complications both in -mm and for packagers that bundle
CFS (the experimental-schedulers site has a CFS repo and Fedora rawhide
started carrying CFS recently as well), and i dont really agree with
adding yet another yield interface anyway. So we can and should do this

do you mean the tsk->se.sum_exec_runtime addition, etc? That runs with

yes - CFS is fundamentally designed for 64-bit, with still pretty OK

SCHED_LOAD_SCALE is the smpnice stuff. CFS reuses that and also makes it
clear via this define that a nice-0 task has a 'load' contribution to
the CPU as of NICE_0_LOAD. Sometimes, when doing smpnice load-balancing
calculations we want to use 'SCHED_LOAD_SCALE', sometimes we want to

yep, the plan is to put this all into reciprocal_div.h and to convert
existing users of reciprocal_div to the cleaner stuff from Thomas. The

this is a reasonable tradeoff i think - update_curr_load()'s slowpath is
in __update_curr_load(). Anyway, it probably wont get inlined when the

these are mostly ancient macros. I fixed up some of them in my current

yep - i'll revisit the inlining picture. This is not really a primary
worry i think because it's easy to tweak and people can already express
their inlining preference via CONFIG_CC_OPTIMIZE_FOR_SIZE and

the main reason is the sched debugging stuff:

   text    data     bss     dec     hex filename
  37570    2538      20   40128    9cc0 kernel/sched.o
  30692    2426      20   33138    8172 kernel/sched-no_sched_debug.o

i can make it depend on CONFIG_SCHEDSTATS, although i'd prefer it to be

it doesnt really matter, i fixed them to be initialized to the same
'now' value.

i've attached my current ...
I forget ;) There seemed to be rather a lot of 64-bit addition with no

It may have been designed for 64-bit, but was that the correct design?
The cost on 32-bit appears to be pretty high. Perhaps a round of
uninlining

That would serve to explain the 18% growth on x86_64. But why did i386
grow by much more: 29%? I'd be suspecting all the new 64-bit arithmetic.
-
this is what i see on 32-bit:
text data bss dec hex filename
28732 3905 24 32661 7f95 kernel/sched.o-vanilla
37986 2538 20 40544 9e60 kernel/sched.o-v18
31092 2426 20 33538 8302 kernel/sched.o-v18-no_sched_debug
text is larger but data got smaller. While they are not equivalent in
function, the two almost even out each other (and that's without any of
the uninlining that is in v19). In fact, there's a 1.5K per CPU percpu
data size win with CFS, which is not visible in this stat. So on
agreed, i've done one more round of uninlining.
Ingo
-
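The review's remark about is_rt_policy() can be sketched as follows. This is only an illustration of the "code it in C" suggestion, written against the userspace <sched.h> constants so it compiles standalone; the int parameter type and the bool return type are assumptions, not what any later CFS release necessarily used.

```c
#include <sched.h>
#include <stdbool.h>

/*
 * Typed replacement for the macro: the argument is evaluated exactly once,
 * and passing something that is not an integer policy value is caught by
 * the compiler instead of being silently expanded.
 */
static inline bool is_rt_policy(int policy)
{
	return policy == SCHED_FIFO || policy == SCHED_RR;
}
```

Unlike the macro form, an argument with side effects (a function call, or an expression such as n++) is evaluated only once here.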
Humm, problem methinks. Applying the patch, with 2.6.22-rc5 applied to
2.6.21 completed, from my script:

now applying patch sched-cfs-v2.6.22-rc5-v18.patch
patching file Documentation/kernel-parameters.txt
patching file Documentation/sched-design-CFS.txt
patching file Makefile
patching file arch/i386/kernel/smpboot.c
patching file arch/i386/kernel/tsc.c
patching file arch/ia64/kernel/setup.c
patching file arch/mips/kernel/smp.c
patching file arch/sparc/kernel/smp.c
patching file arch/sparc64/kernel/smp.c
patching file block/cfq-iosched.c
patching file fs/proc/array.c
patching file fs/proc/base.c
patching file include/asm-generic/bitops/sched.h
patching file include/linux/hardirq.h
patching file include/linux/sched.h
patching file include/linux/topology.h
patching file init/main.c
patching file kernel/delayacct.c
patching file kernel/exit.c
patching file kernel/fork.c
patching file kernel/posix-cpu-timers.c
patching file kernel/sched.c
patching file kernel/sched_debug.c
patching file kernel/sched_fair.c
patching file kernel/sched_idletask.c
patching file kernel/sched_rt.c
patching file kernel/sched_stats.h
patching file kernel/softirq.c
patching file kernel/sysctl.c
The next patch would delete the file l/kernel/sched.c,
which does not exist! Assume -R? [n]

How to proceed?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Even more amazing was the realization that God has Internet access. I
wonder if He has a full newsfeed?
		-- Matt Welsh
-
answering n for all that, I note the build, at the end of the make
bzImage, spits out this:

  MODPOST vmlinux
WARNING: arch/i386/kernel/built-in.o(.text+0x845d): Section mismatch: reference to .init.text:amd_init_mtrr (between 'mtrr_bp_init' and 'mtrr_save_state')
WARNING: arch/i386/kernel/built-in.o(.text+0x8462): Section mismatch: reference to .init.text:cyrix_init_mtrr (between 'mtrr_bp_init' and 'mtrr_save_state')
WARNING: arch/i386/kernel/built-in.o(.text+0x8467): Section mismatch: reference to .init.text:centaur_init_mtrr (between 'mtrr_bp_init' and 'mtrr_save_state')
WARNING: arch/i386/kernel/built-in.o(.text+0x9284): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr')
WARNING: arch/i386/kernel/built-in.o(.text+0x9298): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr')
WARNING: arch/i386/kernel/built-in.o(.text+0x92bc): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr')

But then proceeds with the make modules stage. I believe I've seen
references to this in other threads. Is It Serious?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Debug is human, de-fix divine.
-
Hi,
I'm running -v18 on 2.6.22-rc5, no problems so far. How can I change a
task to SCHED_IDLE or SCHED_BATCH priority under CFS?
Thanks,
~ Antonio
-
pick up schedtool, and these are the choices it gives:
-N for SCHED_NORMAL
-F -p PRIO for SCHED_FIFO only as root
-R -p PRIO for SCHED_RR only as root
-B for SCHED_BATCH
-I -p PRIO for SCHED_ISO
-D for SCHED_IDLEPRIO
then for example to start up something as SCHED_IDLE:
schedtool -D -e ./somecommand.sh
Ingo
-
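For tasks that want to change their own policy without schedtool, the same effect can be had with the sched_setscheduler() system call. The following is a minimal sketch, assuming a kernel and libc that know SCHED_BATCH and SCHED_IDLE; the numeric fallbacks (3 and 5) match current Linux headers but may not match every historical CFS patch, and the helper names set_nonrt_policy() and policy_name() are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes SCHED_BATCH and SCHED_IDLE in glibc's <sched.h> */
#endif
#include <sched.h>
#include <sys/types.h>

/* Assumed fallback values (current Linux headers), in case libc lacks them. */
#ifndef SCHED_BATCH
#define SCHED_BATCH 3
#endif
#ifndef SCHED_IDLE
#define SCHED_IDLE 5
#endif

/*
 * Hypothetical helper: switch task `pid` (0 means the caller) to a
 * non-realtime policy such as SCHED_OTHER, SCHED_BATCH or SCHED_IDLE.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_nonrt_policy(pid_t pid, int policy)
{
	/* sched_priority must be 0 for the non-realtime policies */
	struct sched_param sp = { .sched_priority = 0 };

	return sched_setscheduler(pid, policy, &sp);
}

/* Hypothetical helper: human-readable policy name for log messages. */
static const char *policy_name(int policy)
{
	switch (policy) {
	case SCHED_OTHER: return "SCHED_OTHER";
	case SCHED_FIFO:  return "SCHED_FIFO";
	case SCHED_RR:    return "SCHED_RR";
	case SCHED_BATCH: return "SCHED_BATCH";
	case SCHED_IDLE:  return "SCHED_IDLE";
	default:          return "unknown";
	}
}
```

Calling set_nonrt_policy(0, SCHED_IDLE) early in a program should be roughly what "schedtool -D -e" arranges before exec'ing the command; sched_getscheduler(0) can then be passed to policy_name() to confirm what the kernel accepted.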
Thank you very much! I thought schedtool was only suitable for -ck.
I've installed schedtool and it works fine.

Anyway, I've discovered with great pleasure that CFS also has the
SCHED_ISO priority. I may have missed something, but I don't remember
having read this in any of the CFS release notes :). For me this is a
really useful feature. Thanks.

Regards,
-
well, it's only a hack and emulated: SCHED_ISO in CFS is recognized as a policy but it falls back to SCHED_NORMAL. Could you check how well this (i.e. SCHED_NORMAL) works for your workload, compared to SD's SCHED_ISO? If you'd like to increase the priority of a task, i'd suggest to use negative nice levels. (use the 'nice' option in /etc/security/limits.conf with a newer version of PAM to allow unprivileged users to use negative nice levels.) Ingo -
To be fair, my workload is not really "critical". I've been used to
skip-free audio listening (no matter what) for a long time, running my
audio player with SCHED_ISO. Even in mainline the skips aren't so
frequent, but still annoying. I'm using SCHED_ISO for the confidence
it gives in providing skip-free audio.
For my modest needs, CFS SCHED_NORMAL has also been just fine (over the
last few days). I'll report if I can find a more critical workload that
can possibly stress CFS SCHED_NORMAL.
Regards,
~ Antonio
-
Hi Ingo,

Today I had a little time to try CFS again (last time it was -v9!). I
ran it on top of 2.6.20.14, and simply tried ocbench again. You
remember ? With -v9, I ran 64 processes which all progressed very
smoothly. With -v18, it's not the case anymore. When I run 64 processes,
only 7 of them show smooth rounds, while all the other ones are only
updated once a second. Sometimes they only progress by one iteration,
sometimes by a full round. Some are even updated once every 2 seconds,
because if I drag an xterm above them and quickly remove it, the xterm
leaves a trace there for up to 2 seconds.

Also, only one of my 2 CPUs is used. I see the rq vary between 1 and 5,
with a permanent 50% idle... :

   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si  so    bi    bo   in    cs us sy id
 1  0  0      0 874400   7864  90436    0   0     0     0  279  2204 50  0 50
 3  0  0      0 874408   7864  90436    0   0     0     0  273  2122 50  1 50
 1  0  0      0 874408   7864  90436    0   0     0     0  253  1660 49  1 50
 3  0  0      0 874408   7864  90436    0   0     0     0  252  1977 50  0 50
 2  0  0      0 874408   7864  90436    0   0     0     0  253  2274 49  1 50
 3  0  0      0 874408   7864  90436    0   0     0     0  252  1846 49  1 50
 1  0  0      0 874408   7864  90436    0   0     0     0  339  1782 49  1 50

I have no idea about what version brought that unexpected behaviour, but
it's clearly something which needs to be tracked down.

My scheduler is at 250 Hz, and here are the values I found in
/proc/sys/kernel:

root@pcw:kernel# grep '' sched_*
sched_batch_wakeup_granularity_ns:40000000
sched_child_runs_first:1
sched_features:14
sched_granularity_ns:10000000
sched_runtime_limit_ns:40000000
sched_stat_granularity_ns:0
sched_wakeup_granularity_ns:4000000

I have tried to change each of them, with absolutely no effect. Seems
really strange. Unfortunately, I have to leave right ...
hm, the two problems might be related. Could you try v17 perhaps? In v18
i have 'unified' all the sched.c's between the various kernel releases,
maybe that brought in something unexpected on 2.6.20.14. (perhaps try

could you send me the file the cfs-debug-info.sh script produced. You
can pick the script up from:

  http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

(i'd suggest to send it in private mail, output is large and detailed.
If your kernel has no /proc/config.gz then please send me your .config
file too.)

Ingo
-
Hi Ingo,

Well, forget this, I'm nuts. I'm sorry, but I did not set any of the -R
and -S parameters on ocbench, which means that all the processes ran at
full speed and did not sleep. The load distribution was not fair, but
since they put a lot of stress on the X server, I think it might be one
of the reasons for the unfairness. I got the same behaviour with -v17,
-v9 and even 2.4 ! It told me something was wrong on my side ;-)

I've retried with 50%/50% run/sleep, and it now works like a charm. It's
perfectly smooth with both small and long run/sleep times (between 1 and
100 ms). I think that with X saturated, it might explain why I only had
one CPU running at 100% !

OK I got it, but I've not run it since the problem was between the
keyboard and the chair. If you want an output anyway, I can give it a
run.

Sorry again for the wrong alert.

regards,
willy
-
ah, great! :-) My testbox needs a 90% / 10% ratio between sleep/run for
an 8x8 matrix of ocbench tasks to not overload the X server. Once the
overload happens X starts penalizing certain clients it finds abusive (i
think), and that mechanism seems to be wall-clock based and it thus
brings in a lot of non-determinism and skews the clients.

Ingo
-
Ingo,
I've accidentally discovered a problem with -v18.
Some time ago, I wrote a small program to prevent my laptop from entering
low-power mode, and noticed that after upgrading my laptop's kernel from
2.6.20.9+cfs-v6 to 2.6.20.14+cfs-v18, it completely freezes if I run this
program.
The program is trivial, it just sets its prio to nice +20 and forks a busy
loop. I've added the ability to stop the loop after a user-definable number
of iterations, and I can confirm that it unfreezes when the loop ends. I'm
not even root when I run it.
Everything freezes, including the frame-buffer. It's not 100% reproducible, I
would say 90% only. Sometimes it requires a few seconds before freezing. It
*seems* to me that running another task in parallel (such as vmstat) increases
its chances to freeze. It seems like nicing to +19 does not cause any trouble.
I've tried it on my dual athlon, and with 1 process I see occasional pauses of
1 or 2 seconds, and with 2 processes, I see fairly larger pauses.
Here's the trivial program. Probably you'll find an obvious bug.
Regards,
Willy
---
/*
 * cfs-freeze.c
 * Fork a busy loop running with idle prio. This often results
 * in complete freezes with CFS-v18.
 *
 * $ gcc -O2 -s -o cfs-freeze cfs-freeze.c
 * $ ./cfs-freeze
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(int argc, char **argv) {
	struct sched_param sch;
	long long i = 0;	/* initialized so the default below applies without an argument */

	/* optional argument: number of busy-loop iterations */
	if (argc > 1)
		i = atoll(argv[1]);
	if (i <= 0)
		i = 4 * 1000 * 1000 * 1000ULL;

	sch.sched_priority = 0;
	sched_setscheduler(0, SCHED_OTHER, &sch);
	/* nice values are capped at +19, so a request for 20 is clamped */
	setpriority(PRIO_PROCESS, 0, 20);

	/* the child spins; the parent returns immediately */
	if (fork() == 0)
		while (i--);
	return 0;
}
---
-
hm, i tried your test-app and it causes no problems here. (which is not
a surprise - your app starts a nice +19 busy loop, which is one of the
common tests i do here too.)

To further debug this, could you try to create a 'high priority shell'
on a text console (i.e. not under X) that is SCHED_FIFO prio 98?
Something like:

  chrt -f -p 98 $$

should do the trick. And then run this script:

  http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

before the test, during the test and after the test, from the high-prio
shell session. (the shell runs at SCHED_FIFO, so the expectation is that
it will be able to run during the test too) Then please send me the
resulting 3 debug files.

Thanks,

Ingo
-
Willy,
could you check whether your current v18 CFS tree has the fix below
included? I discovered it right after having released v18 so i updated
the v18 files in place - but maybe you downloaded an early version? I
thought it's relatively harmless, that it would only affect SCHED_IDLE
tasks, but maybe it affects nice +19 tasks too on your box!
Ingo
Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -342,8 +342,9 @@ update_stats_enqueue(struct cfs_rq *cfs_
s64 tmp;
if (se->wait_runtime < 0) {
- tmp = (0 - se->wait_runtime) << NICE_0_SHIFT;
- key += (tmp * se->load.inv_weight) >> WMULT_SHIFT;
+ tmp = -se->wait_runtime;
+ key += (tmp * se->load.inv_weight) >>
+ (WMULT_SHIFT - NICE_0_SHIFT);
} else {
tmp = se->wait_runtime * se->load.weight;
key -= tmp >> NICE_0_SHIFT;
-
Hi Ingo, Good catch, it was the cause of the problem. I've just applied your fix below and rebuilt and the system behaves perfectly now. Thanks very much ! Willy -
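The thread does not spell out why the two-line change matters. One plausible reading, offered here only as a sketch: both forms compute the same value, but the old one shifts tmp left before multiplying and so needs NICE_0_SHIFT more bits of headroom in the 64-bit intermediate. The constants (NICE_0_SHIFT of 10, WMULT_SHIFT of 32) and the nice +19 weight of 15 used in the note below are assumptions taken from later mainline kernels, and the helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NICE_0_SHIFT 10	/* assumed: NICE_0_LOAD == 1 << 10 */
#define WMULT_SHIFT  32	/* assumed: inverse weights are scaled by 2^32 */

/* Old form: (tmp << NICE_0_SHIFT) * inv_weight >> WMULT_SHIFT */
static inline uint64_t key_delta_old(uint64_t tmp, uint64_t inv_weight)
{
	return ((tmp << NICE_0_SHIFT) * inv_weight) >> WMULT_SHIFT;
}

/* New form: tmp * inv_weight >> (WMULT_SHIFT - NICE_0_SHIFT) */
static inline uint64_t key_delta_new(uint64_t tmp, uint64_t inv_weight)
{
	return (tmp * inv_weight) >> (WMULT_SHIFT - NICE_0_SHIFT);
}

/* Would the old form's intermediate product exceed a signed 64-bit value? */
static inline bool old_form_overflows(uint64_t tmp, uint64_t inv_weight)
{
	return (tmp << NICE_0_SHIFT) > (uint64_t)INT64_MAX / inv_weight;
}

/* Same question for the new form's intermediate product. */
static inline bool new_form_overflows(uint64_t tmp, uint64_t inv_weight)
{
	return tmp > (uint64_t)INT64_MAX / inv_weight;
}
```

Under these assumptions, a task with weight 15 has an inverse weight of about 2^28. With tmp at 40,000,000 ns (the sched_runtime_limit_ns value quoted earlier in the thread), the old intermediate no longer fits in an s64 while the new one does. Whether this was the actual failure mode on Willy's machine is not stated in the thread.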
Hello,
I have been running cfs-v18 for a couple of days now, and today I
stumbled upon a rather strange problem. Consider the following short
program:
while(1)
printf("%ld\r", 1000 * clock() / CLOCKS_PER_SEC);
Running this in an xterm makes the xterm totally unresponsive. Ctrl-C
takes about two seconds to terminate the program, during which the
program will keep running. In fact, it seems that the longer it runs,
the longer it takes to terminate (towards 5 seconds after running for
a couple of minutes). This is rather surprising, as the rest of the
system is quite responsive (even remarkably so). I think this is also
in contrast with the expected behaviour, that Ctrl-C/program
termination should be prioritized somehow.
Some other observations: X.Org seems to be running at about 75% CPU on
CPU 1, the xterm at about 45% on CPU 0, and a.out at about 20% on CPU
0. (HT processor)
Killing with -2 or -9 from another terminal works immediately. Ctrl-Z
takes the same time as Ctrl-C.
Another thing to note is that simply looping with no output retains
the expected responsiveness of the xterm. Printing i++ is somewhere
halfway in between.
Is this behaviour expected or even intended? My main point is that
Ctrl-C is a safety fallback which suddenly doesn't work as usual. I
might even go so far as to call it a regression.
I'd also like to point out that Folding@Home seems to draw more CPU
than it should. Or, at least, in top, it shows up as using 50% CPU
even though other processes are demanding as much as they can get. The
FAH program should be running with idle priority. I expect it to fall
to near 0% when other programs are running at full speed, but it keeps
trotting along. And I am pretty sure that this is not due to SMP/HT (I
made sure to utilize both CPUs).
Lastly, I'd like to mention that I got BUGs (soft lockups) with -v8,
though it has not been reproducible with -v18, so I suppose it must
have been fixed already.
Otherwise, I am satisfied with the ...

Is it running with a default (0) nice value?

could you please run the following script when your application is
running? As you have pointed out: "... In fact, it seems that the longer
it runs, the longer it takes to terminate (towards 5 seconds after
running for a couple of minutes ...", please run the script a few times:
say, before starting up your application, 10 sec. after it has started,
after 1 minute, after a few minutes...

  http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

then send us the resulting files.

TIA,

-- 
Best regards,
Dmitry Adamushko
-
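For anyone wanting to reproduce Vegard's report, the two-line loop from his message can be wrapped into a self-contained function. This is a sketch only: the function name, the FILE * parameter and the iteration limit are additions for illustration (the original loops forever on stdout), while the printf format and the clock() expression are taken from the report.

```c
#include <stdio.h>
#include <time.h>

/*
 * Print the elapsed CPU time in milliseconds `iterations` times. Each value
 * ends with '\r' rather than '\n', so a terminal keeps overwriting the same
 * line and never scrolls. Returns the number of bytes written, or -1 on an
 * output error.
 */
static long spam_carriage_returns(FILE *out, long iterations)
{
	long total = 0;

	for (long n = 0; n < iterations; n++) {
		int written = fprintf(out, "%ld\r",
				      (long)(1000 * clock() / CLOCKS_PER_SEC));
		if (written < 0)
			return -1;
		total += written;
	}
	return total;
}
```

Calling spam_carriage_returns(stdout, n) with a large n from main() inside an xterm should approximate the original test while still terminating on its own.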
Resulting files at http://vegard.afraid.org:1104/pub/cfs/

cfs-debug-info-2007.07.02-15:18:13  Before running program
cfs-debug-info-2007.07.02-15:19:51  ~10 secs after start
cfs-debug-info-2007.07.02-15:20:54  ~1 minute after start
cfs-debug-info-2007.07.02-15:25:52  ~5 minutes after start
cfs-debug-info-2007.07.02-15:30:54  ~10 minutes after start

a.out is my program, FahCore_78 is the f@h client.

Hope this helps.

Vegard
-
thx. As an initial matter, could you double-check whether your v18
kernel source has the patch below applied already?
Ingo
Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -342,8 +342,9 @@ update_stats_enqueue(struct cfs_rq *cfs_
s64 tmp;
if (se->wait_runtime < 0) {
- tmp = (0 - se->wait_runtime) << NICE_0_SHIFT;
- key += (tmp * se->load.inv_weight) >> WMULT_SHIFT;
+ tmp = -se->wait_runtime;
+ key += (tmp * se->load.inv_weight) >>
+ (WMULT_SHIFT - NICE_0_SHIFT);
} else {
tmp = se->wait_runtime * se->load.weight;
key -= tmp >> NICE_0_SHIFT;
-
ok. Does the xterm slowdown get any better if you do:

  echo 46 > /proc/sys/kernel/sched_features

? The default on v18 is:

  echo 14 > /proc/sys/kernel/sched_features

Ingo
-
No. The Ctrl-C still hangs between 1 and 3 seconds, again seemingly depending on how long the program runs before I terminate it. Vegard -
Hi, This doesn't appear to be a CFS problem. I can reproduce the problem easily in virgin 2.6.22-rc7 by starting xterm-spam at nice -1 or better. As soon as xterm-spam can get enough CPU to keep the xterm fully busy, it's game over, the xterm freezes. The more accurate fairness of CFS to sleepers just tips the balance quicker. In mainline, the xterm has an unfair advantage and maintains it indefinitely... until you tip the scales just a wee bit, at which time it inverts. -Mike -
ah. That indeed makes sense. It seems like the xterm doesnt process the
Ctrl-C/Z keypresses _at all_ when it is 'spammed' with output. Normally,
output 'spam' is throttled by the scroll buffer's overhead. But in
Vegard's case, the printout involves a \r carriage return:
printf("%ld\r", 1000 * clock() / CLOCKS_PER_SEC);
which allows xterm-spam (attached) to easily flood the xterm (without
any scrolling that would act as a throttle) and the xterm to flood Xorg.
I suspect we need the help of an xterm/Xorg expert? (maybe Keith can
give us further pointers? I can reproduce the problem on a T60 with i940
and Core2Duo running Fedora 7 + Xorg 7.1.)
Ingo
It's just an Xterm bug.

Xterm will look for X input if it ever manages to fill the input buffer
past 32768 bytes. If it manages to get more than 4096 bytes in one read,
it will invoke sched_yield and then check for input. Gotta love that
sched_yield call.

As it always processes all of the incoming data before trying to read
again, there doesn't appear to be any way it can ever have more than
32768 characters in the buffer. And, as the kernel will not buffer more
than 4095 bytes from a pty, there isn't any way it will ever read 4096
bytes.

So, it sits there carefully reading every byte from the pty and painting
them on the screen.

You can 'fix' xterm with:

$ xterm -xrm '*minBufSize: 4095'

I hesitate to even suggest a patch to xterm that would solve this
problem correctly. Note that xterm has kludges in several of the output
processing steps which explicitly look for input (most vertical cursor
motion, it seems), which is why any application which scrolls doesn't
cause this problem.

Do you need more reasons to switch to another terminal emulator?
gnome-terminal has finally gotten reasonable; I expect rxvt or konsole
would work just as well.

-- 
keith.packard@intel.com
yeah, i use gnome-terminal exclusively. But testers looking for CFS
regressions do run every shell on the planet :-)

gnome-terminal is also faster all around (at least on my box):

 $ (echo '#!/bin/bash' ; echo 'for ((i=0; i<100000; i++)); do echo $i; done') > 1.sh; chmod +x 1.sh; time xterm $HOME/1.sh; time gnome-terminal -x ./1.sh

 real    0m3.193s
 user    0m2.840s
 sys     0m0.460s

 real    0m2.495s
 user    0m2.430s
 sys     0m1.520s

Ingo
-
Xorg seems to have a couple of starvation issues. e.g. I found the Gantt view in icemon during a busy compile session can starve all other X clients for tenths of seconds. -Andi -
This sounds as though it might be related to the issues I see with my
"glitch1" script, posted here a while ago. With cfs-v18 the effect of
having multiple xterms scrolling is obvious, occasionally they behave as
if they were "owed" more CPU and get paid back all at once. I've seen
this effect to one degree or another since cfs-v13, which did NOT show

I think this is because the shell to read the keypress is getting high
latency, rather than the process taking a long time to react. I have
been wrong before...

I read Ingo's reply to this, I'll gather the same information when the

-- 
Bill Davidsen <davidsen@tmr.com>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
-
your FAH processes are running with nice +19 - that should be enough to
throttle them. With v18 you can also mark it SCHED_IDLE:
schedtool -D $$ # mark the shell idle
SCHED_IDLE gets inherited by child tasks so if you mark the shell that
starts up FAH as SCHED_IDLE, all FAH threads should be SCHED_IDLE too.
(or you can start it up via schedtool -D -e ... )
does it still get more CPU time than you'd expect it to get? A reniced
or SCHED_IDLE task will 'fill in' any idle time that it senses, so in
itself it's not an anomaly if a task gets 50% and FAH fills in the
remaining 50%. Does it still get CPU time if you start two CPU hogs:
for (( N=0; N < 2; N++ )); do ( while :; do :; done ) & done
great! :-)
Ingo
-
