Hi Ingo, Peter Under kernel compilation load (make -j8) audio is very much unusable on current linux-2.6.git - The stutter is so bad I can hardly tell what song I am listening to :) Even with make -j4 it still skips but not that badly. I am running Ubuntu Hardy (Pulseaudio) on a quad core machine and switching back to Ubuntu shipped kernel (2.6.24-16) resolves the issue totally. .config below. Revert of e22ecef1d2658ba54ed7d3fdb5d60829fb434c23 fixed similar issue. Parag # # Automatically generated make config: don't edit # Linux kernel version: 2.6.25 # Thu May 1 19:47:33 2008 # CONFIG_64BIT=y # CONFIG_X86_32 is not set CONFIG_X86_64=y CONFIG_X86=y CONFIG_DEFCONFIG_LIST="arch/x86/configs/x86_64_defconfig" # CONFIG_GENERIC_LOCKBREAK is not set CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_FAST_CMPXCHG_LOCAL=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y # CONFIG_GENERIC_GPIO is not set CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_GENERIC_SPINLOCK=y # CONFIG_RWSEM_XCHGADD_ALGORITHM is not set # CONFIG_ARCH_HAS_ILOG2_U32 is not set # CONFIG_ARCH_HAS_ILOG2_U64 is not ...
So you're saying reverting e22ecef1 fixes the issue? - as have others.
The initial rationale for the change was: vruntime and walltime are
related by a factor of rq->weight, hence compensate for that.
However that in itself has a flaw, we're comparing vruntime to process
runtime time, _NOT_ walltime. And vruntime and process runtime do not
have that factor so the compensation is unneeded.
However that still leaves me puzzled on why it would cause such bad
regression. The current code reads like:
/*
* delta *= w / rw
*/
static inline unsigned long
calc_delta_weight(unsigned long delta, struct sched_entity *se);
/* sleeps upto a single latency don't count. */
if (sched_feat(NEW_FAIR_SLEEPERS)) {
if (sched_feat(NORMALIZED_SLEEPER))
vruntime -= calc_delta_weight(sysctl_sched_latency, se);
else
vruntime -= sysctl_sched_latency;
}
So what we do is _shrink_ the bonus the busier we get (the more tasks we
get, the higher the total runqueue weight (rw) hence delta will get
smaller)
The thing is, I once asked Frans to test !NEW_FAIR_SLEEPERS and he
reported that that also solves the problem. So no bonus is good, and a
fixed bonus is good, but a variable bonus that is between these two
values is not.
Parag, would you also test with !NEW_FAIR_SLEEPERS to see if that solves
your problem?
The easiest way to disable it is (assumes you have debugfs mounted
at /debug):
# echo NO_NEW_FAIR_SLEEPERS > /debug/sched_features
You can also disable NORMALIZED_SLEEPER that way (of course you would
first have to enable NEW_FAIR_SLEEPERS again):
# echo NO_NORMALIZED_SLEEPER > /debug/sched_features
You can get a current status of all the flags using:
# cat /debug/sched_features
So by default we have both enabled; could you report if either
NO_NEW_FAIR_SLEEPERS
NEW_FAIR_SLEEPERS + NO_NORMALIZED_SLEEPERS
works for you?
Also, could you apply this patch, and report the bonus_max value for
your music ...I've restarted amarok and the glibc build each time after change to sched_features. # cat sched_features NEW_FAIR_SLEEPERS WAKEUP_PREEMPT START_DEBIT AFFINE_WAKEUPS CACHE_HOT_BUDDY SYNC_WAKEUPS HRTICK NO_DOUBLE_TICK NORMALIZED_SLEEPER DEADLINE => skips in music (amarok) /proc/20734/task/20734/sched:se.bonus_max : 40960.000000 /proc/20734/task/20775/sched:se.bonus_max : 40960.000000 /proc/20734/task/20779/sched:se.bonus_max : 40960.000000 /proc/20734/task/20780/sched:se.bonus_max : 40960.000000 /proc/20734/task/20781/sched:se.bonus_max : 40960.000000 /proc/20734/task/20782/sched:se.bonus_max : 39.960966 /proc/20734/task/20799/sched:se.bonus_max : 40960.000000 # echo NO_NEW_FAIR_SLEEPERS >sched_features => no skips in music /proc/26357/task/26357/sched:se.bonus_max : 0.000000 /proc/26357/task/26362/sched:se.bonus_max : 0.000000 /proc/26357/task/26366/sched:se.bonus_max : 0.000000 /proc/26357/task/26367/sched:se.bonus_max : 0.000000 /proc/26357/task/26368/sched:se.bonus_max : 0.000000 /proc/26357/task/26369/sched:se.bonus_max : 0.000000 /proc/26357/task/9705/sched:se.bonus_max : 0.000000 # echo NEW_FAIR_SLEEPERS >sched_features # echo NO_NORMALIZED_SLEEPER >sched_features => no skips in music /proc/13352/task/13352/sched:se.bonus_max : 0.000000 /proc/13352/task/13357/sched:se.bonus_max : 0.000000 /proc/13352/task/13361/sched:se.bonus_max : 0.000000 /proc/13352/task/13362/sched:se.bonus_max : 0.000000 /proc/13352/task/13363/sched:se.bonus_max : 0.000000 /proc/13352/task/13364/sched:se.bonus_max : 0.000000 /proc/13352/task/27595/sched:se.bonus_max : 0.000000 Cheers, FJP --
Ok, that's _WAY_ too big, something went seriously wrong. Thanks! --
In fact, that is exactly 1024 time too large.
How does this work for you:
Index: linux-2.6-2/kernel/sched_fair.c
===================================================================
--- linux-2.6-2.orig/kernel/sched_fair.c
+++ linux-2.6-2/kernel/sched_fair.c
@@ -685,6 +685,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, st
* Update run-time statistics of the 'current'.
*/
update_curr(cfs_rq);
+ account_entity_enqueue(cfs_rq, se);
if (wakeup) {
place_entity(cfs_rq, se, 0);
@@ -695,7 +696,6 @@ enqueue_entity(struct cfs_rq *cfs_rq, st
check_spread(cfs_rq, se);
if (se != cfs_rq->curr)
__enqueue_entity(cfs_rq, se);
- account_entity_enqueue(cfs_rq, se);
}
static void update_avg(u64 *avg, u64 sample)
--
Changes nothing. Values of bonus_max are unchanged and still skips. --
Humm, I saw some bonus_max spikes on my laptop which were fixed by that patchlet, ah well, the hunt goes on. Thanks! --
Oh! Hang on. I just checked and saw that I made the change in dequeue_entity() instead of enqueue_entity(). Let me try again. For code symmetry I guess both should probably be changed? Sorry for the mistake. --
Symetry would actually make it something like: enqueue() A B C dequeu() C B A But I just quickly checked the code and I can't see any ill effect of mis-placing that code in dequeue() too. --
Looks like you've got it after all! With the patch in the correct place there are no more skips and I now get: /proc/5182/task/5182/sched:se.bonus_max : 39.960966 /proc/5182/task/5263/sched:se.bonus_max : 39.960966 /proc/5182/task/5267/sched:se.bonus_max : 39.960966 /proc/5182/task/5268/sched:se.bonus_max : 39.960966 /proc/5182/task/5269/sched:se.bonus_max : 39.960966 /proc/5182/task/5270/sched:se.bonus_max : 19.990234 /proc/5182/task/5661/sched:se.bonus_max : 39.960966 Note (mostly for Parag) that I had GROUP_SCHED disabled for these tests, but for me that did make not any difference with my 2.6.25-rc tests for this issue, nor with my tests using sched-devel earlier in the .26 merge window. Feel free to add my Tested-by: Frans Pop <elendil@planet.nl> Cheers, FJP --
Strange - without the _enqueue patch from Peter, stutter is reduced to an almost acceptable level for me if I just disable GROUP_SCHED. With Peter's patch - I have tested multiple times and the stutter is completely eliminated. So +1 for Peter's patch. Mike recalls a regression that may be related - so it is possible you were running w/o the affecting change. Anyway - all is well in sched-land for now. Rejoice! Parag --
Not really, we can do better. I still see stutters and noticable delays in keypresses making it to the console even with Peter's patches. /me tries to fix this one :-/ Peter, any more patches to try out? thanks, -- regards, Dhaval --
I would say no - the audio still is unusable - slightly better with NO_NEW_FAIR_SLEEPERS, no difference with the other. But per Mike's suggestion if I disable CONFIG_GROUP_SCHED , CONFIG_FAIR_GROUP_SCHED and CONFIG_USER_SCHED, even with NEW_FAIR_SLEEPERS audio is good again, which does not change with NO_NEW_FAIR_SLEEPERS. (No skips for long time under -j8 - still skips a _very_ tiny bit - noticeable if I listen too carefully - but we can ignore that for the time being.) Mike - the GROUP_SCHED stuff is set to y in the Ubuntu kernel and I have no audio skips - may be a regression introduced after 2.6.24? I see Frans already reported that and there was some conclusion - let me know if more data will help. Thanks! Parag --
Hm. I've stumbled across a regression that may be related. Can you please try the attached with group scheduling enabled in current git? -Mike
In file included from kernel/sched.c:1902: kernel/sched_fair.c: In function 'print_cfs_rq_tasks': kernel/sched_fair.c:1602: error: implicit declaration of function 'calc_delta_weight' kernel/sched_fair.c:1603: warning: format '%lu' expects type 'long unsigned int', but argument 5 has type 'int' make[1]: *** [kernel/sched.o] Error 1 make: *** [kernel] Error 2 'calc_delta_weight' or a replacement not found easily to fix it up! Thanks Parag --
Oops, I have an unrelated patch in my tree which you also need. -Mike
I see that Peter has a fix, so don't bother. -Mike --
Maybe not. It looks like Parag may have been seeing a combination of two different issues: the one that Peter fixed and one related to group scheduling. As I had group scheduling disabled, I only saw the first one. But my audio skips were a lot less severe than Parag described. And disabling group scheduling helped solve the problem for him. So it may well still be worth following up on this. Cheers, FJP --
I have just tested a kernel with GROUP_SCHED enabled (with Peter's patch still included) and can confirm that enabling group scheduling causes serious latencies. Skips are much more frequent than with the issue that was just solved. latencytop shows (different samples) Scheduler: waiting for cpu 537.7 msec Scheduler: waiting for cpu 172.0 msec Scheduler: waiting for cpu 414.6 msec Scheduler: waiting for cpu 455.7 msec Scheduler: waiting for cpu 446.5 msec I think I also see why.... # grep bonus_max `grep -l amarokapp /proc/*/task/*/sched` /proc/4725/task/20645/sched:se.bonus_max : 122640.203776 /proc/4725/task/4725/sched:se.bonus_max : 122640.203776 /proc/4725/task/4793/sched:se.bonus_max : 41902109.884416 /proc/4725/task/4797/sched:se.bonus_max : 41902109.884416 /proc/4725/task/4798/sched:se.bonus_max : 41902109.884416 /proc/4725/task/4799/sched:se.bonus_max : 40880.046080 /proc/4725/task/4800/sched:se.bonus_max : 19.970683 --
Hm. I enabled group scheduling, with Peter's patch + revert of the commit that is giving me grief + the below, and it seems to work ok here. Definitely no sound skips even under quite hefty load. The below _seems_ to make a difference stand-alone, but subjective things like 'lurches' require _very_ much testing... you tend to see what you want to see with such things. As you can see, I thought about submitting it, but I have way too much subjective poison in my system from testing this and that while reading source. Fix undesirable rq.clock update noops. Signed-off-by: Mike Galbraith <efault@gmx.de> Index: linux-2.6.26.git/kernel/sched.c =================================================================== --- linux-2.6.26.git.orig/kernel/sched.c +++ linux-2.6.26.git/kernel/sched.c @@ -668,9 +668,6 @@ static void __update_rq_clock(struct rq s64 delta = now - prev_raw; u64 clock = rq->clock; -#ifdef CONFIG_SCHED_DEBUG - WARN_ON_ONCE(cpu_of(rq) != smp_processor_id()); -#endif /* * Protect against sched_clock() occasionally going backwards: */ @@ -3009,8 +3006,8 @@ static void double_rq_lock(struct rq *rq spin_lock(&rq1->lock); } } - update_rq_clock(rq1); - update_rq_clock(rq2); + __update_rq_clock(rq1); + __update_rq_clock(rq2); } /* @@ -3860,7 +3857,7 @@ redo: /* Attempt to move tasks */ double_lock_balance(this_rq, busiest); /* this_rq->clock is already updated */ - update_rq_clock(busiest); + __update_rq_clock(busiest); ld_moved = move_tasks(this_rq, this_cpu, busiest, imbalance, sd, CPU_NEWLY_IDLE, &all_pinned); @@ -3959,8 +3956,8 @@ static void active_load_balance(struct r /* move a task from busiest_rq to target_rq */ double_lock_balance(busiest_rq, target_rq); - update_rq_clock(busiest_rq); - update_rq_clock(target_rq); + __update_rq_clock(busiest_rq); + __update_rq_clock(target_rq); /* Search for an sd spanning us and the target CPU. */ for_each_domain(target_cpu, sd) ...
On Fri, 2008-05-02 at 21:27 +0200, Mike Galbraith wrote:
That's a quite horrible patch for some of us, but I think I know why you
did that, and I think your fancy Q6600 has a stable enough TSC to pull
it off.
Ok, the the below would need something that relates tick_timestamp's to
one another.. probably sucks without it..
OTOH, Andi said he was working on a fastish global sched_clock() thingy,
Andi got a link to that code?
Index: linux-2.6-2/kernel/sched.c
===================================================================
--- linux-2.6-2.orig/kernel/sched.c
+++ linux-2.6-2/kernel/sched.c
@@ -560,6 +560,7 @@ struct rq {
unsigned long next_balance;
struct mm_struct *prev_mm;
+ spinlock_t clock_lock;
u64 clock, prev_clock_raw;
s64 clock_max_delta;
@@ -657,20 +658,25 @@ static inline void update_last_tick_seen
}
#endif
+#define cpu_rq(cpu) (&per_cpu(runqueues, (cpu)))
+#define this_rq() (&__get_cpu_var(runqueues))
+#define task_rq(p) cpu_rq(task_cpu(p))
+#define cpu_curr(cpu) (cpu_rq(cpu)->curr)
+
/*
* Update the per-runqueue clock, as finegrained as the platform can give
* us, but without assuming monotonicity, etc.:
*/
-static void __update_rq_clock(struct rq *rq)
+static void ___update_rq_clock(struct rq *rq, u64 now)
{
u64 prev_raw = rq->prev_clock_raw;
- u64 now = sched_clock();
s64 delta = now - prev_raw;
u64 clock = rq->clock;
#ifdef CONFIG_SCHED_DEBUG
WARN_ON_ONCE(cpu_of(rq) != smp_processor_id());
#endif
+ spin_lock(&rq->clock_lock);
/*
* Protect against sched_clock() occasionally going backwards:
*/
@@ -699,12 +705,31 @@ static void __update_rq_clock(struct rq
rq->prev_clock_raw = now;
rq->clock = clock;
+ spin_unlock(&rq->clock_lock);
+}
+
+static void __update_rq_clock(struct rq *rq)
+{
+ ___update_rq_clock(rq, sched_clock());
+}
+
+static void __update_remote_rq_clock(struct rq *rq)
+{
+ u64 now = sched_clock();
+ struct rq *this_rq = this_rq();
+
+ now -= ...Does this obsolete Ingo's "sched: make cpu_clock() globally synchronous"? http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=27ec44... My laptop is UP, so I cannot test this :-( -- Guillaume --
It doesn't, I'm just throwing around code/ideas. We should probably come up with some clock lib that builds the various clocks: rq->clock and cpu_clock() from sched_clock() depending on some architecture supplied hints. We can toss most of this code on archs like power64, sparc64 and s390x which have stable clocks. It just seems wrong to do all this on these archs, but don't have much choice on x86. --
From: Peter Zijlstra <a.p.zijlstra@chello.nl> All of this scheduler runqueue clock business is probably solving the x86 TSC issue on the wrong level. I suggest having a char with Andi Kleen about these issues, he's had some good ideas in the past which were ignored for whatever reasons... --
From: Peter Zijlstra <a.p.zijlstra@chello.nl> While I'm fine with this kind of stuff being added to constantly cope with x86's joke of a TSC register implementation, it's starting to become an enormous burdon for platforms where the TICK source actually works properly. Heck, on my Niagara2 chips, all 64 cpus use the same physical register for the TICK source, it physically can't get desynchronized :-) So, a way to turn all of this muck off would be much appreciated. I'm happy to test anything on sparc64, and I'm sure the powerpc folks are as well. And I also heard a rumor that Peter has access to a machine with a stable tick source for testing :-) --
Yeah, I was thinking of a way for architectures to signify how much help they need building the various clocks. Sparc, power and s390 would not need any help. --
there's already such a mechanism in sched-devel.git (and has been there for a week or two): an architecture can set time_sync_thresh to -1LL during early bootup and essentially disable all the synchronization logic. Ingo --
From: Ingo Molnar <mingo@elte.hu> Does it remove all of the code too? :-) Please give us a config boolean. The only platform for which a run-time knob is even necessary is x86. Thanks. --
yeah, will try something like that too. the thing is, core kernel folks have to resist such pressures _somewhat_. Architecture maintainers will put pressure on us from two directions: if a particular generic feature only helps _another_ architecture, they find it a nuisance and want it to be gone as much as possible. If a particular feature helps them, they want it supported and default-enabled as much as possible. In fact, complaints come if a generic-looking feature shows up in one architecture only. (recent example: inlining optimizations ;-) And you are totally right about sched_clock() being dead on accurate an globally synchronous on sparc64 - and you are right to find _any_ issue about it a nuisance. I totally envy you that sparc64's sched_clock() is so simple - it should have been like that on x86 years ago, if hw designers werent so impotent about it. ( although please note that the growing generalization that goes on _did_ find a subtle nohz problem on sparc64 early in the merge window, so it's not like these changes are totally useless to you. ) but it all goes in the other direction as well: many folks find endianness problems a nuisance on x86, many folks find TLB and explicit cache coherence complications a nuisance on x86 as well. The bus-to-phys complication which is an identity on x86. Etc. etc. But the core kernel is a conscious and intelligent union of all architecture's needs, and often we maintain complications even if they make no sense on the most popular platform. I think it makes strategic sense because it keep the kernel truly generic and truly clean. It also keeps architectures honest: even today the x86 architecture is still not as clean as sparc64 for example. so please be patient, we are working on it :) Ingo --
From: Ingo Molnar <mingo@elte.hu>
Note that this bug theoretically exists on every platform,
even x86, and only sparc64 is fixed at the moment by adding
the irq_{entry,exit}() guards around all of it's IPI handlers.
Just a reminder.
--
yeah, indeed - it's masked on x86 by the suckiness of most clockevents hw timers. We dont get unlimited sleep time like sparc64 does :-/ Ingo --
What is the bug ? IPIs on power are normal interrupts, so they do happen in irq_entry/exit blocks but I'm curious to know what the root bug is :-) Ben. --
on nohz we still keep jiffies uptodate - despite there not being an explicit 'keep jiffies uptodate' tick interrupt anymore. So on every irq_enter() we roll jiffies forward - if needed - and thus emulate jiffies behavior to drivers and core kernel code, etc. if an IPI on Power does not do an irq_enter() then you might miss out on updated jiffies. That might not matter for most jiffies, but you might also miss out on the 'touch the softlockup watchdog because we just woke from idle' action. This is what triggered the false positive warnings on Sparc64. the same bug existed on x86 too: that too does a few IPIs without irq_enter/irq_exit. We now removed the softlockup dependency so it should not be required to do an irq_enter()/exit anymore - unless the code that the IPI uses accesses jiffies. (but that would be unusual) Ingo --
From: Ingo Molnar <mingo@elte.hu>
That move doesn't solve the problem, I'm strongly certain
arch's still need to add the irq_{enter,exit}().
I'll revert the irq_{enter,exit}() changes on one of my
sparc64 boxes to validate this later today.
--
From: David Miller <davem@davemloft.net>
False alarm, which is good news, thanks Ingo. I can revert
the following it seems:
commit 2664ef44cf5053d2b7dff01cecac70bc601a5f68
Author: David S. Miller <davem@davemloft.net>
Date: Fri Apr 25 03:11:37 2008 -0700
[SPARC64]: Wrap SMP IPIs with irq_enter()/irq_exit().
Otherwise all sorts of bad things can happen, including
spurious softlockup reports.
Other platforms have this same bug, in one form or
another, just don't see the issue because they
don't sleep as long as sparc64 can in NOHZ.
Thanks to some brilliant debugging by Peter Zijlstra.
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/arch/sparc64/kernel/smp.c b/arch/sparc64/kernel/smp.c
index 524b889..409dd71 100644
--- a/arch/sparc64/kernel/smp.c
+++ b/arch/sparc64/kernel/smp.c
@@ -866,14 +866,21 @@ void smp_call_function_client(int irq, struct pt_regs *regs)
void *info = call_data->info;
clear_softint(1 << irq);
+
+ irq_enter();
+
+ if (!call_data->wait) {
+ /* let initiator proceed after getting data */
+ atomic_inc(&call_data->finished);
+ }
+
+ func(info);
+
+ irq_exit();
+
if (call_data->wait) {
/* let initiator proceed only after completion */
- func(info);
atomic_inc(&call_data->finished);
- } else {
- /* let initiator proceed after getting data */
- atomic_inc(&call_data->finished);
- func(info);
}
}
@@ -1032,7 +1039,9 @@ void smp_receive_signal(int cpu)
void smp_receive_signal_client(int irq, struct pt_regs *regs)
{
+ irq_enter();
clear_softint(1 << irq);
+ irq_exit();
}
void smp_new_mmu_context_version_client(int irq, struct pt_regs *regs)
@@ -1040,6 +1049,8 @@ void smp_new_mmu_context_version_client(int irq, struct pt_regs *regs)
struct mm_struct *mm;
unsigned long flags;
+ irq_enter();
+
clear_softint(1 << irq);
/* See if we need to allocate a new TLB context because
@@ -1059,6 +1070,8 @@ void ...ah, good! It really looks like unnecessary overhead for the "simple IPIs". But ... i'm wondering ... what about softirq processing? Do these IPIs process softirqs on the way out? In that case the non-processed jiffies might be a problem. it's all a bit messy. I wish we could start turning jiffies into a function (which would just read GTOD and estimate jiffies from there), but i fear we are not there yet ... Ingo --
From: Ingo Molnar <mingo@elte.hu> I think Peter Z. made a good point that we probably need to keep it there for smp_call_function() receivers, so that the called functions can test the context they are in accurately. --
Allright, so it looks like we don't have a problem today, though I'll be careful if some of our CPUs come up with more fancy ways to do IPIs. Currently they are just normal interrupts, so we do irq_enter/exit and we process softirqs on the way out. Ben. --
From: Ingo Molnar <mingo@elte.hu> And you guys have IOMMUs now, aren't you glad all of this nice DMA API infrastructure exists already and has been tested for years? :-)))) --
On Fri, 02 May 2008 14:48:27 -0700 (PDT) it's a sad affair indeed. On some systems it counts cycles, on other systems it counts time. On most systems it stops while idle, on others it keeps running. On most systems its not very synchronized between packages, and on some systems it's not even synchronized between cores. I'm not convinced TSC is the right thing for the scheduler in the first place; on current x86 systems TSC counts "time" not "work done". Now of course "time" is an approximation for "work done", but not a very good one given the presence of what effectively is variable cpu speeds (software CPU frequency control is only part of that; there's also the finegrained hardware level frequency control as done by what marketing people call "Intel Dynamic Acceleration technology"). [*] I and others have talked to Peter about this already, and I'm sure we'll talk more about it in the future as well.. at some point this part of CFS needs to fundamentally be cleaned up. Since this gets into a debate about what fairness means ;( [*] The converse is also true; cycles aren't a good representation of time either; this makes cycle based profilers a bit iffy if you're interested in where the system spends time rather than where it spends cycles. --
My current view on this is that per cpu scheduling should use a time based clock, whereas smp load balancing can take the cycle (ie work) counter to balance different work/time loads and estimate core power. As for using the TSC, I'm afraid we just don't have any choice. Although I guess we could dynamically detect some counter on new cpus and use that when available. But the TSC is the only thing available on a lot of existing machines. --
Oh, yeah, that's right. Bah, let them eat cake ;-) -Mike --
There are definitely issues with that commit, but I'm not at all sure they manifest themselves with this workload in any heavy-weight way. My thought was only that it's negative effects could be compounded by group scheduling. -Mike --
Based on Mike's mails I have quickly tried reverting the following two commits to see if that would fix the GROUP_SCHED related regression, but I still get a merge error with the second one in sched_fair.c: $ git revert 7ba2e74ab5 $ git revert 8f1bc385cf I don't feel comfortable solving the conflict myself as I'd have absolutely no idea what I'd be doing and it all seems to require a lot of precision. As other suggested patches don't seem ready for testing yet, I'll just wait for further instructions :-) Cheers, FJP --
You probably shouldn't bother. GROUP_SCHED needs Peter's EEVDF help regardless of whether 8f1bc385cf is playing a significant role or not. -Mike --
What's the current status here? With mainline v2.6.26-rc1-110-ga153063 (which has latest sched patches), I still get very bad audio skips with GROUP_SCHED enabled. Cheers, FJP --
there's no solution for now, the latency targets are still not working under the group scheduler. Upstream we've made it non-default and marked it EXPERIMENTAL. Peter has a new updated group scheduler queue that i'm testing right now, perhaps it can be pushed out to sched-devel/latest later today. But that is v2.6.27 material. Ingo --
