- Various source code cleanups (no functional changes)
- Shrink binary size by refactoring some code
- One K8 optimization that is unfortunately not measurable.
But it sounded good in theory! @)
(if you have a better scheduling latency benchmark than
lmbench3 it would be cool to try)
- One real bug fix for a obscure problemAll against the sched-devel git tree with
d024f67c652f7f0519d2a1906b646286abbb2e48 head.Future wishlist: someone document all the magic numbers
in the load balancer code.-Andi
-
Not only the migration thread, but also softlockup and stop machine
need to be protected against normalize_rt(). Instead of checking
for them all I added a new process flag for this.This was the last available bit in 32bit task->flags, sorry for that.
This removes the is_migration_thread macro added in the last patch
again because it is not needed anymore.Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux-2.6-sched-devel/include/linux/sched.h
===================================================================
--- linux-2.6-sched-devel.orig/include/linux/sched.h
+++ linux-2.6-sched-devel/include/linux/sched.h
@@ -1330,6 +1330,7 @@ static inline void put_task_struct(struc
#define PF_MEMPOLICY 0x10000000 /* Non-default NUMA mempolicy */
#define PF_MUTEX_TESTER 0x20000000 /* Thread belongs to the rt mutex tester */
#define PF_FREEZER_SKIP 0x40000000 /* Freezer should not count it as freezeable */
+#define PF_RT_PROTECTED 0x80000000 /* RT task protected from sysrq *//*
* Only the _current_ task can read/write to tsk->flags, but other
Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -74,12 +74,6 @@ unsigned long long __attribute__((weak))
return (unsigned long long)jiffies * (1000000000 / HZ);
}-#ifdef CONFIG_SMP
-#define is_migration_thread(p, rq) ((p) == (rq)->migration_thread)
-#else
-#define is_migration_thread(p, rq) 0
-#endif
-
/*
* Convert user-nice values [ -20 ... 0 ... 19 ]
* to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
@@ -5305,6 +5299,7 @@ migration_call(struct notifier_block *nf
kthread_bind(p, cpu);
/* Must be high prio: stop_machine expects to yield to it. */
rq = task_rq_lock(p, &flags);
+ p->flags |= PF_RT_PROTECTED;
__setscheduler(rq, p, SCHED_FIFO, MAX_RT_PRIO-1);
task_rq_unlock(rq, &flags);
cpu_rq(cpu)->migration_thread = p...
hm, i dont really like this. The migration thread is an absolute
exception because it's needed for a correct scheduler. SysRq-N is like
SysRq-U or SysRq-B - last-minute resort in case something Goes Foul.
Results are not guaranteed.Ingo
-
softlockup is the same. Just think about it.
If you ever renormalize it and then run a fifo thread it will
starve and then eventually kill the box.And starving CPU unplug is also equally bad.
-Andi
-
yeah, agreed. I ended up doing the change below.
Ingo
------------------->
Subject: sched: do not normalize kernel threads via SysRq-N
From: Ingo Molnar <mingo@elte.hu>do not normalize kernel threads via SysRq-N: the migration threads,
softlockup threads, etc. might be essential for the system to
function properly. So only zap user tasks.pointed out by Andi Kleen.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -362,15 +362,6 @@ static inline int cpu_of(struct rq *rq)
#endif
}-static inline int is_migration_thread(struct task_struct *p, struct rq *rq)
-{
-#ifdef CONFIG_SMP
- return p == rq->migration_thread;
-#else
- return 0;
-#endif
-}
-
/*
* Update the per-runqueue clock, as finegrained as the platform can give
* us, but without assuming monotonicity, etc.:
@@ -6557,6 +6548,12 @@ void normalize_rt_tasks(void)read_lock_irq(&tasklist_lock);
do_each_thread(g, p) {
+ /*
+ * Only normalize user tasks:
+ */
+ if (!p->mm)
+ continue;
+
p->se.exec_start = 0;
#ifdef CONFIG_SCHEDSTATS
p->se.wait_start = 0;
@@ -6578,8 +6575,7 @@ void normalize_rt_tasks(void)
spin_lock_irqsave(&p->pi_lock, flags);
rq = __task_rq_lock(p);- if (!is_migration_thread(p, rq))
- normalize_task(rq, p);
+ normalize_task(rq, p);__task_rq_unlock(rq);
spin_unlock_irqrestore(&p->pi_lock, flags);
-
I considered doing it this way too. But it means that if keventd
(or another workqueue thread) starts looping for some reason you cannot renormalize it.
Might be not a good thing and impact debuggability. I think it's better to only
protect special threads.-Andi
-
Replace a particularly ugly ifdef with an inline and a new macro.
Also split up the function to be easier readable.Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -74,6 +74,12 @@ unsigned long long __attribute__((weak))
return (unsigned long long)jiffies * (1000000000 / HZ);
}+#ifdef CONFIG_SMP
+#define is_migration_thread(p, rq) ((p) == (rq)->migration_thread)
+#else
+#define is_migration_thread(p, rq) 0
+#endif
+
/*
* Convert user-nice values [ -20 ... 0 ... 19 ]
* to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
@@ -6534,12 +6540,25 @@ EXPORT_SYMBOL(__might_sleep);
#endif#ifdef CONFIG_MAGIC_SYSRQ
+static void normalize_task(struct rq *rq, struct task_struct *p)
+{
+ int on_rq;
+ update_rq_clock(rq);
+ on_rq = p->se.on_rq;
+ if (on_rq)
+ deactivate_task(rq, p, 0);
+ __setscheduler(rq, p, SCHED_NORMAL, 0);
+ if (on_rq) {
+ activate_task(rq, p, 0);
+ resched_task(rq->curr);
+ }
+}
+
void normalize_rt_tasks(void)
{
struct task_struct *g, *p;
unsigned long flags;
struct rq *rq;
- int on_rq;read_lock_irq(&tasklist_lock);
do_each_thread(g, p) {
@@ -6563,26 +6582,10 @@ void normalize_rt_tasks(void)spin_lock_irqsave(&p->pi_lock, flags);
rq = __task_rq_lock(p);
-#ifdef CONFIG_SMP
- /*
- * Do not touch the migration thread:
- */
- if (p == rq->migration_thread)
- goto out_unlock;
-#endif- update_rq_clock(rq);
- on_rq = p->se.on_rq;
- if (on_rq)
- deactivate_task(rq, p, 0);
- __setscheduler(rq, p, SCHED_NORMAL, 0);
- if (on_rq) {
- activate_task(rq, p, 0);
- resched_task(rq->curr);
- }
-#ifdef CONFIG_SMP
- out_unlock:
-#endif
+ if (!is_migration_thread(p, rq))
+ normalize_task(rq, p);
+
__task_rq_unlock(rq);
spin_unlock_irqrestore(&p->pi_...
thanks, applied.
Ingo
-
Function never returns -EINVAL.
Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -6814,8 +6814,6 @@ int sched_group_set_shares(struct task_g
if (tg->shares == shares)
return 0;- /* return -EINVAL if the new value is not sane */
-
tg->shares = shares;
for_each_possible_cpu(i)
set_se_shares(tg->se[i], shares);
-
thanks, applied.
Ingo
-
These functions were largely cut'n'pasted. This moves
the common code into single helpers instead. Advantage
is about 1k less code on x86-64 and 91 lines of code removed.
It adds one function call to the non timeout version of
the functions; i don't expect this to be measurable.Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -3689,206 +3689,115 @@ void fastcall complete_all(struct comple
}
EXPORT_SYMBOL(complete_all);-void fastcall __sched wait_for_completion(struct completion *x)
+static inline long __sched
+do_wait_for_common(struct completion *x, long timeout, int state)
{
- might_sleep();
-
- spin_lock_irq(&x->wait.lock);
if (!x->done) {
DECLARE_WAITQUEUE(wait, current);wait.flags |= WQ_FLAG_EXCLUSIVE;
__add_wait_queue_tail(&x->wait, &wait);
do {
- __set_current_state(TASK_UNINTERRUPTIBLE);
- spin_unlock_irq(&x->wait.lock);
- schedule();
- spin_lock_irq(&x->wait.lock);
- } while (!x->done);
- __remove_wait_queue(&x->wait, &wait);
- }
- x->done--;
- spin_unlock_irq(&x->wait.lock);
-}
-EXPORT_SYMBOL(wait_for_completion);
-
-unsigned long fastcall __sched
-wait_for_completion_timeout(struct completion *x, unsigned long timeout)
-{
- might_sleep();
-
- spin_lock_irq(&x->wait.lock);
- if (!x->done) {
- DECLARE_WAITQUEUE(wait, current);
-
- wait.flags |= WQ_FLAG_EXCLUSIVE;
- __add_wait_queue_tail(&x->wait, &wait);
- do {
- __set_current_state(TASK_UNINTERRUPTIBLE);
+ if (state == TASK_INTERRUPTIBLE &&
+ signal_pending(current)) {
+ __remove_wait_queue(&x->wait, &wait);
+ return -ERESTARTSYS;
+ }
+ __set_current_state(state);
spin_unlock_irq(&x->wait.lock);
timeout = schedule_timeout(tim...
Here's a replacement patch fixing a minor problem:
sleep_on_common wasn't marked __sched and ended up in the
wrong section; breaking the WCHAN backtracing.-Andi
Refactor common code of sleep_on / wait_for_completion
These functions were largely cut'n'pasted. This moves
the common code into single helpers instead. Advantage
is about 1k less code on x86-64 and 91 lines of code removed.
It adds one function call to the non timeout version of
the functions; i don't expect this to be measurable.Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -3690,206 +3690,116 @@ void fastcall complete_all(struct comple
}
EXPORT_SYMBOL(complete_all);-void fastcall __sched wait_for_completion(struct completion *x)
+static inline long __sched
+do_wait_for_common(struct completion *x, long timeout, int state)
{
- might_sleep();
-
- spin_lock_irq(&x->wait.lock);
if (!x->done) {
DECLARE_WAITQUEUE(wait, current);wait.flags |= WQ_FLAG_EXCLUSIVE;
__add_wait_queue_tail(&x->wait, &wait);
do {
- __set_current_state(TASK_UNINTERRUPTIBLE);
- spin_unlock_irq(&x->wait.lock);
- schedule();
- spin_lock_irq(&x->wait.lock);
- } while (!x->done);
- __remove_wait_queue(&x->wait, &wait);
- }
- x->done--;
- spin_unlock_irq(&x->wait.lock);
-}
-EXPORT_SYMBOL(wait_for_completion);
-
-unsigned long fastcall __sched
-wait_for_completion_timeout(struct completion *x, unsigned long timeout)
-{
- might_sleep();
-
- spin_lock_irq(&x->wait.lock);
- if (!x->done) {
- DECLARE_WAITQUEUE(wait, current);
-
- wait.flags |= WQ_FLAG_EXCLUSIVE;
- __add_wait_queue_tail(&x->wait, &wait);
- do {
- __set_current_state(TASK_UNINTERRUPTIBLE);
+ if (state == TASK_INTERRUPTIBLE &&
+ ...
thanks, applied.
Ingo
-
btw., this is a very nice cleanup, the .text savings are particularly
impressive:text data bss dec hex filename
24545 2406 16 26967 6957 sched.o.before
23687 2406 16 26109 65fd sched.o.afteralmost 1K of text shaven off! :)
Ingo
-
Some CPUs like K8 cannot predict indirect calls. A common
optimization in object oriented languages is to check
for the most common call target and then call it directly;
otherwise do an indirect call. This patch does this manually
for sched_fair calls in sched.cIn theory -- it doesn't seem to happen currently -- this
would allow the compiler to even inline parts of sched_fair.c
into the callers.I only did it only for fast paths (wake up and schedule) because the
generated code is little larger. This enlarges sched.o again by about
150 bytes.Also I unlined check_preempt_curr()
I unfortunately wasn't able to measure a consistent difference in lmbench3
lat_ctx -- the change seems to be below its (large) instability.Still seems like a nice optimization and it's not too ugly.
Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -348,11 +348,6 @@ struct rq {
static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
static DEFINE_MUTEX(sched_hotcpu_mutex);-static inline void check_preempt_curr(struct rq *rq, struct task_struct *p)
-{
- rq->curr->sched_class->check_preempt_curr(rq, p);
-}
-
static inline int cpu_of(struct rq *rq)
{
#ifdef CONFIG_SMP
@@ -833,6 +828,18 @@ static int balance_tasks(struct rq *this
#define sched_class_highest (&rt_sched_class)/*
+ * Simple devirtualization:
+ * Most tasks are fair tasks. Avoid the overhead of
+ * doing indirect calls for them by checking for the fair case and doing
+ * a direct call. Use only in real fast paths since it generates larger code.
+ */
+#define __CLASS_CALL(class, name, args) \
+ ((class) == &fair_sched_class ? \
+ name ## _fair args : (class)->name args)
+#define CLASS_CALL(p, name, args) \
+ __CLASS_CALL((p)->sched_class, name, args)
+
+/*
* Update delta_exec...
hm, i'm not convinced about this one. It increases the code size a bit
and it's a sched.c local hack. If then this should be done on a generic
infrastructure level - lots of other code (VFS, networking, etc.) could
benefit from it i suspect - and then should be .configurable as well.
Then the benefit might become measurable too.Ingo
-
Tiny bit (<200 bytes) and the wait_for/sleep_on refactor patch in the series
saves over 1K so I should have some room for code size increase. OverallUnfortunately not -- for this to work (especially for inlining) requires to
#include files implementing the sub calls. Except for the scheduler that
is pretty uncommon unfortunately. Also the situation regarding which
call target is the common one is typically much less clear than with
sched_fair / other scheduling classes.I considered making it generic, but I don't think it would make sense
at the current time.It might have been measurable if the context switch was measurable at all.
Unfortunately the lmbench3 lat_ctx test I tired fluctuated by itself
over 50%. Ok I suppose it would be possible to instrument the kernel itself
to measure cycles. Would that convince you?-Andi
-
there's no forced dependency between those two patches :-) So for now
some workloads would call sched_fair uncommon too. To me this seems like
dunno, it would depend on the numbers. But really, in most workloads we
do a lot more VFS indirect calls than scheduler indirect calls. So if
this was an issue i'd really suggest to attack it in a generic way.Ingo
-
I did that .text reduction only to make up for the text increase
from the other patch. Your approach is asking for keeping them both
in a single patch next time, which surely cannot be what you reallyThat's like saying all optimizations are a workaround for lack of hardware
with infinite IPC. Yes sure. But that doesn't seem like a very fruitful way to
reason.Besides even on CPUs with indirect branch predictor it is probably a win --
they tend to have much less memory to store indirect branch prediction (requiresThe difference is that the VFS always did indirect calls; but the scheduler
didn't. And again it's much less clear for the VFS in general what
are the common paths. To do it fully generic would probably require
a JIT and reoptimization at run time -- i don't think that's a path
we should go.But if generic solutions are not possible doing it for important
special cases where it happens to be possible is certainly a valid approach.But given that I couldn't come up with clear numbers it's reasonable to not
apply it yet. I'll try to come up with some better way to measure this.-Andi
-
No need to be more spagetti than absolutely necessary.
Replace loops implemented with gotos with real loops.
Replace err = ...; goto x; x: return err; with return ...;No functional changes.
Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -555,16 +555,13 @@ static inline void finish_lock_switch(st
static inline struct rq *__task_rq_lock(struct task_struct *p)
__acquires(rq->lock)
{
- struct rq *rq;
-
-repeat_lock_task:
- rq = task_rq(p);
- spin_lock(&rq->lock);
- if (unlikely(rq != task_rq(p))) {
+ for (;;) {
+ struct rq *rq = task_rq(p);
+ spin_lock(&rq->lock);
+ if (likely(rq == task_rq(p)))
+ return rq;
spin_unlock(&rq->lock);
- goto repeat_lock_task;
}
- return rq;
}/*
@@ -577,15 +574,14 @@ static struct rq *task_rq_lock(struct ta
{
struct rq *rq;-repeat_lock_task:
- local_irq_save(*flags);
- rq = task_rq(p);
- spin_lock(&rq->lock);
- if (unlikely(rq != task_rq(p))) {
+ for (;;) {
+ local_irq_save(*flags);
+ rq = task_rq(p);
+ spin_lock(&rq->lock);
+ if (likely(rq == task_rq(p)))
+ return rq;
spin_unlock_irqrestore(&rq->lock, *flags);
- goto repeat_lock_task;
}
- return rq;
}static void __task_rq_unlock(struct rq *rq)
@@ -1076,69 +1072,71 @@ void wait_task_inactive(struct task_stru
int running, on_rq;
struct rq *rq;-repeat:
- /*
- * We do the initial early heuristics without holding
- * any task-queue locks at all. We'll only try to get
- * the runqueue lock when things look like they will
- * work out!
- */
- rq = task_rq(p);
+ for (;;) {
+ /*
+ * We do the initial early heuristics without holding
+ * any task-queue locks at all. We'll only try to get
+ * the runqueue lock when things look like they will
+ * work out!
+ */
...
this crashes during bootup - removed it for now.
Ingo
-
Boots here and also survived LTP and some other tests.
Hmm; i had an early version that hung because I forgot a break.
Perhaps I forgot quilt refresh. Can you check you have this version?-Andi
Remove some unnecessary gotos in sched.c
No need to be more spagetti than absolutely necessary.
Replace loops implemented with gotos with real loops.
Replace err = ...; goto x; x: return err; with return ...;No functional changes.
Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -555,16 +555,13 @@ static inline void finish_lock_switch(st
static inline struct rq *__task_rq_lock(struct task_struct *p)
__acquires(rq->lock)
{
- struct rq *rq;
-
-repeat_lock_task:
- rq = task_rq(p);
- spin_lock(&rq->lock);
- if (unlikely(rq != task_rq(p))) {
+ for (;;) {
+ struct rq *rq = task_rq(p);
+ spin_lock(&rq->lock);
+ if (likely(rq == task_rq(p)))
+ return rq;
spin_unlock(&rq->lock);
- goto repeat_lock_task;
}
- return rq;
}/*
@@ -577,15 +574,14 @@ static struct rq *task_rq_lock(struct ta
{
struct rq *rq;-repeat_lock_task:
- local_irq_save(*flags);
- rq = task_rq(p);
- spin_lock(&rq->lock);
- if (unlikely(rq != task_rq(p))) {
+ for (;;) {
+ local_irq_save(*flags);
+ rq = task_rq(p);
+ spin_lock(&rq->lock);
+ if (likely(rq == task_rq(p)))
+ return rq;
spin_unlock_irqrestore(&rq->lock, *flags);
- goto repeat_lock_task;
}
- return rq;
}static void __task_rq_unlock(struct rq *rq)
@@ -1076,69 +1072,71 @@ void wait_task_inactive(struct task_stru
int running, on_rq;
struct rq *rq;-repeat:
- /*
- * We do the initial early heuristics without holding
- * any task-queue locks at all. We'll only try to get
- * the runqueue lock when things look like they will
- * work o...
hm, i've got your latest. I did some more debugging today and this was a
red herring: it turns out that it was not your patch causing this
problem, it triggered with your patch only because of a subtle .config
difference that i overlooked. Sorry!Ingo
-
(and of course your patch is back in.)
Ingo
-
thanks, applied.
Ingo
-
| Greg Kroah-Hartman | [PATCH 004/196] Chinese: add translation of SubmittingPatches |
| Tarkan Erimer | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Matt Mackall | Re: [PATCH] x86: fix unconditional arch/x86/kernel/pcspeaker.c compiling |
| James Bottomley | Re: Integration of SCST in the mainstream Linux kernel |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| David Miller | [GIT]: Networking |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Natalie Protasevich | [BUG] New Kernel Bugs |
git: | |
