Linus, please pull the latest scheduler git tree from:
git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
It contains lots of scheduler updates from lots of people - hopefully
the last big one for quite some time. Most of the focus was on
performance (both micro-performance and scalability/balancing), but
there's the fair-scheduling feature now Kconfig selectable too. Find the
shortlog below.

Code that is touched outside of the scheduler: the KVM bits were acked
by Avi, the net/unix change is trivial and only affects sync wakeups,
ditto the fs/pipe.c changes - but i can push those separately if it
needs an ack from David first.

ABI/API changes:
- new CONFIG_FAIR_USER_SCHED and /sys/kernel/uids/ + uevent API.
- /proc/stat and /proc/<pid>/stat changes for guest-CPU usage [KVM]
- /proc/sched_debug formats changed/enhanced

Testing status: the changes are chronological and all the
interactivity-impacting changes are near the head of the queue and most
of them were done weeks ago, and were thus part of the CFS-v22 backport
series - which was tested by many people. There are no known regressions
at the moment. It's all fully bisectable.

Thanks,
Ingo
------------------>
Alexey Dobriyan (1):
      sched: uninline scheduler

Andi Kleen (4):
      sched: cleanup: remove unnecessary gotos
      sched: cleanup: refactor common code of sleep_on / wait_for_completion
      sched: cleanup: refactor normalize_rt_tasks
      sched: remove stale comment from sched_group_set_shares()

Arjan van de Ven (1):
      Make scheduler debug file operations const

Dhaval Giani (1):
      sched: group scheduling, sysfs tunables

Dmitry Adamushko (14):
      sched: clean up struct load_stat
      sched: clean up schedstat block in dequeue_entity()
      sched: sched_setscheduler() fix
      sched: add set_curr_task() calls
      sched: do not keep current in the tree and get rid of sched_entity::fair_key
      sched: optimize task_ne...
How does this one compare to the v22 you released earlier?
I'm thinking of backporting any fixes/optimizations to 2.6.22
(and possibly 2.6.23)

--
Thomas
-
i have already backported it as v22.1 - will release it within a few
days. (once the currently open regressions have been fixed)

Ingo
-
i've uploaded what i have at the moment, to:
http://people.redhat.com/mingo/cfs-scheduler/devel/sched-cfs-v2.6.23.1-v...
Ingo
-
Big thanks for your work...
Now I just have to see if I can get it to work with the -hrt series and
I'm really happy ;-)

--
Thomas
-
Nice work...
However it's a pity all the balancing stuff got wildly changed
in 2.6.23 and then somewhat changed back again now.

Despite appearances, a lot of those things weren't actually
*completely* arbitrary values. I fear that it will make finding
performance regressions harder than it should have been...

Anyway.
-
so i dropped them and re-pushed. New shortlog below.
Ingo
------------------>
Alexey Dobriyan (1):
      sched: uninline scheduler

Andi Kleen (4):
      sched: cleanup: remove unnecessary gotos
      sched: cleanup: refactor common code of sleep_on / wait_for_completion
      sched: cleanup: refactor normalize_rt_tasks
      sched: remove stale comment from sched_group_set_shares()

Arjan van de Ven (1):
      Make scheduler debug file operations const

Dhaval Giani (1):
      sched: group scheduling, sysfs tunables

Dmitry Adamushko (14):
      sched: clean up struct load_stat
      sched: clean up schedstat block in dequeue_entity()
      sched: sched_setscheduler() fix
      sched: add set_curr_task() calls
      sched: do not keep current in the tree and get rid of sched_entity::fair_key
      sched: optimize task_new_fair()
      sched: simplify sched_class::yield_task()
      sched: rework enqueue/dequeue_entity() to get rid of set_curr_task()
      sched: yield fix
      sched: fix __pick_next_entity()
      sched: tidy up SCHED_RR
      sched: cleanup, remove calc_weighted()
      sched: cleanup, make dequeue_entity() and update_stats_wait_end() similar
      sched: fix group scheduling for SCHED_BATCH

Gautham R Shenoy (1):
      sched: fix rt ptracer monopolizing CPU

Hiroshi Shimamoto (1):
      sched: clean up sched_fork()

Ingo Molnar (71):
      sched: fix sysctl_sched_child_runs_first flag
      sched: resched task in task_new_fair()
      sched: small sched_debug cleanup
      sched: debug: track maximum 'slice'
      sched: uniform tunings
      sched: use constants if !CONFIG_SCHED_DEBUG
      sched: remove stat_gran
      sched: remove precise CPU load
      sched: remove precise CPU load calculations #2
      sched: track cfs_rq->curr on !group-scheduling too
      sched: cleanup: simplify cfs_rq_curr() methods
      sched: uninline __enqueue_entity()/__dequeue_entity()
      sched: speed up update_load_add/_sub()
      sc...
On Mon, 15 Oct 2007 16:17:23 +0200
Did Paul Jackson's crash get fixed?
-
yes - that crash was a showstopper that was holding up the pull request
for 2 days. Paul bisected it down to the culprit and the fix was to do
this in wake_up_new_task():

-	if (!p->sched_class->task_new || !current->se.on_rq) {
+	if (!p->sched_class->task_new || !current->se.on_rq || !rq->cfs.curr) {

(during early bootup the cfs_rq has no curr pointer yet.) It's not clear
why this race did not trigger earlier. (and the two checks can probably
be consolidated into a single "!rq->cfs.curr" condition.)

Ingo
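
For readers following along, here is a minimal, self-contained sketch of the condition change quoted above. The structs are hypothetical, pared-down stand-ins that carry only the fields named in the mail (sched_class->task_new, se.on_rq, cfs.curr). The helper names skip_task_new_old() and skip_task_new_fixed() and the no-op callback are invented for illustration and are not kernel functions. What the kernel does in each branch is not shown in this thread and is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rq;
struct task;

/* Hypothetical stand-ins: only the fields the quoted condition touches. */
struct sched_class {
    void (*task_new)(struct rq *rq, struct task *p);
};

struct sched_entity {
    int on_rq;
};

struct cfs_rq {
    struct sched_entity *curr;   /* NULL when no fair task is running here */
};

struct rq {
    struct cfs_rq cfs;
};

struct task {
    const struct sched_class *sched_class;
    struct sched_entity se;
};

/* Illustrative callback so that "task_new" can be non-NULL. */
static inline void noop_task_new(struct rq *rq, struct task *p)
{
    (void)rq;
    (void)p;
}

/* Condition before the fix ("-" line in the mail). */
static inline bool skip_task_new_old(const struct task *p,
                                     const struct task *cur)
{
    assert(p != NULL && cur != NULL);
    return !p->sched_class->task_new || !cur->se.on_rq;
}

/* Condition after the fix ("+" line): also bail out if cfs.curr is NULL. */
static inline bool skip_task_new_fixed(const struct task *p,
                                       const struct task *cur,
                                       const struct rq *rq)
{
    assert(p != NULL && cur != NULL && rq != NULL);
    return !p->sched_class->task_new || !cur->se.on_rq || !rq->cfs.curr;
}
```

With a task_new callback present and 'current' on the runqueue but rq->cfs.curr still NULL, the old test returns false while the fixed test returns true, which is the case the extra check was added for.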
-
an update on this issue:
in short, SD_BALANCE_FORK is required to trigger this problem and
hence, only NUMA machines could have been affected by it (and only
ia64 and x86 have SD_BALANCE_FORK in SD_NODE_INIT).

more details:

it's perfectly legitimate for 'rq->cfs.curr' to be NULL in
task_new_fair() in the case when this_cpu != task_cpu(p) (p is a
newly created task).

why this_cpu != task_cpu(p):

do_fork() --> copy_process() --> sched_fork() -->
cpu = sched_balance_self(this_cpu, SD_BALANCE_FORK)

chose a different cpu for the new task and there is _no_
'class_sched_fair' task running on this cpu at the moment (that's why
rq->cfs.curr == NULL).

[ thanks a lot to Paul for providing debugging information ]

btw., it's not the 'curr->vruntime < se->vruntime' part in
task_new_fair() that gave us the oops (it's only executed in the case
of this_cpu == task_cpu(p)) _but_ it's rather:

2 checks are required as 'current' and rq->cfs.curr are not the same :-)
It also should work if we just get rid of [*] or add an additional
(curr != NULL) check there.

just as an additional observation:

there are lots of per-cpu threads (like events/cpu, ksoftirq/cpu,
etc.) being created on start-up (x NUMBER_OF_CPUS) and SD_BALANCE_FORK
(actually, sched_balance_self() from sched_fork()) is just an overhead
in this case...
although, sched_balance_self() is likely to be responsible for a minor
% of the time taken to create a new context so optimizing it away
(esp. for some corner cases) won't improve the start-up time

--
Best regards,
Dmitry Adamushko
-
Maybe not related to that, but my box now dies after this merge.
When I don't do much on the box I get maybe 6h of uptime; when doing some work (compiling etc.) it freezes at random.
I was finally able to capture the Oops:
...
[15692.917111] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000044
[15692.917159] printing eip:
[15692.917174] c0111f90
[15692.917185] *pde = 00000000
[15692.917200] Oops: 0000 [#1]
[15692.917208] PREEMPT SMP
[15692.917240] Modules linked in: fuse netconsole configfs pc87360 hwmon_vid eeprom adm1021 uhci_hcd sr_mod shpchp pci_hotplug ohci_hcd iTCO_wdt iTCO_vendor_support intel_agp i82860_edac i2c_i801 ehci_hcd usbcore edac_core cdrom agpgart 3c59x mii ext4dev jbd2 capability commoncap loop lp parport_pc parport evdev
[15692.917623] CPU: 0
[15692.917625] EIP: 0060:[<c0111f90>] Not tainted VLI
[15692.917629] EFLAGS: 00010046 (2.6.23-g65a6ec0d #330)
[15692.917661] EIP is at pick_next_task_fair+0x1f/0x2d
[15692.917672] eax: c150a7b8 ebx: 00000000 ecx: 00000000 edx: 00000000
[15692.917689] esi: c1507a48 edi: 00000000 ebp: 00eaaf7a esp: cb1fdf14
[15692.917701] ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
[15692.917715] Process sed (pid: 28999, ti=cb1fc000 task=cfdc3500 task.ti=cb1fc000)
[15692.917725] Stack: c02f8268 c02ef7b5 00000002 cb1fdf58 cb1fdf50 00000000 c0400f38 c0403780
[15692.917833] cfdc3500 cfdc3634 c150a780 00000000 c011a8e7 00000000 c1077aa0 000000ff
[15692.917942] 00000000 00000000 00000000 cb1fdf8c 00000010 cfdc3500 cb1fdf8c c011ace5
[15692.918048] Call Trace:
[15692.918072] [<c02ef7b5>] schedule+0x321/0x58f
[15692.918109] [<c011a8e7>] do_exit+0x293/0x6c6
[15692.918143] [<c011ace5>] do_exit+0x691/0x6c6
[15692.918169] [<c011ad87>] sys_exit_group+0x0/0xd
[15692.918195] [<c01026e6>] sysenter_past_esp+0x5f/0x85
[15692.918232] =======================
[15692.918244] Code: 8b 53 28 89 43 34 89 53 38 5b 5e c3 53 31 d2 83 78 40 00 74 20 83 c...
[ cc'ed Srivatsa ]
Gabriel, could you please post the disassembled code for pick_next_task_fair()?
(objdump -d kernel/sched.o and then search for pick_next_task_fair --
copy_and_paste)

anyway, my guess is that it's:

se = pick_next_entity(cfs_rq);
cfs_rq = group_cfs_rq(se);

'se' _happens_ to be NULL and group_cfs_rq(se) does se->my_q and
(according to my calculations) offset(my_q) == 68 (0x44) for x86 32bit
system with CONFIG_SCHEDSTATS=n and CONFIG_FAIR_GROUP_SCHED=y
(according to the config).

that might take place provided put_prev_task_fair() failed for some
reason to insert 'current' (or its corresponding group element) back
into the tree in schedule()... say, due to some inconsistency in
cfs_rq's data.

Srivatsa, that's somewhat similar to another issue that has been
posted earlier today (crash in put_prev_task_fair() -->
__enqueue_task() --> rb_insert_color()) that you are already aware of
... (/me will continue tomorrow).

--
Best regards,
Dmitry Adamushko
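
The offset argument above can be checked mechanically. Below is a small sketch, assuming a made-up struct whose only purpose is to put a 'my_q' member at byte offset 0x44. It is not the real struct sched_entity layout: the padding array stands in for whatever fields precede my_q in that particular config, and a 32-bit integer stands in for the 32-bit pointer. The helper my_q_access_address() is hypothetical too. It shows why reading se->my_q through a NULL 'se' touches address 0x44, the address reported in the oops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the kernel's struct sched_entity:
 * 0x44 bytes of "everything before my_q", then my_q itself.
 * uint32_t stands in for a pointer on 32-bit x86.
 */
struct fake_sched_entity {
    uint8_t  earlier_fields[0x44];
    uint32_t my_q;
};

/* The layout assumption this sketch rests on, checked at compile time. */
static_assert(offsetof(struct fake_sched_entity, my_q) == 0x44,
              "my_q is expected at offset 0x44 in this model");

/*
 * Address the CPU would access for "se->my_q", given the numeric value
 * of the 'se' pointer. Plain integer arithmetic, so nothing is actually
 * dereferenced here.
 */
static inline uintptr_t my_q_access_address(uintptr_t se_base)
{
    return se_base + offsetof(struct fake_sched_entity, my_q);
}
```

For se_base == 0 the helper yields 0x44, which matches "NULL pointer dereference at virtual address 00000044" in the report.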
-
Sure, here it is:

00000e49 <pick_next_task_fair>:
     e49:   53                      push   %ebx
     e4a:   31 d2                   xor    %edx,%edx
     e4c:   83 78 40 00             cmpl   $0x0,0x40(%eax)
     e50:   74 20                   je     e72 <pick_next_task_fair+0x29>
     e52:   83 c0 38                add    $0x38,%eax
     e55:   8b 50 20                mov    0x20(%eax),%edx
     e58:   31 db                   xor    %ebx,%ebx
     e5a:   85 d2                   test   %edx,%edx
     e5c:   74 0a                   je     e68 <pick_next_task_fair+0x1f>
     e5e:   8d 5a f8                lea    -0x8(%edx),%ebx
     e61:   89 da                   mov    %ebx,%edx
     e63:   e8 a9 ff ff ff          call   e11 <set_next_entity>
     e68:   8b 43 44                mov    0x44(%ebx),%eax
     e6b:   85 c0                   test   %eax,%eax
     e6d:   75 e6                   jne    e55 <pick_next_task_fair+0xc>
     e6f:   8d 53 d0                lea    -0x30(%ebx),%edx
     e72:   89 d0                   mov    %edx,%eax
     e74:   5b                      pop    %ebx
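
Reading the listing against the oops: the reported EIP, pick_next_task_fair+0x1f, is offset e68, i.e. the 'mov 0x44(%ebx),%eax', and the register dump shows ebx: 00000000. The 'je e68' at e5c jumps there with %ebx still zero when the loaded pointer in %edx is NULL, which fits the "se happens to be NULL" guess. The sketch below models that loop shape in plain C with a defensive check. It is an illustration only: the structs are hypothetical, a single 'leftmost' pointer stands in for the rbtree, pick_leaf_entity() is an invented name, and this is not proposed as the actual fix.

```c
#include <assert.h>
#include <stddef.h>

struct cfs_rq;

/* Hypothetical, simplified entity: a task or a group of tasks. */
struct sched_entity {
    struct cfs_rq *my_q;   /* non-NULL only for a group entity */
    int id;
};

/* Hypothetical queue: 'leftmost' stands in for the rbtree's first node. */
struct cfs_rq {
    struct sched_entity *leftmost;
};

/* Returns the next entity on this queue, or NULL if the queue is empty. */
static inline struct sched_entity *pick_next_entity(struct cfs_rq *q)
{
    assert(q != NULL);
    return q->leftmost;
}

/*
 * Walk down the group hierarchy until a leaf (task) entity is found.
 * The early return covers the case from the oops: an empty queue gives
 * se == NULL, and reading se->my_q would then fault.
 */
static inline struct sched_entity *pick_leaf_entity(struct cfs_rq *q)
{
    struct sched_entity *se = NULL;

    while (q) {
        se = pick_next_entity(q);
        if (!se)
            return NULL;
        q = se->my_q;
    }
    return se;
}
```

A check like this would only hide the symptom; the question raised in the thread, why the tree ends up without the expected entity, stays open either way.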
-
