I'm concerned that we don't have adequate protection for the scheduler during cpu hotplug events, but I'm willing to believe I simply don't understand the mechanism well enough.

We had a crash in (comparatively ancient) 2.6.16.*, but I think the relevant code is basically unchanged since then. First we introduced some cpu-intensive workloads. Then we added two cpus. The system quickly crashed.

The crash was in find_busiest_group(), when the kernel tried to access "this", which was NULL. If we don't find a local group, we never set "this", and when we then try to calculate *imbalance, we dereference a NULL "this" and crash.

As I looked over the code, though, I couldn't tell whether the fault was with find_busiest_group() for not covering this case, or whether the method the hotplug code uses to reconstruct the sched_domains really doesn't protect find_busiest_group() (and find_idlest_group()) at all.

Can anybody explain how synchronize_sched() is really syncing? It looks like a half-implemented RCU setup. I fear we really don't have any way to protect the two functions above from hotplug's desire to twiddle with the sched_domains. Do we?

Rick
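
To make the failure mode concrete, here is a minimal userspace sketch of the pattern described above. It is not the kernel code: struct group, find_busiest() and the load arithmetic are hypothetical simplifications, and "local" stands in for "this". It only shows how a group list that does not contain the current cpu leaves the local pointer NULL, and what a defensive check in front of the imbalance calculation might look like.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a scheduler group. */
struct group {
    unsigned long cpumask;  /* bit i set => cpu i belongs to this group */
    unsigned long load;     /* aggregate load of the group */
    struct group *next;     /* circular list of groups */
};

/*
 * Walk the circular group list once. "local" plays the role of "this"
 * in the post: it is only set if some group contains this_cpu.
 * Returns the busiest non-local group and stores half the load
 * difference in *imbalance, or returns NULL with *imbalance == 0
 * when there is nothing to balance.
 */
struct group *find_busiest(struct group *first, int this_cpu,
                           unsigned long *imbalance)
{
    struct group *g = first, *local = NULL, *busiest = NULL;

    do {
        if (g->cpumask & (1UL << this_cpu))
            local = g;
        else if (!busiest || g->load > busiest->load)
            busiest = g;
        g = g->next;
    } while (g != first);

    /*
     * Without the !local test, a list that does not contain this_cpu
     * would lead to a NULL dereference in the subtraction below.
     */
    if (!local || !busiest || busiest->load <= local->load) {
        *imbalance = 0;
        return NULL;
    }

    *imbalance = (busiest->load - local->load) / 2;
    return busiest;
}
```

Calling find_busiest() with a this_cpu that appears in no group's cpumask returns NULL instead of crashing. A check like this would only paper over the symptom; it says nothing about whether the groups can legitimately be in that state while they are being rebuilt, which is the real question above.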
