I was referring to machines without PLE (Pause-Loop Exiting) support. Even on machines that support PLE, there is still some performance loss due to the cost of PLE itself.
For example, consider a machine with 2 pCPUs running two Windows guests, each configured with 2 vCPUs and each running a webbench server.
With the host's default scheduler, webbench performance is very poor. If we instead pin each guest's vCPU0 to pCPU0 and vCPU1 to pCPU1, we see a 5-10x performance improvement at the same CPU utilization.
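On a KVM host each guest vCPU runs as an ordinary host thread, so the manual pinning described above amounts to setting each vCPU thread's CPU affinity (the same effect `taskset -p` gives from the shell). Below is a minimal C sketch, assuming the host thread IDs of the vCPUs are already known (for example from QEMU's `info cpus` monitor command). `pin_thread_to_cpu` and `pin_vcpus_one_to_one` are hypothetical helper names, not part of KVM or QEMU.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>

/* Pin one host thread (e.g. a vCPU thread, identified by its TID) to a
 * single physical CPU. tid == 0 means the calling thread.
 * Returns 0 on success, -1 on failure (errno is set). */
int pin_thread_to_cpu(pid_t tid, int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(tid, sizeof(set), &set);
}

/* Hypothetical 1:1 layout from the experiment above: vCPU i -> pCPU i.
 * vcpu_tids[] is assumed to hold the host TIDs of one guest's vCPU threads.
 * Returns 0 if every thread was pinned, -1 on the first failure. */
int pin_vcpus_one_to_one(const pid_t *vcpu_tids, int nr_vcpus)
{
    for (int i = 0; i < nr_vcpus; i++) {
        if (pin_thread_to_cpu(vcpu_tids[i], i) != 0)
            return -1;
    }
    return 0;
}
```

For the two-guest setup, calling `pin_vcpus_one_to_one` once per guest places both guests' vCPU0 threads on pCPU0 and both vCPU1 threads on pCPU1, which is the static placement that replaced the default scheduler's decisions in this test.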
In addition, KVM's performance scalability is also affected on large systems. In some of our experiments, KVM's performance begins to drop once the vCPUs are overcommitted and the pCPUs are saturated, but when the wake_up_affine feature is switched off in the scheduler, KVM's performance keeps rising in the same situation.
Xiantao
--