commit 6f15c833f56267baf5abdd0fbc90a81489573053
Author: Mark Langsdorf <mlangsdo@wshpnow.amd.com>
Date: Mon May 4 15:02:38 2009 -0500
New AMD processors will support the Pause Filter Feature.
This feature creates a new field in the VMCB called Pause
Filter Count. If Pause Filter Count is greater than 0 and
ntercepting PAUSEs is enabled, the processor will increment
an internal counter when a PAUSE instruction occurs instead
of intercepting. When the internal counter reaches the
Pause Filter Count value, a PAUSE intercept will occur.
This feature can be used to detect contended spinlocks,
especially when the lock holding VCPU is not scheduled.
Rescheduling another VCPU prevents the VCPU seeking the
lock from wasting its quantum by spinning idly.
Experimental results show that most spinlocks are held
for less than 1000 PAUSE cycles or more than a few
thousand. Default the Pause Filter Counter to 3000 to
detect the contended spinlocks.
Processor support for this feature is indicated by a CPUID
bit.
On a 24 core system running 4 guests each with 16 VCPUs,
this patch improved overall performance of each guest's
32 job kernbench by approximately 1%. Further performance
improvement may be possible with a more sophisticated
yield algorithm.
-Mark Langsdorf
Operating System Research Center
AMD
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 85574b7..1fecb7e 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -57,7 +57,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u16 intercept_dr_write;
u32 intercept_exceptions;
u64 intercept;
- u8 reserved_1[44];
+ u8 reserved_1[42];
+ u16 pause_filter_count;
u64 iopm_base_pa;
u64 msrpm_base_pa;
u64 tsc_offset;
diff --git a/arch/x86/kvm/svm.c ...Beside the small style problem the patch looks good. Please remove this empty line from the patch and resubmit. Please also add avi@redhat.com and kvm@vger.kernel.org to the CC list. Joerg -- | Advanced Micro Devices GmbH Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München System | Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München | Registergericht München, HRB Nr. 43632 --
Isn't a much better solution to the spinlock problem a usable monitor-wait implementation? If we implement virt spinlocks using monitor-wait they don't spin but simply wait in place, the HV could then decide to run someone else. This is the HV equivalent to futexes. The only problem with this is that the current hardware has horrid mwait wakeup latencies. If this were (much) improved you don't need such ugly yield hacks like this. --
I've considered MWAIT, but its really hard on the hw side: the hardware would have to generate a 'wakeup', meaning it either has to trap out, or has to send an irq. Trapping out is only possible on the release-the-lock side - which is usually on the wrong physical CPU, and it also happens _too late_ - such monitor/wait thingies are usually based on MESI cache, and the originating CPU does not wait for everything to happen. An irq (on the target CPU that notices the cacheline flush) is more feasible, but it is several thousand cycles to begin with. Irqs/vectors are a lot harder to add in general as well, and incur a cost of several years of CPU-design-cycle latency. Furthermore, the bits around MESI updates are _very_ sensitive codepaths of the CPU, while REP; NOP is a slowpath to begin with. But ... especially as SMT techniques spread, something like that will have to happen as well - but it will take years. Meanwhile, this particular CPU feature is there, it's fairly intuitive (plus a VM exit doesnt really change the CPU's behavior materially, so a lot easier to validate and get OS support for), so we could use it. Ingo --
What was your test case? How many runs did you do? I've been working with the former VI guys on some scheduler improvements for Xen, and hope to get back to this some next week. -Mark Langsdorf Operating System Research Center AMD --
Mark, Do you have time to push the Linux scheduler changes for Pause Filtering now? It's almost done after your work. If needing any help, pls. let me know. Thanks, edwin --
| Jesse Barnes | Re: [stable] [BUG][PATCH] cpqphp: fix kernel NULL pointer dereference |
| Greg KH | [003/136] p54usb: add Zcomax XG-705A usbid |
| Magnus Damm | [PATCH 03/07] ARM: Use shared GIC entry macros on Realview |
| Oliver Neukum | Re: [Bug #13682] The webcam stopped working when upgrading from 2.6.29 to 2.6.30 |
| Martin Schwidefsky | Re: [PATCH] optimized ktime_get[_ts] for GENERIC_TIME=y |
git: | |
| Junio C Hamano | Re: Some advanced index playing |
| Jeff King |
