Re: [PATCH][KVM] Add support for Pause Filtering to AMD SVM

Previous thread: [PATCH resend] drm: ignore LVDS on intel graphics systems that lie about having it by Jarod Wilson on Tuesday, May 5, 2009 - 7:00 am. (2 messages)

Next thread: sata_sil24 EH complete by Tim Connors on Tuesday, May 5, 2009 - 7:22 am. (7 messages)
From: Mark Langsdorf
Date: Tuesday, May 5, 2009 - 7:09 am

commit 6f15c833f56267baf5abdd0fbc90a81489573053
Author: Mark Langsdorf <mlangsdo@wshpnow.amd.com>
Date:   Mon May 4 15:02:38 2009 -0500

    New AMD processors will support the Pause Filter Feature.
    This feature creates a new field in the VMCB called Pause
    Filter Count.  If Pause Filter Count is greater than 0 and
    ntercepting PAUSEs is enabled, the processor will increment
    an internal counter when a PAUSE instruction occurs instead
    of intercepting.  When the internal counter reaches the
    Pause Filter Count value, a PAUSE intercept will occur.
    
    This feature can be used to detect contended spinlocks,
    especially when the lock holding VCPU is not scheduled.
    Rescheduling another VCPU prevents the VCPU seeking the
    lock from wasting its quantum by spinning idly.
    
    Experimental results show that most spinlocks are held
    for less than 1000 PAUSE cycles or more than a few
    thousand.  Default the Pause Filter Counter to 3000 to
    detect the contended spinlocks.
    
    Processor support for this feature is indicated by a CPUID
    bit.
    
    On a 24 core system running 4 guests each with 16 VCPUs,
    this patch improved overall performance of each guest's
    32 job kernbench by approximately 1%.  Further performance
    improvement may be possible with a more sophisticated
    yield algorithm.
    
    -Mark Langsdorf
    Operating System Research Center
    AMD
    
    Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 85574b7..1fecb7e 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -57,7 +57,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u16 intercept_dr_write;
 	u32 intercept_exceptions;
 	u64 intercept;
-	u8 reserved_1[44];
+	u8 reserved_1[42];
+	u16 pause_filter_count;
 	u64 iopm_base_pa;
 	u64 msrpm_base_pa;
 	u64 tsc_offset;
diff --git a/arch/x86/kvm/svm.c ...
From: Bert Wesarg
Date: Tuesday, May 5, 2009 - 9:05 am

From: Joerg Roedel
Date: Thursday, May 7, 2009 - 6:55 am

Beside the small style problem the patch looks good. Please remove this
empty line from the patch and resubmit. Please also add avi@redhat.com
and kvm@vger.kernel.org to the CC list.

Joerg



-- 
           | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System    | 
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center    | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
           | Registergericht München, HRB Nr. 43632

--

From: Peter Zijlstra
Date: Monday, May 11, 2009 - 7:38 am

Isn't a much better solution to the spinlock problem a usable
monitor-wait implementation?

If we implement virt spinlocks using monitor-wait they don't spin but
simply wait in place, the HV could then decide to run someone else.

This is the HV equivalent to futexes.

The only problem with this is that the current hardware has horrid mwait
wakeup latencies. If this were (much) improved you don't need such ugly
yield hacks like this.
--

From: Ingo Molnar
Date: Monday, May 11, 2009 - 7:51 am

I've considered MWAIT, but its really hard on the hw side:

the hardware would have to generate a 'wakeup', meaning it either 
has to trap out, or has to send an irq.

Trapping out is only possible on the release-the-lock side - which 
is usually on the wrong physical CPU, and it also happens _too late_ 
- such monitor/wait thingies are usually based on MESI cache, and 
the originating CPU does not wait for everything to happen.

An irq (on the target CPU that notices the cacheline flush) is more 
feasible, but it is several thousand cycles to begin with.

Irqs/vectors are a lot harder to add in general as well, and incur a 
cost of several years of CPU-design-cycle latency. Furthermore, the 
bits around MESI updates are _very_ sensitive codepaths of the CPU, 
while REP; NOP is a slowpath to begin with.

But ... especially as SMT techniques spread, something like that 
will have to happen as well - but it will take years. Meanwhile, 
this particular CPU feature is there, it's fairly intuitive (plus a 
VM exit doesnt really change the CPU's behavior materially, so a lot 
easier to validate and get OS support for), so we could use it.

	Ingo
--

From: Langsdorf, Mark
Date: Wednesday, July 22, 2009 - 3:40 pm

What was your test case?  How many runs did you do?

I've been working with the former VI guys on some scheduler
improvements for Xen, and hope to get back to this some
next week.

-Mark Langsdorf
Operating System Research Center
AMD

--

From: Zhai, Edwin
Date: Wednesday, August 5, 2009 - 2:08 am

Mark,
Do you have time to push the Linux scheduler changes for Pause Filtering 
now?  It's almost done after your work. If needing any help, pls. let me 
know.

Thanks,
edwin

--

Previous thread: [PATCH resend] drm: ignore LVDS on intel graphics systems that lie about having it by Jarod Wilson on Tuesday, May 5, 2009 - 7:00 am. (2 messages)

Next thread: sata_sil24 EH complete by Tim Connors on Tuesday, May 5, 2009 - 7:22 am. (7 messages)