mmap: avoid unnecessary anon_vma lock acquisition in vma_adjust()

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Linux Kernel Mailing List
Date: Tuesday, September 22, 2009 - 9:01 am

Gitweb:     http://git.kernel.org/linus/252c5f94d944487e9f50ece7942b0fbf659c5c31
Commit:     252c5f94d944487e9f50ece7942b0fbf659c5c31
Parent:     3f96b79ad96263cc0ece7bb340cddf9b2ddfb1b3
Author:     Lee Schermerhorn <Lee.Schermerhorn@hp.com>
AuthorDate: Mon Sep 21 17:03:40 2009 -0700
Committer:  Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Tue Sep 22 07:17:41 2009 -0700

    mmap: avoid unnecessary anon_vma lock acquisition in vma_adjust()
    
    We noticed very erratic behavior [throughput] with the AIM7 shared
    workload running on recent distro [SLES11] and mainline kernels on an
    8-socket, 32-core, 256GB x86_64 platform.  On the SLES11 kernel
    [2.6.27.19+] with Barcelona processors, as we increased the load [10s of
    thousands of tasks], the throughput would vary between two "plateaus"--one
    at ~65K jobs per minute and one at ~130K jpm.  The simple patch below
    causes the results to smooth out at the ~130k plateau.
    
    But wait, there's more:
    
    We do not see this behavior on smaller platforms--e.g., 4 socket/8 core.
    This could be the result of the larger number of cpus on the larger
    platform--a scalability issue--or it could be the result of the larger
    number of interconnect "hops" between some nodes in this platform and how
    the tasks for a given load end up distributed over the nodes' cpus and
    memories--a stochastic NUMA effect.
    
    The variability in the results are less pronounced [on the same platform]
    with Shanghai processors and with mainline kernels.  With 31-rc6 on
    Shanghai processors and 288 file systems on 288 fibre attached storage
    volumes, the curves [jpm vs load] are both quite flat with the patched
    kernel consistently producing ~3.9% better throughput [~80K jpm vs ~77K
    jpm] than the unpatched kernel.
    
    Profiling indicated that the "slow" runs were incurring high[er]
    contention on an anon_vma lock in vma_adjust(), apparently called from the
    sbrk() system call.
    
    The patch:
    
    A comment in mm/mmap.c:vma_adjust() suggests that we don't really need the
    anon_vma lock when we're only adjusting the end of a vma, as is the case
    for brk().  The comment questions whether it's worth while to optimize for
    this case.  Apparently, on the newer, larger x86_64 platforms, with
    interesting NUMA topologies, it is worth while--especially considering
    that the patch [if correct!] is quite simple.
    
    We can detect this condition--no overlap with next vma--by noting a NULL
    "importer".  The anon_vma pointer will also be NULL in this case, so
    simply avoid loading vma->anon_vma to avoid the lock.
    
    However, we DO need to take the anon_vma lock when we're inserting a vma
    ['insert' non-NULL] even when we have no overlap [NULL "importer"], so we
    need to check for 'insert', as well.  And Hugh points out that we should
    also take it when adjusting vm_start (so that rmap.c can rely upon
    vma_address() while it holds the anon_vma lock).
    
    akpm: Zhang Yanmin reprts a 150% throughput improvement with aim7, so it
    might be -stable material even though thiss isn't a regression: "this
    issue is not clear on dual socket Nehalem machine (2*4*2 cpu), but is
    severe on large machine (4*8*2 cpu)"
    
    [hugh.dickins@tiscali.co.uk: test vma start too]
    Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
    Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
    Cc: Nick Piggin <npiggin@suse.de>
    Cc: Eric Whitney <eric.whitney@hp.com>
    Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
    Cc: <stable@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/mmap.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 56eb871..b6d74b3 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -570,9 +570,9 @@ again:			remove_next = 1 + (end > next->vm_end);
 
 	/*
 	 * When changing only vma->vm_end, we don't really need
-	 * anon_vma lock: but is that case worth optimizing out?
+	 * anon_vma lock.
 	 */
-	if (vma->anon_vma)
+	if (vma->anon_vma && (insert || importer || start != vma->vm_start))
 		anon_vma = vma->anon_vma;
 	if (anon_vma) {
 		spin_lock(&anon_vma->lock);
--
To unsubscribe from this list: send the line "unsubscribe git-commits-head" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
mmap: avoid unnecessary anon_vma lock acquisition in vma_a ..., Linux Kernel Mailing ..., (Tue Sep 22, 9:01 am)