Re: Default zone_reclaim_mode = 1 on NUMA kernel is bad for file/email/web servers

From: KOSAKI Motohiro
Date: Monday, October 4, 2010 - 5:45 am

Hi


Yeah. Page cache often lives much longer than the processes that created
it, so the CPU where the current process is running is not a good
heuristic for placement. And the kernel doesn't have good statistics for
finding the best node for cache. That's the problem.
How do we know which CPUs future processes will run on?

Also, the CPU scheduler has an issue. IO-intensive workloads often produce
an unbalanced process layout (the CPUs aren't busy yet, so why would we
do a costly CPU migration?). As a result, memory consumption also
becomes unbalanced. This is also a difficult issue. hmm..



Yup.


In theory, yes. But please talk with userland developers. They always say
"Our software works fine on *BSD, Solaris, Mac, etc. That's definitely a
Linux problem". /me has no way to persuade them ;-)




I think it's accurate. And I don't think this is easy work, because
there are many motherboard vendors in the world and we don't have a way
to communicate with them. That's the difficulty of commodity hardware.


This is one option. But we don't need to create an x86 arch-specific
RECLAIM_DISTANCE, because practical high-end NUMA machines are either
ia64 (SGI, Fujitsu) or Power (IBM), and both platforms already have
arch-specific definitions. So changing the generic RECLAIM_DISTANCE has
no side effect on those platforms. And if possible, x86 shouldn't have
an arch-specific definition, because most minor architectures don't have
many testers and their quality often depends on testing done on x86.
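
To make the threshold concrete, here is a rough user-space sketch (not
kernel code). It assumes the boot-time check turns zone reclaim on when
some remote node's node_distance() is strictly greater than
RECLAIM_DISTANCE, and that an arch header can override the generic value
through the #ifndef guard. The helper name zone_reclaim_enabled() and the
distance table layout are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Generic default; an arch header may have defined its own value first. */
#ifndef RECLAIM_DISTANCE
#define RECLAIM_DISTANCE 30
#endif

/*
 * Hypothetical helper, not a kernel function.
 * dist is an n x n table in row-major order, dist[i * n + j] being the
 * distance from node i to node j. Returns true if any remote node is
 * farther away than the threshold (assumed strict comparison).
 */
static bool zone_reclaim_enabled(const int *dist, size_t n, int threshold)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            if (i != j && dist[i * n + j] > threshold)
                return true;
        }
    }
    return false;
}
```

With an illustrative two-node table {10, 21, 21, 10}, the helper returns
true for a threshold of 20 and false for 30.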

I've attached a patch below.



If the problem were limited to a few atypical programs, this would make
sense. But I don't think it's practical in the current situation.


For performance, this is definitely the best way. And MySQL and other DB
software should care about this, I believe.
But, again, the problem is that too much software doesn't work well with
zone_reclaim_mode.
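
For anyone who wants to check whether a given box is affected, a minimal
sketch that reads the current setting might look like this. The sysctl
lives at /proc/sys/vm/zone_reclaim_mode and 0 means zone reclaim is off.
The helper name read_int_sysctl() is made up for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from a sysctl-style file.
 * Returns 0 and stores the value in *out on success, -1 if the file
 * cannot be opened or does not start with an integer.
 */
static int read_int_sysctl(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling read_int_sysctl("/proc/sys/vm/zone_reclaim_mode", &mode) on a
NUMA kernel and getting a non-zero mode means zone reclaim is enabled.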



From d54928bfb4b2b865bedcff17e9b45dfbb714a5e6 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Thu, 14 Oct 2010 13:48:21 +0900
Subject: [PATCH] mm: increase RECLAIM_DISTANCE to 30

Recently, Robert Mueller reported that zone_reclaim_mode doesn't work
properly on his new NUMA server (dual Xeon E5520 + Intel S5520UR MB).
He is using Cyrus IMAPd, which is built on a very traditional
process-per-connection model:

  * a master process which reads config files and manages the other
    processes
  * multiple imapd processes, one per connection
  * multiple pop3d processes, one per connection
  * multiple lmtpd processes, one per connection
  * periodic "cleanup" processes

So there are thousands of independent processes. The problem is that
recent Intel motherboards end up with zone_reclaim_mode turned on by
default, and traditional prefork-model software doesn't work well with it.
Unfortunately, this model is still typical even in the 21st century.
We can't ignore it.

This patch raises the zone_reclaim_mode threshold to 30. The value 30
has no specific meaning, but 20 means one-hop QPI/HyperTransport, and
such relatively cheap 2-4 socket machines are often used for traditional
servers like the one above. The intention is that these machines don't
use zone_reclaim_mode.

Note: ia64 and Power have arch-specific RECLAIM_DISTANCE definitions,
so this patch doesn't change the behavior of such high-end NUMA machines.

Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux.com>
Cc: Bron Gondwana <brong@fastmail.fm>
Cc: Robert Mueller <robm@fastmail.fm>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 include/linux/topology.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 64e084f..bfbec49 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -60,7 +60,7 @@ int arch_update_cpu_topology(void);
  * (in whatever arch specific measurement units returned by node_distance())
  * then switch on zone reclaim on boot.
  */
-#define RECLAIM_DISTANCE 20
+#define RECLAIM_DISTANCE 30
 #endif
 #ifndef PENALTY_FOR_NODE_WITH_CPUS
 #define PENALTY_FOR_NODE_WITH_CPUS	(1)
-- 
1.6.5.2


