Re: [PATCH 2/2] writeback: Adding pages_dirtied and pages_entered_writeback

From: Michael Rubin
Date: Wednesday, August 4, 2010 - 5:43 pm

Patch #1 sets up some helper functions for accounting.

Patch #2 adds writeback files for visibility

To help developers and applications gain visibility into writeback
behaviour, this series adds two read-only sysctl files to /proc/sys/vm.
These files allow user apps to understand writeback behaviour over time
and learn how it is impacting their performance.

  # cat /proc/sys/vm/pages_dirtied
  3747
  # cat /proc/sys/vm/pages_entered_writeback
  3618

These two new files are necessary to give visibility into writeback
behaviour. We have /proc/diskstats which lets us understand the IO in
the block layer. We have blktrace for more in-depth understanding. We have
e2fsprogs and debugfs to give insight into the file system's behaviour,
but we don't offer our users the ability to understand what writeback is
doing. There is no way to know how active it is over the whole system,
whether it's falling behind, or to quantify its efforts. With these values
exported, users can easily see how much data applications are sending
through writeback and also at what rates writeback is processing this
data. Comparing the rates of change between the two allows developers
to see when writeback is not able to keep up with incoming traffic and
the rate of dirty memory being sent to the IO back end. This allows
folks to understand their IO workloads and track kernel issues. Non-kernel
engineers at Google often use these counters to solve puzzling
performance problems.
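
As an illustration of the comparison described above, here is a minimal
userspace sketch. It assumes the two files exist as proposed in this
series (they are only present on a kernel carrying the patch); the
function names and the one-second default interval are hypothetical
choices, not part of the patch.

```python
import time
from pathlib import Path

# Paths proposed by this series; present only on a kernel carrying the patch.
DIRTIED = Path("/proc/sys/vm/pages_dirtied")
ENTERED_WB = Path("/proc/sys/vm/pages_entered_writeback")


def read_counter(path):
    """Return the single integer stored in a counter file."""
    return int(Path(path).read_text().split()[0])


def rate(before, after, seconds):
    """Pages per second between two samples of a cumulative counter."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (after - before) / seconds


def sample_rates(interval=1.0, dirtied=DIRTIED, entered=ENTERED_WB):
    """Sample both counters twice; return (dirty_rate, writeback_rate)."""
    d0, w0 = read_counter(dirtied), read_counter(entered)
    t0 = time.monotonic()
    time.sleep(interval)
    d1, w1 = read_counter(dirtied), read_counter(entered)
    elapsed = time.monotonic() - t0
    return rate(d0, d1, elapsed), rate(w0, w1, elapsed)
```

If the first rate stays above the second over many samples, pages are
being dirtied faster than they enter writeback, which is the "not able
to keep up" case mentioned above.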


Michael Rubin (2):
  mm: helper functions for dirty and writeback accounting
  writeback: Adding pages_dirtied and pages_entered_writeback

 Documentation/sysctl/vm.txt |   20 +++++++++++++++---
 drivers/base/node.c         |   14 +++++++++++++
 fs/ceph/addr.c              |    8 +-----
 fs/nilfs2/segment.c         |    2 +-
 include/linux/mm.h          |    1 +
 include/linux/mmzone.h      |    2 +
 include/linux/writeback.h   |    9 ++++++++
 kernel/sysctl.c             |   14 +++++++++++++
 mm/page-writeback.c         |   45 ...
From: Michael Rubin
Date: Wednesday, August 4, 2010 - 5:43 pm

To help developers and applications gain visibility into writeback
behaviour, this patch adds two read-only sysctl files to /proc/sys/vm.
These files allow user apps to understand writeback behaviour over time
and learn how it is impacting their performance.

   # cat /proc/sys/vm/pages_dirtied
   3747
   # cat /proc/sys/vm/pages_entered_writeback
   3618

Documentation/sysctl/vm.txt has been updated.

In order to track the "cleaned" and "dirtied" counts we added two
vm_stat_items.  Per-memory-node stats have also been added, so we can
see per-node granularity:

   # cat /sys/devices/system/node/node20/writebackstat
   Node 20 pages_writeback: 0 times
   Node 20 pages_dirtied: 0 times

Signed-off-by: Michael Rubin <mrubin@google.com>
---
 Documentation/sysctl/vm.txt |   20 ++++++++++++++++----
 drivers/base/node.c         |   14 ++++++++++++++
 include/linux/mmzone.h      |    2 ++
 include/linux/writeback.h   |    9 +++++++++
 kernel/sysctl.c             |   14 ++++++++++++++
 mm/page-writeback.c         |   36 ++++++++++++++++++++++++++++++------
 mm/vmstat.c                 |    2 ++
 7 files changed, 87 insertions(+), 10 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 5fdbb61..de9ec6a 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -50,6 +50,8 @@ Currently, these files are in /proc/sys/vm:
 - overcommit_memory
 - overcommit_ratio
 - page-cluster
+- pages_dirtied
+- pages_entered_writeback
 - panic_on_oom
 - percpu_pagelist_fraction
 - stat_interval
@@ -425,10 +427,7 @@ See Documentation/vm/hugetlbpage.txt
 nr_pdflush_threads
 
 The current number of pdflush threads.  This value is read-only.
-The value changes according to the number of dirty pages in the system.
-
-When necessary, additional pdflush threads are created, one per second, up to
-nr_pdflush_threads_max.
+This value is obsolete.
 
 ==============================================================
 
@@ -582,6 +581,19 @@ ...
From: Michael Rubin
Date: Thursday, August 5, 2010 - 3:05 pm

I did not know they would show up in /proc/vmstat.

I thought it made sense to put them in /proc/sys/vm since the other
writeback controls are there, but I have no problem just adding them
to /proc/vmstat if that makes more sense.

mrubin
--

From: KOSAKI Motohiro
Date: Thursday, August 5, 2010 - 4:56 pm

?

/proc/vmstat already has both.

cat /proc/vmstat |grep nr_dirty
cat /proc/vmstat |grep nr_writeback

Also, /sys/devices/system/node/node0/meminfo shows per-node stats.

Perhaps I'm missing your point.


--

From: Michael Rubin
Date: Thursday, August 5, 2010 - 5:11 pm

On Thu, Aug 5, 2010 at 4:56 PM, KOSAKI Motohiro

These only show the number of dirty pages present in the system at the
point they are queried.
The counters I am trying to add increase over time. They allow
developers to see the rates at which pages are being dirtied and are
entering writeback, which is very helpful.

mrubin
--

From: KOSAKI Motohiro
Date: Thursday, August 5, 2010 - 5:18 pm

Usually administrators sample the data twice and subtract. Isn't that sufficient?


--

From: Andrew Morton
Date: Thursday, August 5, 2010 - 5:27 pm

On Fri,  6 Aug 2010 09:18:59 +0900 (JST)

Nope.  The existing nr_dirty is "number of pages dirtied since boot"
minus "number of pages cleaned since boot".  If you do the
wait-one-second-then-subtract thing on nr_dirty, the result is
dirtying-bandwidth minus cleaning-bandwidth, and can't be used to
determine dirtying-bandwidth.

I can see that a graph of dirtying events versus time could be an
interesting thing.  I don't see how it could be obtained using the
existing instrumentation.  tracepoints, probably..
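
The arithmetic behind this point can be sketched as follows. The
workload numbers are made up for illustration, and observe() is a
hypothetical helper, not kernel code: it only models what a gauge such
as nr_dirty and a cumulative counter such as the proposed pages_dirtied
would each report over one interval.

```python
def observe(dirtied_per_sec, cleaned_per_sec, seconds=1):
    """Return the change each kind of counter shows after `seconds`.

    nr_dirty_delta: change in a gauge of currently dirty pages,
        i.e. pages dirtied minus pages cleaned in the interval.
    pages_dirtied_delta: change in a cumulative dirtied-pages counter.
    """
    dirtied = dirtied_per_sec * seconds
    cleaned = cleaned_per_sec * seconds
    return {
        "nr_dirty_delta": dirtied - cleaned,
        "pages_dirtied_delta": dirtied,
    }


# Two hypothetical systems: one idle, one dirtying and cleaning
# 10000 pages per second in steady state.
idle = observe(0, 0)
busy = observe(10_000, 10_000)

# The gauge cannot tell them apart; the cumulative counter can.
same_gauge = idle["nr_dirty_delta"] == busy["nr_dirty_delta"]              # True
same_counter = idle["pages_dirtied_delta"] == busy["pages_dirtied_delta"]  # False
```

Both systems show an nr_dirty delta of zero, so subtracting two
nr_dirty samples says nothing about dirtying bandwidth on its own.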

--

From: KOSAKI Motohiro
Date: Thursday, August 5, 2010 - 5:44 pm

Technically, yes. I meant that, _now_, typical administrators are using
the subtraction.
Do you mean this is wrong? Or do you mean you have another use case?


I think it depends on how frequent the use case is. If the use case is
common enough, a convenient interface (e.g. /proc/vmstat) is very helpful.

Probably I haven't understood the use case of this feature.



--

From: Michael Rubin
Date: Friday, August 6, 2010 - 12:19 am

Andrew, I was thinking about this today, and I think there is a case
for keeping the proc files.
Christoph was the one who pointed out to me that that is their proper
home, and I think he's right. Most if not all the tunables for writeback
are there. When one is trying to find the state of the system's writeback
activity, that's the directory. Only having these variables in
/proc/vmstat to me feels like a way to make sure that users who would
need them won't find them unless they are reading source. And these
are folks who aren't reading source.

/proc/vmstat _does_ look like a good place to put the thresholds, as it
already has values similar to the thresholds, such as
kswapd_low_wmark_hit_quickly.

mrubin
--
