Patch #1 sets up some helper functions for accounting.

Patch #2 adds writeback files for visibility.

To help developers and applications gain visibility into writeback behaviour, this series adds two read-only sysctl files to /proc/sys/vm. These files allow user apps to understand writeback behaviour over time and learn how it is impacting their performance.

   # cat /proc/sys/vm/pages_dirtied
   3747
   # cat /proc/sys/vm/pages_entered_writeback
   3618

These two new files are necessary to give visibility into writeback behaviour. We have /proc/diskstats, which lets us understand the IO in the block layer. We have blktrace for more in-depth understanding. We have e2fsprogs and debugfs to give insight into the file system's behaviour, but we don't offer our users the ability to understand what writeback is doing. There is no way to know how active it is over the whole system, whether it's falling behind, or to quantify its efforts. With these values exported, users can easily see how much data applications are sending through writeback and at what rate writeback is processing this data. Comparing the rates of change between the two allows developers to see when writeback is not able to keep up with incoming traffic, and the rate of dirty memory being sent to the IO back end. This allows folks to understand their IO workloads and track kernel issues. Non-kernel engineers at Google often use these counters to solve puzzling performance problems.

Michael Rubin (2):
  mm: helper functions for dirty and writeback accounting
  writeback: Adding pages_dirtied and pages_entered_writeback

 Documentation/sysctl/vm.txt | 20 +++++++++++++++---
 drivers/base/node.c | 14 +++++++++++++
 fs/ceph/addr.c | 8 +-----
 fs/nilfs2/segment.c | 2 +-
 include/linux/mm.h | 1 +
 include/linux/mmzone.h | 2 +
 include/linux/writeback.h | 9 ++++++++
 kernel/sysctl.c | 14 +++++++++++++
 mm/page-writeback.c | 45 ...
To help developers and applications gain visibility into writeback behaviour, this patch adds two read-only sysctl files to /proc/sys/vm. These files allow user apps to understand writeback behaviour over time and learn how it is impacting their performance.

   # cat /proc/sys/vm/pages_dirtied
   3747
   # cat /proc/sys/vm/pages_entered_writeback
   3618

Documentation/vm.txt has been updated.

In order to track the "cleaned" and "dirtied" counts we added two vm_stat_items. Per-memory-node stats have also been added, so we can see per-node granularity:

   # cat /sys/devices/system/node/node20/writebackstat
   Node 20 pages_writeback: 0 times
   Node 20 pages_dirtied: 0 times

Signed-off-by: Michael Rubin <mrubin@google.com>
---
 Documentation/sysctl/vm.txt | 20 ++++++++++++++++----
 drivers/base/node.c | 14 ++++++++++++++
 include/linux/mmzone.h | 2 ++
 include/linux/writeback.h | 9 +++++++++
 kernel/sysctl.c | 14 ++++++++++++++
 mm/page-writeback.c | 36 ++++++++++++++++++++++++++++++------
 mm/vmstat.c | 2 ++
 7 files changed, 87 insertions(+), 10 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 5fdbb61..de9ec6a 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -50,6 +50,8 @@ Currently, these files are in /proc/sys/vm:
 - overcommit_memory
 - overcommit_ratio
 - page-cluster
+- pages_dirtied
+- pages_entered_writeback
 - panic_on_oom
 - percpu_pagelist_fraction
 - stat_interval
@@ -425,10 +427,7 @@ See Documentation/vm/hugetlbpage.txt
 nr_pdflush_threads
 
 The current number of pdflush threads. This value is read-only.
-The value changes according to the number of dirty pages in the system.
-
-When necessary, additional pdflush threads are created, one per second, up to
-nr_pdflush_threads_max.
+This value is obsolete.
 
 ==============================================================
 
@@ -582,6 +581,19 @@ ...
I did not know they would show up in /proc/vmstat. I thought it made sense to put them in /proc/sys/vm since the other writeback controls are there, but I have no problem just adding them to /proc/vmstat if that makes more sense. mrubin --
? /proc/vmstat already has both.

   cat /proc/vmstat | grep nr_dirty
   cat /proc/vmstat | grep nr_writeback

Also, /sys/devices/system/node/node0/meminfo shows per-node stats. Perhaps I'm missing your point. --
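For reference, a minimal sketch of reading the two existing gauges mentioned above from user space. It assumes /proc/vmstat keeps its usual "name value" line format; the function names `parse_vmstat` and `dirty_and_writeback` are made up for illustration.

```python
def parse_vmstat(text):
    """Parse 'name value' lines, as found in /proc/vmstat, into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly "<name> <non-negative integer>".
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def dirty_and_writeback(path="/proc/vmstat"):
    """Return the instantaneous (nr_dirty, nr_writeback) values."""
    with open(path) as f:
        stats = parse_vmstat(f.read())
    return stats["nr_dirty"], stats["nr_writeback"]
```

Both values describe the number of pages in that state at the moment of the read, not a running total.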
On Thu, Aug 5, 2010 at 4:56 PM, KOSAKI Motohiro wrote:

These only show the number of dirty pages present in the system at the point they are queried. The counters I am trying to add increase over time. They allow developers to see the rates at which pages are dirtied and enter writeback, which is very helpful. mrubin --
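A minimal sketch of how the two cumulative counters could be turned into rates from user space. It assumes the proposed /proc/sys/vm/pages_dirtied and /proc/sys/vm/pages_entered_writeback files exist and each hold a single integer; the helper names (`read_counter`, `rates`, `sample`) are illustrative only and not part of the patch.

```python
import time

# Paths proposed in this series. They are assumptions here and will not
# exist on a kernel without the patch applied.
DIRTIED = "/proc/sys/vm/pages_dirtied"
ENTERED_WB = "/proc/sys/vm/pages_entered_writeback"


def read_counter(path):
    """Read a file holding one monotonically increasing integer."""
    with open(path) as f:
        return int(f.read().strip())


def rates(before, after, seconds):
    """Per-second rates from two (dirtied, entered_writeback) samples."""
    dirtied_rate = (after[0] - before[0]) / seconds
    writeback_rate = (after[1] - before[1]) / seconds
    return dirtied_rate, writeback_rate


def sample(interval=1.0):
    """Take two samples `interval` seconds apart and return the rates."""
    before = (read_counter(DIRTIED), read_counter(ENTERED_WB))
    time.sleep(interval)
    after = (read_counter(DIRTIED), read_counter(ENTERED_WB))
    return rates(before, after, interval)
```

If the first rate stays above the second over many intervals, pages are being dirtied faster than they enter writeback.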
Usually administrators sample the data twice and subtract the two values. Isn't that sufficient? --
On Fri, 6 Aug 2010 09:18:59 +0900 (JST)

Nope. The existing nr_dirty is "number of pages dirtied since boot" minus "number of pages cleaned since boot". If you do the wait-one-second-then-subtract thing on nr_dirty, the result is dirtying-bandwidth minus cleaning-bandwidth, and can't be used to determine dirtying-bandwidth. I can see that a graph of dirtying events versus time could be an interesting thing. I don't see how it could be obtained using the existing instrumentation. Tracepoints, probably. --
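The point above can be checked with a few lines of arithmetic. This is only an illustration: the numbers are made up, and `nr_dirty` here is a plain function modelling the definition given above, not a kernel interface.

```python
def nr_dirty(dirtied_since_boot, cleaned_since_boot):
    """The gauge as described above: pages dirtied minus pages cleaned."""
    return dirtied_since_boot - cleaned_since_boot


def nr_dirty_delta(dirtied, cleaned, dirty_rate, clean_rate, seconds=1):
    """Change in nr_dirty after `seconds` at the given rates (pages/s)."""
    before = nr_dirty(dirtied, cleaned)
    after = nr_dirty(dirtied + dirty_rate * seconds,
                     cleaned + clean_rate * seconds)
    return after - before


# Hypothetical workloads: one dirties 100 pages/s, the other 10000 pages/s.
light = nr_dirty_delta(1000, 900, dirty_rate=100, clean_rate=50)
heavy = nr_dirty_delta(1000, 900, dirty_rate=10000, clean_rate=9950)
print(light, heavy)  # prints "50 50"
```

Both workloads produce the same delta in nr_dirty even though their dirtying rates differ by a factor of 100, so the subtraction alone cannot recover the dirtying bandwidth.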
Technically, yes. I meant that, _now_, typical administrators use the subtraction. Do you mean this is wrong, or do you mean you have another use case? I think it depends on how frequent the use case is. If the use case is major enough, a convenient interface (e.g. /proc/vmstat) is very helpful. I probably haven't understood the use case of this feature. --
Andrew, I was thinking about this today, and I think there is a case for keeping the proc files. Christoph was the one who pointed out to me that this is their proper home, and I think he's right. Most if not all of the tunables for writeback are there. When one is trying to find the state of the system's writeback activity, that's the directory to look in. Having these variables only in /proc/vmstat feels to me like a way to make sure that users who need them won't find them unless they are reading source. And these are folks who aren't reading source. /proc/vmstat _does_ look like a good place to put the thresholds, as it already has similar values, such as kswapd_low_wmark_hit_quickly. mrubin --
