[PATCH 0/3 v2] per-task-delay-accounting: add memory reclaim delay

Previous thread: [PATCH -mm 14/14] bootmem: replace node_boot_start in struct bootmem_data by Johannes Weiner on Thursday, June 5, 2008 - 3:49 pm. (1 message)

Next thread: [PATCH] fat: Relax the permission check of fat_setattr() by OGAWA Hirofumi on Thursday, June 5, 2008 - 4:32 pm. (1 message)
From: Keika Kobayashi
Date: Thursday, June 5, 2008 - 4:27 pm

Hi.

This is v2 of accounting memory reclaim patch series.
Thanks to Kosaki-san, Kamezawa-san, Andrew for comments and advice!
These patches were fixed about the following.

Against: next-20080605

1) Change Log

 o Add accounting memory reclaim delay from memcgroup.
   For accounting both global and cgroup memory reclaim,
   accounting point was moved from try_to_free_pages() to do_try_to_free_pages.

 o Drop the patch regarding /proc export for memory reclaim delay.
   Because it seems that two separate ways to report are not necessary,
   this patch series supports only NETLINK and doesn't add a field to /proc/<pid>/stat.


2) Confirm the fix regarding memcgroup.

  o Previous patch can't catch memory reclaim delay from memcgroup.

    $ echo 10M > /cgroups/0/memory.limit_in_bytes

    $ ls -s test.dat
    500496 test.dat

    $ time tar cvf test.tar test.dat
    real    0m21.957s
    user    0m0.032s
    sys     0m2.348s

    $ ./delayget -d -p <pid>
    CPU             count     real total  virtual total    delay total
                     2441     2288143000     2438256954       22371958
    IO              count    delay total
                     2444    18745251314
    SWAP            count    delay total
                        0              0
    RECLAIM         count    delay total
                        0              0

  o Current patch can catch memory reclaim delay from memcgroup.

    $ echo 10M > /cgroups/0/memory.limit_in_bytes

    $ ls -s test.dat
    500496 test.dat

    $ time tar cvf test.tar test.dat
    real    0m22.563s
    user    0m0.028s
    sys     0m2.440s

    $ ./delayget -d -p <pid>
    CPU             count     real total  virtual total    delay total
                     2640     2456153500     2478353004       28366219
    IO              count    delay total
                     2628    19894214188
    SWAP            count    delay total
                        0              0
    RECLAIM         count    ...
From: Keika Kobayashi
Date: Thursday, June 5, 2008 - 4:31 pm

Sometimes, application responses become bad under heavy memory load.
Applications take a bit time to reclaim memory.
The statistics, how long memory reclaim takes, will be useful to
measure memory usage.

This patch adds accounting memory reclaim to per-task-delay-accounting
for accounting the time of do_try_to_free_pages().

<i.e>

- When System is under low memory load,
  memory reclaim may not occur.

$ free
             total       used       free     shared    buffers     cached
Mem:       8197800    1577300    6620500          0       4808    1516724
-/+ buffers/cache:      55768    8142032
Swap:     16386292          0   16386292

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 5069748  10612 3014060    0    0     0     0    3   26  0  0 100  0
 0  0      0 5069748  10612 3014060    0    0     0     0    4   22  0  0 100  0
 0  0      0 5069748  10612 3014060    0    0     0     0    3   18  0  0 100  0

Measure the time of tar command.

$ ls -s test.dat
1501472 test.dat

$ time tar cvf test.tar test.dat
real    0m13.388s
user    0m0.116s
sys     0m5.304s

$ ./delayget -d -p <pid>
CPU             count     real total  virtual total    delay total
                  428     5528345500     5477116080       62749891
IO              count    delay total
                  338     8078977189
SWAP            count    delay total
                    0              0
RECLAIM         count    delay total
                    0              0

- When system is under heavy memory load
  memory reclaim may occur.

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0 7159032  49724   1812   3012    0    0     0     0    3   24  0  0 100  0
 0  0 7159032  49724   1812   3012    0    0     0     0    4   24  0  0 100  ...
From: Balbir Singh
Date: Thursday, June 5, 2008 - 8:50 pm

Looks good to me

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
--

From: KOSAKI Motohiro
Date: Friday, June 6, 2008 - 9:40 pm

looks good to me.

	Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujistu.com>



--

From: Keika Kobayashi
Date: Thursday, June 5, 2008 - 4:32 pm

Add members for memory reclaim delay to taskstats,
and accumulate them in __delayacct_add_tsk() .

Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp>
---
 include/linux/taskstats.h |    4 ++++
 kernel/delayacct.c        |    3 +++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/linux/taskstats.h b/include/linux/taskstats.h
index 5d69c07..87aae21 100644
--- a/include/linux/taskstats.h
+++ b/include/linux/taskstats.h
@@ -81,6 +81,10 @@ struct taskstats {
 	__u64	swapin_count;
 	__u64	swapin_delay_total;
 
+	/* Delay waiting for memory reclaim */
+	__u64	freepages_count;
+	__u64	freepages_delay_total;
+
 	/* cpu "wall-clock" running time
 	 * On some architectures, value will adjust for cpu time stolen
 	 * from the kernel in involuntary waits due to virtualization.
diff --git a/kernel/delayacct.c b/kernel/delayacct.c
index 84b6782..b3179da 100644
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -145,8 +145,11 @@ int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
 	d->blkio_delay_total = (tmp < d->blkio_delay_total) ? 0 : tmp;
 	tmp = d->swapin_delay_total + tsk->delays->swapin_delay;
 	d->swapin_delay_total = (tmp < d->swapin_delay_total) ? 0 : tmp;
+	tmp = d->freepages_delay_total + tsk->delays->freepages_delay;
+	d->freepages_delay_total = (tmp < d->freepages_delay_total) ? 0 : tmp;
 	d->blkio_count += tsk->delays->blkio_count;
 	d->swapin_count += tsk->delays->swapin_count;
+	d->freepages_count += tsk->delays->freepages_count;
 	spin_unlock_irqrestore(&tsk->delays->lock, flags);
 
 done:
-- 
1.5.0.6

--

From: Balbir Singh
Date: Thursday, June 5, 2008 - 9:13 pm

Two suggested changes

1. Please add all fields at the end (otherwise we risk breaking compatibility)
2. please also update Documentation/accounting/taskstats-struct.txt

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
--

From: Hiroshi Shimamoto
Date: Friday, June 6, 2008 - 2:58 pm

And update TASKSTATS_VERSION, right?

Thanks,
Hiroshi Shimamoto
--

From: Keika Kobayashi
Date: Friday, June 6, 2008 - 5:24 pm

Add members for memory reclaim delay to taskstats,
and accumulate them in __delayacct_add_tsk() .

Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp>
---


Thanks for your suggestion.
Update version is here.

In taskstats-struct.txt, there was not a discription for "Time accounting for SMT machines".
This patch was made after the following patch had been applied.
http://lkml.org/lkml/2008/6/6/436 .

 Documentation/accounting/taskstats-struct.txt |    7 +++++++
 include/linux/taskstats.h                     |    6 +++++-
 kernel/delayacct.c                            |    3 +++
 3 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/Documentation/accounting/taskstats-struct.txt b/Documentation/accounting/taskstats-struct.txt
index cd784f4..b988d11 100644
--- a/Documentation/accounting/taskstats-struct.txt
+++ b/Documentation/accounting/taskstats-struct.txt
@@ -26,6 +26,8 @@ There are three different groups of fields in the struct taskstats:
 
 5) Time accounting for SMT machines
 
+6) Extended delay accounting fields for memory reclaim
+
 Future extension should add fields to the end of the taskstats struct, and
 should not change the relative position of each field within the struct.
 
@@ -170,4 +172,9 @@ struct taskstats {
 	__u64	ac_utimescaled;		/* utime scaled on frequency etc */
 	__u64	ac_stimescaled;		/* stime scaled on frequency etc */
 	__u64	cpu_scaled_run_real_total; /* scaled cpu_run_real_total */
+
+6) Extended delay accounting fields for memory reclaim
+	/* Delay waiting for memory reclaim */
+	__u64	freepages_count;
+	__u64	freepages_delay_total;
 }
diff --git a/include/linux/taskstats.h b/include/linux/taskstats.h
index 5d69c07..18269e9 100644
--- a/include/linux/taskstats.h
+++ b/include/linux/taskstats.h
@@ -31,7 +31,7 @@
  */
 
 
-#define TASKSTATS_VERSION	6
+#define TASKSTATS_VERSION	7
 #define TS_COMM_LEN		32	/* should be >= TASK_COMM_LEN
 					 * in linux/sched.h */
 
@@ -157,6 +157,10 @@ struct taskstats {
 ...
From: KOSAKI Motohiro
Date: Friday, June 6, 2008 - 9:44 pm

The code of this patch looks good to me.
but this change log isn't so good.

Do you want remain email bare discssion on git log?


--

From: Keika Kobayashi
Date: Monday, June 9, 2008 - 9:27 am

On Sat, 07 Jun 2008 13:44:07 +0900

Thanks for your comment.

But I confirmed this patch did not remain
"email bare discssion" on git log.
--

From: Keika Kobayashi
Date: Thursday, June 5, 2008 - 4:33 pm

Update document and make getdelays.c show delay accounting for memory reclaim.

For making a distinction between "swapping in pages" and "memory reclaim"
in getdelays.c, MEM is changed to SWAP.

Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp>
---
 Documentation/accounting/delay-accounting.txt |   11 ++++++++---
 Documentation/accounting/getdelays.c          |    8 ++++++--
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/Documentation/accounting/delay-accounting.txt b/Documentation/accounting/delay-accounting.txt
index 1443cd7..8a12f07 100644
--- a/Documentation/accounting/delay-accounting.txt
+++ b/Documentation/accounting/delay-accounting.txt
@@ -11,6 +11,7 @@ the delays experienced by a task while
 a) waiting for a CPU (while being runnable)
 b) completion of synchronous block I/O initiated by the task
 c) swapping in pages
+d) memory reclaim
 
 and makes these statistics available to userspace through
 the taskstats interface.
@@ -41,7 +42,7 @@ this structure. See
      include/linux/taskstats.h
 for a description of the fields pertaining to delay accounting.
 It will generally be in the form of counters returning the cumulative
-delay seen for cpu, sync block I/O, swapin etc.
+delay seen for cpu, sync block I/O, swapin, memory reclaim etc.
 
 Taking the difference of two successive readings of a given
 counter (say cpu_delay_total) for a task will give the delay
@@ -94,7 +95,9 @@ CPU	count	real total	virtual total	delay total
 	7876	92005750	100000000	24001500
 IO	count	delay total
 	0	0
-MEM	count	delay total
+SWAP	count	delay total
+	0	0
+RECLAIM	count	delay total
 	0	0
 
 Get delays seen in executing a given simple command
@@ -108,5 +111,7 @@ CPU	count	real total	virtual total	delay total
 	6	4000250		4000000		0
 IO	count	delay total
 	0	0
-MEM	count	delay total
+SWAP	count	delay total
+	0	0
+RECLAIM	count	delay total
 	0	0
diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c
index ...
From: Balbir Singh
Date: Thursday, June 5, 2008 - 9:14 pm

Looks good

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
--

From: KOSAKI Motohiro
Date: Friday, June 6, 2008 - 9:46 pm

looks good to me.

but I don't know s/MEM/SWAP/ renaming is good or bad.
You should hear delayacct developer opinion. IMHO.





--

From: Balbir Singh
Date: Thursday, June 5, 2008 - 7:18 pm

Looks interesting, this data is for the whole system or memcgroup? If it is for
memcgroup, we should be using cgroupstats.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
--

From: KAMEZAWA Hiroyuki
Date: Thursday, June 5, 2008 - 8:14 pm

On Fri, 06 Jun 2008 07:48:25 +0530
This patch itself is for per-task-stat. And it seems cgroup_stat doesn't handle
delay accounting now. It just counts # of processes in some status.

Thanks,
-Kame





--

From: Keika Kobayashi
Date: Thursday, June 5, 2008 - 8:21 pm

On Fri, 06 Jun 2008 07:48:25 +0530

Thanks for your comment.
This accounting, which is named "RECLAIM", is global and memcgroup reclaim delay
and this data is value per task.

Unfortunately, I'm not sure what the whole system means.
Could you tell me your point?
--

From: Balbir Singh
Date: Thursday, June 5, 2008 - 8:41 pm

By whole system, I meant global reclaim.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
--

From: KAMEZAWA Hiroyuki
Date: Thursday, June 5, 2008 - 8:01 pm

On Thu, 5 Jun 2008 16:27:59 -0700

Thanks, I like this set.


--

Previous thread: [PATCH -mm 14/14] bootmem: replace node_boot_start in struct bootmem_data by Johannes Weiner on Thursday, June 5, 2008 - 3:49 pm. (1 message)

Next thread: [PATCH] fat: Relax the permission check of fat_setattr() by OGAWA Hirofumi on Thursday, June 5, 2008 - 4:32 pm. (1 message)