"We don't want to introduce pointless delays in throttle_vm_writeout() when the writeback limits are not yet exceeded, do we?" asked Fengguang Wu as the description of his patch to mm/page-writeback.c [1]. Andrew Morton replied, "this is a pretty major bugfix, explaining, "this patch has the potential to significantly alter the dynamics of the VM behaviour under particular workloads. It might turn up other stuff..." He continued:
"I wonder why nobody noticed this happening. Either a) it turns out that kswapd is doing a good job and such callers don't do direct reclaim much or b) nobody is doing any in-depth kernel instrumentation.
"Now, how _would_ one notice this problem? We don't have very good tools, really. Booting with "profile=sleep" and looking at the profile data would be one way. Repeatedly doing sysrq-T is another. Perhaps the new lockstat-via-lockdep code would allow this to be observed in some fashion, dunno."
From: Fengguang Wu <wfg@...>
Subject: [PATCH] writeback: remove unnecessary wait in throttle_vm_writeout()
[1]Date: Sep 26, 9:50 pm 2007
We don't want to introduce pointless delays in throttle_vm_writeout()
when the writeback limits are not yet exceeded, do we?
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
mm/page-writeback.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
--- linux-2.6.23-rc8-mm1.orig/mm/page-writeback.c
+++ linux-2.6.23-rc8-mm1/mm/page-writeback.c
@@ -507,16 +507,6 @@ void throttle_vm_writeout(gfp_t gfp_mask
long background_thresh;
long dirty_thresh;
- if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {
- /*
- * The caller might hold locks which can prevent IO completion
- * or progress in the filesystem. So we cannot just sit here
- * waiting for IO to complete.
- */
- congestion_wait(WRITE, HZ/10);
- return;
- }
-
for ( ; ; ) {
get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);
@@ -530,6 +520,14 @@ void throttle_vm_writeout(gfp_t gfp_mask
global_page_state(NR_WRITEBACK) <= dirty_thresh)
break;
congestion_wait(WRITE, HZ/10);
+
+ /*
+ * The caller might hold locks which can prevent IO completion
+ * or progress in the filesystem. So we cannot just sit here
+ * waiting for IO to complete.
+ */
+ if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO))
+ break;
}
}
-
From: Andrew Morton <akpm@...>
Subject: Re: [PATCH] writeback: remove unnecessary wait in throttle_vm_writeout()
[1]Date: Sep 27, 4:47 pm 2007
On Thu, 27 Sep 2007 09:50:16 +0800
Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:
> We don't want to introduce pointless delays in throttle_vm_writeout()
> when the writeback limits are not yet exceeded, do we?
>
> Cc: Nick Piggin <nickpiggin@yahoo.com.au>
> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> Cc: Kumar Gala <galak@kernel.crashing.org>
> Cc: Pete Zaitcev <zaitcev@redhat.com>
> Cc: Greg KH <greg@kroah.com>
> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
> ---
> mm/page-writeback.c | 18 ++++++++----------
> 1 file changed, 8 insertions(+), 10 deletions(-)
>
> --- linux-2.6.23-rc8-mm1.orig/mm/page-writeback.c
> +++ linux-2.6.23-rc8-mm1/mm/page-writeback.c
> @@ -507,16 +507,6 @@ void throttle_vm_writeout(gfp_t gfp_mask
> long background_thresh;
> long dirty_thresh;
>
> - if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {
> - /*
> - * The caller might hold locks which can prevent IO completion
> - * or progress in the filesystem. So we cannot just sit here
> - * waiting for IO to complete.
> - */
> - congestion_wait(WRITE, HZ/10);
> - return;
> - }
> -
> for ( ; ; ) {
> get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);
>
> @@ -530,6 +520,14 @@ void throttle_vm_writeout(gfp_t gfp_mask
> global_page_state(NR_WRITEBACK) <= dirty_thresh)
> break;
> congestion_wait(WRITE, HZ/10);
> +
> + /*
> + * The caller might hold locks which can prevent IO completion
> + * or progress in the filesystem. So we cannot just sit here
> + * waiting for IO to complete.
> + */
> + if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO))
> + break;
> }
> }
>
This is a pretty major bugfix.
GFP_NOIO and GFP_NOFS callers should have been spending really large
amounts of time stuck in that sleep.
I wonder why nobody noticed this happening. Either a) it turns out that
kswapd is doing a good job and such callers don't do direct reclaim much or
b) nobody is doing any in-depth kernel instrumentation.
Now, how _would_ one notice this problem? We don't have very good tools,
really. Booting with "profile=sleep" and looking at the profile data would
be one way. Repeatedly doing sysrq-T is another. Perhaps the new
lockstat-via-lockdep code would allow this to be observed in some fashion,
dunno.
Anyway, this patch has the potential to significantly alter the dynamics of
the VM behaviour under particular workloads. It might turn up other
stuff...
-
Related links:
- Archive of above thread [1]