Re: BUG: warning at mm/truncate.c:60/cancel_dirty_page()

To: linux-kernel Mailing List <linux-kernel@...>
Date: Friday, January 5, 2007 - 10:39 pm

Linux 2.6.19.1 SMP [2] on Pentium D...
I was running dt-15.14 [2] and I ran
"cinfo datafile" (it does mincore()).
Well it went OK but when I ran "strace cinfo datafile"...:
04:18:48.062466 mincore(0x37f1f000, 2147266560,
...
2007-01-06 04:19:03.788181500 <4>BUG: warning at mm/truncate.c:60/cancel_dirty_page()
2007-01-06 04:19:03.788221500 <4> [<c0103cfb>] dump_trace+0x215/0x21a
2007-01-06 04:19:03.788223500 <4> [<c0103da3>] show_trace_log_lvl+0x1a/0x30
2007-01-06 04:19:03.788224500 <4> [<c0103dcb>] show_trace+0x12/0x14
2007-01-06 04:19:03.788225500 <4> [<c0103ec8>] dump_stack+0x19/0x1b
2007-01-06 04:19:03.788227500 <4> [<c01546a6>] cancel_dirty_page+0x7e/0x80
2007-01-06 04:19:03.788228500 <4> [<c01546c2>] truncate_complete_page+0x1a/0x47
2007-01-06 04:19:03.788229500 <4> [<c0154854>] truncate_inode_pages_range+0x114/0x2ae
2007-01-06 04:19:03.788245500 <4> [<c0154a08>] truncate_inode_pages+0x1a/0x1c
2007-01-06 04:19:03.788247500 <4> [<c0269244>] fs_flushinval_pages+0x40/0x77
2007-01-06 04:19:03.788248500 <4> [<c026d48c>] xfs_write+0x8c4/0xb68
2007-01-06 04:19:03.788250500 <4> [<c0268b14>] xfs_file_aio_write+0x7e/0x95
2007-01-06 04:19:03.788251500 <4> [<c016d66c>] do_sync_write+0xca/0x119
2007-01-06 04:19:03.788265500 <4> [<c016d842>] vfs_write+0x187/0x18c
2007-01-06 04:19:03.788267500 <4> [<c016d8e8>] sys_write+0x3d/0x64
2007-01-06 04:19:03.788268500 <4> [<c0102e73>] syscall_call+0x7/0xb
2007-01-06 04:19:03.788269500 <4> [<001cf410>] 0x1cf410
2007-01-06 04:19:03.788289500 <4> =======================

Funny that when stracing, mincore() does not return?

$ time cinfo dtfile-2091
dtfile-2091: 524285 pages, 0 pages cached (0.00%)

real 0m0.049s
user 0m0.003s
sys 0m0.046s
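
For context, a tool like cinfo can get such a count by mapping the file and asking mincore() which pages are resident. The following is a minimal sketch under that assumption, not cinfo's actual source; count_cached_pages is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Count how many pages of `path` are resident in the page cache.
 * Stores the file's total page count in *total and returns the number
 * of resident pages, or -1 on error.
 * Hypothetical helper for illustration only.
 */
long count_cached_pages(const char *path, long *total)
{
    assert(path != NULL && total != NULL);

    long resident = -1;
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* nothing to map */
        close(fd);
        *total = 0;
        return 0;
    }

    size_t len = (size_t)st.st_size;
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

    close(fd);                      /* the mapping keeps the file alive */
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1; /* bit 0 set: page is resident */
        *total = (long)npages;
    }
    free(vec);
    munmap(map, len);
    return resident;
}
```

Note that on a 32-bit machine a file of roughly 2 GB, like the one in the strace output above, needs a single mapping of about that size, so the mmap() call itself may fail there.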

safari 6941 29.9 10.8 2098768 108788 pts/2 D+ 04:20 3:41 strace -vfttT cinfo dtfile-2091

stra...

To: Sami Farin <7atbggg02@...>
Cc: Nathan Scott <nathans@...>, David Chinner <dgc@...>, Nick Piggin <nickpiggin@...>, <linux-kernel@...>
Date: Saturday, January 6, 2007 - 5:11 pm

You rightly noted in a followup that there have been changes to
mincore, but I doubt they have any bearing on this: I think the

So... XFS uses truncate_inode_pages when serving the write system call.
That's very inventive, and now it looks like Linus' cancel_dirty_page
and new warning have caught it out. VM people expect it to be called
either when freeing an inode no longer in use, or when doing a truncate,

Gosh. Might be better to reproduce it on 2.6.20-rc3;
but I think we have to hand this over to some XFS people.

-

To: Hugh Dickins <hugh@...>
Cc: Sami Farin <7atbggg02@...>, Nathan Scott <nathans@...>, <xfs@...>, Nick Piggin <nickpiggin@...>, <linux-kernel@...>
Date: Sunday, January 7, 2007 - 6:23 pm

Only when you are doing direct I/O. XFS does direct writes without
the i_mutex held, so it has to invalidate the range of cached pages
while holding its own locks to ensure direct I/O cache semantics are

Ok, so we are punching a hole in the middle of the address space
because we are doing direct I/O on it and need to invalidate the
cache.

How are you supposed to invalidate a range of pages in a mapping for
this case, then? invalidate_mapping_pages() would appear to be the
candidate (the generic code uses this), but it _skips_ pages that
are already mapped. invalidate_mapping_pages() then advises you to
use truncate_inode_pages():

/**
 * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
 * @mapping: the address_space which holds the pages to invalidate
 * @start: the offset 'from' which to invalidate
 * @end: the offset 'to' which to invalidate (inclusive)
 *
 * This function only removes the unlocked pages, if you want to
 * remove all the pages of one inode, you must call truncate_inode_pages.
 *
 * invalidate_mapping_pages() will not block on IO activity. It will not
 * invalidate pages which are dirty, locked, under writeback or mapped into
 * pagetables.
 */

We want to remove all pages within the range given, so, as directed by
the comment here, we use truncate_inode_pages(). Says nothing about
mappings needing to be removed first so I guess that's where we've
been caught.....

I think we can use invalidate_inode_pages2_range(), but that doesn't
handle partial page invalidations. I think this will be ok, but it's
going to need some serious fsx testing on blocksize != page size
configs.
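
The partial-page concern can be made concrete with a little arithmetic: a byte range that is not page-aligned touches its first or last page only in part, while a page-granular invalidation always acts on whole pages. A minimal user-space sketch, assuming 4 KiB pages; struct page_span, bytes_to_pages and the SKETCH_* constants are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12    /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

struct page_span {
    uint64_t first_index;   /* first page touched by the byte range */
    uint64_t last_index;    /* last page touched (inclusive) */
    bool head_partial;      /* range starts part-way into the first page */
    bool tail_partial;      /* range ends before the end of the last page */
};

/* Map an inclusive byte range [first, last] to the pages it touches. */
struct page_span bytes_to_pages(uint64_t first, uint64_t last)
{
    assert(first <= last);

    struct page_span s;
    s.first_index = first >> SKETCH_PAGE_SHIFT;
    s.last_index = last >> SKETCH_PAGE_SHIFT;
    s.head_partial = (first & (SKETCH_PAGE_SIZE - 1)) != 0;
    s.tail_partial = ((last + 1) & (SKETCH_PAGE_SIZE - 1)) != 0;
    return s;
}
```

For example, bytes_to_pages(4096, 4607) describes one 512-byte block at the start of page 1: first_index and last_index are both 1 and tail_partial is set, so dropping that page also discards the cached remainder of the page that the write did not cover.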

So, am I correct in assuming we should be calling invalidate_inode_pages2_range()
instead of truncate_inode_pages()?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
-

To: David Chinner <dgc@...>
Cc: Hugh Dickins <hugh@...>, Sami Farin <7atbggg02@...>, Nathan Scott <nathans@...>, <xfs@...>, Nick Piggin <nickpiggin@...>, <linux-kernel@...>
Date: Sunday, January 7, 2007 - 6:48 pm

On Mon, 8 Jan 2007 09:23:41 +1100

That would be conventional.
-

To: Andrew Morton <akpm@...>
Cc: David Chinner <dgc@...>, Hugh Dickins <hugh@...>, Sami Farin <7atbggg02@...>, <xfs@...>, Nick Piggin <nickpiggin@...>, <linux-kernel@...>
Date: Sunday, January 7, 2007 - 7:04 pm

/me looks at how it's used in invalidate_inode_pages2_range() and

.... in that case the following patch should fix the warning:

---
fs/xfs/linux-2.6/xfs_fs_subr.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_fs_subr.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_fs_subr.c 2006-12-12 12:05:17.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_fs_subr.c 2007-01-08 09:30:22.056571711 +1100
@@ -21,6 +21,8 @@ int fs_noerr(void) { return 0; }
 int fs_nosys(void) { return ENOSYS; }
 void fs_noval(void) { return; }

+#define XFS_OFF_TO_PCSIZE(off) \
+        (((off) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT)
 void
 fs_tosspages(
         bhv_desc_t *bdp,
@@ -32,7 +34,9 @@ fs_tosspages(
         struct inode *ip = vn_to_inode(vp);

         if (VN_CACHED(vp))
-                truncate_inode_pages(ip->i_mapping, first);
+                invalidate_inode_pages2_range(ip->i_mapping,
+                                        XFS_OFF_TO_PCSIZE(first),
+                                        XFS_OFF_TO_PCSIZE(last));
 }

 void
@@ -49,7 +53,9 @@ fs_flushinval_pages(
                 if (VN_TRUNC(vp))
                         VUNTRUNCATE(vp);
                 filemap_write_and_wait(ip->i_mapping);
-                truncate_inode_pages(ip->i_mapping, first);
+                invalidate_inode_pages2_range(ip->i_mapping,
+                                        XFS_OFF_TO_PCSIZE(first),
+                                        XFS_OFF_TO_PCSIZE(last));
         }
 }

--
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
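
To see what the XFS_OFF_TO_PCSIZE() macro in the patch above computes, it can be evaluated in user space. A minimal sketch, assuming 4 KiB pages (a PAGE_CACHE_SHIFT of 12 is an assumption here; the real value depends on the architecture); off_to_pcsize is a hypothetical wrapper added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the kernel defines these per architecture. */
#define PAGE_CACHE_SHIFT 12
#define PAGE_CACHE_SIZE  (UINT64_C(1) << PAGE_CACHE_SHIFT)

/* Macro body copied from the patch. */
#define XFS_OFF_TO_PCSIZE(off) \
    (((off) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT)

/* Hypothetical wrapper: round a byte offset up to a number of pages. */
uint64_t off_to_pcsize(uint64_t off)
{
    /* Guard against wrap-around in the rounding addition. */
    assert(off <= UINT64_MAX - (PAGE_CACHE_SIZE - 1));
    return XFS_OFF_TO_PCSIZE(off);
}
```

With these definitions, byte offsets 0, 1, 4096 and 4097 map to 0, 1, 1 and 2 respectively: the macro rounds an offset up to a whole number of pages, so offset 1, which lies inside page 0, yields 1.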

To: <xfs@...>, <linux-kernel@...>
Cc: Andrew Morton <akpm@...>, David Chinner <dgc@...>, Hugh Dickins <hugh@...>, Nick Piggin <nickpiggin@...>
Date: Monday, January 8, 2007 - 7:58 am

I tried dt+strace+cinfo with this patch applied and got no warnings.
Thanks for the quick fix.

--
Do what you love because life is too short for anything else.

-

To: <linux-kernel@...>
Cc: Hugh Dickins <hugh@...>, Nathan Scott <nathans@...>, David Chinner <dgc@...>, Nick Piggin <nickpiggin@...>
Date: Saturday, January 6, 2007 - 9:24 pm

The BUG only happens when I run "strace cinfo dtfile" against the O_DIRECT
test file. It's easy to reproduce.
Without strace, the BUG does not happen.

Now I got it again, with also the mincore patch applied:

01:48:42.831060 mincore(0x37ff1000, 2147254272, ^[

2007-01-07 01:48:51.480531500 <4>BUG: warning at mm/truncate.c:60/cancel_dirty_page()
2007-01-07 01:48:51.480532500 <4> [<c0103cff>] dump_trace+0x215/0x21a
2007-01-07 01:48:51.480557500 <4> [<c0103da7>] show_trace_log_lvl+0x1a/0x30
2007-01-07 01:48:51.480559500 <4> [<c0103dcf>] show_trace+0x12/0x14
2007-01-07 01:48:51.480560500 <4> [<c0103ecc>] dump_stack+0x19/0x1b
2007-01-07 01:48:51.480561500 <4> [<c0155616>] cancel_dirty_page+0x7e/0x80
2007-01-07 01:48:51.480562500 <4> [<c0155632>] truncate_complete_page+0x1a/0x47
2007-01-07 01:48:51.480563500 <4> [<c01557c4>] truncate_inode_pages_range+0x114/0x2ae
2007-01-07 01:48:51.480564500 <4> [<c0155978>] truncate_inode_pages+0x1a/0x1c
2007-01-07 01:48:51.480574500 <4> [<c026a558>] fs_flushinval_pages+0x40/0x77
2007-01-07 01:48:51.480575500 <4> [<c026e7a8>] xfs_write+0x8c4/0xb68
2007-01-07 01:48:51.480576500 <4> [<c0269e28>] xfs_file_aio_write+0x7e/0x95
2007-01-07 01:48:51.480577500 <4> [<c016e5d0>] do_sync_write+0xca/0x119
2007-01-07 01:48:51.480578500 <4> [<c016e7a6>] vfs_write+0x187/0x18c
2007-01-07 01:48:51.480579500 <4> [<c016e84c>] sys_write+0x3d/0x64
2007-01-07 01:48:51.480589500 <4> [<c0102e77>] syscall_call+0x7/0xb
2007-01-07 01:48:51.480590500 <4> [<00bed410>] 0xbed410

$ /sbin/sysctl -n vm.dirty_expire_centisecs
999

cancel_dirty_page() would be more useful if it executed WARN_ON
at most once per 10 s or so, instead of five times out of 2^32 or 2^64
errors... I mean, a user might think the program/kernel started to work OK when
the BUGs stop, if he/she doesn't check the cancel_dirty_page() funct...

To: linux-kernel Mailing List <linux-kernel@...>
Date: Friday, January 5, 2007 - 11:30 pm

Forgot to do "git-whatchanged mm/mincore.c"...
Looks like git and 2.6.19.2 review patch include a fix for mincore.
Maybe it fixes this issue.

--
-
