Hi, I found a bug in current -git: On my system on of cpus stays 100% in iowait mode (I have core 2 duo) Otherwise the system works OK, no disk activity and/or slowdown. Suspecting that this is a swap-related problem I tried to turn swap of, but it doesn't affect anything. It is probably some accounting bug. If I start with init=/bin/bash, then this disappears. I tried then to start usual /etc/init.d scripts then, and first one to show this bug was gpm. but then I rebooted the system to X without gpm, and I still see 100% iowait. No additional messages in dmesg. I tried to bisect this, but eventually I run into other bugs that cause system to oops early. This is very rough estimate of the bug location: HEAD ...... c8f30ae54714abf494d79826d90b5e4844fbf355 - has the above bug, but otherwise works properly ..... 5c8e191e8437616a498a8e1cc0af3dd0d32bbff2 - fails early ..... f4a1c2bce002f683801bcdbbc9fd89804614fb6b - last known working revision Best regards, Maxim Levitsky PS: .config attached.
On Mon, 22 Oct 2007 08:22:52 +0200 You can pick a different revision to test with: git-reset --hard "SHA1" Choose one with "git-bisect visualize". -- Paolo Ornati Linux 2.6.23-ge8b8c977 on x86_64 -
Well, I know that, and I did try this a lot. The problem is that between good and bad revisions there are few bugs that cause the system to oops early, thus I can't tell whenever the 100% iowait bug is present or not. Best regards, Maxim Levitsky -
does sysrq-t show any D state tasks? -
This one: Probably per-block device dirty writeback? I am compiling now revision 1f7d6668c29b1dfa307a44844f9bb38356fc989b Thanks for the pointer. [ 673.365631] pdflush D c21bdecc 0 221 2 [ 673.365635] c21bdee0 00000046 00000002 c21bdecc c21bdec4 00000000 c21b3000 00000002 [ 673.365643] c0134892 c21b3164 c1e00200 00000001 c7109280 c21bdec0 c03ff849 c21bdef0 [ 673.365650] 00052974 00000000 000000ff 00000000 00000000 00000000 c21bdef0 000529dc [ 673.365657] Call Trace: [ 673.365659] [<c03fd728>] schedule_timeout+0x48/0xc0 [ 673.365663] [<c03fd50e>] io_schedule_timeout+0x5e/0xb0 [ 673.365667] [<c0170d11>] congestion_wait+0x71/0x90 [ 673.365671] [<c016b92e>] wb_kupdate+0x9e/0xf0 [ 673.365675] [<c016beb2>] pdflush+0x102/0x1d0 [ 673.365679] [<c013fa82>] kthread+0x42/0x70 [ 673.365683] [<c01050df>] kernel_thread_helper+0x7/0x18 Best regards, Maxim Levitsky -
That looks more like the inode writeback patches from Wu than the per bdi dirty stuff. The later typically hangs in balance_dirty_pages(). -
Yes, you are right, both revisions 1f7d6668c29b1dfa307a44844f9bb38356fc989b and 3e26c149c358529b1605f8959341d34bc4b880a3 work fine But I didn't pay attention that those are before f4a1c2bce002f683801bcdbbc9fd89804614fb6b. So, back to the drawing board.... :-) Will test revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b, just after writeback patches. Thanks, Best regards, Maxim Levitsky -
Thank you. I'll try if I can reproduce it locally... Fengguang -
Bingo, Revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b shows this bug. I will now bisect to find exact patch that caused this bug, Thanks, Maxim Levitsky -
This one is most relevant: 1f7decf6d9f06dac008b8d66935c0c3b18e564f9 writeback: introduce writeback_control.more_io to indicate more io Still compiling the kernel... Thank you, Fengguang -
Hi, I Bisected this bug to exactly this commit: 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b writeback: introduce writeback_control.more_io to indicate more io Reverting it and compiling latest git shows no more signs of that bug. Thanks, Best regards, Maxim Levitsky -
Thank you very much.
I guess your system has some difficulty in writing back some inodes.
(i.e. a bug disclosed by this patch, the 100% iowait only makes it
more obvious)
I cannot reproduce it with your .config, so would you recompile and
run the kernel with the above commit _and_ the below debugging patch?
Thank you,
Fengguang
---
fs/fs-writeback.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
--- linux-2.6.23-git17.orig/fs/fs-writeback.c
+++ linux-2.6.23-git17/fs/fs-writeback.c
@@ -164,12 +164,25 @@ static void redirty_tail(struct inode *i
list_move(&inode->i_list, &sb->s_dirty);
}
+#define requeue_io(inode) \
+ do { \
+ __requeue_io(inode, __LINE__); \
+ } while (0)
+
/*
* requeue inode for re-scanning after sb->s_io list is exhausted.
*/
-static void requeue_io(struct inode *inode)
+static void __requeue_io(struct inode *inode, int line)
{
list_move(&inode->i_list, &inode->i_sb->s_more_io);
+
+ printk(KERN_DEBUG "redirtied inode %lu size %llu at %02x:%02x(%s), line %d.\n",
+ inode->i_ino,
+ i_size_read(inode),
+ MAJOR(inode->i_sb->s_dev),
+ MINOR(inode->i_sb->s_dev),
+ inode->i_sb->s_id,
+ line);
}
static void inode_sync_complete(struct inode *inode)
-
Hi, Thank you very much too, for helping me. Applied. Had to kill klogd, since kernel generates tons of redirtied inode messages. Size of the kern.log is 863 KB, thus I I don't think it is polite to attachit. Don't know whenever it is ok to put it on pastebin too. Anyway, it shows lots of redirtied inode... messages, and while most of them are "at 08:02(sda2)" , my reiserfs root partition, some are "Oct 22 14:50:27 MAIN kernel: [ 73.643794] redirtied inode 2582 size 0 at 00:0f(tmpfs), line 300." " line 300" is shown always (I have /var/run, /var/lock, /dev mounted as tmpfs, default kubuntu setup) Best regards, Maxim Levitsky -
Thank you for the testing out. Hmm, Maybe it's an reiserfs related issue. Do you have the full log file? Thank you, Fengguang -
This is not a new problem in 2.6.23-git17.
2.6.22/2.6.23 is buggy in the same way.
Reiserfs could leave newly created sub-page-size files in dirty state
for ever. They cannot be synced to disk by pdflush routines or
explicit `sync' commands. Only `umount' can do the trick.
The direct cause is: the dirty page's PG_dirty is wrongly _cleared_.
Call trace:
[<ffffffff8027e920>] cancel_dirty_page+0xd0/0xf0
[<ffffffff8816d470>] :reiserfs:reiserfs_cut_from_item+0x660/0x710
[<ffffffff8816d791>] :reiserfs:reiserfs_do_truncate+0x271/0x530
[<ffffffff8815872d>] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0
[<ffffffff8815d3d0>] :reiserfs:reiserfs_file_release+0x1e0/0x340
[<ffffffff802a187c>] __fput+0xcc/0x1b0
[<ffffffff802a1ba6>] fput+0x16/0x20
[<ffffffff8029e676>] filp_close+0x56/0x90
[<ffffffff8029fe0d>] sys_close+0xad/0x110
[<ffffffff8020c41e>] system_call+0x7e/0x83
Fix the bug by removing the cancel_dirty_page() call. Tests show that
it causes no bad behaviors on various write sizes.
=== for the patient ===
Here are more detailed demonstrations of the problem.
1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to;
and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed.
------------------------------ screen 0 ------------------------------
[T0] root /home/wfg# cat > /test/tiny
[T1] hi
[T2] root /home/wfg#
------------------------------ screen 1 ------------------------------
[T1] root /home/wfg# echo /test/tiny > /proc/filecache
[T1] root /home/wfg# cat /proc/filecache
# file /test/tiny
# flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
# idx len state refcnt
0 1 ___UD__Bd_ 2
[T2] root /home/wfg# cat /proc/filecache
# file /test/tiny
# flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
# idx len state refcnt
0 1 ...Thank you.
This fix is kind of crude - even when it fixed Maxim's problem, and
survived my stress testing of a lot of patching and kernel compiling.
I'd be glad to see better solutions.
Fengguang
---
reiserfs: don't drop PG_dirty when releasing sub-page-sized dirty file
This is not a new problem in 2.6.23-git17.
2.6.22/2.6.23 is buggy in the same way.
Reiserfs could accumulate dirty sub-page-size files until umount time.
They cannot be synced to disk by pdflush routines or explicit `sync'
commands. Only `umount' can do the trick.
The direct cause is: the dirty page's PG_dirty is wrongly _cleared_.
Call trace:
[<ffffffff8027e920>] cancel_dirty_page+0xd0/0xf0
[<ffffffff8816d470>] :reiserfs:reiserfs_cut_from_item+0x660/0x710
[<ffffffff8816d791>] :reiserfs:reiserfs_do_truncate+0x271/0x530
[<ffffffff8815872d>] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0
[<ffffffff8815d3d0>] :reiserfs:reiserfs_file_release+0x1e0/0x340
[<ffffffff802a187c>] __fput+0xcc/0x1b0
[<ffffffff802a1ba6>] fput+0x16/0x20
[<ffffffff8029e676>] filp_close+0x56/0x90
[<ffffffff8029fe0d>] sys_close+0xad/0x110
[<ffffffff8020c41e>] system_call+0x7e/0x83
Fix the bug by removing the cancel_dirty_page() call. Tests show that
it causes no bad behaviors on various write sizes.
=== for the patient ===
Here are more detailed demonstrations of the problem.
1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to;
and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed.
------------------------------ screen 0 ------------------------------
[T0] root /home/wfg# cat > /test/tiny
[T1] hi
[T2] root /home/wfg#
------------------------------ screen 1 ------------------------------
[T1] root /home/wfg# echo /test/tiny > /proc/filecache
[T1] root /home/wfg# cat /proc/filecache
# file /test/tiny
# flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
# idx len state refcnt
...On Tue, 23 Oct 2007 19:56:20 +0800 This should be safe, reiserfs has the buffer heads themselves clean and the page should get cleaned eventually. The cancel_dirty_page call was just an optimization to be VM friendly. -chris -
'chris' as in fs/reiserfs/{inode.c,namei.c}, and now in btrfs/*?
Nice to meet you ;-)
Fengguang
-
One thing to say... Works perfectly! Big thanks for fixing that bug. Best regards, Maxim Levitsky -
And many thanks for your testing~ Fengguang -
