On Tue, 2008-05-13 at 16:54 +0200, Jan Kara wrote:Bummer, we can't free buffers in ext3_launder_page() before calling try_to_free_page, as later invalidate_complete_page2()->try_to_free_page() expecting the page buffers are still here, and will return EIO if it launder_page() has already freed those buffers.:( Doing wait_on_buffer() alone in launder_page() is not enough as it doesn't wait for buffer reference drop to 0. Guess this is the last it worth give a try to see how bad it is, here is a draft patch. Do you see any obvious problems? Mingming Signed-off-by: Mingming Cao <cmm@us.ibm.com> fs/ext3/inode.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 68 insertions(+) Index: linux-2.6.26-rc1/fs/ext3/inode.c =================================================================== --- linux-2.6.26-rc1.orig/fs/ext3/inode.c 2008-05-13 13:35:27.000000000 -0700 +++ linux-2.6.26-rc1/fs/ext3/inode.c 2008-05-13 14:33:55.000000000 -0700 @@ -1766,6 +1766,53 @@ static int ext3_journalled_set_page_dirt return __set_page_dirty_nobuffers(page); } +/* + * There are a number of places where journal_try_to_free_buffers() + * could race with journal_commit_transaction(), the later still + * helds the reference to the buffers on the t_syncdata_list or t_locked_list + * while journal_try_to_free_buffers() tries to free them, + * which resulting an EIO error returns back to the generic_file_direct_IO() + * + * It is also possible that mapped IO could re-dirty the page and buffers + * after direct IO has waited dirty pages have been flush to disk. + * When the journal_try_to_free_buffers() is called from the direct IO + * path, it may failed to free the buffers as the buffer may be locked again + * + * To fix this problem, we add a ext3_launder_page() callback which + * is only called on direct IO path, to wait for the buffer unlocked + * and waiting for the jbd finish commit those data buffers. + * + * This will impact the direct IO perf, but this alows direct IO and + * buffered IO could be performed concurrently. + * + * returns 0 in sucess case, and non zero in failure case. + */ +static int ext3_launder_page(struct page *page) +{ + int ret = 0; + struct buffer_head *head; + struct buffer_head *bh; + journal_t *journal = EXT3_JOURNAL(page->mapping->host); + + head = bh = page_buffers(page); + do { + /* + * it's possible mapped IO re-dirty and locked the + * buffer again when we came here + */ + if (buffer_locked(bh)) { + get_bh(bh); + wait_on_buffer(bh); + put_bh(bh); + } + + journal_wait_commit_buffer(journal, bh); + bh = bh->b_this_page; + } while (bh != head); + + return ret; +} + static const struct address_space_operations ext3_ordered_aops = { .readpage = ext3_readpage, .readpages = ext3_readpages, @@ -1778,6 +1825,7 @@ static const struct address_space_operat .releasepage = ext3_releasepage, .direct_IO = ext3_direct_IO, .migratepage = buffer_migrate_page, + .launder_page = ext3_launder_page, }; static const struct address_space_operations ext3_writeback_aops = { Index: linux-2.6.26-rc1/fs/jbd/journal.c =================================================================== --- linux-2.6.26-rc1.orig/fs/jbd/journal.c 2008-05-03 11:59:44.000000000 -0700 +++ linux-2.6.26-rc1/fs/jbd/journal.c 2008-05-13 14:43:47.000000000 -0700 @@ -74,6 +74,7 @@ EXPORT_SYMBOL(journal_errno); EXPORT_SYMBOL(journal_ack_err); EXPORT_SYMBOL(journal_clear_err); EXPORT_SYMBOL(log_wait_commit); +EXPORT_SYMBOL(journal_wait_commit_buffer); EXPORT_SYMBOL(journal_start_commit); EXPORT_SYMBOL(journal_force_commit_nested); EXPORT_SYMBOL(journal_wipe); @@ -558,6 +559,46 @@ int log_wait_commit(journal_t *journal, } /* + * Wait for a specific buffer has committed to the log + */ +int journal_wait_commit_buffer(journal_t *journal, struct buffer_head *bh) +{ + int ret = 0; + struct journal_head * jh; + transaction_t *t; + tid_t tid; + + jbd_lock_bh_state(bh); + spin_lock(&journal->j_list_lock); + jh = journal_grab_journal_head(bh); + + if (!jh) + goto unlock; + + t = jh->b_transaction; + + if (!t) + goto release; + + tid = t->t_tid; + + journal_put_journal_head(jh); + spin_unlock(&journal->j_list_lock); + jbd_unlock_bh_state(bh); + + log_start_commit(journal, tid); + ret = log_wait_commit(journal, tid); + return ret; + +release: + journal_put_journal_head(jh); +unlock: + spin_unlock(&journal->j_list_lock); + jbd_unlock_bh_state(bh); + return ret; +} + +/* * Log buffer allocation routines: */ Index: linux-2.6.26-rc1/include/linux/jbd.h =================================================================== --- linux-2.6.26-rc1.orig/include/linux/jbd.h 2008-05-03 11:59:44.000000000 -0700 +++ linux-2.6.26-rc1/include/linux/jbd.h 2008-05-13 14:42:45.000000000 -0700 @@ -974,6 +974,7 @@ int __log_start_commit(journal_t *journa int journal_start_commit(journal_t *journal, tid_t *tid); int journal_force_commit_nested(journal_t *journal); int log_wait_commit(journal_t *journal, tid_t tid); +int journal_wait_commit_buffer(journal_t *journal, struct buffer_head *bh); int log_do_checkpoint(journal_t *journal); void __log_wait_for_space(journal_t *journal); --
| Greg Kroah-Hartman | [PATCH 002/196] Chinese: rephrase English introduction in HOWTO |
| Linus Torvalds | Linux 2.6.27-rc8 |
| Parag Warudkar | BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] |
| James Bottomley | Re: Integration of SCST in the mainstream Linux kernel |
git: | |
| Jakub Narebski | Re: VCS comparison table |
| Wincent Colaiuta | Re: [ANNOUNCE] GIT 1.5.4 |
| Sam Song | Fwd: [OT] Re: Git via a proxy server? |
| Junio C Hamano | Re: More precise tag following |
| Nick Guenther | Re: Real men don't attack straw men |
| Raimo Niskanen | HP ProLiant DL140 G3 problems |
| Todd Pytel | IDE or SCSI virtual disks for VMWare image? |
| R. Fumione | OpenBSD speed on desktops |
| Patrick McHardy | [NET_SCHED 03/15]: act_api: fix netlink API conversion bug |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Theodore Tso | Re: [bug?] tg3: Failed to load firmware "tigon/tg3_tso.bin" |
| Linus Torvalds | Re: [GIT]: Networking |
