Changes since take 2:
- fix a bug pointed by Jan, and updated the comments
journal_try_to_free_buffers() could race with jbd commit transaction when
the later is holding the buffer reference while waiting for the data buffer
to flush to disk. If the caller of journal_try_to_free_buffers() request
tries hard to release the buffers, it will treat the failure as error and return
back to the caller. We have seen the directo IO failed due to this race.
Some of the caller of releasepage() also expecting the buffer to be dropped
when passed with GFP_KERNEL mask to the releasepage()->journal_try_to_free_buffers().
With this patch, if the caller is passing the GFP_KERNEL to indicating this
call could wait, in case of try_to_free_buffers() failed, let's waiting for
journal_commit_transaction() to finish commit the current committing transaction
, then try to free those buffers again with journal locked.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com>
---
fs/jbd/transaction.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++--
mm/filemap.c | 3 --
2 files changed, 56 insertions(+), 4 deletions(-)
Index: linux-2.6.26-rc3/fs/jbd/transaction.c
===================================================================
--- linux-2.6.26-rc3.orig/fs/jbd/transaction.c 2008-05-21 16:17:51.000000000 -0700
+++ linux-2.6.26-rc3/fs/jbd/transaction.c 2008-05-21 16:20:11.000000000 -0700
@@ -1648,12 +1648,40 @@ out:
return;
}
+/*
+ * journal_try_to_free_buffers() could race with journal_commit_transaction()
+ * The later might still hold the reference count to the buffers when inspecting
+ * them on t_syncdata_list or t_locked_list.
+ *
+ * Journal_try_to_free_buffers() will call this function to
+ * wait for the current transaction to finish syncing data buffers, before
+ * try to free that buffer.
+ *
+ * Called with journal->j_state_lock hold.
+ */
+static void journal_wait_for_transaction_sync_data(journal_t *journal)
+{
+ transaction_t *transaction = NULL;
+ tid_t tid;
+
+ transaction = journal->j_committing_transaction;
+
+ if (!transaction)
+ return;
+
+ tid = transaction->t_tid;
+ spin_unlock(&journal->j_state_lock);
+ log_wait_commit(journal, tid);
+ spin_lock(&journal->j_state_lock);
+}
/**
* int journal_try_to_free_buffers() - try to free page buffers.
* @journal: journal for operation
* @page: to try and free
- * @unused_gfp_mask: unused
+ * @gfp_mask: we use the mask to detect how hard should we try to release
+ * buffers. If __GFP_WAIT and __GFP_FS is set, we wait for commit code to
+ * release the buffers.
*
*
* For all the buffers on this page,
@@ -1682,9 +1710,11 @@ out:
* journal_try_to_free_buffer() is changing its state. But that
* cannot happen because we never reallocate freed data as metadata
* while the data is part of a transaction. Yes?
+ *
+ * Return 0 on failure, 1 on success
*/
int journal_try_to_free_buffers(journal_t *journal,
- struct page *page, gfp_t unused_gfp_mask)
+ struct page *page, gfp_t gfp_mask)
{
struct buffer_head *head;
struct buffer_head *bh;
@@ -1713,7 +1743,30 @@ int journal_try_to_free_buffers(journal_
if (buffer_jbd(bh))
goto busy;
} while ((bh = bh->b_this_page) != head);
+
ret = try_to_free_buffers(page);
+
+ /*
+ * There are a number of places where journal_try_to_free_buffers()
+ * could race with journal_commit_transaction(), the later still
+ * holds the reference to the buffers to free while processing them.
+ * try_to_free_buffers() failed to free those buffers. Some of the
+ * caller of releasepage() request page buffers to be dropped, otherwise
+ * treat the fail-to-free as errors (such as generic_file_direct_IO())
+ *
+ * So, if the caller of try_to_release_page() wants the synchronous
+ * behaviour(i.e make sure buffers are dropped upon return),
+ * let's wait for the current transaction to finish flush of
+ * dirty data buffers, then try to free those buffers again,
+ * with the journal locked.
+ */
+ if (ret == 0 && (gfp_mask & GFP_KERNEL == GFP_KERNEL)) {
+ spin_lock(&journal->j_state_lock);
+ journal_wait_for_transaction_sync_data(journal);
+ ret = try_to_free_buffers(page);
+ spin_unlock(&journal->j_state_lock);
+ }
+
busy:
return ret;
}
Index: linux-2.6.26-rc3/mm/filemap.c
===================================================================
--- linux-2.6.26-rc3.orig/mm/filemap.c 2008-05-21 16:17:51.000000000 -0700
+++ linux-2.6.26-rc3/mm/filemap.c 2008-05-21 16:17:58.000000000 -0700
@@ -2581,9 +2581,8 @@ out:
* Otherwise return zero.
*
* The @gfp_mask argument specifies whether I/O may be performed to release
- * this page (__GFP_IO), and whether the call may block (__GFP_WAIT).
+ * this page (__GFP_IO), and whether the call may block (__GFP_WAIT & __GFP_FS).
*
- * NOTE: @gfp_mask may go away, and this function may become non-blocking.
*/
int try_to_release_page(struct page *page, gfp_t gfp_mask)
{
--
| Linus Torvalds | Linux 2.6.27 |
| Linus Torvalds | Linux 2.6.27-rc5 |
| Bart Van Assche | Re: Integration of SCST in the mainstream Linux kernel |
| Greg Kroah-Hartman | [PATCH 007/196] Chinese: add translation of stable_kernel_rules.txt |
git: | |
| Pierre Habouzit | Re: Decompression speed: zip vs lzo |
| Jakub Narebski | Re: VCS comparison table |
| Johannes Sixt | [PATCH 25/40] Windows: Implement a cpio emulation in git-clone.sh. |
| Jakub Narebski | Re: Git User's Survey 2007 unfinished summary continued |
| Steve B | Intel Atom and D945GCLF2 |
| Wijnand Wiersma | Almost success: OpenBSD on Xen |
| Theo de Raadt | Re: dmesg IBM x3650 OpenBSD 4.3 |
| Richard Stallman | Real men don't attack straw men |
| Hugh Dickins | Re: [bug?] tg3: Failed to load firmware "tigon/tg3_tso.bin" |
| Jens Axboe | Re: [BUG] New Kernel Bugs |
| Denys Fedoryshchenko | thousands of classes, e1000 TX unit hang |
| Adrian Bunk | [RFC: 2.6 patch] remove ieee80211_tx_frame() |
| Shared swap partition | 8 hours ago | Linux general |
| high memory | 2 days ago | Linux kernel |
| semaphore access speed | 2 days ago | Applications and Utilities |
| the kernel how to power off the machine | 2 days ago | Linux kernel |
| Easter Eggs in windows XP | 2 days ago | Windows |
| Root password | 2 days ago | Linux general |
| Where/when DNOTIFY is used? | 2 days ago | Linux kernel |
| How to convert Linux Kernel built-in module into a loadable module | 2 days ago | Linux kernel |
| Linux 2.6.24 and I/O schedulers | 2 days ago | Linux kernel |
| USB Driver -- Interrupt Polling -- A Little Help Please | 2 days ago | Linux general |
