Hi, Here is the current content of the GFS2 -nmw git tree. Mostly its just clean up and bug fixes this time. There is one exception which is the "wait for journal id" patch which is a new feature aimed at (eventually) allowing us to simplify the userland support which GFS2 requires, Steve. --
From: Bob Peterson <rpeterso@redhat.com>
Function gfs2_write_alloc_required always returned zero as its
return code. Therefore, it doesn't need to return a return code
at all. Given that, we can use the return value to return whether
or not the dinode needs block allocations rather than passing
that value in, which in turn simplifies a bunch of error checking.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 9485a88..5e96cbd 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -634,9 +634,7 @@ static int gfs2_write_begin(struct file *file, struct address_space *mapping,
}
}
- error = gfs2_write_alloc_required(ip, pos, len, &alloc_required);
- if (error)
- goto out_unlock;
+ alloc_required = gfs2_write_alloc_required(ip, pos, len);
if (alloc_required || gfs2_is_jdata(ip))
gfs2_write_calc_reserv(ip, len, &data_blocks, &ind_blocks);
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 84da64b..744c29e 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1244,13 +1244,12 @@ int gfs2_file_dealloc(struct gfs2_inode *ip)
* @ip: the file being written to
* @offset: the offset to write to
* @len: the number of bytes being written
- * @alloc_required: set to 1 if an alloc is required, 0 otherwise
*
- * Returns: errno
+ * Returns: 1 if an alloc is required, 0 otherwise
*/
int gfs2_write_alloc_required(struct gfs2_inode *ip, u64 offset,
- unsigned int len, int *alloc_required)
+ unsigned int len)
{
struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
struct buffer_head bh;
@@ -1258,26 +1257,23 @@ int gfs2_write_alloc_required(struct gfs2_inode *ip, u64 offset,
u64 lblock, lblock_stop, size;
u64 end_of_file;
- *alloc_required = 0;
-
if (!len)
return 0;
if (gfs2_is_stuffed(ip)) {
if (offset + len >
sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode))
- *alloc_required = 1;
+ return 1;
...This looks like a big change, but in reality its only a single line of actual
code change, the rest is just moving a function to before its new caller.
The "try" flag for glocks is a rather subtle and delicate setting since it
requires that the state machine tries just hard enough to ensure that it has
a good chance of getting the requested lock, but no so hard that the
request can land up blocked behind another.
The patch adds in an additional check which will fail any queued try
locks if there is another request blocking the try lock request which
is not granted and compatible, nor in progress already. The check is made
only after all pending locks which may be granted have been granted.
I've checked this with the reproducer for the reported flock bug which
this is intended to fix, and it now passes.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 0898f3e..717531d 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -328,6 +328,30 @@ static void gfs2_holder_wake(struct gfs2_holder *gh)
}
/**
+ * do_error - Something unexpected has happened during a lock request
+ *
+ */
+
+static inline void do_error(struct gfs2_glock *gl, const int ret)
+{
+ struct gfs2_holder *gh, *tmp;
+
+ list_for_each_entry_safe(gh, tmp, &gl->gl_holders, gh_list) {
+ if (test_bit(HIF_HOLDER, &gh->gh_iflags))
+ continue;
+ if (ret & LM_OUT_ERROR)
+ gh->gh_error = -EIO;
+ else if (gh->gh_flags & (LM_FLAG_TRY | LM_FLAG_TRY_1CB))
+ gh->gh_error = GLR_TRYFAILED;
+ else
+ continue;
+ list_del_init(&gh->gh_list);
+ trace_gfs2_glock_queue(gh, 0);
+ gfs2_holder_wake(gh);
+ }
+}
+
+/**
* do_promote - promote as many requests as possible on the current queue
* @gl: The glock
*
@@ -375,36 +399,13 @@ restart:
}
if (gh->gh_list.prev == &gl->gl_holders)
return 1;
+ do_error(gl, 0);
break;
}
return 0;
}
/**
- * do_error - Something unexpected has happened during a lock ...This is a clean up of the code which deals with LM_FLAG_NOEXP
which aims to remove any possible race conditions by using
gl_spin to cover the gap between testing for the LM_FLAG_NOEXP
and the GL_FROZEN flag.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 2b3d8f8..9adf8f9 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1063,6 +1063,9 @@ int gfs2_glock_nq(struct gfs2_holder *gh)
spin_lock(&gl->gl_spin);
add_to_queue(gh);
+ if ((LM_FLAG_NOEXP & gh->gh_flags) &&
+ test_and_clear_bit(GLF_FROZEN, &gl->gl_flags))
+ set_bit(GLF_REPLY_PENDING, &gl->gl_flags);
run_queue(gl, 1);
spin_unlock(&gl->gl_spin);
@@ -1320,6 +1323,36 @@ void gfs2_glock_cb(struct gfs2_glock *gl, unsigned int state)
}
/**
+ * gfs2_should_freeze - Figure out if glock should be frozen
+ * @gl: The glock in question
+ *
+ * Glocks are not frozen if (a) the result of the dlm operation is
+ * an error, (b) the locking operation was an unlock operation or
+ * (c) if there is a "noexp" flagged request anywhere in the queue
+ *
+ * Returns: 1 if freezing should occur, 0 otherwise
+ */
+
+static int gfs2_should_freeze(const struct gfs2_glock *gl)
+{
+ const struct gfs2_holder *gh;
+
+ if (gl->gl_reply & ~LM_OUT_ST_MASK)
+ return 0;
+ if (gl->gl_target == LM_ST_UNLOCKED)
+ return 0;
+
+ list_for_each_entry(gh, &gl->gl_holders, gh_list) {
+ if (test_bit(HIF_HOLDER, &gh->gh_iflags))
+ continue;
+ if (LM_FLAG_NOEXP & gh->gh_flags)
+ return 0;
+ }
+
+ return 1;
+}
+
+/**
* gfs2_glock_complete - Callback used by locking
* @gl: Pointer to the glock
* @ret: The return value from the dlm
@@ -1329,18 +1362,17 @@ void gfs2_glock_cb(struct gfs2_glock *gl, unsigned int state)
void gfs2_glock_complete(struct gfs2_glock *gl, int ret)
{
struct lm_lockstruct *ls = &gl->gl_sbd->sd_lockstruct;
+
gl->gl_reply = ret;
+
if (unlikely(test_bit(DFL_BLOCK_LOCKS, &ls->ls_flags))) {
- struct gfs2_holder ...From: David Rientjes <rientjes@google.com>
The k[mc]allocs in dr_split_leaf() and dir_double_exhash() are failable,
so remove __GFP_NOFAIL from their masks.
Cc: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 6b48d7c..b9dd88a 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -955,7 +955,12 @@ static int dir_split_leaf(struct inode *inode, const struct qstr *name)
/* Change the pointers.
Don't bother distinguishing stuffed from non-stuffed.
This code is complicated enough already. */
- lp = kmalloc(half_len * sizeof(__be64), GFP_NOFS | __GFP_NOFAIL);
+ lp = kmalloc(half_len * sizeof(__be64), GFP_NOFS);
+ if (!lp) {
+ error = -ENOMEM;
+ goto fail_brelse;
+ }
+
/* Change the pointers */
for (x = 0; x < half_len; x++)
lp[x] = cpu_to_be64(bn);
@@ -1063,7 +1068,9 @@ static int dir_double_exhash(struct gfs2_inode *dip)
/* Allocate both the "from" and "to" buffers in one big chunk */
- buf = kcalloc(3, sdp->sd_hash_bsize, GFP_NOFS | __GFP_NOFAIL);
+ buf = kcalloc(3, sdp->sd_hash_bsize, GFP_NOFS);
+ if (!buf)
+ return -ENOMEM;
for (block = dip->i_disksize >> sdp->sd_hash_bsize_shift; block--;) {
error = gfs2_dir_read_data(dip, (char *)buf,
--
1.7.1.1
--
From: Abhijith Das <adas@redhat.com>
trunc_start() in bmap.c incorrectly uses sizeof(struct gfs2_inode) instead of
sizeof(struct gfs2_dinode).
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 744c29e..6f48280 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1040,7 +1040,7 @@ static int trunc_start(struct gfs2_inode *ip, u64 size)
goto out;
if (gfs2_is_stuffed(ip)) {
- u64 dsize = size + sizeof(struct gfs2_inode);
+ u64 dsize = size + sizeof(struct gfs2_dinode);
ip->i_disksize = size;
ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
gfs2_trans_add_bh(ip->i_gl, dibh, 1);
--
1.7.1.1
--
This reverts commit b7dc2df5725fe7355fd76000ead7e39728e1b8a9.
The initial patch didn't quite work since it doesn't cover all
the possible routes by which the GLF_FROZEN flag might be set.
A revised fix is coming up in the next patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 717531d..2b3d8f8 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -707,18 +707,8 @@ static void glock_work_func(struct work_struct *work)
{
unsigned long delay = 0;
struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_work.work);
- struct gfs2_holder *gh;
int drop_ref = 0;
- if (unlikely(test_bit(GLF_FROZEN, &gl->gl_flags))) {
- spin_lock(&gl->gl_spin);
- gh = find_first_waiter(gl);
- if (gh && (gh->gh_flags & LM_FLAG_NOEXP) &&
- test_and_clear_bit(GLF_FROZEN, &gl->gl_flags))
- set_bit(GLF_REPLY_PENDING, &gl->gl_flags);
- spin_unlock(&gl->gl_spin);
- }
-
if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
finish_xmote(gl, gl->gl_reply);
drop_ref = 1;
--
1.7.1.1
--
Use nobh_writepage rather than calling mpage_writepage directly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@lst.de> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c index 9f8b525..9485a88 100644 --- a/fs/gfs2/aops.c +++ b/fs/gfs2/aops.c @@ -136,10 +136,7 @@ static int gfs2_writeback_writepage(struct page *page, if (ret <= 0) return ret; - ret = mpage_writepage(page, gfs2_get_block_noalloc, wbc); - if (ret == -EAGAIN) - ret = block_write_full_page(page, gfs2_get_block_noalloc, wbc); - return ret; + return nobh_writepage(page, gfs2_get_block_noalloc, wbc); } /** -- 1.7.1.1 --
This patch implements a wait for the journal id in the case that it has
not been specified on the command line. This is to allow the future
removal of the mount.gfs2 helper. The journal id would instead be
directly communicated by gfs_controld to the file system. Here is a
comparison of the two systems:
Current:
1. mount calls mount.gfs2
2. mount.gfs2 connects to gfs_controld to retrieve the journal id
3. mount.gfs2 adds the journal id to the mount command line and calls
the mount system call
4. gfs_controld receives the status of the mount request via a uevent
Proposed:
1. mount calls the mount system call (no mount.gfs2 helper)
2. gfs_controld receives a uevent for a gfs2 fs which it doesn't know
about already
3. gfs_controld assigns a journal id to it via sysfs
4. the mount system call then completes as normal (sending a uevent
according to status)
The advantage of the proposed system is that it is completely backward
compatible with the current system both at the kernel and at the
userland levels. The "first" parameter can also be set the same way,
with the restriction that it must be set before the journal id is
assigned.
In addition, if mount becomes stuck waiting for a reply from
gfs_controld which never arrives, then it is killable and will abort the
mount gracefully.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index b5d7363..8fcbce4 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -460,6 +460,7 @@ enum {
SDF_NOBARRIERS = 3,
SDF_NORECOVERY = 4,
SDF_DEMOTE = 5,
+ SDF_NOJOURNALID = 6,
};
#define GFS2_FSNAME_LEN 256
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 3593b3a..45a4a36 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -76,7 +76,7 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
sb->s_fs_info = sdp;
sdp->sd_vfs = sb;
-
+ set_bit(SDF_NOJOURNALID, &sdp->sd_flags);
gfs2_tune_init(&sdp->sd_tune);
...