Hi,

The following patches comprise the bulk of Ocfs2 updates for the 2.6.28 merge window. They can roughly be broken up into 4 sets which add incremental features to Ocfs2. The patches are presented in the order they appear in git.

EA Support

The largest set adds support for extended attributes in Ocfs2. Extended attributes are stored both within the inode block and, as their number grows, externally. Individual attributes can be arbitrarily sized. Smaller ones have their data stored inline. Larger attributes grow out to a btree. In theory the btrees have similar limits to inode data. In practice though, the VFS limits EA sizes to 64K.

When inode space for attributes runs low, new ones are created in an external disk block. When the block fills up, external attributes are moved to an indexed btree. The btree can store many thousands of attributes, if needed.

The patches leading up to EA support further abstracted portions of the Ocfs2 btree code. Ultimately, this means we can "add" a btree to any Ocfs2 structure by embedding a header and providing the proper callbacks to manipulate certain key fields. The xattr code makes use of this, as will future Ocfs2 features. Joel made some further improvements to our 'generic' (for Ocfs2 at least) btree support which completed the interface by cleaning things up and providing for proper callbacks in a static operations structure. Those patches follow the xattr series as they were developed afterwards.

JBD2 Support

Ocfs2 can now use JBD2. Amongst other benefits, this allows us to support large block devices with more than 32 bits worth of block numbers. As a part of these patches, an 'inode64' mount option is added which toggles creation of inodes whose inode number requires more than 32 bits to be adequately described. JBD2 support in Ocfs2 is compiled in by default; however, since journaling is so central to the operation of a file system, we kept our 'legacy' JBD support. We did this to provide a fallback for any users who ...
This is actually pretty easy since fs/dlm already handles the bulk of the
work. The Ocfs2 userspace cluster stack module already uses fs/dlm as the
underlying lock manager, so I only had to add the right calls.
Cluster-aware POSIX locks ("plocks") can be turned off by the same means as
UNIX locks - mount with 'noflocks', or create a local-only Ocfs2 volume.
Internally, the file system uses two sets of file_operations, depending on
whether cluster-aware plocks are required. This turns out to be easier than
implementing local-only versions of ->lock.
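A minimal userspace sketch of the two-table idea, using simplified stand-in types: demo_fops, the demo_* handlers and demo_pick_fops() are hypothetical, not the real ocfs2_fops / ocfs2_fops_no_plocks. It relies on the VFS behavior this approach depends on: when ->lock is NULL, the VFS handles POSIX locks locally on its own.

```c
#include <stddef.h>

/* Simplified stand-in for struct file_operations: only the two
 * locking hooks matter for this illustration. */
struct demo_fops {
	int (*lock)(int cmd);   /* POSIX (fcntl) locks */
	int (*flock)(int cmd);  /* BSD flock() locks */
};

/* Hypothetical cluster-aware handlers. */
static int demo_cluster_lock(int cmd)  { (void)cmd; return 0; }
static int demo_cluster_flock(int cmd) { (void)cmd; return 0; }

/* Cluster-aware table: ->lock routes POSIX locks to the cluster stack. */
static const struct demo_fops demo_fops_plocks = {
	.lock  = demo_cluster_lock,
	.flock = demo_cluster_flock,
};

/* Same table with ->lock left NULL, so POSIX locks stay node-local. */
static const struct demo_fops demo_fops_no_plocks = {
	.lock  = NULL,
	.flock = demo_cluster_flock,
};

/* Chosen once per inode rather than tested on every lock call. */
static const struct demo_fops *demo_pick_fops(int stack_supports_plocks,
					      int local_locks_requested)
{
	if (stack_supports_plocks && !local_locks_requested)
		return &demo_fops_plocks;
	return &demo_fops_no_plocks;
}
```

Keeping two otherwise identical tables is what the comment in the file.c hunk warns about: everything other than ->lock has to be kept in sync by hand.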
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/file.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/file.h | 2 +
fs/ocfs2/inode.c | 15 ++++++++++++-
fs/ocfs2/locks.c | 15 ++++++++++++++
fs/ocfs2/locks.h | 1 +
fs/ocfs2/stack_user.c | 33 +++++++++++++++++++++++++++++++
fs/ocfs2/stackglue.c | 20 +++++++++++++++++++
fs/ocfs2/stackglue.h | 19 ++++++++++++++++++
8 files changed, 154 insertions(+), 2 deletions(-)
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index ec2ed15..60232b1 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2236,6 +2236,10 @@ const struct inode_operations ocfs2_special_file_iops = {
.permission = ocfs2_permission,
};
+/*
+ * Other than ->lock, keep ocfs2_fops and ocfs2_dops in sync with
+ * ocfs2_fops_no_plocks and ocfs2_dops_no_plocks!
+ */
const struct file_operations ocfs2_fops = {
.llseek = generic_file_llseek,
.read = do_sync_read,
@@ -2250,6 +2254,7 @@ const struct file_operations ocfs2_fops = {
#ifdef CONFIG_COMPAT
.compat_ioctl = ocfs2_compat_ioctl,
#endif
+ .lock = ocfs2_lock,
.flock = ocfs2_flock,
.splice_read = ocfs2_file_splice_read,
.splice_write = ocfs2_file_splice_write,
@@ -2266,5 +2271,51 @@ const struct file_operations ocfs2_dops = {
#ifdef CONFIG_COMPAT
.compat_ioctl = ocfs2_compat_ioctl,
#endif
+ .lock = ocfs2_lock,
+ .flock = ocfs2_flock,
+};
+
+/*
+ ...
It's pointless doing !! on something which is already 0 or 1.
--
Sure - the following patch is now on the 'merge_window' branch of ocfs2.git.
Also, thanks for all the review you did on these.
--Mark
--
Mark Fasheh
From: Mark Fasheh <mfasheh@suse.com>
ocfs2: Remove pointless !!
ocfs2_stack_supports_plocks() doesn't need this to properly return a zero or
one value.
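A standalone check of the reasoning, with hypothetical stand-in types (demo_stack, demo_stack_ops) in place of the real stack plugin structures: in C the && operator already produces an int that is exactly 0 or 1, so the extra !! was a no-op.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the active stack and its plock hook. */
struct demo_stack_ops { int (*plock)(void); };
struct demo_stack { const struct demo_stack_ops *sp_ops; };

static int demo_plock(void) { return 0; }

/* && yields an int of exactly 0 or 1, so no !! is needed. */
static int demo_supports_plocks(const struct demo_stack *active_stack)
{
	return active_stack && active_stack->sp_ops->plock;
}

static const struct demo_stack_ops demo_ops_with = { .plock = demo_plock };
static const struct demo_stack_ops demo_ops_without = { .plock = NULL };
static const struct demo_stack demo_stack_with = { .sp_ops = &demo_ops_with };
static const struct demo_stack demo_stack_without = { .sp_ops = &demo_ops_without };
```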
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/stackglue.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 7150f5d..68b668b 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -290,7 +290,7 @@ EXPORT_SYMBOL_GPL(ocfs2_dlm_dump_lksb);
int ocfs2_stack_supports_plocks(void)
{
- return !!(active_stack && active_stack->sp_ops->plock);
+ return active_stack && active_stack->sp_ops->plock;
}
EXPORT_SYMBOL_GPL(ocfs2_stack_supports_plocks);
--
1.5.4.1
--
Do this instead of tracking absolute local alloc size. This avoids
needless recalculation of bits from bytes in localalloc.c. Additionally,
the value is now in a more natural unit for internal file system bitmap
work.
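A sketch of the conversion that the removed helper used to redo on every call, assuming the megabyte-sized window and the 1MB cluster limit visible in ocfs2_local_alloc_window_bits(). demo_megs_to_la_bits() is a hypothetical name and assert() stands in for BUG_ON(). With this patch, a value like this is kept in osb->local_alloc_bits, presumably computed once at mount time.

```c
#include <assert.h>

/* Window size in megabytes -> number of clusters (bits in the local
 * alloc bitmap). assert() stands in for the kernel's BUG_ON(). */
static unsigned int demo_megs_to_la_bits(unsigned int la_megs,
					 unsigned int clustersize_bits)
{
	/* Clusters above 1MB (2^20 bytes) would make the shift negative. */
	assert(clustersize_bits <= 20);
	return la_megs << (20 - clustersize_bits);
}
```

For example, an 8MB window on a volume with 4KB clusters (clustersize_bits = 12) is 8 << 8 = 2048 bits.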
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/localalloc.c | 34 ++++++++++++----------------------
fs/ocfs2/ocfs2.h | 10 +++++++++-
fs/ocfs2/super.c | 8 +++++---
3 files changed, 26 insertions(+), 26 deletions(-)
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 28e492e..b05ce66 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -47,8 +47,6 @@
#define OCFS2_LOCAL_ALLOC(dinode) (&((dinode)->id2.i_lab))
-static inline int ocfs2_local_alloc_window_bits(struct ocfs2_super *osb);
-
static u32 ocfs2_local_alloc_count_bits(struct ocfs2_dinode *alloc);
static int ocfs2_local_alloc_find_clear_bits(struct ocfs2_super *osb,
@@ -75,21 +73,13 @@ static int ocfs2_local_alloc_new_window(struct ocfs2_super *osb,
static int ocfs2_local_alloc_slide_window(struct ocfs2_super *osb,
struct inode *local_alloc_inode);
-static inline int ocfs2_local_alloc_window_bits(struct ocfs2_super *osb)
-{
- BUG_ON(osb->s_clustersize_bits > 20);
-
- /* Size local alloc windows by the megabyte */
- return osb->local_alloc_size << (20 - osb->s_clustersize_bits);
-}
-
/*
* Tell us whether a given allocation should use the local alloc
* file. Otherwise, it has to go to the main bitmap.
*/
int ocfs2_alloc_should_use_local(struct ocfs2_super *osb, u64 bits)
{
- int la_bits = ocfs2_local_alloc_window_bits(osb);
+ int la_bits = osb->local_alloc_bits;
int ret = 0;
if (osb->local_alloc_state != OCFS2_LA_ENABLED)
@@ -120,14 +110,16 @@ int ocfs2_load_local_alloc(struct ocfs2_super *osb)
mlog_entry_void();
- if (osb->local_alloc_size == 0)
+ if (osb->local_alloc_bits == 0)
goto bail;
- if (ocfs2_local_alloc_window_bits(osb) >= osb->bitmap_cpg) {
+ if (osb->local_alloc_bits >= ...
From: Tao Ma <tao.ma@oracle.com>
Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic
function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation() now calls
ocfs2_do_cluster_allocation(), and the latter can be used for other
btree types as well.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 110 +++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/alloc.h | 17 +++++++
fs/ocfs2/aops.c | 8 ++--
fs/ocfs2/dir.c | 6 +-
fs/ocfs2/file.c | 136 +++++++++++-------------------------------------------
fs/ocfs2/file.h | 26 ++++------
fs/ocfs2/namei.c | 8 ++--
7 files changed, 176 insertions(+), 135 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 90cefc5..1332309 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -4302,6 +4302,116 @@ bail:
return status;
}
+/*
+ * Allocate and add clusters into the extent b-tree.
+ * The new clusters(clusters_to_add) will be inserted at logical_offset.
+ * The extent b-tree's root is root_el and it should be in root_bh, and
+ * it is not limited to the file storage. Any extent tree can use this
+ * function if it implements the proper ocfs2_extent_tree.
+ */
+int ocfs2_add_clusters_in_btree(struct ocfs2_super *osb,
+ struct inode *inode,
+ u32 *logical_offset,
+ u32 clusters_to_add,
+ int mark_unwritten,
+ struct buffer_head *root_bh,
+ struct ocfs2_extent_list *root_el,
+ handle_t *handle,
+ struct ocfs2_alloc_context *data_ac,
+ struct ocfs2_alloc_context *meta_ac,
+ enum ocfs2_alloc_restarted *reason_ret,
+ enum ocfs2_extent_tree_type type)
+{
+ int status = 0;
+ int free_extents;
+ enum ocfs2_alloc_restarted reason = RESTART_NONE;
+ u32 bit_off, num_bits;
+ u64 block;
+ u8 flags = 0;
+
+ BUG_ON(!clusters_to_add);
+
+ if (mark_unwritten)
+ flags = OCFS2_EXT_UNWRITTEN;
+
+ free_extents = ocfs2_num_free_extents(osb, inode, root_bh, ...
From: Tao Ma <tao.ma@oracle.com>
Ocfs2 uses a very flexible structure for storing extended attributes on
disk. Small numbers of attributes are stored directly in the inode block, up
to 256 bytes' worth. If that fills up, attributes are also stored in an
external block, linked to from the inode block. That block can in turn
expand to a btree, capable of storing large numbers of attributes.
Individual attribute values are stored inline if they're small enough
(currently about 80 bytes, though this can be changed), and otherwise are
expanded to a btree. The theoretical limit to the size of an individual
attribute is about the same as for inode data, though the kernel's upper
bound on the size of an attribute's data is far smaller.
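A rough model of the placement rules described above. The two limits are taken from the text (256 bytes of inode space, values of about 80 bytes stored inline); all demo_* names are hypothetical, and the real code also accounts for entry headers and name lengths, so treat this as a sketch of the decision order only.

```c
#include <stddef.h>

/* Illustrative limits from the description; the real ones are derived
 * from on-disk structure sizes and may differ. */
#define DEMO_INODE_XATTR_SPACE      256  /* "up to 256 bytes' worth" */
#define DEMO_XATTR_INLINE_VALUE_MAX  80  /* "currently about 80 bytes" */

enum demo_entry_home {
	DEMO_IN_INODE,          /* inline in the inode block */
	DEMO_IN_XATTR_BLOCK,    /* external block linked from the inode */
	DEMO_IN_INDEXED_TREE,   /* external block expanded to a btree */
};

enum demo_value_home {
	DEMO_VALUE_INLINE,      /* value stored next to its name */
	DEMO_VALUE_OUTSIDE,     /* value expanded to its own btree */
};

/* Where does a new attribute needing 'need' bytes go? */
static enum demo_entry_home demo_pick_entry_home(size_t inode_used,
						 size_t need,
						 int xattr_block_full)
{
	if (inode_used + need <= DEMO_INODE_XATTR_SPACE)
		return DEMO_IN_INODE;
	if (!xattr_block_full)
		return DEMO_IN_XATTR_BLOCK;
	return DEMO_IN_INDEXED_TREE;
}

/* Independently, is the value small enough to sit inline? */
static enum demo_value_home demo_pick_value_home(size_t value_len)
{
	return value_len <= DEMO_XATTR_INLINE_VALUE_MAX ?
		DEMO_VALUE_INLINE : DEMO_VALUE_OUTSIDE;
}
```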
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/ocfs2_fs.h | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 118 insertions(+), 0 deletions(-)
diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index 4f61985..1b46505 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -64,6 +64,7 @@
#define OCFS2_INODE_SIGNATURE "INODE01"
#define OCFS2_EXTENT_BLOCK_SIGNATURE "EXBLK01"
#define OCFS2_GROUP_DESC_SIGNATURE "GROUP01"
+#define OCFS2_XATTR_BLOCK_SIGNATURE "XATTR01"
/* Compatibility flags */
#define OCFS2_HAS_COMPAT_FEATURE(sb,mask) \
@@ -715,6 +716,123 @@ struct ocfs2_group_desc
/*40*/ __u8 bg_bitmap[0];
};
+/*
+ * On disk extended attribute structure for OCFS2.
+ */
+
+/*
+ * ocfs2_xattr_entry indicates one extended attribute.
+ *
+ * Note that it can be stored in inode, one block or one xattr bucket.
+ */
+struct ocfs2_xattr_entry {
+ __le32 xe_name_hash; /* hash value of xattr prefix+suffix. */
+ __le16 xe_name_offset; /* byte offset from the 1st entry in the local
+ xattr storage (inode, xattr block or
+ xattr bucket). */
+ __u8 xe_name_len; /* xattr name len, doesn't include prefix. */
+ __u8 xe_type; ...
From: Tao Ma <tao.ma@oracle.com>
The old uptodate code only handles removing one buffer_head from an
ocfs2 inode's buffer cache. With xattr clusters, we may need to remove
multiple buffer_heads at a time.
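A toy model of the new helper's shape, assuming an array-backed cache instead of ocfs2's real metadata cache: a cluster range is turned into a block count, then each block goes through the existing single-block removal path. demo_cache, demo_remove_block() and demo_remove_clusters() are hypothetical.

```c
#include <stddef.h>

/* Toy metadata cache: a small array of cached block numbers. */
#define DEMO_CACHE_MAX 64
struct demo_cache {
	unsigned long long blocks[DEMO_CACHE_MAX];
	size_t count;
};

/* Single-block path: remove 'block' if cached (order not preserved). */
static void demo_remove_block(struct demo_cache *c, unsigned long long block)
{
	for (size_t i = 0; i < c->count; i++) {
		if (c->blocks[i] == block) {
			c->blocks[i] = c->blocks[--c->count];
			return;
		}
	}
}

/* Cluster path: convert clusters to blocks, then remove each block. */
static void demo_remove_clusters(struct demo_cache *c,
				 unsigned long long block,
				 unsigned int blocks_per_cluster,
				 unsigned int c_len)
{
	unsigned int i, b_len = blocks_per_cluster * c_len;

	for (i = 0; i < b_len; i++, block++)
		demo_remove_block(c, block);
}

/* Usage: cache blocks 100..115, then drop 2 clusters of 4 blocks at 104. */
static size_t demo_example(void)
{
	struct demo_cache c = { .count = 0 };

	for (unsigned long long b = 100; b < 116; b++)
		c.blocks[c.count++] = b;
	demo_remove_clusters(&c, 104, 4, 2);
	return c.count;  /* 16 cached - 8 removed */
}
```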
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/uptodate.c | 32 ++++++++++++++++++++++++++------
fs/ocfs2/uptodate.h | 3 +++
2 files changed, 29 insertions(+), 6 deletions(-)
diff --git a/fs/ocfs2/uptodate.c b/fs/ocfs2/uptodate.c
index 4da8851..e26459e 100644
--- a/fs/ocfs2/uptodate.c
+++ b/fs/ocfs2/uptodate.c
@@ -511,14 +511,10 @@ static void ocfs2_remove_metadata_tree(struct ocfs2_caching_info *ci,
ci->ci_num_cached--;
}
-/* Called when we remove a chunk of metadata from an inode. We don't
- * bother reverting things to an inlined array in the case of a remove
- * which moves us back under the limit. */
-void ocfs2_remove_from_cache(struct inode *inode,
- struct buffer_head *bh)
+static void ocfs2_remove_block_from_cache(struct inode *inode,
+ sector_t block)
{
int index;
- sector_t block = bh->b_blocknr;
struct ocfs2_meta_cache_item *item = NULL;
struct ocfs2_inode_info *oi = OCFS2_I(inode);
struct ocfs2_caching_info *ci = &oi->ip_metadata_cache;
@@ -544,6 +540,30 @@ void ocfs2_remove_from_cache(struct inode *inode,
kmem_cache_free(ocfs2_uptodate_cachep, item);
}
+/*
+ * Called when we remove a chunk of metadata from an inode. We don't
+ * bother reverting things to an inlined array in the case of a remove
+ * which moves us back under the limit.
+ */
+void ocfs2_remove_from_cache(struct inode *inode,
+ struct buffer_head *bh)
+{
+ sector_t block = bh->b_blocknr;
+
+ ocfs2_remove_block_from_cache(inode, block);
+}
+
+/* Called when we remove xattr clusters from an inode. */
+void ocfs2_remove_xattr_clusters_from_cache(struct inode *inode,
+ sector_t block,
+ u32 c_len)
+{
+ u64 i, b_len = ocfs2_clusters_to_blocks(inode->i_sb, ...
I really really hope that `i' and `b_len' didn't really need to be 64-bit here.
--
Yeah, there's no way currently that any of those variables should get even
close to that large. I made them unsigned ints with the patch below.
--Mark
--
Mark Fasheh
From: Mark Fasheh <mfasheh@suse.com>
ocfs2: use smaller counters in ocfs2_remove_xattr_clusters_from_cache
i and b_len don't really need to be u64's. Xattr extent lengths should be
limited by the VFS, and then the size of our on-disk length field.
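A quick bound on b_len, assuming ocfs2's block sizes of 512B to 4KB, cluster sizes of 4KB to 1MB, and the VFS's 64KB limit on a single xattr value. demo_value_blocks() is a hypothetical helper mirroring ocfs2_clusters_to_blocks(sb, 1) * c_len; even the worst case stays in the low thousands, far below what an unsigned int can hold.

```c
/* VFS limit on one xattr value (XATTR_SIZE_MAX is 65536). */
#define DEMO_XATTR_SIZE_MAX 65536u

/* Blocks backing a value of 'len' bytes once rounded up to whole
 * clusters: blocks_per_cluster * c_len, as computed in the patch. */
static unsigned int demo_value_blocks(unsigned int len,
				      unsigned int blocksize,
				      unsigned int clustersize)
{
	unsigned int c_len = (len + clustersize - 1) / clustersize;

	return (clustersize / blocksize) * c_len;
}
```

The largest result comes from 1MB clusters with 512-byte blocks: a 64KB value still occupies one whole cluster, i.e. 2048 blocks.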
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/uptodate.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/fs/ocfs2/uptodate.c b/fs/ocfs2/uptodate.c
index 5235140..187b99f 100644
--- a/fs/ocfs2/uptodate.c
+++ b/fs/ocfs2/uptodate.c
@@ -562,7 +562,7 @@ void ocfs2_remove_xattr_clusters_from_cache(struct inode *inode,
sector_t block,
u32 c_len)
{
- u64 i, b_len = ocfs2_clusters_to_blocks(inode->i_sb, 1) * c_len;
+ unsigned int i, b_len = ocfs2_clusters_to_blocks(inode->i_sb, 1) * c_len;
for (i = 0; i < b_len; i++, block++)
ocfs2_remove_block_from_cache(inode, block);
--
1.5.4.1
--
From: Tao Ma <tao.ma@oracle.com>
In the old extent tree operation, we assume that we are using the ocfs2_extent_list in ocfs2_dinode as the tree root. As xattr will also use ocfs2_extent_list to store large values for an xattr entry, we refactor the tree operation so that xattr can use it directly.
The refactoring includes 4 steps:
1. Abstract set/get of last_eb_blk and update_clusters, since they may be stored in different locations for dinode and xattr.
2. Add a new structure named ocfs2_extent_tree to indicate the extent tree the operation will work on.
3. Remove all uses of fe_bh and di, and use root_bh and root_el in the extent tree instead. So now every fe_bh is replaced with et->root_bh, and el with root_el accordingly.
4. Make ocfs2_lock_allocators generic. Currently it is limited to file extend allocation, but the whole function is useful when we want to store large EAs.
Note: This patch doesn't touch ocfs2_commit_truncate() since it is not used for anything other than truncating inode data btrees.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 508 +++++++++++++++++++++++++++++++------------------
fs/ocfs2/alloc.h | 23 ++-
fs/ocfs2/aops.c | 11 +-
fs/ocfs2/dir.c | 7 +-
fs/ocfs2/file.c | 104 ++---------
fs/ocfs2/file.h | 4 -
fs/ocfs2/suballoc.c | 82 ++++++++
fs/ocfs2/suballoc.h | 5 +
8 files changed, 456 insertions(+), 288 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index dc844df..90cefc5 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -49,6 +49,143 @@
#include "buffer_head_io.h"
+/*
+ * ocfs2_extent_tree and ocfs2_extent_tree_operations are used to abstract
+ * the b-tree operations in ocfs2. Now all the b-tree operations are not
+ * limited to ocfs2_dinode only. Any data which need to allocate clusters
+ * to store can use b-tree. And it only needs to implement its ...
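A compact sketch of the abstraction this patch introduces, assuming heavily simplified types: each root type supplies callbacks for the fields that live in different places, and generic code only goes through those callbacks. demo_et_ops, demo_extent_tree, the two root structs and demo_add_leaf() are hypothetical; the real ocfs2_extent_tree_operations has more members (for example the sanity_check hook seen later in the series).

```c
#include <stdint.h>

struct demo_extent_tree;

/* Per-root callbacks: each root type says where it keeps these fields. */
struct demo_et_ops {
	void     (*set_last_eb_blk)(struct demo_extent_tree *et, uint64_t blkno);
	uint64_t (*get_last_eb_blk)(struct demo_extent_tree *et);
	void     (*update_clusters)(struct demo_extent_tree *et, uint32_t clusters);
};

struct demo_extent_tree {
	const struct demo_et_ops *ops;
	void *root;            /* structure embedding the btree root */
};

/* Two hypothetical roots storing the same fields in different places. */
struct demo_dinode      { uint64_t i_last_eb_blk; uint32_t i_clusters; };
struct demo_xattr_value { uint32_t xr_clusters; uint64_t xr_last_eb_blk; };

static void demo_di_set_last(struct demo_extent_tree *et, uint64_t b)
{ ((struct demo_dinode *)et->root)->i_last_eb_blk = b; }
static uint64_t demo_di_get_last(struct demo_extent_tree *et)
{ return ((struct demo_dinode *)et->root)->i_last_eb_blk; }
static void demo_di_add_clusters(struct demo_extent_tree *et, uint32_t n)
{ ((struct demo_dinode *)et->root)->i_clusters += n; }

static void demo_xv_set_last(struct demo_extent_tree *et, uint64_t b)
{ ((struct demo_xattr_value *)et->root)->xr_last_eb_blk = b; }
static uint64_t demo_xv_get_last(struct demo_extent_tree *et)
{ return ((struct demo_xattr_value *)et->root)->xr_last_eb_blk; }
static void demo_xv_add_clusters(struct demo_extent_tree *et, uint32_t n)
{ ((struct demo_xattr_value *)et->root)->xr_clusters += n; }

static const struct demo_et_ops demo_dinode_ops = {
	.set_last_eb_blk = demo_di_set_last,
	.get_last_eb_blk = demo_di_get_last,
	.update_clusters = demo_di_add_clusters,
};
static const struct demo_et_ops demo_xattr_value_ops = {
	.set_last_eb_blk = demo_xv_set_last,
	.get_last_eb_blk = demo_xv_get_last,
	.update_clusters = demo_xv_add_clusters,
};

/* Generic btree code never looks inside the root; it only uses et->ops. */
static void demo_add_leaf(struct demo_extent_tree *et, uint64_t new_leaf,
			  uint32_t new_clusters)
{
	et->ops->set_last_eb_blk(et, new_leaf);
	et->ops->update_clusters(et, new_clusters);
}

/* Usage: the same generic call works on either root type. */
static int demo_example(void)
{
	struct demo_dinode di = { 0, 0 };
	struct demo_xattr_value xv = { 0, 0 };
	struct demo_extent_tree et_di = { &demo_dinode_ops, &di };
	struct demo_extent_tree et_xv = { &demo_xattr_value_ops, &xv };

	demo_add_leaf(&et_di, 1000, 8);
	demo_add_leaf(&et_xv, 2000, 2);
	return et_di.ops->get_last_eb_blk(&et_di) == 1000 && di.i_clusters == 8 &&
	       et_xv.ops->get_last_eb_blk(&et_xv) == 2000 && xv.xr_clusters == 2;
}
```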
From: Tao Ma <tao.ma@oracle.com>
ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and
ocfs2_reserve_new_metadata() are all useful for extent tree operations. But
they are all limited to an inode btree because they use a struct
ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list
(the part of an ocfs2_dinode they actually use) so that the xattr btree code
can use these functions.
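A sketch of why passing the embedded struct ocfs2_extent_list works better than passing the dinode: any container that embeds a list can call the same helper. The types are hypothetical stand-ins, and the depth + 2 estimate is only a placeholder so the function returns something; it is not taken from the patch.

```c
#include <stdint.h>

struct demo_extent_list {
	uint16_t l_tree_depth;
	uint16_t l_count;
	uint16_t l_next_free_rec;
};

/* Two containers embedding a btree root at different offsets. */
struct demo_dinode      { uint64_t i_size; struct demo_extent_list i_list; };
struct demo_xattr_block { uint32_t xb_flags; struct demo_extent_list xb_list; };

/* Taking the list rather than the dinode lets any container use this.
 * "depth + 2" is a placeholder estimate, not taken from the patch. */
static int demo_extend_meta_needed(const struct demo_extent_list *root_el)
{
	return root_el->l_tree_depth + 2;
}

static const struct demo_dinode demo_di = { .i_list = { .l_tree_depth = 0 } };
static const struct demo_xattr_block demo_xb = { .xb_list = { .l_tree_depth = 1 } };
```

Callers pass &di->id2.i_list (as in the ocfs2_split_tree() hunk) or the list embedded in an xattr structure; the helper no longer cares which.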
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 3 ++-
fs/ocfs2/alloc.h | 12 +++++++++---
fs/ocfs2/aops.c | 3 ++-
fs/ocfs2/dir.c | 5 +++--
fs/ocfs2/file.c | 9 +++++----
fs/ocfs2/journal.h | 17 +++++++++++------
fs/ocfs2/suballoc.c | 4 ++--
fs/ocfs2/suballoc.h | 7 ++++++-
8 files changed, 40 insertions(+), 20 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index c74711f..dc844df 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -4536,7 +4536,8 @@ static int ocfs2_split_tree(struct inode *inode, struct buffer_head *di_bh,
} else
rightmost_el = path_leaf_el(path);
- credits += path->p_tree_depth + ocfs2_extend_meta_needed(di);
+ credits += path->p_tree_depth +
+ ocfs2_extend_meta_needed(&di->id2.i_list);
ret = ocfs2_extend_trans(handle, credits);
if (ret) {
mlog_errno(ret);
diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
index 758dbda..249e79e 100644
--- a/fs/ocfs2/alloc.h
+++ b/fs/ocfs2/alloc.h
@@ -48,8 +48,14 @@ int ocfs2_remove_extent(struct inode *inode, struct buffer_head *di_bh,
int ocfs2_num_free_extents(struct ocfs2_super *osb,
struct inode *inode,
struct buffer_head *bh);
-/* how many new metadata chunks would an allocation need at maximum? */
-static inline int ocfs2_extend_meta_needed(struct ocfs2_dinode *fe)
+/*
+ * how many new metadata chunks would an allocation need at maximum?
+ *
+ * Please note that the caller must make sure that root_el is the root
+ * of extent tree. So for ...
From: Tao Ma <tao.ma@oracle.com>
Add some thin wrappers around ocfs2_insert_extent() for each of the 3 different btree types: ocfs2_inode_insert_extent(), ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The last is for the xattr index btree, which will be used in a followup patch.
All the old callers in file.c etc. will call ocfs2_dinode_insert_extent(), while the other two handle the xattr case. The initialization of the extent tree is also handled by these functions.
When storing an xattr value which is too large, we will allocate some clusters for it, and here ocfs2_extent_list and ocfs2_extent_rec will also be used. In order to re-use the b-tree operation code, a new parameter named "private" is added to ocfs2_extent_tree; it is used to indicate the root of the ocfs2_extent_list. The reason is that we can no longer deduce the root from the buffer_head: it may be in an inode, an ocfs2_xattr_block or, even worse, anywhere in an ocfs2_xattr_bucket.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/Makefile | 3 +-
fs/ocfs2/alloc.c | 184 +++++++++++++++++++++++++++++++++-----------
fs/ocfs2/alloc.h | 42 ++++++--
fs/ocfs2/aops.c | 5 +-
fs/ocfs2/cluster/masklog.c | 1 +
fs/ocfs2/cluster/masklog.h | 1 +
fs/ocfs2/dir.c | 11 +-
fs/ocfs2/extent_map.c | 60 +++++++++
fs/ocfs2/extent_map.h | 3 +
fs/ocfs2/file.c | 9 +-
fs/ocfs2/suballoc.c | 5 +-
fs/ocfs2/suballoc.h | 3 +-
fs/ocfs2/xattr.c | 305 ++++++++++++++++++++++++++++++++++++++++++++
13 files changed, 568 insertions(+), 64 deletions(-)
create mode 100644 fs/ocfs2/xattr.c
diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile
index f6956de..af63980 100644
--- a/fs/ocfs2/Makefile
+++ b/fs/ocfs2/Makefile
@@ -34,7 +34,8 @@ ocfs2-objs := \
symlink.o \
sysfile.o \
uptodate.o \
- ver.o
+ ver.o \
+ xattr.o \
ocfs2_stackglue-objs ...
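A minimal sketch of the "private" root idea, assuming simplified types: the tree keeps both the containing buffer and an explicit pointer to the extent list, so the generic insert never has to work out where the root sits. demo_insert_extent() and the two wrappers are hypothetical analogues of ocfs2_insert_extent() and its per-type wrappers; the insert here only bumps a record counter.

```c
#include <stddef.h>

struct demo_extent_list { int l_next_free_rec; };

/* The tree carries the buffer and an explicit root pointer, since the
 * root may sit at any offset inside that buffer. */
struct demo_extent_tree {
	void *root_buf;                    /* stands in for root_bh->b_data */
	struct demo_extent_list *root_el;  /* the "private" root pointer */
};

/* Common insert: only ever touches et->root_el. Returns the record
 * slot it used (a real insert would fill in an extent record). */
static int demo_insert_extent(struct demo_extent_tree *et)
{
	return et->root_el->l_next_free_rec++;
}

struct demo_dinode { char hdr[16]; struct demo_extent_list i_list; };
struct demo_bucket { char entries[100]; struct demo_extent_list value_list; };

/* Inode wrapper: the root location is fixed, so it is derived here. */
static int demo_dinode_insert_extent(struct demo_dinode *di)
{
	struct demo_extent_tree et = { di, &di->i_list };

	return demo_insert_extent(&et);
}

/* Xattr value wrapper: the caller must say where the root is. */
static int demo_xattr_value_insert_extent(void *buf,
					  struct demo_extent_list *xv_list)
{
	struct demo_extent_tree et = { buf, xv_list };

	return demo_insert_extent(&et);
}

static struct demo_dinode demo_di;
static struct demo_bucket demo_bkt;
```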
Thanks, luckily though, it seems these got fixed later in the series :) --Mark -- Mark Fasheh --
brelse(0) is legal. Please do an fs-wide review for this. It shouldn't affect code generation because brelse() is inlined. --
Ok, I'll queue that up. I'll watch to make sure that new patches from here Fair enough, thanks. --Mark -- Mark Fasheh --
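A userspace illustration of the point, with a hypothetical demo_brelse() that, like the kernel's brelse(), does nothing when passed NULL; callers can then drop their own if (bh) guards on cleanup paths.

```c
#include <stddef.h>

/* Hypothetical stand-in for a buffer_head reference count. */
struct demo_bh { int b_count; };

/* Like brelse(): a NULL buffer is simply ignored. */
static void demo_brelse(struct demo_bh *bh)
{
	if (bh)
		bh->b_count--;
}

/* Cleanup path: no "if (bh)" guards needed before releasing. */
static int demo_cleanup(struct demo_bh *di_bh, struct demo_bh *xb_bh)
{
	demo_brelse(di_bh);
	demo_brelse(xb_bh);
	return 0;
}

static struct demo_bh demo_held = { 1 };
```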
From: Tao Ma <tao.ma@oracle.com>
When necessary, an ocfs2_xattr_block will embed an ocfs2_extent_list to
store large numbers of EAs. This patch adds a new type in
ocfs2_extent_tree_type and adds the implementation so that we can re-use the
b-tree code to handle the storage of many EAs.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/alloc.h | 10 ++++++
2 files changed, 99 insertions(+), 0 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index d175db1..47cdea6 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -177,6 +177,48 @@ static struct ocfs2_extent_tree_operations ocfs2_xattr_et_ops = {
.sanity_check = ocfs2_xattr_value_sanity_check,
};
+static void ocfs2_xattr_tree_set_last_eb_blk(struct ocfs2_extent_tree *et,
+ u64 blkno)
+{
+ struct ocfs2_xattr_block *xb =
+ (struct ocfs2_xattr_block *) et->root_bh->b_data;
+ struct ocfs2_xattr_tree_root *xt = &xb->xb_attrs.xb_root;
+
+ xt->xt_last_eb_blk = cpu_to_le64(blkno);
+}
+
+static u64 ocfs2_xattr_tree_get_last_eb_blk(struct ocfs2_extent_tree *et)
+{
+ struct ocfs2_xattr_block *xb =
+ (struct ocfs2_xattr_block *) et->root_bh->b_data;
+ struct ocfs2_xattr_tree_root *xt = &xb->xb_attrs.xb_root;
+
+ return le64_to_cpu(xt->xt_last_eb_blk);
+}
+
+static void ocfs2_xattr_tree_update_clusters(struct inode *inode,
+ struct ocfs2_extent_tree *et,
+ u32 clusters)
+{
+ struct ocfs2_xattr_block *xb =
+ (struct ocfs2_xattr_block *)et->root_bh->b_data;
+
+ le32_add_cpu(&xb->xb_attrs.xb_root.xt_clusters, clusters);
+}
+
+static int ocfs2_xattr_tree_sanity_check(struct inode *inode,
+ struct ocfs2_extent_tree *et)
+{
+ return 0;
+}
+
+static struct ocfs2_extent_tree_operations ocfs2_xattr_tree_et_ops = {
+ .set_last_eb_blk = ocfs2_xattr_tree_set_last_eb_blk,
+ .get_last_eb_blk = ...
From: Tiger Yang <tiger.yang@oracle.com>
Add the structures and helper functions we want for handling inline extended
attributes. We also update the inline-data handlers so that they properly
function in the event that we have both inline data and inline attributes
sharing an inode block.
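The size arithmetic from ocfs2_zero_dinode_id2_with_xattr() as a standalone function. demo_id2_bytes_to_zero() is hypothetical; the real code takes the offset from offsetof(struct ocfs2_dinode, id2) and the size from i_xattr_inline_size, and checks OCFS2_INLINE_XATTR_FL in i_dyn_features.

```c
#include <stddef.h>

/* Bytes of id2 to clear: from id2 to the end of the inode block,
 * minus the inline xattr area kept at the tail when present. */
static size_t demo_id2_bytes_to_zero(size_t blocksize, size_t id2_offset,
				     int has_inline_xattr,
				     size_t xattr_inline_size)
{
	size_t len = blocksize - id2_offset;

	if (has_inline_xattr)
		len -= xattr_inline_size;  /* leave inline xattrs untouched */
	return len;
}
```

For example, with a 4096-byte block, an id2 offset of 192 (an assumed value for illustration) and 256 bytes of inline xattrs, 3648 bytes are cleared instead of 3904.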
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 22 ++++++++++++++++------
fs/ocfs2/ocfs2.h | 1 +
fs/ocfs2/ocfs2_fs.h | 46 +++++++++++++++++++++++++++++++++++++++++++---
fs/ocfs2/super.c | 2 ++
4 files changed, 62 insertions(+), 9 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 130988f..d175db1 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -6586,20 +6586,29 @@ out:
return ret;
}
-static void ocfs2_zero_dinode_id2(struct inode *inode, struct ocfs2_dinode *di)
+static void ocfs2_zero_dinode_id2_with_xattr(struct inode *inode,
+ struct ocfs2_dinode *di)
{
unsigned int blocksize = 1 << inode->i_sb->s_blocksize_bits;
+ unsigned int xattrsize = le16_to_cpu(di->i_xattr_inline_size);
- memset(&di->id2, 0, blocksize - offsetof(struct ocfs2_dinode, id2));
+ if (le16_to_cpu(di->i_dyn_features) & OCFS2_INLINE_XATTR_FL)
+ memset(&di->id2, 0, blocksize -
+ offsetof(struct ocfs2_dinode, id2) -
+ xattrsize);
+ else
+ memset(&di->id2, 0, blocksize -
+ offsetof(struct ocfs2_dinode, id2));
}
void ocfs2_dinode_new_extent_list(struct inode *inode,
struct ocfs2_dinode *di)
{
- ocfs2_zero_dinode_id2(inode, di);
+ ocfs2_zero_dinode_id2_with_xattr(inode, di);
di->id2.i_list.l_tree_depth = 0;
di->id2.i_list.l_next_free_rec = 0;
- di->id2.i_list.l_count = cpu_to_le16(ocfs2_extent_recs_per_inode(inode->i_sb));
+ di->id2.i_list.l_count = cpu_to_le16(
+ ocfs2_extent_recs_per_inode_with_xattr(inode->i_sb, di));
}
void ocfs2_set_inode_data_inline(struct inode *inode, struct ocfs2_dinode *di)
@@ -6616,9 +6625,10 @@ ...
From: Tao Ma <tao.ma@oracle.com>
ocfs2_num_free_extents() is used to find the number of free extent records
in an inode btree. Hence, it takes an "ocfs2_dinode" parameter. We want to
use this for extended attribute trees in the future, so genericize the
interface to take a buffer head. A future patch will allow that buffer_head
to contain any structure rooting an ocfs2 btree.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 3 ++-
fs/ocfs2/alloc.h | 2 +-
fs/ocfs2/aops.c | 5 +++--
fs/ocfs2/dir.c | 3 ++-
fs/ocfs2/file.c | 11 ++++++-----
fs/ocfs2/file.h | 2 +-
6 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 10bfb46..c74711f 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -368,12 +368,13 @@ struct ocfs2_merge_ctxt {
*/
int ocfs2_num_free_extents(struct ocfs2_super *osb,
struct inode *inode,
- struct ocfs2_dinode *fe)
+ struct buffer_head *bh)
{
int retval;
struct ocfs2_extent_list *el;
struct ocfs2_extent_block *eb;
struct buffer_head *eb_bh = NULL;
+ struct ocfs2_dinode *fe = (struct ocfs2_dinode *)bh->b_data;
mlog_entry_void();
diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
index 42ff94b..758dbda 100644
--- a/fs/ocfs2/alloc.h
+++ b/fs/ocfs2/alloc.h
@@ -47,7 +47,7 @@ int ocfs2_remove_extent(struct inode *inode, struct buffer_head *di_bh,
struct ocfs2_cached_dealloc_ctxt *dealloc);
int ocfs2_num_free_extents(struct ocfs2_super *osb,
struct inode *inode,
- struct ocfs2_dinode *fe);
+ struct buffer_head *bh);
/* how many new metadata chunks would an allocation need at maximum? */
static inline int ocfs2_extend_meta_needed(struct ocfs2_dinode *fe)
{
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index a53da14..e2008dc 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1712,8 +1712,9 @@ int ocfs2_write_begin_nolock(struct address_space ...
From: Tao Ma <tao.ma@oracle.com>
Add code to lookup a given extended attribute in the xattr btree. Lookup
follows this general scheme:
1. Use ocfs2_xattr_get_rec to find the xattr extent record
2. Find the xattr bucket within the extent which may contain this xattr
3. Iterate the bucket to find the xattr. In ocfs2_xattr_block_get(), we need
to recalculate the block offset and name offset for the right position of
name/value.
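A toy version of steps 2 and 3, assuming buckets are held in memory and ordered by the smallest name hash they contain; step 1 (finding the extent record) is left out, and hash collisions that spill across buckets are not handled. All demo_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model: buckets sorted by the smallest name hash they hold. */
struct demo_entry  { uint32_t hash; const char *name; };
struct demo_bucket { uint32_t first_hash; size_t count; const struct demo_entry *entries; };

/* Step 2: binary search for the last bucket with first_hash <= hash. */
static const struct demo_bucket *
demo_find_bucket(const struct demo_bucket *b, size_t n, uint32_t hash)
{
	size_t lo = 0, hi = n;  /* invariant: b[lo].first_hash <= hash */

	if (n == 0 || hash < b[0].first_hash)
		return NULL;
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (b[mid].first_hash <= hash)
			lo = mid;
		else
			hi = mid;
	}
	return &b[lo];
}

/* Step 3: scan the bucket for a matching hash and name. */
static const struct demo_entry *
demo_find_in_bucket(const struct demo_bucket *bk, uint32_t hash, const char *name)
{
	for (size_t i = 0; i < bk->count; i++)
		if (bk->entries[i].hash == hash &&
		    strcmp(bk->entries[i].name, name) == 0)
			return &bk->entries[i];
	return NULL;
}

static const struct demo_entry demo_e0[] = { { 10, "user.a" }, { 20, "user.b" } };
static const struct demo_entry demo_e1[] = { { 50, "user.c" }, { 70, "user.d" } };
static const struct demo_bucket demo_buckets[] = {
	{ 10, 2, demo_e0 },
	{ 50, 2, demo_e1 },
};
```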
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/xattr.c | 351 ++++++++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 328 insertions(+), 23 deletions(-)
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index fb17f7f..acccdfa 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -99,12 +99,25 @@ struct ocfs2_xattr_search {
*/
struct buffer_head *xattr_bh;
struct ocfs2_xattr_header *header;
+ struct ocfs2_xattr_bucket bucket;
void *base;
void *end;
struct ocfs2_xattr_entry *here;
int not_found;
};
+static int ocfs2_xattr_bucket_get_name_value(struct inode *inode,
+ struct ocfs2_xattr_header *xh,
+ int index,
+ int *block_off,
+ int *new_offset);
+
+static int ocfs2_xattr_index_block_find(struct inode *inode,
+ struct buffer_head *root_bh,
+ int name_index,
+ const char *name,
+ struct ocfs2_xattr_search *xs);
+
static int ocfs2_xattr_tree_list_index_block(struct inode *inode,
struct ocfs2_xattr_tree_root *xt,
char *buffer,
@@ -604,7 +617,7 @@ static int ocfs2_xattr_find_entry(int name_index,
}
static int ocfs2_xattr_get_value_outside(struct inode *inode,
- struct ocfs2_xattr_search *xs,
+ struct ocfs2_xattr_value_root *xv,
void *buffer,
size_t len)
{
@@ -613,12 +626,8 @@ static int ocfs2_xattr_get_value_outside(struct inode *inode,
int i, ret = 0;
size_t cplen, blocksize;
struct buffer_head *bh = NULL;
- struct ...
From: Tao Ma <tao.ma@oracle.com>
In inode removal, we need to iterate all the buckets, remove any
externally-stored EA values and delete the xattr buckets.
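A simplified model of the removal pass, assuming a flag marks local (inline) values and a hypothetical demo_truncate_value() stands in for freeing an out-of-line value's clusters. It mirrors the loop shape visible in ocfs2_delete_xattr_in_bucket(): skip local entries, release the rest, stop on the first error, for every bucket in turn.

```c
#include <stddef.h>

struct demo_xattr_entry { int is_local; unsigned int value_clusters; };

/* Hypothetical hook: free the clusters of one out-of-line value. */
static unsigned int demo_freed_clusters;
static int demo_truncate_value(struct demo_xattr_entry *xe)
{
	demo_freed_clusters += xe->value_clusters;
	xe->value_clusters = 0;
	return 0;
}

/* One bucket: local (inline) values need no work. */
static int demo_delete_xattr_in_bucket(struct demo_xattr_entry *entries,
				       size_t count)
{
	for (size_t i = 0; i < count; i++) {
		int ret;

		if (entries[i].is_local)
			continue;
		ret = demo_truncate_value(&entries[i]);
		if (ret)
			return ret;
	}
	return 0;
}

/* Every bucket is visited before the buckets themselves are freed. */
static int demo_delete_all_buckets(struct demo_xattr_entry *const buckets[],
				   const size_t counts[], size_t nbuckets)
{
	for (size_t b = 0; b < nbuckets; b++) {
		int ret = demo_delete_xattr_in_bucket(buckets[b], counts[b]);

		if (ret)
			return ret;
	}
	return 0;
}

static struct demo_xattr_entry demo_b0[] = { { 1, 0 }, { 0, 3 } };
static struct demo_xattr_entry demo_b1[] = { { 0, 2 } };
static struct demo_xattr_entry *const demo_bucket_list[] = { demo_b0, demo_b1 };
static const size_t demo_counts[] = { 2, 1 };
```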
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/xattr.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++--
1 files changed, 80 insertions(+), 4 deletions(-)
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 5e8fae9..9ec7136 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -131,6 +131,9 @@ static int ocfs2_xattr_set_entry_index_block(struct inode *inode,
struct ocfs2_xattr_info *xi,
struct ocfs2_xattr_search *xs);
+static int ocfs2_delete_xattr_index_block(struct inode *inode,
+ struct buffer_head *xb_bh);
+
static inline struct xattr_handler *ocfs2_xattr_handler(int name_index)
{
struct xattr_handler *handler = NULL;
@@ -1511,13 +1514,14 @@ static int ocfs2_xattr_block_remove(struct inode *inode,
struct buffer_head *blk_bh)
{
struct ocfs2_xattr_block *xb;
- struct ocfs2_xattr_header *header;
int ret = 0;
xb = (struct ocfs2_xattr_block *)blk_bh->b_data;
- header = &(xb->xb_attrs.xb_header);
-
- ret = ocfs2_remove_value_outside(inode, blk_bh, header);
+ if (!(le16_to_cpu(xb->xb_flags) & OCFS2_XATTR_INDEXED)) {
+ struct ocfs2_xattr_header *header = &(xb->xb_attrs.xb_header);
+ ret = ocfs2_remove_value_outside(inode, blk_bh, header);
+ } else
+ ret = ocfs2_delete_xattr_index_block(inode, blk_bh);
return ret;
}
@@ -4738,3 +4742,75 @@ out:
mlog_exit(ret);
return ret;
}
+
+static int ocfs2_delete_xattr_in_bucket(struct inode *inode,
+ struct ocfs2_xattr_bucket *bucket,
+ void *para)
+{
+ int ret = 0;
+ struct ocfs2_xattr_header *xh = bucket->xh;
+ u16 i;
+ struct ocfs2_xattr_entry *xe;
+
+ for (i = 0; i < le16_to_cpu(xh->xh_count); i++) {
+ xe = &xh->xh_entries[i];
+ if (ocfs2_xattr_is_local(xe))
+ continue;
+
+ ret = ...
From: Tiger Yang <tiger.yang@oracle.com>
This patch adds the s_incompat flag for extended attribute support. This
helps us ensure that older versions of Ocfs2 or ocfs2-tools will not be able
to mount a volume with xattr support.
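A sketch of the incompat-bit rule this flag relies on: a mount is refused when the volume sets any incompat bit that the running code does not list as supported. The XATTR bit value below is made up for illustration; 0x0040 for inline data is taken from the ocfs2_fs.h hunk. All demo_* names are hypothetical.

```c
#include <stdint.h>

/* DEMO_INCOMPAT_XATTR's value is hypothetical. */
#define DEMO_INCOMPAT_INLINE_DATA 0x0040u
#define DEMO_INCOMPAT_XATTR       0x0200u

/* What an older build understands vs. what a newer build understands. */
#define DEMO_INCOMPAT_SUPP_OLD (DEMO_INCOMPAT_INLINE_DATA)
#define DEMO_INCOMPAT_SUPP_NEW (DEMO_INCOMPAT_INLINE_DATA | DEMO_INCOMPAT_XATTR)

/* Refuse the mount if the volume uses any incompat feature we lack. */
static int demo_can_mount(uint32_t volume_incompat, uint32_t supported)
{
	return (volume_incompat & ~supported) == 0;
}
```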
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/ocfs2.h | 7 +++++++
fs/ocfs2/ocfs2_fs.h | 19 +++++++++++++------
fs/ocfs2/super.c | 3 ++-
fs/ocfs2/xattr.c | 12 ++++++++++++
4 files changed, 34 insertions(+), 7 deletions(-)
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index cae0dd4..6d3c10d 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -363,6 +363,13 @@ static inline int ocfs2_supports_inline_data(struct ocfs2_super *osb)
return 0;
}
+static inline int ocfs2_supports_xattr(struct ocfs2_super *osb)
+{
+ if (osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_XATTR)
+ return 1;
+ return 0;
+}
+
/* set / clear functions because cluster events can make these happen
* in parallel so we want the transitions to be atomic. this also
* means that any future flags osb_flags must be protected by spinlock
diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index 8d5e72f..f24ce3d 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -91,7 +91,8 @@
| OCFS2_FEATURE_INCOMPAT_SPARSE_ALLOC \
| OCFS2_FEATURE_INCOMPAT_INLINE_DATA \
| OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP \
- | OCFS2_FEATURE_INCOMPAT_USERSPACE_STACK)
+ | OCFS2_FEATURE_INCOMPAT_USERSPACE_STACK \
+ | OCFS2_FEATURE_INCOMPAT_XATTR)
#define OCFS2_FEATURE_RO_COMPAT_SUPP OCFS2_FEATURE_RO_COMPAT_UNWRITTEN
/*
@@ -128,10 +129,6 @@
/* Support for data packed into inode blocks */
#define OCFS2_FEATURE_INCOMPAT_INLINE_DATA 0x0040
-/* Support for the extended slot map */
-#define OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP 0x100
-
-
/*
* Support for alternate, userspace cluster stacks. If set, the superblock
* field ...
This patch fixes the following build warnings:
fs/ocfs2/xattr.c: In function 'ocfs2_half_xattr_bucket':
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
fs/ocfs2/xattr.c: In function 'ocfs2_xattr_set_entry_in_bucket':
fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'
fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'
fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/xattr.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 090449f..1b349c7 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -3264,7 +3264,8 @@ static int ocfs2_half_xattr_bucket(struct inode *inode,
xe = &xh->xh_entries[start];
len = sizeof(struct ocfs2_xattr_entry) * (count - start);
mlog(0, "mv xattr entry len %d from %d to %d\n", len,
- (char *)xe - (char *)xh, (char *)xh->xh_entries - (char *)xh);
+ (int)((char *)xe - (char *)xh),
+ (int)((char *)xh->xh_entries - (char *)xh));
memmove((char *)xh->xh_entries, (char *)xe, len);
xe = &xh->xh_entries[count - start];
len = sizeof(struct ocfs2_xattr_entry) * start;
@@ -4073,8 +4074,8 @@ static int ocfs2_xattr_set_entry_in_bucket(struct inode *inode,
u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
struct ...
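An alternative to the casts, shown in plain userspace C: C99's t and z length modifiers print ptrdiff_t and size_t directly. demo_format() is hypothetical; the patch's (int) casts are also fine here, since offsets within a bucket easily fit in an int.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* C99 length modifiers: 't' for ptrdiff_t, 'z' for size_t. */
static int demo_format(char *buf, size_t buflen, const char *base,
		       const char *p, size_t value_len)
{
	ptrdiff_t off = p - base;

	return snprintf(buf, buflen, "entry at %td, value len %zu",
			off, value_len);
}

static const char demo_area[32];
static char demo_buf[64];
```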
From: Tao Ma <tao.ma@oracle.com>
Where the previous patches added the ability to list/get xattrs in buckets for ocfs2, this patch enables ocfs2 to store large numbers of EAs.
The original design doc was written by Mark Fasheh, and it can be found at http://oss.oracle.com/osswiki/OCFS2/DesignDocs/IndexedEATrees. I only had to make small modifications to it.
First, because the bucket size is 4K, a new field named xh_free_start is added in ocfs2_xattr_header to indicate the next valid name/value offset in a bucket. It is used when we store a new EA name/value. With this field, we can find the place more quickly and, what's more, we don't need to sort the name/values every time to let the last entry indicate the next unused space. This makes the insert operation more efficient for blocksizes smaller than 4k.
Because of the new xh_free_start, another field named xh_name_value_len is also added in ocfs2_xattr_header. It records the total length of all the name/values in the bucket. We need this so that we can check it and defragment the bucket if there is not enough contiguous free space.
An xattr insertion looks like this:
1. xattr_index_block_find: find the right bucket by the name_hash, say bucketA.
2. Check whether there is enough space in bucketA. If yes, insert it directly and modify xh_free_start and xh_name_value_len accordingly. If not, check xh_name_value_len to see whether we can store this by defragmenting the bucket. If yes, defragment it and go on with the insertion.
3. If defragmentation doesn't work, check whether there is a new empty bucket in the clusters within this extent record. If yes, init the new bucket and move all the buckets after bucketA one by one to the next bucket. Move half of the entries in bucketA to the next bucket and go on with the insertion.
4. If there is no new bucket, grow the extent tree.
As for xattr deletion, we will delete an xattr bucket when all its xattrs are removed and move all the buckets after it to the previous one. When all the ...
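The insertion decision in step 2 can be sketched as follows. This is a simplified model, not the ocfs2 code. struct bucket_space and choose_action() are hypothetical. The model assumes that the entry array grows upward from the header and that name/value pairs are packed downward from the end of the bucket. Under that assumption, the contiguous gap lies between the end of xh_entries[] and xh_free_start.

```c
#include <stdint.h>

enum bucket_action { INSERT_DIRECT, DEFRAG_THEN_INSERT, SPLIT_OR_GROW };

/* Hypothetical model of the per-bucket free-space bookkeeping. */
struct bucket_space {
	uint16_t free_start;      /* models xh_free_start: lowest used name/value offset */
	uint16_t name_value_len;  /* models xh_name_value_len: bytes used by all name/values */
	uint16_t entries_end;     /* assumed: end offset of the xh_entries[] array */
	uint16_t bucket_size;     /* 4096 per the design */
};

static enum bucket_action choose_action(const struct bucket_space *b,
                                        uint16_t entry_size,
                                        uint16_t name_value_size)
{
	/* The new entry extends the entry array by entry_size bytes. */
	uint32_t entries_after = (uint32_t)b->entries_end + entry_size;
	/* Gap between the entry array and the first stored name/value. */
	uint32_t contiguous = b->free_start > entries_after ?
	                      b->free_start - entries_after : 0;
	/* Free bytes if every name/value were packed together. */
	uint32_t used = entries_after + b->name_value_len;
	uint32_t total_free = b->bucket_size > used ? b->bucket_size - used : 0;

	if (contiguous >= name_value_size)
		return INSERT_DIRECT;       /* step 2: fits at xh_free_start */
	if (total_free >= name_value_size)
		return DEFRAG_THEN_INSERT;  /* step 2: defragment, then insert */
	return SPLIT_OR_GROW;           /* steps 3-4: new bucket or grow tree */
}
```

Keeping xh_name_value_len next to xh_free_start is what lets the second check run without walking every entry.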
From: Joel Becker <joel.becker@oracle.com>
The ocfs2_extent_tree_operations structure gains a field prefix on its
members. The ->eo_sanity_check() operation gains a wrapper function for
completeness. All of the extent tree operation wrappers gain a
consistent name (ocfs2_et_*()).
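The following is a stripped-down sketch of the pattern after this rename. Every member of the operations table carries the eo_ prefix, and callers go through small wrappers in the style of ocfs2_et_*() instead of dereferencing et->et_ops themselves. The types and the demo_* tree type are simplified stand-ins, not the ocfs2 definitions.

```c
#include <stdint.h>

struct extent_tree;  /* simplified stand-in for struct ocfs2_extent_tree */

/* Every member carries the eo_ prefix, as in the patch. */
struct extent_tree_operations {
	void     (*eo_set_last_eb_blk)(struct extent_tree *et, uint64_t blkno);
	uint64_t (*eo_get_last_eb_blk)(struct extent_tree *et);
};

struct extent_tree {
	const struct extent_tree_operations *et_ops;
	uint64_t last_eb_blk;  /* hypothetical backing field for the demo type */
};

/* Callers use wrappers like these rather than raw function pointers. */
static inline void et_set_last_eb_blk(struct extent_tree *et, uint64_t blkno)
{
	et->et_ops->eo_set_last_eb_blk(et, blkno);
}

static inline uint64_t et_get_last_eb_blk(struct extent_tree *et)
{
	return et->et_ops->eo_get_last_eb_blk(et);
}

/* A toy tree type implementing the operations. */
static void demo_set(struct extent_tree *et, uint64_t blkno) { et->last_eb_blk = blkno; }
static uint64_t demo_get(struct extent_tree *et) { return et->last_eb_blk; }

static const struct extent_tree_operations demo_et_ops = {
	.eo_set_last_eb_blk = demo_set,
	.eo_get_last_eb_blk = demo_get,
};
```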
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 85 +++++++++++++++++++++++++++++------------------------
1 files changed, 46 insertions(+), 39 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 16879bd..9fe49f2 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -65,12 +65,13 @@
struct ocfs2_extent_tree;
struct ocfs2_extent_tree_operations {
- void (*set_last_eb_blk) (struct ocfs2_extent_tree *et, u64 blkno);
- u64 (*get_last_eb_blk) (struct ocfs2_extent_tree *et);
- void (*update_clusters) (struct inode *inode,
- struct ocfs2_extent_tree *et,
- u32 new_clusters);
- int (*sanity_check) (struct inode *inode, struct ocfs2_extent_tree *et);
+ void (*eo_set_last_eb_blk)(struct ocfs2_extent_tree *et,
+ u64 blkno);
+ u64 (*eo_get_last_eb_blk)(struct ocfs2_extent_tree *et);
+ void (*eo_update_clusters)(struct inode *inode,
+ struct ocfs2_extent_tree *et,
+ u32 new_clusters);
+ int (*eo_sanity_check)(struct inode *inode, struct ocfs2_extent_tree *et);
};
struct ocfs2_extent_tree {
@@ -132,10 +133,10 @@ static int ocfs2_dinode_sanity_check(struct inode *inode,
}
static struct ocfs2_extent_tree_operations ocfs2_dinode_et_ops = {
- .set_last_eb_blk = ocfs2_dinode_set_last_eb_blk,
- .get_last_eb_blk = ocfs2_dinode_get_last_eb_blk,
- .update_clusters = ocfs2_dinode_update_clusters,
- .sanity_check = ocfs2_dinode_sanity_check,
+ .eo_set_last_eb_blk = ocfs2_dinode_set_last_eb_blk,
+ .eo_get_last_eb_blk = ocfs2_dinode_get_last_eb_blk,
+ .eo_update_clusters = ocfs2_dinode_update_clusters,
+ .eo_sanity_check = ocfs2_dinode_sanity_check,
};
static void ...
From: Tao Ma <tao.ma@oracle.com>
In xattr buckets, we want to limit the maximum size of a btree leaf;
otherwise we'll lose the benefits of hashing because we'll have to search
large leaves.
So add a new field in ocfs2_extent_tree which indicates the maximum leaf cluster
size we want so that we can prevent ocfs2_insert_extent() from merging the leaf
record even if it is contiguous with an adjacent record.
Other btree types are not affected by this change.
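Here is a sketch of the merge test with the new cap. It assumes a simplified record (logical start, physical start, length) in place of struct ocfs2_extent_rec. can_merge_right() is hypothetical. A max_leaf_clusters value of 0 stands for "no limit", which is how the other tree types stay unaffected.

```c
#include <stdint.h>

/* Simplified extent record: logical start, physical start, length. */
struct ext_rec {
	uint32_t cpos;      /* first logical cluster */
	uint64_t blkno;     /* first physical block */
	uint32_t clusters;  /* length in clusters */
};

/* Nonzero if 'ins' may be merged onto the right edge of 'rec'.
 * max_leaf_clusters == 0 means unlimited. */
static int can_merge_right(const struct ext_rec *rec, const struct ext_rec *ins,
                           uint32_t blocks_per_cluster,
                           unsigned int max_leaf_clusters)
{
	int contiguous = rec->cpos + rec->clusters == ins->cpos &&
	                 rec->blkno + (uint64_t)rec->clusters * blocks_per_cluster
	                     == ins->blkno;

	if (!contiguous)
		return 0;
	/* Refuse a merge that would grow the record past the leaf cap. */
	if (max_leaf_clusters &&
	    rec->clusters + ins->clusters > max_leaf_clusters)
		return 0;
	return 1;
}
```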
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 39 ++++++++++++++++++++++++++++++---------
fs/ocfs2/alloc.h | 5 +++++
2 files changed, 35 insertions(+), 9 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 47cdea6..16879bd 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -79,6 +79,7 @@ struct ocfs2_extent_tree {
struct buffer_head *root_bh;
struct ocfs2_extent_list *root_el;
void *private;
+ unsigned int max_leaf_clusters;
};
static void ocfs2_dinode_set_last_eb_blk(struct ocfs2_extent_tree *et,
@@ -220,7 +221,8 @@ static struct ocfs2_extent_tree_operations ocfs2_xattr_tree_et_ops = {
};
static struct ocfs2_extent_tree*
- ocfs2_new_extent_tree(struct buffer_head *bh,
+ ocfs2_new_extent_tree(struct inode *inode,
+ struct buffer_head *bh,
enum ocfs2_extent_tree_type et_type,
void *private)
{
@@ -248,6 +250,8 @@ static struct ocfs2_extent_tree*
(struct ocfs2_xattr_block *)bh->b_data;
et->root_el = &xb->xb_attrs.xb_root.xt_list;
et->eops = &ocfs2_xattr_tree_et_ops;
+ et->max_leaf_clusters = ocfs2_clusters_for_bytes(inode->i_sb,
+ OCFS2_MAX_XATTR_TREE_LEAF_SIZE);
}
return et;
@@ -4118,7 +4122,8 @@ out:
static void ocfs2_figure_contig_type(struct inode *inode,
struct ocfs2_insert_type *insert,
struct ocfs2_extent_list *el,
- struct ocfs2_extent_rec *insert_rec)
+ struct ocfs2_extent_rec *insert_rec,
+ ...
From: Tao Ma <tao.ma@oracle.com>
Ocfs2 breaks up xattr index tree leaves into 4k regions, called buckets.
Attributes are stored within a given bucket, depending on hash value.
After a discussion with Mark, we decided that the per-bucket index
(xe_entry[]) would only exist in the 1st block of a bucket. Likewise,
name/value pairs will not straddle more than one block. This allows the
majority of operations to work directly on the buffer heads in a leaf block.
This patch adds code to iterate the buckets in an EA. A new abstraction,
ocfs2_xattr_bucket, is added. It records the bhs in this bucket and the
ocfs2_xattr_header. This keeps the code neat and improves readability.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/ocfs2_fs.h | 35 +++++++-
fs/ocfs2/xattr.c | 255 ++++++++++++++++++++++++++++++++++++++++++++++++++-
fs/ocfs2/xattr.h | 9 ++
3 files changed, 293 insertions(+), 6 deletions(-)
diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index 98e1f8b..8d5e72f 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -755,8 +755,13 @@ struct ocfs2_xattr_header {
__le16 xh_count; /* contains the count of how
many records are in the
local xattr storage. */
- __le16 xh_reserved1;
- __le32 xh_reserved2;
+ __le16 xh_free_start; /* current offset for storing
+ xattr. */
+ __le16 xh_name_value_len; /* total length of name/value
+ length in this bucket. */
+ __le16 xh_num_buckets; /* bucket nums in one extent
+ record, only valid in the
+ first bucket. */
__le64 xh_csum;
struct ocfs2_xattr_entry xh_entries[0]; /* xattr entry list. */
};
@@ -793,6 +798,10 @@ struct ocfs2_xattr_tree_root {
#define OCFS2_XATTR_SIZE(size) (((size) + OCFS2_XATTR_ROUND) & \
~(OCFS2_XATTR_ROUND))
+#define OCFS2_XATTR_BUCKET_SIZE 4096
+#define OCFS2_XATTR_MAX_BLOCKS_PER_BUCKET ...
From: Tiger Yang <tiger.yang@oracle.com>
This patch implements storing extended attributes either in the inode or in
a single external block. We only store EAs in-inode when blocksize > 512 and
the inode block has free space for them. When an EA's value is larger than 80
bytes, we store the value via a b-tree outside the inode or block.
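The placement rules in this description can be sketched as two small predicates. The sketch assumes that both conditions (a block size above 512 bytes and enough free space in the inode) must hold for in-inode storage. It also assumes that 80 bytes is the value-size cutoff. The constant and function names are illustrative, not ocfs2's.

```c
#include <stddef.h>

#define MIN_BLOCKSIZE     512  /* assumption: mirrors OCFS2_MIN_BLOCKSIZE */
#define XATTR_INLINE_SIZE 80   /* value-size cutoff quoted above */

enum ea_location { EA_IN_INODE, EA_IN_BLOCK };

/* Where the EA entry goes: the inode if allowed and it fits, else the block. */
static enum ea_location ea_entry_location(unsigned int blocksize,
                                          size_t inode_free_space,
                                          size_t needed)
{
	if (blocksize > MIN_BLOCKSIZE && inode_free_space >= needed)
		return EA_IN_INODE;
	return EA_IN_BLOCK;
}

/* Nonzero if the value is pushed out to its own b-tree. */
static int ea_value_outside(size_t value_len)
{
	return value_len > XATTR_INLINE_SIZE;
}
```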
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/Makefile | 2 +
fs/ocfs2/file.c | 5 +
fs/ocfs2/inode.c | 8 +
fs/ocfs2/inode.h | 3 +
fs/ocfs2/journal.h | 10 +
fs/ocfs2/namei.c | 5 +
fs/ocfs2/ocfs2.h | 2 +
fs/ocfs2/ocfs2_fs.h | 8 +-
fs/ocfs2/suballoc.c | 17 +-
fs/ocfs2/suballoc.h | 3 +
fs/ocfs2/super.c | 14 +
fs/ocfs2/symlink.c | 9 +
fs/ocfs2/xattr.c | 1620 ++++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/xattr.h | 51 ++
fs/ocfs2/xattr_trusted.c | 82 +++
fs/ocfs2/xattr_user.c | 94 +++
16 files changed, 1927 insertions(+), 6 deletions(-)
create mode 100644 fs/ocfs2/xattr.h
create mode 100644 fs/ocfs2/xattr_trusted.c
create mode 100644 fs/ocfs2/xattr_user.c
diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile
index af63980..21323da 100644
--- a/fs/ocfs2/Makefile
+++ b/fs/ocfs2/Makefile
@@ -36,6 +36,8 @@ ocfs2-objs := \
uptodate.o \
ver.o \
xattr.o \
+ xattr_user.o \
+ xattr_trusted.o
ocfs2_stackglue-objs := stackglue.o
ocfs2_stack_o2cb-objs := stack_o2cb.o
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 4dc5edf..7ddb363 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -55,6 +55,7 @@
#include "mmap.h"
#include "suballoc.h"
#include "super.h"
+#include "xattr.h"
#include "buffer_head_io.h"
@@ -2070,6 +2071,10 @@ const struct inode_operations ocfs2_file_iops = {
.setattr = ocfs2_setattr,
.getattr = ocfs2_getattr,
.permission = ocfs2_permission,
+ .setxattr = ...
Is there a documentation update for these?
--
There is now :) I'm actually usually a bit of a stickler for those too, but obviously I missed this one.
--Mark
--
Mark Fasheh
From: Mark Fasheh <mfasheh@suse.com>
ocfs2: Documentation update for user_xattr / nouser_xattr mount options
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
Documentation/filesystems/ocfs2.txt | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt
index 6acf1b4..4340cc8 100644
--- a/Documentation/filesystems/ocfs2.txt
+++ b/Documentation/filesystems/ocfs2.txt
@@ -80,3 +80,5 @@ inode64 Indicates that Ocfs2 is allowed to create inodes at
any location in the filesystem, including those which will result in inode numbers occupying more than 32 bits of significance.
+user_xattr (*) Enables Extended User Attributes.
+nouser_xattr Disables Extended User Attributes.
--
1.5.4.1
--
Please don't split this up, it's always been a really stupid idea in extN. The only difference between secure, trusted and user attrs is that they go into a different namespace bit (and have different permission checking, but that's handled in the VFS). I have some upcoming patches to store a fs private flag in struct xattr_handler so that even those flag wrappers can go away, and each of the namespaces will just be five lines of code for the xattr_handler You seem to need the handler mostly for getting back to the prefix from the handler. This is a pretty clear indicator that you don't want to use the xattr_handler splitting but deal with the whole attr name. Take a look at the btrfs code after my recent xattr changes And I think there's far too much inlining going on in here.. --
Ok. The following patch (in ocfs2.git now) removes those two files, and puts the code for user and trusted xattrs at the bottom of xattr.c. Is that
Yep, I went ahead and un-inlined that function.
Thanks for the review,
--Mark
--
Mark Fasheh
From: Mark Fasheh <mfasheh@suse.com>
ocfs2: Move trusted and user attribute support into xattr.c
Per Christoph Hellwig's suggestion - don't split these up. It's not like we gained much by having the two tiny files around.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/Makefile | 4 +-
fs/ocfs2/xattr.c | 110 ++++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/xattr_trusted.c | 82 ----------------------------------
fs/ocfs2/xattr_user.c | 94 ---------------------------------------
4 files changed, 111 insertions(+), 179 deletions(-)
delete mode 100644 fs/ocfs2/xattr_trusted.c
delete mode 100644 fs/ocfs2/xattr_user.c
diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile
index 21323da..589dcdf 100644
--- a/fs/ocfs2/Makefile
+++ b/fs/ocfs2/Makefile
@@ -35,9 +35,7 @@ ocfs2-objs := \
sysfile.o \
uptodate.o \
ver.o \
- xattr.o \
- xattr_user.o \
- xattr_trusted.o
+ xattr.o
ocfs2_stackglue-objs := stackglue.o
ocfs2_stack_o2cb-objs := stack_o2cb.o
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index e21a1a8..0f556b0 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -37,6 +37,9 @@
#include <linux/writeback.h>
#include <linux/falloc.h>
#include <linux/sort.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/string.h>
#define MLOG_MASK_PREFIX ML_XATTR
#include <cluster/masklog.h>
@@ -4740,3 +4743,110 @@ static int ocfs2_delete_xattr_index_block(struct inode *inode,
out:
return ret;
}
+
+/*
+ * 'trusted' attributes support
+ */
+
+#define XATTR_TRUSTED_PREFIX "trusted."
+
+static size_t ocfs2_xattr_trusted_list(struct inode *inode, char *list,
+ size_t list_size, const char *name,
+ ...
I have looked at the patch for btrfs about this. We are different. Btrfs stores the whole xattr name including the prefix ("user.", "trusted."); we store an index number instead.
regards,
tiger
--
In which case you shouldn't need to look the handler up anyway. I'll re-review the code once you post the next version. --
I looked at the git tree and there are two users of
ocfs2_xattr_handler().
(1) for using the ->list handler in listattr. That's something I fixed
in btrfs that I wanted to point you to. The whole concept of a
->list handler is stupid, and it was only added as a hack for
the tmpfs "generic" xattr support which is a mess. Instead of
looking up a handler that would only do the same thing anyway
for all on-disk attributes just call the code directly and
have a map from index to prefix (look at
fs/xfs/linux-2.6/xfs_xattr.c for an example). You
also have a check for OCFS2_MOUNT_NOUSERXATTR for the user
attributes, but that's much easier done by just checking the
index in an if (and I'd personally just kill it completely, the
option doesn't seem useful - but that's an unrelated bit)
(2) For generating the hash. I don't quite understand why you want to
also hash the prefix if it's not stored on disk anyway but sorted
into the numeric buckets.
--
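Christoph suggests a static map from on-disk index to prefix instead of a ->list handler. That approach might look like the sketch below. The index values are an assumption for illustration; the real enum lives in the ocfs2 headers. ACL names are left out. list_one() is a hypothetical helper that also folds in the user_xattr mount check as a plain index test.

```c
#include <stddef.h>
#include <string.h>

/* Assumed namespace indices, for illustration only. */
enum xattr_index { XI_USER = 1, XI_ACL_ACCESS, XI_ACL_DEFAULT, XI_TRUSTED,
                   XI_SECURITY, XI_MAX };

static const char *const xattr_prefix[XI_MAX] = {
	[XI_USER]     = "user.",
	[XI_TRUSTED]  = "trusted.",
	[XI_SECURITY] = "security.",
	/* ACL names ("system.posix_acl_*") are omitted in this sketch. */
};

/* Emit "prefix" + "name" + NUL for listxattr and return the bytes needed.
 * Nothing is written when list is NULL or too small; the caller compares
 * the return value with size. Unknown indices, and user attributes when
 * they are disabled, contribute 0 bytes. */
static size_t list_one(char *list, size_t size, int index, int user_xattr_on,
                       const char *name, size_t name_len)
{
	const char *prefix;
	size_t prefix_len, total;

	if (index <= 0 || index >= XI_MAX || !xattr_prefix[index])
		return 0;
	if (index == XI_USER && !user_xattr_on)
		return 0;  /* a plain index check instead of a per-handler ->list */

	prefix = xattr_prefix[index];
	prefix_len = strlen(prefix);
	total = prefix_len + name_len + 1;
	if (list && total <= size) {
		memcpy(list, prefix, prefix_len);
		memcpy(list + prefix_len, name, name_len);
		list[total - 1] = '\0';
	}
	return total;
}
```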
Yes, you are right. The handler for list is borrowed from ext3 and somewhat ugly. We just need the prefix name but use such a complicated
This is done intentionally. See the design doc http://oss.oracle.com/osswiki/OCFS2/DesignDocs/ExtendedAttributes.
"Each entry has a 32-bit hash value associated with it. The hash value is calculated using the full (prefix.suffix) name of the xattr to avoid hash collisions when the same suffix is used in multiple attribute namespaces."
So Mark, do you think we need this prefix hash? Anyway, if we reach consensus that the hash calculation doesn't need the prefix any more, we can remove the ocfs2_xattr_handler safely.
Regards,
Tao
--
Removing the prefix hash should be fine. Technically, this changes the disk format, but nobody should be using this for production yet anyway. --Mark -- Mark Fasheh --
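Once the prefix is dropped, the hash covers only the suffix. A lookup then tells "user.foo" from "trusted.foo" by comparing the namespace index stored in each entry along with the name. Below is a sketch of a rotate-and-xor name hash seeded with a per-filesystem value. The 5-bit shift and the function name are assumptions, not quoted from ocfs2.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_SHIFT 5  /* assumed rotate amount */

/* Rotate-and-xor hash of the attribute suffix only. The seed stands in
 * for a per-filesystem value such as a hash of the volume UUID. */
static uint32_t xattr_name_hash(uint32_t seed, const char *name, size_t len)
{
	uint32_t hash = seed;
	size_t i;

	for (i = 0; i < len; i++)
		hash = (hash << HASH_SHIFT) ^
		       (hash >> (32 - HASH_SHIFT)) ^
		       (unsigned char)name[i];
	return hash;
}
```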
A per-mount debugfs file, "local_alloc", is created which, when read, will
expose the live state of the node's local alloc file. Performance impact is
minimal, only a bit of memory overhead per mount point. Still, the code is
hidden behind CONFIG_OCFS2_FS_STATS. This feature will help us debug
local alloc performance problems on a live system.
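The read handler in this patch writes a single tab-separated line. Its fields are the format version, la_last_gd, the default and current local alloc bits, and the state. A user-space parser for that line might look like this sketch. struct la_debug_info and parse_la_debug() are hypothetical, and the debugfs path of the file is not given here.

```c
#include <stdio.h>

/* Fields of one local_alloc line, in the order the patch's snprintf()
 * writes them (LA_DEBUG_VER 1). */
struct la_debug_info {
	unsigned int version;
	unsigned long long last_gd;  /* osb->la_last_gd */
	unsigned int default_bits;   /* osb->local_alloc_default_bits */
	unsigned int bits;           /* osb->local_alloc_bits */
	unsigned int state;          /* osb->local_alloc_state */
};

/* Returns 0 on success, -1 if the line is not a version-1 record. */
static int parse_la_debug(const char *line, struct la_debug_info *out)
{
	int n = sscanf(line, "0x%x\t0x%llx\t%u\t%u\t0x%x",
	               &out->version, &out->last_gd, &out->default_bits,
	               &out->bits, &out->state);

	if (n != 5 || out->version != 1)
		return -1;
	return 0;
}
```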
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/localalloc.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/ocfs2.h | 5 +++
2 files changed, 92 insertions(+), 0 deletions(-)
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index f71658a..b889f10 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -28,6 +28,7 @@
#include <linux/slab.h>
#include <linux/highmem.h>
#include <linux/bitops.h>
+#include <linux/debugfs.h>
#define MLOG_MASK_PREFIX ML_DISK_ALLOC
#include <cluster/masklog.h>
@@ -73,6 +74,85 @@ static int ocfs2_local_alloc_new_window(struct ocfs2_super *osb,
static int ocfs2_local_alloc_slide_window(struct ocfs2_super *osb,
struct inode *local_alloc_inode);
+#ifdef CONFIG_OCFS2_FS_STATS
+
+DEFINE_MUTEX(la_debug_mutex);
+
+static int ocfs2_la_debug_open(struct inode *inode, struct file *file)
+{
+ file->private_data = inode->i_private;
+ return 0;
+}
+
+#define LA_DEBUG_BUF_SZ PAGE_CACHE_SIZE
+#define LA_DEBUG_VER 1
+static ssize_t ocfs2_la_debug_read(struct file *file, char __user *userbuf,
+ size_t count, loff_t *ppos)
+{
+ struct ocfs2_super *osb = file->private_data;
+ int written, ret;
+ char *buf = osb->local_alloc_debug_buf;
+
+ mutex_lock(&la_debug_mutex);
+ memset(buf, 0, LA_DEBUG_BUF_SZ);
+
+ written = snprintf(buf, LA_DEBUG_BUF_SZ,
+ "0x%x\t0x%llx\t%u\t%u\t0x%x\n",
+ LA_DEBUG_VER,
+ (unsigned long long)osb->la_last_gd,
+ osb->local_alloc_default_bits,
+ osb->local_alloc_bits, osb->local_alloc_state);
+
+ ret = simple_read_from_buffer(userbuf, count, ppos, buf, ...
Thanks, fixed in the 'merge_window' branch of ocfs2.git.
--Mark
--
Mark Fasheh
From: Mark Fasheh <mfasheh@suse.com>
ocfs2: make la_debug_mutex static
It can also be moved into ocfs2_la_debug_read().
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/localalloc.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 02227c3..b1c634d 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -76,8 +76,6 @@ static int ocfs2_local_alloc_slide_window(struct ocfs2_super *osb,
#ifdef CONFIG_OCFS2_FS_STATS
-DEFINE_MUTEX(la_debug_mutex);
-
static int ocfs2_la_debug_open(struct inode *inode, struct file *file)
{
file->private_data = inode->i_private;
@@ -89,6 +87,7 @@ static int ocfs2_la_debug_open(struct inode *inode, struct file *file)
static ssize_t ocfs2_la_debug_read(struct file *file, char __user *userbuf,
size_t count, loff_t *ppos)
{
+ static DEFINE_MUTEX(la_debug_mutex);
struct ocfs2_super *osb = file->private_data;
int written, ret;
char *buf = osb->local_alloc_debug_buf;
--
1.5.4.1
--
Ocfs2's local allocator disables itself for the duration of a mount point
when it has trouble allocating a large enough area from the primary bitmap.
That can cause performance problems, especially for disks which were only
temporarily full or fragmented. This patch allows for the allocator to
shrink its window first, before being disabled. Later, it can also be
re-enabled so that any performance drop is minimized.
To do this, we allow the value of osb->local_alloc_bits to be shrunk when
needed. The default value is recorded in a mostly read-only variable so that
we can re-initialize when required.
Locking had to be updated so that we could protect changes to
local_alloc_bits. Mostly this involves protecting various local alloc values
with the osb spinlock. A new state is also added, OCFS2_LA_THROTTLED, which
is used when the local allocator has shrunk, but is not disabled. If the
available space dips below 1 megabyte, the local alloc file is disabled. In
either case, local alloc is re-enabled 30 seconds after the event, or when
an appropriate number of bits is seen in the primary bitmap.
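Here is a simplified model of the throttling described above. The state names follow the patch. The numeric values, the choice to halve the window on each failure, and the helper names are assumptions; the patch only says that the window is shrunk and that falling below the 1 MB floor disables it. The 30-second timer is left to the caller.

```c
/* States named in the patch; the numeric values are assumptions. */
enum la_state { LA_UNUSED, LA_ENABLED, LA_THROTTLED, LA_DISABLED };

struct la_model {
	enum la_state state;
	unsigned int default_bits;  /* read-mostly copy, like local_alloc_default_bits */
	unsigned int bits;          /* current window, like local_alloc_bits */
};

/* The 1 MB floor expressed in clusters. */
static unsigned int la_min_bits(unsigned int cluster_bytes)
{
	return (1024u * 1024u + cluster_bytes - 1) / cluster_bytes;
}

/* The primary bitmap could not supply a window of la->bits clusters.
 * Assumption: shrink by halving; disable once below the floor. */
static void la_on_alloc_failure(struct la_model *la, unsigned int min_bits)
{
	unsigned int next = la->bits / 2;

	if (next < min_bits) {
		la->bits = 0;
		la->state = LA_DISABLED;
	} else {
		la->bits = next;
		la->state = LA_THROTTLED;
	}
}

/* Called after the timeout, or once the bitmap has room again. */
static void la_reenable(struct la_model *la)
{
	la->bits = la->default_bits;
	la->state = LA_ENABLED;
}
```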
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/localalloc.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++---
fs/ocfs2/localalloc.h | 4 +
fs/ocfs2/ocfs2.h | 23 +++++-
fs/ocfs2/suballoc.c | 31 ++++----
fs/ocfs2/suballoc.h | 1 +
fs/ocfs2/super.c | 4 +-
6 files changed, 230 insertions(+), 31 deletions(-)
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index b05ce66..f71658a 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -73,16 +73,51 @@ static int ocfs2_local_alloc_new_window(struct ocfs2_super *osb,
static int ocfs2_local_alloc_slide_window(struct ocfs2_super *osb,
struct inode *local_alloc_inode);
+static inline int ocfs2_la_state_enabled(struct ocfs2_super *osb)
+{
+ return (osb->local_alloc_state == OCFS2_LA_THROTTLED ||
+ osb->local_alloc_state == OCFS2_LA_ENABLED);
+}
+
+void ...
cancel_delayed_work() is a pretty risky function. The work handler (ocfs2_la_enable_worker) can execute an arbitrarily long time after cancel_delayed_work() has returned. Can all the code here cope with such a surprise alteration of ->local_alloc_state()?
And you cannot use cancel_delayed_work_sync() here due to a deadlock on ->osb_lock().
--
From: Joel Becker <joel.becker@oracle.com>
Provide an optional extent_tree_operation to specify the
max_leaf_clusters of an ocfs2_extent_tree. If not provided, the value
is 0 (unlimited).
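This sketch shows how the optional operation and the 0 default fit together. The signature is simplified: it takes a cluster-size shift instead of struct inode. The 64 KiB leaf size used for OCFS2_MAX_XATTR_TREE_LEAF_SIZE is an assumed value. clusters_for_bytes() rounds up to whole clusters, as its ocfs2 namesake is expected to.

```c
#include <stdint.h>

struct extent_tree;

struct extent_tree_operations {
	/* Optional: NULL means the tree has no leaf size limit. */
	void (*eo_fill_max_leaf_clusters)(unsigned int clustersize_bits,
	                                  struct extent_tree *et);
};

struct extent_tree {
	const struct extent_tree_operations *et_ops;
	unsigned int et_max_leaf_clusters;  /* 0 == unlimited */
};

#define MAX_XATTR_TREE_LEAF_SIZE 65536u  /* assumed value for the sketch */

/* Round a byte count up to whole clusters of (1 << bits) bytes. */
static uint32_t clusters_for_bytes(unsigned int clustersize_bits, uint64_t bytes)
{
	return (uint32_t)((bytes + (1ULL << clustersize_bits) - 1) >> clustersize_bits);
}

static void xattr_tree_fill_max_leaf_clusters(unsigned int bits,
                                              struct extent_tree *et)
{
	et->et_max_leaf_clusters = clusters_for_bytes(bits, MAX_XATTR_TREE_LEAF_SIZE);
}

/* Default to 0, then let the tree type override it if it wants a cap. */
static void init_extent_tree(struct extent_tree *et,
                             const struct extent_tree_operations *ops,
                             unsigned int clustersize_bits)
{
	et->et_ops = ops;
	et->et_max_leaf_clusters = 0;
	if (ops->eo_fill_max_leaf_clusters)
		ops->eo_fill_max_leaf_clusters(clustersize_bits, et);
}
```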
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 18 +++++++++++++++---
1 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 0b900f6..7c0721d 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -76,6 +76,8 @@ struct ocfs2_extent_tree_operations {
/* These are internal to ocfs2_extent_tree and don't have
* accessor functions */
void (*eo_fill_root_el)(struct ocfs2_extent_tree *et);
+ void (*eo_fill_max_leaf_clusters)(struct inode *inode,
+ struct ocfs2_extent_tree *et);
};
struct ocfs2_extent_tree {
@@ -205,6 +207,14 @@ static void ocfs2_xattr_tree_fill_root_el(struct ocfs2_extent_tree *et)
et->et_root_el = &xb->xb_attrs.xb_root.xt_list;
}
+static void ocfs2_xattr_tree_fill_max_leaf_clusters(struct inode *inode,
+ struct ocfs2_extent_tree *et)
+{
+ et->et_max_leaf_clusters =
+ ocfs2_clusters_for_bytes(inode->i_sb,
+ OCFS2_MAX_XATTR_TREE_LEAF_SIZE);
+}
+
static void ocfs2_xattr_tree_set_last_eb_blk(struct ocfs2_extent_tree *et,
u64 blkno)
{
@@ -243,6 +253,7 @@ static struct ocfs2_extent_tree_operations ocfs2_xattr_tree_et_ops = {
.eo_update_clusters = ocfs2_xattr_tree_update_clusters,
.eo_sanity_check = ocfs2_xattr_tree_sanity_check,
.eo_fill_root_el = ocfs2_xattr_tree_fill_root_el,
+ .eo_fill_max_leaf_clusters = ocfs2_xattr_tree_fill_max_leaf_clusters,
};
static void ocfs2_get_extent_tree(struct ocfs2_extent_tree *et,
@@ -254,7 +265,6 @@ static void ocfs2_get_extent_tree(struct ocfs2_extent_tree *et,
et->et_type = et_type;
get_bh(bh);
et->et_root_bh = bh;
- et->et_max_leaf_clusters = 0;
if (!obj)
obj = (void *)bh->b_data;
et->et_object = obj;
@@ -265,11 +275,13 @@ ...
From: Joel Becker <joel.becker@oracle.com>
A couple of places check an extent_tree for a valid inode. We move that
out to add an eo_insert_check() operation. It can be called from
ocfs2_insert_extent() and elsewhere.
We also have the wrapper calls ocfs2_et_insert_check() and
ocfs2_et_sanity_check() ignore NULL ops. That way we don't have to
provide useless operations for xattr types.
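Here is a sketch of NULL-tolerant wrappers in the style of ocfs2_et_insert_check() and ocfs2_et_sanity_check(). The types are simplified. dense_insert_check() returns -1 where the dinode version in this patch would BUG; it only illustrates the non-sparse rule.

```c
#include <stddef.h>

struct extent_tree;
struct ext_rec { unsigned int cpos, clusters; };

struct extent_tree_operations {
	/* Both optional: xattr tree types may leave them NULL. */
	int (*eo_insert_check)(struct extent_tree *et, struct ext_rec *rec);
	int (*eo_sanity_check)(struct extent_tree *et);
};

struct extent_tree {
	const struct extent_tree_operations *et_ops;
	unsigned int clusters;  /* hypothetical: clusters already in the tree */
};

/* A missing operation means "nothing to check". */
static int et_insert_check(struct extent_tree *et, struct ext_rec *rec)
{
	if (et->et_ops->eo_insert_check)
		return et->et_ops->eo_insert_check(et, rec);
	return 0;
}

static int et_sanity_check(struct extent_tree *et)
{
	if (et->et_ops->eo_sanity_check)
		return et->et_ops->eo_sanity_check(et);
	return 0;
}

/* Toy dinode-style rule: without sparse allocation, a new record must
 * start exactly where the tree currently ends. */
static int dense_insert_check(struct extent_tree *et, struct ext_rec *rec)
{
	return rec->cpos == et->clusters ? 0 : -1;
}
```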
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 69 ++++++++++++++++++++++++++++++++++-------------------
1 files changed, 44 insertions(+), 25 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 243bacf..2083c2c 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -71,6 +71,9 @@ struct ocfs2_extent_tree_operations {
void (*eo_update_clusters)(struct inode *inode,
struct ocfs2_extent_tree *et,
u32 new_clusters);
+ int (*eo_insert_check)(struct inode *inode,
+ struct ocfs2_extent_tree *et,
+ struct ocfs2_extent_rec *rec);
int (*eo_sanity_check)(struct inode *inode, struct ocfs2_extent_tree *et);
/* These are internal to ocfs2_extent_tree and don't have
@@ -125,6 +128,25 @@ static void ocfs2_dinode_update_clusters(struct inode *inode,
spin_unlock(&OCFS2_I(inode)->ip_lock);
}
+static int ocfs2_dinode_insert_check(struct inode *inode,
+ struct ocfs2_extent_tree *et,
+ struct ocfs2_extent_rec *rec)
+{
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+
+ BUG_ON(OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL);
+ mlog_bug_on_msg(!ocfs2_sparse_alloc(osb) &&
+ (OCFS2_I(inode)->ip_clusters != rec->e_cpos),
+ "Device %s, asking for sparse allocation: inode %llu, "
+ "cpos %u, clusters %u\n",
+ osb->dev_str,
+ (unsigned long long)OCFS2_I(inode)->ip_blkno,
+ rec->e_cpos,
+ OCFS2_I(inode)->ip_clusters);
+
+ return 0;
+}
+
static int ocfs2_dinode_sanity_check(struct inode *inode,
struct ...
From: Joel Becker <joel.becker@oracle.com>
We now have three different kinds of extent trees in ocfs2: inode data (dinode), extended attributes (xattr_tree), and extended attribute values (xattr_value). There is a nice abstraction for them, ocfs2_extent_tree, but it is hidden in alloc.c. All the calling functions have to pick amongst a varied API and pass in type bits and often extraneous pointers.
A better way is to make ocfs2_extent_tree a first-class object. Everyone converts their object to an ocfs2_extent_tree() via the ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all tree calls to alloc.c.
This simplifies a lot of callers, making for readability. It also provides an easy way to add additional extent tree types, as they only need to be defined in alloc.c with an ocfs2_get_<new>_extent_tree() function.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 300 +++++++++++++++-----------------------------------
fs/ocfs2/alloc.h | 111 +++++++++++---------
fs/ocfs2/aops.c | 16 ++-
fs/ocfs2/dir.c | 20 ++--
fs/ocfs2/file.c | 36 ++++---
fs/ocfs2/suballoc.c | 12 +--
fs/ocfs2/suballoc.h | 6 +-
fs/ocfs2/xattr.c | 71 +++++++------
8 files changed, 240 insertions(+), 332 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 2083c2c..d196d40 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -49,20 +49,6 @@
#include "buffer_head_io.h"
-/*
- * ocfs2_extent_tree and ocfs2_extent_tree_operations are used to abstract
- * the b-tree operations in ocfs2. Now all the b-tree operations are not
- * limited to ocfs2_dinode only. Any data which need to allocate clusters
- * to store can use b-tree. And it only needs to implement its ocfs2_extent_tree
- * and operation.
- *
- * ocfs2_extent_tree contains info for the root of the b-tree, it must have a
- * root ocfs2_extent_list and a root_bh so that they can be used ...
From: Joel Becker <joel.becker@oracle.com>
struct ocfs2_extent_tree_operations provides methods for the different
on-disk btrees in ocfs2. Describing what those methods do is probably a
good idea.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
1 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index d196d40..51c3183 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -50,21 +50,62 @@
#include "buffer_head_io.h"
+/*
+ * Operations for a specific extent tree type.
+ *
+ * To implement an on-disk btree (extent tree) type in ocfs2, add
+ * an ocfs2_extent_tree_operations structure and the matching
+ * ocfs2_get_<thingy>_extent_tree() function. That's pretty much it
+ * for the allocation portion of the extent tree.
+ */
struct ocfs2_extent_tree_operations {
+ /*
+ * last_eb_blk is the block number of the right most leaf extent
+ * block. Most on-disk structures containing an extent tree store
+ * this value for fast access. The ->eo_set_last_eb_blk() and
+ * ->eo_get_last_eb_blk() operations access this value. They are
+ * both required.
+ */
void (*eo_set_last_eb_blk)(struct ocfs2_extent_tree *et,
u64 blkno);
u64 (*eo_get_last_eb_blk)(struct ocfs2_extent_tree *et);
+
+ /*
+ * The on-disk structure usually keeps track of how many total
+ * clusters are stored in this extent tree. This function updates
+ * that value. new_clusters is the delta, and must be
+ * added to the total. Required.
+ */
void (*eo_update_clusters)(struct inode *inode,
struct ocfs2_extent_tree *et,
u32 new_clusters);
+
+ /*
+ * If ->eo_insert_check() exists, it is called before rec is
+ * inserted into the extent tree. It is optional.
+ */
int (*eo_insert_check)(struct inode *inode,
struct ocfs2_extent_tree *et,
struct ...
From: Joel Becker <joel.becker@oracle.com>
A caller knows what kind of extent tree they have. There's no reason
they have to call ocfs2_get_extent_tree() with a NULL when they could
just as easily call a specific function to their type of extent tree.
Introduce ocfs2_dinode_get_extent_tree(),
ocfs2_xattr_tree_get_extent_tree(), and
ocfs2_xattr_value_get_extent_tree(). They only take the necessary
arguments, calling into the underlying __ocfs2_get_extent_tree() to do
the real work.
__ocfs2_get_extent_tree() is the old ocfs2_get_extent_tree(), but
without needing any switch-by-type logic.
ocfs2_get_extent_tree() is now a wrapper around the specific calls. It
exists because a couple of alloc.c functions can take et_type. This will
go later.
Another benefit is that ocfs2_xattr_value_get_extent_tree() can take a
struct ocfs2_xattr_value_root* instead of void*. This gives us
typechecking where we didn't have it before.
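The sketch below shows typed constructors built over one generic worker. It is combined with the eo_fill_root_el() operation from the same series, so the root list is derived from the object. Everything here is a simplified stand-in: get_extent_tree_common() plays the role of __ocfs2_get_extent_tree(), and the struct layouts keep only the fields the example needs.

```c
#include <stdint.h>

/* Simplified on-disk objects; field names echo the patch. */
struct extent_list { uint16_t l_count, l_next_free_rec; };
struct dinode { uint64_t i_last_eb_blk; struct extent_list i_list; };
struct xattr_value_root { uint64_t xr_last_eb_blk; struct extent_list xr_list; };

struct extent_tree;
struct extent_tree_operations {
	void (*eo_fill_root_el)(struct extent_tree *et);
};

struct extent_tree {
	const struct extent_tree_operations *et_ops;
	void *et_object;
	struct extent_list *et_root_el;
};

static void dinode_fill_root_el(struct extent_tree *et)
{
	struct dinode *di = et->et_object;
	et->et_root_el = &di->i_list;
}

static void xattr_value_fill_root_el(struct extent_tree *et)
{
	struct xattr_value_root *xv = et->et_object;
	et->et_root_el = &xv->xr_list;
}

static const struct extent_tree_operations dinode_et_ops = { dinode_fill_root_el };
static const struct extent_tree_operations xattr_value_et_ops = { xattr_value_fill_root_el };

/* Generic worker: no switch on a type enum. */
static void get_extent_tree_common(struct extent_tree *et, void *obj,
                                   const struct extent_tree_operations *ops)
{
	et->et_object = obj;
	et->et_ops = ops;
	ops->eo_fill_root_el(et);
}

/* Typed front ends: passing the wrong object type fails to compile. */
static void dinode_get_extent_tree(struct extent_tree *et, struct dinode *di)
{
	get_extent_tree_common(et, di, &dinode_et_ops);
}

static void xattr_value_get_extent_tree(struct extent_tree *et,
                                        struct xattr_value_root *xv)
{
	get_extent_tree_common(et, xv, &xattr_value_et_ops);
}
```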
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 76 +++++++++++++++++++++++++++++++++++++++---------------
fs/ocfs2/alloc.h | 2 +-
2 files changed, 56 insertions(+), 22 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 7c0721d..243bacf 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -192,7 +192,7 @@ static int ocfs2_xattr_value_sanity_check(struct inode *inode,
return 0;
}
-static struct ocfs2_extent_tree_operations ocfs2_xattr_et_ops = {
+static struct ocfs2_extent_tree_operations ocfs2_xattr_value_et_ops = {
.eo_set_last_eb_blk = ocfs2_xattr_value_set_last_eb_blk,
.eo_get_last_eb_blk = ocfs2_xattr_value_get_last_eb_blk,
.eo_update_clusters = ocfs2_xattr_value_update_clusters,
@@ -256,27 +256,21 @@ static struct ocfs2_extent_tree_operations ocfs2_xattr_tree_et_ops = {
.eo_fill_max_leaf_clusters = ocfs2_xattr_tree_fill_max_leaf_clusters,
};
-static void ocfs2_get_extent_tree(struct ocfs2_extent_tree *et,
- struct inode ...
From: Joel Becker <joel.becker@oracle.com>
The root_el of an ocfs2_extent_tree needs to be calculated from
et->et_object. Make it an operation on et->et_ops.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 38 ++++++++++++++++++++++++++++++--------
1 files changed, 30 insertions(+), 8 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 93f44f4..fb6ae67 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -72,6 +72,10 @@ struct ocfs2_extent_tree_operations {
struct ocfs2_extent_tree *et,
u32 new_clusters);
int (*eo_sanity_check)(struct inode *inode, struct ocfs2_extent_tree *et);
+
+ /* These are internal to ocfs2_extent_tree and don't have
+ * accessor functions */
+ void (*eo_fill_root_el)(struct ocfs2_extent_tree *et);
};
struct ocfs2_extent_tree {
@@ -83,6 +87,13 @@ struct ocfs2_extent_tree {
unsigned int et_max_leaf_clusters;
};
+static void ocfs2_dinode_fill_root_el(struct ocfs2_extent_tree *et)
+{
+ struct ocfs2_dinode *di = et->et_object;
+
+ et->et_root_el = &di->id2.i_list;
+}
+
static void ocfs2_dinode_set_last_eb_blk(struct ocfs2_extent_tree *et,
u64 blkno)
{
@@ -136,8 +147,16 @@ static struct ocfs2_extent_tree_operations ocfs2_dinode_et_ops = {
.eo_get_last_eb_blk = ocfs2_dinode_get_last_eb_blk,
.eo_update_clusters = ocfs2_dinode_update_clusters,
.eo_sanity_check = ocfs2_dinode_sanity_check,
+ .eo_fill_root_el = ocfs2_dinode_fill_root_el,
};
+static void ocfs2_xattr_value_fill_root_el(struct ocfs2_extent_tree *et)
+{
+ struct ocfs2_xattr_value_root *xv = et->et_object;
+
+ et->et_root_el = &xv->xr_list;
+}
+
static void ocfs2_xattr_value_set_last_eb_blk(struct ocfs2_extent_tree *et,
u64 blkno)
{
@@ -176,8 +195,16 @@ static struct ocfs2_extent_tree_operations ocfs2_xattr_et_ops = {
.eo_get_last_eb_blk = ocfs2_xattr_value_get_last_eb_blk,
.eo_update_clusters = ...
From: Joel Becker <joel.becker@oracle.com>
ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is limiting our maximum filesystem size.
It's a pretty trivial change. Most functions are just renamed. The only functional change is moving to Jan's inode-based ordered data mode. It's better, too.
Because JBD2 reads and writes JBD journals, this is compatible with any existing filesystem. It can even interact with JBD-based ocfs2 as long as the journal is formatted for JBD.
We provide a compatibility option so that paranoid people can still use JBD for the time being. This will go away shortly.
[ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to ocfs2_truncate_for_delete(). --Mark ]
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/Kconfig | 40 +++++++++++++--------
fs/ocfs2/alloc.c | 28 ++++++---------
fs/ocfs2/aops.c | 21 ++++++++---
fs/ocfs2/file.c | 14 +++++--
fs/ocfs2/inode.c | 5 +++
fs/ocfs2/inode.h | 1 +
fs/ocfs2/journal.c | 72 ++++++++++++++++++++------------------
fs/ocfs2/journal.h | 25 +++++++++++--
fs/ocfs2/ocfs2.h | 7 +++-
fs/ocfs2/ocfs2_jbd_compat.h | 82 +++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/super.c | 10 +++--
fs/ocfs2/uptodate.c | 6 +++-
12 files changed, 227 insertions(+), 84 deletions(-)
create mode 100644 fs/ocfs2/ocfs2_jbd_compat.h
diff --git a/fs/Kconfig b/fs/Kconfig
index abccb5d..e651a36 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -206,17 +206,16 @@ config JBD
tristate
help
This is a generic journalling layer for block devices. It is
- currently used by the ext3 and OCFS2 file systems, but it could
- also be used to add journal support to other file systems or block
+ currently used by the ext3 file system, but it could also be
+ used to add ...
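Here is a quick worked number for the size limit mentioned above. With a 4 KiB block size and 32-bit journal block numbers, the ceiling is 2^32 blocks × 4096 bytes = 2^44 bytes, or 16 TiB. The helper is hypothetical.

```c
#include <stdint.h>

/* Largest device addressable with 'bits'-wide block numbers.
 * Assumes bits < 64 so that the shift is well defined. */
static uint64_t max_bytes_for_block_bits(unsigned int bits, uint32_t blocksize)
{
	return ((uint64_t)1 << bits) * blocksize;
}
```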
From: Joel Becker <joel.becker@oracle.com>
ocfs2_num_free_extents() re-implements the logic of
ocfs2_get_extent_tree(). Now that ocfs2_get_extent_tree() does not
allocate, let's use it in ocfs2_num_free_extents() to simplify the code.
The inode validation code in ocfs2_num_free_extents() is not needed.
All callers are passing in pre-validated inodes.
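The point of this cleanup is that callers stop switching on the tree type; they ask the extent tree for the last extent block, and the tree forwards the request to a per-type callback. A minimal user-space sketch of that dispatch pattern follows. All names here (`sketch_extent_tree`, `sketch_et_get_last_eb_blk`, the two root structs) are hypothetical simplifications; the real ocfs2 structures use little-endian on-disk fields, buffer heads and many more callbacks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified tree roots (real ocfs2 uses __le64 fields). */
struct sketch_dinode { uint64_t i_last_eb_blk; };
struct sketch_xattr_value_root { uint64_t xr_last_eb_blk; };

struct sketch_extent_tree;

/* Per-type callbacks, modeled loosely on ocfs2_extent_tree_operations. */
struct sketch_et_ops {
	uint64_t (*get_last_eb_blk)(const struct sketch_extent_tree *et);
};

struct sketch_extent_tree {
	const struct sketch_et_ops *ops;
	void *object;		/* structure that embeds the tree root */
};

static uint64_t dinode_get_last_eb_blk(const struct sketch_extent_tree *et)
{
	const struct sketch_dinode *di = et->object;
	return di->i_last_eb_blk;
}

static uint64_t xv_get_last_eb_blk(const struct sketch_extent_tree *et)
{
	const struct sketch_xattr_value_root *xv = et->object;
	return xv->xr_last_eb_blk;
}

static const struct sketch_et_ops dinode_ops = { dinode_get_last_eb_blk };
static const struct sketch_et_ops xv_ops = { xv_get_last_eb_blk };

static void sketch_init_extent_tree(struct sketch_extent_tree *et,
				    const struct sketch_et_ops *ops,
				    void *object)
{
	assert(ops && object);
	et->ops = ops;
	et->object = object;
}

/* Generic accessor: one indirect call replaces the per-type if/else chain. */
static uint64_t sketch_et_get_last_eb_blk(const struct sketch_extent_tree *et)
{
	return et->ops->get_last_eb_blk(et);
}
```

A caller fills one `sketch_extent_tree` for whichever root it holds and then uses only the generic accessor, which is what lets ocfs2_num_free_extents() drop its type checks.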
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 30 +++++-------------------------
1 files changed, 5 insertions(+), 25 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index fb6ae67..0b900f6 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -618,34 +618,13 @@ int ocfs2_num_free_extents(struct ocfs2_super *osb,
struct ocfs2_extent_block *eb;
struct buffer_head *eb_bh = NULL;
u64 last_eb_blk = 0;
+ struct ocfs2_extent_tree et;
mlog_entry_void();
- if (type == OCFS2_DINODE_EXTENT) {
- struct ocfs2_dinode *fe =
- (struct ocfs2_dinode *)root_bh->b_data;
- if (!OCFS2_IS_VALID_DINODE(fe)) {
- OCFS2_RO_ON_INVALID_DINODE(inode->i_sb, fe);
- retval = -EIO;
- goto bail;
- }
-
- if (fe->i_last_eb_blk)
- last_eb_blk = le64_to_cpu(fe->i_last_eb_blk);
- el = &fe->id2.i_list;
- } else if (type == OCFS2_XATTR_VALUE_EXTENT) {
- struct ocfs2_xattr_value_root *xv =
- (struct ocfs2_xattr_value_root *) obj;
-
- last_eb_blk = le64_to_cpu(xv->xr_last_eb_blk);
- el = &xv->xr_list;
- } else if (type == OCFS2_XATTR_TREE_EXTENT) {
- struct ocfs2_xattr_block *xb =
- (struct ocfs2_xattr_block *)root_bh->b_data;
-
- last_eb_blk = le64_to_cpu(xb->xb_attrs.xb_root.xt_last_eb_blk);
- el = &xb->xb_attrs.xb_root.xt_list;
- }
+ ocfs2_get_extent_tree(&et, inode, root_bh, type, obj);
+ el = et.et_root_el;
+ last_eb_blk = ocfs2_et_get_last_eb_blk(&et);
if (last_eb_blk) {
retval = ocfs2_read_block(osb, last_eb_blk,
@@ -665,6 +644,7 @@ bail:
if (eb_bh)
brelse(eb_bh);
...
From: Sunil Mushran <sunil.mushran@oracle.com>
Patch adds check for [no]user_xattr in ocfs2_show_options() that completes
the list of all mount options.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/super.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 39f6238..6b4b86e 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1010,6 +1010,11 @@ static int ocfs2_show_options(struct seq_file *s, struct vfsmount *mnt)
seq_printf(s, ",cluster_stack=%.*s", OCFS2_STACK_LABEL_LEN,
osb->osb_cluster_stack);
+ if (opts & OCFS2_MOUNT_NOUSERXATTR)
+ seq_printf(s, ",nouser_xattr");
+ else
+ seq_printf(s, ",user_xattr");
+
if (opts & OCFS2_MOUNT_INODE64)
seq_printf(s, ",inode64");
--
1.5.4.5
--
From: Joel Becker <joel.becker@oracle.com>
ocfs2 inode numbers are block numbers. For any filesystem with fewer
than 2^32 blocks, this is not a problem. However, when ocfs2 starts
using JBD2, it will be able to support filesystems with more than 2^32
blocks. This would result in inode numbers that do not fit in 32 bits.
The problem is that stat(2) can't handle those numbers on 32-bit
machines. The simple solution is to have ocfs2 allocate all inodes
below that boundary.
The suballoc code is changed to honor an optional block limit. Only the
inode suballocator sets that limit - all other allocations stay unlimited.
The biggest trick is to grow the inode suballocator beneath that limit.
There's no point in allocating block groups that are above the limit,
then rejecting their elements later on. We want to prevent the inode
allocator from ever having block groups above the limit. This involves
a little gyration with the local alloc code. If the local alloc window
is above the limit, it signals the caller to try the global bitmap but
does not disable the local alloc file (which can be used for other
allocations).
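The rule described above boils down to one range check plus a fallback decision. A hedged sketch follows, assuming the limit is expressed as an exclusive upper bound (2^32 for inodes) and that a limit of 0 means "unlimited", as for every non-inode allocation. The names are hypothetical and do not mirror ocfs2's ac_max_block handling line for line; the window start stands in for wherever the allocation would begin.

```c
#include <assert.h>
#include <stdint.h>

/* Exclusive upper bound for block numbers that still fit in 32 bits. */
#define SKETCH_INODE32_LIMIT (UINT64_C(1) << 32)

/* Hypothetical allocation context: max_block == 0 means "unlimited". */
struct sketch_alloc_ctx {
	uint64_t max_block;
};

enum sketch_source { SKETCH_LOCAL_WINDOW, SKETCH_GLOBAL_BITMAP };

/* True if every block in [start, start + count) is below max_block. */
static int sketch_range_ok(const struct sketch_alloc_ctx *ac,
			   uint64_t start, uint64_t count)
{
	if (!ac->max_block)
		return 1;
	/* Written this way to avoid overflow in start + count. */
	return count <= ac->max_block && start <= ac->max_block - count;
}

/*
 * Pick where to allocate from. A local alloc window that crosses the
 * limit is skipped in favor of the global bitmap, but it is not torn
 * down: unrestricted allocations can keep using it.
 */
static enum sketch_source sketch_pick_source(const struct sketch_alloc_ctx *ac,
					     uint64_t window_start,
					     uint64_t bits_wanted)
{
	if (sketch_range_ok(ac, window_start, bits_wanted))
		return SKETCH_LOCAL_WINDOW;
	return SKETCH_GLOBAL_BITMAP;
}
```

Keeping the window alive on a miss matters because only the inode allocator is restricted; data and extent allocations would otherwise lose the local alloc benefit for no reason.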
[ Minor cleanup - removed an ML_NOTICE comment. --Mark ]
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/localalloc.c | 55 ++++++++++++++++++++++++++++++++
fs/ocfs2/suballoc.c | 83 ++++++++++++++++++++++++++++++++++++++++---------
fs/ocfs2/suballoc.h | 11 ++++--
3 files changed, 130 insertions(+), 19 deletions(-)
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index b889f10..02227c3 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -570,6 +570,46 @@ out:
return status;
}
+/* Check to see if the local alloc window is within ac->ac_max_block */
+static int ocfs2_local_alloc_in_range(struct inode *inode,
+ struct ocfs2_alloc_context *ac,
+ u32 bits_wanted)
+{
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct ocfs2_dinode ...
From: Joel Becker <joel.becker@oracle.com>
Now that ocfs2 limits inode numbers to 32 bits, add a mount option to
disable the limit. This parallels XFS. 64-bit systems can handle the
larger inode numbers.
[ Added description of inode64 mount option in ocfs2.txt. --Mark ]
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
Documentation/filesystems/ocfs2.txt | 4 ++++
fs/ocfs2/ocfs2.h | 1 +
fs/ocfs2/suballoc.c | 5 +++--
fs/ocfs2/super.c | 17 +++++++++++++++++
4 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt
index c318a8b..6acf1b4 100644
--- a/Documentation/filesystems/ocfs2.txt
+++ b/Documentation/filesystems/ocfs2.txt
@@ -76,3 +76,7 @@ localalloc=8(*) Allows custom localalloc size in MB. If the value is too
large, the fs will silently revert it to the default.
Localalloc is not enabled for local mounts.
localflocks This disables cluster aware flock.
+inode64 Indicates that Ocfs2 is allowed to create inodes at
+ any location in the filesystem, including those which
+ will result in inode numbers occupying more than 32
+ bits of significance.
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 6d3c10d..78ae4f8 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -189,6 +189,7 @@ enum ocfs2_mount_options
OCFS2_MOUNT_DATA_WRITEBACK = 1 << 4, /* No data ordering */
OCFS2_MOUNT_LOCALFLOCKS = 1 << 5, /* No cluster aware user file locks */
OCFS2_MOUNT_NOUSERXATTR = 1 << 6, /* No user xattr */
+ OCFS2_MOUNT_INODE64 = 1 << 7, /* Allow inode numbers > 2^32 */
};
#define OCFS2_OSB_SOFT_RO 0x0001
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 213bdca..d7a6f92 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -601,9 +601,10 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
/*
* stat(2) can't handle ...
From: Tao Ma <tao.ma@oracle.com>
In ocfs2_xattr_free_block, we take a cluster lock on xb_alloc_inode while we
have a transaction open. This will deadlock the downconvert thread, so fix
it.
We can clean up how xattr blocks are removed while here: this patch also
moves the mechanism for releasing an xattr block (including its values, its
xattr tree and the block itself) into this function.
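The commit message states that taking a cluster lock with a transaction open can deadlock the downconvert thread, so the lock has to be acquired before the transaction starts. The tiny model below is a hypothetical illustration of that ordering rule only: it refuses a cluster lock while a transaction is open instead of blocking. None of these names exist in ocfs2, and it does not model ocfs2's actual locking.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-task state used only to illustrate lock ordering. */
struct sketch_task {
	int trans_open;		/* a journal handle is active */
	int cluster_locks;	/* number of cluster locks held */
};

/* Refuse (rather than block) if the ordering rule would be violated. */
static int sketch_cluster_lock(struct sketch_task *t)
{
	if (t->trans_open)
		return -EDEADLK;
	t->cluster_locks++;
	return 0;
}

static void sketch_cluster_unlock(struct sketch_task *t)
{
	assert(t->cluster_locks > 0);
	t->cluster_locks--;
}

static void sketch_start_trans(struct sketch_task *t) { t->trans_open = 1; }
static void sketch_commit_trans(struct sketch_task *t) { t->trans_open = 0; }

/* Safe order: lock, start transaction, do the work, commit, unlock. */
static int sketch_free_block(struct sketch_task *t)
{
	int ret = sketch_cluster_lock(t);

	if (ret)
		return ret;
	sketch_start_trans(t);
	/* ... free the suballocator bit for the xattr block here ... */
	sketch_commit_trans(t);
	sketch_cluster_unlock(t);
	return 0;
}
```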
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/xattr.c | 152 +++++++++++++++++++++++++++++-------------------------
1 files changed, 82 insertions(+), 70 deletions(-)
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 38e3e5e..b2e25a8 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -1427,51 +1427,6 @@ out:
}
-static int ocfs2_xattr_free_block(handle_t *handle,
- struct ocfs2_super *osb,
- struct ocfs2_xattr_block *xb)
-{
- struct inode *xb_alloc_inode;
- struct buffer_head *xb_alloc_bh = NULL;
- u64 blk = le64_to_cpu(xb->xb_blkno);
- u16 bit = le16_to_cpu(xb->xb_suballoc_bit);
- u64 bg_blkno = ocfs2_which_suballoc_group(blk, bit);
- int ret = 0;
-
- xb_alloc_inode = ocfs2_get_system_file_inode(osb,
- EXTENT_ALLOC_SYSTEM_INODE,
- le16_to_cpu(xb->xb_suballoc_slot));
- if (!xb_alloc_inode) {
- ret = -ENOMEM;
- mlog_errno(ret);
- goto out;
- }
- mutex_lock(&xb_alloc_inode->i_mutex);
-
- ret = ocfs2_inode_lock(xb_alloc_inode, &xb_alloc_bh, 1);
- if (ret < 0) {
- mlog_errno(ret);
- goto out_mutex;
- }
- ret = ocfs2_extend_trans(handle, OCFS2_SUBALLOC_FREE);
- if (ret < 0) {
- mlog_errno(ret);
- goto out_unlock;
- }
- ret = ocfs2_free_suballoc_bits(handle, xb_alloc_inode, xb_alloc_bh,
- bit, bg_blkno, 1);
- if (ret < 0)
- mlog_errno(ret);
-out_unlock:
- ocfs2_inode_unlock(xb_alloc_inode, 1);
- brelse(xb_alloc_bh);
-out_mutex:
- mutex_unlock(&xb_alloc_inode->i_mutex);
- iput(xb_alloc_inode);
-out:
- return ret;
-}
-
static int ocfs2_remove_value_outside(struct ...
From: Tao Ma <tao.ma@oracle.com>
In ocfs2_extend_trans, when we can't extend the current
transaction, it will commit the current transaction and start
a new one. So if the credits we allocated earlier weren't
used (the block wasn't dirtied before our extend), we will not
have enough credits for later operations (JBD will complain
and BUG out). So check for this and extend again.
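The failure mode is easy to model: if the extend has to restart the transaction, the handle comes back holding only the credits just requested, and any unused credits from before are gone. A hedged sketch with a hypothetical handle that only tracks credits shows why the patch re-checks h_buffer_credits before inserting; it is not JBD's real credit accounting.

```c
#include <assert.h>

/* Hypothetical journal handle: only the credit bookkeeping is modeled. */
struct sketch_handle {
	int credits;		/* credits the handle may still use */
	int journal_room;	/* extra credits the running transaction can grant */
	int restarts;		/* how many times we had to commit and restart */
};

/* Extend in place if possible; otherwise commit and restart. */
static int sketch_extend_trans(struct sketch_handle *h, int nblocks)
{
	if (nblocks <= h->journal_room) {
		h->journal_room -= nblocks;
		h->credits += nblocks;
	} else {
		h->restarts++;
		h->credits = nblocks;	/* earlier unused credits are lost */
	}
	return 0;
}

/* The fix: make sure 'needed' credits remain before journaling more. */
static int sketch_ensure_credits(struct sketch_handle *h, int needed)
{
	if (h->credits < needed)
		return sketch_extend_trans(h, needed);
	return 0;
}
```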
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/xattr.c | 15 ++++++++++++++-
1 files changed, 14 insertions(+), 1 deletions(-)
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 1a4de3d..38e3e5e 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -1336,8 +1336,9 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
}
if (!(flag & OCFS2_INLINE_XATTR_FL)) {
- /*set extended attribue in external blcok*/
+ /* set extended attribute in external block. */
ret = ocfs2_extend_trans(handle,
+ OCFS2_INODE_UPDATE_CREDITS +
OCFS2_XATTR_BLOCK_UPDATE_CREDITS);
if (ret) {
mlog_errno(ret);
@@ -3701,6 +3702,18 @@ static int ocfs2_add_new_xattr_cluster(struct inode *inode,
}
}
+ if (handle->h_buffer_credits < credits) {
+ /*
+ * The journal has been restarted before, and don't
+ * have enough space for the insertion, so extend it
+ * here.
+ */
+ ret = ocfs2_extend_trans(handle, credits);
+ if (ret) {
+ mlog_errno(ret);
+ goto leave;
+ }
+ }
mlog(0, "Insert %u clusters at block %llu for xattr at %u\n",
num_bits, block, v_start);
ret = ocfs2_insert_extent(osb, handle, inode, &et, v_start, block,
--
1.5.4.5
--
From: Joel Becker <joel.becker@oracle.com>
The original get/put_extent_tree() functions held a reference on
et_root_bh. However, every single caller already has a safe reference,
making the get/put cycle irrelevant.
We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree(). It
no longer gets a reference on et_root_bh. ocfs2_put_extent_tree() is
removed. Callers now have a simpler init+use pattern.
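The argument here is that a get/put pair on et_root_bh is redundant when every caller already pins the buffer for longer than the extent tree lives. A hypothetical refcounted buffer makes that concrete: the old get/put pair only returns the count to where it started, and the init-only version never touches it. The types and functions are illustrative stand-ins, not ocfs2 or buffer_head APIs.

```c
#include <assert.h>

/* Hypothetical refcounted buffer standing in for struct buffer_head. */
struct sketch_buf {
	int refcount;
	char data[64];
};

struct sketch_extent_tree {
	struct sketch_buf *root_bh;
	void *object;
};

/* Old style: take a reference that must later be dropped. */
static void sketch_get_extent_tree(struct sketch_extent_tree *et,
				   struct sketch_buf *bh)
{
	bh->refcount++;
	et->root_bh = bh;
	et->object = bh->data;
}

static void sketch_put_extent_tree(struct sketch_extent_tree *et)
{
	assert(et->root_bh->refcount > 0);
	et->root_bh->refcount--;
}

/* New style: borrow the caller's reference; there is nothing to undo. */
static void sketch_init_extent_tree(struct sketch_extent_tree *et,
				    struct sketch_buf *bh)
{
	et->root_bh = bh;
	et->object = bh->data;
}
```

With the borrow pattern, error paths in callers no longer need a matching put, which is where the line-count savings in the diffstat come from.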
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 49 +++++++++++++++++++++----------------------------
fs/ocfs2/alloc.h | 26 ++++++++++++--------------
fs/ocfs2/aops.c | 6 ++----
fs/ocfs2/dir.c | 6 ++----
fs/ocfs2/file.c | 10 +++-------
fs/ocfs2/xattr.c | 14 ++++----------
6 files changed, 44 insertions(+), 67 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 51c3183..5f44ef8 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -55,7 +55,7 @@
*
* To implement an on-disk btree (extent tree) type in ocfs2, add
* an ocfs2_extent_tree_operations structure and the matching
- * ocfs2_get_<thingy>_extent_tree() function. That's pretty much it
+ * ocfs2_init_<thingy>_extent_tree() function. That's pretty much it
* for the allocation portion of the extent tree.
*/
struct ocfs2_extent_tree_operations {
@@ -301,14 +301,13 @@ static struct ocfs2_extent_tree_operations ocfs2_xattr_tree_et_ops = {
.eo_fill_max_leaf_clusters = ocfs2_xattr_tree_fill_max_leaf_clusters,
};
-static void __ocfs2_get_extent_tree(struct ocfs2_extent_tree *et,
- struct inode *inode,
- struct buffer_head *bh,
- void *obj,
- struct ocfs2_extent_tree_operations *ops)
+static void __ocfs2_init_extent_tree(struct ocfs2_extent_tree *et,
+ struct inode *inode,
+ struct buffer_head *bh,
+ void *obj,
+ struct ocfs2_extent_tree_operations *ops)
{
et->et_ops = ops;
- get_bh(bh);
et->et_root_bh = bh;
...
From: Joel Becker <joel.becker@oracle.com>
The 'private' pointer was a way to store off xattr values, which don't
live at a set place in the bh. But the concept of "the object
containing the extent tree" is much more generic. For an inode it's the
struct ocfs2_dinode; for an xattr value it's the value. Let's save off
the 'object' at all times. If NULL is passed to
ocfs2_get_extent_tree(), 'object' is set to bh->b_data.
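The defaulting rule is small enough to show directly. In this hedged sketch, with hypothetical stand-in types, a NULL object means "the root structure starts at the beginning of the block" (the inode and xattr-block cases), while an xattr value, which sits somewhere inside the block, is passed explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer_head: just the block contents. */
struct sketch_buf {
	char b_data[64];
};

struct sketch_extent_tree {
	struct sketch_buf *root_bh;
	void *object;	/* structure that contains the tree root */
};

/*
 * NULL means "use bh->b_data", which covers roots that begin at the
 * start of the block. Callers whose root lives inside the block pass
 * a pointer to it instead.
 */
static void sketch_get_extent_tree(struct sketch_extent_tree *et,
				   struct sketch_buf *bh, void *obj)
{
	et->root_bh = bh;
	et->object = obj ? obj : bh->b_data;
}
```

Because every callback can then cast et_object without caring where it came from, the per-type accessors no longer need to reach through et_root_bh->b_data themselves.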
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 62 +++++++++++++++++++++++++----------------------------
1 files changed, 29 insertions(+), 33 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 0abf11e..93f44f4 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -79,15 +79,14 @@ struct ocfs2_extent_tree {
struct ocfs2_extent_tree_operations *et_ops;
struct buffer_head *et_root_bh;
struct ocfs2_extent_list *et_root_el;
- void *et_private;
+ void *et_object;
unsigned int et_max_leaf_clusters;
};
static void ocfs2_dinode_set_last_eb_blk(struct ocfs2_extent_tree *et,
u64 blkno)
{
- struct ocfs2_dinode *di =
- (struct ocfs2_dinode *)et->et_root_bh->b_data;
+ struct ocfs2_dinode *di = et->et_object;
BUG_ON(et->et_type != OCFS2_DINODE_EXTENT);
di->i_last_eb_blk = cpu_to_le64(blkno);
@@ -95,8 +94,7 @@ static void ocfs2_dinode_set_last_eb_blk(struct ocfs2_extent_tree *et,
static u64 ocfs2_dinode_get_last_eb_blk(struct ocfs2_extent_tree *et)
{
- struct ocfs2_dinode *di =
- (struct ocfs2_dinode *)et->et_root_bh->b_data;
+ struct ocfs2_dinode *di = et->et_object;
BUG_ON(et->et_type != OCFS2_DINODE_EXTENT);
return le64_to_cpu(di->i_last_eb_blk);
@@ -106,8 +104,7 @@ static void ocfs2_dinode_update_clusters(struct inode *inode,
struct ocfs2_extent_tree *et,
u32 clusters)
{
- struct ocfs2_dinode *di =
- (struct ocfs2_dinode *)et->et_root_bh->b_data;
+ struct ocfs2_dinode *di = ...
From: Joel Becker <joel.becker@oracle.com>
The members of the ocfs2_extent_tree structure gain a prefix of 'et_'.
All users are updated.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 118 ++++++++++++++++++++++++++++--------------------------
1 files changed, 61 insertions(+), 57 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 9fe49f2..ab16b89 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -75,28 +75,30 @@ struct ocfs2_extent_tree_operations {
};
struct ocfs2_extent_tree {
- enum ocfs2_extent_tree_type type;
- struct ocfs2_extent_tree_operations *eops;
- struct buffer_head *root_bh;
- struct ocfs2_extent_list *root_el;
- void *private;
- unsigned int max_leaf_clusters;
+ enum ocfs2_extent_tree_type et_type;
+ struct ocfs2_extent_tree_operations *et_ops;
+ struct buffer_head *et_root_bh;
+ struct ocfs2_extent_list *et_root_el;
+ void *et_private;
+ unsigned int et_max_leaf_clusters;
};
static void ocfs2_dinode_set_last_eb_blk(struct ocfs2_extent_tree *et,
u64 blkno)
{
- struct ocfs2_dinode *di = (struct ocfs2_dinode *)et->root_bh->b_data;
+ struct ocfs2_dinode *di =
+ (struct ocfs2_dinode *)et->et_root_bh->b_data;
- BUG_ON(et->type != OCFS2_DINODE_EXTENT);
+ BUG_ON(et->et_type != OCFS2_DINODE_EXTENT);
di->i_last_eb_blk = cpu_to_le64(blkno);
}
static u64 ocfs2_dinode_get_last_eb_blk(struct ocfs2_extent_tree *et)
{
- struct ocfs2_dinode *di = (struct ocfs2_dinode *)et->root_bh->b_data;
+ struct ocfs2_dinode *di =
+ (struct ocfs2_dinode *)et->et_root_bh->b_data;
- BUG_ON(et->type != OCFS2_DINODE_EXTENT);
+ BUG_ON(et->et_type != OCFS2_DINODE_EXTENT);
return le64_to_cpu(di->i_last_eb_blk);
}
@@ -105,7 +107,7 @@ static void ocfs2_dinode_update_clusters(struct inode *inode,
u32 clusters)
{
struct ocfs2_dinode *di =
- (struct ocfs2_dinode *)et->root_bh->b_data;
+ (struct ocfs2_dinode ...
From: Joel Becker <joel.becker@oracle.com>
Rather than allocating a struct ocfs2_extent_tree, just put it on the
stack. Fill it with ocfs2_get_extent_tree() and drop it with
ocfs2_put_extent_tree(). Now the callers don't have to handle -ENOMEM, yet
still safely hold a reference on root_bh.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/alloc.c | 117 ++++++++++++++++-------------------------------------
1 files changed, 36 insertions(+), 81 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index ab16b89..0abf11e 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -223,22 +223,17 @@ static struct ocfs2_extent_tree_operations ocfs2_xattr_tree_et_ops = {
.eo_sanity_check = ocfs2_xattr_tree_sanity_check,
};
-static struct ocfs2_extent_tree*
- ocfs2_new_extent_tree(struct inode *inode,
- struct buffer_head *bh,
- enum ocfs2_extent_tree_type et_type,
- void *private)
+static void ocfs2_get_extent_tree(struct ocfs2_extent_tree *et,
+ struct inode *inode,
+ struct buffer_head *bh,
+ enum ocfs2_extent_tree_type et_type,
+ void *private)
{
- struct ocfs2_extent_tree *et;
-
- et = kzalloc(sizeof(*et), GFP_NOFS);
- if (!et)
- return NULL;
-
et->et_type = et_type;
get_bh(bh);
et->et_root_bh = bh;
et->et_private = private;
+ et->et_max_leaf_clusters = 0;
if (et_type == OCFS2_DINODE_EXTENT) {
et->et_root_el =
@@ -257,16 +252,11 @@ static struct ocfs2_extent_tree*
et->et_max_leaf_clusters = ocfs2_clusters_for_bytes(inode->i_sb,
OCFS2_MAX_XATTR_TREE_LEAF_SIZE);
}
-
- return et;
}
-static void ocfs2_free_extent_tree(struct ocfs2_extent_tree *et)
+static void ocfs2_put_extent_tree(struct ocfs2_extent_tree *et)
{
- if (et) {
- brelse(et->et_root_bh);
- kfree(et);
- }
+ brelse(et->et_root_bh);
}
static inline void ocfs2_et_set_last_eb_blk(struct ocfs2_extent_tree *et,
@@ -4439,22 +4429,15 @@ int ...
Hi Mark,
do you see my 2 patches for xattr?
http://oss.oracle.com/pipermail/ocfs2-devel/2008-September/002839.html
this is pretty straightforward and I think it can be committed with it.
http://oss.oracle.com/pipermail/ocfs2-devel/2008-September/002839.html
this is the new support for empty bucket.
Regards,
Tao
--
