Hi,
here is the second bunch of UBIFS updates we would like to send to
Linus for 2.6.27 inclusion. The updates contain the following:
Adrian Hunter (2):
UBIFS: always read hashed-key nodes under TNC mutex
UBIFS: allow for racing between GC and TNC
Artem Bityutskiy (10):
UBIFS: fix zero-length truncations
UBIFS: do not update min_idx_lebs in stafs
UBIFS: push empty flash hack down
UBIFS: remove incorrect index space check
UBIFS: improve statfs reporting
UBIFS: fix assertion
UBIFS: add forgotten gc_idx_lebs component
UBIFS: introduce LEB overhead
UBIFS: improve statfs reporting even more
UBIFS: fill f_fsid
fs/ubifs/budget.c | 113 +++++++++++++++++++++++++++++++++++++++------------
fs/ubifs/dir.c | 1 -
fs/ubifs/file.c | 20 +++++++--
fs/ubifs/find.c | 18 ++++-----
fs/ubifs/gc.c | 6 +++
fs/ubifs/misc.h | 49 ++++++++---------------
fs/ubifs/super.c | 19 +++++---
fs/ubifs/tnc.c | 116 ++++++++++++++++++++++++++++-------------------------
fs/ubifs/ubifs.h | 14 +++++-
9 files changed, 217 insertions(+), 139 deletions(-)
These patches are pure fixes:
UBIFS: fix zero-length truncations
UBIFS: always read hashed-key nodes under TNC mutex
UBIFS: allow for racing between GC and TNC
UBIFS: fix assertion
UBIFS: add forgotten gc_idx_lebs component
UBIFS: remove incorrect index space check
These patches partially fix the problem reported in the MTD ML:
http://lists.infradead.org/pipermail/linux-mtd/2008-August/022579.html
UBIFS: do not update min_idx_lebs in stafs
UBIFS: push empty flash hack down
UBIFS: improve statfs reporting
UBIFS: introduce LEB overhead
UBIFS: improve statfs reporting even more
Thus, it should be fine to send them mainline at this late stage.
Unless there are complaints, I'll send a pull request soon.
Thanks.
--
Best regards,
Artem Bityutskiy (From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Always allow truncations to zero, even if budgeting thinks there
is no space. UBIFS reserves some space for deletions anyway.
Otherwise, the following happans:
1. create a file, and write as much as possible there, until ENOSPC
2. truncate the file, which fails with ENOSPC, which is not good.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
fs/ubifs/dir.c | 1 -
fs/ubifs/file.c | 20 ++++++++++++++++----
2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 5c96f1f..2b267c9 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -587,7 +587,6 @@ static int ubifs_unlink(struct inode *dir, struct dentry *dentry)
if (err) {
if (err != -ENOSPC)
return err;
- err = 0;
budgeted = 0;
}
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 4071d1c..3d698e2 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -793,7 +793,7 @@ static int do_truncation(struct ubifs_info *c, struct inode *inode,
int err;
struct ubifs_budget_req req;
loff_t old_size = inode->i_size, new_size = attr->ia_size;
- int offset = new_size & (UBIFS_BLOCK_SIZE - 1);
+ int offset = new_size & (UBIFS_BLOCK_SIZE - 1), budgeted = 1;
struct ubifs_inode *ui = ubifs_inode(inode);
dbg_gen("ino %lu, size %lld -> %lld", inode->i_ino, old_size, new_size);
@@ -811,8 +811,15 @@ static int do_truncation(struct ubifs_info *c, struct inode *inode,
/* A funny way to budget for truncation node */
req.dirtied_ino_d = UBIFS_TRUN_NODE_SZ;
err = ubifs_budget_space(c, &req);
- if (err)
- return err;
+ if (err) {
+ /*
+ * Treat truncations to zero as deletion and always allow them,
+ * just like we do for '->unlink()'.
+ */
+ if (new_size || err != -ENOSPC)
+ return err;
+ budgeted = 0;
+ }
err = vmtruncate(inode, new_size);
if (err)
@@ -869,7 +876,12 @@ static int do_truncation(struct ubifs_info *c, struct inode *inode,
err = ...From: Adrian Hunter <ext-adrian.hunter@nokia.com>
Leaf-nodes that have a hashed key are stored in the
leaf-node-cache (LNC) which is protected by the TNC
mutex. Consequently, when reading a leaf node with
a hashed key (i.e. directory entries, xattr entries)
the TNC mutex is always required.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/tnc.c | 7 +------
1 files changed, 1 insertions(+), 6 deletions(-)
diff --git a/fs/ubifs/tnc.c b/fs/ubifs/tnc.c
index e909f4a..4fbc592 100644
--- a/fs/ubifs/tnc.c
+++ b/fs/ubifs/tnc.c
@@ -1498,7 +1498,6 @@ static int do_lookup_nm(struct ubifs_info *c, const union ubifs_key *key,
{
int found, n, err;
struct ubifs_znode *znode;
- struct ubifs_zbranch zbr;
dbg_tnc("name '%.*s' key %s", nm->len, nm->name, DBGKEY(key));
mutex_lock(&c->tnc_mutex);
@@ -1522,11 +1521,7 @@ static int do_lookup_nm(struct ubifs_info *c, const union ubifs_key *key,
goto out_unlock;
}
- zbr = znode->zbranch[n];
- mutex_unlock(&c->tnc_mutex);
-
- err = tnc_read_node_nm(c, &zbr, node);
- return err;
+ err = tnc_read_node_nm(c, &znode->zbranch[n], node);
out_unlock:
mutex_unlock(&c->tnc_mutex);
--
1.5.4.1
--
From: Adrian Hunter <ext-adrian.hunter@nokia.com>
The TNC mutex is unlocked prematurely when reading leaf nodes
with non-hashed keys. This is unsafe because the node may be
moved by garbage collection and the eraseblock unmapped, although
that has never actually happened during stress testing.
This patch fixes the flaw by detecting the race and retrying with
the TNC mutex locked.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/gc.c | 6 +++
fs/ubifs/misc.h | 17 ++++++++
fs/ubifs/tnc.c | 109 +++++++++++++++++++++++++++++------------------------
fs/ubifs/ubifs.h | 6 ++-
4 files changed, 87 insertions(+), 51 deletions(-)
diff --git a/fs/ubifs/gc.c b/fs/ubifs/gc.c
index d0f3dac..13f1019 100644
--- a/fs/ubifs/gc.c
+++ b/fs/ubifs/gc.c
@@ -344,6 +344,12 @@ int ubifs_garbage_collect_leb(struct ubifs_info *c, struct ubifs_lprops *lp)
if (err)
goto out;
+ /* Allow for races with TNC */
+ c->gced_lnum = lnum;
+ smp_wmb();
+ c->gc_seq += 1;
+ smp_wmb();
+
if (c->gc_lnum == -1) {
c->gc_lnum = lnum;
err = LEB_RETAINED;
diff --git a/fs/ubifs/misc.h b/fs/ubifs/misc.h
index 87dabf9..87ced4c 100644
--- a/fs/ubifs/misc.h
+++ b/fs/ubifs/misc.h
@@ -325,4 +325,21 @@ static inline struct timespec ubifs_current_time(struct inode *inode)
current_fs_time(inode->i_sb) : CURRENT_TIME_SEC;
}
+/**
+ * ubifs_tnc_lookup - look up a file-system node.
+ * @c: UBIFS file-system description object
+ * @key: node key to lookup
+ * @node: the node is returned here
+ *
+ * This function look up and reads node with key @key. The caller has to make
+ * sure the @node buffer is large enough to fit the node. Returns zero in case
+ * of success, %-ENOENT if the node was not found, and a negative error code in
+ * case of failure.
+ */
+static inline int ubifs_tnc_lookup(struct ubifs_info *c,
+ const union ubifs_key *key, void *node)
+{
+ return ubifs_tnc_locate(c, key, node, NULL, NULL);
+}
+
#endif /* ...From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> This is bad because the rest of the code should not depend on it, and this may hide bugss, instead of revealing them. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> --- fs/ubifs/budget.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/fs/ubifs/budget.c b/fs/ubifs/budget.c index 1540981..ac0d2e1 100644 --- a/fs/ubifs/budget.c +++ b/fs/ubifs/budget.c @@ -741,7 +741,6 @@ long long ubifs_budg_get_free_space(struct ubifs_info *c) available = ubifs_calc_available(c, min_idx_lebs); outstanding = c->budg_data_growth + c->budg_dd_growth; - c->min_idx_lebs = min_idx_lebs; spin_unlock(&c->space_lock); if (available > outstanding) -- 1.5.4.1 --
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
We have a hack which forces the amount of flash space to be
equivalent to 'c->blocks_cnt' in case of empty FS. This is
to make users happy and see '%0' used in 'df' when they
mount an empty FS. This hack is not needed in
'ubifs_calc_available()', but it is only needed the caller,
in 'ubifs_budg_get_free_space()'. So push it down there.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
fs/ubifs/budget.c | 24 +++++++++++-------------
fs/ubifs/super.c | 2 --
2 files changed, 11 insertions(+), 15 deletions(-)
diff --git a/fs/ubifs/budget.c b/fs/ubifs/budget.c
index ac0d2e1..f6d2eaa 100644
--- a/fs/ubifs/budget.c
+++ b/fs/ubifs/budget.c
@@ -302,18 +302,6 @@ long long ubifs_calc_available(const struct ubifs_info *c, int min_idx_lebs)
int subtract_lebs;
long long available;
- /*
- * Force the amount available to the total size reported if the used
- * space is zero.
- */
- if (c->lst.total_used <= UBIFS_INO_NODE_SZ &&
- c->budg_data_growth + c->budg_dd_growth == 0) {
- /* Do the same calculation as for c->block_cnt */
- available = c->main_lebs - 2;
- available *= c->leb_size - c->dark_wm;
- return available;
- }
-
available = c->main_bytes - c->lst.total_used;
/*
@@ -739,8 +727,18 @@ long long ubifs_budg_get_free_space(struct ubifs_info *c)
return 0;
}
- available = ubifs_calc_available(c, min_idx_lebs);
outstanding = c->budg_data_growth + c->budg_dd_growth;
+
+ /*
+ * Force the amount available to the total size reported if the used
+ * space is zero.
+ */
+ if (c->lst.total_used <= UBIFS_INO_NODE_SZ && !outstanding) {
+ spin_unlock(&c->space_lock);
+ return (long long)c->block_cnt << UBIFS_BLOCK_SHIFT;
+ }
+
+ available = ubifs_calc_available(c, min_idx_lebs);
spin_unlock(&c->space_lock);
if (available > outstanding)
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index f71e6b8..1018053 100644
--- a/fs/ubifs/super.c
+++ ...From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When we report free space to user-space, we should not report
0 if the amount of empty LEBs is too low, because they would
be produced by GC when needed. Thus, just call
'ubifs_calc_available()' straight away which would take
'min_idx_lebs' into account anyway.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
fs/ubifs/budget.c | 15 +--------------
1 files changed, 1 insertions(+), 14 deletions(-)
diff --git a/fs/ubifs/budget.c b/fs/ubifs/budget.c
index f6d2eaa..9ef630a 100644
--- a/fs/ubifs/budget.c
+++ b/fs/ubifs/budget.c
@@ -709,24 +709,11 @@ void ubifs_release_dirty_inode_budget(struct ubifs_info *c,
*/
long long ubifs_budg_get_free_space(struct ubifs_info *c)
{
- int min_idx_lebs, rsvd_idx_lebs;
+ int min_idx_lebs;
long long available, outstanding, free;
- /* Do exactly the same calculations as in 'do_budget_space()' */
spin_lock(&c->space_lock);
min_idx_lebs = ubifs_calc_min_idx_lebs(c);
-
- if (min_idx_lebs > c->lst.idx_lebs)
- rsvd_idx_lebs = min_idx_lebs - c->lst.idx_lebs;
- else
- rsvd_idx_lebs = 0;
-
- if (rsvd_idx_lebs > c->lst.empty_lebs + c->freeable_cnt + c->idx_gc_cnt
- - c->lst.taken_empty_lebs) {
- spin_unlock(&c->space_lock);
- return 0;
- }
-
outstanding = c->budg_data_growth + c->budg_dd_growth;
/*
--
1.5.4.1
--
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Make free space calculation less pessimistic and more realistic, which in turn improves 'statfs()' reports. Now it lies by 10%-20%, instead of 20%-30% (10% more honest). Results of "freespace" test (120MiB volume, 16KiB LEB size, 512 bytes page size). Before the change: freespace: Test 1: fill the space we have 3 times freespace: was free: 78274560 bytes 74.6 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 18214912 bytes 17.4 MiB, wrote 23.3% more than predicted freespace: was free: 76754944 bytes 73.2 MiB, wrote: 96493568 bytes 92.0 MiB, delta: 19738624 bytes 18.8 MiB, wrote 25.7% more than predicted freespace: was free: 76759040 bytes 73.2 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 19730432 bytes 18.8 MiB, wrote 25.7% more than predicted freespace: Test 1 finished freespace: Test 2: gradually lessen amount of free space and fill the FS freespace: do 10 steps, lessen free space by 6977722 bytes 6.7 MiB each time freespace: was free: 72273920 bytes 68.9 MiB, wrote: 88891392 bytes 84.8 MiB, delta: 16617472 bytes 15.8 MiB, wrote 23.0% more than predicted freespace: was free: 66154496 bytes 63.1 MiB, wrote: 81506304 bytes 77.7 MiB, delta: 15351808 bytes 14.6 MiB, wrote 23.2% more than predicted freespace: was free: 58732544 bytes 56.0 MiB, wrote: 72572928 bytes 69.2 MiB, delta: 13840384 bytes 13.2 MiB, wrote 23.6% more than predicted freespace: was free: 51552256 bytes 49.2 MiB, wrote: 63754240 bytes 60.8 MiB, delta: 12201984 bytes 11.6 MiB, wrote 23.7% more than predicted freespace: was free: 44404736 bytes 42.3 MiB, wrote: 54943744 bytes 52.4 MiB, delta: 10539008 bytes 10.1 MiB, wrote 23.7% more than predicted freespace: was free: 37285888 bytes 35.6 MiB, wrote: 46161920 bytes 44.0 MiB, delta: 8876032 bytes 8.5 MiB, wrote 23.8% more than predicted freespace: was free: 30171136 bytes 28.8 MiB, wrote: 37384192 bytes 35.7 MiB, delta: 7213056 bytes 6.9 MiB, wrote 23.9% more than predicted freespace: was free: 23048192 bytes 22.0 MiB, ...
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> The assertion was incorrect, because it did not take into account free space. This patch also amends the comments correspondingly, and cleans them up a little. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> --- fs/ubifs/find.c | 16 +++++++--------- 1 files changed, 7 insertions(+), 9 deletions(-) diff --git a/fs/ubifs/find.c b/fs/ubifs/find.c index adee7b5..9fc55ae 100644 --- a/fs/ubifs/find.c +++ b/fs/ubifs/find.c @@ -211,14 +211,8 @@ static const struct ubifs_lprops *scan_for_dirty(struct ubifs_info *c, * dirty index heap, and it falls-back to LPT scanning if the heaps are empty * or do not have an LEB which satisfies the @min_space criteria. * - * Note: - * o LEBs which have less than dead watermark of dirty space are never picked - * by this function; - * - * Returns zero and the LEB properties of - * found dirty LEB in case of success, %-ENOSPC if no dirty LEB was found and a - * negative error code in case of other failures. The returned LEB is marked as - * "taken". + * Note, LEBs which have less than dead watermark of free + dirty space are + * never picked by this function. * * The additional @pick_free argument controls if this function has to return a * free or freeable LEB if one is present. For example, GC must to set it to %1, @@ -231,6 +225,10 @@ static const struct ubifs_lprops *scan_for_dirty(struct ubifs_info *c, * * In addition @pick_free is set to %2 by the recovery process in order to * recover gc_lnum in which case an index LEB must not be returned. + * + * This function returns zero and the LEB properties of found dirty LEB in case + * of success, %-ENOSPC if no dirty LEB was found and a negative error code in + * case of other failures. The returned LEB is marked as "taken". */ int ubifs_find_dirty_leb(struct ubifs_info *c, struct ubifs_lprops *ret_lp, int min_space, int pick_free) @@ -317,7 +315,7 @@ int ubifs_find_dirty_leb(struct ...
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> We add this component at other similar places, but not in this one. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> --- fs/ubifs/find.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/ubifs/find.c b/fs/ubifs/find.c index 9fc55ae..e045c8b 100644 --- a/fs/ubifs/find.c +++ b/fs/ubifs/find.c @@ -243,7 +243,7 @@ int ubifs_find_dirty_leb(struct ubifs_info *c, struct ubifs_lprops *ret_lp, int lebs, rsvd_idx_lebs = 0; spin_lock(&c->space_lock); - lebs = c->lst.empty_lebs; + lebs = c->lst.empty_lebs + c->idx_gc_cnt; lebs += c->freeable_cnt - c->lst.taken_empty_lebs; /* -- 1.5.4.1 --
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a preparational patch for the following statfs()
report fix.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
fs/ubifs/super.c | 6 ++++++
fs/ubifs/ubifs.h | 5 +++++
2 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 1018053..be23fd3 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -530,6 +530,12 @@ static int init_constants_early(struct ubifs_info *c)
c->dead_wm = ALIGN(MIN_WRITE_SZ, c->min_io_size);
c->dark_wm = ALIGN(UBIFS_MAX_NODE_SZ, c->min_io_size);
+ /*
+ * Calculate how many bytes would be wasted at the end of LEB if it was
+ * fully filled with data nodes of maximum size. This is used in
+ * calculations when reporting free space.
+ */
+ c->leb_overhead = c->leb_size % UBIFS_MAX_DATA_NODE_SZ;
return 0;
}
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index 681d46e..57e5854 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -995,6 +995,9 @@ struct ubifs_mount_opts {
* @max_idx_node_sz: maximum indexing node aligned on 8-bytes boundary
* @max_inode_sz: maximum possible inode size in bytes
* @max_znode_sz: size of znode in bytes
+ *
+ * @leb_overhead: how many bytes are wasted in an LEB when it is filled with
+ * data nodes of maximum size - used in free space reporting
* @dead_wm: LEB dead space watermark
* @dark_wm: LEB dark space watermark
* @block_cnt: count of 4KiB blocks on the FS
@@ -1226,6 +1229,8 @@ struct ubifs_info {
int max_idx_node_sz;
long long max_inode_sz;
int max_znode_sz;
+
+ int leb_overhead;
int dead_wm;
int dark_wm;
int block_cnt;
--
1.5.4.1
--
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Since free space we report in statfs is file size which should fit to the FS - change the way we calculate free space and use leb_overhead instead of dark_wm in calculations. Results of "freespace" test (120MiB volume, 16KiB LEB size, 512 bytes page size). Before the change: freespace: Test 1: fill the space we have 3 times freespace: was free: 85204992 bytes 81.3 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 11284480 bytes 10.8 MiB, wrote 13.2% more than predicted freespace: was free: 83554304 bytes 79.7 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 12935168 bytes 12.3 MiB, wrote 15.5% more than predicted freespace: was free: 83554304 bytes 79.7 MiB, wrote: 96493568 bytes 92.0 MiB, delta: 12939264 bytes 12.3 MiB, wrote 15.5% more than predicted freespace: Test 1 finished freespace: Test 2: gradually lessen amount of free space and fill the FS freespace: do 10 steps, lessen free space by 7596218 bytes 7.2 MiB each time freespace: was free: 78675968 bytes 75.0 MiB, wrote: 88903680 bytes 84.8 MiB, delta: 10227712 bytes 9.8 MiB, wrote 13.0% more than predicted freespace: was free: 72015872 bytes 68.7 MiB, wrote: 81514496 bytes 77.7 MiB, delta: 9498624 bytes 9.1 MiB, wrote 13.2% more than predicted freespace: was free: 63938560 bytes 61.0 MiB, wrote: 72589312 bytes 69.2 MiB, delta: 8650752 bytes 8.2 MiB, wrote 13.5% more than predicted freespace: was free: 56127488 bytes 53.5 MiB, wrote: 63762432 bytes 60.8 MiB, delta: 7634944 bytes 7.3 MiB, wrote 13.6% more than predicted freespace: was free: 48336896 bytes 46.1 MiB, wrote: 54935552 bytes 52.4 MiB, delta: 6598656 bytes 6.3 MiB, wrote 13.7% more than predicted freespace: was free: 40587264 bytes 38.7 MiB, wrote: 46157824 bytes 44.0 MiB, delta: 5570560 bytes 5.3 MiB, wrote 13.7% more than predicted freespace: was free: 32841728 bytes 31.3 MiB, wrote: 37384192 bytes 35.7 MiB, delta: 4542464 bytes 4.3 MiB, wrote 13.8% more than predicted freespace: was free: 25100288 bytes 23.9 MiB, wrote: ...
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> UBIFS stores 16-bit UUID in the superblock, and it is a good idea to return part of it in 'f_fsid' filed of kstatfs structure. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> --- fs/ubifs/super.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c index 1207bd5..0dee404 100644 --- a/fs/ubifs/super.c +++ b/fs/ubifs/super.c @@ -386,6 +386,7 @@ static int ubifs_statfs(struct dentry *dentry, struct kstatfs *buf) buf->f_files = 0; buf->f_ffree = 0; buf->f_namelen = UBIFS_MAX_NLEN; + memcpy(&buf->f_fsid, c->uuid, sizeof(__kernel_fsid_t)); return 0; } -- 1.5.4.1 --
For btrfs I xor the first 64 bits with the second 64 bits, and put _that_ into f_fsid. You're just putting the first 64 bits in and ignoring the second 64 bits. Neither is really _better_ than the other; you just alter the circumstances in which you get collisions. But I suppose we might as well be consistent about how we do it? (Actually I XOR in the root object ID for the subvol too, but you don't need to worry about that). Alternatively, there's space in the struct statfs to export a couple more uint32_ts, and thus the _whole_ of the uuid. Perhaps we should do that? -- David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation --
Well, xor-ing should not make the random UUID more random, but for You mean f_spare? I do not think any application would use it so I doubt it is good idea to use it. -- Best regards, Artem Bityutskiy (Битюцкий Артём) --
Well, it wouldn't be called f_spare any more, and applications would know that it contains the rest of the UUID if it's non-zero. I'd fix nfsd to fix it immediately (like as I just fixed nfsd to use the first 64 bits of f_fsid when it's non-zero). -- David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation --
May be, but there are only 5 bytes left. But why would you need more UUID bytes? -- Best regards, Artem Bityutskiy (Битюцкий Артём) --
People are mostly using 16-byte UUIDs. We only have space to expose 8 bytes. It just seems to make sense to expose all 16. But there's no overriding reason to do it, really. -- David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation --
Yeah, I would not go for it unless there is a specific problem which should be solved with this. -- Best Regards, Artem Bityutskiy (Артём Битюцкий) --
XFS just puts in the st_dev. And I can't realy find any useful defintion of what it's supposed to b anyway.. --
For me this means that we should rather do what XFS does for consistency then. -- Best regards, Artem Bityutskiy (Битюцкий Артём) --
The fsid is supposed to be a persistent, unique identifier for the filesystem, used by NFS in file handles. Using st_dev is unsafe, because that may change from one server boot to the next, because of device probing order, driver changes, etc. Also, not all filesystems HAVE a valid st_dev in the first place, which is the whole reason for this thread. I think a ->get_fsid() export method would be preferable. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. --
I implemented that, but it doesn't really work. The fsid->export mapping happens in userspace, so mountd needs access to the fsid. So the mount would work fine and return a FH with the appropriate fsid, and then mountd would have no knowledge of how to handle that FH, and mount(8) on the client would eventually time out and fail. It worked if I prepopulated the nfsd.fh cache in expkey_parse() so we didn't end up asking userspace to resolve that FH for us -- but that was just a quick hack. It wouldn't have worked after a server reboot, for example -- we'd have asked userspace again. We'd need to export that fsid to userspace somehow. I did briefly think about adding something like ',uuid=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' to each line in /proc/mounts, as if it was a file system option -- but I don't like that much. When looking for other options, I realised that we already have the f_fsid field in struct statfs, and we might as well just use that. That does seem to be what it was _designed_ for, after all. -- David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation --
My bad - I had looked back in email archives about this issue and got the export operation solution stuck in my head... Yes, using f_fsid is the right way to do this, assuming NFSD plays along. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. --
It does now, if it can't find a UUID by other means. -- David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation --
Umm, different things. f_fsid in stat(v)fs is just a cookie exported to userspac that has never really been documented. We also called the filesystem part of the NFS filehandle in a few places, and for those it's correct that it should be stable. Currently the fsid is either created from the dev_t in kernelspace or from uuids extracted through libuuid in userspace. I can't see anything in the message that started this thread that mentions NFS, btw. --
That was Artem's motivation for filling in f_fsid in the first place. Does anything else even _care_ about f_fsid? -- David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation --
I think that's a rather bad idea. As mentioned before f_fsid is basically random. --
It's not that bad. Of the file systems which actually fill it in... For NTFS it's the volume serial number. For ext[234] it's the UUID xor-folded into 64 bits. For btrfs the same, but with the root object ID xor'd in too. For bfs and xfs it's the block device -- which isn't ideal, as Andreas points out, but is what we'd fall back to anyway. BFS seems to have a s_volume field which perhaps we could use, but we should check for it being all spaces and leave f_fsid as zero in that case. Presumably XFS could be using sb_uuid? For efs, it'll be either EFS_MAGIC or EFS_NEWMAGIC, which is probably a bad thing if you want to export multiple EFS file systems. I'll send a patch to fix that (by leaving it zero) shortly. -- David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation --
Yes, right. I am planning NFS support and would like to have UBIFS I do not know anything but NFS. -- Best regards, Artem Bityutskiy (Битюцкий Артём) --
Our man page for statfs(2) says...
The f_fsid field
Solaris, Irix and POSIX have a system call statvfs(2) that returns a
struct statvfs (defined in <sys/statvfs.h>) containing an unsigned long
f_fsid. Linux, SunOS, HP-UX, 4.4BSD have a system call statfs() that
returns a struct statfs (defined in <sys/vfs.h>) containing a fsid_t
f_fsid, where fsid_t is defined as struct { int val[2]; }. The same
holds for FreeBSD, except that it uses the include file <sys/mount.h>.
The general idea is that f_fsid contains some random stuff such that
the pair (f_fsid,ino) uniquely determines a file. Some OSes use (a
variation on) the device number, or the device number combined with the
filesystem type. Several OSes restrict giving out the f_fsid field to
the superuser only (and zero it for unprivileged users), because this
field is used in the filehandle of the filesystem when NFS-exported,
and giving it out is a security concern.
Under some OSes the fsid can be used as second parameter to the sysfs()
system call.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
--
Like this?
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Date: Wed, 3 Sep 2008 14:16:42 +0300
Subject: [PATCH] UBIFS: amend f_fsid
David Woodhouse suggested to be consistent with other FSes
and xor the beginning and the end of the UUID.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
fs/ubifs/super.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index ab9b5cb..83fe0a5 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -370,6 +370,7 @@ static int ubifs_statfs(struct dentry *dentry, struct kstatfs *buf)
{
struct ubifs_info *c = dentry->d_sb->s_fs_info;
unsigned long long free;
+ __le32 *uuid = (__le32 *)c->uuid;
free = ubifs_get_free_space(c);
dbg_gen("free space %lld bytes (%lld blocks)",
@@ -386,8 +387,8 @@ static int ubifs_statfs(struct dentry *dentry, struct kstatfs *buf)
buf->f_files = 0;
buf->f_ffree = 0;
buf->f_namelen = UBIFS_MAX_NLEN;
- memcpy(&buf->f_fsid, c->uuid, sizeof(__kernel_fsid_t));
-
+ buf->f_fsid.val[0] = le32_to_cpu(uuid[0]) ^ le32_to_cpu(uuid[2]);
+ buf->f_fsid.val[1] = le32_to_cpu(uuid[1]) ^ le32_to_cpu(uuid[3]);
return 0;
}
--
1.5.4.1
--
Yeah, that's what ext[234] and btrfs do. -- David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation --
Oh, meant to send this to linux-fsdevel@vger.kernel.org, not to linux-fsdevel-owner@vger.kernel.org. Apologies. -- Best Regards, Artem Bityutskiy (Артём Битюцкий) --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian | [RFC/PATCH] RE: git calls SSH_ASKPASS even if DISPLAY is not set |
| Junio C Hamano | Re: Rss |
