[PATCH] UBIFS: fix zero-length truncations

Previous thread: 2.6.26 (x86 vs. x86_64) different /sys directory structure for md? by Justin Piszcz on Sunday, August 31, 2008 - 3:56 am. (1 message)

Next thread: ACPI PnP on Intel MU440EX by Martin Doucha on Sunday, August 31, 2008 - 6:22 am. (10 messages)
From: Artem Bityutskiy
Date: Sunday, August 31, 2008 - 7:52 am

Hi,

here is the second bunch of UBIFS updates we would like to send to
Linus for 2.6.27 inclusion. The updates contain the following:

Adrian Hunter (2):
      UBIFS: always read hashed-key nodes under TNC mutex
      UBIFS: allow for racing between GC and TNC

Artem Bityutskiy (10):
      UBIFS: fix zero-length truncations
      UBIFS: do not update min_idx_lebs in stafs
      UBIFS: push empty flash hack down
      UBIFS: remove incorrect index space check
      UBIFS: improve statfs reporting
      UBIFS: fix assertion
      UBIFS: add forgotten gc_idx_lebs component
      UBIFS: introduce LEB overhead
      UBIFS: improve statfs reporting even more
      UBIFS: fill f_fsid

 fs/ubifs/budget.c |  113 +++++++++++++++++++++++++++++++++++++++------------
 fs/ubifs/dir.c    |    1 -
 fs/ubifs/file.c   |   20 +++++++--
 fs/ubifs/find.c   |   18 ++++-----
 fs/ubifs/gc.c     |    6 +++
 fs/ubifs/misc.h   |   49 ++++++++---------------
 fs/ubifs/super.c  |   19 +++++---
 fs/ubifs/tnc.c    |  116 ++++++++++++++++++++++++++++-------------------------
 fs/ubifs/ubifs.h  |   14 +++++-
 9 files changed, 217 insertions(+), 139 deletions(-)

These patches are pure fixes:
UBIFS: fix zero-length truncations
UBIFS: always read hashed-key nodes under TNC mutex
UBIFS: allow for racing between GC and TNC
UBIFS: fix assertion
UBIFS: add forgotten gc_idx_lebs component
UBIFS: remove incorrect index space check

These patches partially fix the problem reported in the MTD ML:
http://lists.infradead.org/pipermail/linux-mtd/2008-August/022579.html

UBIFS: do not update min_idx_lebs in stafs
UBIFS: push empty flash hack down
UBIFS: improve statfs reporting
UBIFS: introduce LEB overhead
UBIFS: improve statfs reporting even more

Thus, it should be fine to send them mainline at this late stage.
Unless there are complaints, I'll send a pull request soon.

Thanks.

-- 
Best regards,
Artem Bityutskiy (
From: Artem Bityutskiy
Date: Sunday, August 31, 2008 - 7:52 am

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Always allow truncations to zero, even if budgeting thinks there
is no space. UBIFS reserves some space for deletions anyway.

Otherwise, the following happans:
1. create a file, and write as much as possible there, until ENOSPC
2. truncate the file, which fails with ENOSPC, which is not good.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/dir.c  |    1 -
 fs/ubifs/file.c |   20 ++++++++++++++++----
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 5c96f1f..2b267c9 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -587,7 +587,6 @@ static int ubifs_unlink(struct inode *dir, struct dentry *dentry)
 	if (err) {
 		if (err != -ENOSPC)
 			return err;
-		err = 0;
 		budgeted = 0;
 	}
 
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 4071d1c..3d698e2 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -793,7 +793,7 @@ static int do_truncation(struct ubifs_info *c, struct inode *inode,
 	int err;
 	struct ubifs_budget_req req;
 	loff_t old_size = inode->i_size, new_size = attr->ia_size;
-	int offset = new_size & (UBIFS_BLOCK_SIZE - 1);
+	int offset = new_size & (UBIFS_BLOCK_SIZE - 1), budgeted = 1;
 	struct ubifs_inode *ui = ubifs_inode(inode);
 
 	dbg_gen("ino %lu, size %lld -> %lld", inode->i_ino, old_size, new_size);
@@ -811,8 +811,15 @@ static int do_truncation(struct ubifs_info *c, struct inode *inode,
 	/* A funny way to budget for truncation node */
 	req.dirtied_ino_d = UBIFS_TRUN_NODE_SZ;
 	err = ubifs_budget_space(c, &req);
-	if (err)
-		return err;
+	if (err) {
+		/*
+		 * Treat truncations to zero as deletion and always allow them,
+		 * just like we do for '->unlink()'.
+		 */
+		if (new_size || err != -ENOSPC)
+			return err;
+		budgeted = 0;
+	}
 
 	err = vmtruncate(inode, new_size);
 	if (err)
@@ -869,7 +876,12 @@ static int do_truncation(struct ubifs_info *c, struct inode *inode,
 	err = ...
From: Artem Bityutskiy
Date: Sunday, August 31, 2008 - 7:52 am

From: Adrian Hunter <ext-adrian.hunter@nokia.com>

Leaf-nodes that have a hashed key are stored in the
leaf-node-cache (LNC) which is protected by the TNC
mutex.  Consequently, when reading a leaf node with
a hashed key (i.e. directory entries, xattr entries)
the TNC mutex is always required.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
 fs/ubifs/tnc.c |    7 +------
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/fs/ubifs/tnc.c b/fs/ubifs/tnc.c
index e909f4a..4fbc592 100644
--- a/fs/ubifs/tnc.c
+++ b/fs/ubifs/tnc.c
@@ -1498,7 +1498,6 @@ static int do_lookup_nm(struct ubifs_info *c, const union ubifs_key *key,
 {
 	int found, n, err;
 	struct ubifs_znode *znode;
-	struct ubifs_zbranch zbr;
 
 	dbg_tnc("name '%.*s' key %s", nm->len, nm->name, DBGKEY(key));
 	mutex_lock(&c->tnc_mutex);
@@ -1522,11 +1521,7 @@ static int do_lookup_nm(struct ubifs_info *c, const union ubifs_key *key,
 		goto out_unlock;
 	}
 
-	zbr = znode->zbranch[n];
-	mutex_unlock(&c->tnc_mutex);
-
-	err = tnc_read_node_nm(c, &zbr, node);
-	return err;
+	err = tnc_read_node_nm(c, &znode->zbranch[n], node);
 
 out_unlock:
 	mutex_unlock(&c->tnc_mutex);
-- 
1.5.4.1

--

From: Artem Bityutskiy
Date: Sunday, August 31, 2008 - 7:52 am

From: Adrian Hunter <ext-adrian.hunter@nokia.com>

The TNC mutex is unlocked prematurely when reading leaf nodes
with non-hashed keys.  This is unsafe because the node may be
moved by garbage collection and the eraseblock unmapped, although
that has never actually happened during stress testing.

This patch fixes the flaw by detecting the race and retrying with
the TNC mutex locked.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
 fs/ubifs/gc.c    |    6 +++
 fs/ubifs/misc.h  |   17 ++++++++
 fs/ubifs/tnc.c   |  109 +++++++++++++++++++++++++++++------------------------
 fs/ubifs/ubifs.h |    6 ++-
 4 files changed, 87 insertions(+), 51 deletions(-)

diff --git a/fs/ubifs/gc.c b/fs/ubifs/gc.c
index d0f3dac..13f1019 100644
--- a/fs/ubifs/gc.c
+++ b/fs/ubifs/gc.c
@@ -344,6 +344,12 @@ int ubifs_garbage_collect_leb(struct ubifs_info *c, struct ubifs_lprops *lp)
 		if (err)
 			goto out;
 
+		/* Allow for races with TNC */
+		c->gced_lnum = lnum;
+		smp_wmb();
+		c->gc_seq += 1;
+		smp_wmb();
+
 		if (c->gc_lnum == -1) {
 			c->gc_lnum = lnum;
 			err = LEB_RETAINED;
diff --git a/fs/ubifs/misc.h b/fs/ubifs/misc.h
index 87dabf9..87ced4c 100644
--- a/fs/ubifs/misc.h
+++ b/fs/ubifs/misc.h
@@ -325,4 +325,21 @@ static inline struct timespec ubifs_current_time(struct inode *inode)
 		current_fs_time(inode->i_sb) : CURRENT_TIME_SEC;
 }
 
+/**
+ * ubifs_tnc_lookup - look up a file-system node.
+ * @c: UBIFS file-system description object
+ * @key: node key to lookup
+ * @node: the node is returned here
+ *
+ * This function look up and reads node with key @key. The caller has to make
+ * sure the @node buffer is large enough to fit the node. Returns zero in case
+ * of success, %-ENOENT if the node was not found, and a negative error code in
+ * case of failure.
+ */
+static inline int ubifs_tnc_lookup(struct ubifs_info *c,
+				   const union ubifs_key *key, void *node)
+{
+	return ubifs_tnc_locate(c, key, node, NULL, NULL);
+}
+
 #endif /* ...
From: Artem Bityutskiy
Date: Sunday, August 31, 2008 - 7:52 am

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

This is bad because the rest of the code should not depend on it,
and this may hide bugss, instead of revealing them.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/budget.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/ubifs/budget.c b/fs/ubifs/budget.c
index 1540981..ac0d2e1 100644
--- a/fs/ubifs/budget.c
+++ b/fs/ubifs/budget.c
@@ -741,7 +741,6 @@ long long ubifs_budg_get_free_space(struct ubifs_info *c)
 
 	available = ubifs_calc_available(c, min_idx_lebs);
 	outstanding = c->budg_data_growth + c->budg_dd_growth;
-	c->min_idx_lebs = min_idx_lebs;
 	spin_unlock(&c->space_lock);
 
 	if (available > outstanding)
-- 
1.5.4.1

--

From: Artem Bityutskiy
Date: Sunday, August 31, 2008 - 7:52 am

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

We have a hack which forces the amount of flash space to be
equivalent to 'c->blocks_cnt' in case of empty FS. This is
to make users happy and see '%0' used in 'df' when they
mount an empty FS. This hack is not needed in
'ubifs_calc_available()', but it is only needed the caller,
in 'ubifs_budg_get_free_space()'. So push it down there.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/budget.c |   24 +++++++++++-------------
 fs/ubifs/super.c  |    2 --
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/ubifs/budget.c b/fs/ubifs/budget.c
index ac0d2e1..f6d2eaa 100644
--- a/fs/ubifs/budget.c
+++ b/fs/ubifs/budget.c
@@ -302,18 +302,6 @@ long long ubifs_calc_available(const struct ubifs_info *c, int min_idx_lebs)
 	int subtract_lebs;
 	long long available;
 
-	/*
-	 * Force the amount available to the total size reported if the used
-	 * space is zero.
-	 */
-	if (c->lst.total_used <= UBIFS_INO_NODE_SZ &&
-	    c->budg_data_growth + c->budg_dd_growth == 0) {
-		/* Do the same calculation as for c->block_cnt */
-		available = c->main_lebs - 2;
-		available *= c->leb_size - c->dark_wm;
-		return available;
-	}
-
 	available = c->main_bytes - c->lst.total_used;
 
 	/*
@@ -739,8 +727,18 @@ long long ubifs_budg_get_free_space(struct ubifs_info *c)
 		return 0;
 	}
 
-	available = ubifs_calc_available(c, min_idx_lebs);
 	outstanding = c->budg_data_growth + c->budg_dd_growth;
+
+	/*
+	 * Force the amount available to the total size reported if the used
+	 * space is zero.
+	 */
+	if (c->lst.total_used <= UBIFS_INO_NODE_SZ && !outstanding) {
+		spin_unlock(&c->space_lock);
+		return (long long)c->block_cnt << UBIFS_BLOCK_SHIFT;
+	}
+
+	available = ubifs_calc_available(c, min_idx_lebs);
 	spin_unlock(&c->space_lock);
 
 	if (available > outstanding)
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index f71e6b8..1018053 100644
--- a/fs/ubifs/super.c
+++ ...
From: Artem Bityutskiy
Date: Sunday, August 31, 2008 - 7:52 am

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

When we report free space to user-space, we should not report
0 if the amount of empty LEBs is too low, because they would
be produced by GC when needed. Thus, just call
'ubifs_calc_available()' straight away which would take
'min_idx_lebs' into account anyway.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/budget.c |   15 +--------------
 1 files changed, 1 insertions(+), 14 deletions(-)

diff --git a/fs/ubifs/budget.c b/fs/ubifs/budget.c
index f6d2eaa..9ef630a 100644
--- a/fs/ubifs/budget.c
+++ b/fs/ubifs/budget.c
@@ -709,24 +709,11 @@ void ubifs_release_dirty_inode_budget(struct ubifs_info *c,
  */
 long long ubifs_budg_get_free_space(struct ubifs_info *c)
 {
-	int min_idx_lebs, rsvd_idx_lebs;
+	int min_idx_lebs;
 	long long available, outstanding, free;
 
-	/* Do exactly the same calculations as in 'do_budget_space()' */
 	spin_lock(&c->space_lock);
 	min_idx_lebs = ubifs_calc_min_idx_lebs(c);
-
-	if (min_idx_lebs > c->lst.idx_lebs)
-		rsvd_idx_lebs = min_idx_lebs - c->lst.idx_lebs;
-	else
-		rsvd_idx_lebs = 0;
-
-	if (rsvd_idx_lebs > c->lst.empty_lebs + c->freeable_cnt + c->idx_gc_cnt
-				- c->lst.taken_empty_lebs) {
-		spin_unlock(&c->space_lock);
-		return 0;
-	}
-
 	outstanding = c->budg_data_growth + c->budg_dd_growth;
 
 	/*
-- 
1.5.4.1

--

From: Artem Bityutskiy
Date: Sunday, August 31, 2008 - 7:52 am

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Make free space calculation less pessimistic and more realistic,
which in turn improves 'statfs()' reports. Now it lies by 10%-20%,
instead of 20%-30% (10% more honest).

Results of "freespace" test (120MiB volume, 16KiB LEB size,
512 bytes page size). Before the change:

freespace: Test 1: fill the space we have 3 times
freespace: was free: 78274560 bytes 74.6 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 18214912 bytes 17.4 MiB, wrote 23.3% more than predicted
freespace: was free: 76754944 bytes 73.2 MiB, wrote: 96493568 bytes 92.0 MiB, delta: 19738624 bytes 18.8 MiB, wrote 25.7% more than predicted
freespace: was free: 76759040 bytes 73.2 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 19730432 bytes 18.8 MiB, wrote 25.7% more than predicted
freespace: Test 1 finished

freespace: Test 2: gradually lessen amount of free space and fill the FS
freespace: do 10 steps, lessen free space by 6977722 bytes 6.7 MiB each time
freespace: was free: 72273920 bytes 68.9 MiB, wrote: 88891392 bytes 84.8 MiB, delta: 16617472 bytes 15.8 MiB, wrote 23.0% more than predicted
freespace: was free: 66154496 bytes 63.1 MiB, wrote: 81506304 bytes 77.7 MiB, delta: 15351808 bytes 14.6 MiB, wrote 23.2% more than predicted
freespace: was free: 58732544 bytes 56.0 MiB, wrote: 72572928 bytes 69.2 MiB, delta: 13840384 bytes 13.2 MiB, wrote 23.6% more than predicted
freespace: was free: 51552256 bytes 49.2 MiB, wrote: 63754240 bytes 60.8 MiB, delta: 12201984 bytes 11.6 MiB, wrote 23.7% more than predicted
freespace: was free: 44404736 bytes 42.3 MiB, wrote: 54943744 bytes 52.4 MiB, delta: 10539008 bytes 10.1 MiB, wrote 23.7% more than predicted
freespace: was free: 37285888 bytes 35.6 MiB, wrote: 46161920 bytes 44.0 MiB, delta: 8876032 bytes 8.5 MiB, wrote 23.8% more than predicted
freespace: was free: 30171136 bytes 28.8 MiB, wrote: 37384192 bytes 35.7 MiB, delta: 7213056 bytes 6.9 MiB, wrote 23.9% more than predicted
freespace: was free: 23048192 bytes 22.0 MiB, ...
From: Artem Bityutskiy
Date: Sunday, August 31, 2008 - 7:52 am

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

The assertion was incorrect, because it did not take into
account free space.

This patch also amends the comments correspondingly, and
cleans them up a little.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/find.c |   16 +++++++---------
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/ubifs/find.c b/fs/ubifs/find.c
index adee7b5..9fc55ae 100644
--- a/fs/ubifs/find.c
+++ b/fs/ubifs/find.c
@@ -211,14 +211,8 @@ static const struct ubifs_lprops *scan_for_dirty(struct ubifs_info *c,
  * dirty index heap, and it falls-back to LPT scanning if the heaps are empty
  * or do not have an LEB which satisfies the @min_space criteria.
  *
- * Note:
- *   o LEBs which have less than dead watermark of dirty space are never picked
- *   by this function;
- *
- * Returns zero and the LEB properties of
- * found dirty LEB in case of success, %-ENOSPC if no dirty LEB was found and a
- * negative error code in case of other failures. The returned LEB is marked as
- * "taken".
+ * Note, LEBs which have less than dead watermark of free + dirty space are
+ * never picked by this function.
  *
  * The additional @pick_free argument controls if this function has to return a
  * free or freeable LEB if one is present. For example, GC must to set it to %1,
@@ -231,6 +225,10 @@ static const struct ubifs_lprops *scan_for_dirty(struct ubifs_info *c,
  *
  * In addition @pick_free is set to %2 by the recovery process in order to
  * recover gc_lnum in which case an index LEB must not be returned.
+ *
+ * This function returns zero and the LEB properties of found dirty LEB in case
+ * of success, %-ENOSPC if no dirty LEB was found and a negative error code in
+ * case of other failures. The returned LEB is marked as "taken".
  */
 int ubifs_find_dirty_leb(struct ubifs_info *c, struct ubifs_lprops *ret_lp,
 			 int min_space, int pick_free)
@@ -317,7 +315,7 @@ int ubifs_find_dirty_leb(struct ...
From: Artem Bityutskiy
Date: Sunday, August 31, 2008 - 7:52 am

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

We add this component at other similar places, but not in this
one.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/find.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ubifs/find.c b/fs/ubifs/find.c
index 9fc55ae..e045c8b 100644
--- a/fs/ubifs/find.c
+++ b/fs/ubifs/find.c
@@ -243,7 +243,7 @@ int ubifs_find_dirty_leb(struct ubifs_info *c, struct ubifs_lprops *ret_lp,
 		int lebs, rsvd_idx_lebs = 0;
 
 		spin_lock(&c->space_lock);
-		lebs = c->lst.empty_lebs;
+		lebs = c->lst.empty_lebs + c->idx_gc_cnt;
 		lebs += c->freeable_cnt - c->lst.taken_empty_lebs;
 
 		/*
-- 
1.5.4.1

--

From: Artem Bityutskiy
Date: Sunday, August 31, 2008 - 7:52 am

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

This is a preparational patch for the following statfs()
report fix.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/super.c |    6 ++++++
 fs/ubifs/ubifs.h |    5 +++++
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 1018053..be23fd3 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -530,6 +530,12 @@ static int init_constants_early(struct ubifs_info *c)
 	c->dead_wm = ALIGN(MIN_WRITE_SZ, c->min_io_size);
 	c->dark_wm = ALIGN(UBIFS_MAX_NODE_SZ, c->min_io_size);
 
+	/*
+	 * Calculate how many bytes would be wasted at the end of LEB if it was
+	 * fully filled with data nodes of maximum size. This is used in
+	 * calculations when reporting free space.
+	 */
+	c->leb_overhead = c->leb_size % UBIFS_MAX_DATA_NODE_SZ;
 	return 0;
 }
 
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index 681d46e..57e5854 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -995,6 +995,9 @@ struct ubifs_mount_opts {
  * @max_idx_node_sz: maximum indexing node aligned on 8-bytes boundary
  * @max_inode_sz: maximum possible inode size in bytes
  * @max_znode_sz: size of znode in bytes
+ *
+ * @leb_overhead: how many bytes are wasted in an LEB when it is filled with
+ *                data nodes of maximum size - used in free space reporting
  * @dead_wm: LEB dead space watermark
  * @dark_wm: LEB dark space watermark
  * @block_cnt: count of 4KiB blocks on the FS
@@ -1226,6 +1229,8 @@ struct ubifs_info {
 	int max_idx_node_sz;
 	long long max_inode_sz;
 	int max_znode_sz;
+
+	int leb_overhead;
 	int dead_wm;
 	int dark_wm;
 	int block_cnt;
-- 
1.5.4.1

--

From: Artem Bityutskiy
Date: Sunday, August 31, 2008 - 7:52 am

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Since free space we report in statfs is file size which should
fit to the FS - change the way we calculate free space and use
leb_overhead instead of dark_wm in calculations.

Results of "freespace" test (120MiB volume, 16KiB LEB size,
512 bytes page size). Before the change:

freespace: Test 1: fill the space we have 3 times
freespace: was free: 85204992 bytes 81.3 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 11284480 bytes 10.8 MiB, wrote 13.2% more than predicted
freespace: was free: 83554304 bytes 79.7 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 12935168 bytes 12.3 MiB, wrote 15.5% more than predicted
freespace: was free: 83554304 bytes 79.7 MiB, wrote: 96493568 bytes 92.0 MiB, delta: 12939264 bytes 12.3 MiB, wrote 15.5% more than predicted
freespace: Test 1 finished

freespace: Test 2: gradually lessen amount of free space and fill the FS
freespace: do 10 steps, lessen free space by 7596218 bytes 7.2 MiB each time
freespace: was free: 78675968 bytes 75.0 MiB, wrote: 88903680 bytes 84.8 MiB, delta: 10227712 bytes 9.8 MiB, wrote 13.0% more than predicted
freespace: was free: 72015872 bytes 68.7 MiB, wrote: 81514496 bytes 77.7 MiB, delta: 9498624 bytes 9.1 MiB, wrote 13.2% more than predicted
freespace: was free: 63938560 bytes 61.0 MiB, wrote: 72589312 bytes 69.2 MiB, delta: 8650752 bytes 8.2 MiB, wrote 13.5% more than predicted
freespace: was free: 56127488 bytes 53.5 MiB, wrote: 63762432 bytes 60.8 MiB, delta: 7634944 bytes 7.3 MiB, wrote 13.6% more than predicted
freespace: was free: 48336896 bytes 46.1 MiB, wrote: 54935552 bytes 52.4 MiB, delta: 6598656 bytes 6.3 MiB, wrote 13.7% more than predicted
freespace: was free: 40587264 bytes 38.7 MiB, wrote: 46157824 bytes 44.0 MiB, delta: 5570560 bytes 5.3 MiB, wrote 13.7% more than predicted
freespace: was free: 32841728 bytes 31.3 MiB, wrote: 37384192 bytes 35.7 MiB, delta: 4542464 bytes 4.3 MiB, wrote 13.8% more than predicted
freespace: was free: 25100288 bytes 23.9 MiB, wrote: ...
From: Artem Bityutskiy
Date: Sunday, August 31, 2008 - 7:52 am

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBIFS stores 16-bit UUID in the superblock, and it is a good
idea to return part of it in 'f_fsid' filed of kstatfs structure.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/super.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 1207bd5..0dee404 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -386,6 +386,7 @@ static int ubifs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	buf->f_files = 0;
 	buf->f_ffree = 0;
 	buf->f_namelen = UBIFS_MAX_NLEN;
+	memcpy(&buf->f_fsid, c->uuid, sizeof(__kernel_fsid_t));
 
 	return 0;
 }
-- 
1.5.4.1

--

From: David Woodhouse
Date: Monday, September 1, 2008 - 2:43 am

For btrfs I xor the first 64 bits with the second 64 bits, and put
_that_ into f_fsid. You're just putting the first 64 bits in and
ignoring the second 64 bits. Neither is really _better_ than the other;
you just alter the circumstances in which you get collisions. But I
suppose we might as well be consistent about how we do it?

(Actually I XOR in the root object ID for the subvol too, but you don't
need to worry about that).

Alternatively, there's space in the struct statfs to export a couple
more uint32_ts, and thus the _whole_ of the uuid. Perhaps we should do
that?

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation



--

From: Artem Bityutskiy
Date: Monday, September 1, 2008 - 4:16 am

Well, xor-ing should not make the random UUID more random, but for

You mean f_spare? I do not think any application would use it so I doubt
it is good idea to use it.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)

--

From: David Woodhouse
Date: Monday, September 1, 2008 - 4:28 am

Well, it wouldn't be called f_spare any more, and applications would
know that it contains the rest of the UUID if it's non-zero.

I'd fix nfsd to fix it immediately (like as I just fixed nfsd to use the
first 64 bits of f_fsid when it's non-zero).

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation



--

From: Artem Bityutskiy
Date: Monday, September 1, 2008 - 4:43 am

May be, but there are only 5 bytes left. But why would you need more
UUID bytes?

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)

--

From: David Woodhouse
Date: Monday, September 1, 2008 - 4:50 am

People are mostly using 16-byte UUIDs. We only have space to expose 8
bytes. It just seems to make sense to expose all 16.

But there's no overriding reason to do it, really.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation



--

From: Artem Bityutskiy
Date: Monday, September 1, 2008 - 4:56 am

Yeah, I would not go for it unless there is a specific problem
which should be solved with this.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

From: Christoph Hellwig
Date: Monday, September 1, 2008 - 8:01 am

XFS just puts in the st_dev.  And I can't realy find any useful
defintion of what it's supposed to b anyway..

--

From: Artem Bityutskiy
Date: Tuesday, September 2, 2008 - 12:03 am

For me this means that we should rather do what XFS does for
consistency then.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)

--

From: Andreas Dilger
Date: Tuesday, September 2, 2008 - 10:09 am

The fsid is supposed to be a persistent, unique identifier for the
filesystem, used by NFS in file handles.  Using st_dev is unsafe,
because that may change from one server boot to the next, because
of device probing order, driver changes, etc.  Also, not all filesystems
HAVE a valid st_dev in the first place, which is the whole reason
for this thread.

I think a ->get_fsid() export method would be preferable.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--

From: David Woodhouse
Date: Tuesday, September 2, 2008 - 10:29 am

I implemented that, but it doesn't really work. The fsid->export mapping
happens in userspace, so mountd needs access to the fsid. So the mount
would work fine and return a FH with the appropriate fsid, and then
mountd would have no knowledge of how to handle that FH, and mount(8) on
the client would eventually time out and fail.

It worked if I prepopulated the nfsd.fh cache in expkey_parse() so we
didn't end up asking userspace to resolve that FH for us -- but that was
just a quick hack. It wouldn't have worked after a server reboot, for
example -- we'd have asked userspace again.

We'd need to export that fsid to userspace somehow. I did briefly think
about adding something like ',uuid=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' to
each line in /proc/mounts, as if it was a file system option -- but I
don't like that much.

When looking for other options, I realised that we already have the
f_fsid field in struct statfs, and we might as well just use that. That
does seem to be what it was _designed_ for, after all.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation



--

From: Andreas Dilger
Date: Tuesday, September 2, 2008 - 12:13 pm

My bad - I had looked back in email archives about this issue and got
the export operation solution stuck in my head...

Yes, using f_fsid is the right way to do this, assuming NFSD plays along.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--

From: David Woodhouse
Date: Tuesday, September 2, 2008 - 12:32 pm

It does now, if it can't find a UUID by other means.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation



--

From: Christoph Hellwig
Date: Tuesday, September 2, 2008 - 2:02 pm

Umm, different things.  f_fsid in stat(v)fs is just a cookie exported to
userspac that has never really been documented.

We also called the filesystem part of the NFS filehandle in a few
places, and for those it's correct that it should be stable.  Currently
the fsid is either created from the dev_t in kernelspace or from
uuids extracted through libuuid in userspace.

I can't see anything in the message that started this thread that
mentions NFS, btw.
--

From: David Woodhouse
Date: Tuesday, September 2, 2008 - 2:26 pm

That was Artem's motivation for filling in f_fsid in the first place.
Does anything else even _care_ about f_fsid? 

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation



--

From: Christoph Hellwig
Date: Tuesday, September 2, 2008 - 2:48 pm

I think that's a rather bad idea.  As mentioned before f_fsid is
basically random.

--

From: David Woodhouse
Date: Tuesday, September 2, 2008 - 3:12 pm

It's not that bad. Of the file systems which actually fill it in...

For NTFS it's the volume serial number.

For ext[234] it's the UUID xor-folded into 64 bits.
For btrfs the same, but with the root object ID xor'd in too.

For bfs and xfs it's the block device -- which isn't ideal, as Andreas
points out, but is what we'd fall back to anyway.

BFS seems to have a s_volume field which perhaps we could use, but we
should check for it being all spaces and leave f_fsid as zero in that
case. Presumably XFS could be using sb_uuid?

For efs, it'll be either EFS_MAGIC or EFS_NEWMAGIC, which is probably a
bad thing if you want to export multiple EFS file systems. I'll send a
patch to fix that (by leaving it zero) shortly.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation



--

From: Artem Bityutskiy
Date: Tuesday, September 2, 2008 - 11:20 pm

Yes, right. I am planning NFS support and would like to have UBIFS

I do not know anything but NFS.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)

--

From: David Woodhouse
Date: Tuesday, September 2, 2008 - 3:32 pm

Our man page for statfs(2) says...

 The f_fsid field

   Solaris,  Irix  and  POSIX have a system call statvfs(2) that returns a
   struct statvfs (defined in <sys/statvfs.h>) containing an unsigned long
   f_fsid.   Linux,  SunOS, HP-UX, 4.4BSD have a system call statfs() that
   returns a struct statfs (defined in <sys/vfs.h>)  containing  a  fsid_t
   f_fsid,  where  fsid_t  is defined as struct { int val[2]; }.  The same
   holds for FreeBSD, except that it uses the include file  <sys/mount.h>.

   The  general  idea  is that f_fsid contains some random stuff such that
   the pair (f_fsid,ino) uniquely determines a file.   Some  OSes  use  (a
   variation on) the device number, or the device number combined with the
   filesystem type.  Several OSes restrict giving out the f_fsid field  to
   the  superuser  only (and zero it for unprivileged users), because this
   field is used in the filehandle of the  filesystem  when  NFS-exported,
   and giving it out is a security concern.

   Under some OSes the fsid can be used as second parameter to the sysfs()
   system call.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation



--

From: Artem Bityutskiy
Date: Wednesday, September 3, 2008 - 2:44 am

Like this?

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Date: Wed, 3 Sep 2008 14:16:42 +0300
Subject: [PATCH] UBIFS: amend f_fsid

David Woodhouse suggested to be consistent with other FSes
and xor the beginning and the end of the UUID.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/super.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index ab9b5cb..83fe0a5 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -370,6 +370,7 @@ static int ubifs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct ubifs_info *c = dentry->d_sb->s_fs_info;
 	unsigned long long free;
+	__le32 *uuid = (__le32 *)c->uuid;
 
 	free = ubifs_get_free_space(c);
 	dbg_gen("free space %lld bytes (%lld blocks)",
@@ -386,8 +387,8 @@ static int ubifs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	buf->f_files = 0;
 	buf->f_ffree = 0;
 	buf->f_namelen = UBIFS_MAX_NLEN;
-	memcpy(&buf->f_fsid, c->uuid, sizeof(__kernel_fsid_t));
-
+	buf->f_fsid.val[0] = le32_to_cpu(uuid[0]) ^ le32_to_cpu(uuid[2]);
+	buf->f_fsid.val[1] = le32_to_cpu(uuid[1]) ^ le32_to_cpu(uuid[3]);
 	return 0;
 }
 
-- 
1.5.4.1


--

From: David Woodhouse
Date: Wednesday, September 3, 2008 - 3:14 am

Yeah, that's what ext[234] and btrfs do.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation



--

From: Artem Bityutskiy
Date: Sunday, August 31, 2008 - 7:32 am

Oh, meant to send this to linux-fsdevel@vger.kernel.org, not to
linux-fsdevel-owner@vger.kernel.org. Apologies.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

Previous thread: 2.6.26 (x86 vs. x86_64) different /sys directory structure for md? by Justin Piszcz on Sunday, August 31, 2008 - 3:56 am. (1 message)

Next thread: ACPI PnP on Intel MU440EX by Martin Doucha on Sunday, August 31, 2008 - 6:22 am. (10 messages)