Re: [PATCH 02/27] nilfs2: disk format and userland interface

From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:07 pm

This is a new patch set for the nilfs2 file system.

The first submission is found at:

 http://marc.info/?l=linux-fsdevel&m=121920195516073

The old patch was a single undivided patch; this time I have divided it
into 27 patches:

 Patch #1 adds a document to Documentation/filesystems/,
 patch #2 adds a header file for the disk format to include/linux/,
 patches #3-#26 add the nilfs2 source files to fs/nilfs2/,
 and patch #27 updates the Makefile and Kconfig.

This patch set also includes some cleanups and improvements.  The main
changes from the previous patch are as follows:

 * Use the standard mm/ page cache instead of a custom page cache
   implementation to hold B-tree node buffers; the custom page cache
   was removed as a result.

 * Integrate the two similar allocators found in the DAT and the inode file.

 * Read requests for GC blocks are now submitted in parallel to
   mitigate GC overhead.

The cleanups reduced the code by more than 2,000 lines.


The patch set is available from

 http://www.nilfs.org/pub/patch/nilfs2-2.6.27-rc6/

If you prefer a git tree:

 http://git.nilfs.org/nilfs2-2.6.git nilfs2
 ( gitweb: http://www.nilfs.org/git/?p=nilfs2-2.6.git )

The userland tools are included in the nilfs-utils package
which is available on

 http://www.nilfs.org/en/download.html


Example:
In this example, /dev/sdb1 is used as a nilfs2 partition.

- To use nilfs2 as a local file system, simply:

 # mkfs -t nilfs2 /dev/sdb1
 # mount -t nilfs2 /dev/sdb1 /dir

 This will also invoke the cleaner through the mount helper program
 (mount.nilfs2).

- Checkpoints and snapshots are managed by the following commands.
 Their manpages are included in the nilfs-utils package.

  lscp     list checkpoints or snapshots.
  mkcp     make a checkpoint or a snapshot.
  chcp     change an existing checkpoint to a snapshot or vice versa.
  rmcp     invalidate specified checkpoint(s).

 For example,

 # lscp /dev/sdb1

 will list checkpoints on the device.
 The block device argument is ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:07 pm

This adds a document describing the features, mount options, userland
tools, usage, disk format, and related URLs for the nilfs2 file
system.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 Documentation/filesystems/00-INDEX   |    2 +
 Documentation/filesystems/nilfs2.txt |  202 ++++++++++++++++++++++++++++++++++
 2 files changed, 204 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/nilfs2.txt

diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 52cd611..8dd6db7 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -68,6 +68,8 @@ ncpfs.txt
 	- info on Novell Netware(tm) filesystem using NCP protocol.
 nfsroot.txt
 	- short guide on setting up a diskless box with NFS root filesystem.
+nilfs2.txt
+	- info and mount options for the NILFS2 filesystem.
 ntfs.txt
 	- info and mount options for the NTFS filesystem (Windows NT).
 ocfs2.txt
diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt
new file mode 100644
index 0000000..f323b01
--- /dev/null
+++ b/Documentation/filesystems/nilfs2.txt
@@ -0,0 +1,202 @@
+NILFS2
+------
+
+NILFS2 is a log-structured file system (LFS) supporting continuous
+snapshotting.  In addition to versioning capability of the entire file
+system, users can even restore files mistakenly overwritten or
+destroyed just a few seconds ago.  Since NILFS2 can keep consistency
+like conventional LFS, it achieves quick recovery after system
+crashes.
+
+NILFS2 creates a number of checkpoints every few seconds or per
+synchronous write basis (unless there is no change).  Users can select
+significant versions among continuously created checkpoints, and can
+change them into snapshots which will be preserved until they are
+changed back to checkpoints.
+
+There is no limit on the number of snapshots until the volume gets
+full.  Each snapshot is mountable as a read-only file system
+concurrently ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:07 pm

From: Koji Sato <sato.koji@lab.ntt.co.jp>

This adds a header file which specifies the on-disk format and ioctl
interface of the nilfs2 file system.

Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 include/linux/nilfs2_fs.h |  854 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 854 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/nilfs2_fs.h

diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
new file mode 100644
index 0000000..e38fad2
--- /dev/null
+++ b/include/linux/nilfs2_fs.h
@@ -0,0 +1,854 @@
+/*
+ * nilfs2_fs.h - NILFS2 on-disk structures and common declarations.
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Koji Sato <koji@osrg.net>
+ *            Ryusuke Konishi <ryusuke@osrg.net>
+ */
+/*
+ *  linux/include/linux/ext2_fs.h
+ *
+ * Copyright (C) 1992, 1993, 1994, 1995
+ * Remy Card (card@masi.ibp.fr)
+ * Laboratoire MASI - Institut Blaise Pascal
+ * Universite Pierre et Marie Curie (Paris VI)
+ *
+ *  from
+ *
+ *  linux/include/linux/minix_fs.h
+ *
+ *  Copyright (C) 1991, 1992  Linus Torvalds
+ */
+
+#ifndef _LINUX_NILFS_FS_H
+#define ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds the following common structures of the NILFS2 file system.

* nilfs_inode_info structure:
  represents the in-memory inode.

* nilfs_sb_info structure:
  keeps per-mount state and a special inode for the ifile.
  This structure is attached to the super_block structure.

* the_nilfs structure:
  keeps shared state and locks among a read/write mount and snapshot
  mounts.  This keeps special inodes for the sufile, cpfile, dat, and
  another dat inode used during GC (gcdat).  This also has a hash table
  of dummy inodes to cache disk blocks during GC (gcinodes).

* nilfs_transaction_info structure:
  keeps per-task state while nilfs is writing logs or doing indivisible
  inode or namespace operations.  This structure is used to identify the
  context during log making and to store the nest level of the lock which
  ensures atomicity of file system operations.
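
The nest-level bookkeeping mentioned for nilfs_transaction_info can be
pictured with a small userspace sketch.  This is only an illustration under
assumptions: the names txn_info, txn_begin() and txn_end() are hypothetical
and do not come from the patch, and a boolean stands in for the real lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task transaction state; not the patch's real layout. */
struct txn_info {
	int  nest_level;   /* number of outstanding nested begin calls */
	bool lock_held;    /* stands in for the lock guarding atomicity */
};

/* Enter a transaction; only the outermost call takes the lock. */
static void txn_begin(struct txn_info *ti)
{
	if (ti->nest_level++ == 0)
		ti->lock_held = true;
}

/* Leave a transaction; only the outermost call drops the lock. */
static void txn_end(struct txn_info *ti)
{
	assert(ti->nest_level > 0);
	if (--ti->nest_level == 0)
		ti->lock_held = false;
}
```

With a counter like this, an operation that is already inside a transaction
can call helpers that open one themselves without taking the lock twice.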

Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/nilfs.h     |  323 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nilfs2/sb.h        |  102 ++++++++++++++++
 fs/nilfs2/the_nilfs.h |  290 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 715 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/nilfs.h
 create mode 100644 fs/nilfs2/sb.h
 create mode 100644 fs/nilfs2/the_nilfs.h

diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
new file mode 100644
index 0000000..c33b8db
--- /dev/null
+++ b/fs/nilfs2/nilfs.h
@@ -0,0 +1,323 @@
+/*
+ * nilfs.h - NILFS local header file.
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

From: Koji Sato <sato.koji@lab.ntt.co.jp>

This adds structures and operations for the block mapping
(bmap for short). NILFS2 uses direct mappings for short files or
B-tree based mappings for longer files.

Every on-disk data block is held with inodes and managed through this
block mapping.  The nilfs_bmap structure and a set of functions here
provide this capability to the NILFS2 inode.
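
As a rough illustration of the split between direct and B-tree mappings,
the sketch below switches a mapping to a B-tree once a key no longer fits
in the direct slots.  The slot count, the type names and bmap_note_insert()
are assumptions made for this example, not definitions from bmap.c, and the
reverse conversion is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit: keys below this fit in the inode's direct slots. */
#define SKETCH_DIRECT_SLOTS 6u

enum bmap_kind { BMAP_DIRECT, BMAP_BTREE };

/* Hypothetical mapping state; only the mapping type is tracked here. */
struct sketch_bmap {
	enum bmap_kind kind;
};

/* Record an insertion at file block offset `key`, converting if needed. */
static void bmap_note_insert(struct sketch_bmap *b, uint64_t key)
{
	assert(b != NULL);
	if (b->kind == BMAP_DIRECT && key >= SKETCH_DIRECT_SLOTS)
		b->kind = BMAP_BTREE;   /* too large for direct pointers */
}
```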

Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/bmap.c       |  783 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nilfs2/bmap.h       |  251 ++++++++++++++++
 fs/nilfs2/bmap_union.h |   42 +++
 3 files changed, 1076 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/bmap.c
 create mode 100644 fs/nilfs2/bmap.h
 create mode 100644 fs/nilfs2/bmap_union.h

diff --git a/fs/nilfs2/bmap.c b/fs/nilfs2/bmap.c
new file mode 100644
index 0000000..4c6b481
--- /dev/null
+++ b/fs/nilfs2/bmap.c
@@ -0,0 +1,783 @@
+/*
+ * bmap.c - NILFS block mapping.
+ *
+ * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Koji Sato <koji@osrg.net>.
+ */
+
+#include <linux/fs.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

From: Koji Sato <sato.koji@lab.ntt.co.jp>

This adds declarations and functions of NILFS2 B-tree.

Two variants are integrated in the NILFS2 B-tree.  The B-tree for most
files points to the child nodes or data blocks with virtual block
addresses, whereas the B-tree of the DAT uses actual block addresses.
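
The difference between the two variants can be sketched as a pointer
resolution step.  This is a userspace illustration under assumptions: the
table-based sketch_dat structure and resolve_ptr() are hypothetical, and a
flat array stands in for the real DAT lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation table standing in for the DAT. */
struct sketch_dat {
	const uint64_t *phys;   /* phys[vblocknr] = actual block number */
	size_t          nr;     /* number of table entries */
};

enum ptr_kind { PTR_VIRTUAL, PTR_PHYSICAL };

/* Resolve a B-tree pointer to an actual block number; 0 means failure. */
static uint64_t resolve_ptr(const struct sketch_dat *dat,
			    enum ptr_kind kind, uint64_t ptr)
{
	if (kind == PTR_PHYSICAL)
		return ptr;             /* DAT's own tree: already actual */
	assert(dat != NULL);
	return ptr < dat->nr ? dat->phys[ptr] : 0;
}
```

One plausible reading of the design is that the DAT's own tree has to use
actual addresses, since translating its pointers would require the DAT
itself.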

Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/btree.c | 2276 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nilfs2/btree.h |  117 +++
 2 files changed, 2393 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/btree.c
 create mode 100644 fs/nilfs2/btree.h

diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
new file mode 100644
index 0000000..893f019
--- /dev/null
+++ b/fs/nilfs2/btree.c
@@ -0,0 +1,2276 @@
+/*
+ * btree.c - NILFS B-tree.
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Koji Sato <koji@osrg.net>.
+ */
+
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/pagevec.h>
+#include "nilfs.h"
+#include "page.h"
+#include "btnode.h"
+#include "btree.h"
+#include "alloc.h"
+
+/**
+ * struct nilfs_btree_path - A path on which B-tree ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

From: Koji Sato <sato.koji@lab.ntt.co.jp>

This adds block mappings using direct pointers which are stored in the
i_bmap array of the inode.
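
A direct mapping amounts to an array lookup keyed by the file block offset.
The sketch below shows the idea in userspace C; the slot count, the invalid
pointer value and the function names are assumptions for illustration, and
the little-endian decoding only mirrors the __le64 type visible in the
patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSLOTS      6u   /* hypothetical number of direct slots */
#define SKETCH_INVALID_PTR 0u   /* hypothetical "no block" marker */

/* Decode a little-endian 64-bit value byte by byte (portable). */
static uint64_t le64_decode(const uint8_t b[8])
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | b[i];
	return v;
}

/* Look up the block pointer stored for file block offset `key`. */
static uint64_t direct_lookup(const uint8_t slots[][8], uint64_t key)
{
	if (key >= SKETCH_NSLOTS)
		return SKETCH_INVALID_PTR;   /* outside the direct range */
	return le64_decode(slots[key]);
}
```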

Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/direct.c |  429 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nilfs2/direct.h |   78 ++++++++++
 2 files changed, 507 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/direct.c
 create mode 100644 fs/nilfs2/direct.h

diff --git a/fs/nilfs2/direct.c b/fs/nilfs2/direct.c
new file mode 100644
index 0000000..303d7f1
--- /dev/null
+++ b/fs/nilfs2/direct.c
@@ -0,0 +1,429 @@
+/*
+ * direct.c - NILFS direct block pointer.
+ *
+ * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Koji Sato <koji@osrg.net>.
+ */
+
+#include <linux/errno.h>
+#include "nilfs.h"
+#include "page.h"
+#include "direct.h"
+#include "alloc.h"
+
+static inline __le64 *nilfs_direct_dptrs(const struct nilfs_direct *direct)
+{
+	return (__le64 *)
+		((struct nilfs_direct_node *)direct->d_bmap.b_u.u_data + 1);
+}
+
+static inline __u64
+nilfs_direct_get_ptr(const struct nilfs_direct *direct, __u64 key)
+{
+	return ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds routines for B-tree node buffers.

Signed-off-by: Seiji Kihara <kihara.seiji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/btnode.c |  316 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nilfs2/btnode.h |   58 ++++++++++
 2 files changed, 374 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/btnode.c
 create mode 100644 fs/nilfs2/btnode.h

diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
new file mode 100644
index 0000000..4cc07b2
--- /dev/null
+++ b/fs/nilfs2/btnode.c
@@ -0,0 +1,316 @@
+/*
+ * btnode.c - NILFS B-tree node cache
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * This file was originally written by Seiji Kihara <kihara@osrg.net>
+ * and fully revised by Ryusuke Konishi <ryusuke@osrg.net> for
+ * stabilization and simplification.
+ *
+ */
+
+#include <linux/types.h>
+#include <linux/buffer_head.h>
+#include <linux/mm.h>
+#include <linux/backing-dev.h>
+#include "nilfs.h"
+#include "mdt.h"
+#include "dat.h"
+#include "page.h"
+#include "btnode.h"
+
+
+void nilfs_btnode_cache_init_once(struct address_space *btnc)
+{
+	INIT_RADIX_TREE(&btnc->page_tree, ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds common routines for buffer/page operations used in B-tree
node caches, meta data files, and the segment constructor (log writer).

NILFS uses copy functions for buffers and pages for the following
reasons:

 1) Relocation required for COW
    Since NILFS changes the address of on-disk blocks, buffers that are
    not addressed by a file offset must be moved within the page cache.
    If the buffer size is smaller than the page size, this involves a
    partial copy of pages.

 2) Freezing mmapped pages
    NILFS calculates checksums for each log to ensure its validity.
    If page data changes after the checksum calculation, this validity
    check will not work correctly.  To avoid this failure for mmapped
    pages, NILFS freezes their data by copying.

 3) Copy-on-write for DAT pages
    NILFS makes clones of DAT page caches in a copy-on-write manner
    during GC processes, and this ensures atomicity and consistency
    of the DAT in the transient state.
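
The freezing idea in reason 2) can be sketched in userspace as "copy first,
then checksum the copy".  This is an illustration under assumptions: the
function names are hypothetical, and the bitwise CRC-32 is a stand-in, not
necessarily the checksum routine the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320); a stand-in only. */
static uint32_t sketch_crc32(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Copy `len` bytes of a live page into `frozen` and checksum the copy,
 * so later writes to `live` cannot invalidate the recorded checksum.
 */
static uint32_t freeze_and_checksum(uint8_t *frozen, const uint8_t *live,
				    size_t len)
{
	assert(frozen != NULL && live != NULL);
	memcpy(frozen, live, len);
	return sketch_crc32(frozen, len);
}
```

If the frozen copy is what gets written, a store through the mapping after
the checksum was taken changes only the live data, and the written log still
matches its checksum.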

In addition, NILFS uses two obsolete functions: nilfs_mark_buffer_dirty()
and nilfs_clear_page_dirty().

* nilfs_mark_buffer_dirty() was required to avoid NULL pointer
  dereference faults:

  Since the page cache of B-tree node pages or data page cache of pseudo
  inodes does not have a valid mapping->host, calling mark_buffer_dirty()
  for their buffers causes the fault; it calls __mark_inode_dirty(NULL)
  through __set_page_dirty().

* nilfs_clear_page_dirty() was needed in two cases:

 1) For B-tree node pages and data pages of the dat/gcdat, NILFS2 clears
    page dirty flags when it copies back pages from the cloned cache
    (gcdat->{i_mapping,i_btnode_cache}) to its original cache
    (dat->{i_mapping,i_btnode_cache}).

 2) Some B-tree operations like insertion or deletion may dispose of
    buffers in dirty state, which requires cancelling the dirty state of their
    pages.  clear_page_dirty_for_io() caused faults because it does not
    clear the dirty tag on the page ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds the meta data file, which provides common buffer
functions for the DAT, sufile, cpfile, ifile, and so forth.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/mdt.c |  562 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nilfs2/mdt.h |  125 ++++++++++++
 2 files changed, 687 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/mdt.c
 create mode 100644 fs/nilfs2/mdt.h

diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c
new file mode 100644
index 0000000..6ab8475
--- /dev/null
+++ b/fs/nilfs2/mdt.c
@@ -0,0 +1,562 @@
+/*
+ * mdt.c - meta data file for NILFS
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Ryusuke Konishi <ryusuke@osrg.net>
+ */
+
+#include <linux/buffer_head.h>
+#include <linux/mpage.h>
+#include <linux/mm.h>
+#include <linux/writeback.h>
+#include <linux/backing-dev.h>
+#include <linux/swap.h>
+#include "nilfs.h"
+#include "segment.h"
+#include "page.h"
+#include "mdt.h"
+
+
+#define NILFS_MDT_MAX_RA_BLOCKS		(16 - 1)
+
+#define INIT_UNUSED_INODE_FIELDS
+
+static int
+nilfs_mdt_insert_new_block(struct inode *inode, unsigned long block,
+			   struct buffer_head *bh,
+			   void (*init_block)(struct inode *,
+					      struct ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds common functions to allocate or deallocate entries with
bitmaps on a meta data file.  This feature is used by the DAT and
ifile.
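
Bitmap-based allocation of entries can be sketched as follows.  This is a
minimal userspace illustration, assuming a single 64-entry bitmap; the names
and the fixed size are hypothetical and say nothing about how alloc.c lays
out its bitmaps on a meta data file.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ENTRIES 64u   /* hypothetical entries per bitmap */

/* Allocate the lowest free entry; returns its index or -1 if full. */
static int bitmap_alloc(uint64_t *bitmap)
{
	for (unsigned i = 0; i < SKETCH_ENTRIES; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if (!(*bitmap & bit)) {
			*bitmap |= bit;
			return (int)i;
		}
	}
	return -1;
}

/* Release a previously allocated entry. */
static void bitmap_free(uint64_t *bitmap, unsigned idx)
{
	assert(idx < SKETCH_ENTRIES && (*bitmap & (UINT64_C(1) << idx)));
	*bitmap &= ~(UINT64_C(1) << idx);
}
```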

Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp>
---
 fs/nilfs2/alloc.c |  504 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nilfs2/alloc.h |   72 ++++++++
 2 files changed, 576 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/alloc.c
 create mode 100644 fs/nilfs2/alloc.h

diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
new file mode 100644
index 0000000..d69e6ae
--- /dev/null
+++ b/fs/nilfs2/alloc.c
@@ -0,0 +1,504 @@
+/*
+ * alloc.c - NILFS dat/inode allocator
+ *
+ * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Original code was written by Koji Sato <koji@osrg.net>.
+ * Two allocators were unified by Ryusuke Konishi <ryusuke@osrg.net>,
+ *                                Amagai Yoshiji <amagai@osrg.net>.
+ */
+
+#include <linux/types.h>
+#include <linux/buffer_head.h>
+#include <linux/fs.h>
+#include <linux/bitops.h>
+#include "mdt.h"
+#include "alloc.h"
+
+
+static inline unsigned ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

From: Koji Sato <sato.koji@lab.ntt.co.jp>

This adds the disk address translation file (DAT) whose primary
function is to convert virtual disk block numbers to actual disk block
numbers.

The virtual block numbers of NILFS are associated with checkpoint
generation numbers, and this file also provides functions to manage
the lifetime information of each virtual block number.
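
The two roles of the DAT described above, translation and lifetime
tracking, can be sketched together.  This is a userspace illustration under
assumptions: the entry layout, the field names and the half-open lifetime
interval are choices made for this example, not definitions from dat.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical DAT entry; field names are illustrative. */
struct sketch_dat_entry {
	uint64_t blocknr;  /* actual disk block number */
	uint64_t start;    /* first checkpoint in which the block is live */
	uint64_t end;      /* first checkpoint in which it is no longer live */
};

/* Translate a virtual block number by indexing the entry table. */
static uint64_t dat_translate(const struct sketch_dat_entry *tbl,
			      uint64_t nr_entries, uint64_t vblocknr)
{
	assert(vblocknr < nr_entries);
	return tbl[vblocknr].blocknr;
}

/* A block is live at checkpoint `cno` when start <= cno < end. */
static bool dat_is_live(const struct sketch_dat_entry *e, uint64_t cno)
{
	return e->start <= cno && cno < e->end;
}
```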

Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/dat.c |  429 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nilfs2/dat.h |   52 +++++++
 2 files changed, 481 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/dat.c
 create mode 100644 fs/nilfs2/dat.h

diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
new file mode 100644
index 0000000..9360920
--- /dev/null
+++ b/fs/nilfs2/dat.c
@@ -0,0 +1,429 @@
+/*
+ * dat.c - NILFS disk address translation.
+ *
+ * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Koji Sato <koji@osrg.net>.
+ */
+
+#include <linux/types.h>
+#include <linux/buffer_head.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include "nilfs.h"
+#include "mdt.h"
+#include "alloc.h"
+#include "dat.h"
+
+
+#define ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds a meta data file which stores on-disk inodes in its data
blocks.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp>
---
 fs/nilfs2/ifile.c |  150 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nilfs2/ifile.h |   53 +++++++++++++++++++
 2 files changed, 203 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/ifile.c
 create mode 100644 fs/nilfs2/ifile.h

diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c
new file mode 100644
index 0000000..de86401
--- /dev/null
+++ b/fs/nilfs2/ifile.c
@@ -0,0 +1,150 @@
+/*
+ * ifile.c - NILFS inode file
+ *
+ * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Amagai Yoshiji <amagai@osrg.net>.
+ * Revised by Ryusuke Konishi <ryusuke@osrg.net>.
+ *
+ */
+
+#include <linux/types.h>
+#include <linux/buffer_head.h>
+#include "nilfs.h"
+#include "mdt.h"
+#include "alloc.h"
+#include "ifile.h"
+
+/**
+ * nilfs_ifile_create_inode - create a new disk inode
+ * @ifile: ifile inode
+ * @out_ino: pointer to a variable to store inode number
+ * @out_bh: buffer_head contains newly allocated disk inode
+ *
+ * Return Value: On success, 0 is returned and the newly allocated inode
+ ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

From: Koji Sato <sato.koji@lab.ntt.co.jp>

This adds a meta data file which holds checkpoint entries in its data
blocks.

Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/cpfile.c |  908 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nilfs2/cpfile.h |   45 +++
 2 files changed, 953 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/cpfile.c
 create mode 100644 fs/nilfs2/cpfile.h

diff --git a/fs/nilfs2/cpfile.c b/fs/nilfs2/cpfile.c
new file mode 100644
index 0000000..991633a
--- /dev/null
+++ b/fs/nilfs2/cpfile.c
@@ -0,0 +1,908 @@
+/*
+ * cpfile.c - NILFS checkpoint file.
+ *
+ * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Koji Sato <koji@osrg.net>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/string.h>
+#include <linux/buffer_head.h>
+#include <linux/errno.h>
+#include <linux/nilfs2_fs.h>
+#include "mdt.h"
+#include "cpfile.h"
+
+
+static inline unsigned long
+nilfs_cpfile_checkpoints_per_block(const struct inode *cpfile)
+{
+	return NILFS_MDT(cpfile)->mi_entries_per_block;
+}
+
+/* block number from the beginning of the file */
+static unsigned ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

From: Koji Sato <sato.koji@lab.ntt.co.jp>

This adds a meta data file which stores the allocation state of
segments.
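
Finding the entry for a given segment in such a file comes down to a
division by the number of entries per block.  The sketch below is a
userspace illustration; the entry_pos type, locate_entry() and the notion of
reserved leading slots are assumptions for this example rather than the
patch's actual helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Position of one entry inside a meta data file. */
struct entry_pos {
	uint64_t blkoff;   /* block number from the start of the file */
	uint32_t index;    /* entry index within that block */
};

/*
 * Locate entry `nr` when each block holds `per_block` entries and the
 * first `first_offset` entry slots are reserved (for example, a header).
 */
static struct entry_pos locate_entry(uint64_t nr, uint32_t per_block,
				     uint32_t first_offset)
{
	assert(per_block > 0);
	uint64_t t = nr + first_offset;
	struct entry_pos pos = { t / per_block, (uint32_t)(t % per_block) };

	return pos;
}
```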

Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/sufile.c |  627 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nilfs2/sufile.h |   54 +++++
 2 files changed, 681 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/sufile.c
 create mode 100644 fs/nilfs2/sufile.h

diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
new file mode 100644
index 0000000..d672924
--- /dev/null
+++ b/fs/nilfs2/sufile.c
@@ -0,0 +1,627 @@
+/*
+ * sufile.c - NILFS segment usage file.
+ *
+ * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Koji Sato <koji@osrg.net>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/string.h>
+#include <linux/buffer_head.h>
+#include <linux/errno.h>
+#include <linux/nilfs2_fs.h>
+#include "mdt.h"
+#include "sufile.h"
+
+
+static inline unsigned long
+nilfs_sufile_segment_usages_per_block(const struct inode *sufile)
+{
+	return NILFS_MDT(sufile)->mi_entries_per_block;
+}
+
+static unsigned long
+nilfs_sufile_get_blkoff(const struct inode *sufile, __u64 ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds inode level operations of the nilfs2 file system.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/inode.c |  819 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 819 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/inode.c

diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
new file mode 100644
index 0000000..4a526bc
--- /dev/null
+++ b/fs/nilfs2/inode.c
@@ -0,0 +1,819 @@
+/*
+ * inode.c - NILFS inode operations.
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Ryusuke Konishi <ryusuke@osrg.net>
+ *
+ */
+
+#include <linux/buffer_head.h>
+#include <linux/mpage.h>
+#include <linux/writeback.h>
+#include "nilfs.h"
+#include "segment.h"
+#include "page.h"
+#include "mdt.h"
+#include "cpfile.h"
+#include "ifile.h"
+
+
+/**
+ * nilfs_get_block() - get a file block on the filesystem (callback function)
+ * @inode - inode struct of the target file
+ * @blkoff - file block number
+ * @bh_result - buffer head to be mapped on
+ * @create - indicate whether allocating the block or not when it has not
+ *      been allocated yet.
+ *
+ * This function does not issue actual read request of the specified data
+ * block. It is done ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds primitives for regular file handling.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/file.c |  125 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 125 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/file.c

diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
new file mode 100644
index 0000000..7ddd42e
--- /dev/null
+++ b/fs/nilfs2/file.c
@@ -0,0 +1,125 @@
+/*
+ * file.c - NILFS regular file handling primitives including fsync().
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Amagai Yoshiji <amagai@osrg.net>,
+ *            Ryusuke Konishi <ryusuke@osrg.net>
+ */
+
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/writeback.h>
+#include "nilfs.h"
+#include "segment.h"
+
+int nilfs_sync_file(struct file *file, struct dentry *dentry, int datasync)
+{
+	/*
+	 * Called from fsync() system call
+	 * This is the only entry point that can catch write and synch
+	 * timing for both data blocks and intermediate blocks.
+	 *
+	 * This function should be implemented when the writeback function
+	 * will be implemented.
+	 */
+	struct inode *inode = dentry->d_inode;
+	int err;
+
+	if ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

From: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp>

This adds directory handling functions, most of which come from the
ext2 file system.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp>
---
 fs/nilfs2/dir.c |  711 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 711 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/dir.c

diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
new file mode 100644
index 0000000..1b7e6dd
--- /dev/null
+++ b/fs/nilfs2/dir.c
@@ -0,0 +1,711 @@
+/*
+ * dir.c - NILFS directory entry operations
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Modified for NILFS by Amagai Yoshiji <amagai@osrg.net>
+ */
+/*
+ *  linux/fs/ext2/dir.c
+ *
+ * Copyright (C) 1992, 1993, 1994, 1995
+ * Remy Card (card@masi.ibp.fr)
+ * Laboratoire MASI - Institut Blaise Pascal
+ * Universite Pierre et Marie Curie (Paris VI)
+ *
+ *  from
+ *
+ *  linux/fs/minix/dir.c
+ *
+ *  Copyright (C) 1991, 1992  Linus Torvalds
+ *
+ *  ext2 directory handling functions
+ *
+ *  Big-endian to little-endian byte-swapping/bitmaps by
+ *        David S. Miller (davem@caip.rutgers.edu), 1995
+ *
+ * All code that ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds pathname operations, most of which come from the ext2 file
system.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/namei.c |  452 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 452 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/namei.c

diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c
new file mode 100644
index 0000000..daf382c
--- /dev/null
+++ b/fs/nilfs2/namei.c
@@ -0,0 +1,452 @@
+/*
+ * namei.c - NILFS pathname lookup operations.
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Modified for NILFS by Amagai Yoshiji <amagai@osrg.net>,
+ *                       Ryusuke Konishi <ryusuke@osrg.net>
+ */
+/*
+ *  linux/fs/ext2/namei.c
+ *
+ * Copyright (C) 1992, 1993, 1994, 1995
+ * Remy Card (card@masi.ibp.fr)
+ * Laboratoire MASI - Institut Blaise Pascal
+ * Universite Pierre et Marie Curie (Paris VI)
+ *
+ *  from
+ *
+ *  linux/fs/minix/namei.c
+ *
+ *  Copyright (C) 1991, 1992  Linus Torvalds
+ *
+ *  Big-endian to little-endian byte-swapping/bitmaps by
+ *        David S. Miller (davem@caip.rutgers.edu), 1995
+ */
+
+#include <linux/pagemap.h>
+#include "nilfs.h"
+
+
+static inline int nilfs_add_nondir(struct dentry ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds functions for the the_nilfs object, which holds resources
and state shared among a read/write mount and snapshot mounts that
are otherwise independent.

the_nilfs is allocated per block device; it is created when a user first
mounts a snapshot or a read/write mount on the device, and it is then
reused for successive mounts. It is freed when all mount
instances on the device are detached.
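
The lifetime rule above (create on first mount, reuse for later mounts,
free on last detach) could be sketched as a reference-counted lookup.
This is a user-space illustration only: struct shared_obj, obj_get() and
obj_put() are hypothetical names, not functions from the patch, and all
locking is omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-device shared object. */
struct shared_obj {
	int dev_id;               /* identifies the block device */
	int refcount;             /* number of attached mounts */
	struct shared_obj *next;
};

static struct shared_obj *obj_list;   /* live objects; no locking in this sketch */

/* Return the object for dev_id, creating it on first use. */
static struct shared_obj *obj_get(int dev_id)
{
	struct shared_obj *o;

	for (o = obj_list; o; o = o->next) {
		if (o->dev_id == dev_id) {
			o->refcount++;
			return o;
		}
	}
	o = calloc(1, sizeof(*o));
	if (!o)
		return NULL;
	o->dev_id = dev_id;
	o->refcount = 1;
	o->next = obj_list;
	obj_list = o;
	return o;
}

/* Drop one reference; free the object when the last mount detaches. */
static void obj_put(struct shared_obj *o)
{
	struct shared_obj **pp;

	assert(o && o->refcount > 0);
	if (--o->refcount > 0)
		return;
	for (pp = &obj_list; *pp; pp = &(*pp)->next) {
		if (*pp == o) {
			*pp = o->next;
			break;
		}
	}
	free(o);
}
```

A second obj_get() for the same device returns the same pointer with the
count raised, which is the "reused for successive mounts" behaviour.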

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/the_nilfs.c |  524 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 524 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/the_nilfs.c

diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
new file mode 100644
index 0000000..852e0bf
--- /dev/null
+++ b/fs/nilfs2/the_nilfs.c
@@ -0,0 +1,524 @@
+/*
+ * the_nilfs.c - the_nilfs shared structure.
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Ryusuke Konishi <ryusuke@osrg.net>
+ *
+ */
+
+#include <linux/buffer_head.h>
+#include <linux/slab.h>
+#include <linux/blkdev.h>
+#include <linux/backing-dev.h>
+#include "nilfs.h"
+#include "segment.h"
+#include "alloc.h"
+#include "cpfile.h"
+#include "sufile.h"
+#include "dat.h"
+#include "seglist.h"
+#include ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds super block operations for the nilfs2 file system.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/super.c | 1365 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 1365 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/super.c

diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
new file mode 100644
index 0000000..85dd42f
--- /dev/null
+++ b/fs/nilfs2/super.c
@@ -0,0 +1,1365 @@
+/*
+ * super.c - NILFS module and super block management.
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Ryusuke Konishi <ryusuke@osrg.net>
+ */
+/*
+ *  linux/fs/ext2/super.c
+ *
+ * Copyright (C) 1992, 1993, 1994, 1995
+ * Remy Card (card@masi.ibp.fr)
+ * Laboratoire MASI - Institut Blaise Pascal
+ * Universite Pierre et Marie Curie (Paris VI)
+ *
+ *  from
+ *
+ *  linux/fs/minix/inode.c
+ *
+ *  Copyright (C) 1991, 1992  Linus Torvalds
+ *
+ *  Big-endian to little-endian byte-swapping/bitmaps by
+ *        David S. Miller (davem@caip.rutgers.edu), 1995
+ */
+
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/blkdev.h>
+#include <linux/parser.h>
+#include ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds the segment buffer which is used to construct logs.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/segbuf.c |  461 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nilfs2/segbuf.h |  203 +++++++++++++++++++++++
 2 files changed, 664 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/segbuf.c
 create mode 100644 fs/nilfs2/segbuf.h

diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
new file mode 100644
index 0000000..6505a6a
--- /dev/null
+++ b/fs/nilfs2/segbuf.c
@@ -0,0 +1,461 @@
+/*
+ * segbuf.c - NILFS segment buffer
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Ryusuke Konishi <ryusuke@osrg.net>
+ *
+ */
+
+#include <linux/buffer_head.h>
+#include <linux/writeback.h>
+#include <linux/crc32.h>
+#include "page.h"
+#include "segbuf.h"
+#include "seglist.h"
+
+
+static struct kmem_cache *nilfs_segbuf_cachep;
+
+static void nilfs_segbuf_init_once(void *obj)
+{
+	memset(obj, 0, sizeof(struct nilfs_segment_buffer));
+}
+
+int __init nilfs_init_segbuf_cache(void)
+{
+	nilfs_segbuf_cachep =
+		kmem_cache_create("nilfs2_segbuf_cache",
+				  sizeof(struct nilfs_segment_buffer),
+				  0, SLAB_RECLAIM_ACCOUNT,
+				  ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds the segment constructor (also called log writer).

The segment constructor collects dirty buffers for every dirty inode,
makes summaries of the buffers, assigns disk block addresses to the
buffers, and then submits BIOs for the buffers.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/seglist.h |   85 ++
 fs/nilfs2/segment.c | 3221 +++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nilfs2/segment.h |  246 ++++
 3 files changed, 3552 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/seglist.h
 create mode 100644 fs/nilfs2/segment.c
 create mode 100644 fs/nilfs2/segment.h

diff --git a/fs/nilfs2/seglist.h b/fs/nilfs2/seglist.h
new file mode 100644
index 0000000..d39df91
--- /dev/null
+++ b/fs/nilfs2/seglist.h
@@ -0,0 +1,85 @@
+/*
+ * seglist.h - expediential structure and routines to handle list of segments
+ *             (would be removed in a future release)
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Ryusuke Konishi <ryusuke@osrg.net>
+ *
+ */
+#ifndef _NILFS_SEGLIST_H
+#define _NILFS_SEGLIST_H
+
+#include <linux/fs.h>
+#include <linux/buffer_head.h>
+#include <linux/nilfs2_fs.h>
+#include "sufile.h"
+
+struct nilfs_segment_entry ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds the recovery function used on mount.

Usually the recovery is achieved by just finding the latest super
root.  When logs without checkpoints were appended for data sync
operations after the latest super root, the recovery function will
perform roll forwarding and reconstruct new log(s) with a super root.
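
As a rough illustration of the two cases above, the sketch below scans a
list of logs, picks the latest one that carries a super root, and counts
the later logs that would need rolling forward.  struct log_rec and
find_recovery_point() are hypothetical and much simpler than the code in
recovery.c; returning zero roll-forward logs when no super root exists is
an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one log found while scanning. */
struct log_rec {
	int has_super_root;   /* nonzero if this log ends with a super root */
};

/*
 * Return the index of the latest log carrying a super root, or -1 if
 * there is none.  *nr_forward receives the number of later logs that
 * would have to be rolled forward.
 */
static int find_recovery_point(const struct log_rec *logs, size_t n,
			       size_t *nr_forward)
{
	int latest = -1;
	size_t i;

	assert(nr_forward);
	for (i = 0; i < n; i++)
		if (logs[i].has_super_root)
			latest = (int)i;

	*nr_forward = (latest < 0) ? 0 : n - 1 - (size_t)latest;
	return latest;
}
```

When *nr_forward is zero, recovery is the "usual" case of simply adopting
the latest super root.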

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/recovery.c |  941 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 941 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/recovery.c

diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c
new file mode 100644
index 0000000..877dc1b
--- /dev/null
+++ b/fs/nilfs2/recovery.c
@@ -0,0 +1,941 @@
+/*
+ * recovery.c - NILFS recovery logic
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Ryusuke Konishi <ryusuke@osrg.net>
+ */
+
+#include <linux/buffer_head.h>
+#include <linux/blkdev.h>
+#include <linux/swap.h>
+#include <linux/crc32.h>
+#include "nilfs.h"
+#include "segment.h"
+#include "sufile.h"
+#include "page.h"
+#include "seglist.h"
+#include "segbuf.h"
+
+/*
+ * Segment check result
+ */
+enum ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

NILFS2 uses another DAT inode during garbage collection to ensure
atomicity and consistency of the DAT in the transient state.  This
twin inode is called GCDAT.

This adds functions to initialize the GCDAT and to switch page caches
and B-tree node caches between these two inodes.

Signed-off-by: Seiji Kihara <kihara.seiji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp>
---
 fs/nilfs2/gcdat.c |   84 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 84 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/gcdat.c

diff --git a/fs/nilfs2/gcdat.c b/fs/nilfs2/gcdat.c
new file mode 100644
index 0000000..93383c5
--- /dev/null
+++ b/fs/nilfs2/gcdat.c
@@ -0,0 +1,84 @@
+/*
+ * gcdat.c - NILFS shadow DAT inode for GC
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Seiji Kihara <kihara@osrg.net>, Amagai Yoshiji <amagai@osrg.net>,
+ *            and Ryusuke Konishi <ryusuke@osrg.net>.
+ *
+ */
+
+#include <linux/buffer_head.h>
+#include "nilfs.h"
+#include "page.h"
+#include "mdt.h"
+
+int nilfs_init_gcdat_inode(struct the_nilfs *nilfs)
+{
+	struct inode *dat = nilfs->ns_dat, *gcdat = ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds the cache of on-disk blocks to be moved in garbage
collection.  The disk blocks are held by dummy inodes (called
gcinodes), and this file provides a lookup function for the dummy
inodes and their buffer read function.

Signed-off-by: Seiji Kihara <kihara.seiji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp>
---
 fs/nilfs2/gcinode.c |  270 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 270 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/gcinode.c

diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c
new file mode 100644
index 0000000..0013952
--- /dev/null
+++ b/fs/nilfs2/gcinode.c
@@ -0,0 +1,270 @@
+/*
+ * gcinode.c - NILFS memory inode for GC
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Seiji Kihara <kihara@osrg.net>, Amagai Yoshiji <amagai@osrg.net>,
+ *            and Ryusuke Konishi <ryusuke@osrg.net>.
+ * Revised by Ryusuke Konishi <ryusuke@osrg.net>.
+ *
+ */
+
+#include <linux/buffer_head.h>
+#include <linux/mpage.h>
+#include <linux/hash.h>
+#include <linux/swap.h>
+#include "nilfs.h"
+#include "page.h"
+#include "mdt.h"
+#include "dat.h"
+#include ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

From: Koji Sato <sato.koji@lab.ntt.co.jp>

This adds the userland interface, implemented with ioctl.

Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/ioctl.c |  941 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 941 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/ioctl.c

diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
new file mode 100644
index 0000000..35ba60e
--- /dev/null
+++ b/fs/nilfs2/ioctl.c
@@ -0,0 +1,941 @@
+/*
+ * ioctl.c - NILFS ioctl operations.
+ *
+ * Copyright (C) 2007, 2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Written by Koji Sato <koji@osrg.net>.
+ */
+
+#include <linux/fs.h>
+#include <linux/wait.h>
+#include <linux/smp_lock.h>	/* lock_kernel(), unlock_kernel() */
+#include <linux/capability.h>	/* capable() */
+#include <linux/uaccess.h>	/* copy_from_user(), copy_to_user() */
+#include <linux/nilfs2_fs.h>
+#include "nilfs.h"
+#include "segment.h"
+#include "bmap.h"
+#include "cpfile.h"
+#include "sufile.h"
+#include "dat.h"
+
+
+#define KMALLOC_SIZE_MIN	4096	/* 4KB */
+#define KMALLOC_SIZE_MAX	131072	/* 128 KB */
+
+static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs,
+				 struct ...
From: Ryusuke Konishi
Date: Sunday, September 14, 2008 - 12:08 pm

This adds a Makefile for the nilfs2 file system, and updates the
makefile and Kconfig file in the file system directory.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/Kconfig         |   26 ++++++++++++++++++++++++++
 fs/Makefile        |    1 +
 fs/nilfs2/Makefile |    5 +++++
 3 files changed, 32 insertions(+), 0 deletions(-)
 create mode 100644 fs/nilfs2/Makefile

diff --git a/fs/Kconfig b/fs/Kconfig
index abccb5d..294092c 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1383,6 +1383,32 @@ config MINIX_FS
 	  partition (the one containing the directory /) cannot be compiled as
 	  a module.
 
+config NILFS2_FS
+	tristate "NILFS2 file system support (EXPERIMENTAL)"
+	depends on BLOCK && EXPERIMENTAL
+	select CRC32
+	help
+	  NILFS2 is a log-structured file system (LFS) supporting continuous
+	  snapshotting.  In addition to versioning capability of the entire
+	  file system, users can even restore files mistakenly overwritten or
+	  destroyed just a few seconds ago.  Since this file system can keep
+	  consistency like conventional LFS, it achieves quick recovery after
+	  system crashes.
+
+	  NILFS2 creates a number of checkpoints every few seconds or per
+	  synchronous write basis (unless there is no change).  Users can
+	  select significant versions among continuously created checkpoints,
+	  and can change them into snapshots which will be preserved for long
+	  periods until they are changed back to checkpoints.  Each
+	  snapshot is mountable as a read-only file system concurrently with
+	  its writable mount, and this feature is convenient for online backup.
+
+	  Some features including atime, extended attributes, and POSIX ACLs,
+	  are not supported yet.
+
+	  To compile this file system support as a module, choose M here: the
+	  module will be called nilfs2.  If unsure, say N.
+
 config OMFS_FS
 	tristate "SonicBlue Optimized MPEG File System support"
 	depends on BLOCK
diff --git a/fs/Makefile b/fs/Makefile
index ...
From: Jörn
Date: Wednesday, September 17, 2008 - 7:41 am

Nice explanation.  Can you add it to the comment header at the top of
the file?  Unlike the GPL preamble, it actually helps non-lawyers. ;)

Using dummy inodes is... unusual.  Why can you not use the actual inodes
those blocks belong to?  Or alternatively a single inode that simply
covers the complete physical device?

Jörn

-- 
All models are wrong. Some models are useful.
-- George Box
--

From: Ryusuke Konishi
Date: Wednesday, September 17, 2008 - 12:09 pm

Because we have to handle blocks that belong to the same file but have
different checkpoint numbers.  (NILFS2 keeps multiple
checkpoints/snapshots across GC.)

Of course, if the standard inode hash is applicable, I prefer it.
ilookup5 or its variant may be applicable for this.
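
To make the point concrete: the GC cache has to be keyed by the pair
(inode number, checkpoint number) rather than by inode number alone.  A
minimal user-space sketch of such a lookup follows; struct gc_entry,
gc_lookup() and the hash function are hypothetical and are not taken
from gcinode.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define GC_HASH_SIZE 64   /* arbitrary bucket count for this sketch */

/* Hypothetical cache entry: one per (inode number, checkpoint number). */
struct gc_entry {
	uint64_t ino;
	uint64_t cno;
	struct gc_entry *next;
};

static struct gc_entry *gc_hash[GC_HASH_SIZE];

static unsigned int gc_bucket(uint64_t ino, uint64_t cno)
{
	return (unsigned int)((ino ^ (cno * 31)) % GC_HASH_SIZE);
}

/* Look up the entry for (ino, cno), allocating it if absent. */
static struct gc_entry *gc_lookup(uint64_t ino, uint64_t cno)
{
	unsigned int b = gc_bucket(ino, cno);
	struct gc_entry *e;

	for (e = gc_hash[b]; e; e = e->next)
		if (e->ino == ino && e->cno == cno)
			return e;

	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;
	e->ino = ino;
	e->cno = cno;
	e->next = gc_hash[b];
	gc_hash[b] = e;
	return e;
}
```

Two blocks of the same file from different checkpoints therefore land in
different entries, which a lookup by inode number alone cannot express.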

If so, the remaining problem would be the lock dependencies as you

NILFS2 writes GC blocks per file like other files, so the per-file
caches (even separate inodes) are convenient to this end.

Ryusuke
--

From: Jörn
Date: Wednesday, September 17, 2008 - 3:49 pm

You should have the same problem already - in some shape or another.  If
you can have two data structures for the same content, a real inode and
a dummy inode, you have a race condition.  Quite possibly one involving
data corruption.

Well, one way to avoid both the race and the locking complexity is by
stopping all writes during GC and destroying all dummy inodes before
writes resume.  But that would be inefficient in several cases.  When
GC'ing data that is dirty in the caches, you move the old stale data
during GC and write the new data soon after.  And you always flush the
caches after GC, even if your machine has no better use for the memory.

So unless I missed something important, I believe the locking is well
worth the effort.

BTW: Some of the explanation you just gave me would do well as
documentation in the source file as well.  That's the sort of background
information new developers can spend months of mistakes and reverse
engineering on. :)

Jörn

-- 
Those who come seeking peace without a treaty are plotting.
-- Sun Tzu
--

From: Ryusuke Konishi
Date: Saturday, September 20, 2008 - 3:43 am

Hi Jörn,

The current version of NILFS2 actually takes this approach.
Pages held by the dummy inodes will be released after they are copied

As for NILFS2, the dirty blocks and the blocks to be moved by GC
never overlap because the dirty blocks make a new generation.
So they must in fact be written separately.

Though we could reuse pages in the GC cache, the effect of this
optimization may be much lower than in usual LFSes because most of
the blocks in the pages may not belong to the latest generation.

Hmm, we would be better off counting frequency of true overlap if

Well, thanks.  I'll do that.
NILFS2 needs more explanation than usual file systems; it needs a time
perspective in addition to being an LFS. :)

Regards,
Ryusuke
--

From: Jörn
Date: Saturday, September 20, 2008 - 4:04 am

Yet again I've tried to apply techniques that simply don't work with

At least if you take me as a standard, I think you have proven that
point rather well. :)

Jörn

-- 
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it.
-- Brian W. Kernighan
--

From: Pekka Enberg
Date: Monday, September 15, 2008 - 11:20 am

Hi Ryusuke,

On Sun, Sep 14, 2008 at 10:08 PM, Ryusuke Konishi

OK, I don't understand this. The only way nilfs_transaction_end() can
fail is if we have NILFS_TI_SYNC set and we fail to construct the
segment. But why do we want to construct a segment if we don't commit?

I guess what I'm asking is why don't we have a separate
nilfs_transaction_abort() function that can't fail for the erroneous
case to avoid this double error value tracking thing?

                      Pekka
--

From: konishi.ryusuke
Date: Monday, September 15, 2008 - 10:31 pm

Hi Pekka!


Yeah, that's quite right.  nilfs_transaction_end() should not call
nilfs_construct_segment() in the error case, and this double error
handling seems to be avoidable.

The ``commit'' argument of nilfs_transaction_end() is insufficient
because it does not cancel the commit state.

I'd like to correct this error handling by adding
nilfs_transaction_abort() as you suggested.
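
The shape of that change might look like the sketch below: commit can
fail because a synchronous commit has to flush, while abort only cancels
state and so returns nothing.  This is a user-space illustration; struct
txn, txn_commit(), txn_abort() and do_operation() are hypothetical names
and do not reflect the actual nilfs_transaction_*() signatures.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical transaction state; not the structure used in the patch. */
struct txn {
	int active;      /* nonzero between begin and commit/abort */
	int sync;        /* caller asked for a synchronous flush on commit */
	int flush_err;   /* simulated result of the flush (0 or -errno) */
};

static void txn_begin(struct txn *t, int sync)
{
	t->active = 1;
	t->sync = sync;
}

/* Commit may fail, because a synchronous commit has to flush. */
static int txn_commit(struct txn *t)
{
	assert(t->active);
	t->active = 0;
	return t->sync ? t->flush_err : 0;
}

/* Abort only cancels state; it never flushes, so it cannot fail. */
static void txn_abort(struct txn *t)
{
	assert(t->active);
	t->active = 0;
}

/* Caller pattern: one error value, no second result to track. */
static int do_operation(struct txn *t, int op_err)
{
	txn_begin(t, 1);
	if (op_err) {
		txn_abort(t);
		return op_err;
	}
	return txn_commit(t);
}
```

With this split, the error path returns the original error unchanged and
never has to merge it with a second error from the end-of-transaction call.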

Thank you for the comment.

Regards,
Ryusuke
--

From: Jörn
Date: Wednesday, September 17, 2008 - 7:31 am

No atime.  Seems familiar. :)

Did you test the filesystem on big endian systems?  It is relatively
easy to miss bugs if conversion isn't actually necessary.
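
To illustrate why such bugs hide on little-endian hosts: assuming a
little-endian on-disk format, a field that is read without conversion
still yields the right value on x86, and only a big-endian machine
exposes the mistake.  The portable helpers below are a user-space sketch
(get_le32() and put_le32() are names chosen here); kernel code would
normally use le32_to_cpu() and cpu_to_le32() instead.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 32-bit little-endian on-disk value, independent of host order. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/* Encode a 32-bit value into little-endian on-disk byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}
```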

Jörn

-- 
When people work hard for you for a pat on the back, you've got
to give them that pat.
-- Robert Heinlein
--

From: Ryusuke Konishi
Date: Wednesday, September 17, 2008 - 8:51 am

Yes, we did.  We have test machines for this purpose.  We actually had
conversion misses and alignment errors in the early days.
And now a big endian system is a requisite for us :)

Regards,
Ryusuke
--

From: Pavel Machek
Date: Monday, September 15, 2008 - 2:54 am

Hmm, undelete done right. Just one question... how slow/fast is it
compared to conventional filesystems (ext3?)?
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: konishi.ryusuke
Date: Monday, September 15, 2008 - 1:10 pm

Hi!


After my first submission, Szabolcs Szakacsits showed benchmark
results using compilebench.

On Thu, 21 Aug 2008 00:25:55 +0300 (MET DST)

According to his measurement, NILFS2 showed very low performance
on the first measurement, and it recovered after a while.

I still don't know the reason why NILFS2 shows such behaviour, and
I'm planning to follow up on the benchmark.

A little while ago, I tried another benchmark, iozone, on kernel
2.6.27-rc6.  The result was as follows:

 Throughput in MB/s (buffer size = 8B, file size = 500MB)
        <write> <rewrite> <read> <reread> <rand-read> <rand-write>
 ext3    44.918    46.691 56.541   56.505       1.562        5.716
 nilfs2  56.076    43.703 41.364   41.356       1.231       37.650

 Throughput in MB/s (buffer size = 64KB, file size = 500MB)
        <write> <rewrite> <read> <reread> <rand-read> <rand-write>
 ext3    45.369    46.438 56.542   56.457      11.025       36.300
 nilfs2  56.119    43.630 41.330   41.498       8.572       37.671

(Here I used the -e and -U options to measure true disk read performance,
 not cache read performance.)

As is often said of LFS, NILFS2 showed high random write performance
for small writes, but the read throughput was lower: about 27% lower
than ext3.
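
For reference, the percentages can be recomputed from the first table
above.  This is only a small arithmetic sketch; the function name
relative_diff is my own, and the numbers are copied from the table.

```python
# Throughput in MB/s, copied from the first iozone table above.
ext3 = {"read": 56.541, "rand-write": 5.716}
nilfs2 = {"read": 41.364, "rand-write": 37.650}


def relative_diff(new, base):
    """Difference of `new` relative to `base`, in percent."""
    return (new - base) / base * 100.0


read_diff = relative_diff(nilfs2["read"], ext3["read"])
rand_write_ratio = nilfs2["rand-write"] / ext3["rand-write"]

print(f"read: {read_diff:.1f}% vs ext3")
print(f"rand-write: {rand_write_ratio:.1f}x ext3")
```

The sequential read figure comes out at roughly 27% below ext3, while
the small random writes are between six and seven times faster.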

For sequential writes, the first write was good, but overwriting was
slower because it involves reading existing metadata.


Cheers,
Ryusuke
--

From: Chris Mason
Date: Tuesday, September 16, 2008 - 6:38 am

Normally, compilebench has read phases that time how quickly the FS can
read the files after a bunch of operations.  The runs above didn't
include the read phase, but in order to be fair to all the filesystems,
compilebench figures out the native readdir order of the FS so it can
create files in the optimal order for each fs.

It does this by creating all the files in its datasets and using readdir
to find out what order the FS returns them in.  The files are then all
deleted and the real runs start.
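
The probing step described above can be sketched roughly as follows.
This is not compilebench's actual code; native_readdir_order and the
file names are made up for illustration.

```python
import os
import tempfile


def native_readdir_order(directory, names):
    """Create empty files, record the order readdir returns them, delete them."""
    wanted = set(names)
    for name in names:
        with open(os.path.join(directory, name), "w"):
            pass  # an empty file is enough to get a directory entry

    # os.scandir yields entries in the order the filesystem returns them.
    with os.scandir(directory) as entries:
        order = [entry.name for entry in entries if entry.name in wanted]

    # Clean up so the real run starts from an empty directory.
    for name in names:
        os.unlink(os.path.join(directory, name))
    return order


with tempfile.TemporaryDirectory() as tmp:
    names = [f"file{i:03d}" for i in range(16)]
    order = native_readdir_order(tmp, names)
    leftover = os.listdir(tmp)
    print(order)
```

Note that the deletions at the end are exactly the work that may still
be pending in the filesystem when the first timed run begins.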

It is possible that bad perf in the first compilebench run is from
cleanup or transaction commits being done after the deletions.

-chris
--

From: Jörn
Date: Wednesday, September 17, 2008 - 7:54 am

How are the sufile and the DAT written?  If you naively stick to the
log-structured approach, their contents will reflect a filesystem state
prior to writing them and be outdated by the time they hit the medium.
So either you bend the rules here and update those files in-place or you
do something tricky.  Can you explain your solution?

Jörn

-- 
Write programs that do one thing and do it well. Write programs to work
together. Write programs to handle text streams, because that is a
universal interface.
-- Doug MacIlroy
--

From: Ryusuke Konishi
Date: Wednesday, September 17, 2008 - 10:52 am

That's right.  The DAT, sufile, and cpfile are written at the same time
so that they become consistent and self-contained.

Checkpoint creation is predictable, so the cpfile is OK.  But the
sufile depends on the length of the logs, and therefore on the
construction of other files, including the DAT and the super root block.
Since virtual block numbers are also assigned to the sufile, there
is a circular dependency.

So, nilfs2 builds the sufile speculatively; it retries the collection
of dirty blocks for these three files if it turns out that more
segments are required than expected.  Overestimating the number of
segments is not a problem, because the allocation of surplus segments
can be cancelled without breaking consistency.  nilfs2 performs this
retry in memory and writes the three files together to avoid an I/O
penalty.
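
As a toy model only (this is not the nilfs2 implementation; the
constants, the cost model in metadata_blocks, and the function names
are all hypothetical), the retry loop looks roughly like this:

```python
import math

BLOCKS_PER_SEGMENT = 8        # hypothetical segment size in blocks
SUFILE_ENTRIES_PER_BLOCK = 4  # hypothetical sufile entries per block


def metadata_blocks(allocated_segments):
    """Hypothetical count of dirty DAT + cpfile + sufile blocks.

    The sufile part grows with the number of allocated segments, which
    is what makes the log length depend on its own allocation.
    """
    dat = 2
    cpfile = 1
    sufile = math.ceil(allocated_segments / SUFILE_ENTRIES_PER_BLOCK)
    return dat + cpfile + sufile


def plan_log(data_blocks, guess):
    """Return (segments used, segments cancelled, collection attempts)."""
    allocated = guess
    attempts = 0
    while True:
        attempts += 1
        # +1 for the super root block
        total = data_blocks + metadata_blocks(allocated) + 1
        needed = math.ceil(total / BLOCKS_PER_SEGMENT)
        if needed > allocated:
            allocated = needed  # guess too small: retry, still in memory
            continue
        # Guess was enough: cancel the surplus, then write everything once.
        return needed, allocated - needed, attempts


print(plan_log(20, 1))   # too small a guess forces one retry
print(plan_log(20, 10))  # too large a guess just cancels the surplus
```

In this model, an underestimate costs an extra in-memory pass, while an
overestimate only costs the cancellation of unused segments.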

Regards,
Ryusuke
--
