This is a new patch set for the nilfs2 file system.

The first submission is found at:

  http://marc.info/?l=linux-fsdevel&m=121920195516073

The old patch was not divided; this time I have split it into 27 patches:

  Patch #1 adds a document to Documentation/filesystems/
  Patch #2 adds a header file describing the disk format to include/linux/
  Patches #3-#26 add the nilfs2 source files to fs/nilfs2/
  Patch #27 updates the Makefile and Kconfig.

This patch set also includes some cleanups and improvements.
The main changes from the previous patch are as follows:

 * Use the standard mm/ cache instead of a custom page cache to hold
   B-tree node buffers; this removes the custom page cache.

 * Integrate two similar allocators found in the DAT and inode file.

 * Read requests for GC blocks are now submitted in parallel to mitigate
   GC overhead.

The cleanups removed more than 2,000 lines.

The patch set is available from

  http://www.nilfs.org/pub/patch/nilfs2-2.6.27-rc6/

If you prefer a git tree:

  http://git.nilfs.org/nilfs2-2.6.git nilfs2
  ( gitweb: http://www.nilfs.org/git/?p=nilfs2-2.6.git )

The userland tools are included in the nilfs-utils package, which is
available from

  http://www.nilfs.org/en/download.html

Example:

In this example, /dev/sdb1 is used as a nilfs2 partition.

 - To use nilfs2 as a local file system, simply:

    # mkfs -t nilfs2 /dev/sdb1
    # mount -t nilfs2 /dev/sdb1 /dir

   This will also invoke the cleaner through the mount helper program
   (mount.nilfs2).

 - Checkpoints and snapshots are managed by the following commands.
   Their manpages are included in the nilfs-utils package.

    lscp     list checkpoints or snapshots.
    mkcp     make a checkpoint or a snapshot.
    chcp     change an existing checkpoint to a snapshot or vice versa.
    rmcp     invalidate specified checkpoint(s).

   For example,

    # lscp /dev/sdb1

   will list checkpoints on the device. The block device argument is ...
This adds a document describing the features, mount options, userland tools, usage, disk format, and related URLs for the nilfs2 file system. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> --- Documentation/filesystems/00-INDEX | 2 + Documentation/filesystems/nilfs2.txt | 202 ++++++++++++++++++++++++++++++++++ 2 files changed, 204 insertions(+), 0 deletions(-) create mode 100644 Documentation/filesystems/nilfs2.txt diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index 52cd611..8dd6db7 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX @@ -68,6 +68,8 @@ ncpfs.txt - info on Novell Netware(tm) filesystem using NCP protocol. nfsroot.txt - short guide on setting up a diskless box with NFS root filesystem. +nilfs2.txt + - info and mount options for the NILFS2 filesystem. ntfs.txt - info and mount options for the NTFS filesystem (Windows NT). ocfs2.txt diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt new file mode 100644 index 0000000..f323b01 --- /dev/null +++ b/Documentation/filesystems/nilfs2.txt @@ -0,0 +1,202 @@ +NILFS2 +------ + +NILFS2 is a log-structured file system (LFS) supporting continuous +snapshotting. In addition to versioning capability of the entire file +system, users can even restore files mistakenly overwritten or +destroyed just a few seconds ago. Since NILFS2 can keep consistency +like conventional LFS, it achieves quick recovery after system +crashes. + +NILFS2 creates a number of checkpoints every few seconds or per +synchronous write basis (unless there is no change). Users can select +significant versions among continuously created checkpoints, and can +change them into snapshots which will be preserved until they are +changed back to checkpoints. + +There is no limit on the number of snapshots until the volume gets +full. Each snapshot is mountable as a read-only file system +concurrently ...
From: Koji Sato <sato.koji@lab.ntt.co.jp> This adds a header file which specifies the on-disk format and ioctl interface of the nilfs2 file system. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> --- include/linux/nilfs2_fs.h | 854 +++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 854 insertions(+), 0 deletions(-) create mode 100644 include/linux/nilfs2_fs.h diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h new file mode 100644 index 0000000..e38fad2 --- /dev/null +++ b/include/linux/nilfs2_fs.h @@ -0,0 +1,854 @@ +/* + * nilfs2_fs.h - NILFS2 on-disk structures and common declarations. + * + * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + * + * Written by Koji Sato <koji@osrg.net> + * Ryusuke Konishi <ryusuke@osrg.net> + */ +/* + * linux/include/linux/ext2_fs.h + * + * Copyright (C) 1992, 1993, 1994, 1995 + * Remy Card (card@masi.ibp.fr) + * Laboratoire MASI - Institut Blaise Pascal + * Universite Pierre et Marie Curie (Paris VI) + * + * from + * + * linux/include/linux/minix_fs.h + * + * Copyright (C) 1991, 1992 Linus Torvalds + */ + +#ifndef _LINUX_NILFS_FS_H +#define ...
This adds the following common structures of the NILFS2 file system.
* nilfs_inode_info structure: represents the in-memory inode.
* nilfs_sb_info structure: keeps per-mount state and a special inode for
the ifile. This structure is attached to the super_block structure.
* the_nilfs structure: keeps shared state and locks among a read/write
mount and snapshot mounts. This keeps special inodes for the sufile,
cpfile, dat, and another dat inode used during GC (gcdat). This also
has a hash table of dummy inodes to cache disk blocks during GC
(gcinodes).
* nilfs_transaction_info structure: keeps per-task state while nilfs is
writing logs or doing indivisible inode or namespace operations. This
structure is used to identify the context during log making and to
store the nesting level of the lock which ensures atomicity of file
system operations.
Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
fs/nilfs2/nilfs.h | 323 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/nilfs2/sb.h | 102 ++++++++++++++++
fs/nilfs2/the_nilfs.h | 290 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 715 insertions(+), 0 deletions(-)
create mode 100644 fs/nilfs2/nilfs.h
create mode 100644 fs/nilfs2/sb.h
create mode 100644 fs/nilfs2/the_nilfs.h
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
new file mode 100644
index 0000000..c33b8db
--- /dev/null
+++ b/fs/nilfs2/nilfs.h
@@ -0,0 +1,323 @@
+/*
+ * nilfs.h - NILFS local header file.
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even ...
From: Koji Sato <sato.koji@lab.ntt.co.jp> This adds structures and operations for the block mapping (bmap for short). NILFS2 uses direct mappings for short files or B-tree based mappings for longer files. Every on-disk data block is held with inodes and managed through this block mapping. The nilfs_bmap structure and a set of functions here provide this capability to the NILFS2 inode. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> --- fs/nilfs2/bmap.c | 783 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/nilfs2/bmap.h | 251 ++++++++++++++++ fs/nilfs2/bmap_union.h | 42 +++ 3 files changed, 1076 insertions(+), 0 deletions(-) create mode 100644 fs/nilfs2/bmap.c create mode 100644 fs/nilfs2/bmap.h create mode 100644 fs/nilfs2/bmap_union.h diff --git a/fs/nilfs2/bmap.c b/fs/nilfs2/bmap.c new file mode 100644 index 0000000..4c6b481 --- /dev/null +++ b/fs/nilfs2/bmap.c @@ -0,0 +1,783 @@ +/* + * bmap.c - NILFS block mapping. + * + * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + * + * Written by Koji Sato <koji@osrg.net>. + */ + +#include <linux/fs.h> +#include <linux/string.h> +#include <linux/errno.h> +#include ...
From: Koji Sato <sato.koji@lab.ntt.co.jp>
This adds declarations and functions of the NILFS2 B-tree.
Two variants are integrated in the NILFS2 B-tree. The B-tree for most
files points to child nodes or data blocks with virtual block
addresses, whereas the B-tree of the DAT uses actual block addresses.
Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
fs/nilfs2/btree.c | 2276 +++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nilfs2/btree.h | 117 +++
2 files changed, 2393 insertions(+), 0 deletions(-)
create mode 100644 fs/nilfs2/btree.c
create mode 100644 fs/nilfs2/btree.h
diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
new file mode 100644
index 0000000..893f019
--- /dev/null
+++ b/fs/nilfs2/btree.c
@@ -0,0 +1,2276 @@
+/*
+ * btree.c - NILFS B-tree.
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Written by Koji Sato <koji@osrg.net>.
+ */
+
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/pagevec.h>
+#include "nilfs.h"
+#include "page.h"
+#include "btnode.h"
+#include "btree.h"
+#include "alloc.h"
+
+/**
+ * struct nilfs_btree_path - A path on which B-tree ...
From: Koji Sato <sato.koji@lab.ntt.co.jp>
This adds block mappings using direct pointers, which are stored in the
i_bmap array of the inode.
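As a rough userspace illustration of the idea (not the code in direct.c), a direct mapping can be pictured as a small fixed array of little-endian 64-bit block pointers indexed by file block number. Every name below (sketch_direct, SKETCH_DIRECT_NPTRS, SKETCH_INVALID_PTR) is hypothetical, and both the slot count and the use of 0 as the "unmapped" value are assumptions made for this sketch only.

```c
#include <stdint.h>
#include <stddef.h>

#define SKETCH_DIRECT_NPTRS 7	/* hypothetical slot count, not the NILFS2 value */
#define SKETCH_INVALID_PTR  0	/* assumption: 0 marks an unmapped slot */

/* Hypothetical stand-in for the pointer slots kept inside the inode. */
struct sketch_direct {
	uint8_t slots[SKETCH_DIRECT_NPTRS][8];	/* each pointer stored little-endian */
};

/* Decode a little-endian 64-bit value independent of host byte order. */
static uint64_t sketch_le64_load(const uint8_t b[8])
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | b[i];
	return v;
}

/* Encode a 64-bit value as little-endian bytes. */
static void sketch_le64_store(uint8_t b[8], uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		b[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Map file block `key` to block pointer `ptr`; -1 if key is out of range. */
static int sketch_direct_set(struct sketch_direct *d, uint64_t key, uint64_t ptr)
{
	if (key >= SKETCH_DIRECT_NPTRS)
		return -1;
	sketch_le64_store(d->slots[key], ptr);
	return 0;
}

/* Look up file block `key`; SKETCH_INVALID_PTR if out of range or unmapped. */
static uint64_t sketch_direct_lookup(const struct sketch_direct *d, uint64_t key)
{
	if (key >= SKETCH_DIRECT_NPTRS)
		return SKETCH_INVALID_PTR;
	return sketch_le64_load(d->slots[key]);
}
```

A file whose blocks no longer fit into the fixed slots cannot be described this way, which is where a tree-based mapping would take over.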
Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
fs/nilfs2/direct.c | 429 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nilfs2/direct.h | 78 ++++++++++
2 files changed, 507 insertions(+), 0 deletions(-)
create mode 100644 fs/nilfs2/direct.c
create mode 100644 fs/nilfs2/direct.h
diff --git a/fs/nilfs2/direct.c b/fs/nilfs2/direct.c
new file mode 100644
index 0000000..303d7f1
--- /dev/null
+++ b/fs/nilfs2/direct.c
@@ -0,0 +1,429 @@
+/*
+ * direct.c - NILFS direct block pointer.
+ *
+ * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Written by Koji Sato <koji@osrg.net>.
+ */
+
+#include <linux/errno.h>
+#include "nilfs.h"
+#include "page.h"
+#include "direct.h"
+#include "alloc.h"
+
+static inline __le64 *nilfs_direct_dptrs(const struct nilfs_direct *direct)
+{
+ return (__le64 *)
+ ((struct nilfs_direct_node *)direct->d_bmap.b_u.u_data + 1);
+}
+
+static inline __u64
+nilfs_direct_get_ptr(const struct nilfs_direct *direct, __u64 key)
+{
+ return ...
This adds routines for B-tree node buffers.
Signed-off-by: Seiji Kihara <kihara.seiji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
fs/nilfs2/btnode.c | 316 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nilfs2/btnode.h | 58 ++++++++++
2 files changed, 374 insertions(+), 0 deletions(-)
create mode 100644 fs/nilfs2/btnode.c
create mode 100644 fs/nilfs2/btnode.h
diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
new file mode 100644
index 0000000..4cc07b2
--- /dev/null
+++ b/fs/nilfs2/btnode.c
@@ -0,0 +1,316 @@
+/*
+ * btnode.c - NILFS B-tree node cache
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * This file was originally written by Seiji Kihara <kihara@osrg.net>
+ * and fully revised by Ryusuke Konishi <ryusuke@osrg.net> for
+ * stabilization and simplification.
+ *
+ */
+
+#include <linux/types.h>
+#include <linux/buffer_head.h>
+#include <linux/mm.h>
+#include <linux/backing-dev.h>
+#include "nilfs.h"
+#include "mdt.h"
+#include "dat.h"
+#include "page.h"
+#include "btnode.h"
+
+
+void nilfs_btnode_cache_init_once(struct address_space *btnc)
+{
+ INIT_RADIX_TREE(&btnc->page_tree, ...
This adds common routines for buffer/page operations used in B-tree
node caches, meta data files, or the segment constructor (log writer).
NILFS uses copy functions for buffers and pages for the following
reasons:
1) Relocation required for COW
Since NILFS changes the addresses of on-disk blocks, buffers that are
not addressed by a file offset must be moved within the page cache.
If the buffer size is smaller than the page size, this involves
partial copies of pages.
this involves partial copy of pages.
2) Freezing mmapped pages
NILFS calculates checksums for each log to ensure its validity.
If page data changes after the checksum calculation, this validity
check will not work correctly. To avoid this failure for mmapped
pages, NILFS freezes their data by copying.
3) Copy-on-write for DAT pages
NILFS makes clones of DAT page caches in a copy-on-write manner
during GC processes, and this ensures atomicity and consistency
of the DAT in the transient state.
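Reason 2 above can be illustrated with a small userspace sketch. It is not NILFS code: a plain bitwise CRC-32 stands in for whatever checksum the log writer actually uses, and the names sketch_crc32, sketch_frozen_page, and SKETCH_PAGE_SIZE are hypothetical. The point is only that the checksum is computed over a private copy, so later writes to the live page cannot invalidate it.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 64	/* tiny hypothetical "page" for illustration */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), used as a stand-in. */
static uint32_t sketch_crc32(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* A frozen snapshot of a page together with the checksum of that snapshot. */
struct sketch_frozen_page {
	uint8_t data[SKETCH_PAGE_SIZE];
	uint32_t csum;
};

/* Copy the live page first, then checksum the copy, never the live data. */
static void sketch_freeze_page(struct sketch_frozen_page *f, const uint8_t *live)
{
	memcpy(f->data, live, SKETCH_PAGE_SIZE);
	f->csum = sketch_crc32(f->data, SKETCH_PAGE_SIZE);
}

/* Returns nonzero if the frozen data still matches its checksum. */
static int sketch_frozen_page_valid(const struct sketch_frozen_page *f)
{
	return sketch_crc32(f->data, SKETCH_PAGE_SIZE) == f->csum;
}
```

If the checksum were taken over the live buffer and that buffer were written out afterwards, a store through a shared mapping in between would make the written data disagree with the recorded checksum; copying first removes that window at the cost of one page copy.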
In addition, NILFS uses two obsolete functions, nilfs_mark_buffer_dirty()
and nilfs_clear_page_dirty().
* nilfs_mark_buffer_dirty() was required to avoid NULL pointer
dereference faults:
Since the page cache of B-tree node pages or data page cache of pseudo
inodes does not have a valid mapping->host, calling mark_buffer_dirty()
for their buffers causes the fault; it calls __mark_inode_dirty(NULL)
through __set_page_dirty().
* nilfs_clear_page_dirty() was needed in two cases:
1) For B-tree node pages and data pages of the dat/gcdat, NILFS2 clears
page dirty flags when it copies back pages from the cloned cache
(gcdat->{i_mapping,i_btnode_cache}) to its original cache
(dat->{i_mapping,i_btnode_cache}).
2) Some B-tree operations like insertion or deletion may dispose of buffers
in a dirty state, and this requires canceling the dirty state of their
pages. clear_page_dirty_for_io() caused faults because it does not
clear the dirty tag on the page ...
This adds the meta data file, which provides common buffer functions to
the DAT, sufile, cpfile, ifile, and so forth.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
fs/nilfs2/mdt.c | 562 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nilfs2/mdt.h | 125 ++++++++++++
2 files changed, 687 insertions(+), 0 deletions(-)
create mode 100644 fs/nilfs2/mdt.c
create mode 100644 fs/nilfs2/mdt.h
diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c
new file mode 100644
index 0000000..6ab8475
--- /dev/null
+++ b/fs/nilfs2/mdt.c
@@ -0,0 +1,562 @@
+/*
+ * mdt.c - meta data file for NILFS
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Written by Ryusuke Konishi <ryusuke@osrg.net>
+ */
+
+#include <linux/buffer_head.h>
+#include <linux/mpage.h>
+#include <linux/mm.h>
+#include <linux/writeback.h>
+#include <linux/backing-dev.h>
+#include <linux/swap.h>
+#include "nilfs.h"
+#include "segment.h"
+#include "page.h"
+#include "mdt.h"
+
+
+#define NILFS_MDT_MAX_RA_BLOCKS (16 - 1)
+
+#define INIT_UNUSED_INODE_FIELDS
+
+static int
+nilfs_mdt_insert_new_block(struct inode *inode, unsigned long block,
+ struct buffer_head *bh,
+ void (*init_block)(struct inode *,
+ struct ...
This adds common functions to allocate or deallocate entries with bitmaps on a meta data file. This feature is used by the DAT and ifile. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp> --- fs/nilfs2/alloc.c | 504 +++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/nilfs2/alloc.h | 72 ++++++++ 2 files changed, 576 insertions(+), 0 deletions(-) create mode 100644 fs/nilfs2/alloc.c create mode 100644 fs/nilfs2/alloc.h diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c new file mode 100644 index 0000000..d69e6ae --- /dev/null +++ b/fs/nilfs2/alloc.c @@ -0,0 +1,504 @@ +/* + * alloc.c - NILFS dat/inode allocator + * + * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + * + * Original code was written by Koji Sato <koji@osrg.net>. + * Two allocators were unified by Ryusuke Konishi <ryusuke@osrg.net>, + * Amagai Yoshiji <amagai@osrg.net>. + */ + +#include <linux/types.h> +#include <linux/buffer_head.h> +#include <linux/fs.h> +#include <linux/bitops.h> +#include "mdt.h" +#include "alloc.h" + + +static inline unsigned ...
From: Koji Sato <sato.koji@lab.ntt.co.jp> This adds the disk address translation file (DAT) whose primary function is to convert virtual disk block numbers to actual disk block numbers. The virtual block numbers of NILFS are associated with checkpoint generation numbers, and this file also provides functions to manage the lifetime information of each virtual block number. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> --- fs/nilfs2/dat.c | 429 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/nilfs2/dat.h | 52 +++++++ 2 files changed, 481 insertions(+), 0 deletions(-) create mode 100644 fs/nilfs2/dat.c create mode 100644 fs/nilfs2/dat.h diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c new file mode 100644 index 0000000..9360920 --- /dev/null +++ b/fs/nilfs2/dat.c @@ -0,0 +1,429 @@ +/* + * dat.c - NILFS disk address translation. + * + * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + * + * Written by Koji Sato <koji@osrg.net>. + */ + +#include <linux/types.h> +#include <linux/buffer_head.h> +#include <linux/string.h> +#include <linux/errno.h> +#include "nilfs.h" +#include "mdt.h" +#include "alloc.h" +#include "dat.h" + + +#define ...
This adds a meta data file which stores on-disk inodes in its data blocks. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp> --- fs/nilfs2/ifile.c | 150 +++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/nilfs2/ifile.h | 53 +++++++++++++++++++ 2 files changed, 203 insertions(+), 0 deletions(-) create mode 100644 fs/nilfs2/ifile.c create mode 100644 fs/nilfs2/ifile.h diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c new file mode 100644 index 0000000..de86401 --- /dev/null +++ b/fs/nilfs2/ifile.c @@ -0,0 +1,150 @@ +/* + * ifile.c - NILFS inode file + * + * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + * + * Written by Amagai Yoshiji <amagai@osrg.net>. + * Revised by Ryusuke Konishi <ryusuke@osrg.net>. + * + */ + +#include <linux/types.h> +#include <linux/buffer_head.h> +#include "nilfs.h" +#include "mdt.h" +#include "alloc.h" +#include "ifile.h" + +/** + * nilfs_ifile_create_inode - create a new disk inode + * @ifile: ifile inode + * @out_ino: pointer to a variable to store inode number + * @out_bh: buffer_head contains newly allocated disk inode + * + * Return Value: On success, 0 is returned and the newly allocated inode + ...
From: Koji Sato <sato.koji@lab.ntt.co.jp>
This adds a meta data file which holds checkpoint entries in its data
blocks.
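To make "checkpoint entries in its data blocks" concrete, here is a minimal userspace sketch of how a checkpoint number could be turned into a block offset and a slot within that block. It is an illustration under stated assumptions, not the cpfile.c implementation: it assumes checkpoint numbers start at 1 and that the first block reserves first_entry_offset slots for a header. The type sketch_cp_location and the function sketch_cp_locate are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical result type: where a checkpoint entry lives in the file. */
struct sketch_cp_location {
	uint64_t blkoff;	/* block number from the beginning of the file */
	uint32_t index;		/* entry slot inside that block */
};

/*
 * Locate checkpoint `cno` (assumed to start at 1) in a file whose blocks
 * hold `entries_per_block` fixed-size entries (must be non-zero), with the
 * first `first_entry_offset` slots of block 0 assumed to be reserved.
 */
static struct sketch_cp_location
sketch_cp_locate(uint64_t cno, uint32_t entries_per_block,
		 uint32_t first_entry_offset)
{
	/* Linear slot number counted from the start of the file. */
	uint64_t slot = cno + first_entry_offset - 1;
	struct sketch_cp_location loc;

	loc.blkoff = slot / entries_per_block;
	loc.index = (uint32_t)(slot % entries_per_block);
	return loc;
}
```

With 32 entries per block and one reserved slot, checkpoint 1 lands in block 0 at slot 1, and checkpoint 32 is the first entry of block 1.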
Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
fs/nilfs2/cpfile.c | 908 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nilfs2/cpfile.h | 45 +++
2 files changed, 953 insertions(+), 0 deletions(-)
create mode 100644 fs/nilfs2/cpfile.c
create mode 100644 fs/nilfs2/cpfile.h
diff --git a/fs/nilfs2/cpfile.c b/fs/nilfs2/cpfile.c
new file mode 100644
index 0000000..991633a
--- /dev/null
+++ b/fs/nilfs2/cpfile.c
@@ -0,0 +1,908 @@
+/*
+ * cpfile.c - NILFS checkpoint file.
+ *
+ * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Written by Koji Sato <koji@osrg.net>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/string.h>
+#include <linux/buffer_head.h>
+#include <linux/errno.h>
+#include <linux/nilfs2_fs.h>
+#include "mdt.h"
+#include "cpfile.h"
+
+
+static inline unsigned long
+nilfs_cpfile_checkpoints_per_block(const struct inode *cpfile)
+{
+ return NILFS_MDT(cpfile)->mi_entries_per_block;
+}
+
+/* block number from the beginning of the file */
+static unsigned ...
From: Koji Sato <sato.koji@lab.ntt.co.jp>
This adds a meta data file which stores the allocation state of
segments.
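As a loose picture of what "allocation state of segments" means, the following userspace sketch keeps one small record per segment and hands out the first clean one. The two states, the record layout, and the names sketch_segusage, sketch_seg_alloc, and sketch_seg_free are all hypothetical; the on-disk segment usage format defined by this patch set is not reproduced here.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical states; the real file may track more than these two. */
enum sketch_seg_state { SKETCH_SEG_CLEAN = 0, SKETCH_SEG_DIRTY = 1 };

/* Hypothetical per-segment record. */
struct sketch_segusage {
	uint32_t nblocks;	/* blocks written to the segment so far */
	uint32_t state;		/* one of enum sketch_seg_state */
};

/*
 * Claim the first clean segment. On success returns 0 and stores its
 * number in *segnum; returns -1 if every segment is in use.
 */
static int sketch_seg_alloc(struct sketch_segusage *tab, size_t nsegs,
			    size_t *segnum)
{
	for (size_t i = 0; i < nsegs; i++) {
		if (tab[i].state == SKETCH_SEG_CLEAN) {
			tab[i].state = SKETCH_SEG_DIRTY;
			tab[i].nblocks = 0;
			*segnum = i;
			return 0;
		}
	}
	return -1;
}

/* Return a segment to the clean pool; -1 if it was not allocated. */
static int sketch_seg_free(struct sketch_segusage *tab, size_t nsegs,
			   size_t segnum)
{
	if (segnum >= nsegs || tab[segnum].state != SKETCH_SEG_DIRTY)
		return -1;
	tab[segnum].state = SKETCH_SEG_CLEAN;
	tab[segnum].nblocks = 0;
	return 0;
}
```

In a log-structured design the free path would typically be driven by the cleaner once a segment no longer holds live blocks; the linear scan here is chosen for brevity, not efficiency.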
Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
fs/nilfs2/sufile.c | 627 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nilfs2/sufile.h | 54 +++++
2 files changed, 681 insertions(+), 0 deletions(-)
create mode 100644 fs/nilfs2/sufile.c
create mode 100644 fs/nilfs2/sufile.h
diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
new file mode 100644
index 0000000..d672924
--- /dev/null
+++ b/fs/nilfs2/sufile.c
@@ -0,0 +1,627 @@
+/*
+ * sufile.c - NILFS segment usage file.
+ *
+ * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Written by Koji Sato <koji@osrg.net>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/string.h>
+#include <linux/buffer_head.h>
+#include <linux/errno.h>
+#include <linux/nilfs2_fs.h>
+#include "mdt.h"
+#include "sufile.h"
+
+
+static inline unsigned long
+nilfs_sufile_segment_usages_per_block(const struct inode *sufile)
+{
+ return NILFS_MDT(sufile)->mi_entries_per_block;
+}
+
+static unsigned long
+nilfs_sufile_get_blkoff(const struct inode *sufile, __u64 ...
This adds inode level operations of the nilfs2 file system.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
fs/nilfs2/inode.c | 819 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 819 insertions(+), 0 deletions(-)
create mode 100644 fs/nilfs2/inode.c
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
new file mode 100644
index 0000000..4a526bc
--- /dev/null
+++ b/fs/nilfs2/inode.c
@@ -0,0 +1,819 @@
+/*
+ * inode.c - NILFS inode operations.
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Written by Ryusuke Konishi <ryusuke@osrg.net>
+ *
+ */
+
+#include <linux/buffer_head.h>
+#include <linux/mpage.h>
+#include <linux/writeback.h>
+#include "nilfs.h"
+#include "segment.h"
+#include "page.h"
+#include "mdt.h"
+#include "cpfile.h"
+#include "ifile.h"
+
+
+/**
+ * nilfs_get_block() - get a file block on the filesystem (callback function)
+ * @inode - inode struct of the target file
+ * @blkoff - file block number
+ * @bh_result - buffer head to be mapped on
+ * @create - indicate whether allocating the block or not when it has not
+ * been allocated yet.
+ *
+ * This function does not issue actual read request of the specified data
+ * block. It is done ...
This adds primitives for regular file handling.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
fs/nilfs2/file.c | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 125 insertions(+), 0 deletions(-)
create mode 100644 fs/nilfs2/file.c
diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
new file mode 100644
index 0000000..7ddd42e
--- /dev/null
+++ b/fs/nilfs2/file.c
@@ -0,0 +1,125 @@
+/*
+ * file.c - NILFS regular file handling primitives including fsync().
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Written by Amagai Yoshiji <amagai@osrg.net>,
+ * Ryusuke Konishi <ryusuke@osrg.net>
+ */
+
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/writeback.h>
+#include "nilfs.h"
+#include "segment.h"
+
+int nilfs_sync_file(struct file *file, struct dentry *dentry, int datasync)
+{
+ /*
+ * Called from fsync() system call
+ * This is the only entry point that can catch write and synch
+ * timing for both data blocks and intermediate blocks.
+ *
+ * This function should be implemented when the writeback function
+ * will be implemented.
+ */
+ struct inode *inode = dentry->d_inode;
+ int err;
+
+ if ...From: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp> This adds directory handling functions, most of which comes from the ext2 file system. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp> --- fs/nilfs2/dir.c | 711 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 711 insertions(+), 0 deletions(-) create mode 100644 fs/nilfs2/dir.c diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c new file mode 100644 index 0000000..1b7e6dd --- /dev/null +++ b/fs/nilfs2/dir.c @@ -0,0 +1,711 @@ +/* + * dir.c - NILFS directory entry operations + * + * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + * + * Modified for NILFS by Amagai Yoshiji <amagai@osrg.net> + */ +/* + * linux/fs/ext2/dir.c + * + * Copyright (C) 1992, 1993, 1994, 1995 + * Remy Card (card@masi.ibp.fr) + * Laboratoire MASI - Institut Blaise Pascal + * Universite Pierre et Marie Curie (Paris VI) + * + * from + * + * linux/fs/minix/dir.c + * + * Copyright (C) 1991, 1992 Linus Torvalds + * + * ext2 directory handling functions + * + * Big-endian to little-endian byte-swapping/bitmaps by + * David S. Miller (davem@caip.rutgers.edu), 1995 + * + * All code that ...
This adds pathname operations, most of which comes from the ext2 file system. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> --- fs/nilfs2/namei.c | 452 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 452 insertions(+), 0 deletions(-) create mode 100644 fs/nilfs2/namei.c diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c new file mode 100644 index 0000000..daf382c --- /dev/null +++ b/fs/nilfs2/namei.c @@ -0,0 +1,452 @@ +/* + * namei.c - NILFS pathname lookup operations. + * + * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + * + * Modified for NILFS by Amagai Yoshiji <amagai@osrg.net>, + * Ryusuke Konishi <ryusuke@osrg.net> + */ +/* + * linux/fs/ext2/namei.c + * + * Copyright (C) 1992, 1993, 1994, 1995 + * Remy Card (card@masi.ibp.fr) + * Laboratoire MASI - Institut Blaise Pascal + * Universite Pierre et Marie Curie (Paris VI) + * + * from + * + * linux/fs/minix/namei.c + * + * Copyright (C) 1991, 1992 Linus Torvalds + * + * Big-endian to little-endian byte-swapping/bitmaps by + * David S. Miller (davem@caip.rutgers.edu), 1995 + */ + +#include <linux/pagemap.h> +#include "nilfs.h" + + +static inline int nilfs_add_nondir(struct dentry ...
This adds functions on the_nilfs object, which keeps shared resources and states among a read/write mount and snapshots mounts going individually. the_nilfs is allocated per block device; it is created when user first mount a snapshot or a read/write mount on the device, then it is reused for successive mounts. It will be freed when all mount instances on the device are detached. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> --- fs/nilfs2/the_nilfs.c | 524 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 524 insertions(+), 0 deletions(-) create mode 100644 fs/nilfs2/the_nilfs.c diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c new file mode 100644 index 0000000..852e0bf --- /dev/null +++ b/fs/nilfs2/the_nilfs.c @@ -0,0 +1,524 @@ +/* + * the_nilfs.c - the_nilfs shared structure. + * + * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + * + * Written by Ryusuke Konishi <ryusuke@osrg.net> + * + */ + +#include <linux/buffer_head.h> +#include <linux/slab.h> +#include <linux/blkdev.h> +#include <linux/backing-dev.h> +#include "nilfs.h" +#include "segment.h" +#include "alloc.h" +#include "cpfile.h" +#include "sufile.h" +#include "dat.h" +#include "seglist.h" +#include ...
This adds super block operations for the nilfs2 file system. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> --- fs/nilfs2/super.c | 1365 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 1365 insertions(+), 0 deletions(-) create mode 100644 fs/nilfs2/super.c diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c new file mode 100644 index 0000000..85dd42f --- /dev/null +++ b/fs/nilfs2/super.c @@ -0,0 +1,1365 @@ +/* + * super.c - NILFS module and super block management. + * + * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + * + * Written by Ryusuke Konishi <ryusuke@osrg.net> + */ +/* + * linux/fs/ext2/super.c + * + * Copyright (C) 1992, 1993, 1994, 1995 + * Remy Card (card@masi.ibp.fr) + * Laboratoire MASI - Institut Blaise Pascal + * Universite Pierre et Marie Curie (Paris VI) + * + * from + * + * linux/fs/minix/inode.c + * + * Copyright (C) 1991, 1992 Linus Torvalds + * + * Big-endian to little-endian byte-swapping/bitmaps by + * David S. Miller (davem@caip.rutgers.edu), 1995 + */ + +#include <linux/module.h> +#include <linux/string.h> +#include <linux/slab.h> +#include <linux/init.h> +#include <linux/blkdev.h> +#include <linux/parser.h> +#include ...
This adds the segment buffer which is used to construct logs.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
fs/nilfs2/segbuf.c | 461 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nilfs2/segbuf.h | 203 +++++++++++++++++++++++
2 files changed, 664 insertions(+), 0 deletions(-)
create mode 100644 fs/nilfs2/segbuf.c
create mode 100644 fs/nilfs2/segbuf.h
diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
new file mode 100644
index 0000000..6505a6a
--- /dev/null
+++ b/fs/nilfs2/segbuf.c
@@ -0,0 +1,461 @@
+/*
+ * segbuf.c - NILFS segment buffer
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Written by Ryusuke Konishi <ryusuke@osrg.net>
+ *
+ */
+
+#include <linux/buffer_head.h>
+#include <linux/writeback.h>
+#include <linux/crc32.h>
+#include "page.h"
+#include "segbuf.h"
+#include "seglist.h"
+
+
+static struct kmem_cache *nilfs_segbuf_cachep;
+
+static void nilfs_segbuf_init_once(void *obj)
+{
+ memset(obj, 0, sizeof(struct nilfs_segment_buffer));
+}
+
+int __init nilfs_init_segbuf_cache(void)
+{
+ nilfs_segbuf_cachep =
+ kmem_cache_create("nilfs2_segbuf_cache",
+ sizeof(struct nilfs_segment_buffer),
+ 0, SLAB_RECLAIM_ACCOUNT,
+ ...This adds the segment constructor (also called log writer). The segment constructor collects dirty buffers for every dirty inode, makes summaries of the buffers, assigns disk block addresses to the buffers, and then submits BIOs for the buffers. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> --- fs/nilfs2/seglist.h | 85 ++ fs/nilfs2/segment.c | 3221 +++++++++++++++++++++++++++++++++++++++++++++++++++ fs/nilfs2/segment.h | 246 ++++ 3 files changed, 3552 insertions(+), 0 deletions(-) create mode 100644 fs/nilfs2/seglist.h create mode 100644 fs/nilfs2/segment.c create mode 100644 fs/nilfs2/segment.h diff --git a/fs/nilfs2/seglist.h b/fs/nilfs2/seglist.h new file mode 100644 index 0000000..d39df91 --- /dev/null +++ b/fs/nilfs2/seglist.h @@ -0,0 +1,85 @@ +/* + * seglist.h - expediential structure and routines to handle list of segments + * (would be removed in a future release) + * + * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + * + * Written by Ryusuke Konishi <ryusuke@osrg.net> + * + */ +#ifndef _NILFS_SEGLIST_H +#define _NILFS_SEGLIST_H + +#include <linux/fs.h> +#include <linux/buffer_head.h> +#include <linux/nilfs2_fs.h> +#include "sufile.h" + +struct nilfs_segment_entry ...
This adds recovery function on mount. Usually the recovery is achieved by just finding the latest super root. When logs without checkpoints were appended for data sync operations after the latest super root, the recovery function will perform roll forwarding and reconstruct new log(s) with a super root. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> --- fs/nilfs2/recovery.c | 941 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 941 insertions(+), 0 deletions(-) create mode 100644 fs/nilfs2/recovery.c diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c new file mode 100644 index 0000000..877dc1b --- /dev/null +++ b/fs/nilfs2/recovery.c @@ -0,0 +1,941 @@ +/* + * recovery.c - NILFS recovery logic + * + * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + * + * Written by Ryusuke Konishi <ryusuke@osrg.net> + */ + +#include <linux/buffer_head.h> +#include <linux/blkdev.h> +#include <linux/swap.h> +#include <linux/crc32.h> +#include "nilfs.h" +#include "segment.h" +#include "sufile.h" +#include "page.h" +#include "seglist.h" +#include "segbuf.h" + +/* + * Segment check result + */ +enum ...
NILFS2 uses another DAT inode during garbage collection to ensure
atomicity and consistency of the DAT in the transient state. This
twin inode is called GCDAT.
This adds functions to initialize the GCDAT and to switch page caches
and B-tree node caches between these two inodes.
Signed-off-by: Seiji Kihara <kihara.seiji@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp>
---
fs/nilfs2/gcdat.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 84 insertions(+), 0 deletions(-)
create mode 100644 fs/nilfs2/gcdat.c
diff --git a/fs/nilfs2/gcdat.c b/fs/nilfs2/gcdat.c
new file mode 100644
index 0000000..93383c5
--- /dev/null
+++ b/fs/nilfs2/gcdat.c
@@ -0,0 +1,84 @@
+/*
+ * gcdat.c - NILFS shadow DAT inode for GC
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Written by Seiji Kihara <kihara@osrg.net>, Amagai Yoshiji <amagai@osrg.net>,
+ * and Ryusuke Konishi <ryusuke@osrg.net>.
+ *
+ */
+
+#include <linux/buffer_head.h>
+#include "nilfs.h"
+#include "page.h"
+#include "mdt.h"
+
+int nilfs_init_gcdat_inode(struct the_nilfs *nilfs)
+{
+ struct inode *dat = nilfs->ns_dat, *gcdat = ...This adds the cache of on-disk blocks to be moved in garbage collection. The disk blocks are held with dummy inodes (called gcinodes), and this file provides lookup function of the dummy inodes, and their buffer read function. Signed-off-by: Seiji Kihara <kihara.seiji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp> --- fs/nilfs2/gcinode.c | 270 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 270 insertions(+), 0 deletions(-) create mode 100644 fs/nilfs2/gcinode.c diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c new file mode 100644 index 0000000..0013952 --- /dev/null +++ b/fs/nilfs2/gcinode.c @@ -0,0 +1,270 @@ +/* + * gcinode.c - NILFS memory inode for GC + * + * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + * + * Written by Seiji Kihara <kihara@osrg.net>, Amagai Yoshiji <amagai@osrg.net>, + * and Ryusuke Konishi <ryusuke@osrg.net>. + * Revised by Ryusuke Konishi <ryusuke@osrg.net>. 
+ * + */ + +#include <linux/buffer_head.h> +#include <linux/mpage.h> +#include <linux/hash.h> +#include <linux/swap.h> +#include "nilfs.h" +#include "page.h" +#include "mdt.h" +#include "dat.h" +#include ...
From: Koji Sato <sato.koji@lab.ntt.co.jp> This adds userland interface implemented with ioctl. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> --- fs/nilfs2/ioctl.c | 941 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 941 insertions(+), 0 deletions(-) create mode 100644 fs/nilfs2/ioctl.c diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c new file mode 100644 index 0000000..35ba60e --- /dev/null +++ b/fs/nilfs2/ioctl.c @@ -0,0 +1,941 @@ +/* + * ioctl.c - NILFS ioctl operations. + * + * Copyright (C) 2007, 2008 Nippon Telegraph and Telephone Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + * + * Written by Koji Sato <koji@osrg.net>. + */ + +#include <linux/fs.h> +#include <linux/wait.h> +#include <linux/smp_lock.h> /* lock_kernel(), unlock_kernel() */ +#include <linux/capability.h> /* capable() */ +#include <linux/uaccess.h> /* copy_from_user(), copy_to_user() */ +#include <linux/nilfs2_fs.h> +#include "nilfs.h" +#include "segment.h" +#include "bmap.h" +#include "cpfile.h" +#include "sufile.h" +#include "dat.h" + + +#define KMALLOC_SIZE_MIN 4096 /* 4KB */ +#define KMALLOC_SIZE_MAX 131072 /* 128 KB */ + +static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs, + struct ...
This adds a Makefile for the nilfs2 file system, and updates the makefile and Kconfig file in the file system directory. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> --- fs/Kconfig | 26 ++++++++++++++++++++++++++ fs/Makefile | 1 + fs/nilfs2/Makefile | 5 +++++ 3 files changed, 32 insertions(+), 0 deletions(-) create mode 100644 fs/nilfs2/Makefile diff --git a/fs/Kconfig b/fs/Kconfig index abccb5d..294092c 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -1383,6 +1383,32 @@ config MINIX_FS partition (the one containing the directory /) cannot be compiled as a module. +config NILFS2_FS + tristate "NILFS2 file system support (EXPERIMENTAL)" + depends on BLOCK && EXPERIMENTAL + select CRC32 + help + NILFS2 is a log-structured file system (LFS) supporting continuous + snapshotting. In addition to versioning capability of the entire + file system, users can even restore files mistakenly overwritten or + destroyed just a few seconds ago. Since this file system can keep + consistency like conventional LFS, it achieves quick recovery after + system crashes. + + NILFS2 creates a number of checkpoints every few seconds or per + synchronous write basis (unless there is no change). Users can + select significant versions among continuously created checkpoints, + and can change them into snapshots which will be preserved for long + periods until they are changed back to checkpoints. Each + snapshot is mountable as a read-only file system concurrently with + its writable mount, and this feature is convenient for online backup. + + Some features including atime, extended attributes, and POSIX ACLs, + are not supported yet. + + To compile this file system support as a module, choose M here: the + module will be called nilfs2. If unsure, say N. + config OMFS_FS tristate "SonicBlue Optimized MPEG File System support" depends on BLOCK diff --git a/fs/Makefile b/fs/Makefile index ...
Nice explanation. Can you add it to the comment header at the top of
the file? Unlike the GPL preamble, it actually helps non-lawyers. ;)

Using dummy inodes is... unusual. Why can you not use the actual
inodes those blocks belong to? Or alternatively a single inode that
simply covers the complete physical device?

Jörn
--
All models are wrong. Some models are useful.
-- George Box
--
Because we have to treat blocks that belong to the same file but have
different checkpoint numbers. (NILFS2 keeps multiple
checkpoints/snapshots across GC.)

Of course, if the standard inode hash is applicable, I prefer it.
ilookup5 or its variant may be applicable for this. If so, the
remaining problem would be the lock dependencies as you

NILFS2 writes GC blocks per file like other files, so the per-file
caches (even separate inodes) are convenient for this purpose.

Ryusuke
--
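The point about blocks that share a file but differ in checkpoint number can be illustrated with a small user-space sketch. This is not the code in gcinode.c; the names gc_cache, gc_inode, gc_hash, gc_iget and gc_clear, the table size, and the hash function are all assumptions made for illustration. It only shows why a lookup keyed on the pair (inode number, checkpoint number) yields separate per-file caches for different generations of the same file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define GC_HASH_SIZE 64                 /* hypothetical table size */

/* Hypothetical dummy inode: one per (inode number, checkpoint number). */
struct gc_inode {
	uint64_t ino;                   /* inode number of the real file */
	uint64_t cno;                   /* checkpoint number of this generation */
	struct gc_inode *next;          /* hash chain */
};

struct gc_cache {
	struct gc_inode *bucket[GC_HASH_SIZE];
};

/* Illustrative hash only; any function mixing both keys would do. */
unsigned gc_hash(uint64_t ino, uint64_t cno)
{
	return (unsigned)((ino * 31u + cno) % GC_HASH_SIZE);
}

/* Find the dummy inode for (ino, cno), creating it on first use. */
struct gc_inode *gc_iget(struct gc_cache *c, uint64_t ino, uint64_t cno)
{
	unsigned h = gc_hash(ino, cno);
	struct gc_inode *gi;

	assert(c != NULL);
	for (gi = c->bucket[h]; gi; gi = gi->next)
		if (gi->ino == ino && gi->cno == cno)
			return gi;

	gi = malloc(sizeof(*gi));
	if (!gi)
		return NULL;
	gi->ino = ino;
	gi->cno = cno;
	gi->next = c->bucket[h];
	c->bucket[h] = gi;
	return gi;
}

/* Drop every dummy inode, as would happen once GC has copied the blocks. */
void gc_clear(struct gc_cache *c)
{
	assert(c != NULL);
	for (unsigned i = 0; i < GC_HASH_SIZE; i++) {
		while (c->bucket[i]) {
			struct gc_inode *gi = c->bucket[i];

			c->bucket[i] = gi->next;
			free(gi);
		}
	}
}
```

With this keying, gc_iget(&c, 12, 100) and gc_iget(&c, 12, 101) return distinct objects, which a lookup on the inode number alone could not express.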
You should have the same problem already - in some shape or another.
If you can have two data structures for the same content, a real inode
and a dummy inode, you have a race condition. Quite possibly one
involving data corruption.

Well, one way to avoid both the race and the locking complexity is by
stopping all writes during GC and destroying all dummy inodes before
writes resume. But that would be inefficient in several cases. When
GC'ing data that is dirty in the caches, you move the old stale data
during GC and write the new data soon after. And you always flush the
caches after GC, even if your machine has no better use for the memory.

So unless I missed something important, I believe the locking is well
worth the effort.

BTW: Some of the explanation you just gave me would do well as
documentation in the source file as well. That's the sort of
background information new developers can spend months of mistakes and
reverse engineering on. :)

Jörn
--
Those who come seeking peace without a treaty are plotting.
-- Sun Tzu
--
Hi Jörn,

The current version of NILFS2 really takes this approach. Pages held
by the dummy inodes will be released after they are copied

As for NILFS2, the dirty blocks and the blocks to be moved by GC never
overlap because the dirty blocks make a new generation. So they must
rather be written individually. Though we can reuse pages in the GC
cache, the effect of this optimization may be much lower than in usual
LFSes because most of the blocks in the pages may not belong to the
latest generation.

Hmm, we would be better off counting frequency of true overlap if

Well, thanks. I'll do that. NILFS2 needs more explanation than usual
file systems; it needs a time perspective as well, since it is an
LFS. :)

Regards,
Ryusuke
--
Yet again I've tried to apply techniques that simply don't work with

At least if you take me as a standard, I think you have proven that
point rather well. :)

Jörn
--
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it.
-- Brian W. Kernighan
--
Hi Ryusuke,
On Sun, Sep 14, 2008 at 10:08 PM, Ryusuke Konishi
OK, I don't understand this. The only way nilfs_transaction_end() can
fail is if we have NILFS_TI_SYNC set and we fail to construct the
segment. But why do we want to construct a segment if we don't commit?
I guess what I'm asking is why don't we have a separate
nilfs_transaction_abort() function that can't fail for the erroneous
case to avoid this double error value tracking thing?
Pekka
--
Hi Pekka!

Yeah, that's quite right. nilfs_transaction_end() should not call
nilfs_construct_segment() in the error case, and this double error
handling seems to be avoidable.

The ``commit'' argument of nilfs_transaction_end() is insufficient
because it does not cancel the commit state.

I'd like to correct this error handling by adding
nilfs_transaction_abort() as you suggested.

Thank you for the comment.

Regards,
Ryusuke
--
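The split discussed in this exchange can be sketched in user space as follows. This is only an illustration of the commit/abort pattern, not the nilfs2 code: struct txn, txn_begin, txn_commit, txn_abort, do_op and the fake_* helpers are hypothetical names, and the construct callback merely stands in for segment construction. The idea shown is that only the commit path can fail, while the abort path cancels the commit state and returns nothing, so the caller tracks a single error value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical transaction state (loosely modelled on the discussion). */
struct txn {
	int depth;                      /* nesting level */
	bool sync;                      /* synchronous commit requested */
	bool committed;                 /* commit state to be written out */
	int (*construct)(void);         /* stand-in for segment construction */
};

void txn_begin(struct txn *t)
{
	t->depth++;
}

/* Commit path: the only place where construction runs, so it may fail. */
int txn_commit(struct txn *t)
{
	assert(t->depth > 0);
	t->committed = true;
	if (--t->depth > 0)
		return 0;               /* nested: outermost caller finishes */
	if (t->sync && t->construct)
		return t->construct();
	return 0;
}

/* Abort path: cancels the commit state, never constructs, cannot fail. */
void txn_abort(struct txn *t)
{
	assert(t->depth > 0);
	t->committed = false;
	t->depth--;
}

/* Caller pattern: one error value, no double tracking. */
int do_op(struct txn *t, int (*op)(void))
{
	int err;

	txn_begin(t);
	err = op();
	if (err) {
		txn_abort(t);
		return err;
	}
	return txn_commit(t);
}

/* Demonstration stand-ins (hypothetical). */
int construct_calls;

int fake_construct(void)
{
	construct_calls++;
	return 0;
}

int op_ok(void)
{
	return 0;
}

int op_fail(void)
{
	return -EINVAL;
}
```

In this sketch a failing operation returns its own error and never triggers construction, which is the behaviour asked for above.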
No atime. Seems familiar. :) Did you test the filesystem on big endian systems? It is relatively easy to miss bugs if conversion isn't actually necessary. Jörn -- When people work hard for you for a pat on the back, you've got to give them that pat. -- Robert Heinlein --
Yes, we did. We have test machines for this purpose. We actually had conversion misses and alignment errors in the early days. And now a big endian system is a requisite for us :) Regards, Ryusuke --
Hmm, undelete done right. Just one question... how slow/fast is it compared to conventional filesystems (ext3?)? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Hi!
After my first submission, Szabolcs Szakacsits showed benchmark
results using compilebench.
On Thu, 21 Aug 2008 00:25:55 +0300 (MET DST)
According to his measurement, NILFS2 showed very low performance
on the first measurement, and it recovered after a while.
I still don't know why NILFS2 shows such behaviour, and
I'm planning to follow up on the benchmark.
A little while ago, I tried another benchmark, iozone, on kernel
2.6.27-rc6. The results were as follows:
Throughput in MB/s (buffer size = 8B, file size = 500MB)
          <write>  <rewrite>  <read>  <reread>  <rand-read>  <rand-write>
ext3       44.918     46.691  56.541    56.505        1.562         5.716
nilfs2     56.076     43.703  41.364    41.356        1.231        37.650

Throughput in MB/s (buffer size = 64KB, file size = 500MB)
          <write>  <rewrite>  <read>  <reread>  <rand-read>  <rand-write>
ext3       45.369     46.438  56.542    56.457       11.025        36.300
nilfs2     56.119     43.630  41.330    41.498        8.572        37.671
(Here I used the -e and -U options to measure true disk read
performance rather than cache read performance.)
As is often said of LFS, NILFS2 showed high random write performance
for small writes, but its read throughput was lower: about 27% lower
than ext3.
For sequential writes, the first write was good, but overwrite was
slower because it involves reading existing metadata.
Cheers,
Ryusuke
--
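For readers who want to check the percentages against the iozone table in the message above, a tiny helper is enough. The function names relative_drop and speedup are made up for this sketch; the inputs are the sequential read and small-buffer random write figures from the table.

```c
#include <assert.h>

/* Fraction by which `value` falls short of `baseline` (0.25 = 25% lower). */
double relative_drop(double baseline, double value)
{
	assert(baseline > 0.0);
	return (baseline - value) / baseline;
}

/* How many times larger `value` is than `baseline`. */
double speedup(double baseline, double value)
{
	assert(baseline > 0.0);
	return value / baseline;
}
```

relative_drop(56.541, 41.364) comes to roughly 0.27, matching the read figure quoted in the message, and speedup(5.716, 37.650) comes to roughly 6.6 for random writes with the small buffer.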
Normally, compilebench has read phases that time how quickly the FS can read the files after a bunch of operations. The runs above didn't include the read phase, but in order to be fair to all the filesystems, compilebench figures out the native readdir order of the FS so it can create files in the optimal order for each fs. It does this by creating all the files in its datasets and using readdir to find out what order the FS returns. The files are all deleted and the real runs start. It is possible that bad perf in the first compilebench run is from cleanup or transaction commits being done after the deletions. -chris --
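The readdir-order probing described above can be sketched with the POSIX directory API. This is not compilebench's own code; readdir_order and its fixed 256-byte name slots are assumptions made for illustration. It simply records the names in the order the file system returns them, which is the order a benchmark could then use when creating files.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Store up to `max` entry names from `path` in the order readdir()
 * returns them, skipping "." and "..". Returns the number of names
 * stored, or -1 if the directory cannot be opened.
 */
int readdir_order(const char *path, char names[][256], int max)
{
	DIR *d;
	struct dirent *de;
	int n = 0;

	assert(path != NULL && names != NULL);
	d = opendir(path);
	if (!d)
		return -1;

	while (n < max && (de = readdir(d)) != NULL) {
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		/* Long names are truncated to fit the fixed-size slot. */
		snprintf(names[n], 256, "%s", de->d_name);
		n++;
	}
	closedir(d);
	return n;
}
```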
How are the sufile and the DAT written? If you naively stick to the log-structured approach, their contents will reflect a filesystem state prior to writing them and be outdated by the time they hit the medium. So either you bend the rules here and update those files in-place or you do something tricky. Can you explain your solution? Jörn -- Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface. -- Doug MacIlroy --
That's right. The DAT, sufile, and cpfile are written at the same time
so that they become consistent and self-contained.

Checkpoint creations are predictable, so the cpfile is OK. But the
sufile depends on the length of the logs, and therefore on the
construction of other files, including the DAT and the super root
block. Since virtual block numbers are also assigned to the sufile,
there is a circular dependency.

So nilfs2 builds the sufile speculatively: it retries the collection
of dirty blocks for these three files if it turns out that more
segments are required than expected. Expecting too many segments is
not a problem, because the allocation of surplus segments can be
cancelled without breaking consistency.

nilfs2 does this retry in memory and writes the three files at once to
avoid an I/O penalty.

Regards,
Ryusuke
--
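The speculative retry described in this reply can be modelled with a toy loop. Everything here is an assumption for illustration: the segment size, the rule that each reserved segment dirties one extra metadata block, and the names blocks_needed, segs_for and plan_segments. None of it is taken from segment.c. The sketch only shows the shape of the scheme: guess a segment count, recollect, grow the reservation and retry if the guess was too small, and cancel the surplus if it was too large.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCKS_PER_SEGMENT 8    /* hypothetical, tiny for illustration */

/*
 * Toy model of the circular dependency: the metadata to write grows
 * with the number of segments reserved (one extra block per segment
 * is a pure assumption).
 */
size_t blocks_needed(size_t payload_blocks, size_t nsegs)
{
	return payload_blocks + nsegs;
}

/* Segments required to hold `nblocks`, rounded up. */
size_t segs_for(size_t nblocks)
{
	return (nblocks + BLOCKS_PER_SEGMENT - 1) / BLOCKS_PER_SEGMENT;
}

/*
 * Start from `guess` reserved segments and recollect until the blocks
 * fit. Returns the number of segments kept; `*retries` counts how
 * often the reservation had to grow.
 */
size_t plan_segments(size_t payload_blocks, size_t guess, unsigned *retries)
{
	size_t reserved = guess;

	assert(retries != NULL);
	*retries = 0;
	for (;;) {
		size_t need = segs_for(blocks_needed(payload_blocks, reserved));

		if (need > reserved) {
			/* Guess was too small: reserve more and recollect. */
			reserved = need;
			(*retries)++;
			continue;
		}
		/* Guess was large enough: cancel any surplus segments. */
		return need;
	}
}
```

With 20 payload blocks and an initial guess of one segment, the loop grows the reservation once and settles on three segments. With a guess of ten it keeps four and cancels the rest without retrying, since shrinking the reservation can only reduce the blocks to be written in this model.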
