Re: [patch 0/15] LogFS take five

To: <linux-kernel@...>, <linux-fsdevel@...>, <linux-mtd@...>
Cc: Nick Piggin <npiggin@...>, Jens Axboe <jens.axboe@...>, David Woodhouse <dwmw2@...>
Date: Tuesday, April 1, 2008 - 2:13 pm

Add LogFS, a scalable flash filesystem.

Patch is split into individual files for review. Several details
will surely raise eyebrows and likely require changes:
- Using two page flags where only one is generally available for
filesystems. One of the flags is necessary to deal with a deadlock
when writepage() sends a locked page for writing. Details can be
found in readwrite.c, around the definition of PG_pre_locked.
Added Nick Piggin to Cc: for that detail.
- Caching in the mtd layer. This should likely be moved into
drivers/mtd/mtdcore.c.
David Woodhouse on Cc: for this.
- A not-quite-polished btree library that should get some more polish
and move to lib/btree.c. Unless someone else has better code
already.

Checkpatch.pl spits out two errors. Neither of these looks like a
clear-cut case to me. If someone has a good suggestion for either
one, I'll happily follow that.

And it is currently reasonably simple to run into a deadlock when
using logfs on a block device. The problem appears to be the block
layer allocating memory for its cache without GFP_NOFS, so that under
memory pressure logfs writes through the block layer may recurse back to
logfs writes. Not entirely sure who is to blame for this bug and how to
solve it.

Added Jens Axboe for this detail.
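
For readers following along, the recursion described above can be modeled
in userspace. This is only a rough sketch, not logfs or block layer code:
every sketch_* name is hypothetical, the flag values are illustrative, and
"memory pressure" is simulated by having every allocation attempt reclaim.
The one behaviour it relies on is that reclaim calls into a filesystem's
writepage only when the allocation mask includes the FS bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the real definitions live in the kernel's gfp.h. */
#define SKETCH_GFP_WAIT   0x10u
#define SKETCH_GFP_IO     0x40u
#define SKETCH_GFP_FS     0x80u
#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_GFP_NOFS   (SKETCH_GFP_WAIT | SKETCH_GFP_IO)

static bool fs_lock_held;      /* models the filesystem's write mutex */
static int blocked_reentries;  /* counts would-be self-deadlocks */

/* Stand-in for the filesystem's writepage path, which needs the fs lock. */
void sketch_fs_writepage(void)
{
    if (fs_lock_held) {
        /* A real mutex_lock() would block here forever. */
        blocked_reentries++;
        return;
    }
    /* ... the page would be written out here ... */
}

/* Stand-in for a page allocation under memory pressure. */
void sketch_alloc_page(unsigned int gfp)
{
    /* Reclaim may enter the filesystem only if the caller allows it. */
    if (gfp & SKETCH_GFP_FS)
        sketch_fs_writepage();
}

/*
 * Stand-in for a filesystem write that reads through the block device
 * cache while holding the fs lock. Returns how many times reclaim tried
 * to re-enter the filesystem during the allocation.
 */
int sketch_fs_write(unsigned int bdev_gfp)
{
    assert(!fs_lock_held);  /* top-level entry only */
    blocked_reentries = 0;
    fs_lock_held = true;
    sketch_alloc_page(bdev_gfp);
    fs_lock_held = false;
    return blocked_reentries;
}
```

With SKETCH_GFP_KERNEL the model records one blocked re-entry; with
SKETCH_GFP_NOFS it records none, which is the property the block device
cache allocation would need here.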

Motivation 1:

Linux currently has 1-2 flash filesystems to choose from, JFFS2 and
YAFFS. The latter has never made a serious attempt at kernel
integration, which may disqualify it for some.

The two main problems of JFFS2 are memory consumption and mount time.
Unlike most filesystems, there is no tree structure of any sort on
the medium, so the complete medium needs to be scanned at mount time
and a tree structure kept in-memory while the filesystem is mounted.
With bigger devices, both mount time and memory consumption increase
linearly.

JFFS2 has recently gained summary support, which helps reduce mount
time by a constant factor. Linear scalability remains. YAFFS
appears to be better by...

To: <joern@...>
Cc: <linux-kernel@...>, <linux-fsdevel@...>, <linux-mtd@...>, Nick Piggin <npiggin@...>, David Woodhouse <dwmw2@...>
Date: Friday, April 4, 2008 - 7:46 am

So you mean for writes through the page cache, you are seeing pages

A good starting point would be doing a stack trace dump in logfs if you
see such back recursion into the fs. A quick guess would be a missing
setting of mapping gfp mask?

--
Jens Axboe

--

To: Jens Axboe <jens.axboe@...>
Cc: <linux-kernel@...>, <linux-fsdevel@...>, <linux-mtd@...>, Nick Piggin <npiggin@...>, David Woodhouse <dwmw2@...>
Date: Monday, April 7, 2008 - 4:22 am

It sure looks like it. On top, the patch at the bottom seems to solve

Sorry, should have sent that right along.

[<ffffffff802ca83f>] elv_insert+0x156/0x219
[<ffffffff8037d96d>] __mutex_lock_slowpath+0x57/0x81
[<ffffffff8037d804>] mutex_lock+0xd/0xf
[<ffffffff802c07e7>] logfs_get_wblocks+0x33/0x54
[<ffffffff802c025c>] logfs_write_buf+0x3d/0x322
[<ffffffff802bbae0>] __logfs_writepage+0x24/0x67
[<ffffffff802bbbfb>] logfs_writepage+0xd8/0xe3
[<ffffffff8024ba78>] shrink_page_list+0x2ee/0x514
[<ffffffff8024b466>] isolate_lru_pages+0x6c/0x1ff
[<ffffffff8024c2a9>] shrink_zone+0x60b/0x85b
[<ffffffff802cc0e5>] generic_make_request+0x329/0x364
[<ffffffff80245ea1>] mempool_alloc_slab+0x11/0x13
[<ffffffff802367b3>] up_read+0x9/0xb
[<ffffffff8024c638>] shrink_slab+0x13f/0x151
[<ffffffff8024cc1c>] try_to_free_pages+0x111/0x209
[<ffffffff8024859a>] __alloc_pages+0x1b1/0x2f5
[<ffffffff80243f6b>] read_cache_page_async+0x7e/0x15c
[<ffffffff8027fba9>] blkdev_readpage+0x0/0x15
[<ffffffff80245612>] read_cache_page+0xe/0x46
[<ffffffff802c2842>] bdev_read+0x61/0xee
[<ffffffff802bc741>] __logfs_gc_pass+0x219/0x7dc
[<ffffffff802bcd1b>] logfs_gc_pass+0x17/0x19
[<ffffffff802c0798>] logfs_flush_dirty+0x7d/0x99
[<ffffffff802c0800>] logfs_get_wblocks+0x4c/0x54
[<ffffffff802c025c>] logfs_write_buf+0x3d/0x322
[<ffffffff802bbe1e>] logfs_commit_write+0x77/0x7d
[<ffffffff80244ec2>] generic_file_buffered_write+0x49d/0x62c
[<ffffffff802704da>] file_update_time+0x7f/0xad
[<ffffffff802453a5>] __generic_file_aio_write_nolock+0x354/0x3be
[<ffffffff80237077>] atomic_notifier_call_chain+0xf/0x11
[<ffffffff80245abb>] filemap_fault+0x1b4/0x320
[<ffffffff80245473>] generic_file_aio_write+0x64/0xc0
[<ffffffff8025ebc8>] do_sync_write+0xe2/0x126
[<ffffffff80224b4f>] release_console_sem+0...

To: Jörn <joern@...>
Cc: <linux-kernel@...>, <linux-fsdevel@...>, <linux-mtd@...>, Nick Piggin <npiggin@...>, David Woodhouse <dwmw2@...>
Date: Monday, April 7, 2008 - 4:28 am

It's not the right fix; generally GFP_FS is fine here. So do that in
logfs when you cannot traverse back into the fs, e.g.

mapping_gfp_mask(mapping) & ~__GFP_FS;

locally.

--
Jens Axboe

--

To: Jens Axboe <jens.axboe@...>
Cc: <linux-kernel@...>, <linux-fsdevel@...>, <linux-mtd@...>, Nick Piggin <npiggin@...>, David Woodhouse <dwmw2@...>
Date: Monday, April 7, 2008 - 5:10 am

struct address_space *mapping;

/* Prevent bdev from calling back into fs */
mapping = &logfs_super(sb)->s_bdev->bd_inode->i_data;
mapping_set_gfp_mask(mapping, mapping_gfp_mask(mapping) & ~__GFP_FS);
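
As a userspace illustration of what the two helpers above do to the
mapping, here is a minimal sketch. It assumes the allocation mask is
stored in the low bits of the mapping's flags word next to unrelated
state bits; the sketch_* names, the bit values and SKETCH_AS_EIO are
made up for the example and are not the kernel's definitions.

```c
#include <assert.h>

/* Illustrative values only, not copied from kernel headers. */
#define SKETCH_GFP_WAIT      0x10u
#define SKETCH_GFP_IO        0x40u
#define SKETCH_GFP_FS        0x80u
#define SKETCH_GFP_BITS_MASK 0xffu   /* assumed width of the gfp field */
#define SKETCH_AS_EIO        0x100u  /* hypothetical unrelated state bit */

struct sketch_mapping {
    unsigned long flags;  /* gfp mask in the low bits, state bits above */
};

/* Read the allocation mask stored in the mapping. */
unsigned int sketch_mapping_gfp_mask(const struct sketch_mapping *m)
{
    return (unsigned int)(m->flags & SKETCH_GFP_BITS_MASK);
}

/* Replace the allocation mask, leaving the other flag bits untouched. */
void sketch_mapping_set_gfp_mask(struct sketch_mapping *m, unsigned int mask)
{
    assert((mask & ~SKETCH_GFP_BITS_MASK) == 0);
    m->flags = (m->flags & ~(unsigned long)SKETCH_GFP_BITS_MASK) | mask;
}

/* Same read-modify-write as in the snippet above: drop only the FS bit. */
void sketch_forbid_fs_reentry(struct sketch_mapping *m)
{
    sketch_mapping_set_gfp_mask(m,
            sketch_mapping_gfp_mask(m) & ~SKETCH_GFP_FS);
}
```

Reading the current mask first and clearing a single bit keeps whatever
other allocation flags the block device mapping already carries, instead
of overwriting them with a fixed constant.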

bd_inode has an interesting comment:
struct inode * bd_inode; /* will die */

Should I be worried about that? It seems to predate git history, so
I'm not too concerned about immediate changes.

Jörn

--
Unless something dramatically changes, by 2015 we'll be largely
wondering what all the fuss surrounding Linux was really about.
-- Rob Enderle
--

To: Jörn <joern@...>
Cc: <linux-kernel@...>, <linux-fsdevel@...>, <linux-mtd@...>, Nick Piggin <npiggin@...>, David Woodhouse <dwmw2@...>
Date: Monday, April 7, 2008 - 5:17 am

I'd just ignore it, it's widely used anyway...

--
Jens Axboe

--

To: <linux-kernel@...>, <linux-fsdevel@...>, <linux-mtd@...>
Cc: Nick Piggin <npiggin@...>, Jens Axboe <jens.axboe@...>, David Woodhouse <dwmw2@...>
Date: Thursday, April 3, 2008 - 1:13 pm

/me cannot count that high. Bah!

Jörn

--
A defeated army first battles and then seeks victory.
-- Sun Tzu
--
