Moin David, if I read the code correctly, JFFS2 will happily perform a NOP on sys_sync() again. And this time it appears to be Jens' fault. JFFS2 does not appear to set s_bdi anywhere. And as of 32a88aa1, __sync_filesystem() will return 0 if s_bdi is not set. As a result, sync_fs() is never called for jffs2 and whatever remains in the wbuf will not make it to the device. The patch also adds a BUG_ON to catch this problem in any remaining or future offenders. I am not sure about network filesystems, but at least bdev- and mtd-based ones should be caught. Opinions? Jörn -- No art, however minor, demands less than total dedication if you want to excel in it. -- Leon Battista Alberti diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c index af8b42e..7c00319 100644 --- a/drivers/mtd/mtdsuper.c +++ b/drivers/mtd/mtdsuper.c @@ -13,6 +13,7 @@ #include <linux/mtd/super.h> #include <linux/namei.h> #include <linux/ctype.h> +#include <linux/slab.h> /* * compare superblocks to see if they're equivalent @@ -44,6 +45,7 @@ static int get_sb_mtd_set(struct super_block *sb, void *_mtd) sb->s_mtd = mtd; sb->s_dev = MKDEV(MTD_BLOCK_MAJOR, mtd->index); + sb->s_bdi = mtd->backing_dev_info; return 0; } diff --git a/fs/super.c b/fs/super.c index f35ac60..e8af253 100644 --- a/fs/super.c +++ b/fs/super.c @@ -954,10 +954,12 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void if (error < 0) goto out_free_secdata; BUG_ON(!mnt->mnt_sb); + BUG_ON(!mnt->mnt_sb->s_bdi && + (mnt->mnt_sb->s_bdev || mnt->mnt_sb->s_mtd)); - error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata); - if (error) - goto out_sb; + error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata); + if (error) + goto out_sb; /* * filesystems should never set s_maxbytes larger than MAX_LFS_FILESIZE --
I think that BUG_ON() would be a lot better as a printk() and fail mount instead. There's really little point in killing the kernel for something you could easily warn about and recover nicely. -- Jens Axboe --
*shrug* The BUG_ON directly above is not qualitatively different. In both cases the only solution is to find and fix the bug in some other file, recompile and try again. But ultimately I don't care, as long as we catch the bug before people lose their data. Feel free to respin this patch. You caused the problem in the first place. ;) For the record, while I consider the two-liner that causes this mess a real turd, the rest of your patch was brilliant. A shame I didn't catch this earlier. Jörn -- You ain't got no problem, Jules. I'm on the motherfucker. Go back in there, chill them niggers out and wait for the Wolf, who should be coming directly. -- Marsellus Wallace --
Thanks, we definitely should have put a debug statement to catch this in from day 1, good debugging should be an important part of any new infrastructure. Care to send your jffs2 patch separately to David? Then I'll commit a modified variant for complaining about missing ->s_bdi on mount. -- Jens Axboe --
Sure. David, this patch is untested. It looks trivially correct and fixes a nasty bug, but I don't test jffs2 and only noticed the problem in passing. Jörn -- Maintenance in other professions and of other articles is concerned with the return of the item to its original state; in Software, maintenance is concerned with moving an item away from its original state. -- Les Belady diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c index af8b42e..7c00319 100644 --- a/drivers/mtd/mtdsuper.c +++ b/drivers/mtd/mtdsuper.c @@ -44,6 +45,7 @@ static int get_sb_mtd_set(struct super_block *sb, void *_mtd) sb->s_mtd = mtd; sb->s_dev = MKDEV(MTD_BLOCK_MAJOR, mtd->index); + sb->s_bdi = mtd->backing_dev_info; return 0; } --
Woke up early and had another look at this. Looks like a much more widespread problem. Based on a quick grep an uncaffeinated brain: 9p no s_bdi afs no s_bdi ceph creates its own s_bdi cifs no s_bdi coda no s_bdi ecryptfs no s_bdi exofs no s_bdi fuse creates its own s_bdi? gfs2 creates its own s_bdi? jffs2 patch exists logfs fixed now ncpfs no s_bdi nfs creates its own s_bdi ocfs2 no s_bdi smbfs no s_bdi ubifs creates its own s_bdi I excluded all filesystems that appear to be read-only, block device based or lack any sort of backing store. So there is a chance I have missed some as well. Jörn -- Simplicity is prerequisite for reliability. -- Edsger W. Dijkstra --
Linus, I think this is bad enough that you should be involved. 32a88aa1 broke a number of filesystems in a way that sync() would return 0 without doing any work. Even politicians are better at keeping the promises. This is caused by the two-liner in __sync_filesystem: if (!sb->s_bdi) return 0; s_bdi is set implicitly for all filesystems using set_bdev_super(), so most block device based filesystems are safe. There are, however, a number of odd-balls around: Obviously this list should get checked and all affected filesystems get repaired. Additionally we should add an assertion and BUG() or refuse to mount or something. My original patch to that extend was this: diff --git a/fs/super.c b/fs/super.c index f35ac60..e8af253 100644 --- a/fs/super.c +++ b/fs/super.c @@ -954,6 +954,8 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void if (error < 0) goto out_free_secdata; BUG_ON(!mnt->mnt_sb); + BUG_ON(!mnt->mnt_sb->s_bdi && + (mnt->mnt_sb->s_bdev || mnt->mnt_sb->s_mtd)); error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata); if (error) goto out_sb; The real problem is finding a condition that has neither false positives nor false negatives. The "(mnt->mnt_sb->s_bdev || mnt->mnt_sb->s_mtd)" part takes care of false positives like tmpfs, but it would catch none of the network filesystems. Should we instead annotate tmpfs and friends with something like sb->s_dont_need_bdi? It is the only way I can think of not to miss something. Jörn -- People will accept your ideas much more readily if you tell them that Benjamin Franklin said it first. -- unknown --
Umm. Why not just remove the two-liner? It was incorrect. The comment says "this should be safe", and if it wasn't, then the commit that caused this all was total crap to begin with. Why even discuss any other measures? Linus --
Grr. Ok, so we do need it. Because Jens made it not work without it, and didn't fix up the filesystems, just added a random comment saying "we shouldn't need to". Double-grr. I hate misleading comments. It makes the patch and the code look like people knew what they were doing. Jens - please help fix this up. Linus --
Yeah sorry, that part was apparently not well thought through. It did go Of course, I already posted a series of patches to fix this up. I want to test them a bit, and I'll send them in tomorrow. -- Jens Axboe --
How about something like this to catch future cases? It compiles and survived a test boot, so it does seem to work for the common cases like tmpfs, procfs, etc. Jens, you know the bdi code 10x better than me, would this work? Jörn -- ticks = jiffies; while (ticks == jiffies); ticks = jiffies; -- /usr/src/linux/init/main.c noop_backing_dev_info is used only as a flag to mark filesystems that don't have any backing store, like tmpfs, procfs, spufs, etc. Signed-off-by: Joern Engel <joern@logfs.org> diff --git a/fs/super.c b/fs/super.c index f35ac60..dc72491 100644 --- a/fs/super.c +++ b/fs/super.c @@ -693,6 +693,7 @@ int set_anon_super(struct super_block *s, void *data) return -EMFILE; } s->s_dev = MKDEV(0, dev & MINORMASK); + s->s_bdi = &noop_backing_dev_info; return 0; } @@ -954,10 +955,11 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void if (error < 0) goto out_free_secdata; BUG_ON(!mnt->mnt_sb); + BUG_ON(!mnt->mnt_sb->s_bdi); - error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata); - if (error) - goto out_sb; + error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata); + if (error) + goto out_sb; /* * filesystems should never set s_maxbytes larger than MAX_LFS_FILESIZE diff --git a/fs/sync.c b/fs/sync.c index fc5c3d7..92b2281 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -14,6 +14,7 @@ #include <linux/pagemap.h> #include <linux/quotaops.h> #include <linux/buffer_head.h> +#include <linux/backing-dev.h> #include "internal.h" #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \ @@ -32,7 +33,7 @@ static int __sync_filesystem(struct super_block *sb, int wait) * This should be safe, as we require bdi backing to actually * write out data in the first place */ - if (!sb->s_bdi) + if (!sb->s_bdi || sb->s_bdi == &noop_backing_dev_info) return 0; if (sb->s_qcop && sb->s_qcop->quota_sync) diff --git a/include/linux/backing-dev.h ...
Looks sane, it's a good start. I think we should augment that with a
check to ensure that we don't ever add dirty inodes to this bdi, since
it's not going to be flushed.
Something like a:
WARN_ON(bdi == &noop_backing_dev_info);
to __mark_inode_dirty(). Looking at the code it should already trigger a
warning, since it'll check for BDI_CAP_NO_WRITEBACK (which isn't set
for noop_backing_dev_info) and the fact that noop-bdi isn't registered
to begin with.
So it's probably safe and good enough as-is, I'll add it. Thanks!
--
Jens Axboe
--
I cannot see this patch in your tree yet. Could be the weekend or a deliberate decision not to send this for 2.6.34-rc anymore. In case it was a deliberate decision, can we please make it explicit? I don't like the idea of adding a BUG_ON() that potentially triggers for thousands of people this late in the stabilization process - but it is better than having people lose data. Even if we already ran two stable kernels that way. Damned if you do, damned if you don't. :( Jörn -- Time? What's that? Time is only worth what you do with it. -- Theo de Raadt --
It's there, I put it in yesterday. It's definitely 2.6.34-rc material, I Yeah, it's a bad situation to be in. I changed that BUG_ON() to a WARN_ON(). I'm not too worried about that part, I'm more worried about the file system changed. OTOH, they do lack proper flushing now, so it's likely not a huge risk from that perspective. -- Jens Axboe --
It is worse still. Using mtd->backing_dev_info results in kernel BUG at fs/fs-writeback.c:157 which is BUG_ON(!work->seen); in bdi_queue_work(). Logfs is affected and I bet jffs2 is as well. So much for dwmw2 or me actually testing the fix. :( I did a hexdump to see what sb->s_bdi actually contained and the result was this: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 ...
The important bit is that each bdi must be initialized and registered if it's going to be handling dirty data, it can't just be a static placeholder. See the bdi_setup_and_register() helper I added. -- Jens Axboe --
Took a quick look, and you want bdi_setup_and_register() for the three bdis listed in mtdbdi.c in mtdcore.c:init_mtd(). Or manual bdi_init() and bdi_register(). I'll take a look post-dinner. Either is workable, but since the flags are already setup, the latter may be cleaner. -- Jens Axboe --
Ok, here are two patches that appear to solve the problem for me. Jörn -- Courage is not the absence of fear, but rather the judgement that something else is more important than fear. -- Ambrose Redmoon Removes one .h and one .c file that are never used outside of mtdcore.c. Signed-off-by: Joern Engel <joern@logfs.org> --- drivers/mtd/Makefile | 2 +- drivers/mtd/devices/drais.c | 2 +- drivers/mtd/internal.h | 17 ----------------- drivers/mtd/mtdbdi.c | 43 ------------------------------------------- drivers/mtd/mtdcore.c | 33 ++++++++++++++++++++++++++++++++- 5 files changed, 34 insertions(+), 63 deletions(-) diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile index 82d1e4d..4521b1e 100644 --- a/drivers/mtd/Makefile +++ b/drivers/mtd/Makefile @@ -4,7 +4,7 @@ # Core functionality. obj-$(CONFIG_MTD) += mtd.o -mtd-y := mtdcore.o mtdsuper.o mtdbdi.o +mtd-y := mtdcore.o mtdsuper.o mtd-$(CONFIG_MTD_PARTITIONS) += mtdpart.o obj-$(CONFIG_MTD_CONCAT) += mtdconcat.o diff --git a/drivers/mtd/devices/drais.c b/drivers/mtd/devices/drais.c index b93ff3f..6f81b28 100644 --- a/drivers/mtd/devices/drais.c +++ b/drivers/mtd/devices/drais.c @@ -33,7 +33,7 @@ #include <linux/sched.h> #include "drais.h" -#undef DEBUG_ERASE +#define DEBUG_ERASE #define DRAIS_BB_MAGIC 0x3911b26ba5a931d8ull diff --git a/drivers/mtd/internal.h b/drivers/mtd/internal.h index c658fe7..e69de29 100644 --- a/drivers/mtd/internal.h +++ b/drivers/mtd/internal.h @@ -1,17 +0,0 @@ -/* Internal MTD definitions - * - * Copyright © 2006 Red Hat, Inc. All Rights Reserved. - * Written by David Howells (dhowells@redhat.com) - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -/* - * mtdbdi.c - */ -extern struct ...
Otherwise we hit a BUG_ON in bdi_queue_work().
Signed-off-by: Joern Engel <joern@logfs.org>
---
drivers/mtd/mtdcore.c | 17 ++++++++++++++++-
1 files changed, 16 insertions(+), 1 deletions(-)
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index cb4858b..8dd3e46 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -299,7 +299,7 @@ static struct device_type mtd_devtype = {
int add_mtd_device(struct mtd_info *mtd)
{
- int i;
+ int i, err;
if (!mtd->backing_dev_info) {
switch (mtd->type) {
@@ -322,6 +322,12 @@ int add_mtd_device(struct mtd_info *mtd)
if (!mtd_table[i]) {
struct mtd_notifier *not;
+ err = bdi_register(mtd->backing_dev_info, NULL, "mtd%d",
+ i);
+ if (err) {
+ /* We lose the errno information :( */
+ break;
+ }
mtd_table[i] = mtd;
mtd->index = i;
mtd->usecount = 0;
@@ -692,6 +698,15 @@ static int __init init_mtd(void)
int ret;
ret = class_register(&mtd_class);
+ ret = bdi_init(&mtd_bdi_unmappable);
+ if (ret)
+ return ret;
+ ret = bdi_init(&mtd_bdi_ro_mappable);
+ if (ret)
+ return ret;
+ ret = bdi_init(&mtd_bdi_rw_mappable);
+ if (ret)
+ return ret;
if (ret) {
pr_err("Error registering mtd class: %d\n", ret);
return ret;
--
1.6.2.1
--
Do the bdi_register() here as well. -- Jens Axboe --
-ENOBRAIN Initially I wanted to do just than. Then I looked at the block layer and thought we could create one backing_dev_info per mtd as well. Not necessarily a bad if I would actually _create_ them and not just reuse the same ones over and over again. Jörn -- Linux [...] existed just for discussion between people who wanted to show off how geeky they were. -- Rob Enderle --
I cooked up that patch myself, here: http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=0661b1ac5d48eb47c8a5948c0554fea... Care to give it a quick spin? -- Jens Axboe --
I Jens, I'm Paolo Minazzi and I have a problem with logfs and 2.6.34rc5. Can I apply the patch to rc5 and make a test on my ARM board ? Paolo --
Yes please do, below are the collected fixes for mtd in this area. It should apply to recent -git just fine. diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile index 82d1e4d..4521b1e 100644 --- a/drivers/mtd/Makefile +++ b/drivers/mtd/Makefile @@ -4,7 +4,7 @@ # Core functionality. obj-$(CONFIG_MTD) += mtd.o -mtd-y := mtdcore.o mtdsuper.o mtdbdi.o +mtd-y := mtdcore.o mtdsuper.o mtd-$(CONFIG_MTD_PARTITIONS) += mtdpart.o obj-$(CONFIG_MTD_CONCAT) += mtdconcat.o diff --git a/drivers/mtd/internal.h b/drivers/mtd/internal.h index c658fe7..e69de29 100644 --- a/drivers/mtd/internal.h +++ b/drivers/mtd/internal.h @@ -1,17 +0,0 @@ -/* Internal MTD definitions - * - * Copyright © 2006 Red Hat, Inc. All Rights Reserved. - * Written by David Howells (dhowells@redhat.com) - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -/* - * mtdbdi.c - */ -extern struct backing_dev_info mtd_bdi_unmappable; -extern struct backing_dev_info mtd_bdi_ro_mappable; -extern struct backing_dev_info mtd_bdi_rw_mappable; diff --git a/drivers/mtd/mtdbdi.c b/drivers/mtd/mtdbdi.c index 5ca5aed..e69de29 100644 --- a/drivers/mtd/mtdbdi.c +++ b/drivers/mtd/mtdbdi.c @@ -1,43 +0,0 @@ -/* MTD backing device capabilities - * - * Copyright © 2006 Red Hat, Inc. All Rights Reserved. - * Written by David Howells (dhowells@redhat.com) - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include <linux/backing-dev.h> -#include <linux/mtd/mtd.h> -#include "internal.h" - -/* - * backing device capabilities for non-mappable devices (such as NAND flash) - * ...
Works for me. Acked-and-tested-by: Joern Engel <joern@logfs.org> Jörn -- If you're willing to restrict the flexibility of your approach, you can almost always do something better. -- John Carmack --
Goodness, thanks for testing. Now we are pretty close to have something we can send out. -- Jens Axboe --
I have tried this patch.
I have enabled LOGFS, but not mounted partition with it.
/dev/mtdblock1 is my romfs root partition and it is OK.
The problem is that init cannot mount my /dev/mtdblock1 romfs root.
This is the fault :
Platform: Cirrus Logic EDB9315A Board (ARM920T) Rev A
Copyright (C) 2000, 2001, 2002, Red Hat, Inc.
|----------------------------------------------
Raw file loaded 0x00080000-0x001dce6b, assumed entry at
0x00080000-----------------------------------------------------------
RedBoot> exec -s 0x00b00000 -r 0x00a00000 -c "root=/dev/mtdblock1
console=ttyAM console=tty1"--------------------------------
ENTRY=0xc0008000-------------------------------------------------------------------------------------------------------------
LENGTH=0x00300000------------------------------------------------------------------------------------------------------------
BASE_ADDR=0x00080000---------------------------------------------------------------------------------------------------------
Using base address 0x00080000 and length
0x00300000--------------------------------------------------------------------------
Uncompressing Linux... done, booting the kernel.
Linux version 2.6.34-rc5 (root@darkstar) (gcc version 3.4.3) #18 Tue
Apr 27 10:55:55 CEST 2010
CPU: ARM920T [41129200] revision 0 (ARMv4T), cr=c0007177
CPU: VIVT data cache, VIVT instruction cache
Machine: Cirrus Logic EDB9315A Evaluation Board
Memory policy: ECC disabled, Data cache writeback
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 32512
Kernel command line: root=/dev/mtdblock1 console=ttyAM console=tty1
PID hash table entries: 512 (order: -1, 2048 bytes)
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 128MB = 128MB total
Memory: 126928k/126928k available, 4144k reserved, 0K highmem
Virtual kernel memory layout:
vector : 0xffff0000 - 0xffff1000 ( 4 kB)
fixmap : 0xfff00000 - 0xfffe0000 ...So this issue looks unrelated, but a bug none the less (if just enabling logfs breaks the mount). -- Jens Axboe --
To be precise, I say : - downloaded 2.6.34rc5 - apply the jens patch - the logfs code is the code in the 2.6.34rc5 (I have no applied other jorn patch) Paolo --
(please stop top posting, I fixed this one up for you). Just to be on the safe side - without the patch, does it work or not? -- Jens Axboe --
Without the patch for me means the standard 2.6.34rc5. In this case : - enabling logfs I have an oop in fs/write-back.c:157 before kernel calls init. - disabling logfs I can do normal mount my root /dev/mtdblock1 This is my experience. Paolo --
If you add "rootfstype=romfs" to the command line, does the problem still exist? Jörn -- And spam is a useful source of entropy for /dev/random too! -- Jasmine Strong --
Jorn , you are right. It seems work.... please wait.... --
Ok, I'm pretty sure that logfs returns -EIO where it should return -EINVAL. If filesystems are tried in alphabetical order, logfs comes first and -EIO tells the kernel to stop trying and panic, essentially. Will cook up a patch... Jörn -- A defeated army first battles and then seeks victory. -- Sun Tzu --
Does the patch below solve the problem for you (without the explicit "rootfstype=romfs")? Jörn -- One of my most productive days was throwing away 1000 lines of code. -- Ken Thompson. Signed-off-by: Joern Engel <joern@logfs.org> --- fs/logfs/super.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/logfs/super.c b/fs/logfs/super.c index 5866ee6..f649038 100644 --- a/fs/logfs/super.c +++ b/fs/logfs/super.c @@ -413,7 +413,7 @@ static int __logfs_read_sb(struct super_block *sb) page = find_super_block(sb); if (!page) - return -EIO; + return -EINVAL; ds = page_address(page); super->s_size = be64_to_cpu(ds->ds_filesystem_size); -- 1.6.2.1 --
The above patch solve the problem for me. Now I can mount root without rootfstype=romfs Bye, Paolo --
Ok, now it is very better. My problem now is that logfs does not save files in the following case: 1) power-off, then power-on (without umount) 2) sync, then power-off, then power-on (without umount) Using umount sometimes seems works, sometimes I have the following oop : ~ # mount -t logfs /dev/mtdblock7 /mitrolbackup ~ # cd /mitrolbackup /mitrolbackup # echo ciao > ciao /mitrolbackup # echo pino > pino /mitrolbackup # ls ciao pino /mitrolbackup # cd .. ~ # umount /mitrolbackup/ kernel BUG at fs/logfs/readwrite.c:1976! Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = c7db0000 [00000000] *pgd=c7dac031, *pte=00000000, *ppte=00000000 Internal error: Oops: 817 [#1] last sysfs file: Modules linked in: CPU: 0 Not tainted (2.6.34-rc5 #19) PC is at __bug+0x20/0x2c LR is at release_console_sem+0x1b8/0x1ec pc : [<c003a1ac>] lr : [<c004760c>] psr: 60000013 sp : c7d5fe60 ip : c7d5fd98 fp : c7d5fe6c r10: 00000003 r9 : 00000000 r8 : c7d5feb4 r7 : 00000003 r6 : c7d5feb4 r5 : c7819878 r4 : c7819870 r3 : 00000000 r2 : 60000013 r1 : 000019c2 r0 : 0000002f Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: c000717f Table: c7db0000 DAC: 00000015 Process umount (pid: 307, stack limit = 0xc7d5e260) Stack: (0xc7d5fe60 to 0xc7d60000) fe60: c7d5fe7c c7d5fe70 c00fb2f0 c003a19c c7d5fe90 c7d5fe80 c00ad358 c00fb274 fe80: c7819870 c7d5feb0 c7d5fe94 c00ad3e4 c00ad2bc c7d46a64 c7d46a54 c7819880 fea0: c7d46a64 c7d5fee4 c7d5feb4 c00ad5a8 c00ad3d0 c7d5feb4 c7d5feb4 c7d46a00 fec0: c7d5e000 c022a324 00000000 c7d5ff54 c7d2b880 00000000 c7d5ff00 c7d5fee8 fee0: c009c8ec c00ad4c0 c7d46a00 c7c14000 c7d2b880 c7d5ff1c c7d5ff04 c00feb38 ff00: c009c84c c7d46a00 c02b7b84 c7d2b880 c7d5ff34 c7d5ff20 c009c740 c00feb00 ff20: c7d2b880 c7d46a00 c7d5ff4c c7d5ff38 c00b0bd0 c009c708 00000000 c7d2b898 ff40: c7d5ff94 c7d5ff50 c00b1e90 c00b0b7c 00000000 c7d5ff54 c7d5ff54 c7d5ff5c ff60: c7d5ff5c c7d2b880 c7817500 be8e3e0c 00000000 be8e5fb4 ...
Ok, I'll have a look. Can we move this and any other logfs problems to a seperate thread? I'd like to leave this one for the sync issues and not spam everyone on Cc: with unrelated bugs. :) Jörn -- Don't patch bad code, rewrite it. -- Kernigham and Pike, according to Rusty --
Yes. Can you just remove that hunk or should I resend? Jörn -- "Error protection by error detection and correction." -- from a university class --
I can kill that hunk, no problem. -- Jens Axboe --
It's funky, I was pretty sure there was/is code to set a default bdi for non-bdev file systems. It appears to be missing, that's not good. So options include: - Add the appropriate per-sb bdi for these file systems (right fix), or - Pre-fill default_backing_dev_info as a fallback ->s_bdi to at least ensure that data gets flushed (quick fix) I'll slap together a set of fixes for this. -- Jens Axboe --
Here's a series for fixing these. At this point they are totally untested except that I did compile them. Note that your analysis appeared correct for all cases but ocfs2, which does use get_sb_bdev() and hence gets ->s_bdi assigned. You can see them here, I'll post the series soon: http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus The first patch is a helper addition, the rest are per-fs fixups. -- Jens Axboe --
Do you want to include Jörn's addition of same to get_sb_mtd_set(), with my Acked-By: David Woodhouse <David.Woodhouse@intel.com> ? -- David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation --
Yeah will do. I also really wanted to provide a WARN and mount fail if we get it wrong, but I don't see an easy way to do that. Basically I'd want to check whether the storage backing is volatile or not. -- Jens Axboe --
Looks good at a cursory glance. What's still missing is some sort of assertion. You are a smart person and missed this problem, twice even. Even if you hadn't, a not so smart person can add a new filesystem and miss s_bdi, like I did. We want some automatism to catch this. Jörn -- "Security vulnerabilities are here to stay." -- Scott Culp, Manager of the Microsoft Security Response Center, 2001 --
I totally agree, we want some way to catch this problem in the future.
Really the check needs to be something ala:
if (!sb->s_bdi && mnt_has_storage_backing() && rw_mount)
yell_and_fail;
but I'm not sure how best to accomplish that. We can check for ->s_bdev
and mtd like you did, but that does not catch network file systems for
instance.
--
Jens Axboe
--
One way would be to either add a flag to all safe filesystems or create a noop_bdi for them. It adds a line of code and some bytes[*] per instance to most filesystems, but that's the only catchall I can think of. I guess if noone comes up with a better plan I'll look into that. [*] Maybe we can steal a bit somewhere to make it less expensive. Jörn -- You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is. -- Rob Pike --
