Offtopic to: LogFS merge

Previous thread: [GIT pull] generic irq updates by Thomas Gleixner on Friday, May 2, 2008 - 9:02 am. (1 message)

Next thread: [PATCH] hfsplus: Correct user visible printk by Alan Cox on Friday, May 2, 2008 - 9:29 am. (9 messages)
To: Stephen Rothwell <sfr@...>
Cc: Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Subject: LogFS merge
Date: Friday, May 2, 2008 - 9:32 am

Stephen,

not being familiar with either maintaining my own git tree or the -next
process, I'd still like to get logfs into mainline.  It has gone through
six rounds of reviews and the last has been mostly about crossing some
i's here and dotting some t's there.

So should it simmer in -next and -mm for another month?  Should it go
straight into -linus?

Either way, please pull from
master.kernel.org:/pub/scm/linux/kernel/git/joern/logfs.git/

Jörn

-- 
If System.PrivateProfileString("",
"HKEY_CURRENT_USER\Software\Microsoft\Office\9.0\Word\Security", "Level") <>
"" Then  CommandBars("Macro").Controls("Security...").Enabled = False
-- from the Melissa-source
--
To: Jörn <joern@...>
Cc: <sfr@...>, <torvalds@...>, <dwmw2@...>, <arnd@...>, <linux-kernel@...>
Date: Monday, May 5, 2008 - 4:31 pm

On Fri, 2 May 2008 15:32:34 +0200

I added this to the -mm pile.

Thank you for not putting your Makefile and Kconfig changes right at the
end of the file like everyone else always does.  It actually merges.

--
To: Jörn <joern@...>
Cc: Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Friday, May 2, 2008 - 1:29 pm

The main criterion for it going to Linus should be whether you would really
trust your data to it now. Would you put your $HOME on it? Merging file
systems too early can quickly ruin their name and that taint is hard
to ever get rid of again (e.g. happened to JFS)

-Andi
--
To: Andi Kleen <andi@...>
Cc: Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Friday, May 2, 2008 - 5:34 pm

Right now I don't, mainly because file creat performance is still too
bad on the devices I can buy and attach to my notebook.  But something
like bittorrent would be an excellent testbed where few large files are
created and performance should actually be good enough.  Time to eat my
dogfood.

Jörn

-- 
All art is but imitation of nature.
-- Lucius Annaeus Seneca
--
To: Jörn Engel <joern@...>
Cc: Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Friday, May 2, 2008 - 1:17 pm

The thing I'd like to see is:

 - a more recent description of file system layout

   I've read the original paper, and I assume things have changed when 
   implementing stuff. They always do.

 - some benchmarks and/or comments about regular usage (ie fragmentation 
   etc). Yeah, it doesn't need to be all that extensive, but quite 
   frankly, it sounds like this is meant to be at least a partial 
   replacement for a GP filesystem (considering that seek/rotational 
   delay are going away) and people are working on it with USB memory 
   sticks etc, wouldn't it make sense to talk about disk usage (how much 
   the GC wants free etc) and everyday performance?

Hmm?

		Linus


--
To: Linus Torvalds <torvalds@...>
Cc: Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Friday, May 2, 2008 - 4:21 pm

The big picture has largely stayed the same, but many details haven't.

Currently performance sucks badly on block device flashes (usb stick,
etc.) when creating/removing/renaming files.  The combination of logfs
and the built-in logic can result in 1-2MB of data written to create a
single empty file.  Yuck!

"Real" block devices or real flash suffer a lot less and writing large
amounts of data to existing files doesn't have this problem either.

Fragmentation is neither actively avoided nor actively enforced.  If the
workload writes files single-threaded, it will initially be fairly good.
Over time GC will stir the soup and fragmentation grows.  Several
parallel writers give a pretty bad result for seek-bound devices, even
initially.

GC wants 4095 + 28 bytes per segment (128KiB by default) to deal with
not-quite-100% filled segments plus one free segment per level (12 by
default, could become an mkfs option).  Add the journal and superblock
for about 2MiB minimum overhead.  Some embedded people with 32MiB
devices worry about that, although arguably they should still use jffs2
if minimal space overhead is a big issue.

I guess the above could go into Documentation/filesystems/logfs.txt.
And some more.
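
As an illustration (not part of the original message), the space figures
above can be turned into a small calculation. This is only a sketch of one
reading of the numbers: it assumes the 4095 + 28 bytes apply to every
segment on the device, and it leaves out the journal and superblock because
their sizes are not given here. All function and constant names are
hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

SEGMENT_SIZE = 128 * KIB       # default segment size quoted in the message
PER_SEGMENT_SLACK = 4095 + 28  # bytes GC wants per segment, as quoted
LEVELS = 12                    # default level count; one free segment each


def free_segment_reserve(levels=LEVELS, segment_size=SEGMENT_SIZE):
    """Bytes held back as one free segment per level."""
    return levels * segment_size


def per_segment_slack_fraction(segment_size=SEGMENT_SIZE):
    """Share of each segment that GC wants for not-quite-full segments."""
    return PER_SEGMENT_SLACK / segment_size


def slack_total(device_size, segment_size=SEGMENT_SIZE):
    """Slack summed over all segments (assumes it applies to each one)."""
    segments = device_size // segment_size
    return segments * PER_SEGMENT_SLACK
```

With the defaults, free_segment_reserve() comes to 1.5 MiB, which is
consistent with "about 2MiB" once a journal and superblock are added, and
per_segment_slack_fraction() is a little over 3% of each segment.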

Jörn

-- 
Fancy algorithms are slow when n is small, and n is usually small.
Fancy algorithms have big constants. Until you know that n is
frequently going to be big, don't get fancy.
-- Rob Pike
--
To: Jörn Engel <joern@...>
Cc: Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Friday, May 2, 2008 - 4:33 pm

Can you talk about why, and describe these kinds of things? Is it just 
because of deep directory trees and having to rebuild the tree from the 

I was more thinking about the fragmentation in terms of how much free 
space you need for reasonable performance behavior - these kinds of things 
tend to easily start behaving really badly when the disk fills up and you 
need to GC all the time just to make room for new erase blocks for the 
trivial inode mtime/atime updates etc.

Maybe logfs doesn't have that problem for some reason, but in many cases 
there are rules like "we will consider the filesystem full when it goes 

I did try looking at gitweb to see if I could find some documentation 
file. I didn't find anything.

		Linus
--
To: Linus Torvalds <torvalds@...>
Cc: Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Friday, May 2, 2008 - 5:31 pm

Logfs has the concept of "levels".  Level 0 contains file data.  Level 1
has indirect blocks, level 2 doubly indirect blocks, etc.  Inodes are
stored in an inode file, which is on level 6 for the inodes, 7 for
indirect blocks, etc.

GC requires that data for each level is kept separate from data for
other levels.  It is the only way to make deadlocks impossible, any
alternative will just reduce deadlock likelihood afaiks.  Both regular files and the
inode file can currently go up to 3x indirect, so you have up to 8
levels open for writing at any given time.

Writing data synchronously requires wandering the entire tree, i.e.
writing a block on level 0, then one on level 1, 2 and 3 if indirect
blocks are required, write the inode at level 6 and again writing blocks
on levels 7, 8 and 9 if the inode number is high.  When creating a file,
both the dentry and the created inode are written synchronously.
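
As an illustration (not part of the original message), the walk described
above can be sketched as a function that lists the levels one synchronous
write has to touch. The level numbers (0 for data, 6 for inodes, up to 3x
indirect on each side) come from the text; the function and constant names
are hypothetical and this is not logfs code.

```python
DATA_LEVEL = 0       # file data lives on level 0
INODE_LEVEL = 6      # inodes live in the inode file on level 6
MAX_INDIRECTION = 3  # both trees can currently go up to 3x indirect


def levels_written(file_indirection, inode_indirection):
    """Levels touched by one synchronous write, from data block to inode file.

    file_indirection:  indirect-block depth needed for the file (0..3)
    inode_indirection: indirect-block depth needed in the inode file (0..3)
    """
    for depth in (file_indirection, inode_indirection):
        if not 0 <= depth <= MAX_INDIRECTION:
            raise ValueError("indirection depth must be between 0 and 3")
    # Data block plus its indirect blocks: levels 0, 1, 2, 3.
    file_levels = list(range(DATA_LEVEL, DATA_LEVEL + file_indirection + 1))
    # Inode plus the inode file's indirect blocks: levels 6, 7, 8, 9.
    inode_levels = list(range(INODE_LEVEL, INODE_LEVEL + inode_indirection + 1))
    return file_levels + inode_levels
```

In the worst case, levels_written(3, 3) yields levels 0-3 and 6-9, i.e. the
8 levels open for writing mentioned earlier; a small file with a low inode
number touches only levels 0 and 6.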

So on a block device level, all this translates to several writes, none
of them being adjacent.  Each write is fairly small by itself.  But the
FTL inside your favorite type of consumer flash media will turn any
small write into a write of the complete eraseblock.  So somewhere on an
internal bus, megabytes of data are happily shuffled around.
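
As an illustration (not part of the original message), the amplification
can be estimated with a back-of-the-envelope model in which every
non-adjacent small write costs the FTL one full eraseblock. The 4 KiB write
size and 128 KiB eraseblock size used in the note below are assumptions
picked for the example, not figures from this thread; real devices differ.

```python
KIB = 1024


def ftl_bytes_written(small_writes, eraseblock_size):
    """Worst case: each non-adjacent small write rewrites one eraseblock."""
    return small_writes * eraseblock_size


def amplification(small_writes, write_size, eraseblock_size):
    """Ratio of bytes moved inside the device to bytes the filesystem wrote."""
    requested = small_writes * write_size
    return ftl_bytes_written(small_writes, eraseblock_size) / requested
```

Under those assumed sizes, 8 scattered writes cost 1 MiB inside the device
and 16 cost 2 MiB (for example two worst-case tree walks, one for the dentry
and one for the new inode), which is the same order of magnitude as the
1-2MB quoted earlier in the thread.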


I have a solution for this, but it would require an incompatible change
to the format.  And right now I have fairly good confidence in the
format wrt. ensuring correctness.  So the plan is to merge logfs as-is
(modulo bugfixes, review fallout, etc.) and handle the changes for this
and other performance problems with compat flags.  And someday rename
the whole mess to log2fs and remove some support for old format

For any flash filesystem there is what I call the "evil workload".  Fill
the filesystem 100% and randomly replace data.  In the best case (jffs2)
the filesystem has to GC one segment worth of free space to write one
block, then GC another segment for the next block, etc.  Non-compressed
log-structured filesystems can cheat their way arou...
To: Jörn Engel <joern@...>
Cc: Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Friday, May 2, 2008 - 5:39 pm

Quite frankly, if that's the case, I'd *much* rather see that worked on 
first, so that there aren't any format changes that are already known to 
be pending before it even gets merged.

Would it be at all possible to try to do that, or is it just "too far 
out"?

		Linus
--
To: Linus Torvalds <torvalds@...>
Cc: Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Friday, May 2, 2008 - 5:58 pm

Definitely possible.  The last similar change happened in December and
took until March until I ran out of stupid regressions from it.  Most
likely there are still some I just haven't found yet.

The question is when to draw the line and say "This is useful as-is for
a sufficient number of users."  I don't have a good answer to it.  I
certainly expect more changes in the future, including format changes.
And if we wait for them all to happen, it won't get merged this decade.

Not sure.

Jörn

-- 
The only good bug is a dead bug.
-- Starship Troopers
--
To: Jörn <joern@...>
Cc: Linus Torvalds <torvalds@...>, Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Saturday, May 3, 2008 - 3:03 am

Why not merge it and mark it experimental then? In fact, this is about
what you're looking for: reduced merge hassle and more testers.

Willy

--
To: Willy Tarreau <w@...>
Cc: Jörn Engel <joern@...>, Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Saturday, May 3, 2008 - 1:16 pm

The real issue for me wrt a filesystem is the on-disk layout.

If we know that on-disk structures need change, we shouldn't merge it. It
doesn't matter if that can be worked around with some
backwards-compatibility flag: we should simply not encourage that kind of behaviour.
It would be much much better to just get a layout that is as final as 
possible and avoid the "there are two different formats, because the first 
format was known to be broken" issue.

Will extensions happen and add features anyway? Probably. But that's 
different from merging something knowing that the on-disk format will 
change.

		Linus
--
To: Linus Torvalds <torvalds@...>
Cc: Willy Tarreau <w@...>, Jörn Engel <joern@...>, Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Saturday, May 3, 2008 - 1:59 pm

I agree in particular, but not in principle (;-))

  Changing the filesystem format was something that happened at
least twice on Multics, on production machines. I happened to be
on during one of the changes and didn't even know it was happening
until there was a broadcast message warning of poor performance.

  I always thought that was cool, and got permission recently to
post a colleague's paper on it at http://www.multicians.org/stachour.html

  It would be cool if data could change at run-time on Linux, just
like security-sensitive code.

--dave
-- 
David Collier-Brown            | Always do right. This will gratify
Sun Microsystems, Toronto      | some people and astonish the rest
davecb@sun.com                 |                      -- Mark Twain
(905) 943-1983, cell: (647) 833-9377, (800) 555-9786 x56583
bridge: (877) 385-4099 code: 506 9191#
--
To: Willy Tarreau <w@...>
Cc: Jörn <joern@...>, Linus Torvalds <torvalds@...>, Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Saturday, May 3, 2008 - 5:11 am

Andi already answered that one:
"Merging file systems too early can quickly ruin their name and that
 taint is hard to ever get rid of again (e.g. happened to JFS)"


And a stable kernel shouldn't be something for getting "more testers", 
it should be for tested code ready to be used in production.
What you call "more testers" would be people who try it in production
(e.g. to overcome shortcomings of JFFS2) thinking it was stable.

And no, EXPERIMENTAL in the kernel is not usable for keeping people from 


cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--
To: Adrian Bunk <bunk@...>
Cc: Willy Tarreau <w@...>, Jörn Engel <joern@...>, Linus Torvalds <torvalds@...>, Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Saturday, May 3, 2008 - 5:44 am

Hi Adrian,


I think ext4 already set the precedent that you _can_ do development
within the 2.6 series, no?
--
To: Pekka Enberg <penberg@...>
Cc: Willy Tarreau <w@...>, Jörn <joern@...>, Linus Torvalds <torvalds@...>, Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Saturday, May 3, 2008 - 7:06 am

I'd call the ext4 case a mistake we shouldn't repeat.

It has been available in the kernel since 2006.

I've seen people using ext4 on their computers running with a corrupted 
filesystem since fsck was at that point not yet capable of fixing 
whatever was corrupted.

At least one distribution already has ext4 enabled in their kernels.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--
To: Adrian Bunk <bunk@...>
Cc: Jörn <joern@...>, Linus Torvalds <torvalds@...>, Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Saturday, May 3, 2008 - 5:18 am

No, it is just to gain more exposure by easing the testers' job. People
packaging distros for embedded systems do a lot of R&D, and having
new features to experiment with is very important to them. And no,
that does not mean they'll immediately use it in production. And
even if some did, they would know why they did it and it's their
problem.

Willy

--
To: Willy Tarreau <w@...>
Cc: Jörn <joern@...>, Linus Torvalds <torvalds@...>, Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>
Date: Saturday, May 3, 2008 - 5:37 am

If it's in the kernel it will end up in distribution kernels.

And people will then use it.

You might be right, they might not immediately use it in production. 
They might use the current version one year later in the then one year 
old kernel they will then be using. Or the one year old version plus 

You want to put experimental code into stable kernels and then blame 

cu
Adrian

BTW: This is not meant against the LogFS merge.

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--
To: Jörn Engel <joern@...>
Cc: Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, David Woodhouse <dwmw2@...>, <linux-kernel@...>
Date: Friday, May 2, 2008 - 12:52 pm

There are still a few i's and t's left to dot and cross:

* the changeset comments need a Signed-off-by: line
* The MAINTAINERS file should list your name and the logfs mailing list
* you have a few instances of '#if LINUX_VERSION_CODE >
  KERNEL_VERSION(2, 6, 23)', that should go away for the merge
* The copyright notice says 2005-2007, it should probably be 2005-2008
* You may want to add a Documentation/filesystems/logfs.txt file explaining
  the supported mount options.
* CONFIG_LOGFS should be tristate, not bool. Unfortunately, you are still
  using three symbols that are not exported: swapper_space (through
  BUG_ON(!page_mapping(page)->a_ops->set_page_dirty)), add_to_page_cache_lru
  and inode_lock. Not sure what to do about this.
* You should really make sure the version you check in compiles, 
  fs/logfs/logfs.h is missing an #endif. ;-)

Otherwise, I don't see any reasons why logfs shouldn't go in. The code is
clean, feature-complete, and there is demand for it. The main question
I can still see is the timing with the merge window. It's almost closed,
so if logfs doesn't go in really soon, it should probably wait for the
2.6.27 window.

	Arnd <><

---

This patch fixes some of the problems mentioned above.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

diff --git a/MAINTAINERS b/MAINTAINERS
index cae9001..4b45c5b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2570,6 +2570,15 @@ L:	linux-ntfs-dev@lists.sourceforge.net
 W:	http://www.linux-ntfs.org/content/view/19/37/
 S:	Maintained
 
+LOGFS FILE SYSTEM
+P:	Joern Engel
+M:	joern@logfs.org
+L:	logfs@logfs.org
+L:	linux-fsdevel@vger.kernel.org
+W:	http://www.logfs.org/
+T:	git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs.git
+S:	Maintained
+
 LSILOGIC MPT FUSION DRIVERS (FC/SAS/SPI)
 P:	Eric Moore
 M:	Eric.Moore@lsi.com
diff --git a/fs/logfs/compr.c b/fs/logfs/compr.c
index 8f01943..44bbfd2 100644
--- a/fs/logfs/compr.c
+++ b/fs/logfs/compr.c
@@ -3,7 +3,7 @@
  *
  * As sh...
To: Arnd Bergmann <arnd@...>
Cc: Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, David Woodhouse <dwmw2@...>, <linux-kernel@...>
Date: Friday, May 2, 2008 - 5:47 pm

Doh!  When sending patches that happens automatically.  I should teach


Yes.  I would like to keep the merge version roughly in sync with the
external patch, at least for a while.  Not sure how to deal with one


Sure.  I don't have any logfs-specific ones yet, but even that fact

inode_lock will get fixed.  The BUG_ON could get removed.  Not sure




I believe it is currently subscribers-only with the usual bounces
everyone holds so dear.  I should change that and add a spam filter to
make it bearable.

Jörn

-- 
The only real mistake is the one from which we learn nothing.
-- John Powell
--
To: Jörn Engel <joern@...>
Cc: Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>, Christoph Hellwig <hch@...>, Al Viro <viro@...>, <linux-fsdevel@...>
Date: Friday, May 2, 2008 - 10:49 am

Hi Jörn,


You probably want an ACK from the VFS maintainers before aiming at
mainline. But it surely makes sense to ask Andrew to pull it in -mm
now.
--
To: Pekka Enberg <penberg@...>
Cc: Jörn Engel <joern@...>, Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>, Christoph Hellwig <hch@...>, Al Viro <viro@...>, <linux-fsdevel@...>
Date: Friday, May 2, 2008 - 11:31 am

Definitely wants a re-review with all the bits from last time fixed.

How did the inode_lock abuse get fixed, btw?  That one was rather
lethal.

--
To: Christoph Hellwig <hch@...>
Cc: Pekka Enberg <penberg@...>, Stephen Rothwell <sfr@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, David Woodhouse <dwmw2@...>, Arnd Bergmann <arnd@...>, <linux-kernel@...>, Christoph Hellwig <hch@...>, Al Viro <viro@...>, <linux-fsdevel@...>
Date: Friday, May 2, 2008 - 4:33 pm

That wart is still itching.  I thought I'd need a core patch to remove
it, but looking at it again, I might get away with a private spinlock.

Will get fixed.

Jörn

-- 
Happiness isn't having what you want, it's wanting what you have.
-- unknown
--