[RFC PATCH 16/26] UBIFS: add LEB properties tree

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

Dear community,

here is a new flash file system developed by Nokia engineers with
the help of the University of Szeged. The new file-system is called
UBIFS, which stands for UBI file system. UBI is the wear-leveling/
bad-block handling/volume management layer which is already in
mainline (see drivers/mtd/ubi).

The main objective of UBIFS is better performance and scalability
compared to JFFS2, which is achieved by
a) implementing write-back (JFFS2 is write-through)
b) storing and maintaining the indexing file-system information
on the media (JFFS2 maintains it in RAM and builds it on each
mount, which requires full media scanning).

At the same time, UBIFS implements the nice features JFFS2 has -
compression and tolerance to unclean re-boots. Although UBIFS
borrowed basic ideas from JFFS2, it is a completely different
file-system.

UBIFS is stable and very close to being production ready. It was
tested on OLPC and N810. The development was done on a flash simulator
on a 2-way x86 machine. However, UBIFS needs a good review.

Note, UBIFS works on top of UBI, not on top of bare flash devices.
It delegates crucial things like garbage-collection and bad
eraseblock handling to UBI. One important thing to note is that MLC
NAND flashes tend to have a very short eraseblock lifetime -
just a few thousand erase-cycles (some have about 3000 or even less).
This makes the random wear-leveling algorithm of JFFS2 not good
enough. In contrast, UBI provides good wear-leveling based on
saved erase-counters.

There is also a mkfs.ubifs user-space utility, so it is possible to
prepare UBIFS images. Please see the URLs given at the end of this
letter.

UBIFS performs quite well - it gives very good write performance
because of write-back (write tests gave us ~100 times faster
performance which is clearly because of the caching) while giving
about the same performance as JFFS2 gives on synchronous operations.
Obviously, it is extremely difficult to compete with JFFS2 on
synchronous operations beca...

To: Artem Bityutskiy <Artem.Bityutskiy@...>
Cc: LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>, David Woodhouse <dwmw2@...>, Andrew Morton <akpm@...>
Date: Friday, April 18, 2008 - 5:05 am

The mount time compared to JFFS2 on a 700MiB partition went down from

We did some initial stress tests and so far it looks pretty robust.

Great job !

Thanks,
tglx
--

To: Artem Bityutskiy <Artem.Bityutskiy@...>
Cc: LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>, <joern@...>
Date: Monday, March 31, 2008 - 8:29 am

And how does it compare to logfs?
--

To: Jan Engelhardt <jengelh@...>
Cc: Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>, <joern@...>
Date: Monday, March 31, 2008 - 9:40 am

Both share similar design goals. Biggest difference is that ubifs works
on top of ubi and depends on ubi support, while logfs works on plain mtd
(or block devices) and does everything itself.

Code size difference is huge. Ubi weighs some 11kloc, ubifs some 30,
logfs some 8.

Ubi scales linearly, as it does a large scan at init time. It is still
reasonably fast, as it reads just a few bytes worth of header per block.
Logfs mounts in O(1) but will currently become mindbogglingly slow when
the filesystem nears 100% full and writes are purely random. Not that
any other flash filesystem would perform well under these conditions -
it is the known worst case scenario.

Jörn

--
Victory in war is not repetitious.
-- Sun Tzu
--

To: ext Jan Engelhardt <jengelh@...>
Cc: Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, <joern@...>
Date: Monday, March 31, 2008 - 8:47 am

We don't know a lot about logfs, so you will really have to make
your own comparison. However our general impressions are as follows:

1. In our testing logfs file operations seem to be much slower,
see http://osl.sed.hu/wiki/ubifs/index.php/IOzone

2. logfs code base is much smaller i.e. UBIFS has 3-4 times as many
lines of code.

3. logfs does not seem to have bad-block handling.

4. logfs does not seem to have wear-leveling.

5. We are not certain how scalable logfs is.

We could be wrong about those things - don't flame us if we are.
Ask us about UBIFS, not logfs.
--

To: Adrian Hunter <ext-adrian.hunter@...>
Cc: ext Jan Engelhardt <jengelh@...>, Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, <joern@...>
Date: Monday, March 31, 2008 - 9:20 am

Shiny numbers! Performance has improved significantly in the last six

Bad blocks at mkfs time are handled, blocks turning bad later on aren't

It does.

Jörn

--
Fools ignore complexity. Pragmatists suffer it.
Some can avoid it. Geniuses remove it.
-- Perlis's Programming Proverb #58, SIGPLAN Notices, Sept. 1982
--

To: Jörn Engel <joern@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, <joern@...>
Date: Tuesday, April 1, 2008 - 1:26 am

I've renamed the thread because I do not like this flamish discussion
to be mixed with the technical one.

We'll re-run them. Does logfs support write-back? Does it support compression?

This basically means it is unfinished. Handling dynamic bad blocks is a *must*
if you are going to work on NAND, especially on MLC NAND which are not as
reliable as SLC.

I think you should state this bluntly when you submit patches, to prevent
people from starting to use it in production.

UBI handles I/O errors and gracefully recovers the data. And it will not be
easy to do this in LogFS at all. If you have a write error in the middle of an
eraseblock, you have to recover data which means correcting all the indexing
data structures which refer to the data you recover. Correcting them may require
garbage collection, because you have to update them out of place. And what
do you do if you are already in the middle of doing garbage-collection?

In other words, I believe it will take a lot of time and effort to implement
this. And any speculation about the number of lines of code makes no sense

Could you please point to the core functions which implement this and briefly
describe the algorithm?

I grep'ed for "wear" and "leveling" and found only one match. Where should
I look?

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: Artem Bityutskiy <dedekind@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, <joern@...>
Date: Tuesday, April 1, 2008 - 5:19 am

Write-back of metadata: yes. Write-back of data: currently not.

You are right, bad-block handling is easier to implement in ubi than in
a filesystem. That is one of the real benefits from using ubi. And if
my other arguments about ubi would have been handled instead of answered

fs/logfs/gc.c
Grep for s_ec_list. It is currently quite aggressive and keeps the
spread between maximum erasecount and minimum erasecount at about 40.
Should become an mkfs option one of these days.

There is no dedicated subsystem for wear leveling. There simply isn't
that much to do.
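
The erase-count spread rule described above can be illustrated with a small sketch. This is not the logfs code: the function name, the array representation and the constant are assumptions made for illustration, and only the "spread of about 40" figure comes from the message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit for this sketch; the message only says "about 40". */
#define SKETCH_MAX_SPREAD 40u

/*
 * Return 1 if the gap between the most-erased and least-erased block
 * exceeds the limit. On return, *coldest (if non-NULL) holds the index
 * of the least-erased block, a natural candidate to receive new writes.
 */
static inline int sketch_needs_leveling(const uint32_t *ec, size_t n,
                                        size_t *coldest)
{
    size_t min_i = 0;
    uint32_t max;

    if (n == 0)
        return 0;

    max = ec[0];
    for (size_t i = 1; i < n; i++) {
        if (ec[i] > max)
            max = ec[i];
        if (ec[i] < ec[min_i])
            min_i = i;
    }

    if (coldest)
        *coldest = min_i;
    return (max - ec[min_i]) > SKETCH_MAX_SPREAD;
}
```

With erase counts {100, 95, 141, 120} the spread is 46, so the check fires and block 1 is reported as the coldest.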

Jörn

--
Premature optimization is the root of all evil.
-- Donald Knuth
--

To: Jörn Engel <joern@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, <joern@...>
Date: Tuesday, April 1, 2008 - 5:46 am

Taking this into account, along with bad block handling and other stuff, I kindly
ask you not to do these "tricky" things like comparing the number of code lines.
I believe if you implement all the things, you'll have _way_ more code. I
believe you'll even have to re-design many things because write-back really
affects the design a lot. Let's be fair.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: Artem Bityutskiy <dedekind@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, <joern@...>
Date: Tuesday, April 1, 2008 - 7:16 am

Take those numbers with as much salt as you like. Implementing
write-back caching for metadata actually reduced the line count by 100
or so. And it did affect the design a lot, I agree.

Jörn

--
The wise man seeks everything in himself; the ignorant man tries to get
everything from somebody else.
-- unknown
--

To: Jörn Engel <joern@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, LKML <linux-kernel@...>, <joern@...>
Date: Tuesday, April 1, 2008 - 1:56 am

Please, let's refrain from unfair comparisons like this before logfs is
finished. Also, when you compare, please, take into account that UBI/UBIFS

I asked you some time ago to describe how you maintain per-eraseblock
space accounting [1]. E.g., how you select an eraseblock for garbage
collection, how you store the accounting information.

You said you find eraseblocks by scanning. This means logfs is not
really scalable because you may spend ages before you find anything
appropriate. When the FS is almost full, you need to scan nearly the
whole flash to find an eraseblock? So if I mount a nearly full
FS and start writing, my request is only handled once nearly the whole
media has been scanned?

UBIFS stores per-eraseblock information on the media in a B-tree, and
it also has lists of empty/dirty eraseblocks, which make it possible to very
quickly find the best eraseblock to garbage-collect or to write to.
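
As a rough illustration of why keeping the accounting indexed avoids a scan, here is a minimal sketch of an in-memory max-heap ordered by dirty space. The heap is a choice made for this example (the message only mentions a B-tree and lists), and all names are hypothetical rather than taken from UBIFS.

```c
#include <stddef.h>

#define SKETCH_HEAP_CAP 64

/* Per-eraseblock accounting entry (hypothetical layout). */
struct sketch_lprops {
    int lnum;   /* logical eraseblock number */
    int dirty;  /* bytes of obsolete data in the eraseblock */
};

struct sketch_heap {
    struct sketch_lprops e[SKETCH_HEAP_CAP];
    size_t n;
};

/* Insert an entry, sifting it up so the dirtiest block stays at index 0. */
static inline int sketch_heap_push(struct sketch_heap *h, int lnum, int dirty)
{
    size_t i;

    if (h->n == SKETCH_HEAP_CAP)
        return -1;

    i = h->n++;
    while (i > 0) {
        size_t parent = (i - 1) / 2;

        if (h->e[parent].dirty >= dirty)
            break;
        h->e[i] = h->e[parent];
        i = parent;
    }
    h->e[i].lnum = lnum;
    h->e[i].dirty = dirty;
    return 0;
}

/* O(1) lookup of the best garbage-collection candidate, or -1 if empty. */
static inline int sketch_heap_best(const struct sketch_heap *h)
{
    return h->n ? h->e[0].lnum : -1;
}
```

Finding a candidate costs O(1) and each accounting update O(log n), which is the property being argued for here, in contrast to scanning the media.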

[1] http://lkml.org/lkml/2007/8/8/333

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: Artem Bityutskiy <Artem.Bityutskiy@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, LKML <linux-kernel@...>, <joern@...>
Date: Tuesday, April 1, 2008 - 5:25 am

Correct. I have a patch that caches this information in RAM, so the
worst-case scenario is to do the full scan once - if the filesystem is
near 100% full and performance is down the drain anyway. The goal is
obviously to store that information on the medium.

And you get the obvious catch-22. When writing out how much free space
exists in which blocks, you occupy some free space in one of those
blocks and obsolete some in another. So the act of writing changes the
data you have just written, which should then be written again.

So how do you solve the catch-22?

Jörn

--
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it.
-- Brian W. Kernighan
--

To: Jörn Engel <joern@...>
Cc: Artem Bityutskiy <Artem.Bityutskiy@...>, Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, LKML <linux-kernel@...>, <joern@...>
Date: Tuesday, April 1, 2008 - 5:39 am

Anyway, if you do not store this information on the flash, you are going
to scan to acquire it. You may have caches, but they are not guaranteed to
hold the information (they may be empty), and then you scan. This will lead to scalability

I'm not sure what you mean. In UBIFS we have the lprops area, where we store
per-LEB accounting. UBI makes it possible to keep it at a fixed position.
The accounting is a separate B-tree. The lprops area has its own independent
garbage collector. You should probably refer to the white paper for more
information.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: Artem Bityutskiy <dedekind@...>
Cc: Artem Bityutskiy <Artem.Bityutskiy@...>, Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, LKML <linux-kernel@...>, <joern@...>
Date: Tuesday, April 1, 2008 - 6:51 am

Fair enough.

The obvious downside of all this is depending on UBI, which has a linear
scan. My goal was to avoid the linear scan completely. It is a harder
goal and I haven't reached it yet. Imo it is reachable and I will
continue going in that direction.

You picked the route of using UBI, which makes a lot of stuff easier.
It is a fair approach and I don't mind you taking it. It has drawbacks,
but so has everything else.

Jörn

--
Anything that can go wrong, will.
-- Finagle's Law
--

To: Jörn Engel <joern@...>
Cc: Artem Bityutskiy <dedekind@...>, Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, LKML <linux-kernel@...>, <joern@...>
Date: Tuesday, April 1, 2008 - 7:17 am

Yes, it was our core design decision. One of the reasons is that we were not
sure this is technically possible to do on bare flashes. I mean, it just looked
so complex to have it all in one, so we figured that was a good split, where
you can cut one big piece of work into two smaller separate ones. The benefit
of this is obvious - we have created a complete system, although it is not
perfect and has scalability issues.

Our point is that UBI is scalable enough for the time being.

I wrote some documentation about this in UBI FAQ and UBIFS FAQ:
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_scalability

We can now improve scalability of UBI without affecting UBIFS - it has some
potential. And we may develop UBI2 which would be much more scalable,
but this is a big project and we are not planning to do this so far. Others
could.

So in other words, using UBI allowed us to get a finished system faster. It
meets our and many other people's requirements, although it has issues if
you try to use it on really huge flashes, like 64GiB. That's a drawback.
But the good thing is that this would require re-working the UBI layer, without

Agree.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: Jörn Engel <joern@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, LKML <linux-kernel@...>, <joern@...>
Date: Tuesday, April 1, 2008 - 1:28 am

Sorry, forgot to paste the link :-)

http://www.lazybastard.org/logfs/patches

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: Jörn Engel <joern@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, LKML <linux-kernel@...>, <joern@...>
Date: Monday, March 31, 2008 - 10:00 am

I personally refuse to compare a finished FS which handles all the
crucial flash features to a non-finished FS. It just makes no sense.

LogFS was talked about back in 2005 at Linux Kongress [1], but is not
finished yet. Let's talk about it when it is production ready.

[1]. http://www.linux-kongress.org/2005/abstracts.html#4_4_2

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: Artem Bityutskiy <Artem.Bityutskiy@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, LKML <linux-kernel@...>, <joern@...>
Date: Monday, March 31, 2008 - 1:17 pm

No one is forcing you.

Jörn

--
All models are wrong. Some models are useful.
-- George Box
--

To: Jörn Engel <joern@...>
Cc: Artem Bityutskiy <Artem.Bityutskiy@...>, Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, LKML <linux-kernel@...>, <joern@...>
Date: Monday, March 31, 2008 - 4:49 pm

Hi,

There are some of us that are interested to know why we want UBIFS in
the mainline rather than wait for LogFS or some other variant to
appear though.

Pekka
--

To: Pekka Enberg <penberg@...>
Cc: Artem Bityutskiy <Artem.Bityutskiy@...>, Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, LKML <linux-kernel@...>, <joern@...>
Date: Monday, March 31, 2008 - 5:21 pm

You don't have to wait long. I was thinking about sending a patch out
tomorrow.

And I don't believe it has to be a choice. There is little reason
against merging both - apart from any problems found in the review.

Also, competition is a good thing. There's nothing like a flurry of
patches following an unfavorable benchmark for one side or the other. ;)

Jörn

--
Fancy algorithms are slow when n is small, and n is usually small.
Fancy algorithms have big constants. Until you know that n is
frequently going to be big, don't get fancy.
-- Rob Pike
--

To: Jörn Engel <joern@...>
Cc: Pekka Enberg <penberg@...>, Artem Bityutskiy <Artem.Bityutskiy@...>, Adrian Hunter <ext-adrian.hunter@...>, Jan Engelhardt <jengelh@...>, LKML <linux-kernel@...>, <joern@...>
Date: Tuesday, April 1, 2008 - 2:00 am

If you do not mind, could you please list all the crucial features which
are not implemented yet (but have to be), like bad eraseblock handling, when
you send the patch-set?

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: Artem Bityutskiy <Artem.Bityutskiy@...>
Cc: LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Friday, March 28, 2008 - 2:45 am

There was a typo, let me fix it.

s/garbage-collection/wear-leveling/. Of course GC is done on the FS level :-)

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: Artem Bityutskiy <Artem.Bityutskiy@...>
Cc: LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Thursday, March 27, 2008 - 12:20 pm

As a suggestion, take everything below this paragraph and above the
diffstat in your original email and throw it in
Documentation/filesystems/ubifs.txt

josh

--

To: Josh Boyer <jwboyer@...>
Cc: Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Friday, March 28, 2008 - 2:17 am

Sure, thanks.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

This file implements various helper functions to work with UBIFS keys.
The keys are part of the UBIFS index, which is a B-tree. For example,
a directory entry key consists of the parent inode number and the directory
entry hash.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/key.h | 507 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 507 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/key.h b/fs/ubifs/key.h
new file mode 100644
index 0000000..679cb80
--- /dev/null
+++ b/fs/ubifs/key.h
@@ -0,0 +1,507 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This header contains various key-related definitions and helper function.
+ * UBIFS allows several key schemes, so we access key fields only via these
+ * helpers. At the moment only one key scheme is supported.
+ *
+ * Simple key scheme
+ * ~~~~~~~~~~~~~~~~~
+ *
+ * Keys are 64-bits long. First 32-bits are inode number (parent inode number
+ * in case of direntry key). Next 3 bits are node type. The last 29 bits are
+ * 4KiB offset in case of inode node, and direntry hash in case of a direntry
+ * node....
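
The "simple key scheme" in the comment above (32 bits of inode number, 3 bits of node type, 29 bits of offset or hash) can be sketched as plain bit packing. This is only an illustration: the helper names are hypothetical and not the ones defined in key.h, and placing the "first" field in the most significant bits of a single 64-bit integer is an assumption, since the excerpt is cut off before the actual helpers.

```c
#include <stdint.h>

/* Hypothetical constants for this sketch; not the names used in key.h. */
#define SKETCH_LOW_BITS 29
#define SKETCH_LOW_MASK ((1u << SKETCH_LOW_BITS) - 1)
#define SKETCH_TYPE_MASK 0x7u

/* Pack inode number, node type and the 29-bit offset/hash into one key. */
static inline uint64_t sketch_key_pack(uint32_t inum, unsigned int type,
                                       uint32_t low)
{
    uint32_t lower = ((uint32_t)(type & SKETCH_TYPE_MASK) << SKETCH_LOW_BITS) |
                     (low & SKETCH_LOW_MASK);

    return ((uint64_t)inum << 32) | lower;
}

/* Field accessors, so callers never touch the raw layout directly. */
static inline uint32_t sketch_key_inum(uint64_t key)
{
    return (uint32_t)(key >> 32);
}

static inline unsigned int sketch_key_type(uint64_t key)
{
    return (unsigned int)((key >> SKETCH_LOW_BITS) & SKETCH_TYPE_MASK);
}

static inline uint32_t sketch_key_low(uint64_t key)
{
    return (uint32_t)(key & SKETCH_LOW_MASK);
}
```

Going through accessors instead of open-coded shifts mirrors the stated intent of the header: if another key scheme is added later, only the helpers need to change.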

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

Add UBIFS to Makefile and Kbuild.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/Kconfig | 3 +
fs/Makefile | 1 +
fs/ubifs/Kconfig | 47 ++++++++++++++
fs/ubifs/Kconfig.debug | 159 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/ubifs/Makefile | 9 +++
5 files changed, 219 insertions(+), 0 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index d731282..70edf5c 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1347,6 +1347,9 @@ config JFFS2_CMODE_FAVOURLZO

endchoice

+# UBIFS File system configuration
+source "fs/ubifs/Kconfig"
+
config CRAMFS
tristate "Compressed ROM file system support (cramfs)"
depends on BLOCK
diff --git a/fs/Makefile b/fs/Makefile
index 1e7a11b..fcae06a 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -100,6 +100,7 @@ obj-$(CONFIG_NTFS_FS) += ntfs/
obj-$(CONFIG_UFS_FS) += ufs/
obj-$(CONFIG_EFS_FS) += efs/
obj-$(CONFIG_JFFS2_FS) += jffs2/
+obj-$(CONFIG_UBIFS_FS) += ubifs/
obj-$(CONFIG_AFFS_FS) += affs/
obj-$(CONFIG_ROMFS_FS) += romfs/
obj-$(CONFIG_QNX4FS_FS) += qnx4/
diff --git a/fs/ubifs/Kconfig b/fs/ubifs/Kconfig
new file mode 100644
index 0000000..21a6fae
--- /dev/null
+++ b/fs/ubifs/Kconfig
@@ -0,0 +1,47 @@
+config UBIFS_FS
+	tristate "UBIFS file system support"
+	select CRC16
+	select CRC32
+	depends on MTD_UBI
+	help
+	  UBIFS is a file system for flash devices which works on top of UBI.
+
+config UBIFS_FS_XATTR
+	bool "Extended attributes support"
+	depends on UBIFS_FS
+	default n
+	help
+	  This option enables support of extended attributes.
+
+config UBIFS_FS_ADVANCED_COMPR
+	bool "Advanced compression options"
+	depends on UBIFS_FS
+	default n
+	help
+	  This option allows to explicitly choose which compressions, if any,
+	  are enabled in UBIFS. Removing compressors means inability to read
+	  existing file systems.
+
+	  If unsure, say 'N'.
+
+config UBI...

To: Artem Bityutskiy <Artem.Bityutskiy@...>
Cc: LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Tuesday, April 1, 2008 - 3:39 am

Hi Artem,

On Thu, Mar 27, 2008 at 5:55 PM, Artem Bityutskiy

But these don't make much sense to me. Why would you want to be able
to compile out printks at this granularity? Why not enable all of them

Why would you not want to enable all of these for development kernels?
--

To: Pekka Enberg <penberg@...>
Cc: Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Tuesday, April 1, 2008 - 4:51 am

Well, it's just more convenient for us. If I know the bug is somewhere in
the journal, I enable the journal messages - less flooding. We may
lessen the amount, but it is still handy to have some classes of
prints separate.

Some of the checks are very heavy-weight. For example, the tree checking
functions scan the whole TNC/LPT B-tree, which means they read it from
flash, they check CRC, and they make sure the tree is consistent and
has sane data. They are very useful when hunting bugs, but they are too
slow to be always enabled.
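
To give a feel for the per-node part of such a check, here is a minimal sketch that recomputes a CRC-32 over a node payload and compares it with the stored value. The node layout and names are hypothetical, and the standard bitwise CRC-32 (reflected polynomial 0xEDB88320, with final inversion) is used for illustration; it is not meant to match the exact CRC convention UBIFS uses on flash.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain bitwise CRC-32; slow, but short enough to read in one go. */
static inline uint32_t sketch_crc32(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical in-memory view of a node read back from flash. */
struct sketch_node {
    uint32_t crc;           /* checksum stored with the node */
    uint32_t len;           /* payload length in bytes */
    const uint8_t *payload; /* node contents */
};

/* Return 1 if the stored checksum matches the payload, 0 otherwise. */
static inline int sketch_node_ok(const struct sketch_node *n)
{
    return sketch_crc32(n->payload, n->len) == n->crc;
}
```

Running this over every node of a large tree means reading and checksumming the whole structure, which is why such checks are too slow to leave enabled all the time.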

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: Artem Bityutskiy <dedekind@...>
Cc: Pekka Enberg <penberg@...>, Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Saturday, April 26, 2008 - 5:35 am

It's everything but convenient :) Please make it one config option to
compile in all debug code and then have a module option to select the
verbosity level at runtime.

--

To: ext Christoph Hellwig <hch@...>
Cc: Artem Bityutskiy <dedekind@...>, Pekka Enberg <penberg@...>, Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>
Date: Monday, April 28, 2008 - 3:09 am

Surely that judgement should be made by people who actually debug UBIFS.

--

To: Adrian Hunter <ext-adrian.hunter@...>
Cc: ext Christoph Hellwig <hch@...>, Artem Bityutskiy <dedekind@...>, Pekka Enberg <penberg@...>, Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>
Date: Monday, April 28, 2008 - 5:00 am

I've debugged enough code including filesystems far more complex than
ubifs so you can happily trust my judgement. Even if not you can simply
switch on your brain and notice that a runtime/boottime switch is always
more convenient than a compile-time switch, and the only reason against
it would be a performance penalty.
--

To: ext ext Christoph Hellwig <hch@...>
Cc: Artem Bityutskiy <dedekind@...>, Pekka Enberg <penberg@...>, Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>
Date: Monday, April 28, 2008 - 7:23 am

You have gone off on a tangent. The original context was discussing the
need for granulated debug messages. I have restored the context above.

Addressing the original discussion, the convenience is best expressed by
example. Say you want to try improving the garbage collector. It is
convenient to be able to get messages just about the garbage collector.
Say you want to add index node merging to the TNC, it is convenient to
get messages just about the TNC. And so on. Hence my original point
stands: the convenience is evident to someone working with the code, but
not to someone who isn't.

Note also, that switching on all the debug messages is not exactly
inconvenient. After all, you just select all the config options.

You seem to have mistakenly inferred I was impugning your judgement. That
was not the point.

Coming back to your issue of a mount-time option for debug messages. I am
not sure any other file systems do that. In general I would say having to
switch on the debug config option and also change either the kernel command
line or init scripts, seems in fact much less convenient.

--

To: Adrian Hunter <ext-adrian.hunter@...>
Cc: ext ext Christoph Hellwig <hch@...>, Artem Bityutskiy <dedekind@...>, Pekka Enberg <penberg@...>, Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>
Date: Monday, April 28, 2008 - 7:39 am

I think you haven't read my statement at all. Please look at the quoted
bit above. There is nothing against having different
verbosity/granularity levels, quite to the contrary. I just told you
that a run-time selection of them is everything but convenient and they

No, the point was that you didn't read my message and/or assumes just

It means you can debug different bits without recompiling which is a
very good thing. Especially if your debugging moves from one area to
another.
--

To: Christoph Hellwig <hch@...>
Cc: Artem Bityutskiy <dedekind@...>, Pekka Enberg <penberg@...>, Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>
Date: Monday, April 28, 2008 - 8:25 am

At one stage Artem had set up sysfs entries for UBIFS so that he could change
debugging options via sysfs on the fly, without even unmounting. But he said
he didn't find it that useful and removed it all.

For myself, recompiling UBIFS only takes 25 seconds so changing config options
is not a big deal.

However I have no problem adding a mount option, although I suspect we might
end up being asked to remove it.

I hope to spend some time on UBIFS debug message handling this week. I would
like to be able to control verbosity, but not overcomplicate matters. We plan
to post UBIFS again next week when Artem returns from holiday.
--

To: Adrian Hunter <ext-adrian.hunter@...>
Cc: Christoph Hellwig <hch@...>, Artem Bityutskiy <dedekind@...>, Pekka Enberg <penberg@...>, Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>
Date: Monday, April 28, 2008 - 9:02 am

module_param is probably easier and more useful, that way it can be set
at module load/boot time and be changed later in sysfs.
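
A minimal userspace sketch of the run-time approach discussed in this sub-thread: one mask variable selects which message classes are printed, so the classes stay separate without needing a config option each. All names are hypothetical. In a kernel module the mask would typically be exposed with `module_param(name, type, perm)`, which, given a writable permission, also makes it adjustable under /sys/module/<module>/parameters/; the variable below merely stands in for that.

```c
#include <stdio.h>

/* Hypothetical message classes, one bit each. */
enum {
    SKETCH_DBG_GC  = 1 << 0,  /* garbage collector */
    SKETCH_DBG_TNC = 1 << 1,  /* TNC */
    SKETCH_DBG_JRN = 1 << 2   /* journal */
};

/* Run-time verbosity mask; stands in for a module parameter. */
static unsigned int sketch_dbg_mask;

static inline int sketch_dbg_enabled(unsigned int cls)
{
    return (sketch_dbg_mask & cls) != 0;
}

/* Print only if the message class is currently enabled. */
#define sketch_dbg(cls, ...)                      \
    do {                                          \
        if (sketch_dbg_enabled(cls))              \
            fprintf(stderr, __VA_ARGS__);         \
    } while (0)
```

Setting `sketch_dbg_mask = SKETCH_DBG_GC` and calling `sketch_dbg(SKETCH_DBG_GC, "gc: picked LEB %d\n", 7)` prints the message, while journal messages stay silent. The cost is one test per call site, which is the performance penalty mentioned earlier as the only argument against a run-time switch.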
--

To: Artem Bityutskiy <dedekind@...>
Cc: Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Tuesday, April 1, 2008 - 5:15 am

Hi Artem,

Yeah, perhaps that's a sign that you're doing it wrong? You currently
have 430 separate debug printks sprinkled around in UBIFS. The way to
reduce that is to move as much logging as possible higher up the call
chain and dump as much information as you can there. That's what
ext2_error() does, for example. So I'm not opposed to a ubifs_error()
or ubifs_warning() even if that's used in a controlled fashion. The
way you do debugging checks now is totally ad hoc and IMHO not
acceptable to the mainline kernel.

And like I said, if you need _tracing_ you might want to look at
Ingo's ftrace or some other similar tracing infrastructure and use
that. The upside of it is that you can basically have it enabled at
run-time too.

So perhaps you could just separate those heavy-weight options and have
all others under CONFIG_UBIFS_DEBUG?

Pekka
--

To: Pekka Enberg <penberg@...>
Cc: Artem Bityutskiy <dedekind@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Tuesday, April 1, 2008 - 5:25 am

JFFS2 has a similar thing. I myself fixed bugs just by asking people
to enable them and send the log. Very useful. This is why we also added

Yeah, this is what dbg_gen is doing :-) But often we need messages from
lower layers, why not? I agree we should lessen the amount of prints, but
I still do not get what your argument is against the shiny prints

Why? What is wrong with this? As I said, we found it very useful in JFFS2,
because I have been working with JFFS2 for a _long_ time. Talk to David
Woodhouse and ask how many times that made him fix a bug just by having

This alternative is not really acceptable. We want to have only a _few_
debugging things always enabled, and have other things always compiled

We can if we have to. But these checks often affect the FS behavior and
sometimes make bugs go away and become unreproducible. Especially locking
problems. This is why we want to be able to enable them separately.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: <Artem.Bityutskiy@...>
Cc: Artem Bityutskiy <dedekind@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>, David Woodhouse <dwmw2@...>
Date: Tuesday, April 1, 2008 - 6:04 am

Hi Artem,

On Tue, Apr 1, 2008 at 12:25 PM, Artem Bityutskiy

First and foremost, JFFS2 uses BUG_ON and doesn't invent its own
assert. Furthermore, the debug tracing code prints out human-readable
text in well-thought-out places. It looks a bit excessive to me and I
don't see a good reason why the different logging levels are not
run-time configurable (if you're going to invent a logging
infrastructure, why not do it properly). But there simply is no
comparison between JFFS2 and UBIFS debug logging code. The former is
cleanly structured whereas yours looks to be totally ad hoc.

But perhaps the problem will go away after you inject some sanity to
stuff like this:

fs/ubifs/dir.c: dbg_gen("dent '%.*s' to ino %lu (nlink %d) in dir ino %lu",
fs/ubifs/dir.c: dbg_gen("dent '%.*s' from ino %lu (nlink %d) in dir ino %lu",
fs/ubifs/dir.c: dbg_gen("directory '%.*s', ino %lu in dir ino %lu",
dentry->d_name.len,
fs/ubifs/dir.c: dbg_gen("dent '%.*s', mode %#x in dir ino %lu",
fs/ubifs/dir.c: dbg_gen("dent '%.*s' in dir ino %lu",

Pekka
--

To: Pekka Enberg <penberg@...>
Cc: Artem Bityutskiy <dedekind@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>, David Woodhouse <dwmw2@...>
Date: Tuesday, April 1, 2008 - 6:26 am

True. But it has checking code which may be enabled or disabled.
An assert is just a special case of this. You do not say why
it hurts. For me it looks like your personal taste.

The same is with UBIFS. We will make the amount of messages less,
This means that when debugging is enabled, you'll have prints like:
UBIFS DBG (pid 28398): ubifs_create: dent 'file', mode 0x81a4 in dir ino 1
or
UBIFS DBG (pid 28398): ubifs_setattr: ino 65, ia_valid 0x70

We tried to keep messages shorter because logging takes time and long
messages make it slower to debug the code.

Anyway, we will lessen and review this, and make it nicer.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: <Artem.Bityutskiy@...>
Cc: Artem Bityutskiy <dedekind@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>, David Woodhouse <dwmw2@...>
Date: Tuesday, April 1, 2008 - 7:33 am

Hi Artem,

On Tue, Apr 1, 2008 at 1:26 PM, Artem Bityutskiy

I don't know how many times I have to say this: you're doing it at the
wrong level! The reason you want to compile them out is because you've
added crap like this all over your code paths:

ubifs_assert(PageLocked(page));
ubifs_assert(!PageChecked(page));
ubifs_assert(!PagePrivate(page));

So instead of arguing about this you really ought to look at what
SLUB, for example, does. It's perfectly okay to have _debugging
checks_ compiled out (stuff like verify_inode and such) but at the
assertion level it makes no sense whatsoever!

On Tue, Apr 1, 2008 at 1:26 PM, Artem Bityutskiy

So what? It's still an ad hoc debugging printout with no particular
meaning whatsoever.

But this discussion is getting nowhere and I have better things to do
than argue about this over and over again. So to reiterate my review
comments on this:

- Kill your home-grown assert
- Fix up your logging messages to actually make sense
- Perhaps introduce a ubifs_error() thingy and convert as much code
to use that
- Reduce the amount of debug Kconfig options

You can ignore these comments as my personal preferences all you want
in which case I can only wish you good luck with merging this thing
upstream.

Pekka
--

To: Pekka Enberg <penberg@...>
Cc: <Artem.Bityutskiy@...>, Artem Bityutskiy <dedekind@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>, David Woodhouse <dwmw2@...>
Date: Saturday, April 26, 2008 - 5:37 am

Yes, having this in filesystems is not very nice. If someone feels
very strongly about interface assertions we should add them at the
method level boundary so that the interface is verified for all
filesystems.

--

To: ext Christoph Hellwig <hch@...>
Cc: Pekka Enberg <penberg@...>, <Artem.Bityutskiy@...>, Artem Bityutskiy <dedekind@...>, LKML <linux-kernel@...>, David Woodhouse <dwmw2@...>
Date: Monday, April 28, 2008 - 3:10 am

The checks are not valid for all file systems.

The point of the checks as preconditions is lost if they are moved elsewhere.

VFS code paths are not simple, so the suggestion is impractical anyway.
--

To: Adrian Hunter <ext-adrian.hunter@...>
Cc: ext Christoph Hellwig <hch@...>, Pekka Enberg <penberg@...>, <Artem.Bityutskiy@...>, Artem Bityutskiy <dedekind@...>, LKML <linux-kernel@...>, David Woodhouse <dwmw2@...>
Date: Monday, April 28, 2008 - 5:03 am

These checks are, if properly maintained, not harmful for the code,
but they make the code less readable and there's of course the chance
that they get out of sync. That's why the term "not very nice" is

Most of these checks are indeed generic. Those that arise from special
filesystem invariants like not having unmapped buffers due to
implementing ->page_mkwrite should for now be checked in the filesystem,
although I'd like to make sure this is true for all filesystems
long-term.
--

To: Christoph Hellwig <hch@...>
Cc: Pekka Enberg <penberg@...>, <Artem.Bityutskiy@...>, Artem Bityutskiy <dedekind@...>, LKML <linux-kernel@...>, David Woodhouse <dwmw2@...>
Date: Wednesday, April 30, 2008 - 3:04 am

What would make them more readable? We (ok mostly me) are not enthusiastic
about BUG_ON for the following reasons:
1. When testing and debugging you lose the opportunity to get more
information and have to reboot the test platform
2. When the system is in the hands of consumers we don't want it
to oops under any circumstances. This is particularly the case
when looking at things like BUG_ON(!PageLocked(page)). In an
embedded system, single CPU, preemption disabled, even in the
unlikely event this happens, the system would probably get away
with it.
3. We don't have many people using UBIFS so hoping they catch
BUGs is less realistic. It is also bad for us as we try to
persuade people of the merits of UBIFS. Consequently our focus
is on our own testing (which brings us back to point 1).
4. We have had a couple of situations with JFFS2 where BUG_ON was
used incorrectly to handle errors that didn't look like they
could happen but it turned out they could. I.e. an error code
should have been returned and the system allowed to continue.
In short BUG_ON can be bad all by itself.

Perhaps if we called it dbg_check() instead of ubifs_assert() ?
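
To make the idea concrete, here is a minimal user-space sketch of a check that can be compiled out and that reports a failure without stopping the system. Everything in it is an assumption for illustration: the DBG_CHECKS_ENABLED switch, the failure counter and the set_len() caller are not from the UBIFS patch. In the kernel the report would presumably use printk() and dump_stack() instead of fprintf().

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: set to 0 to compile the checks out entirely. */
#ifndef DBG_CHECKS_ENABLED
#define DBG_CHECKS_ENABLED 1
#endif

/* Number of failed checks seen so far (hypothetical bookkeeping). */
static unsigned int dbg_check_failures;

#if DBG_CHECKS_ENABLED
/* Report a failed check and carry on instead of stopping the system. */
#define dbg_check(expr)							\
	do {								\
		if (!(expr)) {						\
			dbg_check_failures++;				\
			fprintf(stderr, "check failed: %s at %s:%d\n",	\
				#expr, __FILE__, __LINE__);		\
		}							\
	} while (0)
#else
/* Compiled out: the expression is not evaluated at all. */
#define dbg_check(expr) do { } while (0)
#endif

/* Example caller: checks its argument but still returns an error code. */
static int set_len(int len)
{
	dbg_check(len >= 0);
	if (len < 0)
		return -1;	/* report the error to the caller */
	return 0;
}
```

The caller keeps its normal error path, so the behaviour is the same whether or not the check is compiled in. The check only adds a diagnostic while debugging.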

As for things getting out of sync, anyone making changes to UBIFS, either
without testing or testing without the debug checks turned on, is taking
a much much bigger risk than just letting the checks get out of sync.

I am not sure it is really reasonable to compare the debugging needs of
SLUB with those of UBIFS because UBIFS has a much lower visibility and
usage. There are far fewer eyes looking at the code and far fewer people
using it. Since the burden of testing really falls on just a few of us,

We use both PagePrivate and PageChecked differently to EXT3 etc.

From the original example, that just leaves PageLocked, but lots of kernel
functions seem to check that, so why not UBIFS too?

--

To: Pekka Enberg <penberg@...>
Cc: Artem Bityutskiy <dedekind@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>, David Woodhouse <dwmw2@...>
Date: Tuesday, April 1, 2008 - 7:56 am

This was more for developing. I added that to be really sure those
requirements are met. As I said, the amount of assertions will be
lessened and these ones will be deleted. There are other assertions
in the VFS calls implementation functions, which also will be deleted.

You still do not explain what is wrong with this. For me it means that
ubifs_setattr() was called for inode 65. And when I debugging a bug I
Fair enough. As I said, we will review debugging and lessen the amount of
it.

Thanks.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

Extended attributes are implemented as separate inodes. This makes
it very easy to implement them and to re-use nearly all the existing
code. This might not be the fastest implementation, though. ACL support
is not implemented.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/xattr.c | 587 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 587 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c
new file mode 100644
index 0000000..85b1088
--- /dev/null
+++ b/fs/ubifs/xattr.c
@@ -0,0 +1,587 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file implements UBIFS extended attributes support.
+ *
+ * Extended attributes are implemented as regular inodes with attached data,
+ * which limits extended attribute size to UBIFS block size (4KiB). Names of
+ * extended attributes are described by extended attribute entries (xentries),
+ * which are almost identical to directory entries, but have different key type.
+ *
+ * In other words, the situation with extended attributes is very similar to
+ * directories. Indeed, any inode (bu...

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

The journal re-play subsystem is responsible for replaying the
journal during mount if it was not committed before the last un-mount.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/replay.c | 1006 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 1006 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
new file mode 100644
index 0000000..f627053
--- /dev/null
+++ b/fs/ubifs/replay.c
@@ -0,0 +1,1006 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file contains journal replay code. It runs when the file-system is being
+ * mounted and requires no locking.
+ *
+ * The larger the journal, the longer it takes to scan it, so the longer it
+ * takes to mount UBIFS. This is why the journal has a limited size, which may
+ * be changed depending on the system requirements. But a larger journal gives
+ * faster I/O speed because it writes the index less frequently. So this is a
+ * trade-off. Also, the journal is indexed by the in-memory index (TNC), so the
+ * larger the journal, the more memory its index may consume.
+ */
+
+...

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

This is a small sub-system which does eraseblock scanning. For
example, this is needed during journal replay, recovery, or garbage
collection.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/scan.c | 368 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 368 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/scan.c b/fs/ubifs/scan.c
new file mode 100644
index 0000000..858aa94
--- /dev/null
+++ b/fs/ubifs/scan.c
@@ -0,0 +1,368 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements the scan which is a general-purpose function for
+ * determining what nodes are in an eraseblock. The scan is used to replay the
+ * journal, to do garbage collection, for the TNC in-the-gaps method, and by
+ * debugging functions.
+ */
+
+#include "ubifs.h"
+
+/**
+ * scan_padding_bytes - scan for padding bytes.
+ * @buf: buffer to scan
+ * @len: length of buffer
+ *
+ * This function returns the number of padding bytes on success and
+ * %SCANNED_GARBAGE on failure.
+ */
+static int scan_padding_bytes(void *buf, int len)
+{
+ int pad_len = 0, ma...
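
The body of scan_padding_bytes() is cut off in this excerpt. As a rough illustration of the contract described in its comment (return the number of padding bytes, or a "garbage" code when there are none), here is a minimal user-space sketch. The padding byte value, the return code and the sketch_ names are placeholders chosen for the example. The real function may apply further limits or alignment rules that are not visible here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values chosen for this sketch only. */
#define SKETCH_PADDING_BYTE	0xCE
#define SKETCH_SCANNED_GARBAGE	(-1)

/*
 * Count the run of padding bytes at the start of buf. Returns the run
 * length, or SKETCH_SCANNED_GARBAGE if buf does not start with padding.
 */
static int sketch_scan_padding_bytes(const void *buf, int len)
{
	const uint8_t *p = buf;
	int pad_len = 0;

	while (pad_len < len && p[pad_len] == SKETCH_PADDING_BYTE)
		pad_len++;

	return pad_len ? pad_len : SKETCH_SCANNED_GARBAGE;
}
```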

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

The LEB find sub-system is responsible for maintaining lists of
eraseblocks with free and dirty space. For example, when UBIFS has
to do garbage collection, it needs to find the dirtiest eraseblock,
because it is faster to garbage-collect it, and it asks the
LEB find sub-system to do this, which usually has an immediate answer.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/find.c | 951 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 951 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/find.c b/fs/ubifs/find.c
new file mode 100644
index 0000000..fc601e5
--- /dev/null
+++ b/fs/ubifs/find.c
@@ -0,0 +1,951 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file contains functions for finding LEBs for various purposes e.g.
+ * garbage collection. In general, lprops category heaps and lists are used
+ * for fast access, falling back on scanning the LPT as a last resort.
+ */
+
+#include <linux/sort.h>
+#include "ubifs.h"
+
+/**
+ * struct scan_data - data provided to scan callback functions
+ * @min_space: minimum number of bytes for which...
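
The file comment above mentions heaps and lists for fast access, with a scan as the last resort. The following user-space sketch shows only that last-resort idea, as a linear scan over an in-memory array. The struct, its field names and the meaning given to min_space (free plus dirty bytes must reach it) are assumptions made for the example, since the real definitions are cut off in this excerpt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-LEB record; field names are illustrative only. */
struct leb_props {
	int lnum;	/* logical eraseblock number */
	int free;	/* free bytes */
	int dirty;	/* dirty (reclaimable) bytes */
};

/*
 * Linear scan for the LEB with the most dirty space, considering only
 * LEBs where free + dirty >= min_space. Returns NULL if none qualifies.
 */
static const struct leb_props *find_dirtiest(const struct leb_props *lp,
					     size_t n, int min_space)
{
	const struct leb_props *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (lp[i].free + lp[i].dirty < min_space)
			continue;
		if (!best || lp[i].dirty > best->dirty)
			best = &lp[i];
	}
	return best;
}
```

A heap keyed on dirty space would give the same answer without visiting every entry, which is presumably why the scan is only the fall-back.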

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

The LEB properties are stored and maintained on the flash media,
because otherwise UBIFS would need to scan the whole media on each mount.
We store this per-LEB accounting information in the lprops tree (LPT),
which is an on-flash B-tree. The tree is updated out-of-place, like
everything in UBIFS. It has its own garbage-collector, and is a kind
of small independent world whose task is to maintain the array of
per-eraseblock information.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/lpt.c | 2239 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 2239 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/lpt.c b/fs/ubifs/lpt.c
new file mode 100644
index 0000000..27288d7
--- /dev/null
+++ b/fs/ubifs/lpt.c
@@ -0,0 +1,2239 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements the LEB properties tree (LPT) area. The LPT area
+ * contains the LEB properties tree, a table of LPT area eraseblocks (ltab), and
+ * (for the "big" model) a table of saved LEB numbers (lsave). The LPT area sits
+ * between the log and the orphan area.
+ *
+ * The LPT area is...

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

UBIFS supports on-the-fly compression, and this patch adds
compression helper functions which make it possible to use the
same API irrespective of the compression type. At the moment
UBIFS supports only LZO and zlib. It uses cryptoapi to access
the compressors.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/compress.c | 264 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 264 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/compress.c b/fs/ubifs/compress.c
new file mode 100644
index 0000000..74389f5
--- /dev/null
+++ b/fs/ubifs/compress.c
@@ -0,0 +1,264 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ * Copyright (C) 2006, 2007 University of Szeged, Hungary
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ * Zoltan Sogor
+ */
+
+/*
+ * This file provides a single place to access compression and
+ * decompression.
+ */
+
+#include <linux/crypto.h>
+#include "ubifs.h"
+
+/*
+ * UBIFS does not try to compress data if its length is less than the
+ * constant below.
+ */
+#define MIN_COMPR_LEN 128
+
+/* Fake description object for the "none" compressor */
+static struct ubifs_com...
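
The excerpt stops before the helper itself. The following user-space sketch shows one way a single entry point can hide the compressor type and skip short inputs using MIN_COMPR_LEN. It is not the UBIFS code: the toy run-length encoder stands in for LZO and zlib, all sketch_ names are hypothetical, and the fall-back to "none" when compression saves no space is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MIN_COMPR_LEN 128

enum { SKETCH_COMPR_NONE, SKETCH_COMPR_RLE, SKETCH_COMPR_COUNT };

/* One entry per compressor; compress returns the output length or -1. */
struct sketch_compressor {
	const char *name;
	int (*compress)(const unsigned char *in, int in_len,
			unsigned char *out, int out_cap);
};

/* Toy run-length encoder standing in for a real compressor. */
static int rle_compress(const unsigned char *in, int in_len,
			unsigned char *out, int out_cap)
{
	int i = 0, o = 0;

	while (i < in_len) {
		int run = 1;

		while (i + run < in_len && run < 255 && in[i + run] == in[i])
			run++;
		if (o + 2 > out_cap)
			return -1;
		out[o++] = (unsigned char)run;	/* run length */
		out[o++] = in[i];		/* repeated byte */
		i += run;
	}
	return o;
}

static const struct sketch_compressor compressors[SKETCH_COMPR_COUNT] = {
	[SKETCH_COMPR_NONE] = { "none", NULL },
	[SKETCH_COMPR_RLE]  = { "rle", rle_compress },
};

/*
 * Compress in[] into out[]. On entry *type is the requested compressor;
 * on return it is the one actually used. Returns the output length, or
 * -1 if out[] cannot even hold an uncompressed copy. Falls back to
 * "none" for short input, on failure, or when nothing is saved.
 */
static int sketch_compress(const unsigned char *in, int in_len,
			   unsigned char *out, int out_cap, int *type)
{
	const struct sketch_compressor *c = &compressors[*type];
	int n = -1;

	if (out_cap < in_len)
		return -1;
	if (c->compress && in_len >= MIN_COMPR_LEN)
		n = c->compress(in, in_len, out, out_cap);
	if (n < 0 || n >= in_len) {
		memcpy(out, in, (size_t)in_len);
		*type = SKETCH_COMPR_NONE;
		return in_len;
	}
	return n;
}
```

Callers only ever see sketch_compress() and the type it reports back, so adding another compressor means adding one table entry.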

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

The recovery sub-system is responsible for recovering from unclean
reboots. It makes sure everything is consistent, rolls back the
last broken and unfinished FS operation, and so on.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/recovery.c | 1437 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 1437 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
new file mode 100644
index 0000000..e1e8916
--- /dev/null
+++ b/fs/ubifs/recovery.c
@@ -0,0 +1,1437 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements functions needed to recover from unclean un-mounts.
+ * When UBIFS is mounted, it checks a flag on the master node to determine if
+ * an un-mount was completed successfully. If not, the process of mounting
+ * incorporates additional checking and fixing of on-flash data structures.
+ * UBIFS always cleans away all remnants of an unclean un-mount, so that
+ * errors do not accumulate. However UBIFS defers recovery if it is mounted
+ * read-only, and the flash is not modified in that case.
+ */
+
+#include ...

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

This patch contains the superblock and master node implementations.
The UBIFS superblock is read-only and contains only static data like
the default compression type. The superblock sits at a fixed
position and may be changed only with user-space tools. The master
node contains dynamic information like the position of the root
indexing node of the UBIFS indexing B-tree, and so on. The master
node is updated out-of-place.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/master.c | 415 ++++++++++++++++++++++++++++++++++++++
fs/ubifs/sb.c | 581 +++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 996 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/master.c b/fs/ubifs/master.c
new file mode 100644
index 0000000..38c40d1
--- /dev/null
+++ b/fs/ubifs/master.c
@@ -0,0 +1,415 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/* This file implements reading and writing the master node */
+
+#include "ubifs.h"
+
+/**
+ * scan_for_master - search for the valid master node.
+ * @c: UBIFS file-system description object
+ *
+ * This function scans the master node LE...

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

The file-system build code contains most of the UBIFS initialization
and mount-related functionality implementation.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/build.c | 1351 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ubifs/super.c | 531 +++++++++++++++++++++
2 files changed, 1882 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/build.c b/fs/ubifs/build.c
new file mode 100644
index 0000000..1142020
--- /dev/null
+++ b/fs/ubifs/build.c
@@ -0,0 +1,1351 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file implements UBIFS initialization, mount and un-mount. Some
+ * initialization stuff which is rather large and complex is placed in the
+ * corresponding subsystems, but most of it is here.
+ */
+
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/ctype.h>
+#include <linux/random.h>
+#include <linux/kthread.h>
+#include <linux/parser.h>
+#include "ubifs.h"
+
+/* Slab cache for UBIFS inodes */
+struct kmem_cache *ubifs_inode_slab;
+
+/* UBIFS TNC shrinker desc...

To: Artem Bityutskiy <Artem.Bityutskiy@...>
Cc: LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Friday, March 28, 2008 - 6:12 am

do_div() operates on u64, not signed long long. This will warn on several
architectures.
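
For readers unfamiliar with the macro: in the kernel, do_div(n, base) divides the 64-bit unsigned lvalue n in place by a 32-bit base and returns the remainder. The sketch below is a user-space stand-in that mimics that contract so it can be compiled outside the kernel. The sketch_ names are hypothetical, and the signed wrapper is just one possible way to keep a signed value away from the unsigned helper. Current kernels also provide div_s64() in <linux/math64.h> for signed dividends.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for do_div(): *n /= base, returns the remainder. */
static uint32_t sketch_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/*
 * Divide a signed 64-bit value by going through an unsigned temporary,
 * so the division helper only ever sees a u64. Truncates toward zero.
 */
static int64_t sketch_div_s64(int64_t val, uint32_t base)
{
	int neg = val < 0;
	/* Negate in unsigned arithmetic so INT64_MIN does not overflow. */
	uint64_t tmp = neg ? 0 - (uint64_t)val : (uint64_t)val;

	sketch_do_div(&tmp, base);
	return neg ? (int64_t)(0 - tmp) : (int64_t)tmp;
}
```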
--

To: Andrew Morton <akpm@...>
Cc: LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Friday, March 28, 2008 - 7:04 am

Will be fixed, thank you!

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

This sub-system is responsible for performing all the I/O-related
low-level things like calculating and checking checksums, doing
basic node validation, adding correct padding to the nodes and
so on. It also implements UBIFS write-buffers and their proper
synchronization.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/io.c | 921 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 921 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/io.c b/fs/ubifs/io.c
new file mode 100644
index 0000000..182f25c
--- /dev/null
+++ b/fs/ubifs/io.c
@@ -0,0 +1,921 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ * Copyright (C) 2006, 2007 University of Szeged, Hungary
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ * Zoltan Sogor
+ */
+
+/*
+ * This file implements UBIFS I/O subsystem which provides various I/O-related
+ * helper functions (reading/writing/checking/validating nodes) and implements
+ * write-buffering support. Write buffers help to save space which otherwise
+ * would have been wasted for padding to the nearest minimal I/O unit boundary.
+ * Instead, data first goes to the write-buffer and is flushed...
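
The space saving described in the comment above can be shown with a little arithmetic. The sketch below is a simplified user-space model, not the UBIFS write-buffer code: it compares padding every node to the minimal I/O unit on its own against packing nodes back to back and padding once at flush time. Function names are hypothetical, and any per-node alignment inside the buffer is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of min_io (min_io must be > 0). */
static size_t align_up(size_t len, size_t min_io)
{
	return (len + min_io - 1) / min_io * min_io;
}

/* Flash space used if every node is padded and written on its own. */
static size_t space_unbuffered(const size_t *node_len, size_t n, size_t min_io)
{
	size_t i, total = 0;

	for (i = 0; i < n; i++)
		total += align_up(node_len[i], min_io);
	return total;
}

/* Space used if nodes are packed back to back and padded once at flush. */
static size_t space_buffered(const size_t *node_len, size_t n, size_t min_io)
{
	size_t i, total = 0;

	for (i = 0; i < n; i++)
		total += node_len[i];
	return align_up(total, min_io);
}
```

With an example minimal I/O unit of 2048 bytes and three nodes of 160, 700 and 48 bytes, the unbuffered model uses three units (6144 bytes) and the buffered one uses a single unit (2048 bytes).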

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

The UBIFS code is large, and we have plenty of debugging stuff
in there which helps to catch bugs. Some of the debugging stuff
will be deleted later.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/debug.c | 1125 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ubifs/debug.h | 343 +++++++++++++++++
2 files changed, 1468 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/debug.c b/fs/ubifs/debug.c
new file mode 100644
index 0000000..5ccb5a4
--- /dev/null
+++ b/fs/ubifs/debug.c
@@ -0,0 +1,1125 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file implements most of the debugging stuff which is compiled in only
+ * when it is enabled. But some debugging check functions are implemented in
+ * the corresponding subsystems, just because they are closely related and utilize
+ * various local functions of those subsystems.
+ */
+
+#define UBIFS_DBG_PRESERVE_KMALLOC
+#define UBIFS_DBG_PRESERVE_UBI
+
+#include "ubifs.h"
+
+#ifdef CONFIG_UBIFS_FS_DEBUG
+
+DEFINE_SPINLOCK(dbg_lock);
+
+static char dbg_get_key_dump_dump_buf[100];
+
+static size_t km_alloc_cnt;
+s...

To: Artem Bityutskiy <Artem.Bityutskiy@...>
Cc: LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Monday, March 31, 2008 - 5:00 pm

Hi Artem,

On Thu, Mar 27, 2008 at 5:55 PM, Artem Bityutskiy

Yes please. The code is somewhat noisy on the debugging side.

On Thu, Mar 27, 2008 at 5:55 PM, Artem Bityutskiy

Not acceptable for mainline kernel. SLAB already provides leak

Please kill these wrappers and use BUG_ON, WARN_ON, and printk() where
appropriate.
--

To: Pekka Enberg <penberg@...>
Cc: Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Tuesday, April 1, 2008 - 2:20 am

Yeah, we will remove this later, keep it for now because it is very
convenient. I guess you refer to the /proc/slab_allocations feature.
We found it less appropriate because it needs additional scripts to
be run to detect leaks, while this simple hack just makes UBIFS print

That was introduced to test the UBIFS shrinker, and to make sure
there are no races and everything works fine. Yes, will be removed

Well, I do not see a big reason to get rid of this harmless stuff.
Many kernel subsystems have their debugging, why not? Using BUG_ON() is
OK in a few of the most important places. But we want to have more assertions
which are compiled out by default, why can't we? The same goes for prints.

Thanks for the feed-back.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: Artem Bityutskiy <dedekind@...>
Cc: Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Tuesday, April 1, 2008 - 3:43 am

Hi Artem,

Why not fix it then to fit your needs? You do need leak detection
after you're in the mainline too, don't you?

It sounds useful and necessary for developing in the mainline as well
so you might want to reconsider making it a standalone module in mm/.

Pekka
--

To: Artem Bityutskiy <dedekind@...>
Cc: Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Tuesday, April 1, 2008 - 3:33 am

Hi Artem,

Why would you want to have assertions that are compiled out by default?
Either you handle the error or don't (and have an assertion). The reason
some subsystems have had their own asserts is because they go overboard
with defensive checks as they haven't bothered to think through a
reasonable error handling strategy. The downside? It clutters the code
and causes the (compiled out) assertions to bit-rot.

Note that they're also a total pain in the ass to enable for anyone not
intimately familiar with your code. Not to mention you're now making the
lives of those crazy embedded folks that disable CONFIG_BUG for smaller
kernel size harder as well.

Do you know why we don't have compiled out asserts in the core kernel?
That's because it simply can't just roll-over and die if something
unexpected happens and your filesystem shouldn't probably do that
either. Sure, if you have some debugging checks that are way too
expensive for production use, you might want to have a
CONFIG_UBIFS_DEBUG but that shouldn't happen at assertion level but
rather at much higher level.

And btw, for optional printks, we have a lot of tracing infrastructure
in the kernel already (kprobes, relayfs, ftrace probably soon), so if
you want to have tracing for UBIFS (you probably don't), don't invent
your mechanism. But for most printks, they're either useful or they're
not. Again, I do see the potential need for CONFIG_UBIFS_DEBUG here, but
doing that at printk-level is also too low-level.

Pekka
--

To: ext Pekka Enberg <penberg@...>
Cc: Artem Bityutskiy <dedekind@...>, Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>
Date: Tuesday, April 1, 2008 - 4:34 am

For debugging. It would be unreasonably inefficient on embedded systems

It depends whether you consider error handling and debugging to be the same
thing. Failing an assert is not an error - it is a bug. It is very
difficult, and sometimes impossible, to contrive a useful response to
a bug. It is also not really worth the effort.

BUG_ON is a poor solution for embedded systems. When developing and debugging
you don't want your system to panic just because you are on the track of a bug.
And then when the system is in production, you don't want it to panic period.
We have had lots of situations where BUG_ON has been used incorrectly to handle

Anyone developing or doing serious testing would have debugging turned on.
Anyone doing debugging, would have debugging turned on. It seems pretty

Our asserts don't roll over and die. They print a message and dump the

And lots of file systems (e.g. EXT2, JFFS2) have optional prints as well,
just like UBIFS.

--

To: Pekka Enberg <penberg@...>
Cc: Artem Bityutskiy <dedekind@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Tuesday, April 1, 2008 - 4:32 am

Pekka, I still do not see why you are opposed to assertions so much :-)

Because we want to have a way to catch bugs and to quickly fix them. This
is why we injected many assertions all over the place. Enabling them by
default is inefficient and makes the code larger, which is not good
especially for small embedded systems.

If someone reports an obscure oops to us, and we have no idea why it happened,
and we cannot reproduce it on our setup, we ask the reporter to enable
debugging and report the results to us. This helps us to figure out what the
reason was and to quickly fix the bug. I do not see why you want to prevent

We handle all errors. Errors are things like I/O failures, memory allocation
failures, unexpected behavior. We do handle this. Assertions are about
_debugging_, when you already know you have a problem.

Indeed, bugs may be tricky. An oops may happen because half an hour ago a
function crapped out something. Assertions allow us to catch problems at an
_early_ stage, instead of dealing with the consequences and scratching our
heads over what the reason was.
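As an illustration of the distinction drawn above, here is a minimal user-space sketch of an assertion that is compiled out by default, next to error handling that is always present. The names `DEMO_DEBUG`, `demo_assert()` and `demo_read_len()` are hypothetical and are not taken from the UBIFS patches.

```c
#include <errno.h>
#include <stdio.h>

/*
 * DEMO_DEBUG is a hypothetical compile-time switch standing in for a
 * Kconfig debugging option; build with -DDEMO_DEBUG to enable the checks.
 */
#ifdef DEMO_DEBUG
#define demo_assert(expr) \
	do { \
		if (!(expr)) \
			fprintf(stderr, "assertion failed: %s (%s:%d)\n", \
				#expr, __FILE__, __LINE__); \
	} while (0)
#else
/* Compiled out: the expression is not evaluated at all. */
#define demo_assert(expr) ((void)0)
#endif

/* The error path stays in place whether or not assertions are compiled in. */
static int demo_read_len(int len, int max_len)
{
	demo_assert(len >= 0 && len <= max_len);
	if (len < 0 || len > max_len)
		return -EINVAL;
	return len;
}
```

With the switch off, the caller still gets -EINVAL for a bad length; with it on, the same call also prints where the unexpected value was first seen.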

But I do agree we have too much of that. We will lessen the amount of

I am not sure what you mean. I would not want to delve into a general
discussion of the debugging stuff. I would rather talk about specific
things. I'll just point you to examples of debugging stuff in other kernel
subsystems which exists and does not hurt anyone. And I believe
it is helpful. It is compiled out by default and is enabled when it is
needed to hunt a bug.

fs/ext2: ea_idebug(), EXT2FS_DEBUG
fs/xfs: #ifdef DEBUG, XFS_LOUD_RECOVERY and so on
fs/ocfs2: OCFS2_DEBUG_FS
fs/jfs: CONFIG_JFS_DEBUG, assert(), etc
fs: DEBUG_EPOLL, #ifdef DEBUG
fs/jbd2: assert_spin_locked(), CONFIG_JBD2_DEBUG, etc

Of course. People who are not familiar with the code send bug reports and

It is OK to have few BUG_ON() checks, and we should probably turn few

If something unexpected happens, UBIFS will just return -EINVAL in
most cases, because one of the function...

To: <Artem.Bityutskiy@...>
Cc: Artem Bityutskiy <dedekind@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Tuesday, April 1, 2008 - 5:00 am

Hi Artem,

On Tue, Apr 1, 2008 at 11:32 AM, Artem Bityutskiy

But they're a totally different kind of thing! They're not for disabling
hundreds of debug-only printks sprinkled around the kernel. Instead,
they let you disable well-defined debugging checks for kernel speed
and/or size optimizations. And btw, CONFIG_SLUB_DEBUG is only a kernel
size optimization for CONFIG_EMBEDDED. The SLUB debugging code can be
turned on and off at run-time.

Pekka
--

To: Pekka Enberg <penberg@...>
Cc: Artem Bityutskiy <dedekind@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Tuesday, April 1, 2008 - 5:04 am

Then I probably misunderstood you, sorry. My English is far from perfect.
Could you please put your requests together again and try to make each
request ask for one small thing? Please also be more specific about each
request (referring to the code, saying exactly why this is harmful, etc.).
That would help a lot.

I thought you wanted us to remove all the UBIFS debugging altogether.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

ubifs.h contains the internal stuff. ubifs-media.h contains the
on-flash format definition and might be copied to user-space
if needed. misc.h contains various inline helpers.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/misc.h | 267 +++++++++
fs/ubifs/ubifs-media.h | 701 ++++++++++++++++++++++
fs/ubifs/ubifs.h | 1519 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 2487 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/misc.h b/fs/ubifs/misc.h
new file mode 100644
index 0000000..0feadba
--- /dev/null
+++ b/fs/ubifs/misc.h
@@ -0,0 +1,267 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file contains miscellaneous helper functions.
+ */
+
+#ifndef __UBIFS_MISC_H__
+#define __UBIFS_MISC_H__
+
+/**
+ * ubifs_zn_dirty - check if znode is dirty.
+ * @znode: znode to check
+ *
+ * This helper function returns %1 if @znode is dirty and %0 otherwise.
+ */
+static inline int ubifs_zn_dirty(const struct ubifs_znode *znode)
+{
+ return !!test_bit(DIRTY_ZNODE, &znode->flags);
+}
+
+/**
+ * ubifs_wake_up_bgt - wake up backgro...

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

This sub-system keeps track of orphans - the files which were deleted
but are still kept open. These files should be deleted only when the
last reference goes away. But if an unclean reboot happens, UBIFS has to
also delete the orphans. This is why the orphans sub-system exists -
it records information about all orphans to the on-flash orphan area.
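The idea can be sketched in user-space C roughly as follows. This is only an illustration under simplifying assumptions: the `demo_*` names and the fixed-size array are hypothetical, and the real orphan area lives on flash rather than in a RAM array.

```c
#include <errno.h>

#define DEMO_MAX_ORPHANS 16

/* Hypothetical in-RAM stand-in for the recorded set of orphan inode numbers. */
struct demo_orphans {
	unsigned long inum[DEMO_MAX_ORPHANS];
	int cnt;
};

/* Record an inode that was unlinked while still open. */
static int demo_add_orphan(struct demo_orphans *o, unsigned long inum)
{
	if (o->cnt == DEMO_MAX_ORPHANS)
		return -ENOSPC;
	o->inum[o->cnt++] = inum;
	return 0;
}

/* Forget an orphan once the last reference is gone and the inode is deleted. */
static int demo_del_orphan(struct demo_orphans *o, unsigned long inum)
{
	for (int i = 0; i < o->cnt; i++) {
		if (o->inum[i] == inum) {
			o->inum[i] = o->inum[--o->cnt];
			return 0;
		}
	}
	return -ENOENT;
}

/* After an unclean reboot: everything still recorded has to be deleted. */
static int demo_kill_orphans(struct demo_orphans *o)
{
	int killed = o->cnt;

	o->cnt = 0;
	return killed;
}
```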

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/orphan.c | 952 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 952 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c
new file mode 100644
index 0000000..4173fa9
--- /dev/null
+++ b/fs/ubifs/orphan.c
@@ -0,0 +1,952 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Author: Adrian Hunter
+ */
+
+#include "ubifs.h"
+
+/*
+ * An orphan is an inode number whose inode node has been committed to the index
+ * with a link count of zero. That happens when an open file is deleted
+ * (unlinked) and then a commit is run. In the normal course of events the inode
+ * would be deleted when the file is closed. However in the case of an unclean
+ * unmount, orphans need to be accounted for. After an unclean unmount, the
+ * orphans' inodes must be...

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

Because of compression and wasted space (due to padding), it is not
always possible to know whether the cached data will fit into the flash
space. This is sometimes called the "ENOSPC" problem. UBIFS
implements the budgeting sub-system to solve the issue. All the FS
operations have to acquire the budget. The budgeting subsystem does
pessimistic space calculations (e.g., assumes the data is not
compressible) and forces write-back or garbage-collection if needed.
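A pessimistic budget calculation of this kind might look like the following user-space sketch. It is not the code from budget.c: the `demo_*` names, the 48-byte node overhead and the 8-byte alignment are assumptions chosen only to make the arithmetic concrete.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_NODE_HDR 48	/* hypothetical per-node overhead, in bytes */
#define DEMO_ALIGN    8		/* hypothetical node alignment, in bytes */

struct demo_budget {
	long long avail;	/* bytes the flash can still accept */
	long long reserved;	/* bytes already promised to dirty data */
};

/* Worst case: data does not compress at all, plus header, plus padding. */
static long long demo_worst_case(size_t data_len)
{
	long long len = (long long)data_len + DEMO_NODE_HDR;

	return (len + DEMO_ALIGN - 1) / DEMO_ALIGN * DEMO_ALIGN;
}

/* Must succeed before anything is made dirty. */
static int demo_budget_acquire(struct demo_budget *b, size_t data_len)
{
	long long need = demo_worst_case(data_len);

	if (b->reserved + need > b->avail)
		return -ENOSPC;	/* a real FS would first try write-back or GC */
	b->reserved += need;
	return 0;
}

/* Called once the data has been written back (or the operation is cancelled). */
static void demo_budget_release(struct demo_budget *b, size_t data_len)
{
	b->reserved -= demo_worst_case(data_len);
}
```

For a 10-byte write the sketch reserves 64 bytes (10 + 48, rounded up to a multiple of 8), which shows why a pessimistic budget can refuse an operation even when the data would in fact have fitted after compression.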

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/budget.c | 822 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 822 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/budget.c b/fs/ubifs/budget.c
new file mode 100644
index 0000000..e975796
--- /dev/null
+++ b/fs/ubifs/budget.c
@@ -0,0 +1,822 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements the budgeting unit which is responsible for UBIFS space
+ * management.
+ *
+ * Factors such as compression, wasted space at the ends of LEBs, space in other
+ * journal heads, the effect of updates on the index, and so on, make it
+...

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

This patch adds the implementation of most of the VFS callbacks, like
->readdir(), ->write_begin(), and so on. In most cases, it just
does budgeting and calls the corresponding journal function, because
all new data goes first to the journal.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/dir.c | 989 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ubifs/file.c | 790 +++++++++++++++++++++++++++++++++++++++++++
fs/ubifs/ioctl.c | 205 +++++++++++
3 files changed, 1984 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
new file mode 100644
index 0000000..672652a
--- /dev/null
+++ b/fs/ubifs/dir.c
@@ -0,0 +1,989 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ * Copyright (C) 2006, 2007 University of Szeged, Hungary
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ * Zoltan Sogor
+ */
+
+/*
+ * This file implements directory operations.
+ *
+ * All FS operations in this file allocate budget before writing anything to the
+ * media. If they fail to allocate it, the error is returned. The only
+ * exceptions are 'ubifs_unlink()' and 'ubifs_rmdir()' which keep working even
+ * if they unable...

To: Artem Bityutskiy <Artem.Bityutskiy@...>
Cc: LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Tuesday, April 1, 2008 - 8:08 am

Hi Artem,

On Thu, Mar 27, 2008 at 5:55 PM, Artem Bityutskiy

So you don't expect the VM to ever call these functions? Why?

Pekka
--

To: Pekka Enberg <penberg@...>
Cc: Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Tuesday, April 1, 2008 - 8:42 am

In UBIFS pages must never become dirty without asking UBIFS if this is allowed or not.
This is needed because of budgeting - before making any page dirty we have to make
sure there is enough space to write it back. This is not usually an issue for
traditional FSes, because pages may be changed in-place. In UBIFS we cannot change
stuff in-place, so we have to be very careful when marking things dirty. This is why
we have the budgeting sub-system. The white-paper tells more about this.

Anyway, the requirement is that all the places where a page may become dirty
should be in UBIFS and the corresponding operations have to be budgeted for.

If this function is called, this means that someone made a page dirty without
budgeting for this. Which in turn may mean that there will be no space when
the page is written back. So basically, this implementation is just a guarding

Yeah, this is also a guarding thing. When a dirty page is released
the budget which was allocated for it has to be freed. If this function
is called, then the budget was not freed, which must never happen.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: Artem Bityutskiy <dedekind@...>
Cc: Artem Bityutskiy <Artem.Bityutskiy@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Tuesday, April 1, 2008 - 9:12 am

Hi Artem,

So what guarantees that no one calls invalidate_complete_page() or
fallback_migrate_page(), for example?
--

To: Pekka Enberg <penberg@...>
Cc: Artem Bityutskiy <dedekind@...>, LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Tuesday, April 1, 2008 - 10:04 am

At first glance it looks like it might be called, but for clean pages,
which is not a problem. However, there is this assert which may give a
false alarm. I need to look closer at this. Thanks for the note.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: <Artem.Bityutskiy@...>
Cc: Pekka Enberg <penberg@...>, Artem Bityutskiy <dedekind@...>, LKML <linux-kernel@...>
Date: Tuesday, April 1, 2008 - 11:14 am

Dirty pages are not released.

In UBIFS, clean pages do not have PagePrivate(page) set and so releasepage() is not called.
--

To: Artem Bityutskiy <Artem.Bityutskiy@...>
Cc: LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Thursday, March 27, 2008 - 9:36 am

Artem Bityutskiy <Artem.Bityutskiy@nokia.com> writes:

Any specific reason you didn't implement sub second time stamp support?
There is really no good excuse to not do that on a new file system.

-Andi
--

To: Andi Kleen <andi@...>
Cc: LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Thursday, March 27, 2008 - 9:42 am

No reason, just thought this should be enough. Will be fixed, thank you.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

This is one of the most important parts of UBIFS. Since all updates
are out-of-place, we need to do garbage collection from time to time,
which is implemented in this file. The UBIFS GC does not do much -
it just moves clean data to the journal and erases the cleaned-up
eraseblock. The main trick is done in TNC commit which guarantees
that the commit operation is always possible, even if there is no
clean space, in which case it may use in-place updates provided
by UBI.
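The basic GC step described above can be sketched in user-space C as follows. This is a simplified model, not the code from gc.c: the `demo_*` names are hypothetical, LEBs are reduced to two byte counters, and the "dirtiest LEB first" policy is an assumption made for the illustration.

```c
#include <stddef.h>

/* Hypothetical per-LEB accounting: bytes written and bytes now obsolete. */
struct demo_leb {
	int used;	/* bytes written to this LEB, including obsolete ones */
	int dirty;	/* bytes that are obsolete and can be reclaimed */
};

/* Pick the LEB with the most dirty space, or -1 if nothing is dirty. */
static int demo_gc_pick(const struct demo_leb *lebs, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++)
		if (lebs[i].dirty > 0 &&
		    (best < 0 || lebs[i].dirty > lebs[best].dirty))
			best = (int)i;
	return best;
}

/*
 * One GC pass: move the still-valid data to the journal, then erase the LEB.
 * Returns the number of the reclaimed LEB, or -1 if there was nothing to do.
 */
static int demo_gc_one(struct demo_leb *lebs, size_t n, int *journal_bytes)
{
	int i = demo_gc_pick(lebs, n);

	if (i < 0)
		return -1;
	*journal_bytes += lebs[i].used - lebs[i].dirty;	/* move valid data */
	lebs[i].used = 0;				/* "erase" the LEB */
	lebs[i].dirty = 0;
	return i;
}
```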

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/gc.c | 773 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 773 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/gc.c b/fs/ubifs/gc.c
new file mode 100644
index 0000000..7b43655
--- /dev/null
+++ b/fs/ubifs/gc.c
@@ -0,0 +1,773 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements garbage collection. The procedure for garbage collection
+ * is different depending on whether a LEB is an index LEB (contains index
+ * nodes) or not. For non-index LEBs, garbage collection finds a LEB which
+ * contains a lot of dirty spac...

To: Artem Bityutskiy <Artem.Bityutskiy@...>
Cc: LKML <linux-kernel@...>, Adrian Hunter <ext-adrian.hunter@...>
Date: Monday, March 31, 2008 - 10:11 pm

This comment sounds a little bit scary, but that may only be because I don't
understand the worst-case scenario.

Why can't you guarantee that there is always enough space to successfully
run GC, e.g. by reserving some space that can never be used by file data?

More importantly, if you get into the situation that the GC doesn't make
forward progress any more, can you guarantee that it is always possible
for the user to delete files in order to make space again? Or can you
get an -ENOSPC on unlink in that case?

Arnd <><
--

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

Let file systems write back their pages and inodes when needed. This
is needed for UBIFS budgeting sub-system because it has to force
write-back from time to time.

Note, it cannot be called if one of the dirty pages is locked by
the caller, otherwise it'll deadlock.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
fs/fs-writeback.c | 8 ++++++++
include/linux/writeback.h | 1 +
2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index c007607..062aa4a 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -573,6 +573,14 @@ void sync_inodes_sb(struct super_block *sb, int wait)
spin_unlock(&inode_lock);
}

+void writeback_inodes_sb(struct super_block *sb, struct writeback_control *wbc)
+{
+ spin_lock(&inode_lock);
+ sync_sb_inodes(sb, wbc);
+ spin_unlock(&inode_lock);
+}
+EXPORT_SYMBOL_GPL(writeback_inodes_sb);
+
/*
* Rather lame livelock avoidance.
*/
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index b7b3362..0083a0a 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -71,6 +71,7 @@ struct writeback_control {
void writeback_inodes(struct writeback_control *wbc);
int inode_wait(void *);
void sync_inodes_sb(struct super_block *, int wait);
+void writeback_inodes_sb(struct super_block *sb, struct writeback_control *wbc);
void sync_inodes(int wait);

/* writeback.h requires fs.h; it, too, is not included from here. */
--
1.5.4.1

--

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

This is the commit-related part of the lprops sub-system.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/lpt_commit.c | 1628 +++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 1628 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/lpt_commit.c b/fs/ubifs/lpt_commit.c
new file mode 100644
index 0000000..2aa9712
--- /dev/null
+++ b/fs/ubifs/lpt_commit.c
@@ -0,0 +1,1628 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements commit-related functionality of the LEB properties
+ * subsystem.
+ */
+
+#include <linux/crc16.h>
+#include "ubifs.h"
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_LPROPS
+static int dbg_check_ltab(struct ubifs_info *c);
+#else
+#define dbg_check_ltab(c) 0
+#endif
+
+/**
+ * first_dirty_cnode - find first dirty cnode.
+ * @c: UBIFS file-system description object
+ * @nnode: nnode at which to start
+ *
+ * This function returns the first dirty cnode or %NULL if there is not one.
+ */
+static struct ubifs_cnode *first_dirty_cnode(struct ubifs_nnode *nnode)
+{
+ ubifs_assert(nnode);
+ while (1) {
+ int i, cont = 0;...

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

UBIFS keeps track of all logical eraseblocks - how much data they
contain, and how much of that data is dirty or clean. This space accounting
information is needed all over the place - when finding an empty eraseblock
to put new data in, when reporting the amount of empty space, and so on.
We call this subsystem "lprops", which stands for LEB properties.
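A rough user-space sketch of this kind of per-LEB accounting is shown below. The `demo_*` names and the 4096-byte LEB size are hypothetical, and the linear scans are a simplification made for the example; they are not how lprops.c finds LEBs.

```c
#define DEMO_LEB_SIZE 4096	/* hypothetical LEB size, in bytes */

/* Hypothetical LEB properties: free and dirty byte counts for one LEB. */
struct demo_lprops {
	int free;	/* bytes never written since the last erase */
	int dirty;	/* bytes written but now obsolete */
};

/* Clean (still valid) data is whatever is neither free nor dirty. */
static int demo_clean_bytes(const struct demo_lprops *lp)
{
	return DEMO_LEB_SIZE - lp->free - lp->dirty;
}

/* Find a completely empty LEB, e.g. for new data; -1 if there is none. */
static int demo_find_empty(const struct demo_lprops *lp, int n)
{
	for (int i = 0; i < n; i++)
		if (lp[i].free == DEMO_LEB_SIZE)
			return i;
	return -1;
}

/* Sum of free space over all LEBs, e.g. for reporting free space. */
static long demo_total_free(const struct demo_lprops *lp, int n)
{
	long total = 0;

	for (int i = 0; i < n; i++)
		total += lp[i].free;
	return total;
}
```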

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/lprops.c | 1341 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 1341 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/lprops.c b/fs/ubifs/lprops.c
new file mode 100644
index 0000000..56f43f7
--- /dev/null
+++ b/fs/ubifs/lprops.c
@@ -0,0 +1,1341 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements the functions that access LEB properties and their
+ * categories. LEBs are categorized based on the needs of UBIFS, and the
+ * categories are stored as either heaps or lists to provide a fast way of
+ * finding a LEB in a particular category. For example, UBIFS may need to find
+ * an empty LEB for the journal, or a very dirty LEB for garbage coll...

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

The TNC cache grows with time, because UBIFS caches the indexing nodes
when the indexing B-tree is looked up. If the file-system is
large enough, the TNC may consume a lot of memory, in which case UBIFS
prunes it. Namely, it registers a memory shrinker for this purpose.
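The pruning rule can be illustrated with a small user-space sketch. It assumes a flat array of cached nodes instead of a tree, and the `demo_*` names are hypothetical; it only shows that clean nodes may be dropped under memory pressure while dirty ones must stay until they are committed.

```c
/* Hypothetical cached index node: only the flags that matter here. */
struct demo_znode {
	int cached;	/* 1 if the node currently occupies RAM */
	int dirty;	/* 1 if it was changed and not yet written to flash */
};

/*
 * Free up to nr_to_free clean cached nodes and return how many were freed.
 * Dirty nodes are skipped because their contents exist only in RAM.
 */
static int demo_shrink(struct demo_znode *z, int n, int nr_to_free)
{
	int freed = 0;

	for (int i = 0; i < n && freed < nr_to_free; i++) {
		if (z[i].cached && !z[i].dirty) {
			z[i].cached = 0;	/* can be re-read from flash later */
			freed++;
		}
	}
	return freed;
}
```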

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/shrinker.c | 410 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 410 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/shrinker.c b/fs/ubifs/shrinker.c
new file mode 100644
index 0000000..a0ea4b7
--- /dev/null
+++ b/fs/ubifs/shrinker.c
@@ -0,0 +1,410 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file implements UBIFS shrinker which evicts clean znodes from the TNC
+ * tree when Linux VM needs more RAM.
+ *
+ * We do not implement any LRU lists to find oldest znodes to free because it
+ * would add additional overhead to the file system fast paths. So the shrinker
+ * just walks the TNC tree when searching for znodes to free.
+ *
+ * If the root of a TNC sub-tree is clean and old enough, then the children are
+ * also clean and...

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

We commit the TNC from time to time, which means we update the on-flash
indexing tree. The TNC commit basically implements journal commit.
The UBIFS implementation allows writing while the commit is in progress.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/tnc_commit.c | 1088 +++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 1088 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/tnc_commit.c b/fs/ubifs/tnc_commit.c
new file mode 100644
index 0000000..bc0ce2c
--- /dev/null
+++ b/fs/ubifs/tnc_commit.c
@@ -0,0 +1,1088 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/* This file implements TNC functions for committing */
+
+#include "ubifs.h"
+
+/**
+ * make_idx_node - make an index node for fill-the-gaps method of TNC commit.
+ * @c: UBIFS file-system description object
+ * @idx: buffer in which to place new index node
+ * @znode: znode from which to make new index node
+ * @lnum: LEB number where new index node will be written
+ * @offs: offset where new index node will be written
+ * @len: length of new index node
+ */
+static int make_idx_node(struct ubi...

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

TNC - tree node cache - is the central UBIFS entity. It is basically
an in-RAM cache of the on-flash indexing B-tree. But the TNC also indexes
the journal, so the two are not always equivalent.
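The relationship between the cache and the on-flash index can be sketched as follows. This is a toy model with a fixed key space and an array in place of a B-tree; all `demo_*` names are hypothetical and nothing here reflects the actual tnc.c data structures.

```c
#define DEMO_KEYS 8

/* Simulated on-flash index: key -> flash offset of the data node. */
static const int demo_flash_index[DEMO_KEYS] = {
	0, 64, 128, 192, 256, 320, 384, 448
};

struct demo_tnc {
	int cached[DEMO_KEYS];	/* 1 if the entry for this key is in RAM */
	int offs[DEMO_KEYS];	/* cached flash offset for this key */
	int flash_reads;	/* how often the on-flash index was consulted */
};

/* Look a key up, reading the on-flash index only on a cache miss. */
static int demo_tnc_lookup(struct demo_tnc *t, int key)
{
	if (key < 0 || key >= DEMO_KEYS)
		return -1;
	if (!t->cached[key]) {
		t->offs[key] = demo_flash_index[key];
		t->cached[key] = 1;
		t->flash_reads++;
	}
	return t->offs[key];
}

/*
 * New data written to the journal changes only the cached entry; in this
 * model the on-flash index stays as it was until a commit writes it out.
 */
static void demo_tnc_update(struct demo_tnc *t, int key, int new_offs)
{
	t->offs[key] = new_offs;
	t->cached[key] = 1;
}
```

After `demo_tnc_update()` the cache and `demo_flash_index` disagree for that key, which is the sense in which the two are not always equivalent.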

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/tnc.c | 3483 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 3483 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/tnc.c b/fs/ubifs/tnc.c
new file mode 100644
index 0000000..27e2b60
--- /dev/null
+++ b/fs/ubifs/tnc.c
@@ -0,0 +1,3483 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements TNC (Tree Node Cache) which caches indexing nodes of
+ * the UBIFS B-tree.
+ *
+ * At the moment the locking rules of the TNC tree are quite simple and
+ * straightforward. We just have a mutex and lock it when we traverse the
+ * tree. If a znode is not in memory, we read it from flash while still having
+ * the mutex locked.
+ */
+
+#include <linux/crc32.h>
+#include "ubifs.h"
+
+/**
+ * insert_old_idx - record an index node obsoleted since the last commit start.
+ * @c: UBIFS file-system description obje...

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

This is the UBIFS journal commit implementation. The journal commit does not
mean the data is physically moved anywhere - we just update the indexing
information and find new eraseblocks for the journal.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/commit.c | 708 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 708 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/commit.c b/fs/ubifs/commit.c
new file mode 100644
index 0000000..0b199d4
--- /dev/null
+++ b/fs/ubifs/commit.c
@@ -0,0 +1,708 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements functions that manage the running of the commit process.
+ * Each affected module has its own functions to accomplish their part in the
+ * commit and those functions are called here.
+ *
+ * The commit is the process whereby all updates to the index and LEB properties
+ * are written out together and the journal becomes empty. This keeps the
+ * file system consistent - at all times the state can be recreated by reading
+ * the index and LEB properties and then replaying the journal.
+ *
+ * ...
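
The consistency argument in the commit.c comment above (state = committed index plus journal replay, and a commit empties the journal) can be illustrated with a toy model. Everything here is an assumption made for illustration: `struct fs_model`, `fs_write()`, `fs_recover()` and `fs_commit()` are hypothetical names, the "index" is a plain array, and LEB properties are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model; not the data structures used by UBIFS. */
#define NKEYS       8
#define JOURNAL_MAX 16

struct update {
	int key;
	int val;
};

struct fs_model {
	int index[NKEYS];                    /* committed state: key -> value */
	struct update journal[JOURNAL_MAX];  /* updates since the last commit */
	size_t jlen;
};

/* New data goes to the journal only; the committed index is untouched. */
int fs_write(struct fs_model *fs, int key, int val)
{
	assert(key >= 0 && key < NKEYS);
	if (fs->jlen == JOURNAL_MAX)
		return -1;                   /* journal full, commit needed */
	fs->journal[fs->jlen].key = key;
	fs->journal[fs->jlen].val = val;
	fs->jlen++;
	return 0;
}

/* Recreate the current state: read the index, then replay the journal. */
void fs_recover(const struct fs_model *fs, int out[NKEYS])
{
	size_t i;

	memcpy(out, fs->index, sizeof(fs->index));
	for (i = 0; i < fs->jlen; i++)
		out[fs->journal[i].key] = fs->journal[i].val;
}

/* Commit: fold all journalled updates into the index, empty the journal. */
void fs_commit(struct fs_model *fs)
{
	size_t i;

	for (i = 0; i < fs->jlen; i++)
		fs->index[fs->journal[i].key] = fs->journal[i].val;
	fs->jlen = 0;
}
```

In this model `fs_recover()` returns the same state immediately before and immediately after `fs_commit()`, which is the invariant the comment describes.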

To: LKML <linux-kernel@...>
Cc: Adrian Hunter <ext-adrian.hunter@...>, Artem Bityutskiy <Artem.Bityutskiy@...>
Date: Thursday, March 27, 2008 - 10:55 am

All new data first goes to the journal and sits there until it
gets committed. The journal contents do not have corresponding
on-flash indexing information, so the journal is like a small JFFS2
file-system. Once the journal is committed, the indexing information
is written to the flash media.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
fs/ubifs/journal.c | 1230 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ubifs/log.c | 769 ++++++++++++++++++++++++++++++++
2 files changed, 1999 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
new file mode 100644
index 0000000..e7c7aac
--- /dev/null
+++ b/fs/ubifs/journal.c
@@ -0,0 +1,1230 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file implements UBIFS journal.
+ *
+ * The journal consists of 2 parts - the log and bud LEBs. The log has fixed
+ * length and position, while a bud logical eraseblock is any LEB in the main
+ * area. Buds contain file system data - data nodes, inode nodes, etc. The log
+ * contains only references to buds and some other stuff like commit
+ * start n...
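
The split described in the journal.c comment above (a log of fixed size that only references buds, and buds that may be any LEB in the main area) might be modelled roughly as follows. The constants, `struct bud_ref`, `log_add_bud()` and `is_bud()` are hypothetical and chosen only for illustration; the real on-flash layout and the other log contents mentioned in the truncated comment are not represented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout numbers; real values come from the file-system geometry. */
#define MAIN_FIRST_LEB 10   /* first LEB of the main area */
#define LEB_COUNT      64   /* total number of LEBs */
#define LOG_MAX_REFS   8    /* models the fixed length of the log */

struct bud_ref {            /* a log entry pointing at a bud LEB */
	int lnum;           /* LEB number of the bud */
	int offs;           /* offset in the bud where journal data starts */
};

struct journal_log {
	struct bud_ref refs[LOG_MAX_REFS];
	size_t nrefs;
};

/* Record a new bud in the log. Returns 0 on success, -1 on failure. */
int log_add_bud(struct journal_log *log, int lnum, int offs)
{
	assert(offs >= 0);
	if (lnum < MAIN_FIRST_LEB || lnum >= LEB_COUNT)
		return -1;  /* a bud must be a LEB in the main area */
	if (log->nrefs == LOG_MAX_REFS)
		return -1;  /* the fixed-size log is full */
	log->refs[log->nrefs].lnum = lnum;
	log->refs[log->nrefs].offs = offs;
	log->nrefs++;
	return 0;
}

/* Returns 1 if the log references this LEB as a bud, 0 otherwise. */
int is_bud(const struct journal_log *log, int lnum)
{
	size_t i;

	for (i = 0; i < log->nrefs; i++)
		if (log->refs[i].lnum == lnum)
			return 1;
	return 0;
}
```

Because the log sits at a known position and only holds references, a mount needs to read just the log to find which main-area LEBs hold journal data.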
