Faithful readers of the Tux3 list, please make sure to block out some
time over the next couple of weeks to study up on Linux kernel
porting. I know some of you are highly skilled C programmers, but
probably few if any have really studied Linux kernel internals or gone
delving into the mysteries of the various kernel subsystems that will
be involved in the port. It is not very hard. The biggest difficulty
is finding out about the many unwritten or unadvertised facts about the
kernel that experienced kernel hacks tend to learn from lore as much as
anything.

Fortunately there are some excellent resources available:

Anything written by Jon Corbet, especially "Linux Device Drivers"
(LDD) and the LWN kernel API documentation series.

"Understanding the Linux Kernel", Bovet and Cesati, for a general
introduction and overview with some useful insights.

The most important resource: http://lxr.linux.no

Nearly everything you need to know about porting Tux3 to the kernel is
here:

http://lxr.linux.no/linux+v2.6.26.3/fs/ext2/

and here:

http://lxr.linux.no/linux+v2.6.26.3/fs/ext3/

Besides that, learning about dentries and bios is essential. Ext2 is
useful source material because it implements all the essential features
of a full Posix filesystem with additional Linux-specific functionality.
Ext3 is useful because, relying on buffers as its main currency of
interaction with block devices as it does, it aligns better with the
Tux3 style than Ext2 does, which has been somewhat unsuccessfully
hacked to bypass buffers and work directly with pages. Once you are
aware of this it is easy to spot the differences and compare styles.
Suffice it to say that the page model of filesystem interface only works
well for the simplest of filesystems, such as Ext2.

Most of the additional complexity of Ext3 is due to journalling, which
is surprisingly complex in its details, particularly in dealing with
the various problems that arise due to the li...
Hi,
These patches fix some bugs and add new handlers: ->symlink(),
->truncate(), ->delete_inode(), ->unlink(), and ->link().

There are some known bugs, e.g. purge_inum() doesn't truncate the
bnode/ileaf itself, and we don't free blocks on delete/truncate
(free_block() still needs to be implemented). These have to be fixed
sooner or later.

http://userweb.kernel.org/~hirofumi/tux3/

Please review it.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
_______________________________________________
Tux3 mailing list
Tux3@tux3.org
http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3
It all looks beautiful, and as usual, it builds cleanly, passes all the
tests, boots up and mounts without problems. I did not test the new
functionality yet. Pushed to public.

One idea on the links count attribute: as a later optimization, how
about we store it only if it is not equal to one? That is the majority
of cases. Saving the 6 bytes of link count attribute for most files
will reduce our average inode size by 12% or so. I previously
suggested we interpret a missing links attribute as zero, but that is a
rare case (orphan and internal inodes). We should just store the
attribute then.

Link count for a directory is always 2 + number of subdirectories.
What about storing one less than the directory link count, so that
the stored link count for a directory with no children is one, which
is most directories? Then we also would not have to store the size
attribute for a directory.

Very small optimizations, I know.
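
A minimal sketch of the two link count rules proposed above. The helper names (links_to_stored, links_from_stored, links_attr_needed) are hypothetical and not taken from the Tux3 source. It assumes that a missing attribute is read back as 1, that a directory stores one less than its real count, and that a zero count (orphans, internal inodes) is always stored explicitly:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value implied when no links attribute is present (assumption of this sketch). */
enum { LINKS_DEFAULT = 1 };

/* Map a VFS link count to the value that would be kept in the attribute. */
uint32_t links_to_stored(uint32_t nlink, bool is_dir)
{
	if (!is_dir || nlink == 0)
		return nlink;		/* files and orphans are stored unchanged */
	assert(nlink >= 2);		/* a live directory has at least 2 links */
	return nlink - 1;		/* directory with no children is stored as 1 */
}

/* Inverse mapping: recover the VFS link count from the stored value. */
uint32_t links_from_stored(uint32_t stored, bool is_dir)
{
	if (!is_dir || stored == 0)
		return stored;
	return stored + 1;
}

/* The attribute is written only when it differs from the implied default. */
bool links_attr_needed(uint32_t nlink, bool is_dir)
{
	return links_to_stored(nlink, is_dir) != LINKS_DEFAULT;
}
```

Under these assumptions a regular file with a single link and a directory with no subdirectories both skip the attribute, while hard-linked files, directories with children, and orphans still store it.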
Daniel
Sounds good. I think we can use that rule for all inode attributes:
store an attribute only if it differs from its initial value.

For internal inodes, we can use 1, or 2 for a directory (just use the
initial value). i_mode, i_uid and i_gid are 0 as initial values, so
internal inodes don't need MODE_OWNER_ATTR either. In the end, I guess
an internal inode will have only DATA_BTREE_ATTR.

The issue may be overhead. I think this optimization means the total
data size stored in the ileaf may change. So, it may become cause

FWIW, just as an FYI, the following is fs/isofs/inode.c:1262:

	if (de->flags[-high_sierra] & 2) {
		inode->i_mode = sbi->s_dmode | S_IFDIR;
		inode->i_nlink = 1;	/*
					 * Set to 1. We know there are 2, but
					 * the find utility tries to optimize
					 * if it is 2, and it screws up. It is
					 * easier to give 1 which tells find to
					 * do it the hard way.
					 */

It seems to use 1 for the find command.

This is about phtree? Well, anyway, we would have to clean up the present
flags, because we are using CTIME_SIZE_BIT (ctime + size).
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
From here on we will be trying to stick somewhat close to the design
factoring typical of Linux filesystems, as exemplified by Ext2 and
Ext3. The immediate effort is to make Tux3's open_inode fill the role
of Ext2's ext2_get_inode.

We see that ext2_get_inode is called from only two places:

http://lxr.linux.no/linux+v2.6.26.3/+ident=14702139

References:
   fs/ext2/inode.c, line 1104 <- definition
   fs/ext2/inode.c, line 1205 <- called from ext2_iget
   fs/ext2/inode.c, line 1316 <- called from ext2_update_inode

In general, *_iget is used to create or retrieve an inode from backing
store and *_update_inode is used to sync an inode to backing store. In
the first case a new, empty vfs struct inode is created immediately at
the beginning of the process and in the second, the struct inode
already exists. This suggests a slightly different factoring than Ext2
uses that better reflects the fact that Tux3's method of finding its
way to an inode on disk is considerably more involved than Ext2's, and
that Ext2 has separate bitmaps dedicated to allocating inodes which
allow it to avoid reading any actual inode table blocks until later
when the new inode has to be synced to disk.

A quick tour of ext2 functions that create inodes:

Look up an inode:
http://lxr.linux.no/linux+v2.6.26.3/fs/ext2/namei.c#L66
http://lxr.linux.no/linux+v2.6.26.3/fs/ext2/namei.c#L73

Get the root directory:
http://lxr.linux.no/linux+v2.6.26.3/fs/ext2/super.c#L1044

Weird NFS hack:
http://lxr.linux.no/linux+v2.6.26.3/fs/ext2/super.c#L330

Create a new inode:
ext2_new_inode(dir, mode);
which uses ialloc, looking only at the inode bitmaps, to create a new inode.

But in Tux3 we dive down into the inode btree itself to allocate an inode,
just as we do when we look it up or update it. We want a somewhat different
factoring to reflect this.

Skeleton code for Ext2:
inode = ext2_<various creates>
inode = ext2_new_inode(dir, mode)
inode = new_in...
Actually, I ultimately factored these into inode_make, inode_open and
inode_save. There turned out to be little code repeated between these
because helper functions do nearly all the work. Attribute encoding
and decoding has evolved into a pleasantly regular form which looks to
be blindingly fast, consisting entirely of inline calls to libc endian
conversion macros that expand to just a handful of machine instructions
on common architectures. Too bad gcc can't handle this common chore on
its own; that would be even better. Sigh. Anyway, unpacking/repacking
inode attributes now approaches the efficiency of Ext2/3, a pleasing
result.

As I mentioned earlier, Ext2/3 have a slight advantage in being able to
compute the disk address of an inode table block directly rather than
traversing a btree to find it. But that can be optimized in Tux3 by
keeping btree "cursors" that cache the result of previous btree probes
to take advantage of spatial locality in lookups, which is the common
case. For a completely random access load, disk seeking will tend to
dominate anyway, and there will always be some amount of btree index
caching going on. Just two levels of btree index gives us access to
about three million inodes, and both index levels will fit easily in
cache. Three levels gives over a billion inodes and then we need to
be concerned mainly about keeping the terminal index nodes relatively
close to their parents, which lets the disk hardware combine seeks
efficiently. Every filesystem is going to have to seek a lot in this
case. The winner will be the one with the best disk layout policy,
and in the case of modern btree based filesystems, the highest btree
branching factor.

Regards,
Daniel
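
To illustrate the attribute packing described in the message above, here is a small sketch of encoding and decoding fixed-width big-endian fields. It uses the POSIX htonl/ntohl calls from <arpa/inet.h> as a stand-in for the endian conversion macros; the helpers Tux3 actually uses are not shown in this thread. The struct owner_attr, its three-field layout, and the function names are hypothetical and exist only for this example:

```c
#include <arpa/inet.h>	/* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of a mode/uid/gid attribute. */
struct owner_attr {
	uint32_t mode, uid, gid;
};

/* Store one 32-bit value big-endian and return the next write position. */
static inline uint8_t *encode32(uint8_t *at, uint32_t val)
{
	uint32_t be = htonl(val);
	memcpy(at, &be, sizeof be);	/* memcpy avoids unaligned stores */
	return at + sizeof be;
}

/* Load one big-endian 32-bit value and return the next read position. */
static inline const uint8_t *decode32(const uint8_t *at, uint32_t *val)
{
	uint32_t be;
	memcpy(&be, at, sizeof be);
	*val = ntohl(be);
	return at + sizeof be;
}

/* Pack the attribute into 12 bytes at `at`; returns the end of the record. */
uint8_t *owner_attr_pack(uint8_t *at, const struct owner_attr *attr)
{
	assert(at && attr);
	at = encode32(at, attr->mode);
	at = encode32(at, attr->uid);
	return encode32(at, attr->gid);
}

/* Unpack 12 bytes from `at`; returns the position just past the record. */
const uint8_t *owner_attr_unpack(const uint8_t *at, struct owner_attr *attr)
{
	assert(at && attr);
	at = decode32(at, &attr->mode);
	at = decode32(at, &attr->uid);
	return decode32(at, &attr->gid);
}
```

Each helper returns the advanced pointer, so a sequence of attributes can be packed or unpacked by chaining calls without separate offset bookkeeping.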
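
The index capacity figures quoted in the message above can be checked with a little arithmetic. The sketch below assumes a fanout of 341 entries per index node (for example, a 4096-byte block holding 12-byte entries) and about 26 inodes per leaf block. Both numbers are assumptions picked because they land close to the quoted figures; they are not taken from the Tux3 source, and the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of inodes reachable through `levels` levels of index nodes,
 * each with `fanout` children, when every leaf holds `per_leaf` inodes.
 * Overflow is not checked; the inputs here are small.
 */
uint64_t btree_capacity(unsigned levels, uint64_t fanout, uint64_t per_leaf)
{
	uint64_t total = per_leaf;

	assert(fanout > 0);
	while (levels--)
		total *= fanout;
	return total;
}
```

With the assumed values, btree_capacity(2, 341, 26) is 3,023,306, roughly three million, and btree_capacity(3, 341, 26) is 1,030,947,346, a little over a billion. A different block size or entry size shifts these totals but not the conclusion that each extra level multiplies capacity by the branching factor.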
