POHMELFS, Full Transaction Support

Submitted by Jeremy
on May 28, 2008 - 2:10pm

"This is a high performance network filesystem with a local coherent cache of data and metadata. Its main goal is distributed parallel processing of data," Evgeniy Polyakov said, announcing the latest version of his Parallel Optimized Host Message Exchange Layered File System. He noted that in addition to numerous bugfixes, the latest release includes the following new features:

"Full transaction support for all operations (object creation/removal, data reading and writing); Data and metadata cache coherency support; Transaction timeout based resending, if [a] given transaction did not receive [a] reply after specified timeout, [the] transaction will be resent (possibly to different server); Switched writepage path to ->sendpage() which improved performance and robustness of the writing."

Evgeniy also noted that he has started working on support for parallel data processing, one of the key intended features of the filesystem. He explained that initial logic has been added so data can be written to multiple servers at the same time, and reads can be balanced across the multiple servers, though the logic is not yet being used by the filesystem.
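The announcement does not say which policy the imported code uses to balance reads. The following is a minimal sketch of one plausible policy, assuming a hypothetical per-server counter of outstanding requests. Reads go to the least-loaded server, and writes are sent to every server. The `struct srv` type and both function names are illustrative and are not taken from POHMELFS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-server state; POHMELFS's real structures differ. */
struct srv {
	const char *addr;      /* server address (illustrative only) */
	unsigned int pending;  /* requests sent but not yet answered */
};

/* Reads: pick the server with the fewest outstanding requests
 * (the first such server wins on a tie) and charge it one more. */
static struct srv *pick_read_server(struct srv *srvs, size_t n)
{
	struct srv *best = &srvs[0];
	size_t i;

	assert(n > 0);
	for (i = 1; i < n; i++)
		if (srvs[i].pending < best->pending)
			best = &srvs[i];
	best->pending++;
	return best;
}

/* Writes: every server receives a copy, so every queue grows by one. */
static void write_to_all(struct srv *srvs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		srvs[i].pending++;
}
```

Least-outstanding-requests is only one option. Round-robin or latency-weighted selection would fit the same interface.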


From: Evgeniy Polyakov
Subject: POHMELFS high performance network filesystem. Cache coherency, transactions, parallels.
Date: Sunday, May 25, 2008 - 9:40 am

Hi.

I'm pleased to announce POHMELFS, a high performance network filesystem.
POHMELFS stands for Parallel Optimized Host Message Exchange Layered File System.

Development status can be tracked in the filesystem section of the development blog [1].

This is a high performance network filesystem with a local coherent cache of
data and metadata. Its main goal is distributed parallel processing of data.

This release brings the following features:
 * Full transaction support for all operations (object creation/removal,
	data reading and writing). Data reading transactions are not yet
	optimal (although they are already fast) and will be improved in the
	next release.
 * Data and metadata cache coherency support. Details on how this is
	implemented can be found in the cache-coherency notes [5].
 * Transaction timeout based resending. If a given transaction does not
	receive a reply within the specified timeout, it is resent
	(possibly to a different server).
 * The writepage path was switched to ->sendpage(), which improved the
	performance and robustness of writing.
 * Preliminary support for parallel data processing. Code to write data
	to multiple servers in parallel and balance reading between them was
	imported, but is not used right now.
 * A fair number of bugfixes.
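The timeout-based resending described above can be pictured as a periodic scan of a table of in-flight transactions. The following is a minimal sketch, assuming a hypothetical `struct trans` record with timestamps in whole seconds from some monotonic clock. Any unacknowledged transaction whose reply is overdue is moved to the next server and has its timer restarted. All field and function names are illustrative and do not come from the POHMELFS sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transaction record; field names are illustrative only. */
struct trans {
	unsigned long gen;   /* transaction id */
	long sent_at;        /* seconds (monotonic clock) when last sent */
	size_t server;       /* index of the server it was sent to */
	int acked;           /* nonzero once a reply has arrived */
};

/*
 * Resend every unacknowledged transaction whose reply is overdue,
 * switching it to the next server in the list (wrapping around).
 * Returns the number of transactions that were resent.
 */
static size_t resend_expired(struct trans *t, size_t n, long now,
			     long timeout, size_t nservers)
{
	size_t i, resent = 0;

	assert(nservers > 0);
	for (i = 0; i < n; i++) {
		if (t[i].acked || now - t[i].sent_at < timeout)
			continue;
		t[i].server = (t[i].server + 1) % nservers; /* try another server */
		t[i].sent_at = now;                         /* restart the timer */
		/* a real client would queue the message for sending here */
		resent++;
	}
	return resent;
}
```

Moving to the next server on every timeout is an assumption. A client could also retry the same server once before switching.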

Basic POHMELFS features:
 * Local coherent (notes [5]) cache for data and metadata.
 * Completely async processing of all events (hard links and symlinks are
	the only exceptions), including object creation and data reading/writing.
 * Flexible object architecture optimized for network processing. Long
	paths to an object can be created, and arbitrarily large directories
	removed, in a single network command.
 * High performance is one of the main design goals.
 * Very fast and scalable multithreaded userspace server. Because it runs
	in userspace, it works with any underlying filesystem, and it is still
	much faster than the async in-kernel NFS server (see benchmarks [4]).
 * The client can switch between different servers (if one goes down,
	the client automatically reconnects to the next one, and so on).
 * Transaction support. Full failover for all operations: transactions
	are resent to different servers on timeout or error.

Roadmap includes:
 * Server redundancy extensions (ability to store data in multiple locations
	according to regexp rules, like '*.txt' in /root1 and '*.jpg' in /root1
	and /root2).
 * Strong authentication and possibly data encryption in the network
	channel.
 * Async writing of data from the receiving kernel thread into userspace
	pages via copy_to_user() (check the development tracking blog for results).
 * Client parallel extensions: ability to write to multiple servers and
	balance reading between them. The code has been imported into the
	current version but is not enabled yet.
 * Client dynamic server reconfiguration: ability to add/remove servers
	from the working set via a server command and from userspace.
 * Start of generic server distribution development.
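The redundancy rules in the roadmap are described as regexp rules, but the examples ('*.txt', '*.jpg') are shell-style globs. The following is a minimal sketch that therefore swaps in POSIX fnmatch(3) for matching. It maps a file name to the list of storage roots, using the two example rules from the roadmap. The `struct placement_rule` type and `placement_for()` are hypothetical illustrations of the planned feature, not its actual design.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical rule: files whose names match 'pattern' go to 'roots'. */
struct placement_rule {
	const char *pattern;
	const char *roots[4];   /* NULL-terminated list of storage roots */
};

/* The two example rules from the roadmap. */
static const struct placement_rule rules[] = {
	{ "*.txt", { "/root1", NULL } },
	{ "*.jpg", { "/root1", "/root2", NULL } },
};

/* Return the roots of the first matching rule, or NULL if none match. */
static const char *const *placement_for(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
		if (fnmatch(rules[i].pattern, name, 0) == 0)
			return rules[i].roots;
	return NULL;
}
```

With first-match semantics, rule order matters. A real implementation would also need a default location for names that match no rule.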

Sources can be downloaded from the archive or the git tree [2]; see also the homepage [3].

The near-term roadmap (the next release is scheduled for the start of the month) includes:
 * Improved reading transactions.
 * Server redundancy extensions (ability to store data in multiple
	locations according to regexp rules, like '*.txt' in /root1 and '*.jpg'
	in /root1 and /root2).
 * Client parallel extensions: ability to write to multiple servers and
	balance reading between them. The code has been imported into the
	current version but is not enabled yet.
 * Client dynamic server reconfiguration: ability to add/remove servers
	from the working set via a server command and from userspace.

Thank you.

1. POHMELFS development status.
http://tservice.net.ru/~s0mbre/blog/devel/fs/index.html

2. Source archive.
http://tservice.net.ru/~s0mbre/archive/pohmelfs/
Git tree.
http://tservice.net.ru/~s0mbre/archive/pohmelfs/pohmelfs.git/

3. POHMELFS homepage.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=pohmelfs

4. POHMELFS vs NFS benchmark [iozone results are coming].
http://tservice.net.ru/~s0mbre/blog/devel/fs/2008_04_18.html
http://tservice.net.ru/~s0mbre/blog/devel/fs/2008_04_14.html
http://tservice.net.ru/~s0mbre/blog/devel/fs/2008_05_12.html

5. Cache-coherency notes.
http://tservice.net.ru/~s0mbre/blog/devel/fs/2008_05_17.html

Signed-off-by: Evgeniy Polyakov 

 fs/Kconfig               |    2 +
 fs/Makefile              |    1 +
 fs/pohmelfs/Kconfig      |   25 +
 fs/pohmelfs/Makefile     |    3 +
 fs/pohmelfs/config.c     |  148 ++++
 fs/pohmelfs/dir.c        |  961 ++++++++++++++++++++++++
 fs/pohmelfs/inode.c      | 1819 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/pohmelfs/net.c        |  978 +++++++++++++++++++++++++
 fs/pohmelfs/netfs.h      |  496 +++++++++++++
 fs/pohmelfs/path_entry.c |  296 ++++++++
 fs/pohmelfs/trans.c      |  609 ++++++++++++++++
 11 files changed, 5338 insertions(+), 0 deletions(-)
