Evgeniy Polyakov announced the latest release of his Parallel Optimized Host Message Exchange Layered File System, POHMELFS. He noted that the big new feature in this release is strong crypto support, "one can specify [an] encryption method (like cbc(aes)), [a] hash or digest, or all of them to be performed on [the] whole data channel (except headers)." In his blog, Evgeniy adds, "Cryptography support is [an] essential addition to the POHMELFS core. It was implemented with performance in mind, so that processing speeds would not drop noticeably even [during] very CPU-hungry operations." He explained, "POHMELFS utilizes [a configurable number of] pools of crypto threads, which perform data crypto processing and submit it either to [the] network or VFS layer." He included results from some performance benchmarks.
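To make the "cbc(aes)" naming concrete: algorithm strings of this form are how transforms are requested from the Linux kernel crypto API. The following is a minimal, hypothetical sketch of encrypting one page the way a crypto worker thread might, using the synchronous blkcipher interface of kernels from that era; the function name, parameters, and error handling are illustrative, not taken from the POHMELFS source.

```c
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>

/*
 * Hypothetical sketch: encrypt one page in place with "cbc(aes)".
 * The caller must pad len to the cipher block size (16 bytes for AES).
 */
static int example_encrypt_page(struct page *page, unsigned int len,
				const u8 *key, unsigned int keylen,
				const u8 *iv)
{
	struct crypto_blkcipher *tfm;
	struct blkcipher_desc desc;
	struct scatterlist sg;
	int err;

	/* Request a synchronous cbc(aes) transform from the crypto core. */
	tfm = crypto_alloc_blkcipher("cbc(aes)", 0, CRYPTO_ALG_ASYNC);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_blkcipher_setkey(tfm, key, keylen);
	if (err)
		goto out;

	crypto_blkcipher_set_iv(tfm, iv, crypto_blkcipher_ivsize(tfm));

	desc.tfm = tfm;
	desc.flags = 0;

	sg_init_table(&sg, 1);
	sg_set_page(&sg, page, len, 0);

	/* Encrypt in place: source and destination scatterlists are equal. */
	err = crypto_blkcipher_encrypt(&desc, &sg, &sg, len);
out:
	crypto_free_blkcipher(tfm);
	return err;
}
```

A real implementation would allocate the transform once per crypto thread rather than per page; it is done inline here only to keep the sketch self-contained.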
Evgeniy describes POHMELFS as "a high performance network filesystem with [a] locally coherent cache of data and metadata. Its main goal is distributed parallel processing of data. [The filesystem] supports [a] strong transaction model with failover recovery, allows encryption/hashing [of the entire] data channel, and performs read load balancing and write[s] to multiple servers in parallel." When asked on his blog when he plans to push the new filesystem for mainline kernel inclusion, Evgeniy noted, "I do not know, maybe it's time to push it upstream, but I do not want to bother with Linux kernel politics. We will see soon."
"I regularly run and post various benchmarks comparing POHMELFS, NFS, XFS and Ext4, [the] main goal of POHMELFS at this stage is to be essentially as fast as [the] underlying local filesystem. And it is..." explained Evgeniy Polyakov, suggesting that the POHMELFS networking filesystem performs 10% to 300% faster than NFS, depending on the file operation. In particular, he noted that it still suffers from random reads, an area that he's currently focused on fixing. He summarized the new features found in the latest release:
"Read request (data read, directory listing, lookup requests) balancing between multiple servers; write requests are sent to multiple servers and completed only when all of them send an ack; [the] ability to add and/or remove servers from [the] working set at run-time from userspace; documentation (overall view and protocol commands); rename command; several new mount options to control client behaviour instead of hard coded numbers."
Looking forward, Evgeniy noted that this was likely the last non-bugfix release of the kernel client-side implementation, suggesting that the next release would focus on adding server-side features "needed for distributed parallel data processing (like the ability to add new servers via network commands from another server), so most of the work will be devoted to server code."
"This is a high performance network filesystem with a local coherent cache of data and metadata. Its main goal is distributed parallel processing of data," Evgeniy Polyakov said, announcing the latest version of his Parallel Optimized Host Message Exchange Layered File System. He noted that in addition to numerous bugfixes, the latest release includes the following new features:
"Full transaction support for all operations (object creation/removal, data reading and writing); Data and metadata cache coherency support; Transaction timeout based resending, if [a] given transaction did not receive [a] reply after specified timeout, [the] transaction will be resent (possibly to different server); Switched writepage path to ->sendpage() which improved performance and robustness of the writing."
Evgeniy also noted that he has started working on support for parallel data processing, one of the key intended features of the filesystem. He explained that initial logic has been added so that data can be written to multiple servers at the same time and reads can be balanced across those servers, though the logic is not yet used by the filesystem.
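Read balancing of this kind is often implemented as a simple round-robin cursor over the configured servers. The sketch below is a hypothetical illustration of that common pattern, not POHMELFS's actual selection logic, which the announcement does not detail.

```c
#include <linux/atomic.h>

/*
 * Hypothetical server set with a round-robin cursor for read requests;
 * writes bypass this and go to every server in the set.  Initialize the
 * cursor with atomic_set(&set->next, 0) at setup time.
 */
struct example_server_set {
	unsigned int	nr_servers;
	atomic_t	next;
};

static unsigned int example_pick_read_server(struct example_server_set *set)
{
	unsigned int n = (unsigned int)atomic_inc_return(&set->next);

	return n % set->nr_servers;
}
```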
"This is a high performance network filesystem with local coherent cache of data and metadata. Its main goal is distributed parallel processing of data. Network filesystem is a client transport. POHMELFS protocol was proven to be superior to NFS in lots (if not all, then it is in a roadmap) operations."
This latest release prompted Jeff Garzik to reply, "this continues to be a neat and interesting project :)" New features include fast transactions, round-robin failover, and performance near the wire limit. This adds to existing features which include a locally coherent data and metadata cache, async processing of most events, and a fast and scalable multithreaded userspace server. Planned features include a server extension to allow mirroring data across multiple devices, strong authentication, and possible encryption of data transferred over the network. Evgeniy linked to several benchmarks in his blog.
"I'm pleased to announce [the] 7'th and final release of the distributed storage subsystem (DST)," Evgeniy Polyakov stated, completing the TODO list on the project's web page. He titled the release, "squizzed black-out of the dancing back-aching hippo", noting, "it clearly shows my condition". New features in this release include checksum support, extended auto-configuration for detecting and auto-enabling checksums if supported by the remote host, new sysfs files for marking a given node as clean (in-sync) or dirty (not-in-sync), and numerous bug fixes.
Evgeniy released the first version of his distributed storage subsystem in July of 2007. In September he explained that this was the first step in a larger distributed filesystem project he's planning. In late October, Andrew Morton noted that the work looked ready to be merged into his -mm kernel.
Andrew Morton responded favorably to Evgeniy Polyakov's most recent release of his distributed storage subsystem, "I went back and re-read last month's discussion and I'm not seeing any reason why we shouldn't start thinking about merging this." He then asked, "how close is it to that stage? A peek at your development blog indicates that things are still changing at a moderate rate?" Evgeniy replied:
"I completed storage layer development itself, the only remaining todo item is to implement [a] new redundancy algorithm, but I did not see major demand on that, so it will stay for now with low priority. I will use DST as a transport layer for [a] distributed filesystem, and probably that will require additional features, I have no clean design so far, but right now I have nothing in the pipe to commit to DST."
Evgeniy Polyakov announced a new version of his distributed storage subsystem, "this release includes [a] mirroring algorithm extension, which allows [the subsystem] to store [the] 'age' of the given node on the underlying media." He went on to explain why this was useful:
"In this case, if [a] failed node gets new media, which does not contain [the] correct 'age' (unique id assigned to the whole storage during initialization time), the whole node will be marked as dirty and eventually resynced.
"This allows [it] to have [a] completely transparent failure recovery - [the] failed node can be just turned off, its hardware fixed and then turned on. DST core will detect [the] connection reset and automatically reconnect when [the] node is ready and resync if needed without any special administrator's steps."
Evgeniy Polyakov, listed as the connector and w1 subsystem maintainer, announced the first release of his distributed storage subsystem, "which allows [you] to form storage on top of remote and local nodes, which in turn can be exported to another storage as a node to form tree-like storages." He described the features of this new block device: "zero additional allocations in the common fast path, not counting network allocations; zero-copy sending if supported by [the] device, using sendpage(); ability to use any implemented algorithm (linear algo implemented); pluggable mapping algorithms; failover recovery in case of [a] broken link; ability to suspend [a] remote node for maintenance without breaking dataflow to [other] nodes (if supported by [the] algorithm and block layer) and without turning down [the] main node; initial autoconfiguration (ability to request [the] remote node size and use that dynamic data during array setup time); non-blocking network data processing; support for any kind of network media (not limited to tcp or inet protocols); no need for any special tools for data processing (like special userspace applications) except for configuration; userspace and kernelspace targets."
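To illustrate what a "pluggable mapping algorithm" does here: given a sector in the aggregate storage, it selects the node that holds it. A hypothetical sketch of the linear case, with invented structure names rather than DST's own, might look like this:

```c
#include <linux/list.h>
#include <linux/types.h>

/* Hypothetical node record: a contiguous slice of the aggregate storage. */
struct example_node {
	struct list_head	entry;
	sector_t		start;	/* first sector mapped to this node */
	sector_t		size;	/* number of sectors on this node */
};

/* Walk the ordered node list and pick the node covering the sector. */
static struct example_node *example_linear_map(struct list_head *nodes,
					       sector_t sector)
{
	struct example_node *n;

	list_for_each_entry(n, nodes, entry)
		if (sector >= n->start && sector < n->start + n->size)
			return n;

	return NULL;	/* sector lies outside the array */
}
```

A different mapping algorithm would plug in by replacing only this lookup, which is the appeal of keeping the mapping pluggable.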
In his blog, Evgeniy noted a similarity to the recently discussed DRBD. In the recent announcement he compared his solution to iSCSI and NBD, noting the following advantages: "non-blocking processing without busy loops; small, pluggable architecture; failover recovery (reconnect to remote target); autoconfiguration; no additional allocations; very simple; works with different network protocols; and storage can be formed on top of remote nodes and be exported simultaneously."