POHMELFS core. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> diff --git a/Documentation/filesystems/pohmelfs/design_notes.txt b/Documentation/filesystems/pohmelfs/design_notes.txt new file mode 100644 index 0000000..0b20a47 --- /dev/null +++ b/Documentation/filesystems/pohmelfs/design_notes.txt @@ -0,0 +1,70 @@ +POHMELFS: Parallel Optimized Host Message Exchange Layered File System. + + Evgeniy Polyakov <johnpol@2ka.mipt.ru> + +Homepage: http://tservice.net.ru/~s0mbre/old/?section=projects&item=pohmelfs + +It was first started as network filesystem with coherent local data and metadata caches, +but it is being evolved into parallel distibuted filesystem now. + +Main features of this FS include: + * Local coherent (notes cache for data and metadata: + http://tservice.net.ru/~s0mbre/blog/devel/fs/2008_05_17.html) + * Completely async processing of all events (hard, symlinks and rename are the + only exceptions) including object creation and data reading and writing. + * Flexible object architecture optimized for network processing. + Ability to create long pathes to object and remove arbitrary huge + directoris in single network command. + (like removing the whole kernel tree via single network command). + * Very high performance. + * Fast and scalable multithreaded userspace server. Being in userspace it works + with any underlying filesystem and still is much faster than async in-kernel NFS one. + * Client is able to switch between different servers (if one goes down, client + automatically reconnects to second and so on). + * Transactions support. Full failover for all operations. + Resending transactions to different servers on timeout or error. + * Read requests (data read, directory listing, lookup requests) balancing between multiple servers. + * Write requests are sent to multiple servers and completed only when all of them sent an ack. + * Ability to add and/or remove servers from working set at run-time from userspace (via netlink, + so the same command can be processed from real network though, but since server does + not support it yet, I dropped network part). + + +POHMELFS is based on transactions, which are potentially long-standing objects, which live +in clients memory. Each transaction contains all information needed to process given command +(or set of command, which is frequently used during data writing: single transaction can contain +creation and data writing commands). Transaction is committed by all servers, where it was sent, +and in case of failure of one or another, it will be eventually resent or dropped with error, +so, for example, reading will return error, if no servers are available. + +POHMELFS uses novel asynchronous approach of data processing. Courtesy to transactions, it is +possible to detouch reply from request, and if command requires data to be received, caller +just sleeps waiting for it. Thus it is possible to issue multiple read commands to different +servers and async threads will pick replies in parallel, find appropriate transactions in the +system and put data where it belongs (like page or inode cache). + +Main feature of the POHMELFS is writeback data and metadata cache. +Only few non-performance critical operations use write-through cache and are synchronous: +hard and symbolic link creation and object rename. Creation and removal of objects, +as long as writing, are asynchronous and are sent to the server during system writeback. +When server receives some request for given object in the system (like data reading, +or file creation or whatever else), it stores appropriate client information in own cache, +so when subsequent request comes from different client, all previous could be notified +(for example when several clients read data from file, and then new client writes there, +appropriate pages on clients will be invalidated, so subsequent write will force them +to read page from the server). Because of this feature POHMELFS is extremely fast in metadata +intensive workloads, and can fully utilize bandwidth to servers when doing bulk data transafers. + +POHMELFS client operates with working set of servers, and is capable of balancing reading, +i.e. no modification, like lookup or directory listing, workload between all servers it was conneted to. +Administrator can add or remove servers from the set in run-time via special command (described +in Documentation/pohmelfs/info.txt file). Writing is performed to all servers. + +POHEMLFS is capable of full data channel encryption and/or strong crypto hashing. +One can select kernel supported cipher, encryption mode, hash type and operation mode +(hmac or digest). It is also possible to use both or neither (default). Crypto configuration +is checked during mount time, and if server does not support it, appropriate capabilities +will be disabled or mount will fail (if 'crypto_fail_unsupported' mount option is specified). +Crypto performance heavily depends on number of crypto threads, which asynchronously perform +crypto operations and send data to server or submit it higher to the stack. Number of crypto +threads is controlled via mount option. diff --git a/Documentation/filesystems/pohmelfs/info.txt b/Documentation/filesystems/pohmelfs/info.txt new file mode 100644 index 0000000..f96f09d --- /dev/null +++ b/Documentation/filesystems/pohmelfs/info.txt @@ -0,0 +1,81 @@ +POHMELFS usage information. + +Mount options: +idx=%u + Each mountpoint is associated with special index via this option. + Administrator can add or remove servers from given index, so all mounts, + which were attached to it, were updated. + Default it is 0. + +trans_scan_timeout=%u + This timeout, expressed in milliseconds, specifies time to scan trasaction + trees looking for stale requests, which have to be resent, or if number of + retries exceed specified limit, dropped with error. + Default is 5 seconds. + +drop_scan_timeout=%u + Internal timeout, expressed in milliseconds, which specifies how frequently + inodes marked to be dropped are freed. It also specifies how frequently + system checks, that servers has to be added or removed from current working set. + Default is 1 second. + +wait_on_page_timeout=%u + Number of milliseconds to wait for reply from remote server for data reading command. + If this timeout is exceeded, reading returns error. + Default is 5 seconds. + +trans_retries=%u + Number of times, transaction will be resent to the server, which did not answer for the + last @trans_scan_timeout milliseconds. When number of resends exceeds this limit, + transaction is completed with error. + Default is 5 resends. + +crypto_thread_num=%u + Number of crypto processing threads. Threads are used both for RX and TX traffic. + Default is 2, or no threads if crypto operations are not supported. + +trans_max_pages=%u + Maximum number of pages in single transaction. This parameter also control number of pages, + allocated for crypto processing (each crypto thread has pool of pages, number of which is + equal to 'trans_max_pages'. + Default is 100 pages. + +crypto_fail_unsupported + If specified, mount will fail if server does not support requested crypto operations. + By default mount will disable non-matching crypto operations. + +Usage examples. + +Add (or remove if it already exists) server server1.net:1025 into working set with index $idx +with appropriate hash algorithm and key file and cipher algorithm, mode and key file: +$cfg -a server1.net -p 1025 -i $idx -K $hash_key -H "hmac(sha1)" -C "cbc(aes)" -k $cipher_key + +Mount filesystem with given index $idx to /mnt mountpoint. +Client will connect to all servers specified in working set via previous command: +mount -t pohmel -o idx=$idx q /mnt + +One can add or remove servers from working set after mounting too. + + +Server installation. + +Creating a server, which listens at port 1025 and 0.0.0.0 address. +Working root directory (note, that server chroots there, so you have to have appropriate permissions) +is set to /mnt, server supports SHA1 hasing (with 'hash_key' key file) and AES-CBC encryption +(with 'cipher_key' key file, its size specifies cipher blocksize) +Number of working threads is set to 10. + +# ./fserver -a 0.0.0.0 -p 1025 -r /mnt -w 10 -K hash_key -H "hmac(sha1)" -C "cbc(aes)" -k cipher_key + + -r root - path to root directory. Default: /tmp. + -a addr - listen address. Default: 0.0.0.0. + -p port - listen port. Default: 1025. + -w workers - number of workers per connected client. Default: 1. + -K file - hash key size. Default: none. + -H hash - hash type. Default: none. + -k file - cipher key size. Default: none. + -C hash - cipher type. Crypto mode should be specified as 'mode(cipher)'. Default: none. + -h - this help. + +Number of worker threads specifies how many workers will be created for each client. +Bulk single-client transafers usually are better handled with smaller number (like 1-3). diff --git a/Documentation/filesystems/pohmelfs/network_protocol.txt b/Documentation/filesystems/pohmelfs/network_protocol.txt new file mode 100644 index 0000000..bc13058 --- /dev/null +++ b/Documentation/filesystems/pohmelfs/network_protocol.txt @@ -0,0 +1,220 @@ +POHMELFS network protocol. + +Basic structure used in network communication is following command: + +struct netfs_cmd +{ + __u16 cmd; /* Command number */ + __u16 csize; /* Attached crypto information size */ + __u16 cpad; /* Attached padding size */ + __u16 ext; /* External flags */ + __u32 size; /* Size of the attached data */ + __u32 trans; /* Transaction id */ + __u64 id; /* Object ID to operate on. Used for feedback.*/ + __u64 start; /* Start of the object. */ + __u64 iv; /* IV sequence */ + __u8 data[0]; +}; + +Commands can be embedded into transaction command (which in turn has own command), +so one can extend protocol as needed without breaking backward compatibility as long +as old commands are supported. All string lengths include tail 0 byte. + +@cmd - command number, which specifies command to be processed. Following + commands are used currently: + + NETFS_READDIR = 1, /* Read directory for given inode number */ + NETFS_READ_PAGE, /* Read data page from the server */ + NETFS_WRITE_PAGE, /* Write data page to the server */ + NETFS_CREATE, /* Create directory entry */ + NETFS_REMOVE, /* Remove directory entry */ + NETFS_LOOKUP, /* Lookup single object */ + NETFS_LINK, /* Create a link */ + NETFS_TRANS, /* Transaction */ + NETFS_OPEN, /* Open intent */ + NETFS_INODE_INFO, /* Metadata cache coherency synchronization message */ + NETFS_JOIN_GROUP, /* Joing metadata update group */ + NETFS_LEAVE_GROUP, /* Leave metadata update group */ + NETFS_PAGE_CACHE, /* Page cache invalidation message */ + NETFS_READ_PAGES, /* Read multiple contiguous pages in one go */ + NETFS_RENAME, /* Rename object */ + NETFS_CAPABILITIES, /* Capabilities of the client, for example supported crypto */ + +@ext - external flags. Used by different commands to specify some extra arguments + like partial size of the embedded objects or creation flags. + +@size - size of the attached data. For NETFS_READ_PAGE and NETFS_READ_PAGES no data is attached, + but size of the requested data is incorporated here. It does not include size of the command + header (struct netfs_cmd) itself. + +@id - id of the object this command operates on. Each command can use it for own purpose. + +@start - start of the object this command operates on. Each command can use it for own purpose. + +@csize, @cpad - size and padding size of the (attached if needed) crypto information. + +Command specifications. + +@NETFS_READDIR +This command is used to sync content of the remote dir to the client. + +@ext - length of the path to object. +@size - the same. +@id - local inode number of the directory to read. +@start - zero. + + +@NETFS_READ_PAGE +This command is used to read data from remote server. +Data size does not exceed local page cache size. + +@id - inode number. +@start - first byte offset. +@size - number of bytes to read plus length of the path to object. +@ext - object path length. + + +@NETFS_CREATE +Used to create object. +It does not require that all directories on top of the object were +already created, it will create them automatically. Each object has +associated @netfs_path_entry data structure, which contains creation +mode (permissions and type) and length of the name as long as name itself. + +@start - 0 +@size - size of the all data structures needed to create a path +@id - local inode number +@ext - 0 + + +@NETFS_REMOVE +Used to remove object. + +@ext - length of the path to object. +@size - the same. +@id - local inode number. +@start - zero. + + +@NETFS_LOOKUP +Lookup information about object on server. + +@ext - length of the path to object. +@size - the same. +@id - local inode number of the directory to look object in. +@start - local inode number of the object to look at. + + +@NETFS_LINK +Create hard of symlink. +Command is sent as "object_path|target_path". + +@size - size of the above string. +@id - parent local inode number. +@start - 1 for symlink, 0 for hardlink. +@ext - size of the "object_path" above. + + +@NETFS_TRANS +Transaction header. + +@size - incorporates all embedded command sizes including theirs header sizes. +@start - transaction generation number - unique id used to find transaction. +@ext - transaction flags. Unused at the moment. +@id - 0. + + +@NETFS_OPEN +Open intent for given transaction. + +@id - local inode number. +@start - 0. +@size - path length to the object. +@ext - open flags (O_RDWR and so on). + + +@NETFS_INODE_INFO +Metadata update command. +It is sent to servers when attributes of the object are changed and received +when data or metadata were updated. It operates with the following structure: + +struct netfs_inode_info +{ + unsigned int mode; + unsigned int nlink; + unsigned int uid; + unsigned int gid; + unsigned int blocksize; + unsigned int padding; + __u64 ino; + __u64 blocks; + __u64 rdev; + __u64 size; + __u64 version; +}; + +It effectively mirrors stat(2) returned data. + + +@ext - path length to the object. +@size - the same plus size of the netfs_inode_info structure. +@id - local inode number. +@start - 0. + + +@NETFS_JOIN_GROUP/NETFS_LEAVE_GROUP +Metadata cache coherency synchronization messages. +They are broadcasted when new inode is created (either for new object +or object read from the server), so that server new inode number of the +object on the appropriate client. @NETFS_LEAVE_GROUP is sent when local +inode is destroyed, so that client physically can not be interested in +data or metadata updates for given inode. + +@ext - path length to the object. +@size - the same. +@id - local inode number for given object. +@start - 0. + + +@NETFS_PAGE_CACHE +Command is only received by clients. It contains information about +page to be marked as not up-to-date. Server fills this command with data, +provided by above @NETFS_JOIN_GROUP command. + +@id - client's inode number. +@start - last byte of the page to be invalidated. If it is not equal to + current inode size, it will be vmtruncated(). +@size - 0 +@ext - 0 + + +@NETFS_READ_PAGES +Used to read multiple contiguous pages in one go. + +@start - first byte of the contiguous region to read. +@size - contains of two fields: lower 8 bits are used to represent page cache shift + used by client, another 3 bytes are used to get number of pages. +@id - local inode number. +@ext - path length to the object. + + +@NETFS_RENAME +Used to rename object. +Attached data is formed into following string: "old_path|new_path". + +@id - local inode number. +@start - parent inode number. +@size - length of the above string. +@ext - length of the old path part. + + +@NETFS_CAPABILITIES +Used to exchange crypto capabilities with server. +If crypto capabilities are not supported by server, then client will disable it +or fail (if 'crypto_fail_unsupported' mount options was specified). + +@id - superblock index. Used to specify crypto information for group of servers. +@size - size of the attached capabilities structure. +@start - 0. +@size - 0. +@scsize - 0. diff --git a/fs/Kconfig b/fs/Kconfig index c509123..59935cd 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -1566,6 +1566,8 @@ menuconfig NETWORK_FILESYSTEMS if NETWORK_FILESYSTEMS +source "fs/pohmelfs/Kconfig" + config NFS_FS tristate "NFS file system support" depends on INET diff --git a/fs/Makefile b/fs/Makefile index 1e7a11b..6ce6a35 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -119,3 +119,4 @@ obj-$(CONFIG_HPPFS) += hppfs/ obj-$(CONFIG_DEBUG_FS) += debugfs/ obj-$(CONFIG_OCFS2_FS) += ocfs2/ obj-$(CONFIG_GFS2_FS) += gfs2/ +obj-$(CONFIG_POHMELFS) += pohmelfs/ diff --git a/fs/pohmelfs/Kconfig b/fs/pohmelfs/Kconfig new file mode 100644 index 0000000..61f66b0 --- /dev/null +++ b/fs/pohmelfs/Kconfig @@ -0,0 +1,33 @@ +config POHMELFS + tristate "POHMELFS filesystem support" + select CONNECTOR + help + POHMELFS stands for Parallel Optimized Host Message Exchange Layered File System. + This is a network filesystem which supports coherent caching of data and metadata + on clients. + +config POHMELFS_DEBUG + bool "POHMELFS debugging" + depends on POHMELFS + default n + help + Turns on excessive POHMELFS debugging facilities. + You usually do not want to slow things down noticebly and get really lots of kernel + messages in syslog. + +config POHMELFS_CC_GROUP + bool "POHMELFS cache coherency protocol" + depends on POHMELFS + default y + help + This allows to broadcast data and metadata cache coherency messages between clients. + Usually you want this facility, although without locking you can get different from + POSIX expectation behaviour. For more details check POHMELFS homepage and development + section. + +config POHMELFS_CRYPTO + bool "POHMELFS crypto support" + depends on CONFIG_CRYPTO_BLKCIPHER && CONFIG_CRYPTO_HASH + help + This option allows to encrypt and/or protect with strong cryptographic hash all dataflow + between server and clients. Each config group can have own keys. diff --git a/fs/pohmelfs/Makefile b/fs/pohmelfs/Makefile new file mode 100644 index 0000000..1d1325b --- /dev/null +++ b/fs/pohmelfs/Makefile @@ -0,0 +1,3 @@ +obj-$(CONFIG_POHMELFS) += pohmelfs.o + +pohmelfs-y := inode.o config.o dir.o net.o path_entry.o trans.o crypto.o diff --git a/fs/pohmelfs/config.c b/fs/pohmelfs/config.c new file mode 100644 index 0000000..f96dbc9 --- /dev/null +++ b/fs/pohmelfs/config.c @@ -0,0 +1,393 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/kernel.h> +#include <linux/connector.h> +#include <linux/crypto.h> +#include <linux/list.h> +#include <linux/mutex.h> +#include <linux/string.h> + +#include "netfs.h" + +/* + * Global configuration list. + * Each client can be asked to get one of them. + * + * Allows to provide remote server address (ipv4/v6/whatever), port + * and so on via kernel connector. + */ + +static struct cb_id pohmelfs_cn_id = {.idx = POHMELFS_CN_IDX, .val = POHMELFS_CN_VAL}; +static LIST_HEAD(pohmelfs_config_list); +static DEFINE_MUTEX(pohmelfs_config_lock); + +static inline int pohmelfs_config_eql(struct pohmelfs_ctl *sc, struct pohmelfs_ctl *ctl) +{ + if (sc->idx == ctl->idx && sc->type == ctl->type && + sc->proto == ctl->proto && + sc->addrlen == ctl->addrlen && + !memcmp(&sc->addr, &ctl->addr, ctl->addrlen)) + return 1; + + return 0; +} + +static struct pohmelfs_config_group *pohmelfs_find_config_group(unsigned int idx) +{ + struct pohmelfs_config_group *g, *group = NULL; + + list_for_each_entry(g, &pohmelfs_config_list, group_entry) { + if (g->idx == idx) { + group = g; + break; + } + } + + return group; +} + +static struct pohmelfs_config_group *pohmelfs_find_create_config_group(unsigned int idx) +{ + struct pohmelfs_config_group *g; + + g = pohmelfs_find_config_group(idx); + if (g) + return g; + + g = kzalloc(sizeof(struct pohmelfs_config_group), GFP_KERNEL); + if (!g) + return NULL; + + INIT_LIST_HEAD(&g->config_list); + g->idx = idx; + + list_add_tail(&g->group_entry, &pohmelfs_config_list); + + return g; +} + +int pohmelfs_copy_config(struct pohmelfs_sb *psb) +{ + struct pohmelfs_config_group *g; + struct pohmelfs_config *c, *dst; + int err = -ENODEV; + + mutex_lock(&pohmelfs_config_lock); + + g = pohmelfs_find_config_group(psb->idx); + if (!g) + goto out_unlock; + + /* + * Run over all entries in given config group and try to crate and + * initialize those, which do not exist in superblock list. + * Skip all existing entries. + */ + + list_for_each_entry(c, &g->config_list, config_entry) { + err = 0; + list_for_each_entry(dst, &psb->state_list, config_entry) { + if (pohmelfs_config_eql(&dst->state.ctl, &c->state.ctl)) { + err = -EEXIST; + break; + } + } + + if (err) + continue; + + dst = kzalloc(sizeof(struct pohmelfs_config), GFP_KERNEL); + if (!dst) { + err = -ENOMEM; + break; + } + + memcpy(&dst->state.ctl, &c->state.ctl, sizeof(struct pohmelfs_ctl)); + + list_add_tail(&dst->config_entry, &psb->state_list); + + err = pohmelfs_state_init_one(psb, dst); + if (err) { + list_del(&dst->config_entry); + kfree(dst); + } + + err = 0; + } + +out_unlock: + mutex_unlock(&pohmelfs_config_lock); + + return err; +} + +int pohmelfs_copy_crypto(struct pohmelfs_sb *psb) +{ + struct pohmelfs_config_group *g; + int err = -ENOENT; + + mutex_lock(&pohmelfs_config_lock); + g = pohmelfs_find_config_group(psb->idx); + if (g) { + psb->hash_string = g->hash_string; + psb->hash_strlen = g->hash_strlen; + g->hash_string = NULL; + g->hash_strlen = 0; + + psb->cipher_string = g->cipher_string; + psb->cipher_strlen = g->cipher_strlen; + g->cipher_string = NULL; + g->cipher_strlen = 0; + + psb->hash_key = g->hash_key; + psb->hash_keysize = g->hash_keysize; + g->hash_key = NULL; + g->hash_keysize = 0; + + psb->cipher_key = g->cipher_key; + psb->cipher_keysize = g->cipher_keysize; + g->cipher_key = NULL; + g->cipher_keysize = 0; + + err = 0; + } + mutex_unlock(&pohmelfs_config_lock); + + return err; +} + +static int pohmelfs_cn_ctl(struct cn_msg *msg) +{ + struct pohmelfs_config_group *g; + struct pohmelfs_ctl *ctl = (struct pohmelfs_ctl *)msg->data; + struct pohmelfs_config *c, *tmp; + int err = 0; + + if (msg->len != sizeof(struct pohmelfs_ctl)) + return -EBADMSG; + + mutex_lock(&pohmelfs_config_lock); + + g = pohmelfs_find_create_config_group(ctl->idx); + if (!g) { + err = -ENOMEM; + goto out_unlock; + } + + list_for_each_entry_safe(c, tmp, &g->config_list, config_entry) { + struct pohmelfs_ctl *sc = &c->state.ctl; + + if (pohmelfs_config_eql(sc, ctl)) { + list_del(&c->config_entry); + kfree(c); + err = -EEXIST; + goto out_unlock; + } + } + + c = kzalloc(sizeof(struct pohmelfs_config), GFP_KERNEL); + if (!c) { + err = -ENOMEM; + goto out_unlock; + } + + err = 0; + memcpy(&c->state.ctl, ctl, sizeof(struct pohmelfs_ctl)); + list_add_tail(&c->config_entry, &g->config_list); + +out_unlock: + mutex_unlock(&pohmelfs_config_lock); + + return err; +} + +static int pohmelfs_crypto_hash_init(struct pohmelfs_config_group *g, struct pohmelfs_crypto *c) +{ + char *algo = (char *)c->data; + u8 *key = (u8 *)(algo + c->strlen); + + if (g->hash_string) + return -EEXIST; + + g->hash_string = kstrdup(algo, GFP_KERNEL); + if (!g->hash_string) + return -ENOMEM; + g->hash_strlen = c->strlen; + g->hash_keysize = c->keysize; + + g->hash_key = kmalloc(c->keysize, GFP_KERNEL); + if (!g->hash_key) { + kfree(g->hash_string); + return -ENOMEM; + } + + memcpy(g->hash_key, key, c->keysize); + + return 0; +} + +static int pohmelfs_crypto_cipher_init(struct pohmelfs_config_group *g, struct pohmelfs_crypto *c) +{ + char *algo = (char *)c->data; + u8 *key = (u8 *)(algo + c->strlen); + + if (g->cipher_string) + return -EEXIST; + + g->cipher_string = kstrdup(algo, GFP_KERNEL); + if (!g->cipher_string) + return -ENOMEM; + g->cipher_strlen = c->strlen; + g->cipher_keysize = c->keysize; + + g->cipher_key = kmalloc(c->keysize, GFP_KERNEL); + if (!g->cipher_key) { + kfree(g->cipher_string); + return -ENOMEM; + } + + memcpy(g->cipher_key, key, c->keysize); + + return 0; +} + + +static int pohmelfs_cn_crypto(struct cn_msg *msg) +{ + struct pohmelfs_crypto *crypto = (struct pohmelfs_crypto *)msg->data; + struct pohmelfs_config_group *g; + int err = 0; + + dprintk("%s: idx: %u, strlen: %u, type: %u, keysize: %u, algo: %s.\n", + __func__, crypto->idx, crypto->strlen, crypto->type, + crypto->keysize, (char *)crypto->data); + + mutex_lock(&pohmelfs_config_lock); + g = pohmelfs_find_create_config_group(crypto->idx); + if (!g) { + err = -ENOMEM; + goto out_unlock; + } + + switch (crypto->type) { + case POHMELFS_CRYPTO_HASH: + err = pohmelfs_crypto_hash_init(g, crypto); + break; + case POHMELFS_CRYPTO_CIPHER: + err = pohmelfs_crypto_cipher_init(g, crypto); + break; + default: + err = -ENOTSUPP; + break; + } + +out_unlock: + mutex_unlock(&pohmelfs_config_lock); + + return err; +} + +static void pohmelfs_cn_callback(void *data) +{ + struct cn_msg *msg = data; + struct pohmelfs_cn_ack *ack; + int err; + + switch (msg->flags) { + case POHMELFS_FLAGS_CTL: + err = pohmelfs_cn_ctl(msg); + break; + case POHMELFS_FLAGS_CRYPTO: + err = pohmelfs_cn_crypto(msg); + break; + default: + err = -ENOSYS; + break; + } + + ack = kmalloc(sizeof(struct pohmelfs_cn_ack), GFP_KERNEL); + if (!ack) + return; + + memcpy(&ack->msg, msg, sizeof(struct cn_msg)); + + ack->msg.ack = msg->ack + 1; + ack->msg.len = sizeof(struct pohmelfs_cn_ack) - sizeof(struct cn_msg); + + ack->error = err; + + cn_netlink_send(&ack->msg, 0, GFP_KERNEL); + kfree(ack); +} + +int pohmelfs_config_check(struct pohmelfs_config *config, int idx) +{ + struct pohmelfs_ctl *ctl = &config->state.ctl; + struct pohmelfs_config *tmp; + int err = -ENOENT; + struct pohmelfs_ctl *sc; + struct pohmelfs_config_group *g; + + mutex_lock(&pohmelfs_config_lock); + + g = pohmelfs_find_config_group(ctl->idx); + if (g) { + list_for_each_entry(tmp, &g->config_list, config_entry) { + sc = &tmp->state.ctl; + + if (pohmelfs_config_eql(sc, ctl)) { + err = 0; + break; + } + } + } + + mutex_unlock(&pohmelfs_config_lock); + + return err; +} + +int __init pohmelfs_config_init(void) +{ + return cn_add_callback(&pohmelfs_cn_id, "pohmelfs", pohmelfs_cn_callback); +} + +void __exit pohmelfs_config_exit(void) +{ + struct pohmelfs_config *c, *tmp; + struct pohmelfs_config_group *g, *gtmp; + + cn_del_callback(&pohmelfs_cn_id); + + mutex_lock(&pohmelfs_config_lock); + list_for_each_entry_safe(g, gtmp, &pohmelfs_config_list, group_entry) { + list_for_each_entry_safe(c, tmp, &g->config_list, config_entry) { + list_del(&c->config_entry); + kfree(c); + } + + list_del(&g->group_entry); + + if (g->hash_string) + kfree(g->hash_string); + + if (g->cipher_string) + kfree(g->cipher_string); + + kfree(g); + } + mutex_unlock(&pohmelfs_config_lock); +} diff --git a/fs/pohmelfs/crypto.c b/fs/pohmelfs/crypto.c new file mode 100644 index 0000000..bc5f010 --- /dev/null +++ b/fs/pohmelfs/crypto.c @@ -0,0 +1,862 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/crypto.h> +#include <linux/highmem.h> +#include <linux/kthread.h> +#include <linux/pagemap.h> +#include <linux/slab.h> + +#include "netfs.h" + +static struct crypto_hash *pohmelfs_init_hash(struct pohmelfs_sb *psb) +{ + int err; + struct crypto_hash *hash; + + hash = crypto_alloc_hash(psb->hash_string, 0, CRYPTO_ALG_ASYNC); + if (IS_ERR(hash)) { + err = PTR_ERR(hash); + dprintk("%s: idx: %u: failed to allocate hash '%s', err: %d.\n", + __func__, psb->idx, psb->hash_string, err); + goto err_out_exit; + } + + psb->crypto_attached_size = crypto_hash_digestsize(hash); + + if (!psb->hash_keysize) + return hash; + + err = crypto_hash_setkey(hash, psb->hash_key, psb->hash_keysize); + if (err) { + dprintk("%s: idx: %u: failed to set key for hash '%s', err: %d.\n", + __func__, psb->idx, psb->hash_string, err); + goto err_out_free; + } + + return hash; + +err_out_free: + crypto_free_hash(hash); +err_out_exit: + return ERR_PTR(err); +} + +static struct crypto_ablkcipher *pohmelfs_init_cipher(struct pohmelfs_sb *psb) +{ + int err = -EINVAL; + struct crypto_ablkcipher *cipher; + + if (!psb->cipher_keysize) + goto err_out_exit; + + cipher = crypto_alloc_ablkcipher(psb->cipher_string, 0, 0); + if (IS_ERR(cipher)) { + err = PTR_ERR(cipher); + dprintk("%s: idx: %u: failed to allocate cipher '%s', err: %d.\n", + __func__, psb->idx, psb->cipher_string, err); + goto err_out_exit; + } + + crypto_ablkcipher_clear_flags(cipher, ~0); + + err = crypto_ablkcipher_setkey(cipher, psb->cipher_key, psb->cipher_keysize); + if (err) { + dprintk("%s: idx: %u: failed to set key for cipher '%s', err: %d.\n", + __func__, psb->idx, psb->cipher_string, err); + goto err_out_free; + } + + return cipher; + +err_out_free: + crypto_free_ablkcipher(cipher); +err_out_exit: + return ERR_PTR(err); +} + +int pohmelfs_crypto_engine_init(struct pohmelfs_crypto_engine *e, struct pohmelfs_sb *psb) +{ + int err; + + e->page_num = 0; + + e->size = PAGE_SIZE; + e->data = kmalloc(e->size, GFP_KERNEL); + if (!e->data) { + err = -ENOMEM; + goto err_out_exit; + } + + if (psb->hash_string) { + e->hash = pohmelfs_init_hash(psb); + if (IS_ERR(e->hash)) { + err = PTR_ERR(e->hash); + e->hash = NULL; + goto err_out_free; + } + } + + if (psb->cipher_string) { + e->cipher = pohmelfs_init_cipher(psb); + if (IS_ERR(e->cipher)) { + err = PTR_ERR(e->cipher); + e->cipher = NULL; + goto err_out_free_hash; + } + } + + return 0; + +err_out_free_hash: + crypto_free_hash(e->hash); +err_out_free: + kfree(e->data); +err_out_exit: + return err; +} + +void pohmelfs_crypto_engine_exit(struct pohmelfs_crypto_engine *e) +{ + if (e->hash) + crypto_free_hash(e->hash); + if (e->cipher) + crypto_free_ablkcipher(e->cipher); + kfree(e->data); +} + +static void pohmelfs_crypto_complete(struct crypto_async_request *req, int err) +{ + struct pohmelfs_crypto_completion *c = req->data; + + if (err == -EINPROGRESS) + return; + + dprintk("%s: req: %p, err: %d.\n", __func__, req, err); + c->error = err; + complete(&c->complete); +} + +static int pohmelfs_crypto_process(struct ablkcipher_request *req, + struct scatterlist *sg_dst, struct scatterlist *sg_src, + void *iv, int enc, unsigned long timeout) +{ + struct pohmelfs_crypto_completion complete; + int err; + + init_completion(&complete.complete); + complete.error = -EINPROGRESS; + + ablkcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG, + pohmelfs_crypto_complete, &complete); + + ablkcipher_request_set_crypt(req, sg_src, sg_dst, sg_src->length, iv); + + if (enc) + err = crypto_ablkcipher_encrypt(req); + else + err = crypto_ablkcipher_decrypt(req); + + switch (err) { + case -EINPROGRESS: + case -EBUSY: + err = wait_for_completion_interruptible_timeout(&complete.complete, + timeout); + if (!err) + err = -ETIMEDOUT; + else + err = complete.error; + break; + default: + break; + } + + return err; +} + +int pohmelfs_crypto_process_input_data(struct pohmelfs_crypto_engine *e, u64 cmd_iv, + void *data, struct page *page, unsigned int size) +{ + int err; + struct scatterlist sg; + + dprintk("%s: eng: %p, iv: %llx, data: %p, page: %p/%lu, size: %u: ", + __func__, e, cmd_iv, data, page, (page)?page->index:0, size); + + if (data) { + sg_init_one(&sg, data, size); + } else { + sg_init_table(&sg, 1); + sg_set_page(&sg, page, size, 0); + } + + if (e->cipher) { + struct ablkcipher_request *req = e->data + crypto_hash_digestsize(e->hash); + u8 iv[32]; + + memset(iv, 0, sizeof(iv)); + memcpy(iv, &cmd_iv, sizeof(cmd_iv)); + + ablkcipher_request_set_tfm(req, e->cipher); + + err = pohmelfs_crypto_process(req, &sg, &sg, iv, 0, e->timeout); + if (err) + goto err_out_exit; + } + + if (e->hash) { + struct hash_desc desc; + void *dst = e->data + e->size/2; + + desc.tfm = e->hash; + desc.flags = 0; + + err = crypto_hash_init(&desc); + if (err) + goto err_out_exit; + + err = crypto_hash_update(&desc, &sg, size); + if (err) + goto err_out_exit; + + err = crypto_hash_final(&desc, dst); + if (err) + goto err_out_exit; + + err = !!memcmp(dst, e->data, crypto_hash_digestsize(e->hash)); + + if (err) { +#ifdef CONFIG_POHMELFS_DEBUG + unsigned int i; + unsigned char *recv = e->data, *calc = dst; + + dprintk("%s: eng: %p, hash: %p, cipher: %p: iv : %llx, hash mismatch (recv/calc): ", + __func__, e, e->hash, e->cipher, cmd_iv); + for (i=0; i<crypto_hash_digestsize(e->hash); ++i) { +#if 0 + dprintk("%02x ", recv[i]); + if (recv[i] != calc[i]) { + dprintk("| calc byte: %02x.\n", calc[i]); + break; + } +#else + dprintk("%02x/%02x ", recv[i], calc[i]); +#endif + } + dprintk("\n"); +#endif + goto err_out_exit; + } else { + dprintk("%s: eng: %p, hash: %p, cipher: %p: hashes matched.\n", + __func__, e, e->hash, e->cipher); + } + } + + dprintk("%s: eng: %p, hash: %p, cipher: %p: completed.\n", + __func__, e, e->hash, e->cipher); + + return 0; + +err_out_exit: + dprintk("%s: eng: %p, hash: %p, cipher: %p: err: %d.\n", + __func__, e, e->hash, e->cipher, err); + return err; +} + +static int pohmelfs_trans_iter(struct netfs_trans *t, struct pohmelfs_crypto_engine *e, + int (* iterator) (struct pohmelfs_crypto_engine *e, + struct scatterlist *dst, + struct scatterlist *src)) +{ + void *data = t->iovec.iov_base + sizeof(struct netfs_cmd) + t->psb->crypto_attached_size; + unsigned int size = t->iovec.iov_len - sizeof(struct netfs_cmd) - t->psb->crypto_attached_size; + struct netfs_cmd *cmd = data; + unsigned int sz, pages = t->attached_pages, i, csize, cmd_cmd, dpage_idx; + struct scatterlist sg_src, sg_dst; + int err; + + while (size) { + cmd = data; + cmd_cmd = __be16_to_cpu(cmd->cmd); + csize = __be32_to_cpu(cmd->size); + cmd->iv = __cpu_to_be64(e->iv); + + if (cmd_cmd == NETFS_READ_PAGES || cmd_cmd == NETFS_READ_PAGE) + csize = __be16_to_cpu(cmd->ext); + + sz = csize + __be16_to_cpu(cmd->cpad) + sizeof(struct netfs_cmd); + + dprintk("%s: size: %u, sz: %u, cmd_size: %u, cmd_cpad: %u.\n", + __func__, size, sz, __be32_to_cpu(cmd->size), __be16_to_cpu(cmd->cpad)); + + data += sz; + size -= sz; + + sg_init_one(&sg_src, cmd->data, sz - sizeof(struct netfs_cmd)); + sg_init_one(&sg_dst, cmd->data, sz - sizeof(struct netfs_cmd)); + + err = iterator(e, &sg_dst, &sg_src); + if (err) + return err; + } + + if (!pages) + return 0; + + dpage_idx = 0; + for (i=0; i<t->page_num; ++i) { + struct page *page = t->pages[i]; + struct page *dpage = e->pages[dpage_idx]; + + if (!page) + continue; + + sg_init_table(&sg_src, 1); + sg_init_table(&sg_dst, 1); + sg_set_page(&sg_src, page, page_private(page), 0); + sg_set_page(&sg_dst, dpage, page_private(page), 0); + + err = iterator(e, &sg_dst, &sg_src); + if (err) + return err; + + pages--; + if (!pages) + break; + dpage_idx++; + } + + return 0; +} + +static int pohmelfs_encrypt_iterator(struct pohmelfs_crypto_engine *e, + struct scatterlist *sg_dst, struct scatterlist *sg_src) +{ + struct ablkcipher_request *req = e->data; + u8 iv[32]; + + memset(iv, 0, sizeof(iv)); + + memcpy(iv, &e->iv, sizeof(e->iv)); + + return pohmelfs_crypto_process(req, sg_dst, sg_src, iv, 1, e->timeout); +} + +static int pohmelfs_encrypt(struct pohmelfs_crypto_thread *tc) +{ + struct netfs_trans *t = tc->trans; + struct pohmelfs_crypto_engine *e = &tc->eng; + struct ablkcipher_request *req = e->data; + + memset(req, 0, sizeof(struct ablkcipher_request)); + ablkcipher_request_set_tfm(req, e->cipher); + + e->iv = pohmelfs_gen_iv(t); + + return pohmelfs_trans_iter(t, e, pohmelfs_encrypt_iterator); +} + +static int pohmelfs_hash_iterator(struct pohmelfs_crypto_engine *e, + struct scatterlist *sg_dst, struct scatterlist *sg_src) +{ + return crypto_hash_update(e->data, sg_src, sg_src->length); +} + +static int pohmelfs_hash(struct pohmelfs_crypto_thread *tc) +{ + struct pohmelfs_crypto_engine *e = &tc->eng; + struct hash_desc *desc = e->data; + unsigned char *dst = tc->trans->iovec.iov_base + sizeof(struct netfs_cmd); + int err; + + desc->tfm = e->hash; + desc->flags = 0; + + err = crypto_hash_init(desc); + if (err) + return err; + + err = pohmelfs_trans_iter(tc->trans, e, pohmelfs_hash_iterator); + if (err) + return err; + + err = crypto_hash_final(desc, dst); + if (err) + return err; + + { + unsigned int i; + dprintk("%s: ", __func__); + for (i=0; i<tc->psb->crypto_attached_size; ++i) + dprintk("%02x ", dst[i]); + dprintk("\n"); + } + + return 0; +} + +static void pohmelfs_crypto_pages_free(struct pohmelfs_crypto_engine *e) +{ + unsigned int i; + + for (i=0; i<e->page_num; ++i) + __free_page(e->pages[i]); + kfree(e->pages); +} + +static int pohmelfs_crypto_pages_alloc(struct pohmelfs_crypto_engine *e, struct pohmelfs_sb *psb) +{ + unsigned int i; + + e->pages = kmalloc(psb->trans_max_pages * sizeof(struct page *), GFP_KERNEL); + if (!e->pages) + return -ENOMEM; + + for (i=0; i<psb->trans_max_pages; ++i) { + e->pages[i] = alloc_page(GFP_KERNEL); + if (!e->pages[i]) + break; + } + + e->page_num = i; + if (!e->page_num) + goto err_out_free; + + return 0; + +err_out_free: + kfree(e->pages); + return -ENOMEM; +} + +static void pohmelfs_sys_crypto_exit_one(struct pohmelfs_crypto_thread *t) +{ + if (t->thread) + kthread_stop(t->thread); + pohmelfs_crypto_engine_exit(&t->eng); + pohmelfs_crypto_pages_free(&t->eng); + kfree(t); +} + +static int pohmelfs_crypto_finish(struct netfs_trans *t, struct pohmelfs_sb *psb, int err) +{ + if (likely(!err)) { + struct netfs_cmd *cmd = t->iovec.iov_base; + netfs_convert_cmd(cmd); + + err = netfs_trans_finish_send(t, psb); + } + t->result = err; + netfs_trans_put(t); + + return err; +} + +void pohmelfs_crypto_thread_make_ready(struct pohmelfs_crypto_thread *th) +{ + struct pohmelfs_sb *psb = th->psb; + + th->page = NULL; + th->trans = NULL; + + mutex_lock(&psb->crypto_thread_lock); + list_move_tail(&th->thread_entry, &psb->crypto_ready_list); + mutex_unlock(&psb->crypto_thread_lock); + wake_up(&psb->wait); +} + +static int pohmelfs_crypto_thread_trans(struct pohmelfs_crypto_thread *t) +{ + struct netfs_trans *trans; + int err = 0; + + trans = t->trans; + trans->eng = NULL; + + if (t->eng.hash) { + err = pohmelfs_hash(t); + if (err) + goto out_complete; + } + + if (t->eng.cipher) { + err = pohmelfs_encrypt(t); + if (err) + goto out_complete; + trans->eng = &t->eng; + } + +out_complete: + t->page = NULL; + t->trans = NULL; + + if (!trans->eng) + pohmelfs_crypto_thread_make_ready(t); + + pohmelfs_crypto_finish(trans, t->psb, err); + return err; +} + +static int pohmelfs_crypto_thread_page(struct pohmelfs_crypto_thread *t) +{ + struct pohmelfs_crypto_engine *e = &t->eng; + struct page *page = t->page; + int err; + + WARN_ON(!PageChecked(page)); + + err = pohmelfs_crypto_process_input_data(e, e->iv, NULL, page, t->size); + if (!err) + SetPageUptodate(page); + else + SetPageError(page); + unlock_page(page); + page_cache_release(page); + + pohmelfs_crypto_thread_make_ready(t); + + return err; +} + +static int pohmelfs_crypto_thread_func(void *data) +{ + struct pohmelfs_crypto_thread *t = data; + + while (!kthread_should_stop()) { + wait_event_interruptible(t->wait, kthread_should_stop() || + t->trans || t->page); + + if (kthread_should_stop()) + break; + + if (!t->trans && !t->page) + continue; + + dprintk("%s: thread: %p, trans: %p, page: %p.\n", + __func__, t, t->trans, t->page); + + if (t->trans) + pohmelfs_crypto_thread_trans(t); + else if (t->page) + pohmelfs_crypto_thread_page(t); + } + + return 0; +} + +static void pohmelfs_sys_crypto_exit(struct pohmelfs_sb *psb) +{ + struct pohmelfs_crypto_thread *t, *tmp; + + mutex_lock(&psb->crypto_thread_lock); + list_for_each_entry_safe(t, tmp, &psb->crypto_active_list, thread_entry) { + list_del(&t->thread_entry); + + pohmelfs_sys_crypto_exit_one(t); + } + + list_for_each_entry_safe(t, tmp, &psb->crypto_ready_list, thread_entry) { + list_del(&t->thread_entry); + + pohmelfs_sys_crypto_exit_one(t); + } + mutex_unlock(&psb->crypto_thread_lock); +} + +static int pohmelfs_sys_crypto_init(struct pohmelfs_sb *psb) +{ + unsigned int i; + struct pohmelfs_crypto_thread *t; + struct pohmelfs_config *c; + struct netfs_state *st; + int err; + + list_for_each_entry(c, &psb->state_list, config_entry) { + st = &c->state; + + err = pohmelfs_crypto_engine_init(&st->eng, psb); + if (err) + goto err_out_exit; + + dprintk("%s: st: %p, eng: %p, hash: %p, cipher: %p.\n", + __func__, st, &st->eng, &st->eng.hash, &st->eng.cipher); + } + + for (i=0; i<psb->crypto_thread_num; ++i) { + err = -ENOMEM; + t = kzalloc(sizeof(struct pohmelfs_crypto_thread), GFP_KERNEL); + if (!t) + goto err_out_free_state_engines; + + init_waitqueue_head(&t->wait); + + t->psb = psb; + t->trans = NULL; + t->eng.thread = t; + + err = pohmelfs_crypto_engine_init(&t->eng, psb); + if (err) + goto err_out_free_state_engines; + + err = pohmelfs_crypto_pages_alloc(&t->eng, psb); + if (err) + goto err_out_free; + + t->thread = kthread_run(pohmelfs_crypto_thread_func, t, + "pohmelfs-crypto-%d-%d", psb->idx, i); + if (IS_ERR(t->thread)) { + err = PTR_ERR(t->thread); + t->thread = NULL; + goto err_out_free; + } + + if (t->eng.cipher) + psb->crypto_align_size = crypto_ablkcipher_blocksize(t->eng.cipher); + + mutex_lock(&psb->crypto_thread_lock); + list_add_tail(&t->thread_entry, &psb->crypto_ready_list); + mutex_unlock(&psb->crypto_thread_lock); + } + + psb->crypto_thread_num = i; + return 0; + +err_out_free: + pohmelfs_sys_crypto_exit_one(t); +err_out_free_state_engines: + list_for_each_entry(c, &psb->state_list, config_entry) { + st = &c->state; + pohmelfs_crypto_engine_exit(&st->eng); + } +err_out_exit: + pohmelfs_sys_crypto_exit(psb); + return err; +} + +void pohmelfs_crypto_exit(struct pohmelfs_sb *psb) +{ + pohmelfs_sys_crypto_exit(psb); + + kfree(psb->hash_string); + kfree(psb->cipher_string); +} + +static int pohmelfs_crypt_init_complete(struct page **pages, unsigned int page_num, + void *private, int err) +{ + struct pohmelfs_sb *psb = private; + + psb->flags = -err; + dprintk("%s: err: %d, flags: %lx.\n", __func__, err, psb->flags); + + wake_up(&psb->wait); + + return err; +} + +static int pohmelfs_crypto_init_handshake(struct pohmelfs_sb *psb) +{ + struct netfs_trans *t; + struct netfs_capabilities *cap; + struct netfs_cmd *cmd; + char *str; + int err = -ENOMEM, size; + + size = sizeof(struct netfs_capabilities) + + psb->cipher_strlen + psb->hash_strlen + 2; /* 0 bytes */ + + t = netfs_trans_alloc(psb, size, 0, 0); + if (!t) + goto err_out_exit; + + t->complete = pohmelfs_crypt_init_complete; + t->private = psb; + + cmd = netfs_trans_current(t); + cap = (struct netfs_capabilities *)(cmd + 1); + str = (char *)(cap + 1); + + cmd->cmd = NETFS_CAPABILITIES; + cmd->id = psb->idx; + cmd->size = size; + cmd->start = 0; + cmd->ext = 0; + cmd->csize = 0; + + netfs_convert_cmd(cmd); + netfs_trans_update(cmd, t, size); + + cap->hash_strlen = psb->hash_strlen; + if (cap->hash_strlen) { + sprintf(str, "%s", psb->hash_string); + str += cap->hash_strlen; + } + + cap->cipher_strlen = psb->cipher_strlen; + cap->cipher_keysize = psb->cipher_keysize; + if (cap->cipher_strlen) + sprintf(str, "%s", psb->cipher_string); + + netfs_convert_capabilities(cap); + + psb->flags = ~0; + err = netfs_trans_finish(t, psb); + if (err) + goto err_out_exit; + + err = wait_event_interruptible_timeout(psb->wait, (psb->flags != ~0), + psb->wait_on_page_timeout); + if (!err) + err = -ETIMEDOUT; + else + err = -psb->flags; + + if (!err) + psb->perform_crypto = 1; + psb->flags = 0; + + /* + * At this point NETFS_CAPABILITIES response command + * should setup superblock in a way, which is acceptible + * for both client and server, so if server refuses connection, + * it will send error in transaction response. + */ + + if (err) + goto err_out_exit; + + return 0; + +err_out_exit: + return err; +} + +int pohmelfs_crypto_init(struct pohmelfs_sb *psb) +{ + int err; + + if (!psb->cipher_string && !psb->hash_string) + return 0; + + err = pohmelfs_crypto_init_handshake(psb); + if (err) + return err; + + err = pohmelfs_sys_crypto_init(psb); + if (err) + return err; + + return 0; +} + +static int pohmelfs_crypto_thread_get(struct pohmelfs_sb *psb, + int (* action)(struct pohmelfs_crypto_thread *t, void *data), void *data) +{ + struct pohmelfs_crypto_thread *t = NULL; + int err; + + while (!t) { + err = wait_event_interruptible_timeout(psb->wait, + !list_empty(&psb->crypto_ready_list), + psb->wait_on_page_timeout); + + t = NULL; + err = 0; + mutex_lock(&psb->crypto_thread_lock); + if (!list_empty(&psb->crypto_ready_list)) { + t = list_entry(psb->crypto_ready_list.prev, + struct pohmelfs_crypto_thread, + thread_entry); + + list_move_tail(&t->thread_entry, + &psb->crypto_active_list); + + action(t, data); + wake_up(&t->wait); + + } + mutex_unlock(&psb->crypto_thread_lock); + } + + return err; +} + +static int pohmelfs_trans_crypt_action(struct pohmelfs_crypto_thread *t, void *data) +{ + struct netfs_trans *trans = data; + + netfs_trans_get(trans); + t->trans = trans; + + dprintk("%s: t: %p, gen: %u, thread: %p.\n", __func__, trans, trans->gen, t); + return 0; +} + +int pohmelfs_trans_crypt(struct netfs_trans *trans, struct pohmelfs_sb *psb) +{ + if ((!psb->hash_string && !psb->cipher_string) || !psb->perform_crypto) { + netfs_trans_get(trans); + return pohmelfs_crypto_finish(trans, psb, 0); + } + + return pohmelfs_crypto_thread_get(psb, pohmelfs_trans_crypt_action, trans); +} + +struct pohmelfs_crypto_input_action_data +{ + struct page *page; + struct pohmelfs_crypto_engine *e; + u64 iv; + unsigned int size; +}; + +static int pohmelfs_crypt_input_page_action(struct pohmelfs_crypto_thread *t, void *data) +{ + struct pohmelfs_crypto_input_action_data *act = data; + + memcpy(t->eng.data, act->e->data, t->psb->crypto_attached_size); + + t->size = act->size; + t->eng.iv = act->iv; + + t->page = act->page; + return 0; +} + +int pohmelfs_crypto_process_input_page(struct pohmelfs_crypto_engine *e, + struct page *page, unsigned int size, u64 iv) +{ + struct inode *inode = page->mapping->host; + struct pohmelfs_crypto_input_action_data act; + int err = -ENOENT; + + act.page = page; + act.e = e; + act.size = size; + act.iv = iv; + + err = pohmelfs_crypto_thread_get(POHMELFS_SB(inode->i_sb), + pohmelfs_crypt_input_page_action, &act); + if (err) + goto err_out_exit; + + return 0; + +err_out_exit: + SetPageUptodate(page); + page_cache_release(page); + + return err; +} diff --git a/fs/pohmelfs/dir.c b/fs/pohmelfs/dir.c new file mode 100644 index 0000000..9fec116 --- /dev/null +++ b/fs/pohmelfs/dir.c @@ -0,0 +1,1180 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/kernel.h> +#include <linux/fs.h> +#include <linux/jhash.h> +#include <linux/pagemap.h> + +#include "netfs.h" + +/* + * Each pohmelfs directory inode contains a tree of childrens indexed + * by offset (in the dir reading stream) and name hash and len. Entries + * of that hashes are called pohmelfs_name. + * + * This routings deal with it. + */ +static int pohmelfs_cmp_offset(struct pohmelfs_name *n, u64 offset) +{ + if (n->offset > offset) + return -1; + if (n->offset < offset) + return 1; + return 0; +} + +static struct pohmelfs_name *pohmelfs_search_offset(struct pohmelfs_inode *pi, u64 offset) +{ + struct rb_node *n = pi->offset_root.rb_node; + struct pohmelfs_name *tmp; + int cmp; + + while (n) { + tmp = rb_entry(n, struct pohmelfs_name, offset_node); + + cmp = pohmelfs_cmp_offset(tmp, offset); + if (cmp < 0) + n = n->rb_left; + else if (cmp > 0) + n = n->rb_right; + else + return tmp; + } + + return NULL; +} + +static struct pohmelfs_name *pohmelfs_insert_offset(struct pohmelfs_inode *pi, + struct pohmelfs_name *new) +{ + struct rb_node **n = &pi->offset_root.rb_node, *parent = NULL; + struct pohmelfs_name *ret = NULL, *tmp; + int cmp; + + while (*n) { + parent = *n; + + tmp = rb_entry(parent, struct pohmelfs_name, offset_node); + + cmp = pohmelfs_cmp_offset(tmp, new->offset); + if (cmp < 0) + n = &parent->rb_left; + else if (cmp > 0) + n = &parent->rb_right; + else { + ret = tmp; + break; + } + } + + if (ret) + return ret; + + rb_link_node(&new->offset_node, parent, n); + rb_insert_color(&new->offset_node, &pi->offset_root); + + pi->total_len += new->len; + + return NULL; +} + +static int pohmelfs_cmp_hash(struct pohmelfs_name *n, u32 hash, u32 len) +{ + if (n->hash > hash) + return -1; + if (n->hash < hash) + return 1; + + if (n->len > len) + return -1; + if (n->len < len) + return 1; + + return 0; +} + +static struct pohmelfs_name *pohmelfs_search_hash(struct pohmelfs_inode *pi, u32 hash, u32 len) +{ + struct rb_node *n = pi->hash_root.rb_node; + struct pohmelfs_name *tmp; + int cmp; + + while (n) { + tmp = rb_entry(n, struct pohmelfs_name, hash_node); + + cmp = pohmelfs_cmp_hash(tmp, hash, len); + if (cmp < 0) + n = n->rb_left; + else if (cmp > 0) + n = n->rb_right; + else + return tmp; + } + + return NULL; +} + +static void __pohmelfs_name_del(struct pohmelfs_inode *parent, struct pohmelfs_name *node) +{ + rb_erase(&node->offset_node, &parent->offset_root); + rb_erase(&node->hash_node, &parent->hash_root); +} + +/* + * Remove name cache entry from its caches and free it. + */ +static void pohmelfs_name_free(struct pohmelfs_inode *parent, struct pohmelfs_name *node) +{ + __pohmelfs_name_del(parent, node); + list_del(&node->sync_del_entry); + list_del(&node->sync_create_entry); + kfree(node); +} + +static struct pohmelfs_name *pohmelfs_insert_hash(struct pohmelfs_inode *pi, + struct pohmelfs_name *new) +{ + struct rb_node **n = &pi->hash_root.rb_node, *parent = NULL; + struct pohmelfs_name *ret = NULL, *tmp; + int cmp; + + while (*n) { + parent = *n; + + tmp = rb_entry(parent, struct pohmelfs_name, hash_node); + + cmp = pohmelfs_cmp_hash(tmp, new->hash, new->len); + if (cmp < 0) + n = &parent->rb_left; + else if (cmp > 0) + n = &parent->rb_right; + else { + ret = tmp; + break; + } + } + + if (ret) { + printk("%s: exist: ino: %llu, hash: %x, len: %u, data: '%s', new: ino: %llu, hash: %x, len: %u, data: '%s'.\n", + __func__, ret->ino, ret->hash, ret->len, ret->data, + new->ino, new->hash, new->len, new->data); + ret->ino = new->ino; + return ret; + } + + rb_link_node(&new->hash_node, parent, n); + rb_insert_color(&new->hash_node, &pi->hash_root); + + return NULL; +} + +/* + * Free name cache for given inode. + */ +void pohmelfs_free_names(struct pohmelfs_inode *parent) +{ + struct rb_node *rb_node; + struct pohmelfs_name *n; + + for (rb_node = rb_first(&parent->offset_root); rb_node;) { + n = rb_entry(rb_node, struct pohmelfs_name, offset_node); + rb_node = rb_next(rb_node); + + pohmelfs_name_free(parent, n); + } +} + +/* + * When name cache entry is removed (for example when object is removed), + * offset for all subsequent childrens has to be fixed to match new reality. + */ +static int pohmelfs_fix_offset(struct pohmelfs_inode *parent, struct pohmelfs_name *node) +{ + struct rb_node *rb_node; + int decr = 0; + + for (rb_node = rb_next(&node->offset_node); rb_node; rb_node = rb_next(rb_node)) { + struct pohmelfs_name *n = container_of(rb_node, struct pohmelfs_name, offset_node); + + n->offset -= node->len; + decr++; + } + + parent->total_len -= node->len; + + return decr; +} + +/* + * Fix offset and free name cache entry helper. + */ +void pohmelfs_name_del(struct pohmelfs_inode *parent, struct pohmelfs_name *node) +{ + int decr; + + decr = pohmelfs_fix_offset(parent, node); + + dprintk("%s: parent: %llu, ino: %llu, decr: %d.\n", + __func__, parent->ino, node->ino, decr); + + pohmelfs_name_free(parent, node); +} + +/* + * Insert new name cache entry into all caches (offset and name hash). + */ +static int pohmelfs_insert_name(struct pohmelfs_inode *parent, struct pohmelfs_name *n) +{ + struct pohmelfs_name *name; + + name = pohmelfs_insert_offset(parent, n); + if (name) + return -EEXIST; + + name = pohmelfs_insert_hash(parent, n); + if (name) { + parent->total_len -= n->len; + rb_erase(&n->offset_node, &parent->offset_root); + return -EEXIST; + } + + list_add_tail(&n->sync_create_entry, &parent->sync_create_list); + + return 0; +} + +/* + * Allocate new name cache entry. + */ +static struct pohmelfs_name *pohmelfs_name_clone(unsigned int len) +{ + struct pohmelfs_name *n; + + n = kzalloc(sizeof(struct pohmelfs_name) + len, GFP_KERNEL); + if (!n) + return NULL; + + INIT_LIST_HEAD(&n->sync_create_entry); + INIT_LIST_HEAD(&n->sync_del_entry); + + n->data = (char *)(n+1); + + return n; +} + +/* + * Add new name entry into directory's cache. + */ +static int pohmelfs_add_dir(struct pohmelfs_sb *psb, struct pohmelfs_inode *parent, + struct pohmelfs_inode *npi, struct qstr *str, unsigned int mode, int link) +{ + int err = -ENOMEM; + struct pohmelfs_name *n; + struct pohmelfs_path_entry *e = NULL; + + n = pohmelfs_name_clone(str->len + 1); + if (!n) + goto err_out_exit; + + n->ino = npi->ino; + n->offset = parent->total_len; + n->mode = mode; + n->len = str->len; + n->hash = str->hash; + sprintf(n->data, str->name); + + if (!(str->len == 1 && str->name[0] == '.') && + !(str->len == 2 && str->name[0] == '.' && str->name[1] == '.')) { + mutex_lock(&psb->path_lock); + e = pohmelfs_add_path_entry(psb, parent->ino, npi->ino, str, link, mode); + mutex_unlock(&psb->path_lock); + if (IS_ERR(e)) { + err = PTR_ERR(e); + goto err_out_free; + } + } + + mutex_lock(&parent->offset_lock); + err = pohmelfs_insert_name(parent, n); + mutex_unlock(&parent->offset_lock); + + if (err) { + if (err != -EEXIST) + goto err_out_remove; + kfree(n); + } + + return 0; + +err_out_remove: + if (e) { + mutex_lock(&psb->path_lock); + pohmelfs_remove_path_entry(psb, e); + mutex_unlock(&psb->path_lock); + } +err_out_free: + kfree(n); +err_out_exit: + return err; +} + +/* + * Create new inode for given parameters (name, inode info, parent). + * This does not create object on the server, it will be synced there during writeback. + */ +struct pohmelfs_inode *pohmelfs_new_inode(struct pohmelfs_sb *psb, + struct pohmelfs_inode *parent, struct qstr *str, + struct netfs_inode_info *info, int link) +{ + struct inode *new = NULL; + struct pohmelfs_inode *npi; + int err = -EEXIST; + + dprintk("%s: creating inode: parent: %llu, ino: %llu, str: %p.\n", + __func__, (parent)?parent->ino:0, info->ino, str); + + err = -ENOMEM; + new = iget_locked(psb->sb, info->ino); + if (!new) + goto err_out_exit; + + npi = POHMELFS_I(new); + npi->ino = info->ino; + err = 0; + + if (new->i_state & I_NEW) { + dprintk("%s: filling VFS inode: %lu/%llu.\n", + __func__, new->i_ino, info->ino); + pohmelfs_fill_inode(new, info); + + if (S_ISDIR(info->mode)) { + struct qstr s; + + s.name = "."; + s.len = 1; + s.hash = jhash(s.name, s.len, 0); + + err = pohmelfs_add_dir(psb, npi, npi, &s, info->mode, 0); + if (err) + goto err_out_put; + + s.name = ".."; + s.len = 2; + s.hash = jhash(s.name, s.len, 0); + + err = pohmelfs_add_dir(psb, npi, (parent)?parent:npi, &s, + (parent)?parent->vfs_inode.i_mode:npi->vfs_inode.i_mode, 0); + if (err) + goto err_out_put; + } + } + + if (str) { + if (parent) { + err = pohmelfs_add_dir(psb, parent, npi, str, info->mode, link); + + dprintk("%s: %s inserted name: '%s', new_offset: %llu, ino: %llu, parent: %llu.\n", + __func__, (err)?"unsuccessfully":"successfully", + str->name, parent->total_len, info->ino, parent->ino); + + if (err && err != -EEXIST) + goto err_out_put; + } else { + mutex_lock(&psb->path_lock); + pohmelfs_add_path_entry(psb, npi->ino, npi->ino, str, link, info->mode); + mutex_unlock(&psb->path_lock); + } + } + + if (new->i_state & I_NEW) { + if (parent) + mark_inode_dirty(&parent->vfs_inode); + mark_inode_dirty(new); + +#ifdef POHMELFS_CC_GROUP + pohmelfs_meta_command(npi, NETFS_JOIN_GROUP, 0, NULL, NULL, 0); +#endif + } + unlock_new_inode(new); + + return npi; + +err_out_put: + printk("%s: putting inode: %p, npi: %p, error: %d.\n", __func__, new, npi, err); + iput(new); +err_out_exit: + return ERR_PTR(err); +} + +static int pohmelfs_remote_sync_complete(struct page **pages, unsigned int page_num, + void *private, int err) +{ + struct pohmelfs_inode *pi = private; + struct pohmelfs_sb *psb = POHMELFS_SB(pi->vfs_inode.i_sb); + + dprintk("%s: ino: %llu, err: %d.\n", __func__, pi->ino, err); + + if (err) + pi->error = err; + wake_up(&psb->wait); + pohmelfs_put_inode(pi); + + return err; +} + +/* + * Receive directory content from the server. + * This should be only done for objects, which were not created locally, + * and which were not synced previously. + */ +static int pohmelfs_sync_remote_dir(struct pohmelfs_inode *pi) +{ + struct inode *inode = &pi->vfs_inode; + struct pohmelfs_sb *psb = POHMELFS_SB(inode->i_sb); + long ret = msecs_to_jiffies(25000); + int err; + + dprintk("%s: dir: %llu, state: %lx: created: %d, remote_synced: %d.\n", + __func__, pi->ino, pi->state, test_bit(NETFS_INODE_CREATED, &pi->state), + test_bit(NETFS_INODE_REMOTE_SYNCED, &pi->state)); + + if (!test_bit(NETFS_INODE_CREATED, &pi->state)) + return 0; + + if (test_bit(NETFS_INODE_REMOTE_SYNCED, &pi->state)) + return 0; + + if (!igrab(inode)) { + err = -ENOENT; + goto err_out_exit; + } + + err = pohmelfs_meta_command(pi, NETFS_READDIR, NETFS_TRANS_SINGLE_DST, + pohmelfs_remote_sync_complete, pi, 0); + if (err) + goto err_out_exit; + + pi->error = 0; + ret = wait_event_interruptible_timeout(psb->wait, + test_bit(NETFS_INODE_REMOTE_SYNCED, &pi->state) || pi->error, ret); + dprintk("%s: awake dir: %llu, ret: %ld, err: %d.\n", __func__, pi->ino, ret, pi->error); + if (ret <= 0) { + err = -ETIMEDOUT; + goto err_out_exit; + } + + if (pi->error) + return pi->error; + + return 0; + +err_out_exit: + clear_bit(NETFS_INODE_REMOTE_SYNCED, &pi->state); + + return err; +} + +/* + * VFS readdir callback. Syncs directory content from server if needed, + * and provide info to userspace. + */ +static int pohmelfs_readdir(struct file *file, void *dirent, filldir_t filldir) +{ + struct inode *inode = file->f_path.dentry->d_inode; + struct pohmelfs_inode *pi = POHMELFS_I(inode); + struct pohmelfs_name *n; + int err = 0, mode; + u64 len; + + dprintk("%s: parent: %llu.\n", __func__, pi->ino); + + err = pohmelfs_sync_remote_dir(pi); + if (err) + return err; + + while (1) { + mutex_lock(&pi->offset_lock); + n = pohmelfs_search_offset(pi, file->f_pos); + if (!n) { + mutex_unlock(&pi->offset_lock); + err = 0; + break; + } + + mode = (n->mode >> 12) & 15; + + dprintk("%s: offset: %llu, parent ino: %llu, name: '%s', len: %u, ino: %llu, mode: %o/%o.\n", + __func__, file->f_pos, pi->ino, n->data, n->len, + n->ino, n->mode, mode); + + len = n->len; + err = filldir(dirent, n->data, n->len, file->f_pos, n->ino, mode); + mutex_unlock(&pi->offset_lock); + + if (err < 0) { + dprintk("%s: err: %d.\n", __func__, err); + err = 0; + break; + } + + file->f_pos += len; + } + + return err; +} + +const struct file_operations pohmelfs_dir_fops = { + .read = generic_read_dir, + .readdir = pohmelfs_readdir, +}; + +/* + * Lookup single object on server. + */ +static int pohmelfs_lookup_single(struct pohmelfs_inode *parent, + struct qstr *str, u64 ino) +{ + struct pohmelfs_sb *psb = POHMELFS_SB(parent->vfs_inode.i_sb); + long ret = msecs_to_jiffies(5000); + int err; + + set_bit(NETFS_COMMAND_PENDING, &parent->state); + err = pohmelfs_meta_command_data(parent, NETFS_LOOKUP, + (char *)str->name, NETFS_TRANS_SINGLE_DST, NULL, NULL, ino); + if (err) + goto err_out_exit; + + err = 0; + ret = wait_event_interruptible_timeout(psb->wait, + !test_bit(NETFS_COMMAND_PENDING, &parent->state), ret); + if (ret == 0) + err = -ETIMEDOUT; + else if (signal_pending(current)) + err = -EINTR; + + if (err) + goto err_out_exit; + + return 0; + +err_out_exit: + clear_bit(NETFS_COMMAND_PENDING, &parent->state); + + printk("%s: failed: parent: %llu, ino: %llu, name: '%s', err: %d.\n", + __func__, parent->ino, ino, str->name, err); + + return err; +} + +/* + * VFS lookup callback. + * We first try to get inode number from local name cache, if we have one, + * then inode can be found in inode cache. If there is no inode or no object in + * local cache, try to lookup it on server. This only should be done for directories, + * which were not created locally, otherwise remote server does not know about dir at all, + * so no need to try to know that. + */ +struct dentry *pohmelfs_lookup(struct inode *dir, struct dentry *dentry, struct nameidata *nd) +{ + struct pohmelfs_inode *parent = POHMELFS_I(dir); + struct pohmelfs_name *n; + struct inode *inode = NULL; + unsigned long ino = 0; + int err; + struct qstr str = dentry->d_name; + + str.hash = jhash(dentry->d_name.name, dentry->d_name.len, 0); + + mutex_lock(&parent->offset_lock); + n = pohmelfs_search_hash(parent, str.hash, str.len); + if (n) + ino = n->ino; + mutex_unlock(&parent->offset_lock); + + dprintk("%s: 1 ino: %lu, inode:
| Tarkan Erimer | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Greg Kroah-Hartman | [PATCH 005/196] Chinese: add translation of SubmittingDrivers |
| Kamalesh Babulal | [BUILD-FAILURE] 2.6.26-rc8-mm1 - build failure at drivers/char/hvc_rtas.c |
| Luciano Rocha | usb hdd problems with 2.6.27.2 |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Arjan van de Ven | Re: [GIT]: Networking |
| Christoph Lameter | Network latency regressions from 2.6.22 to 2.6.29 |
| Gerrit Renker | [PATCH 15/37] dccp: Set per-connection CCIDs via socket options |
git: | |
