Re: [ANNOUNCE] Ceph distributed file system

From: Sage Weil
Date: Monday, November 12, 2007 - 6:51 pm

Hi everyone,

Ceph is a distributed network file system designed to provide excellent 
performance, reliability, and scalability with POSIX semantics.  I 
periodically see frustration on this list with the lack of a scalable GPL 
distributed file system with sufficiently robust replication and failure 
recovery to run on commodity hardware, and would like to think that--with 
a little love--Ceph could fill that gap.  Basic features include:

 * POSIX semantics.
 * Seamless scaling from a few nodes to many thousands.
 * Gigabytes to petabytes.
 * High availability and reliability.  No single points of failure.
 * N-way replication of all data across multiple nodes.
 * Automatic rebalancing of data on node addition/removal to efficiently 
   utilize device resources.
 * Easy deployment; most FS components are userspace daemons.
 * FUSE-based client.
 - Lightweight kernel client (in progress).

 - Flexible snapshots on arbitrary subdirectories (soon)
 - Quotas (soon)
 - Strong security (planned)

(* = current features, - = coming soon)

In contrast to cluster filesystems like GFS, OCFS2, and GPFS that rely on 
symmetric access by all clients to shared block devices, Ceph separates 
data and metadata management into independent server clusters, similar to 
Lustre.  Unlike Lustre, however, metadata and storage nodes run entirely 
in userspace and require no special kernel support.  Storage nodes utilize 
either a raw block device or large image file to store data objects, or 
can utilize an existing file system (XFS, etc.) for local object storage 
(currently with weakened safety semantics).  File data is striped across 
storage nodes in large chunks to distribute workload and facilitate high 
throughput.  When storage nodes fail, data is re-replicated in a 
distributed fashion by the storage nodes themselves (with some 
coordination from a cluster monitor), making the system extremely 
efficient and scalable.  Currently only n-way replication is supported, 
although initial ...
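
[Illustrative aside] The striping and n-way replication described above can be sketched roughly as follows.  This is a minimal sketch under stated assumptions, not Ceph's actual placement logic: the 4 MB chunk size, the node names, and the functions chunk_index() and place_replicas() are all hypothetical, and rendezvous (highest-random-weight) hashing is swapped in simply because it is a compact way to show chunks spreading over nodes and being re-replicated after a failure.

```python
import hashlib

# Hypothetical stripe unit; the announcement only says "large chunks".
CHUNK_SIZE = 4 * 1024 * 1024


def chunk_index(offset, chunk_size=CHUNK_SIZE):
    """Map a byte offset within a file to the index of its chunk."""
    return offset // chunk_size


def place_replicas(file_id, chunk, nodes, n_replicas):
    """Pick n_replicas distinct storage nodes for one chunk.

    Uses rendezvous hashing (an assumption for illustration): every node
    is scored by hashing (file, chunk, node) and the lowest-scoring
    nodes hold the replicas.
    """
    if n_replicas > len(nodes):
        raise ValueError("not enough storage nodes for requested replication")

    def score(node):
        key = f"{file_id}:{chunk}:{node}".encode()
        return hashlib.sha256(key).hexdigest()

    return sorted(nodes, key=score)[:n_replicas]


if __name__ == "__main__":
    nodes = [f"osd{i}" for i in range(8)]  # hypothetical node names
    chunk = chunk_index(10 * 1024 * 1024)  # byte offset 10 MB into the file
    before = place_replicas("file-42", chunk, nodes, 3)
    print("chunk", chunk, "replicas:", before)

    # Simulate the failure of one replica holder: the surviving replicas
    # stay where they are and exactly one new node is chosen.
    survivors = [n for n in nodes if n != before[0]]
    after = place_replicas("file-42", chunk, survivors, 3)
    print("after failure of", before[0], "replicas:", after)
```

With this scheme, losing a node changes the placement only for chunks that node held, so the copying needed to restore the replica count is spread across the remaining nodes rather than funneled through one server.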
From: Bryan Henderson
Date: Tuesday, November 13, 2007 - 10:44 am

>Ceph is a distributed network file system ...

What's distributed about it?  As described, it sounds highly centralized. 
Do you mean it is distributed across the nodes of the central server 
clusters?  Or just that it's a shared filesystem (multiple clients can use 
it at the same time)?

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems

From: Sage Weil
Date: Tuesday, November 13, 2007 - 11:50 am

Both, actually, although there isn't necessarily a strong distinction 
between the 'central server cluster' and client nodes.  For example, 
storage nodes can double as client nodes (as in a cluster doing 
distributed computation or some such thing).

The system is distributed in the sense that there is no central server 
limiting scalability.  Data storage is distributed across a large cluster 
of storage nodes (bricks or OSDs), and the metadata (namespace) is managed 
by a smaller cluster of metadata servers, allowing shared access to the 
filesystem by many clients.  Aggregate I/O throughput and capacity 
can be scaled more or less independently from metadata throughput 
(namespace manipulation) by adjusting the number of nodes devoted to each.

sage
