Re: Distributed storage. Move away from char device ioctls.

To: Robin Humble <rjh@...>
Cc: Jeff Garzik <jeff@...>, Evgeniy Polyakov <johnpol@...>, <netdev@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Saturday, September 15, 2007 - 1:51 pm

On Sep 15, 2007  12:20 -0400, Robin Humble wrote:

I have to agree - while Lustre CAN scale up to huge servers and fat pipes,
it can definitely also scale down (which is a LOT easier to do :-).  I can
run a client + MDS + 5 OSTs in a single UML instance using loop devices
for testing w/o problems.


That is definitely true, and there are a number of users who run in
this mode.  We're also working to make Lustre handle the replication
internally (RAID5/6+ at the OST level) so you wouldn't need any kind of
block-level redundancy at all.  I suspect some sites may still use RAID5/6
back-ends anyway, to avoid the performance loss of taking out a whole OST
due to a single disk failure, but that would definitely not be required.


It's definitely true, and we are always working on improving it.  In the
past, one of the reasons we DIDN'T want to go into mainline was that
it would restrict our ability to make network protocol changes.  Because
our install base is large enough, and many of the large sites have multiple
supercomputers mounting multiple global filesystems, we aren't at liberty
to change the network protocol at will anymore.  That said, we also have
network protocol versioning that is akin to the ext3 COMPAT/INCOMPAT
feature flags, so we are able to add/change features without breaking
old clients.
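
To illustrate the general idea behind COMPAT/INCOMPAT-style versioning,
here is a minimal sketch of the ext3 rule applied to a connection check.
It is not Lustre's actual protocol code: the flag names, values, and the
can_connect() helper are all hypothetical, made up for this example.  The
rule shown is that a peer may ignore COMPAT bits it does not recognise,
but must refuse to proceed if it sees an INCOMPAT bit it does not
understand.

```python
# Hypothetical feature bits, for illustration only (not Lustre's real flags).
COMPAT_EXTRA_STATS = 0x1       # safe for a peer to ignore
INCOMPAT_NEW_RPC_FORMAT = 0x1  # a peer must understand this to interoperate

# What an "old" client in this sketch understands: no INCOMPAT features.
OLD_CLIENT_INCOMPAT = 0x0


def can_connect(peer_compat, peer_incompat, my_incompat=OLD_CLIENT_INCOMPAT):
    """Apply the ext3-style rule to a peer's advertised feature flags.

    peer_compat is deliberately not checked: unknown COMPAT bits are
    harmless by definition.  Any INCOMPAT bit we do not support means
    we must refuse the connection.
    """
    unknown_incompat = peer_incompat & ~my_incompat
    return unknown_incompat == 0


# A server that only adds COMPAT features keeps old clients working.
print(can_connect(COMPAT_EXTRA_STATS, 0))
# A server that requires a new INCOMPAT feature turns old clients away.
print(can_connect(0, INCOMPAT_NEW_RPC_FORMAT))
```

Under this assumed scheme, new features that old clients can safely
ignore go in the COMPAT set, and only changes that would genuinely break
an old peer are marked INCOMPAT.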


That's partly true - Lustre has its own RDMA RPC mechanism, but it does
not need kernel patches anymore (there was too much resistance to the
zero-copy callback, so we removed it and do this at the protocol level).
We are now also able to run a client filesystem that doesn't require any
kernel patches, since we've given up on trying to get the intents and
raw operations into the VFS, and have worked out other ways to improve
the performance to compensate.  Likewise with parallel directory operations.

It's a bit sad, in a way, because these are features that other filesystems
(especially network fs) could have benefitted from also.


This is also true - when that is done the only parts that will remain
in the kernel are the network drivers.  With some network stacks there
is even direct userspace acceleration.  We'll use RDMA and direct IO to
avoid doing any user<->kernel data copies.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
