On Tue, 03 Jun 2008 11:53:42 -0500
Tom Tucker <tom@opengridcomputing.com> wrote:
I confess I didn't think hard about the RDMA case here (and haven't
been paying as much attention as I probably should to the design of
it). So take my thoughts with a large chunk of salt...
On a NUMA box, the pages have to live _somewhere_ and some CPUs will be
closer to them than others. If we want the post-RDMA_READ processing to
happen on a CPU close to that memory, then we don't have much choice
but to restrict that processing to CPUs that are close to it.
Assuming that this post-processing is done by nfsd, I suppose we'd need
to tag the post-RDMA_READ RPC with a poolid or something and make sure
that only nfsds running on CPUs close to the memory pick it up. Perhaps
there could be a per-pool queue for these RPCs or something...
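A rough userspace sketch of the per-pool queue idea, just to make the
shape concrete. Every name here (pending_rpc, pool_queue, NPOOLS,
pool_enqueue, pool_dequeue) is made up for illustration and is not
taken from the sunrpc code, and locking is left out entirely:

```c
#include <stddef.h>

#define NPOOLS 4	/* hypothetical: e.g. one pool per NUMA node */

/* Hypothetical stand-in for a deferred post-RDMA_READ request. */
struct pending_rpc {
	unsigned int poolid;		/* pool whose CPUs are close to the pages */
	struct pending_rpc *next;
};

/* One FIFO per pool. A real version would need per-pool locking. */
struct pool_queue {
	struct pending_rpc *head;
	struct pending_rpc *tail;
};

static struct pool_queue pool_queues[NPOOLS];

/* Queue the request on the pool named by its tag; -1 on a bad poolid. */
static int pool_enqueue(struct pending_rpc *rpc)
{
	struct pool_queue *q;

	if (rpc->poolid >= NPOOLS)
		return -1;
	q = &pool_queues[rpc->poolid];
	rpc->next = NULL;
	if (q->tail)
		q->tail->next = rpc;
	else
		q->head = rpc;
	q->tail = rpc;
	return 0;
}

/* Called by a worker bound to 'poolid'; it never looks at other pools. */
static struct pending_rpc *pool_dequeue(unsigned int poolid)
{
	struct pool_queue *q;
	struct pending_rpc *rpc;

	if (poolid >= NPOOLS)
		return NULL;
	q = &pool_queues[poolid];
	rpc = q->head;
	if (!rpc)
		return NULL;
	q->head = rpc->next;
	if (!q->head)
		q->tail = NULL;
	rpc->next = NULL;
	return rpc;
}
```

The point of the sketch is only that a request tagged for pool 1 stays
invisible to a worker in pool 0, however idle that worker is. That is
exactly the behaviour whose throughput cost is in question.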
Either way, the big question is whether that will be a net win or loss
for throughput. I.e., are we better off waiting for the right nfsd to
become available or allowing the first nfsd that becomes available to
make the crosscalls needed to do the RPC? It's hard to say...
In the near term, I doubt this patchset will harm the RDMA case. After
all, the distribution of memory allocations is pretty lumpy now. On
a NUMA box with RDMA you're probably doing a lot of crosscalls with
the current code.
--
Jeff Layton <jlayton@redhat.com>