The 2.6.26 merge window will open soon, so it's time to review what my
plans are for when it opens.
As usual, patch review by non-me people is always welcome.
Anyway, here are all the pending things that I'm aware of. As usual,
if something isn't already in my tree and isn't listed below, I
probably missed it or dropped it by mistake. Please remind me again
in that case.
Core:
- I did a bunch of cleanups all over drivers/infiniband and the
gcc and sparse warning noise is down to a pretty reasonable level.
Further cleanups welcome of course.
ULPs:
- I merged Eli's IPoIB stateless offload changes for checksum
offload and LSO. The interrupt moderation changes are
next, and should not be a problem to merge. Please test IPoIB
on all sorts of hardware!
- Shirley's IPoIB 4 KB MTU changes. I expect these to make it in,
although I would certainly appreciate review from Eli or anyone else.
HW specific:
- Vlad's mlx4 resize CQ support. Looks basically OK, so I think we
should be able to get it in.
- ipath support for 7220 HCAs. I don't expect any issues here once
the patches appear.
Here are a few topics that I believe will not be ready in time for the
2.6.26 window and will need to wait for 2.6.27 at least:
- XRC. I still don't have a good feeling that we have settled on all
the nuances of the ABI we want to expose to userspace for this, and
ideally I would like to understand how ehca LL QPs fit into the
picture as well.
- Remove LLTX from IPoIB. I haven't had time to finish this yet, so
I guess it will probably wait for 2.6.27 now...
- Multiple CQ event vector support. I still haven't seen any
discussions about how ULPs or userspace apps should decide which
vector to use, and hence no progress has been made since we
deferred this during the 2.6.23 merge window.
Here are all the patches I already have in my for-2.6.26 branch:
Arthur Jones (4):
IB/ipath: Fix sparse warning about ...

I did some prototyping for IPoIB to enable multiple CQ event vector support. I did see that the approach improved performance when aggregating multiple links. I also see some customer requirements for this in userspace. I will start the discussion as soon as possible, but it will most likely miss the 2.6.26 window.

Thanks
Shirley --
What's the status of RDS? Thanks Shirley --
> What's the status of RDS?

I've never seen any patches. I guess ask the RDS guys if/when they want to start working on getting RDS merged.

- R. --
What is the work we need to do here - I was thinking RDS should just work ? --
> What is the work we need to do here - I was thinking RDS should just work ?

Stuff doesn't get merged into the kernel on its own. If you want RDS upstream then the first step is to post patches in a form suitable for reviewing. Then respond to the review comments.

The files Documentation/SubmittingPatches and to some extent Documentation/SubmittingDrivers in the kernel source have more info.

- R. --
Yes, I see this is for pushing RDS upstream - but what about running RDS as is over IWARP NICs - that should just work right ? --
> Yes, I see this is for pushing RDS upstream - but what about running
> RDS as is over IWARP NICs - that should just work right ?

No idea. It depends on whether you took into account the differences between IB and iWARP.

Anyway that's not really what this thread was about. --
WRT merging RDS into the kernel - our current plan is to wait until RDS is adopted by more than just Oracle before approaching the kernel community about inclusion of RDS. --
I've seen statements before from someone from Oracle that RDS was only for Oracle's use; for example, that person did not want netperf changed to support RDS.

Scott Weitzenkamp
SQA and Release Manager
Data Center Access Engineering
Cisco Systems --
I believe there is a patch for NetPerf which supports RDS - although it may need to be updated - and submitted. The only prior discussion I can think of - was whether or not NetPerf exercises RDS as Oracle would. I'm not proposing that we should enhance NetPerf to do that (but that's OK with me). We created a tool rds-stress which does that. --
Rich,

On Nov 1, 2007, you wrote this to rds-devel:

"Netperf is too simplistic in that all it seems to do is stream data in a simple loop. This is not how Oracle uses the IPC and again does not reflect what it would take to make UDP reliable. For this reason we are not interested in having Netperf support RDS and or seeing Netperf data."

I would like to see RDS supported by existing common tools like netperf, iperf, etc. so we can easily compare how RDS performs to UDP for IPC models other than Oracle.

Scott Weitzenkamp
SQA and Release Manager
Data Center Access Engineering --
OK - and the conversation was about using NetPerf to compare performance of RDS to UDP relative to suitability for Oracle use ... so I think those statements still illustrate my points...

1) NetPerf does not do what Oracle does - and hence is not useful from Oracle's perspective in comparing ULPs.

2) For some metrics - it's not valid to compare a non-reliable IPC to a reliable IPC - it's not an apples to apples comparison. Especially when the app is considered and what the app must do to use UDP vs RDS.

I did not say that NetPerf should not be extended to support RDS - just that using it to do a comparison of ULPs to determine how well Oracle would run - is not what we (Oracle) would want - at least that was my intention.. --
I'd like to see netperf comparisons of UDP_STREAM/UDP_RR vs RDS_STREAM/RDS_RR. Does anyone have a patch that will apply cleanly to a recent netperf?

Scott Weitzenkamp
SQA and Release Manager
Data Center Access Engineering --
We want to add send with invalidate & masked compare and swap. Eli will be able to send the patches next week, and since they are small I think they can be in for 2.6.26.

What about the split CQ for UD mode? It's improved the IPoIB performance for small messages significantly.

mlx4 - we plan to send patches for the low level driver only to enable mlx4_en. These only affect our low level driver.

I think we should try to push for XRC in 2.6.26 since there are already MPI implementations that use it, and this ties them to using OFED only. Also, this feature is stable and is now being defined in IBTA. Not taking it causes differences between OFED and the kernel and your libibverbs, and we wish to avoid such gaps. Is there anything we can do to help it make it into 2.6.26? --
> We want to add send with invalidate & masked compare and swap.
> Eli will be able to send the patches next week, and since they are
> small I think they can be in for 2.6.26.

Send with invalidate should be OK. Let's see about the masked atomics stuff -- we have a ton of new verbs and I think we might want to slow down and make sure it all makes sense.

> What about the split CQ for UD mode? It's improved the IPoIB
> performance for small messages significantly.

Oh yeah... I'll try to get that in too.

> mlx4 - we plan to send patches for the low level driver only to enable
> mlx4_en. These only affect our low level driver.

No problem in principle, let's see the actual patches.

> I think we should try to push for XRC in 2.6.26 since there are
> already MPI implementations that use it, and this ties them to using
> OFED only.
> Also, this feature is stable and is now being defined in IBTA.
> Not taking it causes differences between OFED and the kernel and your
> libibverbs, and we wish to avoid such gaps.
> Is there anything we can do to help it make it into 2.6.26?

I don't have a good feeling that the user-kernel interface is well thought out, so I want to consider XRC + ehca LL stuff + new iWARP verbs and make sure we have something that makes sense for the future.

- R. --
I see - but can't we figure this all out for the 2.6.26 window?

Tziporet --
> We want to add send with invalidate & masked compare and swap.
> Eli will be able to send the patches next week, and since they are
> small I think they can be in for 2.6.26.

We are very interested in these new operations and are moving in the direction of tightly integrating RDMA along with atomics (if available) into Oracle. We plan on testing some early prototypes of these in the next few months.

Send with invalidate is an exact match for our current RDS V3 rdma driver - and should be more efficient than the current background syncing of the tpt to ensure keys are invalidated.

We intend to expose the atomics via the RDS driver, along with simple low level rdma operations, to Oracle's internal clients. If Oracle is running over a transport which exports atomics and rdma - Oracle will see a dramatic performance boost for several database operations. --
> We are very interested in these new operations and are moving in the
> direction of tightly integrating RDMA along with atomics (if
> available) into Oracle. We plan on testing some early prototypes of
> these in the next few months.

And you need the ConnectX-only masked atomics? Or do the standard IB atomic operations work for you? Of course using atomics at all means that things don't work on iWARP.

> Send with invalidate is an exact match for our current RDS V3 rdma
> driver - and should be more efficient than the current background
> syncing of the tpt to ensure keys are invalidated.

How does send with invalidate interact with the current IB FMR stuff? Seems that you would run into trouble keeping the state of the FMR straight if the remote side is invalidating them.

Also I would think that send-with-invalidate would be much more expensive than the current FMR method of batching up the invalidates, since you don't get to amortize the cost of syncing up all the internal HCA state.

- R. --
We specifically asked for the masked operations. Yes, this means Oracle will not get the performance boost of atomics on IWARP - but we still get rdma - and that's a real win / benefit for

The model we implement is based on "use once" keys - we issue the key to the rdma server and want to toss it as soon as the rdma is complete. Today, we explicitly free the key after the rdma completes and we get a message from the rdma server - saying rdma is complete. If the key is auto invalidated by the recv'ing HCA then we do not need to do it in the driver... which also means we do not need to issue the sync tpts to force the HCA to update its cache.

This is the one piece we do not know - our plans are to test this and see where the trade offs are. We will keep the current design / --
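The two "use once" key lifecycles described above can be sketched as a small toy model in Python. This is only an illustration under stated assumptions: `ToyHCA`, `issue_key`, `explicit_free`, and `receive_send` are hypothetical names invented here, not RDS or verbs APIs. The model only tracks which keys are still valid and how many explicit invalidations the driver has to perform; it says nothing about the real cost of either path, which is the open question in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHCA:
    """Toy stand-in for an HCA key table (hypothetical, not a real API)."""
    valid_keys: set = field(default_factory=set)
    next_key: int = 1
    explicit_invalidates: int = 0  # driver-initiated invalidations

    def issue_key(self) -> int:
        # Hand out a fresh "use once" key for a single RDMA operation.
        key = self.next_key
        self.next_key += 1
        self.valid_keys.add(key)
        return key

    def explicit_free(self, key: int) -> None:
        # Current design: the driver frees the key itself after it is
        # told that the RDMA has completed.
        self.valid_keys.discard(key)
        self.explicit_invalidates += 1

    def receive_send(self, invalidate_key=None) -> None:
        # A plain send carries no key. A send with invalidate names a key,
        # and the receiving HCA drops it as part of receiving the message.
        if invalidate_key is not None:
            self.valid_keys.discard(invalidate_key)


def current_flow(hca: ToyHCA) -> int:
    key = hca.issue_key()
    hca.receive_send()          # "rdma complete" message, no invalidate
    hca.explicit_free(key)      # driver must invalidate the key itself
    return key


def send_with_invalidate_flow(hca: ToyHCA) -> int:
    key = hca.issue_key()
    hca.receive_send(invalidate_key=key)  # key dropped on receive
    return key
```

In both flows the key ends up invalid once the completion message has arrived; the difference the model shows is only that the second flow needs no driver-side invalidation step.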
On Wed, Apr 2, 2008 at 3:31 PM, Tziporet Koren

Does send with invalidate apply to rkeys generated through the proprietary FMR API? If not, what usage do you envision for the new verb with today's IB devices?

Or. --
