On Apr 12, 2010 10:14 AM, Andrea Gozzelino
Hi all,
I add that in kernel space the SDP debug error is:
command line: dmesg
sdp_init_qp:95 sdp_sock( 2100:2 40720:0): recv sge's. capability: 4 needed: 9
sdp_init_qp:95 sdp_sock( 2100:2 41203:0): recv sge's. capability: 4 needed: 9
The function sdp_init_qp() is defined in /usr/src/ofa_kernel-1.5.1/drivers/infiniband/ulp/sdp/sdp_cma.c (lines 76 - 141).
Could this be a firmware problem? This is my setup:
command line: ethtool -i eth2
driver: iw_nes
version: 1.5.0.0
firmware-version: 3.16
bus-info: 0000:03:00.0
Thank you very much,
Andrea

Andrea Gozzelino
INFN - Laboratori Nazionali di Legnaro (LNL)
Viale dell'Universita' 2
I-35020 - Legnaro (PD)- ITALIA
Tel: +39 049 8068346
Fax: +39 049 641925
Mail: andrea.gozzelino@lnl.infn.it
--
NE020 supports 4 SGEs. I don't know enough about SDP to know why it is using this calculation for # of send_sge: This is not a firmware problem. Chien --
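The dmesg output reports that SDP asked for 9 receive SGEs while the device advertises 4. As a minimal sketch of the general idea behind respecting the hardware limit, a consumer could clamp its request to what the device reports instead of failing. This is plain user-space C for illustration only: `struct dev_caps` and `clamp_sge()` are hypothetical names, not taken from sdp_cma.c or the nes driver, and only the numbers 4 and 9 come from the log.

```c
/* Hypothetical stand-in for the limits a device reports (not a real kernel struct). */
struct dev_caps {
    int max_sge; /* scatter/gather entries supported per work request */
};

/*
 * Return the number of SGEs to request: what the protocol would like,
 * limited to what the hardware says it supports.
 * Returns -1 if the device reports no usable SGEs at all.
 */
static int clamp_sge(const struct dev_caps *caps, int needed)
{
    if (caps->max_sge < 1)
        return -1;
    return needed < caps->max_sge ? needed : caps->max_sge;
}
```

With a device reporting 4 SGEs, a request for 9 would be reduced to 4; whether SDP can actually operate with fewer SGEs per work request is a separate question that the sketch does not answer.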
Chien, does the NE020 support FMRs? I looked at the nes ofed-1.5 code and it appears to do nothing in the map_phys_fmr functions. --
We never implemented map_phys_fmr. Is it relevant to the # of SGEs? Chien --
No, but SDP uses FMRs. I don't think it will run without FMR support. --
Good to know. Thanks. Chien --
On Apr 13, 2010 10:22 PM, "Tung, Chien Tin" <chien.tin.tung@intel.com>
Hi Steve and Chien,
I understand that NE020 cards have a problem with SDP related to map_phys_fmr (FMR stands for Fast Memory Region). Is it possible to fix this? If yes, do you have an idea of how much time the code development and build would take? If not, can you suggest a card that supports the SDP protocol?
I have been working on NE020 cards since February 2010 for an INFN experimental proposal called REDIGO (Read out at 10 Gbits/s), which concerns data acquisition and movement systems. The convergence of storage protocols around 10 Gigabits/s Ethernet suggests that one way forward could be Remote Direct Memory Access (RDMA). The goals are to investigate latency, throughput, buffer size schemes and, finally, the global event building bandwidth.
Do you know whether NE020 cards have problems with librdma (RDMA procedures in general) and/or with MPI versions?
Thank you very much,
Andrea --
Hi, FMRs are used only in a special mode called ZCopy. You can disable this mode by setting the module parameter sdp_zcopy_thresh to 0, or by issuing:
# echo 0 > /sys/module/ib_sdp/parameters/sdp_zcopy_thresh
This means that you won't get the benefits of zero-copy. - Amir --
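To make the effect of that parameter concrete, here is a small sketch of how a size threshold with an "off" value of 0 is typically evaluated. The semantics are an assumption based on the parameter name and on the statement that 0 disables ZCopy: sends at or above the threshold would take the zero-copy path, and 0 turns it off entirely. `use_zcopy()` is a hypothetical helper, not a function from the SDP module.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumed semantics (not confirmed against the SDP source):
 *   zcopy_thresh == 0  -> zero-copy is disabled for every send
 *   zcopy_thresh  > 0  -> sends of at least zcopy_thresh bytes use zero-copy
 */
static bool use_zcopy(size_t zcopy_thresh, size_t msg_len)
{
    return zcopy_thresh != 0 && msg_len >= zcopy_thresh;
}
```

Under this reading, writing 0 to the sysfs file means no send ever reaches the FMR-based path, which is why it works around missing FMR support at the cost of the extra copy.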
One more thing - Please open a bug regarding the num_sge limitation at: https://bugs.openfabrics.org/ Thanks, Amir --
Done, Bug 2027. Chien --
And 2028 opened to request fastreg support. Steve. --
Thank you very much. I will check the status of bugs 2027 and 2028. Andrea --
I am open to test fixes for these two bugs. Chien --
I hope to have a fix next week for the first one. Thanks, Amir --
Hi Amir, Hi Chien, I understand that bug 2027 could be solved next week, so that I will be able to test SDP protocol performance on NE020 cards. Is that correct? If yes, could you point out the code changes? Keep in touch and take care. Regards, Andrea --
It should be a simple fix and I plan to do it soon - just add yourself as CC in bugzilla - that way I won't forget to notify you. - amir --
Hi Amir, do you have any news about bugs 2027 "SDP not respecting # SGEs as reported from HW" and 2028 "SDP should support fastreg mrs"? When those bugs are fixed, I will test the NE020 cards' performance with the SDP protocol and compare SDP and TCP. Keep in touch, Andrea Gozzelino --
Hi Andrea, I am preparing the fix right now. - Amir --
Hi Amir, do you have any news about bugs 2027 and 2028 (SDP)? Take care and keep in touch. Thank you very much, Andrea --
Hi Amir, do you have any news about bugs 2027 and 2028 (SDP)? Take care and keep in touch. Thank you very much, Andrea --
I have a fix for 2027 (the number of SGEs issue) ready. I'm busy with other things, but hopefully I will manage to test it and push it today. As to 2028 (no FMR support) - I pushed a fix that only disables ZCopy when there is no FMR facility. I don't have time right now to add support for fast memory registration when FMR is not available, but Steve said he'll help with implementing it. - Amir --
Hi Amir, do you have any news about bugs 2027 and 2028 (SDP)? Take care and keep in touch. Thank you very much, Andrea --
Hey Amir, I don't think this helps because sdp_add_device() will not add rdma devices that fail to create fmr pools. So I guess you could key off of fmr pool failures and set sdp_zcopy_thresh to 0 and allow the device to be used? But what we really need is sdp support for fastreg_mrs as an alternative to fmrs. Steve. --
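The fallback suggested here (key off the FMR pool failure, turn zero-copy off, and still accept the device) can be sketched as follows. This is a user-space illustration under stated assumptions: `struct sketch_dev` and `sketch_add_device()` are hypothetical and do not mirror the real sdp_add_device(), and the sketch keeps a per-device threshold for clarity even though sdp_zcopy_thresh in the thread is a module parameter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; not the real SDP structures. */
struct sketch_dev {
    bool fmr_pool_ok;    /* did FMR pool creation succeed? */
    size_t zcopy_thresh; /* 0 means zero-copy is off for this device */
    bool usable;         /* device accepted for SDP connections */
};

/*
 * Accept the device whether or not an FMR pool could be created.
 * Without FMRs, force the zero-copy threshold to 0 so the FMR-based
 * path is never taken; with FMRs, keep the caller's default threshold.
 */
static void sketch_add_device(struct sketch_dev *dev,
                              bool fmr_pool_created,
                              size_t default_thresh)
{
    dev->fmr_pool_ok = fmr_pool_created;
    dev->zcopy_thresh = fmr_pool_created ? default_thresh : 0;
    dev->usable = true;
}
```

This only covers the "run without zero-copy" half of the message; using fastreg MRs as an alternative registration path is the larger change tracked separately.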
You are right - I missed it. Andrea, please open a bug at bugzilla (https://bugs.openfabrics.org) so that you will be notified as soon as I fix SDP to not use FMR when it is not supported. As to fastreg_mrs support - I don't know this mechanism. Do you mean FRWR? Thanks, Amir --
ib_alloc_fast_reg_mr(), ib_alloc_fast_reg_page_list() and friends, plus --
ok - actually I used it in an early version of SDP before changing to FMR... - Amir --
If you run into any issues, please email me. NE020 supports Mvapich, Mvapich2, OpenMPI, Intel MPI and HP MPI. Chien --
