Re: Socket Direct Protocol: help (2)

Previous thread: hdlc_ppp: why no detach()? by Michael Barkowski on Monday, April 12, 2010 - 7:15 am. (4 messages)

Next thread: Your mailbox has exceeded one or more size limits by Liz Polakoff on Monday, April 12, 2010 - 8:05 am. (1 message)
From: Andrea Gozzelino
Date: Monday, April 12, 2010 - 7:33 am

On Apr 12, 2010 10:14 AM, Andrea Gozzelino

Hi all,

I should add that the kernel-space SDP debug output shows this error:

command line: dmesg
sdp_init_qp:95 sdp_sock( 2100:2 40720:0): recv sge's. capability: 4
needed: 9
sdp_init_qp:95 sdp_sock( 2100:2 41203:0): recv sge's. capability: 4
needed: 9
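
To spot this condition in a longer log, a minimal Python sketch can pull the two numbers out of each message. It assumes the line format shown above and that a "send" variant of the message, if one exists, is worded the same way. The function name find_sge_shortfalls is hypothetical and not part of any OFED tool.

```python
import re

# Pattern for the debug line quoted above. The message wraps across two
# lines in the dmesg output, so any whitespace (including a newline) is
# allowed between "capability: N" and "needed: M".
# Assumption: a "send" variant, if it exists, uses the same wording.
SGE_RE = re.compile(
    r"sdp_init_qp:\d+\s+sdp_sock\(\s*(?P<sock>[^)]*)\):\s*"
    r"(?P<kind>recv|send) sge's\.\s*capability:\s*(?P<cap>\d+)\s+"
    r"needed:\s*(?P<need>\d+)"
)


def find_sge_shortfalls(log_text):
    """Return (kind, capability, needed) for each SGE complaint in log_text."""
    return [
        (m.group("kind"), int(m.group("cap")), int(m.group("need")))
        for m in SGE_RE.finditer(log_text)
    ]


sample = """sdp_init_qp:95 sdp_sock( 2100:2 40720:0): recv sge's. capability: 4
needed: 9
sdp_init_qp:95 sdp_sock( 2100:2 41203:0): recv sge's. capability: 4
needed: 9
"""

for kind, cap, need in find_sge_shortfalls(sample):
    print(f"{kind}: device offers {cap} SGEs, SDP asked for {need}")
```

On a live system the text would come from the output of dmesg rather than from the sample string.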

The function sdp_init_qp() is defined in
/usr/src/ofa_kernel-1.5.1/drivers/infiniband/ulp/sdp/sdp_cma.c
(lines 76-141).

Could this be a firmware problem?
This is my setup:
command line: ethtool -i eth2
driver: iw_nes
version: 1.5.0.0
firmware-version: 3.16
bus-info: 0000:03:00.0

Thank you very much,
Andrea
Andrea Gozzelino

INFN - Laboratori Nazionali di Legnaro	(LNL)
Viale dell'Universita' 2
I-35020 - Legnaro (PD)- ITALIA
Tel: +39 049 8068346
Fax: +39 049 641925
Mail: andrea.gozzelino@lnl.infn.it			

--

From: Tung, Chien Tin
Date: Tuesday, April 13, 2010 - 9:28 am

NE020 supports 4 SGEs.  I don't know enough about SDP to know why it is
using this calculation for # of send_sge:


This is not a firmware problem.

Chien


--

From: Steve Wise
Date: Tuesday, April 13, 2010 - 9:39 am

Chien, does the NE020 support FMRs?  I looked at the nes ofed-1.5 code 
and it appears to do nothing in the map_phys_fmr functions.




--

From: Tung, Chien Tin
Date: Tuesday, April 13, 2010 - 1:02 pm

We never implemented map_phys_fmr.  Is it relevant to the # of SGEs?

Chien

--

From: Steve Wise
Date: Tuesday, April 13, 2010 - 1:20 pm

No, but SDP uses FMRs.  I don't think it will run without FMR support.


--

From: Tung, Chien Tin
Date: Tuesday, April 13, 2010 - 1:22 pm

Good to know.  Thanks.

Chien


--

From: Andrea Gozzelino
Date: Wednesday, April 14, 2010 - 1:51 am

On Apr 13, 2010 10:22 PM, "Tung, Chien Tin" <chien.tin.tung@intel.com>

Hi Steve and Chien,

I understand that NE020 cards have a problem with SDP related to
map_phys_fmr (FMR stands for Fast Memory Region).
Is it possible to fix this?
If yes, do you have an idea of how much time the code development and
build would need?
If not, can you suggest a card that supports the SDP protocol?

I have been working with NE020 cards since February 2010 for an INFN
experimental proposal called REDIGO (Read out at 10 Gbits/s), which
concerns data acquisition and data movement systems. The convergence of
storage protocols around 10 Gigabit/s Ethernet suggests that Remote
Direct Memory Access (RDMA) could be one way forward. The goals are to
investigate latency, throughput, buffer size schemes and finally the
global event building bandwidth.

Do you know whether NE020 cards have problems with librdma (RDMA
procedures in general) and/or with MPI versions?

Thank you very much,
Andrea

Andrea Gozzelino

INFN - Laboratori Nazionali di Legnaro	(LNL)
Viale dell'Universita' 2
I-35020 - Legnaro (PD)- ITALIA
Tel: +39 049 8068346
Fax: +39 049 641925
Mail: andrea.gozzelino@lnl.infn.it			

--

From: Amir Vadai
Date: Wednesday, April 14, 2010 - 7:31 am

Hi,

FMRs are used only in a special mode called ZCopy.

You could disable this mode by setting the module parameter
sdp_zcopy_thresh to 0, for example by issuing:
# echo 0 > /sys/module/ib_sdp/parameters/sdp_zcopy_thresh

This means that you won't get the benefits of Zero-copy.
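
The same check and change can be scripted. The following is a minimal Python sketch that only assumes the sysfs path quoted above. The helper names read_zcopy_thresh and disable_zcopy are hypothetical, and writing to the real file requires root and a loaded ib_sdp module.

```python
from pathlib import Path

# Path taken from the command above; it exists only while ib_sdp is loaded.
ZCOPY_PARAM = Path("/sys/module/ib_sdp/parameters/sdp_zcopy_thresh")


def read_zcopy_thresh(param=ZCOPY_PARAM):
    """Return the current threshold as an int, or None if the file is absent."""
    try:
        return int(param.read_text().strip())
    except FileNotFoundError:
        return None


def disable_zcopy(param=ZCOPY_PARAM):
    """Write 0 to the parameter file and return the value read back."""
    param.write_text("0\n")
    return read_zcopy_thresh(param)


if __name__ == "__main__":
    current = read_zcopy_thresh()
    if current is None:
        print("ib_sdp is not loaded; nothing to change")
    elif current != 0:
        print("sdp_zcopy_thresh was", current, "and is now", disable_zcopy())
    else:
        print("ZCopy is already disabled")
```

Both helpers take the path as an argument so they can be tried against an ordinary file before touching sysfs.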

- Amir


--

From: Amir Vadai
Date: Wednesday, April 14, 2010 - 7:46 am

One more thing - Please open a bug regarding the num_sge limitation at:
https://bugs.openfabrics.org/

Thanks,
Amir


--

From: Tung, Chien Tin
Date: Wednesday, April 14, 2010 - 8:01 am

Done, Bug 2027.

Chien
--

From: Steve Wise
Date: Wednesday, April 14, 2010 - 8:05 am

And 2028 opened to request fastreg support.

Steve.
--

From: Andrea Gozzelino
Date: Wednesday, April 14, 2010 - 8:14 am

Thank you very much.
I will follow the status of bugs 2027 and 2028.

Andrea




Andrea Gozzelino

INFN - Laboratori Nazionali di Legnaro	(LNL)
Viale dell'Universita' 2
I-35020 - Legnaro (PD)- ITALIA
Tel: +39 049 8068346
Fax: +39 049 641925
Mail: andrea.gozzelino@lnl.infn.it			

--

From: Tung, Chien Tin
Date: Wednesday, April 14, 2010 - 11:48 am

I am open to testing fixes for these two bugs.

Chien
--

From: Amir Vadai
Date: Wednesday, April 14, 2010 - 11:24 pm

I hope to have a fix next week for the first one.

Thanks,
Amir

--

From: Andrea Gozzelino
Date: Thursday, April 15, 2010 - 12:07 am

Hi Amir, 
Hi Chien,

I understand that bug 2027 could be fixed next week, so that I will be
able to test SDP protocol performance on NE020 cards.
Is that correct?
If so, could you point out the code changes?

Keep in touch and take care.
Regards,
Andrea


Andrea Gozzelino

INFN - Laboratori Nazionali di Legnaro	(LNL)
Viale dell'Universita' 2
I-35020 - Legnaro (PD)- ITALIA
Tel: +39 049 8068346
Fax: +39 049 641925
Mail: andrea.gozzelino@lnl.infn.it			

--

From: Amir Vadai
Date: Thursday, April 15, 2010 - 1:38 am

It should be a simple fix and I plan to do it soon. Just add yourself to
the CC list in Bugzilla so that I won't forget to notify you.

- amir

--

From: Andrea Gozzelino
Date: Tuesday, April 20, 2010 - 6:53 am

Hi Amir,

have you any news about bugs 2027 "SDP not respecting # SGEs as reported
from HW" and 2028 "SDP should support fastreg mrs"?

Once those bugs are fixed, I will test the performance of the NE020 cards
with the SDP protocol and compare SDP with TCP.

Keep in touch,

Andrea Gozzelino

INFN - Laboratori Nazionali di Legnaro	(LNL)
Viale dell'Universita' 2
I-35020 - Legnaro (PD)- ITALIA
Tel: +39 049 8068346
Fax: +39 049 641925
Mail: andrea.gozzelino@lnl.infn.it

--

From: Amir Vadai
Date: Wednesday, April 21, 2010 - 5:01 am

Hi Andrea,

I am preparing the fix right now.

- Amir


--

From: Andrea Gozzelino
Date: Friday, April 23, 2010 - 7:35 am

Hi Amir, 

have you any news about bugs 2027 and 2028 (SDP)?

Take care and keep in touch.
Thank you very much,
Andrea





--

From: Amir Vadai
Date: Sunday, April 25, 2010 - 12:35 am

I have a fix for 2027 (the number-of-SGEs issue) ready.
I'm busy with other things, but hopefully I will manage to test it and
push it today.

As for 2028 (no FMR support): I need to push a fix that disables ZCopy
only when no FMR facility is available.
I don't have time right now to add support for fast memory registration
when FMR is not available, but Steve said he'll help with implementing it.
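
The intent of the two fixes can be modelled in a few lines of Python. This is only an illustrative sketch of the policy described above, not the actual SDP patch: DeviceCaps and plan_qp are hypothetical names, the threshold value 65536 is an arbitrary placeholder, and the sketch ignores how SDP would lay out its buffers once it has fewer SGEs to work with.

```python
from dataclasses import dataclass


@dataclass
class DeviceCaps:
    """Hypothetical summary of what an RDMA device reports."""
    max_sge: int    # scatter/gather entries per work request
    has_fmr: bool   # whether FMR pools can be created


def plan_qp(needed_sge, zcopy_thresh, caps):
    """Return (sge_to_request, effective_zcopy_thresh).

    Illustrative policy only:
    - never ask for more SGEs than the device reports (bug 2027);
    - turn ZCopy off instead of rejecting the device when FMR is
      unavailable (bug 2028).
    """
    sge = min(needed_sge, caps.max_sge)
    thresh = zcopy_thresh if caps.has_fmr else 0
    return sge, thresh


# Values from this thread: NE020 reports 4 SGEs and has no FMR support,
# while SDP asked for 9. The threshold is a placeholder.
ne020 = DeviceCaps(max_sge=4, has_fmr=False)
print(plan_qp(needed_sge=9, zcopy_thresh=65536, caps=ne020))  # prints (4, 0)
```

A device that does support FMR and enough SGEs passes through unchanged, so the policy only affects adapters like the one discussed here.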

- Amir

--

From: Steve Wise
Date: Wednesday, April 14, 2010 - 7:54 am

Hey Amir,

I don't think this helps because sdp_add_device() will not add rdma 
devices that fail to create fmr pools. 

So I guess you could key off of fmr pool failures and set 
sdp_zcopy_thresh to 0 and allow the device to be used?

But what we really need is sdp support for fastreg_mrs  as an 
alternative to fmrs.


Steve.



--

From: Amir Vadai
Date: Wednesday, April 14, 2010 - 8:03 am

You are right - I missed it.

Andrea, please open a bug in Bugzilla (https://bugs.openfabrics.org)
so that you will be notified as soon as I fix SDP to not use FMR when it
is not supported.

As to fastreg_mrs support - I don't know this mechanism. Do you mean FRWR?

Thanks,
Amir


--

From: Steve Wise
Date: Wednesday, April 14, 2010 - 8:08 am

ib_alloc_fast_reg_mr(), ib_alloc_fast_reg_page_list() and friends, plus 

--

From: Amir Vadai
Date: Wednesday, April 14, 2010 - 8:16 am

ok - actually I used it in an early version of SDP before changing to FMR...

- Amir

--

From: Tung, Chien Tin
Date: Wednesday, April 14, 2010 - 12:24 pm

If you run into any issues, please email me.  NE020 supports Mvapich,
Mvapich2, OpenMPI, Intel MPI and HP MPI.

Chien
--
