Re: [Fwd: Re: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition]]

From: Celine Bourde
Date: Thursday, April 23, 2009 - 1:10 am

Hi,

I've updated the nfs-utils package:
[root@my_host ~]# mount.nfs -V

The problem is exactly the same without rdma:

[root@my_host ~]# strace  mount.nfs 192.168.0.215:/vol0 /mnt/ -o rw,port=2050
[..]
socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
connect(3, {sa_family=AF_INET, sin_port=htons(997), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
fcntl(3, F_SETFL, O_RDWR)               = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0o\177\0\0\1\0\0\0\0\0\0\0\0"}, 16) = 0
sendto(3, "\0308\310\272\0\0\0\0\0\0\0\2\0\1\206\270\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0"..., 40, 0, {sa_family=AF_INET, sin_port=htons(997), sin_addr=inet_addr("127.0.0.1")}, 16) = 40
poll([{fd=3, events=POLLIN}], 1, 3000)  = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "\0308\310\272\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 400, MSG_DONTWAIT, {sa_family=AF_INET, sin_port=htons(997), sin_addr=inet_addr("127.0.0.1")}, [16]) = 24
close(3)                                = 0
mount("192.168.0.215:/vol0", "/mnt", "nfs", 0, "port=2050,addr=192.168.0.215"

... and it blocks!

[root@my_host ~]# dmesg
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16

I can't establish an RDMA connection; the following errors occur with rping:

[root@my_host ~]# rping -c 192.168.0.214
cq completion failed status 5
wait for CONNECTED state 10
connect error -1
cma event RDMA_CM_EVENT_REJECTED, error 8

My kernel is a 2.6.27 kernel.org build on a Red Hat distribution: Red Hat Enterprise Linux Server release 5.3 Beta (Tikanga).


Céline.

_______________________________________________
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To ...
From: Celine Bourde
Date: Thursday, April 23, 2009 - 3:48 am

There was a mistake in my last email.
My last NFS mount test without RDMA was not correct. I've retried with:

[root@my_host ]# mount -o rw 192.168.0.215:/vol0 /mnt/

The NFS partition mounts and everything works, which confirms that
the problem comes from the RDMA connection manager.

Céline.



From: Steve Wise
Date: Thursday, April 23, 2009 - 7:14 am

Can you run rping or one of the perf programs over the IB link?  This 
will confirm that your IB setup works.


From: Steve Wise
Date: Thursday, April 23, 2009 - 7:11 am

You cannot use port 2050 for tcp mounts.  So remove the 'port=2050' and 
it will attempt a tcp mount to port 2049.

Steve.
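
A minimal sketch of the point above: the transport decides whether a port option belongs on the command line. The helper name `nfs_mount_cmd` is made up for illustration. It only prints the command, and it assumes nothing beyond what this thread shows: `rdma,port=2050` for the RDMA mount and no port option for TCP, so that the client falls back to 2049.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the mount command for a transport.
#   rdma -> explicit RDMA port as used in this thread (2050)
#   tcp  -> no port option, so the client uses the default port 2049
nfs_mount_cmd() {
    transport=$1; export_path=$2; mountpoint=$3
    case "$transport" in
        rdma) echo "mount -o rdma,port=2050 $export_path $mountpoint" ;;
        tcp)  echo "mount -o rw $export_path $mountpoint" ;;
        *)    echo "unknown transport: $transport" >&2; return 1 ;;
    esac
}

nfs_mount_cmd tcp  192.168.0.215:/vol0 /mnt/
nfs_mount_cmd rdma 192.168.0.215:/vol0 /mnt/
```

Printing rather than executing keeps the sketch safe to run on a machine without the export.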

From: Celine Bourde
Date: Friday, April 24, 2009 - 4:13 am

Hi Steve,

This email summarizes the situation:

Standard mount -> OK
---------------------

[root@twind ~]# mount -o rw 192.168.0.215:/vol0 /mnt/
Command works fine.

rdma mount -> KO
-----------------

[root@twind ~]# mount -o rdma,port=2050 192.168.0.215:/vol0 /mnt/
The command blocks! I have to press Ctrl+C to kill the process.

or

[root@twind ofa_kernel-1.4.1]# strace mount.nfs 192.168.0.215:/vol0 /mnt/ -o rdma,port=2050
[..]
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
connect(3, {sa_family=AF_INET, sin_port=htons(610), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
fcntl(3, F_SETFL, O_RDWR)               = 0
sendto(3, "-3\245\357\0\0\0\0\0\0\0\2\0\1\206\270\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0"..., 40, 0, {sa_family=AF_INET, sin_port=htons(610), sin_addr=inet_addr("127.0.0.1")}, 16) = 40
poll([{fd=3, events=POLLIN}], 1, 3000)  = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "-3\245\357\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 8800, MSG_DONTWAIT, {sa_family=AF_INET, sin_port=htons(610), sin_addr=inet_addr("127.0.0.1")}, [16]) = 24
close(3)                                = 0
mount("192.168.0.215:/vol0", "/mnt", "nfs", 0, "rdma,port=2050,addr=192.168.0.215"
..same problem

[root@twind tmp]# dmesg
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)


Rdma cm tests
-------------

* With the ib_rdma_bw tool:

[root@twing ~]# ib_rdma_bw -c
4960: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=1 |
4960: Local address:  LID 0000, QPN 000000, PSN 0x24cafe RKey 0x18002400 VAddr ...
From: Steve Wise
Date: Friday, April 24, 2009 - 7:40 am

Hey Celine,

Thanks for gathering all this info!  So the rdma connections work fine 
with everything _but_ nfsrdma.  And errno 103 indicates the connection 
was aborted, maybe by the server (since no failures are logged by the 
client).
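
To make the "closed (-103)" lines easier to read at a glance, here is a rough sketch that counts rpcrdma close events per peer and error code. The function names are invented for this example, and the errno table is a hand-picked subset of the Linux numbering (103 is ECONNABORTED there), not a complete list.

```shell
#!/bin/sh
# Map a few Linux errno values to their symbolic names.
# Only some connection-related codes are listed; extend as needed.
errno_name() {
    case "${1#-}" in            # strip the leading minus the kernel prints
        22)  echo EINVAL ;;
        103) echo ECONNABORTED ;;
        104) echo ECONNRESET ;;
        110) echo ETIMEDOUT ;;
        111) echo ECONNREFUSED ;;
        *)   echo "UNKNOWN($1)" ;;
    esac
}

# Read kernel log text on stdin and summarise rpcrdma close events.
summarise_rpcrdma_closes() {
    sed -n 's/.*rpcrdma: connection to \([^ ]*\) closed (\(-\{0,1\}[0-9]*\)).*/\1 \2/p' |
    sort | uniq -c |
    while read -r count peer code; do
        echo "$peer closed $count time(s) with $code ($(errno_name "$code"))"
    done
}

# Demo on two lines copied from the dmesg output in this thread.
summarise_rpcrdma_closes <<'EOF'
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
EOF
```

On a live client the same function could be fed with `dmesg | summarise_rpcrdma_closes`.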


More below:



Is there anything logged on the server side?

Also, can you try this again, but on both systems do this before 
attempting the mount:

echo 32768 > /proc/sys/sunrpc/rpc_debug

This will enable all the rpc trace points and add a bunch of logging to 
/var/log/messages. 

Maybe that will show us something.  I think the server is aborting the 
connection for some reason. 
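
A small sketch of how the rpc_debug mask could be checked before relying on it. The facility values below are the ones I recall from the kernel's include/linux/sunrpc/debug.h, so treat them as an assumption to verify against your own tree. If they are right, the "everything" mask is 32767 (0x7fff), and 32768 (0x8000) sets a single bit above all defined facilities. The wrapper `with_rpc_debug` is a made-up name; it needs root and simply switches tracing off again afterwards.

```shell
#!/bin/sh
# RPC debug facility bits, as recalled from include/linux/sunrpc/debug.h
# (assumption: check these against the kernel source you are running).
RPCDBG_XPRT=$((0x0001))
RPCDBG_CALL=$((0x0002))
RPCDBG_TRANS=$((0x0080))
RPCDBG_SVCXPRT=$((0x0100))
RPCDBG_ALL=$((0x7fff))

# Succeeds if mask $1 enables facility bit $2.
mask_enables() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# Hypothetical wrapper: enable tracing, run a command, switch tracing off.
# Needs root and a kernel exposing /proc/sys/sunrpc/rpc_debug.
with_rpc_debug() {
    mask=$1; shift
    ctl=/proc/sys/sunrpc/rpc_debug
    echo "$mask" > "$ctl" || return 1
    "$@"; rc=$?
    echo 0 > "$ctl"
    return "$rc"
}

for mask in 32767 32768; do
    if mask_enables "$mask" "$RPCDBG_TRANS"; then
        echo "$mask enables the TRANS facility"
    else
        echo "$mask does not enable the TRANS facility"
    fi
done
```

A possible use, run as root on both client and server: `with_rpc_debug "$RPCDBG_ALL" mount -o rdma,port=2050 192.168.0.215:/vol0 /mnt/`.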


Steve.




From: Vu Pham
Date: Friday, April 24, 2009 - 10:54 am

Hi Celine,

What HCA do you have on your system? Is it ConnectX? If yes, what is its 
firmware version?


From: Vu Pham
Date: Friday, April 24, 2009 - 11:40 am

Celine,

I'm seeing mlx4 in the log, so it is ConnectX.

nfsrdma does not work with any official ConnectX fw 2.6.0 release
because of problems with fast registration work requests between nfsrdma
and the firmware.

We are currently debugging/fixing those problems.

Do you have a direct contact with a Mellanox field application engineer?
If so, please contact him/her.
If not, I can send you a contact privately.

thanks,

From: Celine Bourde
Date: Monday, April 27, 2009 - 3:56 am

Thanks for the explanation.
Let me know if you have additional information.

We have a contact at Mellanox. I will contact him.

Thanks,

Céline.


From: Tom Talpey
Date: Monday, April 27, 2009 - 5:47 am

There is a very simple workaround if you don't have the latest mlx4 firmware.

Just set the client to use the all-physical memory registration mode. This will
avoid making unsupported reregistration requests, which the firmware advertised.

Before mounting, enter (as root)

	sysctl -w sunrpc.rdma_memreg_strategy=6

The client should work properly after this.

If you do have access to the fixed firmware, I recommend using the default
setting (5) as it provides greater safety on the client.
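
A rough sketch of how the workaround could be applied and then verified from the client log. The function names are invented for illustration. The sysctl assignment is written without spaces around `=`, which is the form `sysctl -w` expects as far as I know, and the check relies only on the "memreg N" field that rpcrdma prints in its connection message.

```shell
#!/bin/sh
# Print the "memreg N" value from the most recent rpcrdma connection
# line found in the kernel log text on stdin.
last_memreg() {
    sed -n 's/.*rpcrdma: connection to .* memreg \([0-9][0-9]*\) .*/\1/p' | tail -n 1
}

# Hypothetical helper: set the client registration strategy before mounting.
# Needs root and the xprtrdma module loaded so that the sysctl exists.
set_memreg_strategy() {
    sysctl -w "sunrpc.rdma_memreg_strategy=$1"
}

# Demo on a line of the form shown in this thread.
echo 'rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16' | last_memreg
```

After `set_memreg_strategy 6` and a fresh mount attempt, `dmesg | last_memreg` printing 6 would suggest the client picked up the new mode.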


From: Celine Bourde
Date: Monday, April 27, 2009 - 7:05 am

We still have the same problem, even after changing the registration method.

mount does not return, and this is the dmesg output on the client:

rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 6 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 6 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:0001, status -22
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 6 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)

I still have another question: if the firmware is the problem, why does
NFS RDMA work with a 2.6.27.10 kernel, without OFED 1.4, on these
same cards?

Thanks,

Céline Bourde. 



From: jeffrey Lang
Date: Monday, April 27, 2009 - 7:46 am

I was recently having the "ib0: multicast join failed" issue.  Once I
upgraded the firmware in my switch, everything started working again.

I would give the firmware upgrade a try.

jeff


From: Tom Talpey
Date: Monday, April 27, 2009 - 7:50 am

I need to see the log on the server. Errno 103 is ECONNABORTED which means

There were a number of changes in the 2.6.28 cycle, especially on the
server. So it's quite possible that 2.6.27, without the changes, would behave
differently. Have you tried this with 2.6.29, or with different cards?


From: Vu Pham
Date: Monday, April 27, 2009 - 10:33 am

On 2.6.27.10, nfsrdma does not use fast registration work requests;
therefore, it works well with ConnectX.

From 2.6.28 onward, nfsrdma started using fast registration work
requests, and the change was committed without being verified on ConnectX.

I'm looking into those issues and trying to resolve them now.


From: Diego Moreno
Date: Tuesday, April 28, 2009 - 5:45 am

Hi,

I'm working with Celine on getting NFS RDMA to work. We installed a new
firmware (2.6.636). We still have the problem, but now we have more
information on the client side.

- With the workaround (memreg 6) we can mount without any problem. We
can read a file, but if we try to create a file with dd, the application
hangs and we then have to run 'umount -f'. There is no message on the server.
Message on the client:

rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 6 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)


- With fast registration:

There is no message on server. dmesg client output with fast registration:


rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
------------[ cut here ]------------
WARNING: at kernel/softirq.c:136 local_bh_enable_ip+0x3c/0x92()
Modules linked in: xprtrdma autofs4 hidp nfs lockd nfs_acl rfcomm l2cap 
bluetooth sunrpc iptable_filter ip_tables ip6t_REJECT xt_tcpudp 
ip6table_filter ip6_tables x_tables cpufreq_ondemand acpi_cpufreq 
freq_table rdma_ucm ib_sdp rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa 
ipv6 ib_uverbs ib_umad iw_nes ib_ipath ib_mthca dm_multipath scsi_dh 
raid0 sbs sbshc battery acpi_memhotplug ac parport_pc lp parport mlx4_ib 
ib_mad ib_core e1000e sr_mod joydev cdrom mlx4_core i5000_edac edac_core 
shpchp rtc_cmos sg pcspkr rtc_core rtc_lib i2c_i801 i2c_core serio_raw 
button dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ata_piix 
libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last 
unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.27_ofa_compil #2

Call Trace:
  <IRQ>  [<ffffffff80235b8d>] warn_on_slowpath+0x51/0x77
  [<ffffffff80229b79>] __wake_up+0x38/0x4f
  [<ffffffff80246d57>] __wake_up_bit+0x28/0x2d
  [<ffffffffa05485af>] rpc_wake_up_task_queue_locked+0x223/0x24b [sunrpc]
  ...
From: Vu Pham
Date: Monday, April 27, 2009 - 10:30 am

This workaround only works on the client side. On the server side there
is no option to switch the memory registration mode.

-vu
From: Joe Landman
Date: Wednesday, May 6, 2009 - 3:26 pm

I am seeing this also on a server with ConnectX and a client with mthca.

My mount hangs:

  /sbin/mount.nfs 10.1.1.2:/data /data -o rdma,intr,port=2050
^C

Leaving this in the logs:

May  6 18:14:03 dv3 kernel: [ 9997.015209] rpcrdma: connection to 10.1.1.2:2050 on mthca0, memreg 6 slots 32 ird 4
May  6 18:14:03 dv3 kernel: [ 9997.015582] rpcrdma: connection to 10.1.1.2:2050 closed (-103)

rdma seems to work

root@dv3:~# ib_rdma_bw -b -i 2
6222: | port=18515 | ib_port=2 | size=65536 | tx_depth=100 | iters=1000 | duplex=1 | cma=0 |
6222: Local address:  ...
6222: Remote address: ...

6222: Bandwidth peak (#0 to #245): 1765.83 MB/sec
6222: Bandwidth average: 1724.45 MB/sec
6222: Service Demand peak (#0 to #245): 884 cycles/KB
6222: Service Demand Avg  : 906 cycles/KB

root@dv3:~# showmount -e 10.1.1.2
Export list for 10.1.1.2:
/data *

On the server side, I see

May  6 14:07:53 jr4 mountd[5673]: authenticated mount request from 
10.1.1.1:940 for /data (/data)

On server for rping
[root@jr4 ~]# rping -s
cq completion failed status 4
wait for RDMA_READ_COMPLETE state 10

on the client side for rping

root@dv3:~# rping -S 100 -d -v -c -a 10.1.1.2
verbose
client
created cm_id 0x606690
cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0x606690 (parent)
cma_event type RDMA_CM_EVENT_ROUTE_RESOLVED cma_id 0x606690 (parent)
rdma_resolve_addr - rdma_resolve_route successful
created pd 0x608be0
created channel 0x6068c0
created cq 0x608c30
created qp 0x608d50
rping_setup_buffers called on cb 0x605010
allocated & registered buffers...
cq_thread started.
cma_event type RDMA_CM_EVENT_ESTABLISHED cma_id 0x606690 (parent)
ESTABLISHED
rmda_connect successful
RDMA addr 60a8d0 rkey 116003d len 100
send completion
cma_event type RDMA_CM_EVENT_DISCONNECTED cma_id 0x606690 (parent)
client DISCONNECT EVENT...
wait for RDMA_WRITE_ADV state 6
cq completion failed status 5
rping_free_buffers called on cb 0x605010
destroy cm_id 0x606690

Any hints on the 103 ...
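
The "cq completion failed status N" lines in the rping output can be decoded with a small lookup. The sketch below assumes rping prints the raw work-completion status and that the numbering follows the `ibv_wc_status` enum order in the libibverbs header as I understand it; both points are worth confirming against `<infiniband/verbs.h>`. The function names are invented for this example. If the mapping holds, status 5 on the client is a flush error, which usually follows a queue pair already being in the error state, while status 4 on the server is a local protection error and is probably the more informative of the two.

```shell
#!/bin/sh
# Names for a few ibv_wc_status values (assumed enum order, see lead-in).
wc_status_name() {
    case "$1" in
        0)  echo IBV_WC_SUCCESS ;;
        4)  echo IBV_WC_LOC_PROT_ERR ;;
        5)  echo IBV_WC_WR_FLUSH_ERR ;;
        10) echo IBV_WC_REM_ACCESS_ERR ;;
        12) echo IBV_WC_RETRY_EXC_ERR ;;
        *)  echo "UNKNOWN($1)" ;;
    esac
}

# Annotate "cq completion failed status N" lines read from stdin and
# pass every other line through unchanged.
annotate_rping() {
    while IFS= read -r line; do
        case "$line" in
            "cq completion failed status "*)
                code=${line##* }            # last word on the line
                echo "$line  <- $(wc_status_name "$code")" ;;
            *)  echo "$line" ;;
        esac
    done
}

# Demo on the two status lines quoted in this thread.
printf '%s\n' 'cq completion failed status 4' 'cq completion failed status 5' | annotate_rping
```

With a live test, the tool's combined output could be piped through the function, for example `rping -s 2>&1 | annotate_rping`.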