openfabrics-general mailing list

From, Subject, Date
Roland Dreier
[ofa-general] Updated InfiniBand/RDMA merge plans for 2.6.24
Since 2.6.23 still isn't out, and I've managed to reduce my patch review backlog a bit, it's probably a good idea to give another update about what I have queued for 2.6.24 already and what I hope to get to before the merge window opens. Core: - My user_mad P_Key index support patch. Merged this, although I still owe Sasha a patch to update libraries to use this. - A fix to the user_mad 32-bit big-endian userspace 64/32 problem with the method_mask when registering agents. Merged. ...
Oct 5, 7:18 pm 2007
Roland Dreier
[ofa-general] Re: [PATCH 3 of 3 for-2.6.24] mlx4: implement ...
Thanks, I applied cleaned-up versions of all three patches for 2.6.24. One thing I changed was to just pass an error back to the caller rather than doing BUG_ON() anywhere. It's very unfriendly to the user to crash the whole machine just because of a driver bug -- much better to try and continue so that the user sees the error and can report it. _______________________________________________ general mailing list general@lists.openfabrics.org [ message continues ]
Oct 5, 6:56 pm 2007
akepner
[ofa-general] mpi failures on large ia64/ofed/IB clusters
On "large" IB-connected ia64 clusters, I (and some customers) are seeing failures in MPI programs. This is commoner the bigger the cluster nodes are, but I've seen it with as few as 32P/node. I'm using "Mellanox Technologies MT23108 InfiniHost (rev a1)" HCAs, with firmware version 3.5.0 (but this has been seen with several firmware revisions) and OFED-1.2. For example, with 2-128P systems connected via a single IB port, using this simple MPI program: int main(int argc, char **argv) { ...
Oct 5, 6:36 pm 2007
Roland Dreier
Re: [ofa-general] mpi failures on large ia64/ofed/IB clusters
> On one run we got this in syslog (ib_mthca's debug_level set to 1): > > 15:34:34 ib_mthca 0012:01:00.0: Command 21 completed with status 09 > 15:35:34 ib_mthca 0012:01:00.0: HW2SW_MPT failed (-16) > .... > (status 0x9==MTHCA_CMD_STAT_BAD_RES_STATE => problem with mpi?) > > or on another run: > > 13:57:15 ib_mthca 0005:01:00.0: Command 1a completed with status 01 > 13:57:15 ib_mthca 0005:01:00.0: modify QP 1->2 returned status 01. > ....
Oct 5, 6:46 pm 2007
Roland Dreier
Re: [ofa-general] mpi failures on large ia64/ofed/IB clusters
> I don't really see anything racy in the FW command stuff, but it's > possible that there's something like an mmiowb() missing somewhere (I > have a hard time spotting that type of race for some reason). Another possibility (independent of the hack I suggested before) would be to add an mmiowb() before the mutex_unlock() in mthca_cmd_post(). I actually have a good feeling about this theory.... - R.
Oct 5, 6:51 pm 2007
akepner
Re: [ofa-general] mpi failures on large ia64/ofed/IB clusters
Genius! I have completed over 275 runs with the patch below, so we can be very confident that this has fixed things. Roland, should I submit a proper patch, or do you want to take care of this? (And thanks alot, too!) diff -rup ofa_kernel-1.2.orig/drivers/infiniband/hw/mthca/mthca_cmd.c ofa_kernel-1.2/drivers/infiniband/hw/mthca/mthca_cmd.c --- ofa_kernel-1.2.orig/drivers/infiniband/hw/mthca/mthca_cmd.c 2007-06-21 07:38:47.000000000 -0700 +++ ofa_kernel-1.2/drivers/infiniband/hw/mthca/mthc...
Oct 5, 8:22 pm 2007
Zulfi Imani
[ofa-general] OFED libibverbs API
Hi all, I wanted to find out where I can get the libibverbs API specification from. I checked the openfabrics.org website but could not find anything immediately. Thanks Zulfi
Oct 5, 4:46 pm 2007
Steve Wise
Re: [ofa-general] OFED libibverbs API
OFA Admins: It would be nice to put the man pages on-line... If we installed the man pages, then used man2html or something we could point folks at that for on-line docs... Zulfi, if you build/install ofed-1.2.5, you can then get man pages for the verbs and rdmacm APIs. Also there are header files and examples that get built/installed. Steve.
Oct 5, 5:53 pm 2007
Zulfi Imani
Re: [ofa-general] OFED libibverbs API
Thanks Steve. Just a couple of questions. I have installed the OFED1.2 stack. You said I bin include lib lib64 mpi sbin src I do not see any subdir for example programs ? Also where can I find simple programs like file transfer using RDMA and libibverbs ? Does the "verbs.h" in the $INSTALL/include/infiniband represent the libverbs API ? I...
Oct 5, 6:18 pm 2007
Sean Hefty
RE: [ofa-general] [PATCH] rdma/cm: add locking around QP acc...
Rick, have you had a chance to test out this patch? _______________________________________________ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Oct 5, 1:19 pm 2007
Vladimir Sokolovsky ...
[ofa-general] ofa_1_3_kernel 20071005-0200 daily build status
This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_3/linux-2.6.git git_branch: ofed_kernel Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-mlx4-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod --with-nes-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.22 Passed ...
Oct 5, 5:53 am 2007
che_del_rosario
[ofa-general] Big mover shows market today
FRLE begins to deliver promised returns, Shares up over 31%. Fearless International Inc. (F R L E) $0.25 UP 31.76 % Hard climb for the hottest new yacht on the market, shares jumped nearly 32% today. You cant ignore these kind of numbers, this is going to be huge. There is a time and place for everything, and Friday more is yours, grab this one early.
Oct 5, 5:42 am 2007
WINNING NOTIFICATION
[ofa-general] ***SPAM*** CONFIRM YOUR WINNING PRIZE Ref: XYL...
Ref: XYL /26510460037/05 Batch: 24/00319/IPD WINNING NOTIFICATION We happily announce to you the draw (#1071)winner of the cash prize of £2,696,385held on the 4th of October 2007 in London Uk. contact our fiduaciary claims department Agents Name: Van Williams Email: claims_uknationallotterydept3@yahoo.co.uk Tel: +447024096270 1.Name...2.Address...3.Nationality....4.Age...5.Sex... 6.Occupation...7.Phone/Fax..8.COUNTRY.. Cordially, Rose Woo...
Oct 5, 5:24 am 2007
damaru
[ofa-general] Do we like the same books?
I just joined Shelfari to connect with other book lovers. Come see the books I love and see if we have any in common. Then pick my next book so I can keep on reading. Click below to join my group of friends on Shelfari! http://www.shelfari.com/Register.aspx?ActivityId=22801633&InvitationCode=c70c284d-a0dd-4a69-a023-3022d4752243 ...
Oct 5, 4:30 am 2007
kliteyn
[ofa-general] nightly osm_sim report 2007-10-05:normal compl...
OSM Simulation Regression Summary [Generated mail - please do NOT reply] OpenSM binary date = 2007-10-04 OpenSM git rev = Tue_Oct_2_22:28:56_2007 [d5c34ddc158599abff9f09a6cc6c8cad67745f0b] ibutils git rev = Tue_Sep_4_17:57:34_2007 [4bf283f6a0d7c0264c3a1d2de92745e457585fdb] Total=520 Pass=520 Fail=0 Pass: 39 Stability IS1-16.topo 39 Pkey IS1-16.topo 39 OsmTest IS1-16.topo 39 OsmStress IS1-16.topo 39 Multicast IS1-16.topo 39 LidMgr IS1-16.topo 13 Stability IS3-loop.topo 13...
Oct 5, 1:09 am 2007
Troy Benjegerdes
Re: [ofa-general] Setting lowest-common denominator ipoib mu...
I think it would help usability a lot to put the PARTITION CONFIGURATION section in a separate 'opensm-partitions.conf' man page with the values for rate, mtu and scope listed directly.
Oct 5, 2:54 pm 2007
Dotan Barak
Re: [ofa-general] Issues to scale to 64K ranks.
Hi. This number of QPs (and of any other resource) is per HCA. The HCA itself supports many more QPs (and more elements of any other resource), but the driver limits the number of QPs in order to consume less memory. The mthca low-level driver supports changing the number of resources with module parameters. Until those module parameters are added, the only way to do this is to hack the low-level driver. Dotan
Oct 5, 1:51 pm 2007
Roland Dreier
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
> I tested this by simulating a slow passive side responder, and it worked as > expected for those tests. Using an MRA does add another MAD to the CM exchange, > which is why it is sent only after seeing a duplicate request. Alternatively, > we can take the OFED module parameter patch. What the heck, I added this for 2.6.24. If it doesn't work out we can back it out. - R.
Oct 5, 7:10 pm 2007
Roland Dreier
Re: [ofa-general] [PATCH] mlx4: increase permissible number ...
Thanks, I just applied Jack's patch and also this: commit adeeb48f21a36693fed11b318bce132571ed3679 Author: Roland Dreier <rolandd@cisco.com> Date: Fri Oct 5 16:03:44 2007 -0700 IB/mthca: Increase max number of QPs per multicast group to 56 Increase the number of QPs allowed per multicast group from 8 to 56. This allows for one QP per core on 16-core systems, which are now quite common, and allows some space for future growth. This is basically the same pat...
Oct 5, 7:12 pm 2007
Zulfi Imani
Re: [ofa-general] Problem running SDP apps using OFED 1.2
Hi Dotan, ifconfig shows up ib0 Link encap:InfiniBand HWaddr 80:00:00:03:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 inet addr: 140.221.37.32 Bcast: 140.221.37.255 Mask: 255.255.255.0 inet6 addr: fe80::211:7500:ff:d802/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1 RX packet...
Oct 5, 6:01 pm 2007
Scott Weitzenkamp (s...
RE: [ofa-general] Problem running SDP apps using OFED 1.2
Can you ping between the two nodes using the IPoIB IP address? Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: general-bounces@lists.openfabrics.org [mailto:general-bounces@lists.openfabrics.org] On Behalf Of Zulfi Imani ...
Oct 5, 6:08 pm 2007
Zulfi Imani
Re: [ofa-general] Problem running SDP apps using OFED 1.2
I restarted openibd and now my interfaces are up. mach#1 ib0 Link encap:InfiniBand HWaddr 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 inet addr:140.221.37.46 Bcast:140.221.37.255 Mask:255.255.255.0 inet6 addr: fe80::211:7500:ff:d7f2/64 Scope:Link mach#2 ib0 Link encap:InfiniBand HWaddr 80:00:00:...
Oct 5, 7:25 pm 2007
Scott Weitzenkamp (s...
RE: [ofa-general] Problem running SDP apps using OFED 1.2
Does "lsmod | grep sdp" report SDP is loaded on both machines? I would then use strace with the client to watch the socket system calls happening, to make sure the client is trying to use SDP. Scott ________________________________ From: Zulfi Imani [mailto:zulfiimani@gmail.com] Sent: Friday, October...
Oct 5, 9:37 pm 2007
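The first check Scott suggests can be scripted. What follows is a minimal sketch, assuming a Linux host where /proc/modules is readable (the file that lsmod formats). The helper name `loaded_modules_matching` is hypothetical, and it matches the pattern against the module name only, which is slightly narrower than `grep sdp` over the whole lsmod line.

```python
def loaded_modules_matching(pattern, modules_text):
    """Return the names of loaded modules whose name contains `pattern`.

    `modules_text` is the content of /proc/modules: one module per line,
    with the module name as the first whitespace-separated field.
    Only the name field is searched, so a module that merely lists an
    SDP module among its dependents is not reported.
    """
    names = []
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and pattern in fields[0]:
            names.append(fields[0])
    return names
```

On each machine, pass the text read from /proc/modules and the pattern "sdp"; an empty list means no matching module is loaded.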
Zulfi Imani
Re: [ofa-general] Problem running SDP apps using OFED 1.2
For machine#1 my IPoIB interface is ib0 Link encap:InfiniBand HWaddr 80:00:00:03:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 inet addr:140.221.37.32 Bcast:140.221.37.255 Mask:255.255.255.0 inet6 addr: fe80::211:7500:ff:d802/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1 RX packe...
Oct 5, 6:26 pm 2007
Or Gerlitz
Re: [ofa-general] Problem running SDP apps using OFED 1.2
ib0 on machine#2 is not running, but it seems that your bigger problem is lack of some essential background on TCP/IP operation, where this list is not the best place to gain it. Or.
Oct 5, 6:58 pm 2007
Roland Dreier
Re: [ofa-general] Re: [PATCH RFC v2] IB/ipoib: enable IGMP f...
> I understand this desire... just need a little clarification from you > re hotplug. First, as for OFED, looking on the openibd service script > (excerpts below) installed by OFED 1.3 I see that mode and mtu are set > "manually", that is the user sets/provides the mode and mtu params for > the script and the script uses sysfs to configure the device. This > does not address devices created after the service has started nor > seem a very elegant way to do so. I don't kno...
Oct 5, 6:59 pm 2007
Or Gerlitz
Re: [ofa-general] Re: [PATCH RFC v2] IB/ipoib: enable IGMP f...
OK, AFAIK under both Red Hat and SLES there is a way to install pre-up and post-down hooks for the iftools, if this is what you were referring to in "hot-plug", then we are on the same page, thanks. Or.
Oct 5, 7:06 pm 2007
Sean Hefty
[ofa-general] [PATCH-2.6.24 2/2 v2] [RFC] ib/cm: add basic p...
Add performance/debug counters to track sent/received messages, retries, and duplicates. Counters are tracked per CM message type, per port. The counters are always enabled, so intrusive state tracking is not done. Counters are exported as: /sys/class/infiniband_cm/device/port/counter_description/cm_attribute for example: /sys/class/infiniband_cm/mthca0/1/cm_tx_msgs/req /sys/class/infiniband_cm/mthca0/1/cm_tx_retries/rep Signed-off-by: Sean Hefty <sean.hefty@intel.com> This m...
Oct 5, 5:31 pm 2007
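The counter layout described in the patch above can be walked with a few lines of Python. This is a minimal sketch, assuming the patch is applied and that each attribute file holds a single decimal integer (the file format is an assumption; the excerpt does not state it). The function name `read_cm_counters` and its `root` parameter are hypothetical, added so the code can be pointed at a test directory.

```python
import os


def _subdirs(path):
    """Sorted names of the directories directly under `path`."""
    return sorted(
        entry for entry in os.listdir(path)
        if os.path.isdir(os.path.join(path, entry))
    )


def read_cm_counters(root="/sys/class/infiniband_cm"):
    """Return {(device, port, counter, attribute): value}.

    Follows the layout root/device/port/counter_description/cm_attribute
    from the patch description. Files that cannot be read or that do not
    parse as an integer are skipped (assumed format: one decimal integer).
    """
    counters = {}
    if not os.path.isdir(root):
        return counters  # patch not applied, or no CM devices present
    for device in _subdirs(root):
        for port in _subdirs(os.path.join(root, device)):
            for counter in _subdirs(os.path.join(root, device, port)):
                counter_dir = os.path.join(root, device, port, counter)
                for attr in sorted(os.listdir(counter_dir)):
                    try:
                        with open(os.path.join(counter_dir, attr)) as f:
                            value = int(f.read().strip())
                    except (OSError, ValueError):
                        continue
                    counters[(device, port, counter, attr)] = value
    return counters
```

With the example paths from the patch, the key `("mthca0", "1", "cm_tx_msgs", "req")` would hold the number of REQ messages sent on port 1 of mthca0.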