[ofa-general] [V2][PATCH 3/3] ib/ipoib: IPoIB-UD RX S/G support

Hi Roland,

Looking at ipoib_start_xmit, it seems that two checks do not come into play for connected
mode neighbours: the check that handles a gratuitous ARP (i.e. a difference between the
remote GID kept in the ipoib_neigh and the one present in the network stack neighbour), and
the check that handles an attempt to xmit through an ipoib_neigh created by another ipoib
device (i.e. following a bonding fail-over).

Isn't this a bug, or am I missing something?

Or.

+static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_neigh *neigh;
+	unsigned long flags;
+
...
+	if (likely(skb->dst && skb->dst->neighbour)) {
...
+		neigh = *to_ipoib_neigh(skb->dst->neighbour);
+
+		if (ipoib_cm_get(neigh)) {
+			if (ipoib_cm_up(neigh)) {
+				ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
+				goto out;
+			}
+		} else if (neigh->ah) {
+			if (unlikely((memcmp(&neigh->dgid.raw,
+					    skb->dst->neighbour->ha + 4,
+					    sizeof(union ib_gid))) ||
+					 (neigh->dev != dev))) {

any reason not to apply these two checks on connected mode neighbours?

+				spin_lock(&priv->lock);
...
+			ipoib_send(dev, skb, neigh->ah, IPOIB_QPN(skb->dst->neighbour->ha));
+			goto out;
+		}
_______________________________________________
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

> Looking on ipoib_start_xmit, it seems that both the check that
> comes to handle a gratuitous ARP (ie a difference between the
> remote GID as kept in the ipoib_neigh to the one present in the
> network stack neighbour) and the check that comes to handle a
> situation where we attempt to xmit an ipoib_neigh created by
> another ipoib device (ie following a bonding failover) - does not
> come into play for the connected mode neighbours.
>
> Isn't it a bug, or I miss something?

Good question.  The device test came straight from Moni's patch -- how
much have you guys tested bonding of IPoIB CM?

The GID comparison seems a little trickier to handle -- it seems on a
neighbour GID change we need to tear down any connection we might have
in the CM case...

The test for neigh->dev != dev handles a possible race where a
fail-over occurs under a high xmit rate: the deletion of the
ipoib_neigh portion of the neighbour caused by the bonding fail-over
has not happened yet, but since the fail-over the bonding is already
xmitting through a device which is not the one that created the ipoib_neigh.

We have never managed to reproduce a hit on this check... anyway, I will
double check on how much testing was done with the bonding and

Not really: when there is a hit on the GID comparison, ipoib_neigh_free()
is called, which for a connected mode neighbour invokes
ipoib_cm_destroy_tx(), which disconnects, etc.

Or


Move the code that checks whether the remote GID stored in the ipoib_neigh differs from the one
present in the neighbour (to handle gratuitous ARP), or whether a bonding fail-over has happened
but the neighbour still points to an ipoib_neigh that was not created by the current slave, a little
earlier in ipoib_start_xmit. This causes the driver to apply these checks to connected mode
neighbours as well.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

I have tested this patch on 2.6.24-rc1 (and testing is now in progress on 2.6.24-rc8).
Things are basically working fine, but I do want to play more with bonding fail-overs
to make sure nothing was broken with respect to gratuitous ARP etc. I will let you know.

-----

Index: linux-2.6.24-rc8/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- linux-2.6.24-rc8.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2008-01-17 16:37:10.000000000 +0200
+++ linux-2.6.24-rc8/drivers/infiniband/ulp/ipoib/ipoib_main.c	2008-01-17 16:46:51.000000000 +0200
@@ -686,13 +686,8 @@ static int ipoib_start_xmit(struct sk_bu
 		}

 		neigh = *to_ipoib_neigh(skb->dst->neighbour);
-
-		if (ipoib_cm_get(neigh)) {
-			if (ipoib_cm_up(neigh)) {
-				ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
-				goto out;
-			}
-		} else if (neigh->ah) {
+
+		if (neigh->ah)
 			if (unlikely((memcmp(&neigh->dgid.raw,
 					    skb->dst->neighbour->ha + 4,
 					    sizeof(union ib_gid))) ||
@@ -713,6 +708,12 @@ static int ipoib_start_xmit(struct sk_bu
 				goto out;
 			}

+		if (ipoib_cm_get(neigh)) {
+			if (ipoib_cm_up(neigh)) {
+				ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
+				goto out;
+			}
+		} else if (neigh->ah) {
 			ipoib_send(dev, skb, neigh->ah, IPOIB_QPN(skb->dst->neighbour->ha));
 			goto out;
 		}


I did some more testing, but not enough to say whether fail-over under connected mode
is always slow without this patch. I am away for the rest of this week and will
continue working on it next week.

Or.

OK, Roland, I am now confident that this patch is needed; see the reasoning below.
Please apply it to 2.6.25; later I will also send it to -stable. Here goes:

Basically, ipoib-cm is not totally broken with respect to bonding AND connected mode --without--
this patch being applied, but OTOH it does not function as it should. My setup has a client node
xmitting UDP unicast to a server node, where the server node is bonded (ib0 and ib1 are
enslaved by bond0). I tried three types of fail-over, each of which causes the bonding at the
server node to send a gratuitous ARP; without this patch, ipoib at the client side takes no
action on it:

A) using "primary slave up" (*)
B) taking an interface down
C) taking a port down

In the "primary slave up" fail-over case, since the non-active slave interface is up and running,
the traffic keeps going through it, so at the client side there is forever a neighbour pointing
to GID X while the traffic goes to (the QP associated with) GID Y.

In the interface down fail-over case, the going-down code closes the RX QP. Since connected
mode (cm) is implemented over RC (...), this causes the HCA to generate a send completion with an
IB_WC_RETRY_EXC_ERR error. ipoib_cm_handle_tx_wc calls ipoib_neigh_free, and when the next
xmit is called from the stack, ipoib creates a new ipoib_neigh, this time against the correct GID.

In the port going down case, the RC implementation again causes the retry exceeded error,
and from there on it is the same as in the previous case.

Other than all the above, gratuitous ARP is used in other HA schemes, such as a floating IP address
between I/O targets. Since connected mode ignores it, such a scheme will not work without the patch.

Or

(*) the bonding HA mode enables you to select a primary slave which once
up would be moved to be the active slave. So to cause this failover, I
take the primary (eg ib0) down, and then fail-over happens to the second
slave (eg ib1), now I take the primary up and a second fail-over ...

Hi Roland,

Do you need any more clarification from me to merge this into 2.6.25?

Or

From: Shirley Ma
Date: Wednesday, January 30, 2008 - 11:42 am

The current IPoIB-UD implementation limits the IPoIB payload size to
2048 by hard-coding IPOIB_PACKET_SIZE. The implementation is designed
for a kernel PAGE_SIZE equal to or greater than 4K. If the kernel
PAGE_SIZE were 2K, buffer allocation would fail whenever there is a
lack of large memory buffers. However, most distros support
PAGE_SIZE >= 4K, so this implementation has no problem with a 2048
payload. The implementation is simple, but it prevents HCA devices that
support a 4096 payload, such as the IBM eHCA2, from taking advantage of it.

This patch allows an IPoIB-UD MTU of up to 4092 (4K - IPOIB_ENCAP_LEN)
when the HCA supports a 4K MTU. In this patch, the APIs for S/G buffer
allocation in IPoIB-CM mode have been made generic so that IPoIB-UD and
IPoIB-CM can share the S/G code. When PAGE_SIZE is equal to or greater
than IPOIB_UD_BUF_SIZE plus the bytes of padding needed to align the IP
header, only one buffer is needed for a 4K MTU; otherwise, two buffers
are allocated and S/G is used.

The IPoIB link MTU of a node is the minimum of the admin-configurable
MTU (set through ifconfig) and the MTU of the IPoIB default broadcast
group. When the Subnet Manager enables the default broadcast group
during start up, the IPoIB link MTU of the subnet is the MTU of the
default broadcast group. A node whose IB MTU is smaller than this value
can't join the IPoIB subnet. A node whose IB MTU is greater than this
value joins the IPoIB subnet, and this value is set as its IPoIB link
MTU. If the Subnet Manager disables the default broadcast group during
start up, the first node brought up in the subnet creates the default
IPoIB broadcast group based on negotiation with the Subnet Manager;
the default is currently set to 2K according to the IPoIB RFC.

The patch will be split into two patches:

1. Make IPoIB-CM RX S/G APIs generic
2. Enable IPoIB-UD RX S/G

I am trying to make these two patches more independent so that they are
easy to test. ipoib_cm_alloc_rx_skb() will be renamed in the second patch.
Please review these ...
From: Shirley Ma
Date: Wednesday, January 30, 2008 - 11:45 am

Please review the patch below while I am testing, so that I can integrate
your comments into my testing immediately.

Thanks
Shirley

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/infiniband/ulp/ipoib/ipoib.h    |   25 ++++--
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |  139
++++++------------------------
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |   85 +++++++++++++++++++
 3 files changed, 131 insertions(+), 118 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index fe250c6..138f1a3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -138,7 +138,7 @@ struct ipoib_mcast {
 
 struct ipoib_rx_buf {
 	struct sk_buff *skb;
-	u64		mapping;
+	u64		mapping[IPOIB_CM_RX_SG];
 };
 
 struct ipoib_tx_buf {
@@ -189,7 +189,7 @@ enum ipoib_cm_state {
 struct ipoib_cm_rx {
 	struct ib_cm_id	       *id;
 	struct ib_qp	       *qp;
-	struct ipoib_cm_rx_buf *rx_ring;
+	struct ipoib_rx_buf    *rx_ring;
 	struct list_head	list;
 	struct net_device      *dev;
 	unsigned long		jiffies;
@@ -212,11 +212,6 @@ struct ipoib_cm_tx {
 	struct ib_wc	     ibwc[IPOIB_NUM_WC];
 };
 
-struct ipoib_cm_rx_buf {
-	struct sk_buff *skb;
-	u64 mapping[IPOIB_CM_RX_SG];
-};
-
 struct ipoib_cm_dev_priv {
 	struct ib_srq	       *srq;
 	struct ipoib_cm_rx_buf *srq_ring;
@@ -458,6 +453,22 @@ int ipoib_vlan_delete(struct net_device *pdev,
unsigned short pkey);
 void ipoib_pkey_poll(struct work_struct *work);
 int ipoib_pkey_dev_delay_open(struct net_device *dev);
 void ipoib_drain_cq(struct net_device *dev);
+void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
+		   unsigned int length, struct sk_buff *toskb);
+struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev,
+				      int id, int frags, int head_size,
+				      int pad, u64 *mapping);
+void inline ipoib_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags,
+			       int head_size, u64 *mapping)
+{
+	int ...
From: Shirley Ma
Date: Thursday, January 31, 2008 - 11:58 am

This patch makes the IPoIB-CM RX S/G APIs more generic so that they can
be reused later for IPoIB-UD RX S/G.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/infiniband/ulp/ipoib/ipoib.h    |   26 +++++-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |  135
++++++-------------------------
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |   85 +++++++++++++++++++
 3 files changed, 132 insertions(+), 114 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index fe250c6..d1d3ca2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -141,6 +141,11 @@ struct ipoib_rx_buf {
 	u64		mapping;
 };
 
+struct ipoib_cm_rx_buf {
+	struct sk_buff *skb;
+	u64		mapping[IPOIB_CM_RX_SG];
+};
+
 struct ipoib_tx_buf {
 	struct sk_buff *skb;
 	u64		mapping;
@@ -212,11 +217,6 @@ struct ipoib_cm_tx {
 	struct ib_wc	     ibwc[IPOIB_NUM_WC];
 };
 
-struct ipoib_cm_rx_buf {
-	struct sk_buff *skb;
-	u64 mapping[IPOIB_CM_RX_SG];
-};
-
 struct ipoib_cm_dev_priv {
 	struct ib_srq	       *srq;
 	struct ipoib_cm_rx_buf *srq_ring;
@@ -458,6 +458,22 @@ int ipoib_vlan_delete(struct net_device *pdev,
unsigned short pkey);
 void ipoib_pkey_poll(struct work_struct *work);
 int ipoib_pkey_dev_delay_open(struct net_device *dev);
 void ipoib_drain_cq(struct net_device *dev);
+void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
+		   unsigned int length, struct sk_buff *toskb);
+struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev,
+				      int id, int frags, int head_size,
+				      int pad, u64 *mapping);
+static void inline ipoib_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags,
+			              int head_size, u64 *mapping)
+{
+	int i;
+	ib_dma_unmap_single(priv->ca, mapping[0], head_size, DMA_FROM_DEVICE);
+	for (i = 0; i < frags; i++)
+		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE,
+				    DMA_FROM_DEVICE);
+
+}
+
 
 #ifdef CONFIG_INFINIBAND_IPOIB_CM
 
diff --git ...
From: Shirley Ma
Date: Saturday, February 2, 2008 - 5:29 am

This patch makes two of the IPoIB-CM RX S/G APIs generic so that they can
be reused. This patch is the same as the V1 previously submitted.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/infiniband/ulp/ipoib/ipoib.h    |   26 +++++-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |  135
++++++-------------------------
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |   85 +++++++++++++++++++
 3 files changed, 132 insertions(+), 114 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index fe250c6..d1d3ca2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -141,6 +141,11 @@ struct ipoib_rx_buf {
 	u64		mapping;
 };
 
+struct ipoib_cm_rx_buf {
+	struct sk_buff *skb;
+	u64		mapping[IPOIB_CM_RX_SG];
+};
+
 struct ipoib_tx_buf {
 	struct sk_buff *skb;
 	u64		mapping;
@@ -212,11 +217,6 @@ struct ipoib_cm_tx {
 	struct ib_wc	     ibwc[IPOIB_NUM_WC];
 };
 
-struct ipoib_cm_rx_buf {
-	struct sk_buff *skb;
-	u64 mapping[IPOIB_CM_RX_SG];
-};
-
 struct ipoib_cm_dev_priv {
 	struct ib_srq	       *srq;
 	struct ipoib_cm_rx_buf *srq_ring;
@@ -458,6 +458,22 @@ int ipoib_vlan_delete(struct net_device *pdev,
unsigned short pkey);
 void ipoib_pkey_poll(struct work_struct *work);
 int ipoib_pkey_dev_delay_open(struct net_device *dev);
 void ipoib_drain_cq(struct net_device *dev);
+void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
+		   unsigned int length, struct sk_buff *toskb);
+struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev,
+				      int id, int frags, int head_size,
+				      int pad, u64 *mapping);
+static void inline ipoib_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags,
+			              int head_size, u64 *mapping)
+{
+	int i;
+	ib_dma_unmap_single(priv->ca, mapping[0], head_size, DMA_FROM_DEVICE);
+	for (i = 0; i < frags; i++)
+		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE,
+				    DMA_FROM_DEVICE);
+
+}
+
 
 #ifdef ...
From: Shirley Ma
Date: Saturday, February 2, 2008 - 11:38 am

This patch creates a couple of APIs for UD RX S/G, to be used later.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/infiniband/ulp/ipoib/ipoib.h    |    9 ++++
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |   65
+++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index fe250c6..415bf9a 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -61,6 +61,10 @@ enum {
 
 	IPOIB_ENCAP_LEN		  = 4,
 
+	IPOIB_MAX_IB_MTU	  = 4096,
+	IPOIB_UD_HEAD_SIZE	  = IB_GRH_BYTES + IPOIB_ENCAP_LEN,
+	IPOIB_UD_RX_SG		  = 2, /* for 4K MTU */ 
+
 	IPOIB_CM_MTU		  = 0x10000 - 0x10, /* padding to align header to 16 */
 	IPOIB_CM_BUF_SIZE	  = IPOIB_CM_MTU  + IPOIB_ENCAP_LEN,
 	IPOIB_CM_HEAD_SIZE	  = IPOIB_CM_BUF_SIZE % PAGE_SIZE,
@@ -136,6 +140,11 @@ struct ipoib_mcast {
 	struct net_device *dev;
 };
 
+struct ipoib_sg_rx_buf {
+	struct sk_buff *skb;
+	u64		mapping[IPOIB_UD_RX_SG];
+};
+
 struct ipoib_rx_buf {
 	struct sk_buff *skb;
 	u64		mapping;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 52bc2bd..9ca3d34 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -87,6 +87,71 @@ void ipoib_free_ah(struct kref *kref)
 	spin_unlock_irqrestore(&priv->lock, flags);
 }
 
+/* Adjust length of skb with fragments to match received data */
+static void ipoib_ud_skb_put_frags(struct sk_buff *skb, unsigned int length,
+				   struct sk_buff *toskb)
+{
+	unsigned int size;
+	skb_frag_t *frag = &skb_shinfo(skb)->frags[0];
+
+	/* put header into skb */
+	size = min(length, (unsigned)IPOIB_UD_HEAD_SIZE);
+	skb->tail += size;
+	skb->len += size;
+	length -= size;
+
+	if (length == 0) {
+		/* don't need this page */
+		skb_fill_page_desc(toskb, 0, frag->page, 0, PAGE_SIZE);
+		--skb_shinfo(skb)->nr_frags;
+	} else ...
From: Shirley Ma
Date: Wednesday, January 30, 2008 - 12:33 pm

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 138f1a3..65b1159 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -56,11 +56,11 @@
 /* constants */
 
 enum {
-	IPOIB_PACKET_SIZE	  = 2048,
-	IPOIB_BUF_SIZE		  = IPOIB_PACKET_SIZE + IB_GRH_BYTES,
-
 	IPOIB_ENCAP_LEN		  = 4,
 
+	IPOIB_MAX_IB_MTU	  = 4096, /* max ib device payload is 4096 */
+	IPOIB_UD_MAX_RX_SG	  = ALIGN(IPOIB_MAX_IB_MTU + IB_GRH_BYTES + 4,
PAGE_SIZE) / PAGE_SIZE,  /* padding to align IP header */
+
 	IPOIB_CM_MTU		  = 0x10000 - 0x10, /* padding to align header to 16 */
 	IPOIB_CM_BUF_SIZE	  = IPOIB_CM_MTU  + IPOIB_ENCAP_LEN,
 	IPOIB_CM_HEAD_SIZE	  = IPOIB_CM_BUF_SIZE % PAGE_SIZE,
@@ -314,6 +314,9 @@ struct ipoib_dev_priv {
 	struct dentry *mcg_dentry;
 	struct dentry *path_dentry;
 #endif
+	int max_ib_mtu;
+	struct ib_sge rx_sge[IPOIB_UD_MAX_RX_SG];
+	struct ib_recv_wr rx_wr;
 };
 
 struct ipoib_ah {
@@ -354,6 +357,11 @@ struct ipoib_neigh {
 	struct list_head    list;
 };
 
+#define IPOIB_UD_MTU(ib_mtu)		(ib_mtu - IPOIB_ENCAP_LEN)
+#define IPOIB_UD_BUF_SIZE(ib_mtu)	(ib_mtu + IB_GRH_BYTES + 4) /* padding to align IP header */
+#define IPOIB_UD_HEAD_SIZE(ib_mtu)	(IPOIB_UD_BUF_SIZE(ib_mtu)) % PAGE_SIZE
+#define IPOIB_UD_RX_SG(ib_mtu)		ALIGN(IPOIB_UD_BUF_SIZE(ib_mtu), PAGE_SIZE) / PAGE_SIZE
+
 /*
  * We stash a pointer to our private neighbour information after our
  * hardware address in neigh->ha.  The ALIGN() expression here makes
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index a082466..646aeb2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -194,7 +194,7 @@ static int ipoib_change_mtu(struct net_device *dev,
int new_mtu)
 		return 0;
 	}
 
-	if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN)
+	if (new_mtu > ...
From: Shirley Ma
Date: Wednesday, January 30, 2008 - 1:43 pm

Found a problem in the generated patch for ipoib_verbs.c. I will fix it
tomorrow; it should be:

--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -150,7 +150,7 @@ int ipoib_transport_dev_init(struct net_device *dev,
struct ib_device *ca)
                        .max_send_wr  = ipoib_sendq_size,
                        .max_recv_wr  = ipoib_recvq_size,
                        .max_send_sge = 1,
-                       .max_recv_sge = 1
+                       .max_recv_sge =
IPOIB_UD_RX_SG(priv->max_ib_mtu) 
                },
                .sq_sig_type = IB_SIGNAL_ALL_WR,
                .qp_type     = IB_QPT_UD
@@ -208,6 +208,16 @@ int ipoib_transport_dev_init(struct net_device
*dev, struct ib_device *ca)
        priv->tx_wr.num_sge     = 1;
        priv->tx_wr.send_flags  = IB_SEND_SIGNALED;

+       priv->rx_sge[0].length = IPOIB_UD_HEAD_SIZE(priv->max_ib_mtu);
+       for (i = 0; i < IPOIB_UD_RX_SG(priv->max_ib_mtu) - 1; ++i) {
+               priv->rx_sge[i].lkey = priv->mr->lkey;
+               priv->rx_sge[i + 1].length = PAGE_SIZE;
+       }
+       priv->rx_sge[i + 1].lkey = priv->mr->lkey;
+       priv->rx_wr.num_sge = IPOIB_UD_RX_SG(priv->max_ib_mtu);
+       priv->rx_wr.next = NULL;
+       priv->rx_wr.sg_list = priv->rx_sge;
+
        return 0;

 out_free_cq:

From: Shirley Ma
Date: Thursday, January 31, 2008 - 12:35 pm

This patch sets up all IPoIB-UD RX S/G related parameters.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/infiniband/ulp/ipoib/ipoib.h           |   13 +++++++++++++
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   19
++++++++++++++-----
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    3 +--
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |   14 ++++++++++++--
 4 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index d1d3ca2..004a80b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -61,6 +61,10 @@ enum {
 
 	IPOIB_ENCAP_LEN		  = 4,
 
+	IPOIB_MAX_IB_MTU	  = 4096,
+	IPOIB_UD_MAX_RX_SG	  = ALIGN(IPOIB_MAX_IB_MTU + IB_GRH_BYTES + 4,
+					  PAGE_SIZE) / PAGE_SIZE,  /* padding to align IP header */
+
 	IPOIB_CM_MTU		  = 0x10000 - 0x10, /* padding to align header to 16 */
 	IPOIB_CM_BUF_SIZE	  = IPOIB_CM_MTU  + IPOIB_ENCAP_LEN,
 	IPOIB_CM_HEAD_SIZE	  = IPOIB_CM_BUF_SIZE % PAGE_SIZE,
@@ -319,6 +323,9 @@ struct ipoib_dev_priv {
 	struct dentry *mcg_dentry;
 	struct dentry *path_dentry;
 #endif
+	int max_ib_mtu;
+	struct ib_sge rx_sge[IPOIB_UD_MAX_RX_SG];
+	struct ib_recv_wr rx_wr;
 };
 
 struct ipoib_ah {
@@ -359,6 +366,12 @@ struct ipoib_neigh {
 	struct list_head    list;
 };
 
+#define IPOIB_UD_MTU(ib_mtu)		(ib_mtu - IPOIB_ENCAP_LEN)
+/* padding to align IP header */ 
+#define IPOIB_UD_BUF_SIZE(ib_mtu)	(ib_mtu + IB_GRH_BYTES + 4) 
+#define IPOIB_UD_HEAD_SIZE(ib_mtu)	(IPOIB_UD_BUF_SIZE(ib_mtu)) % PAGE_SIZE
+#define IPOIB_UD_RX_SG(ib_mtu)		ALIGN(IPOIB_UD_BUF_SIZE(ib_mtu), PAGE_SIZE) / PAGE_SIZE
+
 /*
  * We stash a pointer to our private neighbour information after our
  * hardware address in neigh->ha.  The ALIGN() expression here makes
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index a082466..242591f 100644
--- ...
From: Shirley Ma
Date: Friday, February 1, 2008 - 11:04 pm

My unix mail is down. Here is the updated version. I will need to resend
it when my unix mail is back.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/infiniband/ulp/ipoib/ipoib.h           |   13 +++++++++++++
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   19 ++++++++++++++-----
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    3 +--
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |   14 ++++++++++++--
 4 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index d1d3ca2..004a80b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -61,6 +61,10 @@ enum {

      IPOIB_ENCAP_LEN           = 4,

+     IPOIB_MAX_IB_MTU    = 4096,
+     IPOIB_UD_MAX_RX_SG        = ALIGN(IPOIB_MAX_IB_MTU + IB_GRH_BYTES + 4,
+                               PAGE_SIZE) / PAGE_SIZE,  /* padding to align IP header */
+
      IPOIB_CM_MTU              = 0x10000 - 0x10, /* padding to align header to 16 */
      IPOIB_CM_BUF_SIZE   = IPOIB_CM_MTU  + IPOIB_ENCAP_LEN,
      IPOIB_CM_HEAD_SIZE        = IPOIB_CM_BUF_SIZE % PAGE_SIZE,
@@ -319,6 +323,9 @@ struct ipoib_dev_priv {
      struct dentry *mcg_dentry;
      struct dentry *path_dentry;
 #endif
+     int max_ib_mtu;
+     struct ib_sge rx_sge[IPOIB_UD_MAX_RX_SG];
+     struct ib_recv_wr rx_wr;
 };

 struct ipoib_ah {
@@ -359,6 +366,12 @@ struct ipoib_neigh {
      struct list_head    list;
 };

+#define IPOIB_UD_MTU(ib_mtu)       (ib_mtu - IPOIB_ENCAP_LEN)
+/* padding to align IP header */
+#define IPOIB_UD_BUF_SIZE(ib_mtu)  (ib_mtu + IB_GRH_BYTES + 4)
+#define IPOIB_UD_HEAD_SIZE(ib_mtu) (IPOIB_UD_BUF_SIZE(ib_mtu)) % PAGE_SIZE
+#define IPOIB_UD_RX_SG(ib_mtu)           ALIGN(IPOIB_UD_BUF_SIZE(ib_mtu), PAGE_SIZE) / PAGE_SIZE
+
 /*
  * We stash a pointer to our private neighbour ...
From: Shirley Ma
Date: Saturday, February 2, 2008 - 5:30 am

This patch is the same as the previously submitted version (V1).
This patch gets IPoIB-UD RX S/G ready.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index d1d3ca2..004a80b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -61,6 +61,10 @@ enum {
 
 	IPOIB_ENCAP_LEN		  = 4,
 
+	IPOIB_MAX_IB_MTU	  = 4096,
+	IPOIB_UD_MAX_RX_SG	  = ALIGN(IPOIB_MAX_IB_MTU + IB_GRH_BYTES + 4,
+					  PAGE_SIZE) / PAGE_SIZE,  /* padding to align IP header */
+
 	IPOIB_CM_MTU		  = 0x10000 - 0x10, /* padding to align header to 16 */
 	IPOIB_CM_BUF_SIZE	  = IPOIB_CM_MTU  + IPOIB_ENCAP_LEN,
 	IPOIB_CM_HEAD_SIZE	  = IPOIB_CM_BUF_SIZE % PAGE_SIZE,
@@ -319,6 +323,9 @@ struct ipoib_dev_priv {
 	struct dentry *mcg_dentry;
 	struct dentry *path_dentry;
 #endif
+	int max_ib_mtu;
+	struct ib_sge rx_sge[IPOIB_UD_MAX_RX_SG];
+	struct ib_recv_wr rx_wr;
 };
 
 struct ipoib_ah {
@@ -359,6 +366,12 @@ struct ipoib_neigh {
 	struct list_head    list;
 };
 
+#define IPOIB_UD_MTU(ib_mtu)		(ib_mtu - IPOIB_ENCAP_LEN)
+/* padding to align IP header */ 
+#define IPOIB_UD_BUF_SIZE(ib_mtu)	(ib_mtu + IB_GRH_BYTES + 4) 
+#define IPOIB_UD_HEAD_SIZE(ib_mtu)	(IPOIB_UD_BUF_SIZE(ib_mtu)) % PAGE_SIZE
+#define IPOIB_UD_RX_SG(ib_mtu)		ALIGN(IPOIB_UD_BUF_SIZE(ib_mtu), PAGE_SIZE) / PAGE_SIZE
+
 /*
  * We stash a pointer to our private neighbour information after our
  * hardware address in neigh->ha.  The ALIGN() expression here makes
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index a082466..242591f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -194,7 +194,7 @@ static int ipoib_change_mtu(struct net_device *dev,
int new_mtu)
 		return 0;
 	}
 
-	if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN)
+	if (new_mtu > IPOIB_UD_MTU(priv->max_ib_mtu))
 		return -EINVAL;
 ...
From: Shirley Ma
Date: Saturday, February 2, 2008 - 5:52 am

Signed-off-by: Shirley Ma <xma@us.ibm.com>

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 138f1a3..65b1159 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -56,11 +56,11 @@
/* constants */

enum {
- IPOIB_PACKET_SIZE   = 2048,
- IPOIB_BUF_SIZE   = IPOIB_PACKET_SIZE + IB_GRH_BYTES,
-
IPOIB_ENCAP_LEN   = 4,

+ IPOIB_MAX_IB_MTU   = 4096, /* max ib device payload is 4096 */
+ IPOIB_UD_MAX_RX_SG   = ALIGN(IPOIB_MAX_IB_MTU + IB_GRH_BYTES + 4,
PAGE_SIZE) / PAGE_SIZE,  /* padding to align IP header */
+
IPOIB_CM_MTU   = 0x10000 - 0x10, /* padding to align header to 16 */
IPOIB_CM_BUF_SIZE   = IPOIB_CM_MTU  + IPOIB_ENCAP_LEN,
IPOIB_CM_HEAD_SIZE   = IPOIB_CM_BUF_SIZE % PAGE_SIZE,
@@ -314,6 +314,9 @@ struct ipoib_dev_priv {
struct dentry *mcg_dentry;
struct dentry *path_dentry;
#endif
+ int max_ib_mtu;
+ struct ib_sge rx_sge[IPOIB_UD_MAX_RX_SG];
+ struct ib_recv_wr rx_wr;
};

struct ipoib_ah {
@@ -354,6 +357,11 @@ struct ipoib_neigh {
struct list_head    list;
};

+#define IPOIB_UD_MTU(ib_mtu) (ib_mtu - IPOIB_ENCAP_LEN)
+#define IPOIB_UD_BUF_SIZE(ib_mtu) (ib_mtu + IB_GRH_BYTES + 4) /* padding to align IP header */
+#define IPOIB_UD_HEAD_SIZE(ib_mtu) (IPOIB_UD_BUF_SIZE(ib_mtu)) % PAGE_SIZE
+#define IPOIB_UD_RX_SG(ib_mtu) ALIGN(IPOIB_UD_BUF_SIZE(ib_mtu), PAGE_SIZE) / PAGE_SIZE
+
/*
  * We stash a pointer to our private neighbour information after our
  * hardware address in neigh->ha.  The ALIGN() expression here makes
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index a082466..646aeb2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -194,7 +194,7 @@ static int ipoib_change_mtu(struct net_device *dev,
int new_mtu)
return 0;
}

- if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN)
+ if (new_mtu > IPOIB_UD_MTU(priv->max_ib_mtu))
return -EINVAL;

priv->admin_mtu ...
From: Shirley Ma
Date: Saturday, February 2, 2008 - 11:40 am

Define and set several UD RX S/G parameters to be used later.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/infiniband/ulp/ipoib/ipoib.h           |   16 ++++++++++++++++
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   19
++++++++++++++-----
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    3 +--
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |   13 ++++++++++++-
 4 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 415bf9a..6b5e108 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -328,6 +328,9 @@ struct ipoib_dev_priv {
 	struct dentry *mcg_dentry;
 	struct dentry *path_dentry;
 #endif
+	int max_ib_mtu;
+	struct ib_sge rx_sge[IPOIB_UD_RX_SG];
+	struct ib_recv_wr rx_wr;
 };
 
 struct ipoib_ah {
@@ -368,6 +371,19 @@ struct ipoib_neigh {
 	struct list_head    list;
 };
 
+#define IPOIB_UD_MTU(ib_mtu)		(ib_mtu - IPOIB_ENCAP_LEN)
+#define IPOIB_UD_BUF_SIZE(ib_mtu)	(ib_mtu + IB_GRH_BYTES)
+static inline int ipoib_ud_need_sg(int ib_mtu)
+{
+	return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 1 : 0;
+}
+static inline void ipoib_sg_dma_unmap_rx(struct ipoib_dev_priv *priv,
+					 u64 mapping[IPOIB_UD_RX_SG])
+{
+	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE,
DMA_FROM_DEVICE);
+	ib_dma_unmap_single(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE);
+}
+
 /*
  * We stash a pointer to our private neighbour information after our
  * hardware address in neigh->ha.  The ALIGN() expression here makes
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index a082466..242591f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -194,7 +194,7 @@ static int ipoib_change_mtu(struct net_device *dev,
int new_mtu)
 		return 0;
 	}
 
-	if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN)
+	if (new_mtu > ...
From: Shirley Ma
Date: Saturday, February 2, 2008 - 2:33 pm

The patchset has been tested on an Intel platform with 2K MTU. Here is the
updated patch:

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/infiniband/ulp/ipoib/ipoib.h           |   16 ++++++++++++++++
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   19
++++++++++++++-----
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    3 +--
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |   16 +++++++++++++++-
 4 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 415bf9a..6b5e108 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -328,6 +328,9 @@ struct ipoib_dev_priv {
 	struct dentry *mcg_dentry;
 	struct dentry *path_dentry;
 #endif
+	int max_ib_mtu;
+	struct ib_sge rx_sge[IPOIB_UD_RX_SG];
+	struct ib_recv_wr rx_wr;
 };
 
 struct ipoib_ah {
@@ -368,6 +371,19 @@ struct ipoib_neigh {
 	struct list_head    list;
 };
 
+#define IPOIB_UD_MTU(ib_mtu)		(ib_mtu - IPOIB_ENCAP_LEN)
+#define IPOIB_UD_BUF_SIZE(ib_mtu)	(ib_mtu + IB_GRH_BYTES)
+static inline int ipoib_ud_need_sg(int ib_mtu)
+{
+	return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 1 : 0;
+}
+static inline void ipoib_sg_dma_unmap_rx(struct ipoib_dev_priv *priv,
+					 u64 mapping[IPOIB_UD_RX_SG])
+{
+	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE,
DMA_FROM_DEVICE);
+	ib_dma_unmap_single(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE);
+}
+
 /*
  * We stash a pointer to our private neighbour information after our
  * hardware address in neigh->ha.  The ALIGN() expression here makes
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index a082466..242591f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -194,7 +194,7 @@ static int ipoib_change_mtu(struct net_device *dev,
int new_mtu)
 		return 0;
 	}
 
-	if (new_mtu > IPOIB_PACKET_SIZE - ...
From: Shirley Ma
Date: Wednesday, January 30, 2008 - 1:30 pm

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 65b1159..969955e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -463,9 +463,9 @@ int ipoib_pkey_dev_delay_open(struct net_device
*dev);
 void ipoib_drain_cq(struct net_device *dev);
 void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
 		   unsigned int length, struct sk_buff *toskb);
-struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev,
-				      int id, int frags, int head_size,
-				      int pad, u64 *mapping);
+struct sk_buff *ipoib_alloc_rx_skb(struct net_device *dev,
+				   int id, int frags, int head_size,
+				   int pad, u64 *mapping);
 void inline ipoib_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags,
 			       int head_size, u64 *mapping)
 {
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index c7d42ea..a9af796 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -283,11 +283,10 @@ static int ipoib_cm_nonsrq_init_rx(struct
net_device *dev, struct ib_cm_id *cm_i
 	spin_unlock_irq(&priv->lock);
 
 	for (i = 0; i < ipoib_recvq_size; ++i) {
-		rx->rx_ring[i].skb = ipoib_cm_alloc_rx_skb(dev, i,
-							   IPOIB_CM_RX_SG - 1,
-							   IPOIB_CM_HEAD_SIZE, 
-							   12,
-							   rx->rx_ring[i].mapping);
+		rx->rx_ring[i].skb = ipoib_alloc_rx_skb(dev, i,
+							IPOIB_CM_RX_SG - 1,
+							IPOIB_CM_HEAD_SIZE, 12,
+							rx->rx_ring[i].mapping);
 		if (!rx->rx_ring[i].skb) {
 			ipoib_warn(priv, "failed to allocate receive buffer %d\n", i);
 				ret = -ENOMEM;
@@ -491,8 +490,8 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev,
struct ib_wc *wc)
 	frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len,
 					      (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE;
 
-	newskb = ipoib_cm_alloc_rx_skb(dev, wr_id, frags, IPOIB_CM_HEAD_SIZE, 
-				       ...
From: Shirley Ma
Date: Thursday, January 31, 2008 - 1:20 pm

This patch enables IPoIB-UD RX to allocate S/G buffers for payloads of up
to 4096 bytes. The IPoIB link MTU can then be up to 4K - 4.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/infiniband/ulp/ipoib/ipoib.h    |   14 +----
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |   25 ++++----
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |   95
+++++++++++-------------------
 3 files changed, 50 insertions(+), 84 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 004a80b..57d33d5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -56,9 +56,6 @@
 /* constants */
 
 enum {
-	IPOIB_PACKET_SIZE	  = 2048,
-	IPOIB_BUF_SIZE		  = IPOIB_PACKET_SIZE + IB_GRH_BYTES,
-
 	IPOIB_ENCAP_LEN		  = 4,
 
 	IPOIB_MAX_IB_MTU	  = 4096,
@@ -142,11 +139,6 @@ struct ipoib_mcast {
 
 struct ipoib_rx_buf {
 	struct sk_buff *skb;
-	u64		mapping;
-};
-
-struct ipoib_cm_rx_buf {
-	struct sk_buff *skb;
 	u64		mapping[IPOIB_CM_RX_SG];
 };
 
@@ -198,7 +190,7 @@ enum ipoib_cm_state {
 struct ipoib_cm_rx {
 	struct ib_cm_id	       *id;
 	struct ib_qp	       *qp;
-	struct ipoib_cm_rx_buf *rx_ring;
+	struct ipoib_rx_buf    *rx_ring;
 	struct list_head	list;
 	struct net_device      *dev;
 	unsigned long		jiffies;
@@ -223,7 +215,7 @@ struct ipoib_cm_tx {
 
 struct ipoib_cm_dev_priv {
 	struct ib_srq	       *srq;
-	struct ipoib_cm_rx_buf *srq_ring;
+	struct ipoib_rx_buf    *srq_ring;
 	struct ib_cm_id	       *id;
 	struct list_head	passive_ids;   /* state: LIVE */
 	struct list_head	rx_error_list; /* state: ERROR */
@@ -473,7 +465,7 @@ int ipoib_pkey_dev_delay_open(struct net_device
*dev);
 void ipoib_drain_cq(struct net_device *dev);
 void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
 		   unsigned int length, struct sk_buff *toskb);
-struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev,
+struct sk_buff *ipoib_alloc_rx_skb(struct net_device *dev,
 				      int id, int frags, int ...
From: Shirley Ma
Date: Saturday, February 2, 2008 - 5:39 am

This patch keeps the existing 2K MTU IPoIB-UD implementation, which is used
for both 2K MTU and 4K MTU without S/G. RX S/G for 4K MTU is used only when
needed.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/infiniband/ulp/ipoib/ipoib.h    |   28 ++++-----
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |   10 ++--
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |  108
++++++++++++++++++++++---------
 3 files changed, 95 insertions(+), 51 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 004a80b..6c33d7d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -56,9 +56,6 @@
 /* constants */
 
 enum {
-	IPOIB_PACKET_SIZE	  = 2048,
-	IPOIB_BUF_SIZE		  = IPOIB_PACKET_SIZE + IB_GRH_BYTES,
-
 	IPOIB_ENCAP_LEN		  = 4,
 
 	IPOIB_MAX_IB_MTU	  = 4096,
@@ -140,12 +137,7 @@ struct ipoib_mcast {
 	struct net_device *dev;
 };
 
-struct ipoib_rx_buf {
-	struct sk_buff *skb;
-	u64		mapping;
-};
-
-struct ipoib_cm_rx_buf {
+struct ipoib_sg_rx_buf {
 	struct sk_buff *skb;
 	u64		mapping[IPOIB_CM_RX_SG];
 };
@@ -198,7 +190,7 @@ enum ipoib_cm_state {
 struct ipoib_cm_rx {
 	struct ib_cm_id	       *id;
 	struct ib_qp	       *qp;
-	struct ipoib_cm_rx_buf *rx_ring;
+	struct ipoib_sg_rx_buf *rx_ring;
 	struct list_head	list;
 	struct net_device      *dev;
 	unsigned long		jiffies;
@@ -223,7 +215,7 @@ struct ipoib_cm_tx {
 
 struct ipoib_cm_dev_priv {
 	struct ib_srq	       *srq;
-	struct ipoib_cm_rx_buf *srq_ring;
+	struct ipoib_sg_rx_buf *srq_ring;
 	struct ib_cm_id	       *id;
 	struct list_head	passive_ids;   /* state: LIVE */
 	struct list_head	rx_error_list; /* state: ERROR */
@@ -294,7 +286,7 @@ struct ipoib_dev_priv {
 	unsigned int admin_mtu;
 	unsigned int mcast_mtu;
 
-	struct ipoib_rx_buf *rx_ring;
+	struct ipoib_sg_rx_buf *rx_ring;
 
 	spinlock_t	     tx_lock;
 	struct ipoib_tx_buf *tx_ring;
@@ -367,10 +359,14 @@ struct ipoib_neigh {
 };
 
 #define ...
From: Shirley Ma
Date: Saturday, February 2, 2008 - 11:52 am

This patch enables IPoIB-UD 4K MTU support. If 4K MTU + GRH head + IPoIB
head does not fit in PAGE_SIZE, two buffers are allocated; otherwise one
buffer is used.
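
The buffer-count decision can be sketched in plain userspace C. This is
only an illustration, not the patch itself: the constants mirror values that
appear in this thread (4-byte IPoIB encapsulation header, 40-byte GRH) and
the logic follows ipoib_ud_need_sg() from patch 1/3, while the model_ names
and the explicit page_size parameter are assumptions made so the check can
be exercised outside the kernel.

```c
/* Values taken from the thread: IPOIB_ENCAP_LEN = 4, GRH = 40 bytes. */
#define MODEL_ENCAP_LEN 4
#define MODEL_GRH_BYTES 40

/* Largest IP MTU for a given IB MTU (mirrors IPOIB_UD_MTU). */
static inline int model_ud_mtu(int ib_mtu)
{
	return ib_mtu - MODEL_ENCAP_LEN;
}

/* Receive buffer size for a given IB MTU (mirrors IPOIB_UD_BUF_SIZE). */
static inline int model_ud_buf_size(int ib_mtu)
{
	return ib_mtu + MODEL_GRH_BYTES;
}

/*
 * Number of receive buffers: two (head + page) when the whole packet
 * does not fit in one page, otherwise one. page_size is a parameter
 * of this model; the kernel code compares against PAGE_SIZE.
 */
static inline int model_ud_rx_buffers(int ib_mtu, int page_size)
{
	return model_ud_buf_size(ib_mtu) > page_size ? 2 : 1;
}
```

With a 4096-byte IB MTU and 4096-byte pages the model asks for two buffers;
with a 2048-byte IB MTU, or with 64K pages, a single buffer is enough.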

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/infiniband/ulp/ipoib/ipoib.h    |    7 +--
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |   90
+++++++++++++++++++------------
 2 files changed, 56 insertions(+), 41 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 6b5e108..faee740 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -145,11 +145,6 @@ struct ipoib_sg_rx_buf {
 	u64		mapping[IPOIB_UD_RX_SG];
 };
 
-struct ipoib_rx_buf {
-	struct sk_buff *skb;
-	u64		mapping;
-};
-
 struct ipoib_tx_buf {
 	struct sk_buff *skb;
 	u64		mapping;
@@ -299,7 +294,7 @@ struct ipoib_dev_priv {
 	unsigned int admin_mtu;
 	unsigned int mcast_mtu;
 
-	struct ipoib_rx_buf *rx_ring;
+	struct ipoib_sg_rx_buf *rx_ring;
 
 	spinlock_t	     tx_lock;
 	struct ipoib_tx_buf *tx_ring;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 9ca3d34..93025d3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -155,29 +155,22 @@ partial_error:
 static int ipoib_ib_post_receive(struct net_device *dev, int id)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	struct ib_sge list;
-	struct ib_recv_wr param;
 	struct ib_recv_wr *bad_wr;
 	int ret;
 
-	list.addr     = priv->rx_ring[id].mapping;
-	list.length   = IPOIB_BUF_SIZE;
-	list.lkey     = priv->mr->lkey;
-
-	param.next    = NULL;
-	param.wr_id   = id | IPOIB_OP_RECV;
-	param.sg_list = &list;
-	param.num_sge = 1;
-
-	ret = ib_post_recv(priv->qp, &param, &bad_wr);
+	priv->rx_wr.wr_id = id | IPOIB_OP_RECV;
+	ret = ib_post_recv(priv->qp, &priv->rx_wr, &bad_wr);
 	if (unlikely(ret)) {
+		if (ipoib_ud_need_sg(priv->max_ib_mtu))
+			ipoib_sg_dma_unmap_rx(priv, ...
From: Shirley Ma
Date: Saturday, February 2, 2008 - 2:35 pm

This patchset has been tested for 2K MTU on Intel platform with mthca.
Here is the updated one:

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/infiniband/ulp/ipoib/ipoib.h    |    7 +--
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |   91
+++++++++++++++++++------------
 2 files changed, 58 insertions(+), 40 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 6b5e108..faee740 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -145,11 +145,6 @@ struct ipoib_sg_rx_buf {
 	u64		mapping[IPOIB_UD_RX_SG];
 };
 
-struct ipoib_rx_buf {
-	struct sk_buff *skb;
-	u64		mapping;
-};
-
 struct ipoib_tx_buf {
 	struct sk_buff *skb;
 	u64		mapping;
@@ -299,7 +294,7 @@ struct ipoib_dev_priv {
 	unsigned int admin_mtu;
 	unsigned int mcast_mtu;
 
-	struct ipoib_rx_buf *rx_ring;
+	struct ipoib_sg_rx_buf *rx_ring;
 
 	spinlock_t	     tx_lock;
 	struct ipoib_tx_buf *tx_ring;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 9ca3d34..81a517b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -155,29 +155,25 @@ partial_error:
 static int ipoib_ib_post_receive(struct net_device *dev, int id)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	struct ib_sge list;
-	struct ib_recv_wr param;
 	struct ib_recv_wr *bad_wr;
 	int ret;
 
-	list.addr     = priv->rx_ring[id].mapping;
-	list.length   = IPOIB_BUF_SIZE;
-	list.lkey     = priv->mr->lkey;
+	priv->rx_wr.wr_id = id | IPOIB_OP_RECV;
+	priv->rx_sge[0].addr = priv->rx_ring[id].mapping[0];
+	priv->rx_sge[1].addr = priv->rx_ring[id].mapping[1];	
 
-	param.next    = NULL;
-	param.wr_id   = id | IPOIB_OP_RECV;
-	param.sg_list = &list;
-	param.num_sge = 1;
-
-	ret = ib_post_recv(priv->qp, &param, &bad_wr);
+	ret = ib_post_recv(priv->qp, &priv->rx_wr, &bad_wr);
 	if (unlikely(ret)) {
+		if ...
From: Shirley Ma
Date: Saturday, February 2, 2008 - 5:02 pm

I have fixed a bug found in the 4K MTU test. Here is the new patch. I am
running a stress test tonight.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/infiniband/ulp/ipoib/ipoib.h    |    7 +--
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |   93
+++++++++++++++++++-----------
 2 files changed, 60 insertions(+), 40 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 6b5e108..faee740 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -145,11 +145,6 @@ struct ipoib_sg_rx_buf {
 	u64		mapping[IPOIB_UD_RX_SG];
 };
 
-struct ipoib_rx_buf {
-	struct sk_buff *skb;
-	u64		mapping;
-};
-
 struct ipoib_tx_buf {
 	struct sk_buff *skb;
 	u64		mapping;
@@ -299,7 +294,7 @@ struct ipoib_dev_priv {
 	unsigned int admin_mtu;
 	unsigned int mcast_mtu;
 
-	struct ipoib_rx_buf *rx_ring;
+	struct ipoib_sg_rx_buf *rx_ring;
 
 	spinlock_t	     tx_lock;
 	struct ipoib_tx_buf *tx_ring;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 9ca3d34..dfb5cc2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -155,29 +155,25 @@ partial_error:
 static int ipoib_ib_post_receive(struct net_device *dev, int id)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	struct ib_sge list;
-	struct ib_recv_wr param;
 	struct ib_recv_wr *bad_wr;
 	int ret;
 
-	list.addr     = priv->rx_ring[id].mapping;
-	list.length   = IPOIB_BUF_SIZE;
-	list.lkey     = priv->mr->lkey;
+	priv->rx_wr.wr_id = id | IPOIB_OP_RECV;
+	priv->rx_sge[0].addr = priv->rx_ring[id].mapping[0];
+	priv->rx_sge[1].addr = priv->rx_ring[id].mapping[1];	
 
-	param.next    = NULL;
-	param.wr_id   = id | IPOIB_OP_RECV;
-	param.sg_list = &list;
-	param.num_sge = 1;
-
-	ret = ib_post_recv(priv->qp, &param, &bad_wr);
+	ret = ib_post_recv(priv->qp, &priv->rx_wr, &bad_wr);
 	if (unlikely(ret)) {
+		if ...
From: Eli Cohen
Date: Sunday, February 3, 2008 - 3:07 am

Hi Shirley,

Your patches cannot be applied cleanly. It seems that your email client
wraps long lines. Can you please check whether this is the case?


From: Shirley Ma
Date: Saturday, February 2, 2008 - 5:21 pm

Thanks Eli.

Too bad :(. I have struggled with my email for a while. Let me send you
an attachment file for the whole patch built against OFED-1.3-RC3 kernel
here first. I will work on my email client tomorrow. I am too tired
today. Let me know right away if there is any problem.

Shirley

 
From: Eli Cohen
Date: Sunday, February 3, 2008 - 3:25 am

Go to sleep :) I'll get along with the wrapped lines. I am now reviewing your patches against Roland's tree. After that I'll look at the attachments.

-----Original Message-----
From: Shirley Ma [mailto:mashirle@us.ibm.com] 
Sent: Sunday, 03 February 2008 02:22
To: Eli Cohen
Cc: Roland Dreier; general@lists.openfabrics.org
Subject: Re: [UPDATE] [V3] [PATCH 3/3] ib/ipoib: IPoIB-UD RX S/G support for 4K MTU

Thanks Eli.

Too bad :(. I have struggled with my email for a while. Let me send you an attachment file for the whole patch built against OFED-1.3-RC3 kernel here first. I will work on my email client tomorrow. I am too tired today. Let me know right away if there is any problem.

Shirley

 
From: Shirley Ma
Date: Saturday, February 2, 2008 - 5:30 pm

Thank you so much, Eli! I have done 4 different implementations in the
last few days to make it possible for this to be included in OFED-1.3 as
well as the distros. I am totally exhausted. If there are any issues, let me
know. It's quite possible for me to make mistakes when working like this. I
will run a stress test overnight on both Intel (mthca, 2K MTU) and ppc
(ehca, 4K MTU).

Thanks
Shirley

From: Eli Cohen
Date: Sunday, February 3, 2008 - 10:07 am

Hi Shirley,

I have reviewed the patches against Roland's tree and have the following
comments:

1. I see that there are a few if statements added on the fast path and I
am concerned they might hurt performance of slow UDP messages.
Unfortunately I have not been able to test with an SM defining the
broadcast group to 4K MTU (currently opensm uses 2K).

2. The usage of ipoib_ud_skb_put_frags() seems to be redundant and will
only hurt performance since you would never reuse anything from the old
SKB. This is because the headlen is 40 bytes for GRH and the rest of the
data is in the first (and only) fragment.

3. I think it would be better to allocate room for real data in the head
of the SKB since the tcp/ip stack seems to have less overhead if the
headers are on the linear data.

4. I would consider using a pre-allocated buffer for the GRH of all
received data (not as part of the SKB).

From: Shirley Ma
Date: Sunday, February 3, 2008 - 10:24 am

general-bounces@lists.openfabrics.org wrote on 02/03/2008 09:07:29 AM:

> Hi Shirley,
>
> I have reviewed the patches against Roland's tree and have the following
> comments:

I appreciate your quick review.

> 1. I see that there are a few if statements added on the fast path and I
> am concerned they might hurt performance of slow UDP messages.
> Unfortunately I have not been able to test with an SM defining the
> broadcast group to 4K MTU (currently opensm uses 2K).

What kind of parameters would you prefer me to test this patch with? I can
test it right away when you send me your recommendations.

> 2. The usage of ipoib_ud_skb_put_frags() seems to be redundant and will
> only hurt performance since you would never reuse anything from the old
> SKB. This is because the headlen is 40 bytes for GRH and the rest of the
> data is in the first (and only) fragment.

The header is 44 bytes; the IP payload data is in the first fragment.

> 3. I think it would be better to allocate room for real data in the head
> of the SKB since the tcp/ip stack seems to have less ...

Comments 2, 3 and 4 can be combined into one if we use a pre-allocated
buffer for GRH+IPoIB-head for all IP payload data, right? This is a
performance enhancement, if any. I think this could be done after this patch
is checked in, and I will fix it before RC4 is out. Do you agree?

Thanks
Shirley
From: Shirley Ma
Date: Sunday, February 3, 2008 - 10:36 am

Is your recommendation the same as Roland's earlier one? I hope it's not,
because that approach doesn't work: the first buffer is GRH + IPoIB head =
44 bytes, not 40 bytes. If we put all skb data in the first frag, then the
IP header is not aligned to 16 bytes. I am copying Roland's comments
regarding this approach:
---------
However, I now realize that my earlier idea of allocating a scratch
buffer for the GRH and just allocating a 4096 byte skb doesn't work,
because the skb_shinfo ends up being allocated along with the buffer,
so trying to allocate a 4096-byte skb will bloat the data past a
single page, which is what we're trying to avoid.

So how about the following?  When using a UD MTU of 4096 with a page
size of 4096, allocate an skb of size 44 for the GRH and ethertype,
and then allocate a single page for the fragment list.  This means
that the IP packet will start nicely 16-byte aligned for free, and all
the bookkeeping is very simple.
-------
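
A small sketch of the alignment arithmetic behind this suggestion, assuming
the 40-byte GRH and 4-byte IPoIB header mentioned above. The function names
are hypothetical, and the model assumes each buffer itself starts on a
16-byte boundary:

```c
#define MODEL_GRH_BYTES  40
#define MODEL_ENCAP_LEN   4
#define MODEL_HEAD_SIZE  (MODEL_GRH_BYTES + MODEL_ENCAP_LEN)	/* 44 */

/*
 * Offset of the IP header within the buffer that holds it.
 * head_size is the number of leading bytes (0..44) that are received
 * into a separate head buffer; 0 means one contiguous buffer.
 */
static inline int model_ip_offset(int head_size)
{
	/* Header bytes that still precede the IP header in its buffer. */
	return MODEL_HEAD_SIZE - head_size;
}

/* Returns 1 when the IP header falls on a 16-byte boundary, else 0. */
static inline int model_ip_aligned16(int head_size)
{
	return model_ip_offset(head_size) % 16 == 0;
}
```

In this model a single contiguous buffer puts the IP header at offset 44,
and splitting off only the 40-byte GRH leaves it at offset 4; only a 44-byte
head buffer places the IP header at offset 0 of the page.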

thanks
Shirley
From: Eli Cohen
Date: Monday, February 4, 2008 - 6:54 am

What I am actually suggesting is: let's allocate, for example, 128 bytes in
the linear data and then a 4K page. The first 128 bytes will be used for the
GRH, the encapsulation header, and the IP and TCP/UDP headers. The following
4K fragment will have enough space to contain the rest of the packet.

Another thing to consider is using a 3-entry receive scatter list:
1. The first entry will point to a 40-byte generic buffer (allocated once
per netdevice). All receive buffers will point to this buffer. As Roland
suggested before, this will save us the skb_pull on the GRH.

2. A 128-byte buffer which comes from the linear part of the SKB - we
can align this buffer to ensure IP is aligned on a 16-byte boundary.

3. A 4K page in the first fragment.
We can then check, when the packet is received, whether the overall packet
length is small enough that it did not touch the page. If it did
not, we can use this page for the newly posted buffer.

** The above 128-byte value can be a macro, and we can determine what the
correct value is.
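
The three-entry layout and the page-reuse check can be modelled roughly as
follows. This is a userspace sketch under stated assumptions, not driver
code: the struct, the function names and the fixed 128-byte linear size are
all hypothetical, and the model simply fills the scatter entries in order.

```c
#define MODEL_GRH_BYTES    40	/* entry 1: shared GRH scratch buffer */
#define MODEL_LINEAR_SIZE 128	/* entry 2: linear part of the skb (tunable) */
#define MODEL_PAGE_SIZE  4096	/* entry 3: page fragment */

struct model_rx_split {
	unsigned int grh;	/* bytes landing in the GRH buffer */
	unsigned int linear;	/* bytes landing in the linear area */
	unsigned int page;	/* bytes landing in the page fragment */
};

static inline unsigned int model_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Distribute a received length across the three entries, in order. */
static inline struct model_rx_split model_split(unsigned int byte_len)
{
	struct model_rx_split s;

	s.grh = model_min(byte_len, MODEL_GRH_BYTES);
	byte_len -= s.grh;
	s.linear = model_min(byte_len, MODEL_LINEAR_SIZE);
	byte_len -= s.linear;
	s.page = model_min(byte_len, MODEL_PAGE_SIZE);
	return s;
}

/* The page can be re-posted as-is when the packet never reached it. */
static inline int model_page_reusable(unsigned int byte_len)
{
	return model_split(byte_len).page == 0;
}
```

With these sizes, any completion of up to 168 bytes (40 + 128) leaves the
page untouched, so it could be handed to the next posted receive without a
new allocation.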

From: Shirley Ma
Date: Sunday, February 3, 2008 - 10:31 pm

Hello Eli,


Are you saying we should also do this for 2K MTU? Otherwise the if condition
check cannot be avoided. And I don't know how much performance gain we
would get from this approach.

Thanks
Shirley

From: Eli Cohen
Date: Monday, February 4, 2008 - 8:44 am

Hi Shirley,

I think we can do it for 2K MTU as well and avoid all the ifs. But
first let's get this into ofed 1.3 and then work on the changes.
Unfortunately you'll have to rebuild your patches on top of the
current ofed tree. Can you do it today?

From: Shirley Ma
Date: Sunday, February 3, 2008 - 11:03 pm

Thanks Eli. I will pull the git tree and do it today. I will limit the
need-S/G check to a single one in the fast path.

thanks
Shirley

From: Hal Rosenstock
Date: Monday, February 4, 2008 - 10:39 am

[Empty message]
From: Eli Cohen
Date: Monday, February 4, 2008 - 11:55 am

Thanks. 

-----Original Message-----
From: Hal Rosenstock [mailto:hrosenstock@xsigo.com] 
Sent: Monday, 04 February 2008 19:39
To: Eli Cohen
Cc: Shirley Ma; Roland Dreier; general@lists.openfabrics.org; sashak@voltaire.com
Subject: Re: [ofa-general] RE: [UPDATE] [V3] [PATCH 3/3] ib/ipoib: IPoIB-UD RX S/G support for 4K MTU

Eli,


The default is 2K (mtu=4). You can get opensm to make it 4K if you want as follows:

/etc/ofa/opensm-partitions.conf:
Default=0x7fff,ipoib,mtu=5:ALL=full;
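
For readers unfamiliar with the mtu= values: InfiniBand encodes the MTU as a
small integer code rather than a byte count. A minimal sketch of the
mapping, with a hypothetical helper name chosen for illustration:

```c
/*
 * InfiniBand MTU codes: 1 = 256, 2 = 512, 3 = 1024, 4 = 2048,
 * 5 = 4096 bytes. Returns -1 for a code outside that range.
 */
static inline int model_ib_mtu_code_to_bytes(int code)
{
	if (code < 1 || code > 5)
		return -1;
	return 128 << code;
}
```

So mtu=4 corresponds to the 2K default and mtu=5 to the 4K setting shown in
the partition line above.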

From: Shirley Ma
Date: Sunday, February 3, 2008 - 10:09 am

Hello Tziporet,

I have done 4 different approaches for the IPoIB-UD 4K MTU implementation. I
have tested and validated three of them, and I didn't see any performance
difference among these implementations for either 2K MTU or 4K MTU. However,
I picked the V3 patch since it is based on Eli's and Roland's review
comments: keep the existing 2K MTU implementation, and don't merge IPoIB-UD
RX S/G and IPoIB-CM RX S/G. It uses 2 buffers for 4K MTU when PAGE_SIZE is
not big enough for 4K MTU + HEAD: one buffer is HEAD = GRH + IPoIB head =
44 bytes, and one buffer is 4K for data.

I have tested and validated this patch with both the mthca driver on an
Intel-based platform and the ehca driver on a ppc platform. The stress test
has passed a whole night without any problem on the Intel-based platform for
2K MTU validation against the 2.6.24 kernel for the OFED-1.3-RC3 tree +
Pradeep's noSRQ patch.

The attachment is the patch built against OFED-1.3-RC3. One line needs to be
changed for backporting to other kernels: ++dev->stats vs. ++priv->stats.
Please review it for OFED-1.3 inclusion. If there are any issues, please let
me know.

(See attached file: ipoib-4kmtu-rc3-2.6.24.patch)

Thanks
Shirley
From: Shirley Ma
Date: Saturday, February 2, 2008 - 3:36 pm

This is the updated patch. The patchset has been tested for both 2K MTU and
4K MTU. This version fixes a typo in the 4K MTU code.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/infiniband/ulp/ipoib/ipoib.h    |    7 +--
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |   91
+++++++++++++++++++------------
 2 files changed, 58 insertions(+), 40 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 6b5e108..faee740 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -145,11 +145,6 @@ struct ipoib_sg_rx_buf {
 	u64		mapping[IPOIB_UD_RX_SG];
 };
 
-struct ipoib_rx_buf {
-	struct sk_buff *skb;
-	u64		mapping;
-};
-
 struct ipoib_tx_buf {
 	struct sk_buff *skb;
 	u64		mapping;
@@ -299,7 +294,7 @@ struct ipoib_dev_priv {
 	unsigned int admin_mtu;
 	unsigned int mcast_mtu;
 
-	struct ipoib_rx_buf *rx_ring;
+	struct ipoib_sg_rx_buf *rx_ring;
 
 	spinlock_t	     tx_lock;
 	struct ipoib_tx_buf *tx_ring;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 9ca3d34..81a517b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -155,29 +155,25 @@ partial_error:
 static int ipoib_ib_post_receive(struct net_device *dev, int id)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	struct ib_sge list;
-	struct ib_recv_wr param;
 	struct ib_recv_wr *bad_wr;
 	int ret;
 
-	list.addr     = priv->rx_ring[id].mapping;
-	list.length   = IPOIB_BUF_SIZE;
-	list.lkey     = priv->mr->lkey;
+	priv->rx_wr.wr_id = id | IPOIB_OP_RECV;
+	priv->rx_sge[0].addr = priv->rx_ring[id].mapping[0];
+	priv->rx_sge[1].addr = priv->rx_ring[id].mapping[1];	
 
-	param.next    = NULL;
-	param.wr_id   = id | IPOIB_OP_RECV;
-	param.sg_list = &list;
-	param.num_sge = 1;
-
-	ret = ib_post_recv(priv->qp, &param, &bad_wr);
+	ret = ib_post_recv(priv->qp, &priv->rx_wr, &bad_wr);
 	if (unlikely(ret)) {
+		if ...
From: Or Gerlitz
Date: Wednesday, January 30, 2008 - 11:51 pm

Just to make sure, this patch is a candidate for upstream inclusion 
(which you want also to be present in ofed 1.3) and hence is based 
against Roland's tree, correct?

Or

From: Shirley Ma
Date: Wednesday, January 30, 2008 - 1:58 pm

Yes. I forgot to mention these patches are created against Roland's
2.6.25 tree.

Thanks
Shirley

From: Or Gerlitz
Date: Thursday, January 31, 2008 - 12:04 am

I see, but I want to make sure these patches are the ones you want to
merge into the kernel, or whether it's more of a work in progress which you
want to be included in this experimental testbed called ofed.

If it's a candidate for upstream inclusion, I find it hard to review since
there is no per-patch change-log.

Or.


From: Shirley Ma
Date: Wednesday, January 30, 2008 - 2:17 pm

Hello Or,

I will create a patch for OFED-1.3-RC3 separately. I wouldn't call it
experimental code since these APIs have been tested along with IPoIB-CM.
Thanks for the advice; I thought one change log was enough. If not, I
will resubmit these patches along with one-line change-logs.

Thanks
Shirley

From: Or Gerlitz
Date: Thursday, January 31, 2008 - 12:27 am

I meant to say that ofed is an experimental testbed, this is becoming 
more and more clear to more and more people. I did not address your 

If you think a one-line change log for each patch is enough 
for a reviewer, let it be, but if I were you, I would check 
this assumption again. The thing is that when you send an RFC, most 
of the documentation is often in the virtual 0/N patch, but remember that 
this documentation does not go into the git change-log. So in your 
case, since you want this to be merged, you have to work harder and 
document both in the 0/N and also in the 1/N, 2/N ... N/N postings.

Or


From: Shirley Ma
Date: Wednesday, January 30, 2008 - 2:53 pm

That's a good suggestion. It seems ipoib_cm.c has been changed in the
past few hours, and I am having trouble applying the patches. I am cleaning
my local tree and redoing all patches with change-logs.

thanks
Shirley

From: Or Gerlitz
Date: Thursday, January 31, 2008 - 12:32 am

Hi Shirley,

Just to make sure, can you confirm that this patch set is not dependent 
on the below patch which is part of ofed but was never submitted to the 
upstream ipoib driver for inclusion?

Also, can you share which SM you tested this with? Did you have to 
patch it or run it with non-default parameters? And what was the 
configuration: specifically, which switch was used, and what instrumentation, 
if any, did you make to the switch FW? Thanks.


From: Shirley Ma
Date: Wednesday, January 30, 2008 - 3:18 pm

No, this patchset is not dependent on any OFED patches. It's a pure
patch set for the 2.6.25 kernel. I have another version of this patchset
which is built against OFED-1.3-RC2; I will update it to OFED-1.3-RC3. I
hope I can get a quick ack from the maintainers agreeing with this
approach. I see around 1.5-2 times better performance using 4K MTU for
IPoIB-UD. I will resubmit this patchset
tomorrow. You should wait for the new patchset since I have found some

	One of the reasons this patchset could not be submitted earlier was
switch (SW) support: I couldn't do a full test without a switch that
supports 4K MTU. The switch firmware needs to be updated to allow the IPoIB
broadcast group to be created with a 4096 MTU size. There are two
requirements on the switch:
1. The switch ports can be configured to a 4096 MTU size.
2. The switch's default IPoIB broadcast group can be configured to a 4096
MTU size. The default IPoIB broadcast group MTU can't exceed the switch
port MTU size.

The way to enable IPoIB 4K MTU is:
1. Set the switch ports to 4K MTU.
2. Set the SM default IPoIB broadcast group MTU size to 4K.

	You can disable or enable the IPoIB default broadcast group when
starting the SM. If you don't enable it, the first node to come up in the
subnet will create a broadcast group with 2K MTU for this subnet. That
makes sense, since the node doesn't know the link MTU size of the whole
subnet, so it's better to create a default 2K MTU group.
If you do enable the IPoIB default broadcast group when starting the SM and
its MTU size is 2K, then all nodes in the cluster can join the subnet and
the IPoIB subnet link MTU size will be set to 2K. If the broadcast group MTU
size is 4K, then only nodes with 4K MTU can join this IPoIB subnet.
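
A minimal sketch of the join rule described above, in plain userspace C
rather than driver code. The function names are hypothetical and only
illustrate the two cases (SM-created group versus a group created by the
first node with a 2K MTU); nothing here is taken from the patches.

```c
#include <stdbool.h>

/* Hypothetical helper: MTU of the IPoIB broadcast group for a subnet.
 * If the SM creates the default group, its configured MTU is used;
 * otherwise the first node to come up creates a 2048-byte group. */
unsigned int broadcast_group_mtu(bool sm_creates_group,
				 unsigned int sm_group_mtu)
{
	return sm_creates_group ? sm_group_mtu : 2048;
}

/* Hypothetical helper: a node can join only if its own IB MTU is at
 * least the broadcast group MTU. */
bool can_join_broadcast_group(unsigned int node_mtu,
			      unsigned int group_mtu)
{
	return node_mtu >= group_mtu;
}
```

With a 4K group, a 2K-only node is rejected, while a 4K node can join
either a 2K or a 4K group.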

I am not sure that's what you are looking for. Let me know if anything
is unclear.

thanks
Shirley

From: Eli Cohen
Date: Thursday, January 31, 2008 - 2:28 am

Hi shirley,

my comments are:
1. The first patch (1/3) is malformed. I suggest you try to apply it
before sending.
2. Make sure they compile before submitting - patch 1/3 for example
changes ipoib_rx_buf

 struct ipoib_rx_buf {
 	struct sk_buff *skb;
-	u64             mapping;
+	u64             mapping[IPOIB_CM_RX_SG];
 };

but does not change code in the UD flow to align with these changes.
3. Please put an explanation in the changelog.
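
To illustrate point 2, here is a small userspace sketch of why callers
must change when a scalar mapping member becomes an array. The struct and
helper names are stand-ins (RX_SG_MAX is an assumed placeholder for
IPOIB_CM_RX_SG, and void * replaces struct sk_buff *); this is not the
actual driver code.

```c
#include <stdint.h>

#define RX_SG_MAX 2	/* assumed stand-in for IPOIB_CM_RX_SG */

struct rx_buf {
	void     *skb;			/* stand-in for struct sk_buff * */
	uint64_t  mapping[RX_SG_MAX];	/* one DMA address per S/G element */
};

/* With a scalar member a single-buffer caller could write
 * "buf->mapping = addr". Once the member is an array, that assignment
 * no longer compiles, so the single-buffer path has to name element 0. */
void ud_set_mapping(struct rx_buf *buf, uint64_t addr)
{
	buf->mapping[0] = addr;
}

uint64_t ud_get_mapping(const struct rx_buf *buf)
{
	return buf->mapping[0];
}
```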


From: Shirley Ma
Date: Wednesday, January 30, 2008 - 10:44 pm

Thanks Eli,

	I found it last night too. I made a mistake: my local tree was somehow
not clean. I have been working too much in the past few days on
OFED-1.3 validation, and when you are tired it's easy to make mistakes.
Sorry about that. I have already sent out an email about checking today's
patch.

	Somehow I didn't receive your email in time. It looks like some of my
emails got a warning saying that the email needs to be approved since it
matches spam content. Do you have any idea why it's blocked?

Thanks
Shirley

From: Eli Cohen
Date: Thursday, January 31, 2008 - 8:51 am

Maybe the list administrator can help with this issue.

From: Jeff Becker
Date: Thursday, January 31, 2008 - 11:49 am

Hi.

Every once in a while, perfectly normal mail trips the spam filter. If
this is really a problem, I can look into it. However, training the
filter is kind of a black art so I'm not sure how successful I'll be.


From: Shirley Ma
Date: Thursday, January 31, 2008 - 1:31 pm

Hello Roland,

I have finally cleaned up my local git tree and compiled this patchset
against the 2.6.25 kernel. The patchset has been split into three patches;
the patches can be built separately, in sequence, so it's easy to test.

1/3. Make IPoIB-CM RX S/G APIs generic
2/3. Set IPoIB-UD RX S/G parameters
3/3. Enable IPoIB-UD RX S/G

Please review these patches as soon as possible so we can meet 
OFED-1.3-RC4 schedule.

Appreciate your help on time.

The current IPoIB-UD implementation limits the IPoIB payload size to
2048 by hard-coding IPOIB_PACKET_SIZE. The implementation is
designed for a kernel PAGE_SIZE equal to or greater than 4K. If the 
kernel PAGE_SIZE is 2K, memory buffer allocation will fail 
when large memory buffers are scarce. However, most distros 
support PAGE_SIZE >= 4K, so this implementation has no problem 
with a 2048 payload. This implementation is simple, but it prevents HCA 
devices that support a 4096 payload, like the IBM eHCA2, from making use of it.

This patch allows an IPoIB-UD MTU of up to 4092 (4K - IPOIB_ENCAP_LEN) when
the HCA supports 4K MTU. In this patch, the APIs for S/G buffer allocation
in IPoIB-CM mode have been made generic so IPoIB-UD and IPoIB-CM can share
the S/G code. When PAGE_SIZE is equal to or greater than IPOIB_UD_BUF_SIZE
plus the padding bytes needed to align the IP header, only one buffer is
needed for 4K MTU buffer allocation; otherwise, two buffers are allocated
using S/G.
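
The buffer-count rule above can be sketched in userspace C as follows.
This is an illustration under assumptions, not the patch itself: the
buffer size is assumed to be the IB MTU plus the 40-byte GRH, the
encapsulation header is taken as 4 bytes (matching 4096 - 4092), and the
alignment padding is left as a parameter because the exact value is not
stated here. All names are hypothetical.

```c
#include <stdbool.h>

#define GRH_BYTES 40u	/* InfiniBand Global Route Header length */
#define ENCAP_LEN  4u	/* assumed value of IPOIB_ENCAP_LEN */

/* Largest IPoIB link MTU for a given IB MTU (4096 -> 4092). */
unsigned int ud_max_ip_mtu(unsigned int ib_mtu)
{
	return ib_mtu - ENCAP_LEN;
}

/* Assumed receive buffer size: payload plus the GRH placed in front. */
unsigned int ud_buf_size(unsigned int ib_mtu)
{
	return ib_mtu + GRH_BYTES;
}

/* One buffer suffices when buffer size plus padding fits in a page;
 * otherwise two buffers (scatter/gather) are needed. */
bool ud_need_sg(unsigned int ib_mtu, unsigned int page_size,
		unsigned int align_pad)
{
	return ud_buf_size(ib_mtu) + align_pad > page_size;
}
```

Under these assumptions a 4K IB MTU needs S/G on a 4K-page kernel but not
on a 64K-page kernel, and a 2K IB MTU never needs it with 4K pages.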

The node IPoIB link MTU size is the minimum of the admin-configurable
MTU set through ifconfig and the IPoIB default broadcast group MTU size.
When the Subnet Manager enables the default broadcast group during start
up, this subnet's IPoIB link MTU will be the value of the default broadcast
group MTU size. A node whose IB MTU is smaller than this value can't join
this IPoIB subnet. A node whose IB MTU is greater than this value
will join this IPoIB subnet, and this value will be set as its IPoIB
link MTU. If the Subnet Manager disables the default broadcast group during
start up, ...
From: Roland Dreier
Date: Friday, February 1, 2008 - 4:35 pm

> The current IPoIB-UD implementation limits the IPoIB payload size to
 > 2048 by hard-coding IPOIB_PACKET_SIZE. The implementation is
 > designed for a kernel PAGE_SIZE equal to or greater than 4K. If the kernel
 > PAGE_SIZE is 2K, memory buffer allocation will fail when
 > large memory buffers are scarce. However, most distros support
 > PAGE_SIZE >= 4K, so this implementation has no problem with a 2048 payload.
 > This implementation is simple, but it prevents HCA devices that
 > support a 4096 payload, like the IBM eHCA2, from making use of it.

Not sure I understand this.  Is there any possible configuration of
any architecture where Linux runs where PAGE_SIZE < 4096?

 > This patch allows an IPoIB-UD MTU of up to 4092 (4K - IPOIB_ENCAP_LEN) when
 > the HCA supports 4K MTU. In this patch, the APIs for S/G buffer allocation
 > in IPoIB-CM mode have been made generic so IPoIB-UD and IPoIB-CM can share
 > the S/G code.

This approach seems overly complex to me, since it ends up going
through all the CM buffer fragment bookkeeping for the simple UD path.

However, I now realize that my earlier idea of allocating a scratch
buffer for the GRH and just allocating a 4096 byte skb doesn't work,
because the skb_shinfo ends up being allocated along with the buffer,
so trying to allocate a 4096-byte skb will bloat the data past a
single page, which is what we're trying to avoid.

So how about the following?  When using a UD MTU of 4096 with a page
size of 4096, allocate an skb of size 44 for the GRH and ethertype,
and then allocate a single page for the fragment list.  This means
that the IP packet will start nicely 16-byte aligned for free, and all
the bookkeeping is very simple.

 - R.
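
A minimal userspace sketch of the layout proposed above: the first 44
bytes of a received UD message (40-byte GRH plus the 4-byte IPoIB
header carrying the ethertype) go into a small linear head, and the rest
lands at the start of a single page, so the IP packet begins on a page
boundary. The struct and function names are hypothetical.

```c
#include <stddef.h>

#define GRH_BYTES 40
#define ENCAP_LEN  4
#define HEAD_LEN  (GRH_BYTES + ENCAP_LEN)	/* 44 bytes */

struct rx_split {
	size_t head_len;	/* bytes placed in the small linear buffer */
	size_t frag_len;	/* bytes placed in the page fragment */
};

/* Split a received message of wire_len bytes (GRH included) between
 * the 44-byte head and the page that follows it. */
struct rx_split split_rx(size_t wire_len)
{
	struct rx_split s;

	s.head_len = wire_len < HEAD_LEN ? wire_len : HEAD_LEN;
	s.frag_len = wire_len - s.head_len;
	return s;
}
```

For a full 4096-byte UD payload plus GRH (4136 bytes), the fragment holds
4092 bytes, which fits in one 4096-byte page.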

From: Shirley Ma
Date: Friday, February 1, 2008 - 5:36 pm

Hello, Roland,

Thanks for your quick review.

> Not sure I understand this.  Is there any possible configuration of
> any architecture where Linux runs where PAGE_SIZE < 4096?

Technically it's a problem, but practically it's not, since there is no
architecture I can think of that has PAGE_SIZE < 4096.

>  > This patch allows IPoIB-UD MTU up to 4092 (4K - IPOIB_ENCAP_LEN) when
>  > HCA can support 4K MTU. In this patch, APIs for S/G buffer allocation in
>  > IPoIB-CM mode has been made generic so IPoIB-UD and IPoIB-CM can share
>  > the S/G code.
>
> This approach seems overly complex to me, since it ends up going
> through all the CM buffer fragment bookkeeping for the simple UD path.

No, it's not complex: only one buffer is allocated if the page size is
big enough and if it's 2K MTU.

> So how about the following?  When using a UD MTU of 4096 with a page
> size of 4096, allocate an skb of size 44 for the GRH and ethertype,
> and then allocate a single page for the fragment list.  This means
> that the IP packet will start nicely 16-byte aligned for free, and all
> the bookkeeping is very simple.

It has a 44-byte head with another 4K page, without an if-condition check
of MTU size and page size. Please look at the patches for details.

thanks
Shirley

From: Shirley Ma
Date: Saturday, February 2, 2008 - 9:16 am

Hello Roland,

> So how about the following?  When using a UD MTU of 4096 with a page
> size of 4096, allocate an skb of size 44 for the GRH and ethertype,
> and then allocate a single page for the fragment list.  This means
> that the IP packet will start nicely 16-byte aligned for free, and all
> the bookkeeping is very simple.

My patch has passed the stress test on both PPC and Intel
architectures against the OFED-1.3-RC2 bits for a couple of days, and I
didn't see a performance impact for 2K MTU. But I rethought your suggestion
last night. I can modify my patch to match your thinking by
keeping the current implementation for 2K MTU and using an if-condition
check for the new code. I will submit a new version of the patchset today
for review. Since I only have two days for my patch to be integrated into
OFED-1.3-RC3 for distros to pick up, I would like to see your ack for this
approach as soon as possible. I will compare the performance of the two
implementations.

Thanks for your input. I appreciate your prompt response.

Thanks
Shirley

From: Roland Dreier
Date: Monday, February 4, 2008 - 9:43 pm

> My patch has passed the stress test on both PPC and Intel
 > architectures against the OFED-1.3-RC2 bits for a couple of days, and I
 > didn't see a performance impact for 2K MTU. But I rethought your suggestion
 > last night. I can modify my patch to match your thinking by
 > keeping the current implementation for 2K MTU and using an if-condition
 > check for the new code. I will submit a new version of the patchset today
 > for review. Since I only have two days for my patch to be integrated into
 > OFED-1.3-RC3 for distros to pick up, I would like to see your ack for this
 > approach as soon as possible. I will compare the performance of the two
 > implementations.

Sorry, I've kind of lost the plot here with so many versions of the
patches flying around.  In any case this is not something I am going
to pick up for 2.6.25.  I don't have any control over OFED or distros,
although I would probably hold off on adding a feature at this late
stage of the release process; but the OFED maintainers don't seem to
be as conservative as I am.

 - R.
From: Shirley Ma
Date: Monday, February 4, 2008 - 10:21 pm

Thanks Roland. We can review this patch for upstream later. Eli has
reviewed it. This patch is going into OFED-1.3. I am testing the current
OFED-1.3 Git tree + this patch now. Everything seems to have worked well
for a few hours. I will let the test run overnight and check for any issues
tomorrow.

Thanks
Shirley

From: Or Gerlitz
Date: Tuesday, February 5, 2008 - 4:00 am

Same goes for me on both points: I was totally lost among all the 
posts you have made, which prevents me from reviewing the patches. 
Also, integrating into ofed patches which were --not reviewed-- (nor 
accepted) for upstream inclusion is totally unclear to me. To remove any 
doubt: this policy has been present in ofed from day one, so it's not 
specific to your patches.

Or.

From: Shirley Ma
Date: Saturday, February 2, 2008 - 5:26 am

Hello Roland,

Please review below approach as early as you can. Thanks

This patch is based on Eli's and Roland's input. The idea is to keep the
current IPoIB-UD 2K MTU implementation and allow an IPoIB-UD link MTU of up
to 4092 (4K - IPOIB_ENCAP_LEN) when HCAs support 4K MTU. For IPoIB-UD 4K
MTU, if PAGE_SIZE is greater than IB MTU + GRH HEAD + 4, then no S/G
is needed and the IPoIB-UD 2K MTU implementation is used; if PAGE_SIZE is
smaller, then two buffers need to be used. One of the IPoIB-CM RX S/G APIs
has been made more generic, so it can be reused.

This patchset includes three patches:
1. Make one IPoIB-CM RX S/G API generic.
2. Set up IPoIB-UD RX S/G ready.
3. Enable IPoIB-UD RX S/G when needed.

Shirley

From: Shirley Ma
Date: Saturday, February 2, 2008 - 11:35 am

Hello Roland,

This patchset is based on your previous review comments. It uses the current
IPoIB-UD 2K MTU implementation when 4K MTU + GRH head + 4 is less than
PAGE_SIZE; if it's greater, it allocates two buffers: one for the GRH +
IPoIB head, one for data.

Please compare this approach with the V2 patchset and provide feedback
as soon as you can, so I can concentrate on testing and backport the
one we agree on to OFED-1.3 RC3.

Thanks
Shirley 


