[ofa-general] [PATCH] IB/mthca: Clear ICM pages before handing to FW

Previous thread: [ofa-general] ***SPAM*** [PATCH 3/3 v2] ib/ipoib: blocking mcast loopback ipoib packets by Ron Livne on Sunday, June 22, 2008 - 9:37 am. (1 message)

Next thread: [ofa-general] Luxury by Melvin Finley on Sunday, June 22, 2008 - 9:25 am. (1 message)
From: Eli Cohen
Date: Sunday, June 22, 2008 - 8:55 am

>From 086179f2ab6950b339dbb4e66f749a7c87e9a5be Mon Sep 17 00:00:00 2001
From: Eli Cohen <eli@mellanox.co.il>
Date: Fri, 20 Jun 2008 15:36:44 +0300
Subject: [PATCH] IB/mthca: Clear ICM pages before handing to FW

Current memfree FW has a bug which in some cases, assumes that ICM
pages passed to it are cleared -- this patch will clear any ICM page
passed to the FW. It is essential to use this patch since there are
many cards already in use which use FW with this bug. At some time
when we're sure all systems are updated with the new FW (yet to be
released), we can remove this patch.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
---

This fixes the bug reported by Arthur from SGI here:
http://lists.openfabrics.org/pipermail/general/2008-May/050026.html


 drivers/infiniband/hw/mthca/mthca_memfree.c |   29 +++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
index 9e77ba9..9596c9b 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -103,6 +103,33 @@ void mthca_free_icm(struct mthca_dev *dev, struct mthca_icm *icm, int coherent)
 	kfree(icm);
 }
 
+static void clear_pages(struct page *page, int npages)
+{
+	struct page **page_arr;
+	int i;
+	void *buf;
+
+	page_arr = kmalloc(npages * sizeof *page_arr, GFP_KERNEL);
+	if (!page_arr) {
+		printk(KERN_WARNING "page array alloc failure\n");
+		return;
+	}
+
+	for (i = 0; i < npages; ++i)
+		page_arr[i] = page++;
+
+	buf = vmap(page_arr, npages, VM_MAP, PAGE_KERNEL);
+	if (!buf) {
+		printk(KERN_WARNING "vmap failed\n");
+		goto exit;
+	}
+	memset(buf, 0, PAGE_SIZE * npages);
+	vunmap(buf);
+
+exit:
+	kfree(page_arr);
+}
+
 static int mthca_alloc_icm_pages(struct scatterlist *mem, int order, gfp_t gfp_mask)
 {
 	struct page *page;
@@ -111,6 +138,8 @@ static int mthca_alloc_icm_pages(struct scatterlist *mem, int order, gfp_t ...
From: Roland Dreier
Date: Sunday, June 22, 2008 - 9:11 am

> Current memfree FW has a bug which in some cases, assumes that ICM
 > pages passed to it are cleared -- this patch will clear any ICM page
 > passed to the FW. It is essential to use this patch since there are
 > many cards already in use which use FW with this bug. At some time
 > when we're sure all systems are updated with the new FW (yet to be
 > released), we can remove this patch.

realistically I think we leave this workaround forever.  it's not like
it costs very much.

Does mlx4 have any similar problem?

 > ---

 > This fixes the bug reported by Arthur from SGI here:
 > http://lists.openfabrics.org/pipermail/general/2008-May/050026.html

This should be in the changelog itself... there's no reason to throw
away this useful information when merging the patch.


Anyway, thanks for debugging this... one question, why not just use the
following (completely untested) change instead?  Does this not work?

At least using clear_highpage() instead of vmap() by hand seems much
simpler too if __GFP_ZERO isn't usable for some reason.

diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
index 9e77ba9..4fb3c85 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -107,7 +107,7 @@ static int mthca_alloc_icm_pages(struct scatterlist *mem, int order, gfp_t gfp_m
 {
 	struct page *page;
 
-	page = alloc_pages(gfp_mask, order);
+	page = alloc_pages(gfp_mask | __GFP_ZERO, order);
 	if (!page)
 		return -ENOMEM;
 
_______________________________________________
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
From: Eli Cohen
Date: Sunday, June 22, 2008 - 11:14 am

I was thinking about applications which create and destroy HCA

Looks to me like using __GFP_ZERO is the cleanest approach. I like
less clear_highpage()  since it uses kmap_atomic() which could fail.
_______________________________________________
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
From: Roland Dreier
Date: Sunday, June 22, 2008 - 12:57 pm

> I was thinking about applications which create and destroy HCA
 > resources at a high rate which might be affected.

I would expect all the other costs of mapping pages into/out of the ICM
to be large enough that the cost of zeroing a page is not too big a
deal.  But if there is anyone who cares we can always make this
__GFP_ZERO flag conditional on FW version I guess.

 > Looks to me like using __GFP_ZERO is the cleanest approach. I like
 > less clear_highpage()  since it uses kmap_atomic() which could fail.

To the best of my knowledge, kmap_atomic() cannot fail -- and if you
think about it, what could make it fail?  The whole point of
kmap_atomic() is that there is a per-cpu pre-reserved slot to map the
memory at for highmem pages, so it has to always work.  As far as I can
see, __GFP_ZERO allocations use clear_highpage() internally anyway, so
it ends up being the same thing.

Since just adding __GFP_ZERO is so much simpler, I'll just commit the
patch below and send it to Linus in a day or two if it seems OK to you:

commit 801d1ad7b6bdb3418a462c6b4950aee56dbac940
Author: Eli Cohen <eli@mellanox.co.il>
Date:   Sun Jun 22 12:56:58 2008 -0700

    IB/mthca: Clear ICM pages before handing to FW
    
    Current memfree FW has a bug which in some cases, assumes that ICM
    pages passed to it are cleared.  This patch uses __GFP_ZERO to
    allocate all ICM pages passed to the FW.  Once firmware with a fix is
    released, we can make the workaround conditional on firmware version.
    
    This fixes the bug reported by Arthur Kepner <akepner@sgi.com> here:
    http://lists.openfabrics.org/pipermail/general/2008-May/050026.html
    
    Signed-off-by: Eli Cohen <eli@mellanox.co.il>
    
    [ Rewritten to be a one-liner using __GFP_ZERO instead of vmap()ing
      ICM memory and memset()ing it to 0. - Roland ]
    
    Signed-off-by: Roland Dreier <rolandd@cisco.com>

diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c ...
From: Eli Cohen
Date: Sunday, June 22, 2008 - 11:35 pm

Looking at the implementation of kmap_atomic() quoted bellow:

void *page_address(struct page *page)
{
        unsigned long flags;
        void *ret;
        struct page_address_slot *pas;

        if (!PageHighMem(page))
                return lowmem_page_address(page);

        pas = page_slot(page);
        ret = NULL;
        spin_lock_irqsave(&pas->lock, flags);
        if (!list_empty(&pas->lh)) {
                struct page_address_map *pam;

                list_for_each_entry(pam, &pas->lh, list) {
                        if (pam->page == page) {
                                ret = pam->virtual;
                                goto done;
                        }
                }
        }
done:
        spin_unlock_irqrestore(&pas->lock, flags);
        return ret;
}

static inline void *kmap_atomic(struct page *page, enum km_type idx)
{
        pagefault_disable();
        return page_address(page);
}

I can't see page_address doing anything actively to map the page. It
just searches for the page and returns it's mapping. But I'm probably

Of course. Go ahead and send it.
Thanks.
_______________________________________________
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
From: Pete Wyckoff
Date: Monday, June 23, 2008 - 7:27 am

You're looking at the no-highmem version.  Check the one in
arch/x86/mm/highmem_32.c for a more interesting example.

		-- Pete
_______________________________________________
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Previous thread: [ofa-general] ***SPAM*** [PATCH 3/3 v2] ib/ipoib: blocking mcast loopback ipoib packets by Ron Livne on Sunday, June 22, 2008 - 9:37 am. (1 message)

Next thread: [ofa-general] Luxury by Melvin Finley on Sunday, June 22, 2008 - 9:25 am. (1 message)