Re: [PATCH] IB/ehca: Make sure user pages are from hugetlb before using MR large pages

Previous thread: Problem: screen partially garbled in i830m (regression) by Michael Haas on Wednesday, September 12, 2007 - 8:03 am. (4 messages)

Next thread: RE: boot hangs with SMP, USB storage, and ACPI by tx tox on Wednesday, September 12, 2007 - 9:00 am. (3 messages)
To: LinuxPPC-Dev <linuxppc-dev@...>, LKML <linux-kernel@...>, OF-General <general@...>, Roland Dreier <rolandd@...>, OF-EWG <ewg@...>
Cc: Hoang-Nam Nguyen <hnguyen@...>, Christoph Raisch <raisch@...>, Stefan Roscher <stefan.roscher@...>
Date: Wednesday, September 12, 2007 - 8:39 am

From: Hoang-Nam Nguyen <hnguyen@de.ibm.com>

...because, on virtualized hardware like System p, we can't be sure that the
physical pages behind them are contiguous.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
---

Another patch for 2.6.24 that will apply cleanly on top of my previous
patchset. Please review and apply. Thanks!

drivers/infiniband/hw/ehca/ehca_classes.h | 8 ++--
drivers/infiniband/hw/ehca/ehca_mrmw.c | 82 +++++++++++++++++++++++++----
2 files changed, 75 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h
index 206d4eb..c2edd4c 100644
--- a/drivers/infiniband/hw/ehca/ehca_classes.h
+++ b/drivers/infiniband/hw/ehca/ehca_classes.h
@@ -99,10 +99,10 @@ struct ehca_sport {
struct ehca_sma_attr saved_attr;
};

-#define HCA_CAP_MR_PGSIZE_4K 1
-#define HCA_CAP_MR_PGSIZE_64K 2
-#define HCA_CAP_MR_PGSIZE_1M 4
-#define HCA_CAP_MR_PGSIZE_16M 8
+#define HCA_CAP_MR_PGSIZE_4K 0x80000000
+#define HCA_CAP_MR_PGSIZE_64K 0x40000000
+#define HCA_CAP_MR_PGSIZE_1M 0x20000000
+#define HCA_CAP_MR_PGSIZE_16M 0x10000000

struct ehca_shca {
struct ib_device ib_device;
diff --git a/drivers/infiniband/hw/ehca/ehca_mrmw.c b/drivers/infiniband/hw/ehca/ehca_mrmw.c
index 4c8f3b3..1bb9d23 100644
--- a/drivers/infiniband/hw/ehca/ehca_mrmw.c
+++ b/drivers/infiniband/hw/ehca/ehca_mrmw.c
@@ -41,6 +41,8 @@
*/

#include <asm/current.h>
+#include <linux/mm.h>
+#include <linux/hugetlb.h>

#include <rdma/ib_umem.h>

@@ -51,6 +53,7 @@

#define NUM_CHUNKS(length, chunk_size) \
(((length) + (chunk_size - 1)) / (chunk_size))
+
/* max number of rpages (per hcall register_rpages) */
#define MAX_RPAGES 512

@@ -279,6 +282,52 @@ reg_phys_mr_exit0:
} /* end ehca_reg_phys_mr() */

/*----------------------------------------------------------------------*/
+static int ehca_is_mem_hugetlb(unsigned long addr, unsigned long size)
+{
+ s...

To: Joachim Fenkes <fenkes@...>
Cc: LinuxPPC-Dev <linuxppc-dev@...>, LKML <linux-kernel@...>, OF-General <general@...>, Roland Dreier <rolandd@...>, OF-EWG <ewg@...>, Hoang-Nam Nguyen <hnguyen@...>, Christoph Raisch <raisch@...>, Stefan Roscher <stefan.roscher@...>
Date: Thursday, September 13, 2007 - 12:33 am

> -#define HCA_CAP_MR_PGSIZE_4K 1
> -#define HCA_CAP_MR_PGSIZE_64K 2
> -#define HCA_CAP_MR_PGSIZE_1M 4
> -#define HCA_CAP_MR_PGSIZE_16M 8
> +#define HCA_CAP_MR_PGSIZE_4K 0x80000000
> +#define HCA_CAP_MR_PGSIZE_64K 0x40000000
> +#define HCA_CAP_MR_PGSIZE_1M 0x20000000
> +#define HCA_CAP_MR_PGSIZE_16M 0x10000000

Not sure I understand what this has to do with things... is this an
unrelated fix?

> +static int ehca_is_mem_hugetlb(unsigned long addr, unsigned long size)

This is rather awful -- another call to get_user_pages() to iterate
over all the vmas...

I would suggest extending ib_umem_get() to check the vmas and adding a
member to struct ib_umem to say whether the memory is entirely covered
by hugetlb pages or not.

> + ret = ehca_is_mem_hugetlb(virt, length);
> + switch (ret) {
> + case 0: /* mem is not from hugetlb */
> + hwpage_size = PAGE_SIZE;
> + break;
> + case 1:
> + if (length <= EHCA_MR_PGSIZE4K
> + && PAGE_SIZE == EHCA_MR_PGSIZE4K)
> + hwpage_size = EHCA_MR_PGSIZE4K;
> + else if (length <= EHCA_MR_PGSIZE64K)
> + hwpage_size = EHCA_MR_PGSIZE64K;
> + else if (length <= EHCA_MR_PGSIZE1M)
> + hwpage_size = EHCA_MR_PGSIZE1M;
> + else
> + hwpage_size = EHCA_MR_PGSIZE16M;
> + break;
> + default: /* out of mem */
> + ib_mr = ERR_PTR(-ENOMEM);
> + goto reg_user_mr_exit1;

It seems like it would be better to just assume the memory is not from
a hugetlb is ehca_is_mem_hugetlb() fails its memory allocation and
fall back to the PAGE_SIZE case rather than failing entirely.

Also if someone runs a kernel with 64K pages on a machine where they
end up being simulated from 4K pages, do you have the same issue with
the hypervisor ganging together non-contiguous pages?

- R.
-

To: Roland Dreier <rdreier@...>
Cc: Christoph Raisch <raisch@...>, OF-EWG <ewg@...>, OF-General <general@...>, Hoang-Nam Nguyen <hnguyen@...>, LKML <linux-kernel@...>, LinuxPPC-Dev <linuxppc-dev@...>, Roland Dreier <rolandd@...>, Stefan Roscher <stefan.roscher@...>
Date: Thursday, September 13, 2007 - 10:27 am

If ehca_is_mem_hugetlb() runs out of memory, ehca_reg_mr() is rather
unlikely to get the memory, but it's worth a try, I'll give you that. I'll
make the umem patch work that way.

Joachim
-

To: Roland Dreier <rdreier@...>
Cc: OF-EWG <ewg@...>, OF-General <general@...>, Hoang-Nam Nguyen <HNGUYEN@...>, Joachim Fenkes <FENKES@...>, LKML <linux-kernel@...>, LinuxPPC-Dev <linuxppc-dev@...>, Roland Dreier <rolandd@...>, Stefan Roscher <stefan.roscher@...>
Date: Thursday, September 13, 2007 - 5:49 am

With todays hypervisor and todays pagesizes and todays MMUs
we don't have this problem if eHCA is enabled.

It is difficult to make predictions about the future,

- Christoph R.

-

Previous thread: Problem: screen partially garbled in i830m (regression) by Michael Haas on Wednesday, September 12, 2007 - 8:03 am. (4 messages)

Next thread: RE: boot hangs with SMP, USB storage, and ACPI by tx tox on Wednesday, September 12, 2007 - 9:00 am. (3 messages)