Re: more iommu sg merging fallout

Previous thread: Access MSR functions in userspace? by Hasan Rashid on Tuesday, February 5, 2008 - 8:29 pm. (3 messages)

Next thread: RE: [git patches] net driver updates #2 by Pravin Nanaware on Tuesday, February 5, 2008 - 9:47 pm. (1 message)
From: David Miller
Date: Tuesday, February 5, 2008 - 9:41 pm

The sparc64 change:

    commit fde6a3c82d67f592eb587be4d12222b0ae6d4321
    Author: FUJITA Tomonori <tomof@acm.org>
    Date:   Mon Feb 4 22:28:02 2008 -0800

    iommu sg merging: sparc64: make iommu respect the segment size limits
    
    This patch makes iommu respect segment size limits when merging sg
    lists.
    
    Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
    Cc: Jeff Garzik <jeff@garzik.org>
    Cc: James Bottomley <James.Bottomley@steeleye.com>
    Acked-by: Jens Axboe <jens.axboe@oracle.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

has significant errors and is going to eat people's disks, as it just
nearly did to mine.

Typically what you'll see are NULL pointer derefernces in
dma_4v_map_sg() and dma_4u_map_sg() and then the kernel usually craps
on your superblock very shortly thereafter.

The changeset above modified only prepare_sg() but that is only the
first pass of the SG mapping algorithm of the sparc64 IOMMU layer.

The second pass that fills in the entries depends upon how the first
pass does things.  So if you change the first pass decision making you
have to update the second pass's as well.

That second pass is implemented in fill_sg() (there is a version in
both arch/sparc64/kernel/iommu.c and arch/sparc64/kernel/pci_sun4v.c),
which probably needs new logic as was added to prepare_sg() to handle
dma_get_max_seg_size().
--

From: FUJITA Tomonori
Date: Wednesday, February 6, 2008 - 4:12 pm

On Tue, 05 Feb 2008 20:41:38 -0800 (PST)


PARISC, Alpha, and IA64 IOMMUs use the two-pass algorithm like SPARC
but their first pass decides how to merge sg entires (and stores that
information in the sg entries), then the second pass simpliy follows
it (Hopefully I understand these IOMMUs correctly, or else I break
them too).

I'll try this work again for 2.6.26. The SPARC IOMMUs are the most
complicated and seems that few people tests the -mm SPARC code. So I
guess that I need to find a SPARC box...

Thanks,
--

From: David Miller
Date: Wednesday, February 6, 2008 - 4:18 pm

From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>


For now I've removed all of the merging code from the sparc64 IOMMU
support so that other users do not get corrupt filesystems.  It
basically mimicks how the intel-iommu code works, ie. no attempts to
merge anything.

I intend to put merging back in, perhaps something similar to
powerpc's merging logic but without the expensive (in my opinion)
IOMMU allocation every loop.  I think it is better to determine the
segment breaks in one pass, allocate that many IOMMU entries in one
allocation, then fill them all in.

Ideally, we should have some generic code that does all of this.
Then you would only need to test one implementation.

It is definitely doable and increasingly necessary as we have so
many reimplementations of what is essentially identical core code.

--

From: FUJITA Tomonori
Date: Wednesday, February 6, 2008 - 4:53 pm

On Wed, 06 Feb 2008 15:18:55 -0800 (PST)



I thought about asking you if I can modify the SPARC IOMMUs to do
allocation in every loop.

The reason why I need the allocation in every loop is that I also need
to fix the problem that IOMMUs allocate memory areas without
considering a low level driver's segment boundary limits.

http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-11/msg07616.html
http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-12/msg02286.html

As far as I know, all the IOMMUs except for SPARC allocate a free area
in every loop but if it's too expensive for SPARC, then we need to

Agreed though it's a very hard task.
--

From: David Miller
Date: Wednesday, February 6, 2008 - 5:01 pm

From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

Ok then what I'll do is adopt some version of powerpc's merging
allocator into the sparc64 code.
--

From: FUJITA Tomonori
Date: Wednesday, February 6, 2008 - 6:38 pm

On Wed, 06 Feb 2008 16:01:47 -0800 (PST)

Great, thanks! SPARC IOMMUs use bitmap for the free area management
like POWER IOMMUs so it could use lib/iommu-helper as POWER does.
--

From: David Miller
Date: Monday, February 11, 2008 - 10:40 pm

From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

Please look at Linus's current tree, I believe I have things
in a working state, and the DMA mask portions should be easier
to add now.
--

From: FUJITA Tomonori
Date: Friday, February 15, 2008 - 11:03 pm

Sorry for the late response,

On Mon, 11 Feb 2008 21:40:36 -0800 (PST)

Thanks! It looks great.

iommu->page_table_map_base is on IO_PAGE_SIZE boundary? If so, the
following patch works, I think (sorry, it's only compile tested
again).

Thanks,

=
From 91af83802c30d071f98444846e85310a8d58ab3d Mon Sep 17 00:00:00 2001
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Sat, 16 Feb 2008 14:29:02 +0900
Subject: [PATCH] sparc64: make IOMMU code respect the segment boundary limits

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 arch/sparc64/kernel/iommu.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/sparc64/kernel/iommu.c b/arch/sparc64/kernel/iommu.c
index d3276eb..ffefbf7 100644
--- a/arch/sparc64/kernel/iommu.c
+++ b/arch/sparc64/kernel/iommu.c
@@ -134,7 +134,8 @@ unsigned long iommu_range_alloc(struct device *dev,
 	else
 		boundary_size = ALIGN(1UL << 32, 1 << IO_PAGE_SHIFT);
 
-	n = iommu_area_alloc(arena->map, limit, start, npages, 0,
+	n = iommu_area_alloc(arena->map, limit, start, npages,
+			     iommu->page_table_map_base >> IO_PAGE_SHIFT,
 			     boundary_size >> IO_PAGE_SHIFT, 0);
 	if (n == -1) {
 		if (likely(pass < 1)) {
-- 
1.5.3.4


--

From: David Miller
Date: Monday, February 18, 2008 - 12:41 am

From: FUJITA Tomonori <tomof@acm.org>

Looks good, but I think it will break sound for some ALI chips.

Please see arch/sparc64/kernel/pci.c:ali_sound_dma_hack()
and it's caller pci_dma_supported().

--

From: FUJITA Tomonori
Date: Tuesday, February 19, 2008 - 3:57 am

On Sun, 17 Feb 2008 23:41:42 -0800 (PST)

Could you explain the problem a little more?

The shift argument is only used as an offset when iommu-helper decides
whether a memory area (index plus npages) spanning LLD's segment
boudnary size or not.

For example, if a device's segment boudary size is 64K, the helper see
the following value is larger than 64K or not:

((the offset + index of the IOMMU table) ((64K / 8K) - 1) + npages
--

From: David Miller
Date: Wednesday, February 20, 2008 - 11:56 pm

From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

Sorry, I was thinking of a different problem (for when I add
dma mask support to this code).

I will apply your patch, thanks!
--

Previous thread: Access MSR functions in userspace? by Hasan Rashid on Tuesday, February 5, 2008 - 8:29 pm. (3 messages)

Next thread: RE: [git patches] net driver updates #2 by Pravin Nanaware on Tuesday, February 5, 2008 - 9:47 pm. (1 message)