Note that this isn't for the next merge window. It seems to work, but I
need to do more testing and cleanup (and need to fix the ia64 code).

=
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Subject: [PATCH] swiotlb: enlarge iotlb buffer on demand

This enables swiotlb to enlarge the iotlb (bounce) buffer on demand.

On x86_64, swiotlb is enabled only when more than 4GB memory is
available. swiotlb uses 64MB memory by default. 64MB is not so
precious in this case, I suppose.

The problem is that it's likely that x86_64 always needs to enable
swiotlb due to hotplug memory support. 64MB could be very precious.

swiotlb iotlb buffer is physically contiguous (64MB by default). With
this patch, iotlb buffer doesn't need to be physically contiguous. So
swiotlb can allocate iotlb buffer on demand. Currently, swiotlb
allocates 256KB at a time.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 lib/swiotlb.c |  186 ++++++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 138 insertions(+), 48 deletions(-)

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index a009055..e2c64ab 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -65,11 +65,14 @@ int swiotlb_force;
  * sync_single_*, to see if the memory was in fact allocated by this
  * API.
  */
-static char *io_tlb_start, *io_tlb_end;
+static char **__io_tlb_start;
+
+static int alloc_io_tlb_chunks;
 
 /*
- * The number of IO TLB blocks (in groups of 64) betweeen io_tlb_start and
- * io_tlb_end. This is command line adjustable via setup_io_tlb_npages.
+ * The number of IO TLB blocks (in groups of 64) betweeen
+ * io_tlb_start. This is command line adjustable via
+ * setup_io_tlb_npages.
  */
 static unsigned long io_tlb_nslabs;
 
@@ -130,11 +133,11 @@ void swiotlb_print_info(void)
 	unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
 	phys_addr_t pstart, pend;
 
-	pstart = virt_to_phys(io_tlb_start);
-	pend = virt_to_phys(io_tlb_end);
+	pstart = virt_to_phys(__io_tlb_start[0]);
+	pend ...
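For readers who want the idea in isolation, the following is a minimal userspace sketch of a chunk table that grows on demand, not the patch itself. It assumes 2KB slabs (IO_TLB_SHIFT of 11) and 128 slabs per chunk, which gives the 256KB step mentioned above. The struct iotlb_table and the iotlb_* function names are hypothetical, malloc() stands in for the kernel page allocator, and locking is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define IO_TLB_SHIFT   11	/* assumed: 2KB slabs */
#define IO_TLB_SEGSIZE 128	/* assumed: slabs per chunk */
#define CHUNK_BYTES    ((size_t)IO_TLB_SEGSIZE << IO_TLB_SHIFT)	/* 256KB */

/* Hypothetical container; the patch uses file-scope statics instead. */
struct iotlb_table {
	char **chunks;	/* plays the role of __io_tlb_start */
	int nr_alloc;	/* plays the role of alloc_io_tlb_chunks */
	int nr_max;	/* io_tlb_nslabs / IO_TLB_SEGSIZE */
};

/* Allocate only the pointer table. Returns 0 on success, -1 on failure. */
int iotlb_table_init(struct iotlb_table *t, unsigned long nslabs)
{
	t->nr_max = (int)(nslabs / IO_TLB_SEGSIZE);
	t->nr_alloc = 0;
	t->chunks = calloc((size_t)t->nr_max, sizeof(char *));
	return t->chunks ? 0 : -1;
}

/* Add one more chunk. Returns 0 on success, -1 if full or out of memory. */
int iotlb_expand(struct iotlb_table *t)
{
	char *chunk;

	if (t->nr_alloc >= t->nr_max)
		return -1;
	chunk = malloc(CHUNK_BYTES);
	if (!chunk)
		return -1;
	t->chunks[t->nr_alloc++] = chunk;
	return 0;
}

/* Membership test: the chunks are not contiguous, so walk each one. */
int iotlb_contains(const struct iotlb_table *t, const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	int i;

	for (i = 0; i < t->nr_alloc; i++) {
		uintptr_t start = (uintptr_t)t->chunks[i];

		if (a >= start && a - start < CHUNK_BYTES)
			return 1;
	}
	return 0;
}
```

The membership test turns from a single range compare into a walk over the allocated chunks, which is why the cost of that lookup on the unmap/sync paths matters.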
I was hoping you would base this on:
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb-2.6.git stable/swiotlb-0.8.3
This spinlock: would it be better to replace it with a r/w spinlock?
I am asking this b/c this routine 'is_swiotlb_buffer' ends up being
called during unmap/sync. The unmap part I think is not such a big deal
if it takes a bit of time, but the sync part.. Well, looking at the list
of DMA pages I see the e1000e/e100/igb allocate would it make sense to speed this
<tangent>
Back in the past we spoke about expanding the SWIOTLB to make it
possible for other SWIOTLB users to do their own virt->phys. With this
I can see this still working, if:
- We exported the __io_tlb_start, so that the other library can expand
if required.
- Ditto for the spinlock: io_tlb_lock
- And also the alloc_io_tlb_chunks
- And some way of running the SWIOTLB library after the expand_io_tlb
call - so that it can make the new chunk physically contiguous.
Perhaps it might be then time to revisit a registration mechanism?
This also might solve the problem that hpa has with the Xen-SWIOTLB
mucking around in pci-dma.c file.
The rough idea is to have a structure for the following routines:
- int (*detect)(void);
- void (*init)(void);
- int (*is_swiotlb)(dma_addr_t dev_addr, struct swiotlb_data *);
- int (*expand)(struct swiotlb_data *);
The 'detect' would be used in the 'pci_swiotlb_detect' as:
int __init pci_swiotlb_detect(void) {
return iotlb->detect();
}
and the 'init' similarly for the 'pci_swiotlb_init'.
The 'is_swiotlb' and 'expand' would do what they need to do.
That is, 'is_swiotlb' would determine if the bus address sits
within the IOTLB chunks. The 'expand' would do what 'expand_io_tlb'
does, but would use whatever mechanism is necessary to make sure the
new chunk is contiguous under the architecture it is running on.
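The four routines listed above could be grouped roughly as follows. This is only a userspace sketch of the idea: the fields of struct swiotlb_data, the default_* functions, swiotlb_register() and the made-up bus addresses are all hypothetical, dma_addr_t is a stand-in typedef, and __init is dropped because this is not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;	/* stand-in for the kernel type */

/* Hypothetical contents; the thread does not spell out the fields. */
struct swiotlb_data {
	dma_addr_t *chunk_bus;	/* bus address of each allocated chunk */
	int nr_chunks;
	int max_chunks;
	size_t chunk_bytes;
};

struct swiotlb_ops {
	int  (*detect)(void);
	void (*init)(void);
	int  (*is_swiotlb)(dma_addr_t dev_addr, struct swiotlb_data *);
	int  (*expand)(struct swiotlb_data *);
};

static int default_detect(void) { return 1; }
static void default_init(void) { }

/* True if dev_addr falls inside any chunk recorded in the data. */
static int default_is_swiotlb(dma_addr_t dev_addr, struct swiotlb_data *d)
{
	int i;

	for (i = 0; i < d->nr_chunks; i++)
		if (dev_addr >= d->chunk_bus[i] &&
		    dev_addr - d->chunk_bus[i] < d->chunk_bytes)
			return 1;
	return 0;
}

/* Record one more chunk; the bus address here is invented for the sketch. */
static int default_expand(struct swiotlb_data *d)
{
	if (d->nr_chunks >= d->max_chunks)
		return -1;
	d->chunk_bus[d->nr_chunks] =
		(dma_addr_t)0x100000 + (dma_addr_t)d->nr_chunks * d->chunk_bytes;
	d->nr_chunks++;
	return 0;
}

static const struct swiotlb_ops default_ops = {
	.detect     = default_detect,
	.init       = default_init,
	.is_swiotlb = default_is_swiotlb,
	.expand     = default_expand,
};

const struct swiotlb_ops *iotlb = &default_ops;

/* Hypothetical hook: another backend (e.g. Xen) swaps in its own ops. */
void swiotlb_register(const struct swiotlb_ops *ops)
{
	iotlb = ops;
}

int pci_swiotlb_detect(void)
{
	return iotlb->detect();
}
```

With something of this shape, a backend that needs its own way of making a chunk contiguous would only have to supply a different 'expand'.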
And the 'struct swiotlb_data' would contain all of the
data needed to make decisions. This would include the ...
I took your patch and was trying to fit it over the
You should also initialize the __io_tlb_start array first:
	__io_tlb_start = (char **)__get_free_pages(GFP_KERNEL,
		get_order((io_tlb_nslabs / IO_TLB_SEGSIZE) * sizeof(char *)));
	if (!__io_tlb_start)
That isn't exactly right I think. You are de-allocating the first array,
whose size is determined by 'order'. Probably 10. And you are not freeing
I think you need this:
free_bootmem_late(__pa(__io_tlb_start[0]),
IO_TLB_SEGSIZE << IO_TLB_SHIFT);
free_bootmem_late(__pa(__io_tlb_start),
(io_tlb_nslabs / IO_TLB_SEGSIZE) * sizeof(char *));
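To make the two sizes in those calls concrete, here is a small userspace sketch. It assumes IO_TLB_SHIFT is 11 and IO_TLB_SEGSIZE is 128, which matches the 256KB allocation step from the patch description. The iotlb_* helper names are hypothetical and free() stands in for free_bootmem_late(). Unlike the two calls above, which release only chunk 0, the sketch releases every chunk allocated so far before the pointer table.

```c
#include <stdlib.h>

#define IO_TLB_SHIFT   11	/* assumed: 2KB slabs */
#define IO_TLB_SEGSIZE 128	/* assumed: slabs per chunk */

/* Size of one bounce chunk: IO_TLB_SEGSIZE << IO_TLB_SHIFT. */
size_t iotlb_chunk_bytes(void)
{
	return (size_t)IO_TLB_SEGSIZE << IO_TLB_SHIFT;
}

/* Size of the pointer table that indexes the chunks. */
size_t iotlb_table_bytes(unsigned long nslabs)
{
	return (nslabs / IO_TLB_SEGSIZE) * sizeof(char *);
}

/*
 * Release each allocated chunk first, then the table itself.
 * Returns the number of chunks released.
 */
int iotlb_release(char **table, int nr_alloc)
{
	int i;

	for (i = 0; i < nr_alloc; i++) {
		free(table[i]);
		table[i] = NULL;
	}
	free(table);
	return nr_alloc;
}
```

With the default 64MB buffer (32768 slabs) the table holds 256 pointers, so the table is tiny compared with a single chunk and the two frees need different lengths.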
--
On Fri, 30 Jul 2010 21:07:06 -0400

Yeah, I know. As I wrote, this patchset breaks IA64. I really want to
merge swiotlb's two memory allocator mechanisms
(swiotlb_init_with_default_size and swiotlb_late_init_with_default_size).
I need to look at the x86 memory boot code after the memblock surgery
finishes.

And as you know, I've not fixed the error path and swiotlb_free. I'll do
that later if people are not against swiotlb dynamic allocation.

Thanks,
--
It looks to me like it would be a good patch. I am curious about the
handling of the -ENOMEM stage. Naturally we would return an error to the
device. Are the most common ones (ahci, r8169, ata_piix - those that are
DMA_32) equipped to deal with unavailable memory?
--
On Mon, 2 Aug 2010 09:40:08 -0400

libata does dma mapping for ata drivers. It can handle mapping errors.
Looks like r8169 can't handle errors.

All drivers should handle mapping errors because IOMMUs are pretty
common now. I think that drivers that vendor people are serious about
can handle mapping errors.
--
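As an illustration of what handling a mapping error means for a driver, here is a userspace sketch of the check-after-map pattern. In the kernel the pair would be dma_map_single() and dma_mapping_error(); everything prefixed fake_ below, the FAKE_DMA_ERROR sentinel and driver_queue_buffer() are hypothetical stand-ins so the example builds outside the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;	/* stand-in for the kernel type */

#define FAKE_DMA_ERROR ((dma_addr_t)~0ULL)	/* hypothetical error cookie */

int fake_pool_exhausted;	/* set to 1 to simulate a full bounce buffer */

/* Stand-in for dma_map_single(): fails when the pool is exhausted. */
dma_addr_t fake_dma_map_single(void *buf, size_t len)
{
	(void)len;
	if (fake_pool_exhausted)
		return FAKE_DMA_ERROR;
	return (dma_addr_t)(uintptr_t)buf;
}

/* Stand-in for dma_mapping_error(). */
int fake_dma_mapping_error(dma_addr_t addr)
{
	return addr == FAKE_DMA_ERROR;
}

/*
 * Driver-side pattern: check every mapping and back out with -ENOMEM
 * instead of handing a bad address to the hardware.
 */
int driver_queue_buffer(void *buf, size_t len, dma_addr_t *out)
{
	dma_addr_t addr = fake_dma_map_single(buf, len);

	if (fake_dma_mapping_error(addr))
		return -ENOMEM;
	*out = addr;
	return 0;
}
```

A driver written this way keeps working when an on-demand expansion fails; one that skips the check would program the device with the error value.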
