This patchset adds per-device dma_mapping_ops support for CONFIG_X86_64 like POWER architecture does. This change enables us to cleanly fix the Calgary IOMMU issue that some devices are not behind the IOMMU [1]. It also would be helpful to handle KVM PCI passthrough. A pointer to dma_mapping_ops to struct dev_archdata is added. If the pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's NULL, the system-wide dma_ops pointer is used as before. The major obstacle is that dma_mapping_error doesn't take a pointer to the device unlike other DMA operations. So x86 can't have dma_mapping_ops per device. Note all the POWER IOMMUs use the same dma_mapping_error function so this is not a problem for POWER but x86 IOMMUs use different dma_mapping_error functions. The first patch adds the device argument to dma_mapping_error. The patch is trivial but large since it touches lots of drivers and dma-mapping.h in all the architecture. [1] http://lkml.org/lkml/2008/5/8/423 --
dma_mapping_error doesn't take a pointer to the device unlike other DMA operations. So we can't have dma_mapping_ops per device. Note that POWER already has dma_mapping_ops per device but all the POWER IOMMUs use the same dma_mapping_error function. x86 IOMMUs use different dma_mapping_error functions. So dma_mapping_error needs the device argument. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> --- arch/arm/common/dmabounce.c | 2 +- arch/ia64/hp/common/hwsw_iommu.c | 5 ++- arch/ia64/hp/common/sba_iommu.c | 2 +- arch/ia64/sn/pci/pci_dma.c | 2 +- arch/mips/mm/dma-default.c | 2 +- arch/powerpc/platforms/cell/celleb_scc_pciex.c | 2 +- arch/powerpc/platforms/cell/spider-pci.c | 2 +- arch/powerpc/platforms/iseries/mf.c | 2 +- arch/x86/kernel/pci-gart_64.c | 1 - arch/x86/kernel/pci-nommu.c | 2 +- drivers/firewire/fw-iso.c | 2 +- drivers/firewire/fw-ohci.c | 2 +- drivers/firewire/fw-sbp2.c | 8 +++--- drivers/infiniband/hw/ipath/ipath_sdma.c | 2 +- drivers/infiniband/hw/ipath/ipath_user_sdma.c | 6 ++-- drivers/infiniband/hw/mthca/mthca_eq.c | 2 +- drivers/media/dvb/pluto2/pluto2.c | 2 +- drivers/net/arm/ep93xx_eth.c | 4 +- drivers/net/b44.c | 22 +++++++++------- drivers/net/bnx2x.c | 2 +- drivers/net/e100.c | 2 +- drivers/net/e1000e/ethtool.c | 4 +- drivers/net/e1000e/netdev.c | 11 ++++--- drivers/net/ibmveth.c | 32 ++++++++++++----------- drivers/net/iseries_veth.c | 4 +- drivers/net/mlx4/eq.c | 2 +- drivers/net/pasemi_mac.c ...
This adds per-device dma_mapping_ops support for CONFIG_X86_64.
A pointer to dma_mapping_ops to struct dev_archdata is added. If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If
it's NULL, the system-wide dma_ops pointer is used as before.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
arch/x86/kernel/pci-calgary_64.c | 2 +-
arch/x86/kernel/pci-dma.c | 2 +-
arch/x86/kernel/pci-gart_64.c | 2 +-
arch/x86/kernel/pci-nommu.c | 14 +-----
arch/x86/kernel/pci-swiotlb_64.c | 2 +-
include/asm-x86/device.h | 3 +
include/asm-x86/dma-mapping.h | 95 ++++++++++++++++++++++++++------------
7 files changed, 74 insertions(+), 46 deletions(-)
diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c
index e28ec49..dca9a82 100644
--- a/arch/x86/kernel/pci-calgary_64.c
+++ b/arch/x86/kernel/pci-calgary_64.c
@@ -541,7 +541,7 @@ error:
return ret;
}
-static const struct dma_mapping_ops calgary_dma_ops = {
+static struct dma_mapping_ops calgary_dma_ops = {
.alloc_coherent = calgary_alloc_coherent,
.map_single = calgary_map_single,
.unmap_single = calgary_unmap_single,
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 0c37f16..33ded93 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -11,7 +11,7 @@
int forbid_dac __read_mostly;
EXPORT_SYMBOL(forbid_dac);
-const struct dma_mapping_ops *dma_ops;
+struct dma_mapping_ops *dma_ops;
EXPORT_SYMBOL(dma_ops);
static int iommu_sac_force __read_mostly;
diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
index e1a6954..8469f76 100644
--- a/arch/x86/kernel/pci-gart_64.c
+++ b/arch/x86/kernel/pci-gart_64.c
@@ -621,7 +621,7 @@ static __init int init_k8_gatt(struct agp_kern_info *info)
extern int agp_amd64_init(void);
-static const struct dma_mapping_ops gart_dma_ops = {
+static struct dma_mapping_ops gart_dma_ops = {
...Acked-by: Muli Ben-Yehuda <muli@il.ibm.com> Cheers, Muli --
On Mon, May 12, 2008 at 11:04 PM, FUJITA Tomonori do you have other patch to assign value to dma_ops for every device? it is always NULL at this time. YH --
On Thu, 15 May 2008 13:45:58 -0700 Not yet. As I said, I plan to implement a mechanism to register a hook called when a new pci (or dma capable) device is created. It enables IOMMUs to set up an appropriate dma_mapping_ops per device: http://lkml.org/lkml/2008/5/14/519 But I wanted to make sure whether per-device dma_ops is useful for everyone. --
Seems to be missing updates to Documentation/DMA-API.txt ?? --- ~Randy --
On Tue, 13 May 2008 08:39:01 -0700 Oops, I'll post an updated version against -mm soon. Thanks, --
Awesome! Much needed, thank you for doing this. Cheers, Muli --
On Wed, 14 May 2008 08:49:24 +0300 No problem. Well, as you know, it's just a base. We need more work to solve the problems on the top of this. I'd like to have a mechanism to register a hook called when a new pci (or dma capable) device is created. It enables IOMMUs to set up an appropriate dma_mapping_ops per device. It could also enables us to simplify the IOMMUs code to initilize devices at startup (for exmple, intel-iommu checks all the pci devices and creates a domain per device if necessary). I'll post an updated version against -mm. If people seems to be fine with per-device dma_mapping_ops, then I'll work on further issues. --
I wanted to try and create a test version of this, at least for the
calgary side, but ran into an error with this patch set:
Fusion MPT SAS Host driver 3.04.06
ACPI: PCI Interrupt 0000:34:00.0[A] -> GSI 142 (level, low) -> IRQ 142
mptbase: ioc0: Initiating bringup
ioc0: LSISAS1078 C1: Capabilities={Initiator}
mptbase: ioc0: PCI-MSI enabled
Calgary: DMA error on CalIOC2 PHB 0x33
Calgary: 0x02000000@CSR 0x00000000@PLSSR 0xb0008000@CSMR 0x00000000@MCK
Calgary: 0x00000000@0x810 0xfee0c000@0x820 0x00000000@0x830
0x00000000@0x840 0x03804a00@0x850 0x00000000@0x860 0x00000000@0x870
Calgary: 0x00000000@0xcb0
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
mptbase: ioc0: WARNING - NOT READY!
mptbase: ioc0: WARNING - Cannot recover rc = -1!
mptbase: ioc0: WARNING - Firmware Reload FAILED!
Clocksource tsc unstable (delta = 48333779773 ns)
Not sure what is causing this error, but will keep digging. Any ideas?
--Alexis
--
On Wed, 14 May 2008 19:00:13 -0700 Hmm, you don't hit this without my patches, right? --
oops, you are right, this failure was due to a .config change not your patch set (MSI not behaving on MegaRAID). Sorry for the false alarm. Your patches work as expected. --
Calgary caught an errant DMA. If it was working before, we are probably mis-programming the TCE tables. The first thing to check is whether all dma_map_xxx calls by mptbase end up calling the Calgary dma_ops. Cheers, Muli --
That's great---it will be needed to support hot-plugging of devices on I'm not sure if it will be easier than the current "loop over all devices" method, but since we will need it for hotplug anyway, we Excellent. Cheers, Muli --
This makes it basically impossible to do stack ops, which some people have been doing. -Andi --
On Thu, 15 May 2008 11:01:09 +0200 Seems that I misunderstand what those people want. What those people want to do and how the stack ops achieve it? Or can you tell me where their patches are? --
I've seen it in two cases: first was for KVM IO bypass and the other was a (unfinished) patch to support the NoDMA bitmaps on some systems. In this case you really want to do a wrapper around the existing ops and extend the mapping. That worked fine by just replacing the global pointer, but will be quite hard in your set up. -Andi --
On Thu, 15 May 2008 11:30:12 +0200 Thanks, I thought that KVM people want to do it per device (in the first case). So with my patchse, they can replace the dma_ops pointer in dev_archdata with what they want. --
But where would they save the original pointer? -Andi --
See my other mail on this subject, we don't need the original pointer, we just use stackable ops as a poor man's replacement for per-device ops. Cheers, Muli --
On Thu, 15 May 2008 12:48:04 +0200 Yeah, we need an extra mechanism for that but it's same for the system-wide dma_ops pointer (i.e. without my patches), isn't it? I'm still not sure how this patchset make it impossible to have stack dma_ops. These people need per-device dma_ops and we can do stack per-device dma_ops? --
Anybody who does stack ops in your scheme would need to hook into new device creation and an own per device saving pointer. Also there are livetime issues when to wrap. It's certainly possible, but likely complicated -Andi --
On Fri, 16 May 2008 07:24:33 +0200 But is that what those people want, setting up dma_ops per device? I think that creating a hook is not difficult and we could put a Hmm, let's see what those people think about on this. I'll repost a patchset for -mm with CC'ing those people. --
That's my understanding too. We use stackable ops as a poor man's replacement for per-device ops (depending on what kind of device it is, call the original ops or our pvdma ops). Cheers, Muli --
But in the KVM case you still need to support the underlying ops too, e.g. in case of bouncing through swiotlb needed -Andi --
Good point, although "secondary" DMA-ops should only be needed in rare cases (the only one I can think of is bouncing through swiotlb for a pass-through device which has a limited DMA mask), whereas per-device ops are needed for every pass-through device. Cheers, Muli --
Even rare cases need to be handled. In the nodma bitmap case it was the same always btw -Andi --
Not sure what you mean? Cheers, Muli --
... which is not to say of course we can't do both, dev_archdata could have a list of struct dma_mapping_ops instead of (or in addition to) a struct dma_mapping_ops pointer. If space was at a premium we could even multiplex both on the same pointer by using one of the low bits to select between them. Cheers, Muli --
Yes an explanation of stack ops, or a pointer to a patch set that implements them would be very useful. --
