Hi all, the block layer usually tries to merge s/g segments if consecutive segments combined fit into the queue's max_segment_size. When such a scatter gather list is DMA-mapped, can it happen that an IOMMU collapses the elements even further, so that sg_dma_len() of a DMA-mapped s/g segment exceeds max_segment_size? As I understood some discussions in the past, this could indeed happen, which is a nuisance. But I may have misunderstood something, or something may have changed in the meantime... -- Stefan Richter -=====-==--- =--- -=--- http://arcgraph.de/sr/ --
On Fri, Aug 8, 2008 at 12:44 PM, Stefan Richter I don't see how. The IOMMU code only collapses the "physical" mappings and does not add new elements to the SG list. ergo sg_dma_len() shouldn't change. --
Well, I'm just doing my homework and am tracking down the various dma_map_sg implementations. Here is what PPC does: http://lxr.linux.no/linux+v2.6.26/arch/powerpc/kernel/iommu.c#L270 It looks at dma_get_max_seg_size(dev); and then merges according to it. That's all nice and well, but in my case (FireWire storage protocol a.k.a. SBP-2, which is basically remote DMA), the max_segment_size of the PCI device is different from the size limit of the protocol. We currently have to deconstruct such merges in the sbp2 drivers again: http://lxr.linux.no/linux+v2.6.26/drivers/firewire/fw-sbp2.c#L1384 Either I keep it that way, or I let the protocol driver manipulate the FireWire controller's dev->dma_parms->max_segment_size (via dma_set_max_seg_size() of course), which is not entirely correct. I first wanted to use blk_queue_max_segment_size(), but that falls short with dma_map_sg implementations like the above are at work. -- Stefan Richter -=====-==--- =--- -=--- http://arcgraph.de/sr/ --
On Fri, 08 Aug 2008 23:21:28 +0200 Why is it not correct? Device drivers can set Yeah, we store the same value at two places (request_queue and device) though it's not nice. --
FireWire is a multi-protocol bus. SBP-2 is just one of many quite different protocols. SBP-2 targets read or write the initiators memory buffers via remote DMA. These buffer may be exposed as s/g lists to the target. The protocol limits these s/g lists to up to 65535 elements of up to 65535 bytes size each. FireWire controllers on the other hand get their maximum segment size set to 65536 by the PCI subsystem. (All FireWire controllers supported by mainline Linux are PCI or PCIe devices.) In case of the drivers/firewire/ stack, the SBP-2 driver is currently the only one which uses dma_map_sg. In case of the drivers/ieee1394/ stack, also the drivers for isochronous protocols, including userspace drivers via raw1394, use dma_map_sg. So if the SBP-2 driver manipulated the controller device's max_segment_size, it would influence the DMA mappings of the other protocols. It wouldn't be a big deal; the isochronous mappings could only be collapsed to chunks of at most 15 pages instead of 16 pages. However, the mapping deconstructing code in the SBP-2 drivers is not a big deal either. -- Stefan Richter -=====-==--- =--- -=--- http://arcgraph.de/sr/ --
On Fri, 08 Aug 2008 23:58:23 +0200 IMHO, setting the safest value to max_segment_size is fine. After all, --
1. We don't need to round the SBP-2 segment size limit down to a
multiple of 4 kB (0xffff -> 0xf000). It is only necessary to
ensure quadlet alignment (0xffff -> 0xfffc).
2. Use dma_set_max_seg_size() to tell the DMA mapping infrastructure
and the block IO layer about the restriction. This way we can
remove the size checks and segment splitting in the queuecommand
path.
This assumes that no other code in the ieee1394 stack uses
dma_map_sg() with conflicting requirements. It furthermore assumes
that the controller device's platform actually allows us to set the
segment size to our liking. Assert the latter with a BUG_ON().
3. Also use blk_queue_max_segment_size() to tell the block IO layer
about it. It cannot know it because our scsi_add_host() does not
point to the FireWire controller's device.
We can also uniformly use dma_map_sg() for the single segment case just
like for the multi segment case, to further simplify the code.
Also clean up how the page table is converted to big endian.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Applicable after
[PATCH update] ieee1394: sbp2: stricter dma_sync
[PATCH] ieee1394: sbp2: check for DMA mapping failures
from a few minutes ago.
drivers/ieee1394/sbp2.c | 106 +++++++++++++---------------------------
drivers/ieee1394/sbp2.h | 37 +++++--------
2 files changed, 52 insertions(+), 91 deletions(-)
Index: linux-2.6.27-rc2/drivers/ieee1394/sbp2.h
===================================================================
--- linux-2.6.27-rc2.orig/drivers/ieee1394/sbp2.h
+++ linux-2.6.27-rc2/drivers/ieee1394/sbp2.h
@@ -139,13 +139,10 @@ struct sbp2_logout_orb {
u32 status_fifo_lo;
} __attribute__((packed));
-#define PAGE_TABLE_SET_SEGMENT_BASE_HI(v) ((v) & 0xffff)
-#define PAGE_TABLE_SET_SEGMENT_LENGTH(v) (((v) & 0xffff) << 16)
-
struct sbp2_unrestricted_page_table {
- u32 length_segment_base_hi;
- u32 segment_base_lo;
-} ...1. We don't need to round the SBP-2 segment size limit down to a
multiple of 4 kB (0xffff -> 0xf000). It is only necessary to
ensure quadlet alignment (0xffff -> 0xfffc).
2. Use dma_set_max_seg_size() to tell the DMA mapping infrastructure
and the block IO layer about the restriction. This way we can
remove the size checks and segment splitting in the queuecommand
path.
This assumes that no other code in the firewire stack uses
dma_map_sg() with conflicting requirements. It furthermore assumes
that the controller device's platform actually allows us to set the
segment size to our liking. Assert the latter with a BUG_ON().
3. Also use blk_queue_max_segment_size() to tell the block IO layer
about it. It cannot know it because our scsi_add_host() does not
point to the FireWire controller's device.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
drivers/firewire/fw-sbp2.c | 68 +++++++++++++++++--------------------
1 file changed, 33 insertions(+), 35 deletions(-)
Index: linux-2.6.27-rc2/drivers/firewire/fw-sbp2.c
===================================================================
--- linux-2.6.27-rc2.orig/drivers/firewire/fw-sbp2.c
+++ linux-2.6.27-rc2/drivers/firewire/fw-sbp2.c
@@ -29,6 +29,7 @@
*/
#include <linux/blkdev.h>
+#include <linux/bug.h>
#include <linux/delay.h>
#include <linux/device.h>
#include <linux/dma-mapping.h>
@@ -181,10 +182,16 @@ struct sbp2_target {
#define SBP2_MAX_LOGIN_ORB_TIMEOUT 40000U /* Timeout in ms */
#define SBP2_ORB_TIMEOUT 2000U /* Timeout in ms */
#define SBP2_ORB_NULL 0x80000000
-#define SBP2_MAX_SG_ELEMENT_LENGTH 0xf000
#define SBP2_RETRY_LIMIT 0xf /* 15 retries */
#define SBP2_CYCLE_LIMIT (0xc8 << 12) /* 200 125us cycles */
+/*
+ * The default maximum s/g segment size of a FireWire controller is
+ * usually 0x10000, but SBP-2 only allows 0xffff. Since buffers have to
+ * be quadlet-aligned, we set the length limit to 0xffff & ~3.
+ */
+#define ...On Sat, Aug 9, 2008 at 11:21 AM, Stefan Richter I didn't check the rest of the driver - but it would be good if it explicitly called dma_set_mask() or pci_dma_set_mask() with a 32-bit mask value. Most drivers assume 32-bit and that's why I point this out. The rest looks ok at first glance. thanks, --
I thought about this and currently think that I should not do this. There is the API to set the mask, but there is no API to _decrease_ the mask only. The SBP-2 protocol driver should not set DMA_32BIT_MASK if any other driver already set for example DMA_31BIT_MASK for whatever reason. For now, this is no issue. FireWire low-level drivers exist only for controllers with 32 bit local bus addressing capability. I am sure that somebody will remember to modify the SBP-2 driver(s) if there will ever be a controller type and a driver for it which can address more than 4 GB. On the other hand, we also currently have no reason to set a smaller mask. We recently discovered a chip bug which requires DMA_31BIT_MASK for a special mode of data reception (only for cache-coherent DMA though), but we now work around this chip bug by simply falling back to an alternative mode. -- Stefan Richter -=====-==--- =--- -==-= http://arcgraph.de/sr/ --
On Sat, Aug 9, 2008 at 11:20 AM, Stefan Richter While this code will probably work correctly, I believe if the sg_dma_len() returns 0, one has walked off the end of the "bus address" list. pci_map_sg() returns how many DMA addr/len pairs are used and that count could (should?) be used when pulling the dma_len/addr pairs out of the sg list. Effectively, the SG list is really two lists with seperate counts: the original list with virtual addresses/len/count and the "DMA mapped" list with it's own count of addresses/len pairs. When no IOMMU is used those lists are 1:1. hth, --
On Tue, 12 Aug 2008 10:04:14 -0700
Yeah, from a quick look, seems that this patch wrongly handles
sg_count.
This patch sets scsi_sg_count(sc) to sg_count, right? for_each_sg is
expected to be used with a return value of pci_map_sg.
Then this patch can simply do something like:
for_each_sg(sg, sg, sg_count, i) {
pt[i].high = cpu_to_be32(sg_dma_len(sg) << 16);
pt[i].low = cpu_to_be32(sg_dma_address(sg));
}
--
Well, I keep the original scsi_sg_count to call dma_unmap_sg with it later. According to Corbet/ Rubini/ Kroah-Hartman: LDD3, dma_unmap_sg is to be called with the number of s/g elements before DMA-mapping. From a quick look at some dma_unmap_sg implementations, this seems still Thanks a lot for looking at this; here is my patch fixed. From: Stefan Richter <stefanr@s5r6.in-berlin.de> Subject: ieee1394: sbp2: enforce s/g segment size limit 1. We don't need to round the SBP-2 segment size limit down to a multiple of 4 kB (0xffff -> 0xf000). It is only necessary to ensure quadlet alignment (0xffff -> 0xfffc). 2. Use dma_set_max_seg_size() to tell the DMA mapping infrastructure and the block IO layer about the restriction. This way we can remove the size checks and segment splitting in the queuecommand path. This assumes that no other code in the ieee1394 stack uses dma_map_sg() with conflicting requirements. It furthermore assumes that the controller device's platform actually allows us to set the segment size to our liking. Assert the latter with a BUG_ON(). 3. Also use blk_queue_max_segment_size() to tell the block IO layer about it. It cannot know it because our scsi_add_host() does not point to the FireWire controller's device. We can also uniformly use dma_map_sg() for the single segment case just like for the multi segment case, to further simplify the code. Also clean up how the page table is converted to big endian. Thanks to Grant Grundler and FUJITA Tomonori for advice. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> --- drivers/ieee1394/sbp2.c | 100 ++++++++++++---------------------------- drivers/ieee1394/sbp2.h | 37 ++++++-------- 2 files changed, 46 insertions(+), 91 deletions(-) Index: linux/drivers/ieee1394/sbp2.h =================================================================== --- linux.orig/drivers/ieee1394/sbp2.h +++ linux/drivers/ieee1394/sbp2.h @@ -139,13 +139,10 @@ struct ...
1. We don't need to round the SBP-2 segment size limit down to a
multiple of 4 kB (0xffff -> 0xf000). It is only necessary to
ensure quadlet alignment (0xffff -> 0xfffc).
2. Use dma_set_max_seg_size() to tell the DMA mapping infrastructure
and the block IO layer about the restriction. This way we can
remove the size checks and segment splitting in the queuecommand
path.
This assumes that no other code in the firewire stack uses
dma_map_sg() with conflicting requirements. It furthermore assumes
that the controller device's platform actually allows us to set the
segment size to our liking. Assert the latter with a BUG_ON().
3. Also use blk_queue_max_segment_size() to tell the block IO layer
about it. It cannot know it because our scsi_add_host() does not
point to the FireWire controller's device.
Thanks to Grant Grundler and FUJITA Tomonori for advice.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
drivers/firewire/fw-sbp2.c | 64 ++++++++++++++++---------------------
1 file changed, 28 insertions(+), 36 deletions(-)
Index: linux/drivers/firewire/fw-sbp2.c
===================================================================
--- linux.orig/drivers/firewire/fw-sbp2.c
+++ linux/drivers/firewire/fw-sbp2.c
@@ -29,6 +29,7 @@
*/
#include <linux/blkdev.h>
+#include <linux/bug.h>
#include <linux/delay.h>
#include <linux/device.h>
#include <linux/dma-mapping.h>
@@ -181,10 +182,16 @@ struct sbp2_target {
#define SBP2_MAX_LOGIN_ORB_TIMEOUT 40000U /* Timeout in ms */
#define SBP2_ORB_TIMEOUT 2000U /* Timeout in ms */
#define SBP2_ORB_NULL 0x80000000
-#define SBP2_MAX_SG_ELEMENT_LENGTH 0xf000
#define SBP2_RETRY_LIMIT 0xf /* 15 retries */
#define SBP2_CYCLE_LIMIT (0xc8 << 12) /* 200 125us cycles */
+/*
+ * The default maximum s/g segment size of a FireWire controller is
+ * usually 0x10000, but SBP-2 only allows 0xffff. Since buffers have to
+ * be quadlet-aligned, we set the length limit to 0xffff ...PS: That's the 2.6.26 version; the 2.6.27 patch has an additional argument here, if anybody cares... -- Stefan Richter -=====-==--- =--- -==-= http://arcgraph.de/sr/ --
On Wed, 13 Aug 2008 12:19:59 +0200 (CEST)
You need to keep it? You can access to it via scsi_sg_count when
Are you talking about?
=
To unmap a scatterlist, just call:
pci_unmap_sg(pdev, sglist, nents, direction);
Again, make sure DMA activity has already finished.
PLEASE NOTE: The 'nents' argument to the pci_unmap_sg call must be
the _same_ one you passed into the pci_map_sg call,
it should _NOT_ be the 'count' value _returned_ from the
pci_map_sg call.
I think that all the IOMMUs are fine even if you call dma_unmap_sg
with a 'nents' value that dma_map_sg returned. But, well, it's against
the rule.
--
Yes, I can easily do it this way; fixed below.
Right. How did I forget about that?
From: Stefan Richter <stefanr@s5r6.in-berlin.de>
Subject: ieee1394: sbp2: remove redundant struct members
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
drivers/ieee1394/sbp2.c | 14 +++++---------
drivers/ieee1394/sbp2.h | 4 ----
2 files changed, 5 insertions(+), 13 deletions(-)
Index: linux/drivers/ieee1394/sbp2.c
===================================================================
--- linux.orig/drivers/ieee1394/sbp2.c
+++ linux/drivers/ieee1394/sbp2.c
@@ -656,11 +656,11 @@ static struct sbp2_command_info *sbp2uti
static void sbp2util_mark_command_completed(struct sbp2_lu *lu,
struct sbp2_command_info *cmd)
{
- if (cmd->sg_buffer) {
- dma_unmap_sg(lu->ud->ne->host->device.parent, cmd->sg_buffer,
- cmd->sg_count, cmd->sg_dir);
- cmd->sg_buffer = NULL;
- }
+ if (scsi_sg_count(cmd->Current_SCpnt) > 0)
+ dma_unmap_sg(lu->ud->ne->host->device.parent,
+ scsi_sglist(cmd->Current_SCpnt),
+ scsi_sg_count(cmd->Current_SCpnt),
+ cmd->Current_SCpnt->sc_data_direction);
list_move_tail(&cmd->list, &lu->cmd_orb_completed);
}
@@ -1505,10 +1505,6 @@ static int sbp2_prep_command_orb_sg(stru
if (n == 0)
return -ENOMEM;
- cmd->sg_buffer = sg;
- cmd->sg_count = sg_count;
- cmd->sg_dir = dma_dir;
-
orb->data_descriptor_hi = ORB_SET_NODE_ID(hi->host->node_id);
orb->misc |= ORB_SET_DIRECTION(orb_direction);
Index: linux/drivers/ieee1394/sbp2.h
===================================================================
--- linux.orig/drivers/ieee1394/sbp2.h
+++ linux/drivers/ieee1394/sbp2.h
@@ -252,10 +252,6 @@ struct sbp2_command_info {
struct sbp2_unrestricted_page_table
scatter_gather_element[SG_ALL] __attribute__((aligned(8)));
dma_addr_t sge_dma;
-
- struct scatterlist *sg_buffer;
- int sg_count;
- enum dma_data_direction sg_dir;
};
/* Per FireWire host */
--
Stefan ...On Thu, 14 Aug 2008 09:12:01 +0200 (CEST) static inline unsigned scsi_sg_count(struct scsi_cmnd *cmd) --
