On Mon, Apr 12, 2010 at 07:15:07PM +0200, Daniel Mack wrote:
You might want to run some benchmarks first to see if it is such a
problem. Keep in mind that you would be addressing only the host-side of
this: all DMA transfers from the USB controller to the memory. But for any
transfer from user space to the USB device you can't make
the <4GB assumption, as the stack/heap in user-land is stitched together from
various memory areas, some of them above your 4GB mark. So when you
write your response to this e-mail, and your /var/spool/clientmqueue is on your
USB disk, the page with your response that is being written to the disk can be
allocated from memory above the 4GB mark and then has to be bounce-buffered
for the USB controller. Note, I am only talking about 64-bit kernels,
the 32-bit are a different beast altogether when it comes to
Though please keep in mind that this bounce-buffer issue is less of
a problem nowadays. Both AMD and Intel are outfitting their machines
with hardware IOMMUs that replace the SWIOTLB (and IBM's high-end boxes
have the Calgary ones). And on AMD the GART has been used for many years
as a poor man's IOMMU.
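To make the bounce-buffer condition concrete, here is a rough userspace sketch of the check that decides whether a buffer can be handed to a 32-bit-only controller directly. The helper name needs_bounce() and the DMA_MASK_32BIT macro are made up for illustration; this is not the kernel's actual code path, just the arithmetic behind it, assuming no IOMMU is remapping the addresses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mask of a device that can only address the low 4GB. */
#define DMA_MASK_32BIT 0xffffffffULL

/*
 * Hypothetical helper, not a kernel API: returns true when a buffer of
 * 'len' bytes at physical address 'phys' does not fit entirely under
 * 'dma_mask', i.e. it would have to be bounced (or remapped by an IOMMU).
 */
static bool needs_bounce(uint64_t phys, size_t len, uint64_t dma_mask)
{
    if (len == 0)
        return false;
    if (phys > dma_mask)
        return true;
    /* Compare against the last byte of the buffer without overflowing. */
    return (uint64_t)(len - 1) > dma_mask - phys;
}
```

A page at 0x100000000 (just above 4GB) fails the check for a 32-bit mask, while a page ending exactly at 0xffffffff passes.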
Fix whatever makes the DMA address have the wrong value. In the
0x08...00<bus address> address the 0x08 looks quite suspicious, as if it
has been used as a flag, or as if the casting code generated by GCC from
64-bit to 32-bit didn't do the right thing (I remember seeing this with
InfiniBand on RHEL5, which was GCC 4.1 I think?).
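A quick way to tell the two theories apart is to pull the address apart and look at the pieces. The sketch below assumes the 0x08 sits in the top byte of a 64-bit value; that is my reading of the 0x08...00<bus address> pattern, not something confirmed. The helper names and the sample value used in the note after the code are invented for illustration.

```c
#include <stdint.h>

/* Top byte of a 64-bit DMA address; non-zero hints at flag bits or corruption. */
static uint8_t dma_top_byte(uint64_t addr)
{
    return (uint8_t)(addr >> 56);
}

/* What a 32-bit consumer sees after a narrowing cast. */
static uint32_t dma_low32(uint64_t addr)
{
    return (uint32_t)addr;
}

/* The address with the (assumed) flag byte cleared. */
static uint64_t dma_strip_top_byte(uint64_t addr)
{
    return addr & 0x00ffffffffffffffULL;
}
```

For a made-up value such as 0x0800000012345000, the top byte is 0x08 and both the low 32 bits and the stripped value come out as 0x12345000. If the stripped value matches the bus address you expect, the flag theory looks more likely than a bad cast.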
It would be worth instrumenting the PCI-DMA API code and triggering a
dump_stack when that flag (0x08) is detected in the return from the
underlying page mapping code. If you need help with this I can
give you some debug patches.
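The shape of such an instrumentation hook would be roughly the following. This is a userspace mock, not a patch: check_dma_addr() is a hypothetical wrapper, dump_stack_stub() stands in for the kernel's dump_stack(), and the top-byte position of the 0x08 is again an assumption. In the real thing the check would go right after the mapping call returns.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed: the suspicious value lives in the top byte of the address. */
#define SUSPECT_TOP_BYTE 0x08

/* Number of suspicious addresses seen so far. */
static unsigned int suspect_hits;

/* Userspace stand-in for the kernel's dump_stack(). */
static void dump_stack_stub(void)
{
    suspect_hits++;
    fprintf(stderr, "suspicious DMA address seen (hit %u)\n", suspect_hits);
}

/*
 * Hypothetical wrapper: inspect the address returned by the mapping
 * code, report it if the top byte matches, and pass it through unchanged.
 */
static uint64_t check_dma_addr(uint64_t addr)
{
    if ((addr >> 56) == SUSPECT_TOP_BYTE)
        dump_stack_stub();
    return addr;
}
```

Passing the address through unchanged keeps the hook from altering behaviour; it only tells you which caller first sees the bad value.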