On Wed, Jun 4, 2008 at 7:47 AM, FUJITA Tomonori
<fujita.tomonori@lab.ntt.co.jp> wrote:
...
It's possible to split up one flat address space and share the IOMMU
among several users. Each user gets her own segment of the bitmap and
the corresponding IO Pdir. So I don't see allocation policy as a strong
reason to use a Red/Black tree.
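A minimal sketch of that split. It assumes a hypothetical 1024-page IOVA space shared by 4 users, with a first-fit search confined to each user's segment. The names (iova_space, seg_alloc, seg_free) are made up for illustration, and the IO Pdir updates are left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry: 1024 IO pages split evenly among 4 users. */
#define IOVA_PAGES     1024u
#define NR_USERS       4u
#define SEG_PAGES      (IOVA_PAGES / NR_USERS)
#define BITS_PER_WORD  64u

struct iova_space {
    /* One bit per IO page (1 = in use). A real IOMMU driver would keep a
     * matching IO Pdir entry per page; that part is omitted here. */
    uint64_t bitmap[IOVA_PAGES / BITS_PER_WORD];
};

void iova_init(struct iova_space *s)
{
    memset(s, 0, sizeof(*s));
}

static bool page_in_use(const struct iova_space *s, unsigned int pg)
{
    return (s->bitmap[pg / BITS_PER_WORD] >> (pg % BITS_PER_WORD)) & 1u;
}

static void page_mark(struct iova_space *s, unsigned int pg, bool used)
{
    uint64_t mask = UINT64_C(1) << (pg % BITS_PER_WORD);

    if (used)
        s->bitmap[pg / BITS_PER_WORD] |= mask;
    else
        s->bitmap[pg / BITS_PER_WORD] &= ~mask;
}

/* First-fit search for npages contiguous free pages, restricted to the
 * caller's own segment. Returns the first page index, or -1 if full. */
long seg_alloc(struct iova_space *s, unsigned int user, unsigned int npages)
{
    assert(user < NR_USERS && npages > 0);
    unsigned int start = user * SEG_PAGES;
    unsigned int run = 0;

    for (unsigned int pg = start; pg < start + SEG_PAGES; pg++) {
        if (page_in_use(s, pg)) {
            run = 0;
            continue;
        }
        if (++run == npages) {
            unsigned int first = pg + 1 - npages;
            for (unsigned int p = first; p <= pg; p++)
                page_mark(s, p, true);
            return (long)first;
        }
    }
    return -1;
}

/* Release a range previously returned by seg_alloc(). */
void seg_free(struct iova_space *s, unsigned int first, unsigned int npages)
{
    for (unsigned int p = first; p < first + npages; p++)
        page_mark(s, p, false);
}
```

In this sketch each user only ever scans and modifies its own 256-page slice, so a per-segment lock would be enough to keep users from serializing on each other.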
I suspect the R/B tree was chosen because they expected a host VM to allocate
one large "static" entry for a guest OS, and the guest would manage that range
itself. From the host VM's point of view, an R/B tree seems like a very
efficient way to handle that.
You can easily emulate an SSD by doing sequential 4K reads
from a normal SATA HD. That should yield ~7-8K IOPS, since the disk
will recognize the sequential stream and read ahead. SAS/SCSI/FC disks will
probably behave the same way, with different IOPS rates.
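For reference, 7-8K IOPS at 4096 bytes per read works out to roughly 28.7 to 32.8 MB/s. Below is a minimal userspace sketch of such a load generator, assuming POSIX pread() and clock_gettime(). seq_read_4k() and report() are hypothetical helpers. Reading a regular file this way may be served from the page cache, so for a real run one would point it at the raw disk and make sure the cache is not answering the reads:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define IO_SIZE 4096

/* Issue back-to-back 4 KiB reads starting at offset 0 until `limit` bytes
 * or a short read (EOF/error). Returns the number of full reads and stores
 * the elapsed wall-clock time in *elapsed_s. */
long seq_read_4k(int fd, off_t limit, double *elapsed_s)
{
    char buf[IO_SIZE];
    struct timespec t0, t1;
    long n = 0;

    assert(fd >= 0 && elapsed_s != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (off_t off = 0; off + IO_SIZE <= limit; off += IO_SIZE) {
        if (pread(fd, buf, IO_SIZE, off) != IO_SIZE)
            break;
        n++;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *elapsed_s = (double)(t1.tv_sec - t0.tv_sec) +
                 (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return n;
}

/* Print IOPS and throughput for a completed run. */
void report(long reads, double elapsed_s)
{
    if (elapsed_s <= 0.0) {
        printf("%ld reads, elapsed time too small to measure\n", reads);
        return;
    }
    double iops = reads / elapsed_s;
    printf("%ld reads in %.3f s: %.0f IOPS, %.1f MB/s\n",
           reads, elapsed_s, iops, iops * IO_SIZE / 1e6);
}
```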
Just to make this clear, this is a 10% performance difference.
But a second metric is more telling: CPU utilization.
How much time was spent in the IOMMU code for each
implementation with the same workload?
This isn't a demand for that information but just a request
to measure that in any future benchmarking.
oprofile or perfmon2 are the best tools to determine that.
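Before reaching for a profiler, a coarse first number is CPU time per unit of wall time for the whole run. Below is a minimal sketch using getrusage() and clock_gettime(); struct cpu_sample and the helper names are hypothetical. It only separates user from system time for the calling process. IOMMU map/unmap work done in interrupt context may be charged to whichever task happens to be running, so oprofile or perfmon2 is still needed to attribute time to the IOMMU code itself:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

struct cpu_sample {
    double wall_s;  /* CLOCK_MONOTONIC */
    double user_s;  /* user CPU time of this process */
    double sys_s;   /* system (kernel) CPU time of this process */
};

static double tv_to_s(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

struct cpu_sample cpu_sample_now(void)
{
    struct cpu_sample s;
    struct rusage ru;
    struct timespec ts;

    getrusage(RUSAGE_SELF, &ru);
    clock_gettime(CLOCK_MONOTONIC, &ts);
    s.wall_s = (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
    s.user_s = tv_to_s(ru.ru_utime);
    s.sys_s = tv_to_s(ru.ru_stime);
    return s;
}

/* CPU utilization between two samples: (user + sys) / wall.
 * 1.0 means one CPU fully busy for the whole interval. */
double cpu_utilization(const struct cpu_sample *a, const struct cpu_sample *b)
{
    double wall = b->wall_s - a->wall_s;

    assert(wall > 0.0);
    return ((b->user_s - a->user_s) + (b->sys_s - a->sys_s)) / wall;
}

/* Fraction of the consumed CPU time that was spent in the kernel. */
double sys_fraction(const struct cpu_sample *a, const struct cpu_sample *b)
{
    double sys = b->sys_s - a->sys_s;
    double total = (b->user_s - a->user_s) + sys;

    return total > 0.0 ? sys / total : 0.0;
}
```

Taking one sample before and one after the benchmark, then comparing the numbers for the two IOMMU implementations under the same workload, would at least show whether the difference lies in kernel time.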
Just as important as the allocation data structure is the allocation policy.
The allocation policy will perform best if it matches the IO TLB
replacement policy implemented in the IOMMU HW. Thrashing the IO TLB
by handing competing streams IOVAs that alias the same IO TLB entries
will hurt performance as well. Obviously a single benchmark is unlikely
to detect this.
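To make the aliasing point concrete, here is a toy model. It assumes a hypothetical direct-mapped IO TLB with 8 entries indexed by the low bits of the IO page number. Real IOMMUs differ in size, associativity and replacement policy. Two DMA streams alternately touch one IO page each. If the allocator happens to give them pages that index the same entry, every access misses:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 8u  /* hypothetical direct-mapped IO TLB */

static_assert(TLB_ENTRIES > 0, "IO TLB needs at least one entry");

struct io_tlb {
    uint64_t tag[TLB_ENTRIES];  /* IO page number cached in each slot */
    bool valid[TLB_ENTRIES];
    unsigned long misses;
};

/* Look up one IO page; on a miss, evict whatever occupies its slot. */
static void tlb_access(struct io_tlb *t, uint64_t io_pfn)
{
    unsigned int slot = (unsigned int)(io_pfn % TLB_ENTRIES);

    if (!t->valid[slot] || t->tag[slot] != io_pfn) {
        t->misses++;
        t->valid[slot] = true;
        t->tag[slot] = io_pfn;
    }
}

/* Two streams take turns touching their page; returns total misses. */
unsigned long run_two_streams(uint64_t pfn_a, uint64_t pfn_b,
                              unsigned int rounds)
{
    struct io_tlb t = {0};

    for (unsigned int i = 0; i < rounds; i++) {
        tlb_access(&t, pfn_a);
        tlb_access(&t, pfn_b);
    }
    return t.misses;
}
```

Here run_two_streams(0, 8, 100) counts 200 misses, because pages 0 and 8 share slot 0. By contrast, run_two_streams(0, 1, 100) counts only 2. An allocator that knows how the IO TLB is indexed could steer concurrent streams into different slots.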
I personally found this to be one of the more interesting talks :)
Excellent work!
...
Sorry, I didn't see a replacement for the deferred_flush_tables.
Mark Gross and I agree this substantially helps with unmap performance.
See http://lkml.org/lkml/2008/3/3/373
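As I understand the approach in that thread, deferred flushing batches IOTLB invalidations. Unmap clears the IO page table entry right away but parks the IOVA in a queue. The queued IOVAs are released for reuse only after one global IOTLB flush. Below is a minimal sketch with a hypothetical high-water mark of 250 entries. All names are made up, and a real implementation would likely also flush from a timer so that entries do not linger:

```c
#include <assert.h>
#include <stdint.h>

#define DEFER_HIGH_WATER 250u  /* hypothetical batch size */

struct deferred_flush {
    uint64_t iova[DEFER_HIGH_WATER];  /* unmapped, but not yet safe to reuse */
    unsigned int count;
    unsigned long iotlb_flushes;      /* global IOTLB invalidations issued */
    unsigned long released;           /* IOVAs handed back to the allocator */
};

/* Stand-in for the hardware invalidation: after it completes, no stale
 * translation for any queued IOVA can remain, so all of them are released. */
static void flush_and_release(struct deferred_flush *d)
{
    d->iotlb_flushes++;
    d->released += d->count;  /* a real driver would free each d->iova[i] */
    d->count = 0;
}

/* Called after the IO page table entry for `iova` has been cleared. */
void unmap_deferred(struct deferred_flush *d, uint64_t iova)
{
    assert(d->count < DEFER_HIGH_WATER);
    d->iova[d->count++] = iova;
    if (d->count == DEFER_HIGH_WATER)
        flush_and_release(d);
}
```

With 1000 unmaps, this issues 4 IOTLB flushes instead of 1000, which is where the unmap savings come from. The cost is that up to 249 IOVAs stay unusable until the next flush.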
...
Can you make reserved_it_size a module or kernel parameter?
I've never been able to come up with a good heuristic
for determining the size of the IOVA space. It generally
does NOT need to map all of Host Physical RAM.
The actual requirement depends entirely on the workload,
type and number of IO devices installed. The problem is we
don't know any of those things until well after the IOMMU
is already needed.
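If it were a boot parameter, the kernel's early_param() and memparse() helpers would let users write something like reserved_it_size=64M on the command line. Below is a userspace sketch of that size parsing that can be tested on its own. parse_size() is a hypothetical stand-in for memparse() and only handles the K, M and G suffixes:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "<number>[KkMmGg]" into bytes; returns 0 if the string is
 * malformed (empty, negative, or followed by an unknown suffix). */
uint64_t parse_size(const char *s)
{
    char *end;
    unsigned long long v;

    assert(s != NULL);
    if (s[0] == '-')
        return 0;
    v = strtoull(s, &end, 0);
    if (end == s)
        return 0;

    switch (*end) {
    case 'G': case 'g':
        v <<= 10;
        /* fall through */
    case 'M': case 'm':
        v <<= 10;
        /* fall through */
    case 'K': case 'k':
        v <<= 10;
        end++;
        break;
    case '\0':
        break;
    default:
        return 0;
    }
    return *end == '\0' ? (uint64_t)v : 0;
}
```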
I didn't check, but reserving MMIO address space might be better done
by looking at the MMIO ranges routed by all the top-level PCI host-bus
controllers (aka PCI-e Root ports). Maybe this is an idea for Mark Gross
to implement.
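My understanding of why those windows matter: a DMA address that falls inside a bridge's MMIO window could be claimed by the bridge as a peer-to-peer access instead of reaching the IOMMU, so such IOVAs should never be handed out. Below is a minimal sketch. It assumes the windows have already been collected from the host bridges into a hypothetical struct mmio_window array, and that the IOVA allocator keeps one byte per 4 KiB page:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IO_PAGE_SHIFT 12
#define IOVA_PAGES    4096u  /* hypothetical 16 MiB IOVA space for the demo */

struct mmio_window {
    uint64_t start;  /* first bus address routed to MMIO */
    uint64_t end;    /* last bus address, inclusive */
};

/* Mark every IO page overlapping any window as reserved so the allocator
 * skips it. Returns the number of pages newly reserved. */
unsigned int reserve_mmio_windows(uint8_t *reserved,
                                  const struct mmio_window *w, size_t n)
{
    unsigned int count = 0;

    for (size_t i = 0; i < n; i++) {
        assert(w[i].start <= w[i].end);
        uint64_t first = w[i].start >> IO_PAGE_SHIFT;
        uint64_t last = w[i].end >> IO_PAGE_SHIFT;

        for (uint64_t pg = first; pg <= last && pg < IOVA_PAGES; pg++) {
            if (!reserved[pg]) {
                reserved[pg] = 1;
                count++;
            }
        }
    }
    return count;
}
```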
"32-PAGE_SHIFT_4K" expression is used in several places but I didn't see
an explanation of why 32. Can you add one someplace?
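One plausible reading, assuming PAGE_SHIFT_4K is 12 (4 KiB pages), is that 32 is the width of a 32-bit DMA address. Under that reading, 1UL << (32 - PAGE_SHIFT_4K) is the number of 4 KiB IO pages below the 4 GiB boundary, i.e. 2^20 = 1,048,576. A comment along these lines next to the expression would help. DMA32_PAGES and pfn_below_4g() below are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT_4K 12  /* assumed: 4 KiB IO pages */

static_assert(PAGE_SHIFT_4K < 32, "page shift must be below the address width");

/* A 32-bit DMA address reaches 2^32 bytes = 4 GiB, i.e. 2^(32 - 12) pages. */
#define DMA32_PAGES (1UL << (32 - PAGE_SHIFT_4K))

/* True if IO page frame `pfn` is reachable by a device with a 32-bit DMA mask. */
bool pfn_below_4g(unsigned long pfn)
{
    return pfn < DMA32_PAGES;
}
```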
out of time...
thanks!
grant
--