> On Wed, Mar 31, 2010 at 12:51:25PM -0700, Eric W. Biederman wrote:
>> Neil Horman <nhorman@tuxdriver.com> writes:
>>
>> > On Wed, Mar 31, 2010 at 11:57:46AM -0700, Eric W. Biederman wrote:
>> >> Neil Horman <nhorman@tuxdriver.com> writes:
>> >>
>> >> > On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote:
>> >>
>> >> >> So this call amd_iommu_flush_all_devices() will be able to tell devices
>> >> >> that don't do any more DMAs and hence it is safe to reprogram iommu
>> >> >> mapping entries.
>> >> >>
>> >> > It blocks the cpu until any pending DMA operations are complete. Hmm, as I
>> >> > think about it, there is still a small possibility that a device like a NIC
>> >> > which has several buffers pre-dma-mapped could start a new dma before we
>> >> > completely disabled the iommu, althought thats small. I never saw that in my
>> >> > testing, but hitting that would be fairly difficult I think, since its literally
>> >> > just a few hundred cycles between the flush and the actual hardware disable
>> >> > operation.
>> >> >
>> >> > According to this though:
>> >> >
http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf
>> >> > That window could be closed fairly easily, but simply disabling read and write
>> >> > permissions for each device table entry prior to calling flush. If we do that,
>> >> > then flush the device table, any subsequently started dma operation would just
>> >> > get noted in the error log, which we could ignore, since we're abot to boot to
>> >> > the kdump kernel anyway.
>> >> >
>> >> > Would you like me to respin w/ that modification?
>> >>
>> >> Disabling permissions on all devices sounds good for the new virtualization
>> >> capable iommus. I think older iommus will still be challenged. I think
>> >> on x86 we have simply been able to avoid using those older iommus.
>> >>
>> >> I like the direction you are going but please let's put this in a
>> >> paranoid iommu enable routine.
>> >>
>> > You mean like initialize the device table so that all devices are default
>> > disabled on boot, and then selectively enable them (perhaps during a
>> > device_attach)? I can give that a spin.
>>
>> That sounds good.
>>
>
> So I'm officially rescinding this patch. It apparently just covered up the
> problem, rather than solved it outright. This is going to take some more
> thought on my part. I read the code a bit closer, and the amd iommu on boot up
> currently marks all its entries as valid and having a valid translation (because
> if they're marked as invalid they're passed through untranslated which strikes
> me as dangerous, since it means a dma address treated as a bus address could
> lead to memory corruption. The saving grace is that they are marked as
> non-readable and non-writeable, so any device doing a dma after the reinit
> should get logged (which it does), and then target aborted (which should
> effectively squash the translation)
>
> I'm starting to wonder if:
>
> 1) some dmas are so long lived they start aliasing new dmas that get mapped in
> the kdump kernel leading to various erroneous behavior