Hello Joerg, The requested info is attached. So that would mean a BIOS problem? (those are not on my wishlist :-p) -- Sander -- Best regards, Sander
Yeah, looks like a BIOS problem. But the driver should handle that without crashing the system, so there is a bug in the driver too. The problem is:

AMD-Vi: DEV_ALIAS_RANGE devid: 0a:01.0 flags: 00 devid_to: 0a:00.0
AMD-Vi: DEV_RANGE_END devid: 0a:1f.7

This means that PCI devices from 0a:01.0 to 0a:1f.7 may use their own device id or 0a:00.0. But a device with id 0a:00.0 is not present in the system. From the lspci output it looks like your USB3 controller should alias to 09:00.0. I will prepare a patch for you to fix the crash, but I can't guarantee that your USB3 controller will work afterwards. If you see IO-Page-Faults please report them to me.

Joerg
--
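To make the alias range above concrete, here is a minimal sketch (not the driver's code) of how such an entry expands. It assumes the usual 16-bit packing of a PCI device id (8 bits bus, 5 bits device, 3 bits function). The helper names `devid`, `fmt` and `build_alias_table` are made up for this illustration.

```python
def devid(bdf: str) -> int:
    """Pack a 'bb:dd.f' string (hex fields) into a 16-bit device id."""
    bus, rest = bdf.split(":")
    dev, fn = rest.split(".")
    return (int(bus, 16) << 8) | (int(dev, 16) << 3) | int(fn, 16)


def fmt(value: int) -> str:
    """Format a 16-bit device id back into 'bb:dd.f'."""
    return f"{value >> 8:02x}:{(value >> 3) & 0x1f:02x}.{value & 0x7:x}"


def build_alias_table(start: str, end: str, alias_to: str) -> dict:
    """Map every device id in [start, end] to the alias id."""
    target = devid(alias_to)
    return {d: target for d in range(devid(start), devid(end) + 1)}


# Values taken from the two AMD-Vi log lines quoted above.
table = build_alias_table("0a:01.0", "0a:1f.7", "0a:00.0")

for dev in ("0a:01.0", "0a:01.1", "0a:01.2"):
    print(dev, "may also appear as", fmt(table[devid(dev)]))
```

Every function on bus 0a from device 01 upwards ends up pointing at 0a:00.0, so any code that follows the alias has to cope with the target not being an enumerated PCI device.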
Hello Joerg, Could you also provide a more specific message about what is wrong with the BIOS, one that I could forward to MSI, in the hope it will reach the BIOS engineers someday? :-) -- Sander -- Best regards, Sander mailto:linux@eikelenboom.it --
Let's first prove that my theory is right before contacting MSI directly. Can you try the attached patch? It should fix the boot crash. Once the system has booted successfully, please try some USB device (make sure it uses the separate USB controller; I guess the separate device is responsible for USB 3, so try to plug a device into one of your USB 3 ports). When you have finished that, please tell me whether it worked or not and send me the full dmesg output of the system. Joerg
Hello Joerg,

Errr, which separate USB controller? It actually has:
- 1 PCIe USB 2.0 controller
- 2 PCIe USB 3.0 controllers (one of which includes a SATA controller as well)
(apart from the onboard stuff)

--
Sander
--
Best regards,
Sander mailto:linux@eikelenboom.it
--
Hi Sander,

The devices should be attached to this controller:

0a:01.0 USB Controller [0c03]: NEC Corporation USB [1033:0035] (rev 43) (prog-if 10 [OHCI])
0a:01.1 USB Controller [0c03]: NEC Corporation USB [1033:0035] (rev 43) (prog-if 10 [OHCI])
0a:01.2 USB Controller [0c03]: NEC Corporation USB 2.0 [1033:00e0] (rev 04) (prog-if 20 [EHCI])

The PCI devices associated with that controller alias to 0a:00.0, which does not exist in your system (hence the crash). And the fact that these devices have an alias makes me believe that the BIOS detects them as legacy PCI devices. PCIe devices typically do not have aliases. Can you send the lspci -t output so I can see which upstream bridge these devices are connected to?

Joerg
--
Hmmm the fun part seems to be .. that the usb devices on that usb2 controller seemed to work fine on Xen.
And I have some problems with Xen not being willing to pass through the USB3 controllers (supposedly due to the (extra) bridges);
those are the controllers on 04:00.0 and 08:00.0
-[0000:00]-+-00.0
           +-00.2
           +-02.0-[0000:0d]--+-00.0
           |                 \-00.1
           +-05.0-[0000:0c]----00.0
           +-06.0-[0000:0b]----00.0
           +-0a.0-[0000:09-0a]----00.0-[0000:0a]--+-01.0
           |                                      +-01.1
           |                                      \-01.2
           +-0b.0-[0000:05-08]----00.0-[0000:06-08]--+-01.0-[0000:08]----00.0
           |                                         \-02.0-[0000:07]----00.0
           +-0d.0-[0000:04]----00.0
           +-11.0
           +-12.0
           +-12.2
           +-13.0
           +-13.2
           +-14.0
           +-14.3
           +-14.4-[0000:03]----06.0
           +-14.5
           +-15.0-[0000:02]--
           +-16.0
           +-16.2
           +-18.0
           +-18.1
           +-18.2
           +-18.3
           \-18.4
I had hoped things would become easier/better with my new mobo including an IOMMU :-)
Doesn't seem that way yet. Previously I had 2 USB 2.0 controllers (1x PCI, 1x PCIe) and 1 USB 3.0 (PCIe) passed through (with xen-swiotlb and no hardware IOMMU), and that worked fine grabbing video 24/7 for several weeks.
But let's hope for the best :-)
--
Sander
--
Best regards,
Sander mailto:linux@eikelenboom.it
--
Hmm, that's weird. In this case these devices probably do not alias at all.

Yeah, device 09:00.0 is a PCIe-to-PCI bridge and the additional USB controllers are behind that bridge as legacy PCI devices. That's why the BIOS sets up the alias entry. It should set up 09:00.0 instead of 0a:00.0 to make things work correctly.

Joerg
--
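The upstream-bridge lookup discussed here can be sketched as a walk over the bridge bus ranges. The `BRIDGES` table below was transcribed by hand from the lspci -t tree earlier in the thread. The function names are hypothetical, and the sketch shows only the topology walk; it says nothing about which device id the IOMMU actually sees.

```python
# Bridge -> (secondary bus, subordinate bus), read off the lspci -t tree.
BRIDGES = {
    "00:02.0": (0x0D, 0x0D),
    "00:05.0": (0x0C, 0x0C),
    "00:06.0": (0x0B, 0x0B),
    "00:0a.0": (0x09, 0x0A),
    "09:00.0": (0x0A, 0x0A),
    "00:0b.0": (0x05, 0x08),
    "05:00.0": (0x06, 0x08),
    "06:01.0": (0x08, 0x08),
    "06:02.0": (0x07, 0x07),
    "00:0d.0": (0x04, 0x04),
    "00:14.4": (0x03, 0x03),
    "00:15.0": (0x02, 0x02),
}


def upstream_bridge(bus: int):
    """Return the bridge whose secondary bus is `bus`, or None on the root bus."""
    for bdf, (secondary, _subordinate) in BRIDGES.items():
        if secondary == bus:
            return bdf
    return None


def path_to_root(bdf: str) -> list:
    """List the bridges between a device and the root bus, nearest first."""
    path = []
    bus = int(bdf.split(":")[0], 16)
    while True:
        bridge = upstream_bridge(bus)
        if bridge is None:
            return path
        path.append(bridge)
        bus = int(bridge.split(":")[0], 16)


print(path_to_root("0a:01.0"))  # the NEC USB controller functions
print(path_to_root("08:00.0"))  # one of the USB3 controllers
```

For 0a:01.0 the nearest bridge is 09:00.0, which matches the bridge named in the message above.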
Hi Joerg, OK, it boots fine now, but plugging a USB device into the 2.0 controller (0a:01.*) results in a flood of error messages about the USB controller not functioning. Running the same kernel with amd_iommu=off results in the device at least registering properly as a USB device (although trying to use it then resulted in an entirely new oops, probably in the driver of the video grabber). -- Sander -- Best regards, Sander mailto:linux@eikelenboom.it --
It boots now, dmesg attached. -- Best regards, Sander
Ok,

AMD-Vi: Event logged [IO_PAGE_FAULT device=0a:00.0 domain=0x0000 address=0x0000000000001080 flags=0x0070]

So it indeed uses 0a:00.0 as the device id. That's weird, but it means that the BIOS is actually OK. I need to fix that in the driver.

Thanks,
Joerg
--
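For anyone collecting such faults from dmesg, a small sketch of pulling the fields out of the event line. The regular expression is an assumption based only on the single line quoted above and is not a documented log format; `parse_io_page_fault` is a made-up helper name.

```python
import re

# Pattern derived from the one log line in this thread (an assumption).
EVENT_RE = re.compile(
    r"IO_PAGE_FAULT device=(?P<device>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"domain=0x(?P<domain>[0-9a-f]+) "
    r"address=0x(?P<address>[0-9a-f]+) "
    r"flags=0x(?P<flags>[0-9a-f]+)"
)


def parse_io_page_fault(line: str):
    """Return the fault fields as a dict, or None if the line does not match."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    return {
        "device": match.group("device"),
        "domain": int(match.group("domain"), 16),
        "address": int(match.group("address"), 16),
        "flags": int(match.group("flags"), 16),
    }


line = (
    "AMD-Vi: Event logged [IO_PAGE_FAULT device=0a:00.0 domain=0x0000 "
    "address=0x0000000000001080 flags=0x0070]"
)
fault = parse_io_page_fault(line)
print(fault["device"], hex(fault["address"]))
```

The device field is the id the request arrived with, which is how the line above shows that 0a:00.0 really is used on the bus.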
Ok, here is a quick and dirty patch which should make your system boot again. It introduces other issues which will show up when you try to assign the devices to a virtual machine. But at least the devices should work again on bare metal. Joerg
Hello Joerg,

Had to apply the patch by hand, and found 2 typos:

arch/x86/kernel/amd_iommu.c: In function ‘do_attach’:
arch/x86/kernel/amd_iommu.c:1456: error: implicit declaration of function ‘set_dte_enry’
arch/x86/kernel/amd_iommu.c: In function ‘do_detach’:
arch/x86/kernel/amd_iommu.c:1486: error: implicit declaration of function ‘clear_dte_enry’
make[2]: *** [arch/x86/kernel/amd_iommu.o] Error 1

Should be "entry" of course.

--
Sander
--
Best regards,
Sander mailto:linux@eikelenboom.it
--
