NVIDIA
Trouble assigning vGPU
Hello, I am using a P4 card on an Ubuntu 16.04 host with KVM as the hypervisor, Ubuntu in my VM, and virsh to manage it. I have created two P4-2Q profiles; the identifier for each shows up in /sys/bus/mdev/devices/

I add the vGPU to my VM in the VM's XML file and redefine the domain. When I try to start the VM again, it throws an error (the same happens when running the qemu-system-x86_64 command line directly):
"error: Failed to start domain ubuntu16.04-2

error: internal error: qemu unexpectedly closed the monitor: 2019-02-01T02:40:06.288715Z qemu-system-x86_64: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/8addd195-9715-45be-ad92-c7e3bcbac944,bus=pci.0,addr=0x9: vfio error: 8addd195-9715-45be-ad92-c7e3bcbac944: error getting device from group 12: Input/output error

Verify all devices in group 12 are bound to vfio-<bus> or pci-stub and not already in use"
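
For reference, a minimal sketch of how the vGPU entry for the domain XML could be generated, assuming the usual libvirt `<hostdev>` layout for mediated devices (type `mdev`, model `vfio-pci`). The helper name `mdev_hostdev_xml` is hypothetical and the UUID is the one from the error above; the poster's actual XML was not shown, so it may differ.

```python
import uuid
import xml.etree.ElementTree as ET


def mdev_hostdev_xml(mdev_uuid: str) -> str:
    """Build a libvirt <hostdev> element for a mediated device (hypothetical helper)."""
    # Raises ValueError if the string is not a well-formed UUID.
    uuid.UUID(mdev_uuid)
    hostdev = ET.Element(
        "hostdev", {"mode": "subsystem", "type": "mdev", "model": "vfio-pci"}
    )
    source = ET.SubElement(hostdev, "source")
    # The UUID must match an entry under /sys/bus/mdev/devices/.
    ET.SubElement(source, "address", {"uuid": mdev_uuid})
    return ET.tostring(hostdev, encoding="unicode")


print(mdev_hostdev_xml("8addd195-9715-45be-ad92-c7e3bcbac944"))
```

The printed element would go inside the `<devices>` section of the domain XML (for example via `virsh edit`).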

It is the only device in group 12.
It is bound to vfio (as far as I understand: it has a container in /dev/vfio/ along with group 11).
It is not being used by anything else; it is only assigned to that VM.
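
The checks above can be sketched as a small script that lists every device in an IOMMU group together with its bound driver. It assumes the standard sysfs layout (`/sys/kernel/iommu_groups/<group>/devices/`, with a `driver` symlink on each bound device); the function name `group_members` and the `sysfs_root` parameter are hypothetical, added so the function can be pointed at a test directory.

```python
import os
from pathlib import Path


def group_members(group: int, sysfs_root: str = "/sys") -> dict:
    """Map each device in an IOMMU group to its bound driver name, or None."""
    devices_dir = Path(sysfs_root) / "kernel" / "iommu_groups" / str(group) / "devices"
    members = {}
    for dev in sorted(devices_dir.iterdir()):
        driver_link = dev / "driver"
        if driver_link.is_symlink():
            # The symlink target's last path component is the driver name.
            members[dev.name] = os.path.basename(os.readlink(driver_link))
        else:
            # No driver symlink means nothing is bound to the device.
            members[dev.name] = None
    return members


# Example on a real host (group 12 is the group named in the error):
#   for name, driver in group_members(12).items():
#       print(name, driver)
```

If any device in the group shows an unexpected driver, or none at all, that would be the first thing to look at.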

I am also wondering whether Ubuntu perhaps does not support GRID via KVM (i.e. whether a RHEL distro is required), and whether the error is thrown because the vGPU profile is not readable.

I'm still reading up on vfio and IOMMU, but this is all fairly new to me (and no doubt straightforward to someone with more experience), so any tips or advice would be extremely helpful. Thanks in advance.

EDIT: I am using the 90-day trial, if this matters, and I am not using a listed supported server motherboard (just a regular Gigabyte Z390).

#1
Posted 02/01/2019 04:52 AM   
As you already stated, you are using an unsupported Linux distribution on your host. Only RHEL is currently supported. Please try with RHEL as the hypervisor.

regards
Simon

#2
Posted 02/01/2019 08:04 PM   
Try disabling ECC if you haven't already (nvidia-smi -e 0, I think). Let me know if it works, because I'm running RHEL 7.6 and have similar issues (I can see the mdev device listed, but the VM won't start).
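
Before changing anything, it may help to confirm the current ECC state. A rough sketch, assuming `nvidia-smi` is on the host's PATH and that its `--query-gpu=index,ecc.mode.current --format=csv,noheader` output has one `index, mode` line per GPU; the function names are hypothetical. As far as I know, a change made with `nvidia-smi -e 0` only takes effect after a reboot or GPU reset.

```python
import subprocess


def parse_ecc_modes(csv_text: str) -> list:
    """Parse 'index, mode' lines into (gpu_index, ecc_mode) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, mode = [field.strip() for field in line.split(",", 1)]
        rows.append((int(index), mode))
    return rows


def query_ecc_modes() -> list:
    """Run nvidia-smi on the host and return the current ECC mode per GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,ecc.mode.current", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ecc_modes(result.stdout)


# Example on a host with the NVIDIA driver installed:
#   print(query_ecc_modes())
```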

This is my thread:


https://gridforums.nvidia.com/default/topic/9590/tesla-boards/vgpu-timeout-status-0x65-vfio-error-qemu-kvm-rhel7-6/

#3
Posted 02/14/2019 04:19 PM   
I had to disable ECC for another setup and it worked, but that was with XenServer. If it won't start up properly, it may be an issue with building the VM (or possibly the ECC issue you mentioned; mine did the same thing, as it builds the vGPU on the first bootup).

#4
Posted 02/20/2019 12:53 AM   