Artificial Intelligence Computing Leadership from NVIDIA
vGPU timeout status 0x65, VFIO error, QEMU/KVM RHEL7.6
I have created a vGPU with UUID def87179-9c53-42d7-b224-a5d281037b84. The license server is running, and I've provided GRID-Virtual App and QUADRO-DWS resources to the MAC address of the VM.

I get the following output when I try to start my VM:

[root@instance-1 ~]# dmesg
[nvidia-vgpu-vfio] def87179-9c53-42d7-b224-a5d281037b84: start failed. status: 0x65 Timeout Occured

[root@instance-1 ~]# virsh start win10_1
error: Failed to start domain win10_1
error: internal error: process exited while connecting to monitor: Verify all devices in group 0 are bound to vfio-pci or pci-stub and not already in use
2019-02-13T15:11:50.129364Z qemu-kvm: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/def87179-9c53-42d7-b224-a5d281037b84,display=off,bus=pci.0,addr=0x8: vfio: failed to get device def87179-9c53-42d7-b224-a5d281037b84
2019-02-13T15:11:50.129455Z qemu-kvm: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/def87179-9c53-42d7-b224-a5d281037b84,display=off,bus=pci.0,addr=0x8: Device initialization failed.
2019-02-13T15:11:50.129479Z qemu-kvm: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/def87179-9c53-42d7-b224-a5d281037b84,display=off,bus=pci.0,addr=0x8: Device 'vfio-pci' could not be initialized
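
For anyone collecting these failures across several hosts, here is a minimal sketch that pulls the vGPU UUID and status code out of dmesg text. The function name `find_vgpu_start_failures` and the regular expression are illustrative assumptions based only on the log line quoted above; other driver versions may word the message differently.

```python
import re

# Matches the failure line quoted above, for example:
#   [nvidia-vgpu-vfio] <uuid>: start failed. status: 0x65 Timeout Occured
# A dmesg timestamp such as "[  161.498679]" may precede it, so we use search().
VGPU_FAIL = re.compile(
    r"\[nvidia-vgpu-vfio\]\s+(?P<uuid>[0-9a-fA-F-]{36}):\s+start failed\.\s+"
    r"status:\s+(?P<status>0x[0-9a-fA-F]+)\s*(?P<reason>.*)"
)


def find_vgpu_start_failures(log_text):
    """Return a list of (uuid, status_code, reason) tuples found in log_text."""
    failures = []
    for line in log_text.splitlines():
        match = VGPU_FAIL.search(line)
        if match:
            failures.append((
                match.group("uuid"),
                int(match.group("status"), 16),  # hex string to integer
                match.group("reason").strip(),
            ))
    return failures


# Sample input copied from the dmesg output in this post.
SAMPLE_LOG = (
    "[nvidia-vgpu-vfio] def87179-9c53-42d7-b224-a5d281037b84: "
    "start failed. status: 0x65 Timeout Occured"
)

for uuid, status, reason in find_vgpu_start_failures(SAMPLE_LOG):
    print(f"{uuid}: status {status:#x} ({reason})")
```

On a real host the text would come from the output of `dmesg` rather than from the sample string.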


I've tried GRID P100-1Q, P100-16Q, and P100-1A vGPUs with the same results. Further, while I can see the device's UUID listed under mdev/devices, nvidia-smi reports no active vGPUs:

[root@instance-1 ~]# nvidia-smi vgpu -q
GPU 00000000:00:04.0
Active vGPUs : 0

[root@instance-1 ~]# nvidia-smi vgpu -c
GPU 00000000:00:04.0
GRID P100-1Q


I am running qemu-kvm version 1.5.3 and RHEL 7.6 with kernel 3.10.0-957.el7.x86_64. Here's the relevant portion of my VM's XML file:
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
  <source>
    <address uuid='def87179-9c53-42d7-b224-a5d281037b84'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
</hostdev>
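
As a quick sanity check that the UUID in the XML matches a mediated device that actually exists on the host, something like the following sketch could be used. The helper names `mdev_uuids` and `mdev_exists` are hypothetical; the sysfs directory is the one that appears in the qemu-kvm error message above. This only confirms the device node is present, and would not by itself explain the 0x65 timeout.

```python
import os
import xml.etree.ElementTree as ET


def mdev_uuids(domain_xml):
    """Return the UUIDs of all <hostdev type='mdev'> entries in the XML text."""
    root = ET.fromstring(domain_xml)
    uuids = []
    # iter() also yields the root element, so a bare <hostdev> fragment works.
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "mdev":
            continue
        address = hostdev.find("./source/address")
        if address is not None and address.get("uuid"):
            uuids.append(address.get("uuid"))
    return uuids


def mdev_exists(uuid, sysfs_root="/sys/bus/mdev/devices"):
    """Return True if a mediated device with this UUID is present in sysfs."""
    return os.path.isdir(os.path.join(sysfs_root, uuid))


# Fragment copied from the VM definition in this post.
FRAGMENT = """
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
  <source>
    <address uuid='def87179-9c53-42d7-b224-a5d281037b84'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
</hostdev>
"""

for uuid in mdev_uuids(FRAGMENT):
    state = "present" if mdev_exists(uuid) else "missing"
    print(f"{uuid}: {state} under /sys/bus/mdev/devices")
```

To check a whole domain, the output of `virsh dumpxml win10_1` could be passed to `mdev_uuids` instead of the fragment.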

#1
Posted 02/14/2019 04:12 PM   
Hi,

did you disable ECC memory on the P100?

Regards
Simon

#2
Posted 02/18/2019 07:30 AM   
Yes I forgot to mention! I did disable ECC memory.

#3
Posted 02/18/2019 11:56 AM   
I have a similar problem with a Tesla P40.

I have installed vGPU Manager 7.1 on Debian 9.

When I start a VM in Proxmox, I get this error:

Verify all devices in group 79 are bound to vfio-<bus> or pci-stub and not already in use


dmesg:

[  150.834555] iommu: Adding device 00000000-0000-0000-0000-000000000100 to group 79
[ 150.834557] vfio_mdev 00000000-0000-0000-0000-000000000100: MDEV: group_id = 79
[ 161.498679] [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000100: start failed. status: 0x65 Timeout Occured


NVIDIA, please help us :)

#4
Posted 02/22/2019 10:06 AM   
Debian is not supported at all so your situation is different...

#5
Posted 02/24/2019 10:37 AM   
Are you going to support Linux distributions other than Red Hat soon?

You should support Proxmox too...

So please tell us about the driver on the NVIDIA download page:

NVIDIA vGPU for Linux KVM - which Linux distribution was it created for?

#6
Posted 02/26/2019 12:54 AM   
As I said, we currently only support vGPU for RHEL KVM as you can see here:

https://docs.nvidia.com/grid/latest/product-support-matrix/index.html

#7
Posted 02/27/2019 11:02 AM   