vGPU on CentOS 7.4 VM with RHEV 4.2
Hi,

I'm having trouble getting my Nvidia vGPU to work on a VM.

Setup: RHEV 4.2 on RHEL 7.5, Tesla M60 (switched to graphics mode). I'm using the NVIDIA-GRID-RHEL-7.5-410.92-410.91-412.16.zip package from Nvidia.

On the hypervisor, I've installed the NVIDIA-vGPU-rhel-7.5-410.91.x86_64 rpm. The vfio kernel modules are loaded, nvidia-smi shows the card, and I can see all the available vGPU types via vdsm-client.
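
For reference, this is roughly how I'm checking things on the hypervisor (the vdsm-client verb is my reading of the RHV 4.2 docs, and the sysfs path assumes the card exposes mdev types in the standard location):

# vfio modules loaded?
lsmod | grep vfio

# card visible to the host driver?
nvidia-smi

# mdev (vGPU) types exposed by the card (standard mdev sysfs path)
ls /sys/class/mdev_bus/*/mdev_supported_types

# host devices, including vGPU mdev types, as seen by RHV
vdsm-client Host hostdevListByCaps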

I've created a CentOS 7.4 VM and added a 'B' type vGPU instance in 'custom properties'. I've configured gridd.conf to point to the license server, and /var/log/messages reports that it picks up a license. I installed the driver via the .run file (NVIDIA-Linux-x86_64-410.92-grid.run). The nvidia kernel module is loaded, but the 'qxl' paravirtual driver is also loaded.
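
In case it matters, the gridd.conf on the VM is essentially just the standard keys (the license server hostname below is a placeholder, not my real one):

# /etc/nvidia/gridd.conf (on the VM)
ServerAddress=gridlicense.example.com   # placeholder hostname
ServerPort=7070                         # default license server port
FeatureType=1                           # 1 = GRID vGPU
EnableUI=FALSE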

lspci reports:

00:02.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04)
00:07.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)

The Xorg.0.log reports:

[ 1622.212] (--) PCI:*(0:0:2:0) 1b36:0100:1af4:1100 rev 4, Mem @ 0xf0000000/134217728, 0xfb000000/8388608, 0xfb870000/8192, I/O @ 0x0000c100/32, BIOS @ 0x????????/65536
[ 1622.212] (--) PCI: (0:0:7:0) 10de:13f2:10de:1177 rev 161, Mem @ 0xfa000000/16777216, 0xd0000000/268435456, 0xf8000000/33554432, I/O @ 0x0000c000/128, BIOS @ 0x????????/131072
[ 1622.212] (II) LoadModule: "glx"
[ 1622.212] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[ 1622.213] (II) Module glx: vendor="X.Org Foundation"
[ 1622.213] compiled for 1.19.3, module version = 1.0.0
[ 1622.213] ABI class: X.Org Server Extension, version 10.0
[ 1622.213] (II) LoadModule: "nvidia"
[ 1622.214] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[ 1622.214] (II) Module nvidia: vendor="NVIDIA Corporation"
[ 1622.214] compiled for 4.0.2, module version = 1.0.0
[ 1622.214] Module class: X.Org Video Driver
[ 1622.214] (II) NVIDIA dlloader X Driver 410.92 Thu Dec 20 04:48:17 CST 2018
[ 1622.214] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[ 1622.214] (++) using VT number 1
[ 1622.214] (EE) No devices detected.
[ 1622.214] (EE)
Fatal server error:
[ 1622.214] (EE) no screens found(EE)
[ 1622.214] (EE)

I've tried blacklisting the qxl module on the VM in case it was blocking the nvidia driver, but that made no difference. I now suspect the problem is on the hypervisor side.
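
For completeness, the blacklisting was done along these lines (standard modprobe blacklist, with an initramfs rebuild since qxl can be pulled in early from the initrd):

# /etc/modprobe.d/blacklist-qxl.conf
blacklist qxl

# rebuild the initramfs so the blacklist takes effect at early boot
dracut -f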

I've also tried running nvidia-xconfig to generate an xorg.conf, as well as writing a custom one specifying the BusID of the card, but neither works (same error).
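
The custom xorg.conf was a minimal one along these lines, with the BusID taken from lspci (PCI:bus:device:function in decimal, so 00:07.0 becomes PCI:0:7:0):

# /etc/X11/xorg.conf (minimal, hand-written)
Section "ServerLayout"
    Identifier "layout"
    Screen 0 "nvidia-screen"
EndSection

Section "Device"
    Identifier "nvidia-gpu"
    Driver "nvidia"
    BusID "PCI:0:7:0"    # Tesla M60 vGPU at 00:07.0
EndSection

Section "Screen"
    Identifier "nvidia-screen"
    Device "nvidia-gpu"
EndSection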

Thanks in advance for any help.

Cam

#1
Posted 03/01/2019 04:28 PM   