M60 on ESXi: No Profiles
I've installed the VIB that shows up in the licensing portal, but the vGPU profiles aren't showing up in the VM settings. We're running ESXi 6.5.0 build 7967591 (and View 7.2). The VIB appears to be installed properly; when I run:

esxcli software vib list | grep -i nvidia

I get:

NVIDIA-VMware_ESXi_6.5_Host_Driver 410.68-1OEM.650.0.0.4598673 NVIDIA VMwareAccepted 2018-12-04

However, when I try to run:

nvidia-smi

I get:

Failed to initialize NVML: Unknown Error

I've verified the card is in graphics mode (as opposed to compute).

I've seen a few comments suggesting that I may have the wrong VIB, but it is the only one offered in the licensing portal. Does anybody have any ideas?
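One narrow "wrong VIB" check is whether the host-driver VIB was built for the ESXi release the host is actually running. A minimal POSIX sh sketch; the helper names are hypothetical, and it assumes `vmware -v` prints a line like `VMware ESXi 6.5.0 build-7967591` and that the VIB name follows the `..._ESXi_6.5_Host_Driver` pattern shown above:

```shell
#!/bin/sh
# Hypothetical helpers: compare the ESXi release a VIB targets with the
# release the host runs. Inputs are passed as text so this can be tried off-host.

# "NVIDIA-VMware_ESXi_6.5_Host_Driver" -> "6.5" (empty if the pattern is absent)
vib_esxi_release() {
    printf '%s\n' "$1" | sed -n 's/.*_ESXi_\([0-9]*\.[0-9]*\)_.*/\1/p'
}

# "VMware ESXi 6.5.0 build-7967591" -> "6.5" (assumed `vmware -v` format)
host_esxi_release() {
    printf '%s\n' "$1" | sed -n 's/.*ESXi \([0-9]*\.[0-9]*\)\..*/\1/p'
}

# Succeeds (status 0) only if both releases were found and are equal.
vib_matches_host() {
    _vib=$(vib_esxi_release "$1")
    _host=$(host_esxi_release "$2")
    [ -n "$_vib" ] && [ "$_vib" = "$_host" ]
}
```

On the host it could be fed like this: `vib_matches_host "$(esxcli software vib list | awk '/NVIDIA/ && /Host_Driver/ { print $1; exit }')" "$(vmware -v)" && echo match || echo mismatch`. This only confirms the ESXi release the VIB targets; it cannot tell whether the portal is offering the right vGPU software branch for your license.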

#1
Posted 12/04/2018 11:57 PM   
Hi,

you should run dmesg on your host to find out what the issue is. I suspect a BIOS issue with MMIO. Which server are we talking about? A Dell R740?

Regards

Simon
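Following that suggestion, the kernel log can be narrowed to NVIDIA lines that look like problems. A minimal POSIX sh sketch; `nvidia_problems` is a hypothetical helper name, and the keyword list (alert, fail, error, warn) is an assumption rather than an NVIDIA-documented set of messages:

```shell
#!/bin/sh
# nvidia_problems (hypothetical helper): read kernel log text on stdin and
# print only NVIDIA/NVRM lines containing an alert/fail/error/warn keyword.
# The keyword list is an assumption; widen it if the log uses other wording.
# Exit status is 0 if anything matched, 1 if nothing did.
nvidia_problems() {
    grep -Ei 'nvidia|nvrm' | grep -Ei 'alert|fail|error|warn'
}
```

Usage: `dmesg | nvidia_problems`. On ESXi the same messages also go to `/var/log/vmkernel.log`, so `nvidia_problems < /var/log/vmkernel.log` covers a longer window. Applied to the log the original poster shares later in this thread, it would print only the `ALERT: NVIDIA: module load failed during VIB install/upgrade.` line.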

#2
Posted 12/05/2018 08:02 AM   
Hi Simon, thanks for responding.

To answer your question, the server is a Cisco C240-M4SX.

After I posted this message I ran across:

https://citrixguyblog.com/2017/07/25/nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-driver/#more-1506

As an experiment, I enabled DirectPath, and at that point the profiles started showing up in the VM settings. After disabling DirectPath again and rebooting, I get the following:

dmesg | grep -i nvidia

VMB: 323: name: /NVIDIA_V.v00
2018-12-05T18:28:48.612Z cpu0:65536)VisorFSTar: 1982: NVIDIA_V.v00 for 0x482d082 bytes
2018-12-05T18:29:03.431Z cpu13:66178)Loading module nvidia ...
2018-12-05T18:29:03.450Z cpu13:66178)Elf: 2043: module nvidia has license NVIDIA
2018-12-05T18:29:03.862Z cpu13:66178)NVRM: loading NVIDIA UNIX x86_64 Kernel Module 410.68 Sat Oct 13 22:59:52 CDT 2018
2018-12-05T18:29:03.862Z cpu13:66178)Device: 191: Registered driver 'nvidia' from 20
2018-12-05T18:29:03.863Z cpu13:66178)Mod: 4968: Initialization of nvidia succeeded with module ID 20.
2018-12-05T18:29:03.863Z cpu13:66178)nvidia loaded successfully.
2018-12-05T18:29:46.532Z cpu30:67021)Starting service nvidia-init
2018-12-05T18:29:46.532Z cpu30:67021)Activating Jumpstart plugin nvidia-init.
2018-12-05T18:29:46.555Z cpu3:68278)ALERT: NVIDIA: module load failed during VIB install/upgrade.
2018-12-05T18:29:46.564Z cpu0:68279)NVIDIA: Starting vGPU Services.
2018-12-05T18:29:46.578Z cpu1:68282)NVIDIA: Starting Xorg service.
2018-12-05T18:29:48.051Z cpu20:68427)NVIDIA: Starting the DCGM node engine.
2018-12-05T18:29:54.596Z cpu26:67021)Jumpstart plugin nvidia-init activated.

lspci | grep NVIDIA

0000:8f:00.0 Display controller: NVIDIA Corporation NVIDIATesla M60 [vmgfx0]
0000:90:00.0 Display controller: NVIDIA Corporation NVIDIATesla M60 [vmgfx1]


Xorg tries to start and then stops.
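The `lspci` output above lists two display controllers because each Tesla M60 board carries two GPUs. A minimal POSIX sh sketch that turns this into a quick sanity check; `check_m60_boards` is a hypothetical helper, and it assumes the `Tesla M60` text appears in each controller line as in the output shown:

```shell
#!/bin/sh
# check_m60_boards BOARDS (hypothetical helper): read lspci output on stdin
# and report whether it lists 2 x BOARDS "Tesla M60" controllers, since
# each M60 board exposes two GPUs. Returns 1 on a mismatch.
check_m60_boards() {
    expected=$(( $1 * 2 ))
    found=$(grep -c 'Tesla M60')   # grep -c prints 0 when nothing matches
    if [ "$found" -eq "$expected" ]; then
        echo "ok: $found M60 GPUs visible"
    else
        echo "mismatch: expected $expected M60 GPUs, found $found"
        return 1
    fi
}
```

Usage for a host with one board: `lspci | check_m60_boards 1`. A match only shows that the host enumerates both GPUs; it says nothing about whether NVML or Xorg can use them.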

#3
Posted 12/05/2018 06:57 PM   