NVIDIA
Tesla P40 with Dell PowerEdge R740xd: nvidia-smi reports "Failed to initialize NVML: Unknown Error"
Using the Dell vSphere installer (VMware-VMvisor-Installer-6.5.0.update01-6765664.x86_64-DellEMC_Customized-A02), the NVIDIA GRID VIB (NVIDIA-VMware_ESXi_6.5_Host_Driver_384.73-1OEM.650.0.0.4598673) installed without errors.

However, nvidia-smi returns:

Failed to initialize NVML: Unknown Error

Is there an incompatibility between the two versions that I have installed?

Thanks,

-Ryan
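A quick first check on the version question is to pull the ESXi release out of both file names. This is a minimal Python 3 sketch that assumes the names follow the patterns shown above (`Installer-6.5.0...` and `ESXi_6.5_Host_Driver_...`); the function names are hypothetical. It only confirms that both files target the same ESXi 6.5 line. It does not check NVIDIA's support matrix for specific builds, and it says nothing about whether the Dell-customized image has been tested with the VIB.

```python
import re

# File names copied from the post above.
INSTALLER = "VMware-VMvisor-Installer-6.5.0.update01-6765664.x86_64-DellEMC_Customized-A02"
VIB = "NVIDIA-VMware_ESXi_6.5_Host_Driver_384.73-1OEM.650.0.0.4598673"


def installer_esxi_line(name):
    """Return the ESXi major.minor release from an installer ISO name, e.g. '6.5'.

    Assumes the 'Installer-<major>.<minor>.<patch>' naming pattern.
    """
    m = re.search(r"Installer-(\d+)\.(\d+)\.\d+", name)
    if m is None:
        raise ValueError("unrecognised installer name: " + name)
    return m.group(1) + "." + m.group(2)


def vib_esxi_line(name):
    """Return (ESXi release the VIB targets, NVIDIA driver version).

    Assumes the 'ESXi_<major>.<minor>_Host_Driver_<version>-' naming pattern.
    """
    m = re.search(r"ESXi_(\d+\.\d+)_Host_Driver_([\d.]+)-", name)
    if m is None:
        raise ValueError("unrecognised VIB name: " + name)
    return m.group(1), m.group(2)


if __name__ == "__main__":
    host = installer_esxi_line(INSTALLER)
    target, driver = vib_esxi_line(VIB)
    print("host ESXi {}, VIB built for ESXi {}, driver {}".format(host, target, driver))
    print("same ESXi release line" if host == target else "ESXi release MISMATCH")
```

For the two names in this post, both come out as ESXi 6.5, so the file names alone don't point to a version mismatch.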

#1
Posted 11/04/2017 09:42 PM   
Please try the default bits from VMware. I don't think our VIB has been tested with the Dell installer. In addition, please check dmesg for any other errors that might indicate a problem with the BIOS settings.

Regards

Simon

#2
Posted 11/04/2017 09:59 PM   
Thanks Simon.

The problem with the default VMware installer was that it did not recognize the 10Gb network ports on the server. I'll try to find another workaround for that.

In the meantime, the dmesg output does show some issues:

2017-11-04T20:02:39.735Z cpu30:67099)Starting service nvidia-init
2017-11-04T20:02:39.736Z cpu30:67099)Activating Jumpstart plugin nvidia-init.
2017-11-04T20:02:39.751Z cpu0:68125)ALERT: NVIDIA: module load failed during VIB install/upgrade.
2017-11-04T20:02:39.756Z cpu4:68126)NVIDIA: Starting vGPU Services.
2017-11-04T20:02:39.766Z cpu37:68129)NVIDIA: Starting Xorg service.
2017-11-04T20:02:40.872Z cpu12:68209)ALERT: NVIDIA: Xorg service start failed.
2017-11-04T20:02:40.876Z cpu34:68210)NVIDIA: Starting the DCGM node engine.
2017-11-04T20:02:41.959Z cpu26:67491)Config: 706: "VMOverheadGrowthLimit" = 4294967295, Old Value: -1, (Status: 0x0)
2017-11-04T20:02:42.961Z cpu20:67099)Jumpstart plugin nvidia-init activated.

I will look into these issues.

Thanks,

-Ryan
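To pull just the NVIDIA alerts out of a long log, a small filter can help. This is a sketch that assumes the vmkernel line format shown in the log above (`<timestamp> cpuN:ID)ALERT: ...`); the script name `nvidia_alerts.py` is hypothetical. Save the `dmesg` output (or a copy of /var/log/vmkernel.log) to a file and pass the path as the only argument.

```python
import re
import sys

# Assumed line format, taken from the log in this post:
# 2017-11-04T20:02:39.751Z cpu0:68125)ALERT: NVIDIA: module load failed ...
ALERT_RE = re.compile(r"^(?P<ts>\S+) cpu\d+:\d+\)ALERT: (?P<msg>.*)$")


def nvidia_alerts(lines):
    """Yield (timestamp, message) for each ALERT line that mentions NVIDIA."""
    for line in lines:
        m = ALERT_RE.match(line.rstrip())
        if m and "NVIDIA" in m.group("msg"):
            yield m.group("ts"), m.group("msg")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: nvidia_alerts.py LOGFILE")
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            for ts, msg in nvidia_alerts(f):
                print(ts, msg)
```

On the log above this keeps only the two ALERT lines: the failed module load and the failed Xorg start. The module load failure is the earlier of the two, so it is probably the one to chase first.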

#3
Posted 11/04/2017 10:38 PM   
According to this VMware KB article, https://kb.vmware.com/s/article/2064775, the Module Name should be "nvidia", but mine shows "None", which might explain why Xorg will not start.

I'm also not sure whether it matters that the following command returns the embedded VGA controller as well as the NVIDIA controller:

>esxcli hardware pci list -c 0x0300 -m 0xf
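To check the Module Name for just the NVIDIA card (ignoring the embedded VGA controller), the esxcli output can be saved to a file and filtered. This sketch assumes each device is printed as a block of indented `Key: Value` lines separated by blank lines, with `Address`, `Vendor Name`, `Vendor ID` and `Module Name` fields; those field names are assumptions about the output format, and the script name `pci_check.py` is hypothetical. 0x10de is NVIDIA's PCI vendor ID.

```python
import sys

NVIDIA_VENDOR_ID = "0x10de"


def parse_pci_list(text):
    """Split saved `esxcli hardware pci list` output into one dict per device.

    Assumption: devices are separated by blank lines and each field is an
    indented 'Key: Value' line, e.g. '   Module Name: nvidia'.
    """
    devices, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current device block.
            if current:
                devices.append(current)
                current = {}
            continue
        key, sep, value = line.strip().partition(": ")
        if sep:  # skips the bare address header line, e.g. '0000:3b:00.0'
            current[key] = value
    if current:
        devices.append(current)
    return devices


def nvidia_modules(devices):
    """Return (address, module name) for every NVIDIA device."""
    return [
        (d.get("Address", "?"), d.get("Module Name", "?"))
        for d in devices
        if "NVIDIA" in d.get("Vendor Name", "")
        or d.get("Vendor ID", "").lower() == NVIDIA_VENDOR_ID
    ]


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: pci_check.py SAVED_ESXCLI_OUTPUT")
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            devices = parse_pci_list(f.read())
        for addr, module in nvidia_modules(devices):
            note = "OK" if module == "nvidia" else "nvidia module not bound"
            print("{}  Module Name: {}  ({})".format(addr, module, note))
```

Usage on the host would be something like `esxcli hardware pci list -c 0x0300 -m 0xf > pci.txt` followed by `python3 pci_check.py pci.txt`. Filtering on the vendor keeps the embedded VGA controller out of the result, whether or not its presence turns out to matter.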

#4
Posted 11/05/2017 06:23 PM   
Hi Ryan,

Did you get any further? I have the same issue...

Paul

#5
Posted 11/09/2017 07:32 PM   
Found it. In case anyone else has the same problem, see page 8 of the Dell EMC ESXi 6.5.x release notes:
http://topics-cdn.dell.com/pdf/vmware-esxi-6.5.x_release%20notes_en-us.pdf

Description:
When the system BIOS has "Memory Mapped I/O Base" set to 56 TB and the server has GPU cards such as the Nvidia M60 as PCIe Pass-Through devices, the virtual machines fail to power on.

Applies to:
ESXi 6.5.x and Dell EMC's 14th generation PowerEdge servers

Solution:
To resolve this, set MMIO to 12 TB: in System BIOS Settings > Integrated Devices, set "Memory Mapped I/O Base" to 12 TB.
For more information, refer to VMware Knowledge Base article 2142307.
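For reference, here is what the two BIOS settings correspond to as physical addresses. This sketch assumes the BIOS "TB" means 2^40 bytes. The release note doesn't say why the 56 TB base causes the failure, and this doesn't explain it either. It only shows that a 56 TB base lies above 2^45 (46 address bits), while a 12 TB base fits in 44 bits.

```python
TIB = 2 ** 40  # assumption: the BIOS "TB" unit is 2**40 bytes


def mmio_base(tb):
    """Return (base address in bytes, address bits needed to reach that base)."""
    base = tb * TIB
    return base, base.bit_length()


if __name__ == "__main__":
    for tb in (56, 12):
        base, bits = mmio_base(tb)
        print("{} TB -> base {:#x}, needs {} physical address bits".format(tb, base, bits))
```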

#6
Posted 11/09/2017 09:47 PM   