NVIDIA
Failed to initialize NVML: Unknown Error
Hello,
I'm pretty new at this so please have patience :)

We have a 3-node VMware cluster running VMware 6.0U1a. We have just installed an NVIDIA GRID K1 in one of our hosts.

The host is an IBM 3850 X6, and we have installed the card in slot 4, which is a PCIe x16 slot belonging to CPU3.

I have followed the deployment guide and have changed the BIOS settings accordingly:
* Memory Mapped Config Base memory window - changed from Auto to 2 GB (I think it is supposed to be below 4 GB)
* 64-bit PCI Resource - changed from Enabled to Disabled


I have installed the Virtual GPU manager:
esxcli software vib list | grep -i nvidia
NVIDIA-vgx-VMware_ESXi_6.0_Host_Driver 346.42-1OEM.600.0.0.2159203 NVIDIA VMwareAccepted 2015-12-14


The module is loaded:
esxcfg-module -l | grep nvidia
nvidia 0 8420


When I run the nvidia-smi command:
nvidia-smi
Failed to initialize NVML: Unknown Error


There's no output in the vmkernel.log:
cat /var/log/vmkernel.log | grep NVRM
[root@ESX-F-1:/var/log]


VMware doesn't seem to be aware of the Nvidia card. It only finds the onboard graphics card:
lspci | grep -i display
0000:1b:00.0 Display controller: Matrox Electronics Systems Ltd. G200eR2
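Filtering on the word "display" only catches devices whose class string contains that word. A minimal sketch, assuming the card's vendor string would contain "NVIDIA" if the host enumerated it: the hypothetical helper `find_nvidia_devices` below searches saved `lspci` output by vendor name instead of by device class.

```python
from typing import List


def find_nvidia_devices(lspci_output: str) -> List[str]:
    """Return the lspci lines that mention NVIDIA, whatever the device class."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "nvidia" in line.lower()
    ]


# Sample taken from the output quoted above: only the onboard Matrox adapter.
sample = "0000:1b:00.0 Display controller: Matrox Electronics Systems Ltd. G200eR2"
print(find_nvidia_devices(sample))  # prints []
```

If the list is still empty when the full, unfiltered `lspci` output is passed in, the host is probably not seeing the card on the PCI bus at all, which would point at the slot, power, or BIOS settings rather than at the driver.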


I have struggled with this issue for quite some time now, so I really hope you can help.

/Michael

#1
Posted 12/15/2015 12:19 PM   
Do you have the K1s configured for PCI Passthrough in vSphere?

If you do, you need to undo that.

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#2
Posted 12/15/2015 04:51 PM   
Hi Jason,
No, it's not configured for passthrough in VMware.

#3
Posted 12/16/2015 01:47 PM
"we have installed the card in slot 4 which is a PCIe x16 slot belonging to CPU3"

Is that CPU socket populated?

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#4
Posted 12/16/2015 04:21 PM   
Hi Jason,

Yes, the CPU socket is populated, but thanks.

#5
Posted 12/22/2015 10:49 AM   
I've seen problems myself with mismatches between the ESXi host driver VIB file and the ESXi build version. The U1 build is 3029758, and this is reflected in the filename of the VIB. In the past we have installed older or RC GRID drivers and found the same issues you are having (xorg not starting, no nvidia-smi output, etc.).

If you have ruled out hardware issues and compatibility problems, then checking the driver version might be an option.
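The check suggested above can be sketched roughly as follows. This is only an illustration: it assumes the trailing number in the VIB version string is the ESXi build the driver was packaged against, and the function names `vib_build_number` and `builds_match` are hypothetical. The VIB version is the one listed in the original post, and the host build is the U1 number quoted above; on a real host it would come from the host itself (for example from `vmware -v`).

```python
import re
from typing import Optional


def vib_build_number(vib_version: str) -> Optional[int]:
    """Extract the trailing numeric component of a VIB version string.

    Assumption: that trailing number is the ESXi build the VIB targets.
    """
    match = re.search(r"\.(\d+)$", vib_version.strip())
    return int(match.group(1)) if match else None


def builds_match(vib_version: str, host_build: int) -> bool:
    """Return True when the VIB's trailing build number equals the host build."""
    return vib_build_number(vib_version) == host_build


vib_version = "346.42-1OEM.600.0.0.2159203"  # from `esxcli software vib list` in the first post
host_build = 3029758                         # U1 build number quoted above

print(vib_build_number(vib_version))          # prints 2159203
print(builds_match(vib_version, host_build))  # prints False
```

A mismatch here does not by itself prove the driver is incompatible; it is only a hint that the installed VIB may not be the release intended for this ESXi update and that the driver's release notes are worth checking.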

#6
Posted 12/25/2015 11:57 AM   
I've seen problems myself with mismatches between the ESXi host driver VIB file and the ESXi build version. The U1 build is 3029758, and this is reflected in the filename of the VIB. In the past we have installed older or RC GRID drivers and found the same issues you are having (xorg not starting, no nvidia-smi output, etc.).

If you have ruled out hardware issues and compatibility problems, then checking the driver version might be an option.

#7
Posted 12/25/2015 11:58 AM   