vGPU 11 - Error during VM start - vSphere
Hello,

When starting a VM on vSphere 6.7 with the latest NVIDIA vGPU 11.0, the task fails: Could not initialize plugin '/usr/lib64/vmware/plugin/libnvidia-vgx.so' for vGPU 'grid_t4-8q'.
The host is freshly installed and configured the same way as the other hosts in the cluster (where everything is working).
Same issue with vGPU 10.3. Version 10.2 works.

#1
Posted 07/18/2020 12:13 PM   
The three things I'd look for are:
  1. Have I configured the resource for vGPU (as opposed to passthrough)?
  2. Did I disable ECC on the GPU (I believe this is required for Pascal-based cards)?
  3. Do the GPUs have other VMs with different profile types running on them?



Are there other vm's on the host with a different
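
For the ECC question in item 2, here is a minimal Python sketch, not a definitive tool. It assumes you have captured the output of `nvidia-smi --query-gpu=index,ecc.mode.current --format=csv,noheader` on the host, and that each line looks like `index, state`. The helper name `gpus_with_ecc_enabled` and the sample text are illustrative and do not come from this thread.

```python
def gpus_with_ecc_enabled(csv_text):
    """Return the indices of GPUs whose current ECC mode is reported as Enabled.

    Assumes one "index, state" line per GPU, as printed by
    nvidia-smi --query-gpu=index,ecc.mode.current --format=csv,noheader
    """
    enabled = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        index, _, state = line.partition(",")
        if state.strip().lower() == "enabled":
            enabled.append(int(index))
    return enabled


# Illustrative sample only, not output from the affected host.
sample = "0, Enabled\n1, Disabled\n"
print(gpus_with_ecc_enabled(sample))
```

If a GPU shows up in that list, ECC can be switched off with `nvidia-smi -e 0` (add `-i <index>` to target a single GPU); the change takes effect after the next reboot.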

#2
Posted 07/20/2020 01:27 PM   
Hi Korbinian,

We ran into the same issue when updating our four ESXi hosts from 10.2 to 11.0, but only one host was affected. We resolved it by uninstalling and reinstalling the GRID manager on that machine. Please also note that ECC has to be disabled. Before 11.0, it was a known issue from NVIDIA that some machines could not start (this was the reason we upgraded). Hope this helps!

regards,
sandro
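
The uninstall/reinstall and ECC steps described above could be sketched roughly as follows. This is a hedged outline, assuming "grid manager" refers to the NVIDIA vGPU manager VIB on the ESXi host. The function name `reinstall_commands` and both placeholder arguments are hypothetical; the real VIB name can be found with `esxcli software vib list | grep -i nvidia`.

```python
import shlex


def reinstall_commands(vib_name, vib_path):
    """Build the ESXi shell commands for removing and reinstalling the vGPU manager VIB.

    vib_name and vib_path are placeholders supplied by the caller.
    The host usually needs a reboot after the remove and after the
    install; the ECC change also only applies after a reboot.
    """
    return [
        # Put the host into maintenance mode before touching the driver.
        ["esxcli", "system", "maintenanceMode", "set", "--enable", "true"],
        # Remove the currently installed vGPU manager VIB.
        ["esxcli", "software", "vib", "remove", "-n", vib_name],
        # Install the VIB again (absolute path to the .vib file).
        ["esxcli", "software", "vib", "install", "-v", vib_path],
        # Disable ECC on all GPUs.
        ["nvidia-smi", "-e", "0"],
    ]


# Placeholder values for illustration only.
for cmd in reinstall_commands("<nvidia-vib-name>", "/vmfs/volumes/<datastore>/<vgpu-manager>.vib"):
    print(shlex.join(cmd))
```

The sketch only prints the commands so they can be reviewed and run by hand in the ESXi shell, with reboots in between as needed.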

#3
Posted 07/30/2020 11:31 AM   