Can't power on another vGPU enabled VM
Hello,

I sometimes get the following error if I power on another VM (10 VMs per host are running):

Could not initialize plugin /usr/lib64/vmware/plugin/libnvidia-vpx.so for vGPU " passthrough device 'pciPassthru0' vGPU 'grid_m60-2q' disallowed by vmkernel: Out of memory"

The hosts have enough memory and vGPU resources left to power on the VM.
Support says it's a known issue:

http://docs.nvidia.com/grid/5.0/grid-vgpu-release-notes-vmware-vsphere/index.html#bug-200060499-vGPU-enabled-VMs-fail-too-much-memory

But why can I sometimes power on another VM, e.g. the eleventh, and sometimes not? I think it's a different issue.
Does anyone have the same problem?

Best Regards

#1
Posted 02/21/2018 03:07 PM   
Hi,

how much system memory do the affected VMs have?

Regards

Simon

#2
Posted 02/22/2018 06:41 AM   
I have the same problem. In my environment, with the profile I'm using, I should be able to have 96 VMs running. Sometimes with only 85 VMs provisioned I still get the error you mention. After repeatedly hitting Power On it will eventually power on. Sometimes I even have to delete another VM before I can power on my parent image.

Extremely frustrating.

#3
Posted 02/22/2018 04:15 PM   
5 VMs with 128 GB system memory
5 VMs with 16 GB system memory

In total 720 GB.

The hosts have 960 GB system memory.
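
The memory budget described above can be sketched as follows. Note that vGPU-enabled VMs on ESXi require a full memory reservation; the per-VM overhead and VMkernel reserve figures below are illustrative assumptions, not measured or documented values:

```python
# Sketch of the host memory budget from this thread.
# ASSUMPTIONS: the per-VM overhead and VMkernel reserve are
# illustrative placeholders -- there is no documented fixed rule.

HOST_MEMORY_GB = 960
vm_reservations_gb = [128] * 5 + [16] * 5  # fully reserved for vGPU VMs

PER_VM_OVERHEAD_GB = 2    # assumed virtualization overhead per VM
VMKERNEL_RESERVE_GB = 32  # assumed memory kept back for the hypervisor

reserved = sum(vm_reservations_gb)
overhead = PER_VM_OVERHEAD_GB * len(vm_reservations_gb)
headroom = HOST_MEMORY_GB - reserved - overhead - VMKERNEL_RESERVE_GB

print(f"Reserved by VMs:  {reserved} GB")   # 720 GB, matching the post
print(f"Nominal headroom: {headroom} GB")
```

Even with apparently ample nominal headroom in a sketch like this, the VMkernel can still refuse the power-on if its own unreserved pool is smaller than expected, which might explain why the eleventh VM sometimes starts and sometimes doesn't.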

#4
Posted 02/26/2018 09:27 AM   
So did you file a ticket with Nvidia and VMWare?

#5
Posted 02/26/2018 01:21 PM   
Hello,

yes, after the issue came back again, I opened cases with Nvidia and VMware.

Nvidia Case: 00007591
The Nvidia support engineer didn't find any errors on Nvidia's side.

VMware says it's a known Nvidia issue.

https://docs.nvidia.com/grid/latest/grid-vgpu-release-notes-vmware-vsphere/index.html

Nvidia REF: 200060499

Best Regards
Georg

#6
Posted 05/03/2018 06:59 AM   
Hi Georg,

yes, it seems you hit the given issue, but I disagree that this is an NV issue. From my understanding, you need to fully reserve memory for vGPU-enabled VMs on ESX. This works until there is no longer enough system memory available for the hypervisor (VMkernel). There seems to be no rule for how much memory must remain available for the hypervisor, so only trial and error, reducing the system memory allocated to the VMs, seems to help. I'll try to get some advice on what we can do here, or whether this is something that needs to be addressed by VMware (which I believe), as I've never heard of the same issue occurring on other hypervisors.

regards

Simon
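
The trial-and-error situation Simon describes can be illustrated with a small admission check. The VMkernel reserve parameter below is a placeholder, since, as he notes, there is no documented rule for how much memory the hypervisor actually needs:

```python
# Illustrative admission check for powering on one more vGPU VM.
# ASSUMPTION: 'vmkernel_reserve_gb' is unknown in practice; the
# values used here are placeholders, which is exactly why trial
# and error with VM memory sizes ends up being necessary.

def can_power_on(host_gb, reserved_gb, new_vm_gb, vmkernel_reserve_gb):
    """Return True if a new fully-reserved VM should fit on the host."""
    return reserved_gb + new_vm_gb + vmkernel_reserve_gb <= host_gb

# 720 GB already reserved on a 960 GB host (numbers from this thread):
print(can_power_on(960, 720, 128, 32))   # fits if the kernel only needs 32 GB
print(can_power_on(960, 720, 128, 128))  # fails if it actually needs more
```

The same VM either fits or doesn't depending purely on the unknown reserve, which matches the intermittent "Out of memory" behaviour reported above.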

#7
Posted 05/03/2018 09:57 AM   
Hello Simon,

any news from VMware?

Best Regards
Georg

#8
Posted 05/14/2018 01:31 PM   