ESXi 6.7 U2, Tesla M10 - Cannot use all available GPU resources
Hi Guys.

I have 3 Dell PowerEdge 740 servers with 2x Tesla M10 in each server. All hosts are running ESXi 6.7 U2 with the "NVIDIA-VMware_ESXi_6.7_Host_Driver" VIB, version 430.27-1OEM.670.0.0.8169922.

Graphics type for all hosts is set to "Shared Direct".

In general, the GPUs and vGPUs are recognized by the guest operating systems (Windows Server 2016/2019) and can be used.

But there appear to be two problems that have the same effect.

With 3 hosts and 2x M10 in each host, I should be able to deploy 12 VMs with the "M10-8A" profile, right? Or 24 VMs with the "M10-4A" profile, for example.
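For reference, the ceiling for each profile can be worked out from the hardware. This is a minimal sketch that assumes a single profile is used across the whole cluster and ignores other limits such as host RAM or licensing. The only hardware facts it relies on are that each Tesla M10 board carries four physical GPUs with 8 GB of frame buffer each (which matches the 8x 8 GB per host reported in this post) and that an M10-8A, M10-4A or M10-2A vGPU takes 8, 4 or 2 GB respectively.

```python
# Tesla M10: 4 physical GPUs per board, 8 GB frame buffer per GPU.
GPUS_PER_M10_BOARD = 4
FB_PER_GPU_GB = 8

# Cluster layout from the post: 3 hosts, 2x M10 per host.
HOSTS = 3
BOARDS_PER_HOST = 2


def max_vms(profile_fb_gb: int) -> int:
    """Upper bound on VMs when every VM uses the same vGPU profile.

    Each physical GPU holds FB_PER_GPU_GB // profile_fb_gb vGPUs of one type.
    """
    physical_gpus = HOSTS * BOARDS_PER_HOST * GPUS_PER_M10_BOARD
    per_gpu = FB_PER_GPU_GB // profile_fb_gb
    return physical_gpus * per_gpu


for name, fb in [("M10-8A", 8), ("M10-4A", 4), ("M10-2A", 2)]:
    print(f"{name}: up to {max_vms(fb)} VMs")
```

On these figures each host has eight physical GPUs, so the cluster tops out at 24 M10-8A VMs or 48 M10-4A VMs, twice the numbers estimated above.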

But at the moment I only have the following vGPU assignments in the cluster:
3x M10-8A
1x M10-4A
3x M10-2A
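Adding up the frame buffer behind the assignments listed above shows how much of the cluster is actually in use. A small sketch; the 192 GB total assumes three hosts with eight 8 GB GPUs each, as reported in this post.

```python
# Current vGPU assignments from the post: (profile, frame buffer in GB, count)
assignments = [("M10-8A", 8, 3), ("M10-4A", 4, 1), ("M10-2A", 2, 3)]

used_gb = sum(fb * count for _, fb, count in assignments)

# Assumed total: 3 hosts x 8 physical GPUs x 8 GB each
total_gb = 3 * 8 * 8

print(f"Frame buffer in use: {used_gb} GB of {total_gb} GB")
```

Under a fifth of the frame buffer is allocated, so aggregate memory is unlikely to be what blocks the next M10-4A VM. The limit more plausibly comes from how vGPUs are placed on individual physical GPUs, or from a host-level issue such as GPUs reporting 0 bytes of memory.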

And I cannot start another VM with an "M10-4A" vGPU profile.

On one host, when I look at the GPUs and the VMs running on them, all GPUs except one show 0 bytes of memory, and all VMs on that host are running on the same GPU.

The other 2 hosts show all memory as available (8x 8 GB), but only 2 VMs are running on these hosts, each VM with its own vGPU.

I'm completely confused why I can't start more virtual machines with vGPUs in this cluster.

Maybe someone has an idea.

Thank you all.

#1
Posted 04/05/2020 09:30 AM   
Hi

If you're using the "A" profiles with M10s, the only profile you should be running is 8A. The "A" profiles are intended for virtual application (RDSH) workloads, and for that use case it makes no sense to run a smaller profile.

What has probably happened in your environment is that your vCenter vGPU setting is set to "Performance" instead of "Density", so the VMs have been spread across your M10s and you can't start another VM. Either switch all your vGPU profiles to 8A or change the vCenter setting to "Density" and see if that helps.

Regards

MG
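The explanation in the reply above can be illustrated with a toy model. This sketch assumes two rules: each physical GPU can host vGPUs of only one profile at a time (the restriction NVIDIA vGPU software applied to the M10 in this driver generation), and the two vSphere policies behave roughly as least-loaded-first ("Performance", shown as "Spread VMs") versus fill-first ("Density", shown as "Group VMs"). The two-GPU host, the request order and the helper names are hypothetical and are not meant to reproduce the exact state of the cluster in this thread.

```python
from typing import List, Optional

GPU_FB_GB = 8  # frame buffer of one Tesla M10 physical GPU


class PhysicalGpu:
    def __init__(self) -> None:
        self.profile_fb: Optional[int] = None  # vGPU size this GPU is locked to
        self.used_gb = 0

    def can_host(self, fb: int) -> bool:
        # Assumed rule: all vGPUs on one physical GPU share the same profile.
        same_type = self.profile_fb is None or self.profile_fb == fb
        return same_type and self.used_gb + fb <= GPU_FB_GB

    def add(self, fb: int) -> None:
        self.profile_fb = fb
        self.used_gb += fb


def place(gpus: List[PhysicalGpu], fb: int, policy: str) -> bool:
    candidates = [g for g in gpus if g.can_host(fb)]
    if not candidates:
        return False  # no compatible GPU: the VM would fail to power on
    if policy == "spread":
        # Approximation of "Spread VMs": least-loaded compatible GPU first
        target = min(candidates, key=lambda g: g.used_gb)
    else:
        # Approximation of "Group VMs": fill the most-loaded compatible GPU
        target = max(candidates, key=lambda g: g.used_gb)
    target.add(fb)
    return True


def run(policy: str) -> List[bool]:
    gpus = [PhysicalGpu() for _ in range(2)]  # hypothetical 2-GPU host
    requests = [2, 2, 4]  # two M10-2A VMs, then one M10-4A VM
    return [place(gpus, fb, policy) for fb in requests]


print("spread:", run("spread"))  # spread: [True, True, False]
print("group: ", run("group"))   # group:  [True, True, True]
```

Under spreading, the two M10-2A VMs each claim a separate GPU and lock it to the 2A profile, so the M10-4A request finds no compatible GPU. Under grouping, the second GPU stays empty and accepts it. Running a single profile everywhere, such as 8A as suggested, avoids this kind of fragmentation altogether.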

#2
Posted 04/05/2020 11:01 AM   
Hi MrGRID

Thank you very much for the prompt reply.

I changed the "Shared passthrough GPU assignment policy" for each host from "Spread VMs" (performance) to "Group VMs" (consolidation), and so far it looks good. I will test it further.

And I will also think about changing all profiles to 8A.

Thanks again. :-)

#3
Posted 04/05/2020 11:52 AM   