NVIDIA
Horizon View Tesla M10 - Poor Performance
Hello,

I should start by mentioning that I have cases open with VMware (the Horizon View, ESXi, and vCenter teams) and I have just opened a case with NVIDIA enterprise support. I'm exhausting all options.

I have a vSphere 6.7 environment using Tesla M10 GPUs in Shared Direct mode. I have the latest 410 driver release. I am using the 4Q profile for all desktops, and each desktop is configured with 16GB of memory (all reserved), 8 vCPUs (as recommended by VMware for AutoCAD) and paravirtual disks for better disk performance.

I've had lots of different issues, some related some not, but I will stick with the facts.

- Users are on a dual screen configuration (1920x1080)
- Users whose laptops have a GPU that supports it use H.264 hardware decoding; the others do not
- Users are hardwired
- Users connect to the View environment over an IPsec VPN tunnel (traffic is UDP) using VMware Blast. They connect to the Unified Access Gateway, which connects to an nginx load balancer, which connects to the View Connection Servers (there are two)

Users are reporting poor graphical performance and even poor general performance: lag and delays in typing, and poor performance in Revit/AutoCAD. It seems to have gotten worse since the initial deployment. I have been trying to reproduce some of the issues from my office, but they seem fairly inconsistent. Users report the poor performance even in the evening, when 10 people or fewer are on the system.

I've been reviewing esxtop to see if there is CPU contention, and I am monitoring both sides of the VPN tunnel to see the traffic flow (we aren't using any QoS, but the local connection to the internet is only used for Horizon). I have been using the various nvidia-smi commands, but GPU utilization always seems relatively low (50% or less). I've been monitoring the performance from within the virtual desktop as well. One curious thing I've noted is that the VM seems to think it has 2GB of dedicated VRAM when the GPU profile should give it 4GB.

I also have weird issues where I cannot power on the base image because it says there are insufficient graphics resources available to the parent pool; however, when I try to increase the size of the pool, I can provision a new VM with no problem... and none of the KB articles related to this error seem to help me.

I honestly did not expect so many issues with this technology as I thought it was fairly mature by now, but clearly I must be missing something here?

#1
Posted 02/07/2019 04:49 PM   
Hi,

first of all I would recommend digging deeper with the right tools, like GPUProfiler, to find a possible bottleneck. I would also check with nvidia-smi encodersessions to see how NVENC is working. You need to find out whether the GPU renders enough frames, and afterwards you need to check what the remoting stack delivers to the endpoint. You didn't even mention the protocol you are using. What is the poor performance in Revit/AutoCAD?
You are mentioning a lot of different things and I'm sure there are different root causes.
Important questions are:
-Which remoting protocol? Blast Extreme 420 or 444? PCoIP?
-Which endpoint? Endpoint capable to decode H.264 in hardware?
-Policy set? 30FPS or 60FPS?
-Describe poor performance
-I don't understand how you think you have 2GB vRAM with a 4Q profile. For sure you have 4GB with a 4Q profile!
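
The check on whether the encoder keeps up with the frame rate policy could be scripted roughly as below. This is only a sketch: it assumes the installed driver exposes the `encoder.stats.*` fields through `nvidia-smi --query-gpu`, the helper names (`parse_encoder_stats`, `below_target`, `read_encoder_stats`) are made up for illustration, and whether these counters are populated for a given vGPU setup needs to be confirmed on the actual system.

```python
import subprocess

# Query fields assumed to be available in this driver's nvidia-smi.
QUERY = ("encoder.stats.sessionCount,"
         "encoder.stats.averageFps,"
         "encoder.stats.averageLatency")


def parse_encoder_stats(csv_text):
    """Parse 'sessions, fps, latency' CSV lines into one dict per GPU.

    Lines that are not three integers (for example "[N/A]") are skipped.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue
        try:
            sessions, fps, latency = (int(p) for p in parts)
        except ValueError:
            continue
        rows.append({"sessions": sessions, "avg_fps": fps, "avg_latency": latency})
    return rows


def below_target(rows, target_fps):
    """Return GPUs that have active encoder sessions but average FPS under target."""
    return [r for r in rows if r["sessions"] > 0 and r["avg_fps"] < target_fps]


def read_encoder_stats():
    """Run nvidia-smi once and return the parsed encoder statistics."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_encoder_stats(out)

# Example use on a machine with the NVIDIA driver installed:
#   slow = below_target(read_encoder_stats(), target_fps=30)
```

If `below_target` keeps returning entries while a user reports lag, the encode side is worth a closer look; if it stays empty, the network path or the endpoint decoder is the more likely suspect.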

regards Simon

#2
Posted 02/09/2019 08:38 PM   
OP,

We have noticed the same thing with the TESLA M10 cards, and they have not performed well at all. We have a sizable VDI environment and it is growing, but MATLAB, CAD, Abaqus, and simulation drawings are suffering heavily. We use the M10_2q profile where we are, and there is no difference between the 4Q and the 2Q, at least none that we have seen with our CAD users.

And to answer Simon's post:

-Which remoting protocol? Blast Extreme 420 or 444? PCoIP?

Blast Extreme 444

-Which endpoint? Endpoint capable to decode H.264 in hardware?

Yes. We use the 10Zig 5948tq with the latest firmware update.

-Policy set? 30FPS or 60FPS?

60 FPS. And yes, our license server is responding, we have not over-provisioned on our Linux and Windows machines.

-Describe poor performance

Exactly as the OP said. We're going to be purchasing the TESLA V100 instead. The TESLA M10 cards are pretty bad and do not virtualize memory properly like the Turing and PASCAL architectures.

-I don't understand how you think you have 2GB vRAM with a 4Q profile. For sure you have 4GB with a 4Q profile!

When you run NVIDIA-SMI, yes you see the 4Q profile applied, but it does not scale above 2 GB. We tested this by running a CAD program, then ran watch -n 0.5 nvidia-smi. Sure enough, it did not scale above 2 GB with a 4Q profile.
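
A scripted version of that watch test might look like the sketch below. It polls the standard `memory.used` and `memory.total` query fields and records the peak, so the result does not depend on someone watching the screen at the right moment. The function names are illustrative, and it assumes it runs inside the guest, where only the one vGPU is visible.

```python
import subprocess
import time


def parse_memory(csv_text):
    """Parse 'used, total' lines (MiB) into a list of (used, total) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            rows.append((int(parts[0]), int(parts[1])))
    return rows


def sample_memory():
    """Take one framebuffer reading per visible GPU via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory(out)


def watch_peak(duration_s, interval_s=0.5, reader=sample_memory):
    """Poll for duration_s seconds and return the highest memory.used seen (MiB)."""
    peak = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for used, _total in reader():
            peak = max(peak, used)
        time.sleep(interval_s)
    return peak

# Example: load a large model in the CAD application, then run
#   print(watch_peak(300))
# and compare the result with the 4096 MiB the 4Q profile should provide.
```

A peak that stays under 2048 MiB only shows that the workload never asked for more during the run; it does not by itself prove the profile is capped.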

My recommendation is that if you have a TESLA M10, dump it and go with a PASCAL architecture card or newer if you are setting up a large-scale VDI environment like ours.

#3
Posted 04/18/2019 05:41 PM   
Hi harryn240,

I cannot follow your argument. For sure the M10 is not made for Matlab, Abaqus or similar 3D apps, but this should be really clear. The M10 is made for office VDI. And for sure the M10 can scale up to 4GB of FB usage, but when is this really needed? There is a difference between Maxwell and Pascal/Volta/Turing in terms of graphics and compute for vGPU, but this is not relevant to the issue described here.
Conclusion: You should know your use case and choose the right board.
Indeed, you should consider the V100 if you run Abaqus, Matlab and these types of apps.


regards
Simon

#4
Posted 04/18/2019 05:49 PM   