NVIDIA
GRID vGPU Scalability Guide
Sharing the [url=http://www.nvidia.com/content/grid/resources/AutoCAD_GRID_vGPU_Scalability_Solutions_Guide.pdf]Autodesk AutoCAD GRID vGPU scalability guide[/url].

For additional resources visit our [url=http://www.nvidia.com/object/grid-resources.html]GRID resources webpage[/url].

Curious to hear from forum users: do you have any direct experience with vGPU scalability?

#1
Posted 05/28/2014 08:48 PM   
Victoria,
Thank you for the very interesting posting. I'd like to make a number of observations and comments.

There seems to be a leaning towards not necessarily mapping VMs to vGPUs in situations where a lot of CPU plus graphics power needs to be leveraged, so it would be very interesting to see how a GRID K1 or K2 with GPU passthrough stacks up against the vGPU mapping model. I suspect a GRID K2, divided between two hefty XenApp 7.5 servers, each with eight or more VCPUs and 64 GB or more of memory, could do as well as or better than a one-to-one mapping of XenDesktop VMs to vGPUs. The end user's client/Receiver is also potentially going to make a big difference. Above all, one apparent factor (in particular when mapping to VMs involving thin or zero clients) is that in many cases they have to depend on the video driver used within the XenDesktop VM itself; for Windows 7, this would be Cirrus, and for Windows 8.X clients, the VGA driver. The test apparently did not specify which VM client OS was used, but note that the Cirrus driver uses a fixed 4 MB buffer (standard under Windows 7), while the VGA driver uses a default of 8 MB (standard under Windows 8.X, as well as 2012/2012 R2). Therefore, the results when using a vGPU may well vary, depending on whether the XenDesktop VMs were running Windows 7 or a newer Windows release. Note that on any VM running the VGA driver, the buffer can be increased up to 16 MB, which can make a large difference, in particular for 32-bit graphics.
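
To put rough numbers on the buffer sizes mentioned above, here is a minimal sketch of the arithmetic. It assumes a single uncompressed frame at 32 bits per pixel and ignores anything else a driver might keep in the buffer; the resolutions and function names are illustrative only and do not come from the study.

```python
# Rough framebuffer arithmetic: bytes needed for one uncompressed frame.
# Assumption: one frame at 32 bits per pixel, no double buffering or
# driver overhead. Resolutions below are illustrative, not from the study.
MIB = 1024 * 1024

def frame_bytes(width, height, bits_per_pixel=32):
    """Bytes required to hold a single frame at the given color depth."""
    return width * height * bits_per_pixel // 8

def fits(width, height, buffer_mib, bits_per_pixel=32):
    """True if one frame fits in a buffer of buffer_mib mebibytes."""
    return frame_bytes(width, height, bits_per_pixel) <= buffer_mib * MIB

if __name__ == "__main__":
    for w, h in [(1024, 768), (1280, 1024), (1920, 1080), (2560, 1600)]:
        need = frame_bytes(w, h) / MIB
        verdict = ", ".join(
            f"{size} MB: {'ok' if fits(w, h, size) else 'too small'}"
            for size in (4, 8, 16)
        )
        print(f"{w}x{h} @ 32 bpp needs {need:.2f} MiB ({verdict})")
```

Under these assumptions, a 1280x1024 frame at 32-bit color already exceeds a 4 MB buffer, which is one way to see why the Cirrus vs. VGA distinction could matter.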

Perhaps more importantly, I am pretty convinced that a setup with GPU passthrough will bypass this Cirrus vs. VGA driver issue altogether, as the rendering work is done on the XenApp server side and the video is just pushed through. It would also be expected that, even with that many concurrent users, the GPU engine could potentially be better leveraged than with the vGPU distribution. It would also mean that the upper mapping limit for vGPUs, dictated by the number of slots per engine, would not apply. Since in some cases the drop in performance was not all that great even with as many VMs as could be mapped to a GPU engine, one could envision perhaps even 40 users taking advantage of all four engines on a K1 with a pure GPU passthrough approach (running four XenApp instances). It should also be noted that a typical XenDesktop VM doing a lot of graphics work is easily going to have four VCPUs assigned (as indeed specified in this study), so in a server with, say, only 32 VCPUs, 8 users are going to make use of 32 VCPUs, as many as the server has, and beyond that point CPU over-utilization will kick in. If you look at the graph presented in Example 4 in the article, note that there is a distinct initial drop in performance between one and eight users for the K100 case (-31%), and beyond that, adding more users has a proportionately much smaller effect. I would venture to say that a XenServer host with more VCPUs probably could have managed somewhat better, plus it may also depend on how the XenServer was configured (the number of dom0 VCPUs and the memory assigned to dom0). With a XenApp server dedicated to just processing the AutoCAD instances, you would think such a server should be able to scale better.
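
The VCPU arithmetic above can be sketched as follows. This is only a back-of-the-envelope illustration using the numbers from this post (four VCPUs per VM, a host with 32 VCPUs); it assumes dom0's own VCPUs are ignored, and the function names are made up for the example.

```python
# Back-of-the-envelope VCPU overcommit check.
# Assumptions: every VM gets the same VCPU count, and dom0's own VCPUs
# are ignored. Numbers mirror the post: 4 VCPUs per VM, 32 host VCPUs.

def vcpu_overcommit(users, vcpus_per_vm, host_vcpus):
    """Ratio of VCPUs handed out to VMs versus VCPUs the host has."""
    return users * vcpus_per_vm / host_vcpus

def max_users_without_overcommit(vcpus_per_vm, host_vcpus):
    """Largest number of VMs before the ratio exceeds 1.0."""
    return host_vcpus // vcpus_per_vm

if __name__ == "__main__":
    host, per_vm = 32, 4
    print("Users before overcommit:", max_users_without_overcommit(per_vm, host))
    for users in (1, 8, 16, 32):
        ratio = vcpu_overcommit(users, per_vm, host)
        print(f"{users:2d} users -> {ratio:.2f}x of host VCPUs")
```

With these inputs the ratio reaches 1.0 at eight users, so every additional XenDesktop VM beyond that point has to share CPU time.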

The other interesting case would be that of the "Goldilocks" K120Q configuration, which should come close to the K140Q in performance specs while allowing for as many vGPU instances as the K100. It is too bad that specific configuration was not tested, as it may yield the greatest density with reasonable performance. It was, in fact, rather surprising how well the K140Q setup performed vs. the K240Q, especially taking into account the cost difference between a GRID K1 and a K2. With GPU passthrough, however, the higher CUDA rating of the K2 would likely make a much greater difference than in the pure vGPU comparisons.
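
For a rough feel of the density trade-off, here is a small sketch. The per-GPU vGPU limits and GPU counts in the table are assumptions based on my recollection of NVIDIA's published GRID K1/K2 profile figures, not numbers taken from the study, so please check them against the current vGPU documentation. The 10 users per XenApp instance simply reproduces the 40-user guess made earlier in this post.

```python
# Density sketch for vGPU profiles vs. GPU passthrough.
# ASSUMED figures (not from the study): physical GPUs per board and the
# maximum number of vGPUs per physical GPU for each profile.
GPUS_PER_BOARD = {"K1": 4, "K2": 2}
PROFILES = {
    # profile: (board, assumed max vGPUs per physical GPU)
    "K100": ("K1", 8),
    "K120Q": ("K1", 8),
    "K140Q": ("K1", 4),
    "K240Q": ("K2", 4),
}

def max_vms_per_board(profile):
    """Upper bound on vGPU-backed VMs for one board with this profile."""
    board, per_gpu = PROFILES[profile]
    return GPUS_PER_BOARD[board] * per_gpu

def passthrough_users(board, users_per_instance):
    """Users served with one XenApp instance per physical GPU."""
    return GPUS_PER_BOARD[board] * users_per_instance

if __name__ == "__main__":
    for name in PROFILES:
        print(f"{name}: up to {max_vms_per_board(name)} VMs per board")
    print("K1 passthrough, 10 users per XenApp instance:",
          passthrough_users("K1", 10))
```

If the assumed limits hold, the K120Q matches the K100 in VM count per board, which is the density argument made above.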

Finally, in the academic environment in which I am currently engaged, there are many situations where indeed "every single provisioned VM is going to be under a high demand workload at any given moment in time." A lab where the instructor has all the students go through the same exercise at the same time is a very realistic (and real!) scenario. Nonetheless, this study does give some general indications of scalability, and I, for one, would very much like to see a vGPU vs. GPU passthrough bake-off, and also to see how much difference the division of labor makes: letting the XenDesktop VM do all the work using vGPUs vs. delegating much of the load to a XenApp instance using GPU passthrough.

Thanks very much again for this most interesting article.

-=Tobias

#2
Posted 05/30/2014 06:58 AM   