Putting best foot forward
Hi all - forgive me if this has all been asked and answered - I did try looking!

Scenario:-
vSphere 6.0U3 and above across a number of hosts.
Horizon View 7.x
RDSH Windows 2012 and above (being migrated over time to 2019)
HP based site
RDSH sessions accessed through View Client and a little BLAST.
1,800-ish users, of which around 1,000 are on at any one time.

The vast majority of users are 'Knowledge Users' - so Email, Office, bit of web usage etc.

This system operates for 90%+ of users just fine and we can manage/maintain/deploy etc etc all fine and dandy.

Here comes the question :-)

We've been struggling with the last 10%-ish of users in terms of GPU access. We don't have 'high-end' CAD users, but the occasional user wants to use mapping software, or someone else may want to make a promotional video or some such, so we've begun dipping our toes into the NVIDIA GPU world...

And struggling, to be honest!

I think we're at the point where we have two options: M10s for 'everyone' (I believe these offer maximum density, and whilst they are fine at the moment, we need to think about the future) and T4s for the more 'power users'.

I understand that we need to licence the software (vPC stuff?) per CCU (concurrent user), so in our case would we need 1,800 licences or 1,000?

I've been reading about core count vs clock speed. Am I correct in assuming that anything over 3.0GHz base (not turbo) should be just fine?

What's really confusing us is the relationship between the Physical Hosts, the virtual Windows Server 201x RDSH hosts (more than one per physical host) and the actual sessions people connect to (many per virtual 201x host). For example we have a pool of 522 users spread across 35 virtual Windows Server 201x guests on 9 physical hosts.
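To make the three layers concrete, here's a rough sketch (arithmetic only) that averages the example pool above. The numbers are the ones in this post; the `RdshPool` class is purely illustrative, and the averages assume users are spread evenly, which real load balancing won't do exactly. In a vGPU design it's the middle layer, each RDSH guest VM, that gets a GPU Profile, and every session on that VM shares it.

```python
from dataclasses import dataclass


@dataclass
class RdshPool:
    users: int           # users in the pool
    rdsh_vms: int        # Windows Server RDSH guest VMs (where a vGPU would attach)
    physical_hosts: int  # vSphere hosts running those guests

    def sessions_per_vm(self) -> float:
        return self.users / self.rdsh_vms

    def vms_per_host(self) -> float:
        return self.rdsh_vms / self.physical_hosts

    def sessions_per_host(self) -> float:
        return self.users / self.physical_hosts


pool = RdshPool(users=522, rdsh_vms=35, physical_hosts=9)
print(f"{pool.sessions_per_vm():.1f} sessions per RDSH VM")  # 14.9
print(f"{pool.vms_per_host():.1f} RDSH VMs per host")        # 3.9
print(f"{pool.sessions_per_host():.1f} sessions per host")   # 58.0
```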

In addition, the same hosts support a (small-ish) number of 'real' VMs for people who have 'wonky stuff' to run :-)

My further understanding is that a .vib is needed within vSphere/ESX and that driver software is needed on the RDSH guests. Is this correct?

Does anyone have any thoughts/recommendations for pointing us in a direction of travel that gets us where we need to be: giving the 90% a 'better' experience, future-proofing us, and allowing the 10% to actually do what they need to do?

I hope this is enough information - if not, ask and I'll get what you need!

Many thanks in advance

Pat

#1
Posted 02/19/2020 11:13 AM   
Hey Pat

Firstly, before you do anything hardware related, check over the application requirements to see how it uses the resources. It may be able to use multiple Cores, it may put more emphasis on the GPU or it may just want a single high speed core, or a bit of everything. If you can't find it in the system requirements documentation, contact the software vendor and ask them directly, you can then look at appropriate hardware to support it.

If the application is single-thread limited, then yes, to make life much easier for yourself, focus on the base Clock and forget Turbo. It's possible to get Turbo working in a virtualised environment, but there are many hoops to jump through and, to be honest, it's not worth the hassle. You're then into Single Core Boost vs All Core Boost (the Boost level depends on how many Cores boost at the same time, and that in turn depends on all sorts of other variables) and trying to keep them engaged so you don't get peaky performance. It's a real pain to work with when virtualising, and it's far easier to opt for a high Base Clock from the start and not have to worry about it.

For CAD applications that are predominantly single-threaded, 3.0GHz would be the minimum I'd work with whilst maintaining a balance vs Cores. There are Xeons with a higher base Clock, but you compromise on Core count. The best balance at the moment is the 2nd Gen Scalable Xeon Gold 6254, which is 18 Cores @ 3.1GHz. You can trade up or down (Cores vs Clock) depending on your requirements, but this is the sweet spot in my opinion.
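A minimal sketch of that selection rule: filter on base Clock (ignoring Turbo), then take the most Cores among the parts that pass. Only the Gold 6254 figures come from this post; the other three entries are other 2nd Gen Xeon Scalable parts added here for comparison, so check their numbers against Intel's spec pages before relying on them.

```python
from typing import List, NamedTuple, Optional


class Cpu(NamedTuple):
    model: str
    cores: int
    base_ghz: float  # base clock only; Turbo deliberately ignored


# Comparison entries (verify against Intel's published specs before use).
CANDIDATES = [
    Cpu("Xeon Gold 6244", 8, 3.6),
    Cpu("Xeon Gold 6246", 12, 3.3),
    Cpu("Xeon Gold 6254", 18, 3.1),
    Cpu("Xeon Gold 6248", 20, 2.5),
]


def pick_cpu(cpus: List[Cpu], min_base_ghz: float = 3.0) -> Optional[Cpu]:
    """Return the highest-core-count CPU whose *base* clock meets the floor."""
    eligible = [c for c in cpus if c.base_ghz >= min_base_ghz]
    if not eligible:
        return None
    # Tie-break on base clock so the faster part wins at equal core count.
    return max(eligible, key=lambda c: (c.cores, c.base_ghz))


print(pick_cpu(CANDIDATES).model)  # Xeon Gold 6254
```

Raising the floor (say to 3.5GHz) trades Cores for Clock and picks a smaller part instead, which is the "trade up or down" choice described above.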

Regarding vGPU Licensing ... It's really simple ... vApps, vPC, QvDWS are all licensed Per CCU. So if you have 1800 users, but only 1000 connect to the platform at any one time, then you only need 1000 licenses. As for the type of license, if you're running RDSH you'll want vApps, for your normal single user Windows VMs, you'll need vPC and for your 3D Workstations you'll potentially want QvDWS depending on the application requirements. All of those are Per CCU.
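To get the CCU number from real usage rather than a guess, what matters is the peak number of sessions open at the same moment. A minimal sweep-line sketch, assuming you can export session logon/logoff times from your broker in some form (the log below is made-up example data, in hours):

```python
def peak_concurrency(sessions):
    """Peak number of sessions open at once.

    sessions: iterable of (start, end) pairs with start < end. A session that
    ends at time t is counted as gone before one that starts at time t.
    """
    events = []
    for start, end in sessions:
        events.append((start, 1))   # logon
        events.append((end, -1))    # logoff
    # Tuples sort by time first; at equal times -1 sorts before +1,
    # so logoffs are processed before logons.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Made-up log: 4 named users as (logon_hour, logoff_hour)
sessions = [(9, 17), (10, 12), (12, 15), (13, 18)]
print(peak_concurrency(sessions))  # 3 -> 3 CCU licenses needed, not 4
```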

Honestly ... Forget the M10 at this stage. You can absolutely still buy it and it will be supported, but unless you already have them and are looking to scale out an existing M10 deployment, give them a miss. Architecture-wise they've been superseded three times, and they lack features and functionality compared to the current generation. From what you've said above about the majority of your user base being Knowledge Workers (now referred to as "Digital Workers") and not having any high-end CAD users, you should be looking at T4s for all of your workloads. You can use the same model of GPU and use the Profiles to allow different amounts of performance, for example:

RDSH VMs will use the T4-8A and you'll have 2 of those per T4.

Single User VMs will have either the T4-1B or T4-2B depending on your requirements, and you'll have 16 or 8 of those per T4 respectively.

3D Single User VMs "could" have T4-4Q and you'll have 4 of those per T4.
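The density figures above fall straight out of framebuffer arithmetic: the T4 has 16GB, and the number in the Profile name is the GB each vGPU gets. A small sketch, assuming only the A/B/Q naming pattern used in this thread; the Profile tables in NVIDIA's User Guide remain the authority on which Profiles actually exist.

```python
import re

T4_FRAMEBUFFER_GB = 16  # total framebuffer on one Tesla T4

# Profile series letter -> license edition, as described in this thread
SERIES_LICENSE = {"A": "vApps", "B": "vPC", "Q": "QvDWS"}


def vgpus_per_t4(profile: str) -> int:
    """How many VMs of a single Profile fit on one T4 (framebuffer split)."""
    match = re.fullmatch(r"T4-([1-9]\d*)([ABQ])", profile)
    if match is None:
        raise ValueError(f"not a T4 A/B/Q profile name: {profile!r}")
    framebuffer_gb = int(match.group(1))
    return T4_FRAMEBUFFER_GB // framebuffer_gb


for name in ["T4-8A", "T4-4Q", "T4-2B", "T4-1B"]:
    series = name[-1]
    print(f"{name}: {vgpus_per_t4(name)} per T4 ({SERIES_LICENSE[series]})")
```

This prints 2, 4, 8 and 16 per T4 for the four Profiles, matching the list above.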

You mention HP above (I'm assuming this is your server platform), but don't mention your server model or generation (DL380 Gx .. ?). If you're planning on purchasing completely new HP hardware, you can speak to your partner about the appropriate configuration. If you're retrofitting GPUs into existing servers, first make sure they're supported and that the BIOS and Firmware are fully updated, then purchase the appropriate PSUs, PCIe Risers, GPU Enablement Kits (low-profile heatsinks for the CPUs, and GPU power cables if needed), potentially high-performance fans, and of course the GPUs.

When working with vGPU, it's important to have an up-to-date software stack: Hypervisor, Management (vCenter in your case), Operating System and vGPU Drivers. This prevents a lot of issues and gives you the best performance and functionality.

Before proceeding with vGPU, check your licensing situation. For VMware, at a minimum you'll want vCenter Standard and vSphere Enterprise, and ideally you'll be running 6.7U3.

Before setting up vGPU, make sure you get your vGPU License Server built and then vGPU licenses ordered through your Partner. Licenses can sometimes take up to 24hrs to arrive in your NVIDIA Portal, so best to get that kicked off from the start as your vGPUs have very limited (unusable) performance until licensed.

As per your question, the vGPU software comes in two parts. The first is a .vib that must be installed on each physical vSphere Host that will run vGPU (this is referred to as the vGPU Manager); it allows you to allocate vGPU Profiles to your VMs through vCenter. The second part is a driver that goes inside the Windows / Linux guest OS.

Configuration is relatively straightforward, with some extremely basic maths to work out approximate user capacity per VM as a ballpark number to aim for, pending a POC where you can firm things up. These are very general guidelines and there are a lot of variables to consider, but on average you should be aiming for 20-25 concurrent users per vGPU-enabled RDSH VM (sometimes slightly more, sometimes slightly less), and 2 VMs per T4. This really depends on your applications, how the users use those applications, your platform and system hardware (including hardware generations), overall optimisations, etc. If we take the lower user number as a working figure (2 VMs each with 20 users, so 40 users per T4), we can work out how many GPUs we'll need to support 1000 concurrent users and, by extension, how many servers we'll need to host that number of GPUs.

1000 (Users) / 40 (Users Per GPU) = 25 (GPUs Required)

(This is where it would help to know your server hardware and generation, so I'm just going to use the current Gen10 ...)

HP DL380 G10 will support 5 T4 GPUs ( https://www.nvidia.com/en-us/data-center/tesla/tesla-qualified-servers-catalog/ ).

25 (GPUs) / 5 (The capacity of a DL380 G10) = 5 (DL380 G10 each with 5 T4 GPUs, each supporting 40 users)

What this means is that if you're able to support 20 users per RDSH VM, and get 2 of those on 1 T4, then you'll want 25 T4 GPUs and 5 DL380 G10 servers to support them.

This scales up / down either way. If you can get 25 users on a VM, that's 50 users per T4 and, long story short, you only need 4 DL380s instead of 5. If you can only get 4 T4s into your existing server hardware, then you're going to need more DL380s, etc. And don't forget your N+1 for resilience and maintenance windows; otherwise, every time you want to work on a Host, you'll either be doing it out of hours or removing it from service and reducing your total usable capacity ... ;-)
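The whole sizing calculation above, including the what-if cases and N+1, fits in a few lines. This is a sketch of the same arithmetic, rounding up because you can't buy part of a GPU or server; the 20/25 users per VM and 4/5 GPUs per server inputs are the ones discussed above, not measured values.

```python
import math


def size_platform(concurrent_users, users_per_vm, vms_per_gpu,
                  gpus_per_server, spare_servers=1):
    """Return (GPUs, servers, servers including N+1 spares)."""
    users_per_gpu = users_per_vm * vms_per_gpu
    gpus = math.ceil(concurrent_users / users_per_gpu)  # can't buy part of a GPU
    servers = math.ceil(gpus / gpus_per_server)         # or part of a server
    return gpus, servers, servers + spare_servers


scenarios = [
    ("20 users/VM, 5 T4s per server", 20, 5),
    ("25 users/VM, 5 T4s per server", 25, 5),
    ("20 users/VM, 4 T4s per server", 20, 4),
]
for label, users_per_vm, gpus_per_server in scenarios:
    gpus, servers, with_spare = size_platform(1000, users_per_vm, 2, gpus_per_server)
    print(f"{label}: {gpus} T4s, {servers} servers ({with_spare} with N+1)")
```

The three lines come out as 25 T4s on 5 servers (6 with N+1), 20 T4s on 4 servers (5 with N+1), and 25 T4s on 7 servers (8 with N+1), so dropping to 4 GPUs per server costs two extra hosts.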

FYI, your limiting factor will be vGPU Framebuffer: you'll split each T4 (16GB) into 2 x 8GB vGPU Profiles and run each RDSH VM with 8GB, using the T4-8A vGPU Profile.

Now, as you'll also have single-user desktop VMs as well as your 3D Workstation VMs, what I would do is run all of your RDSH VMs in a dedicated vSphere Cluster. That way there are no vGPU Profile issues, and it makes scaling, migration and management really easy. Then create a second vSphere Cluster for your single-user VMs that have a different vGPU Profile (probably T4-2B), and depending on how many 3D Workstations you need, you may even want a 3rd vSphere Cluster to support that vGPU Profile. It is possible to manage all of this from a single Cluster using the breadth-first / depth-first placement settings in vCenter (Performance vs Density), but if you do, you'll be running in "Density" mode all the time, so it's up to you how you'd like to configure it.

The reason it can be better to split the vGPU Profiles into dedicated Clusters is that you can't mix Profiles on the same physical GPU. If a VM with a particular Profile starts up on a GPU, only other VMs with that same Profile can use that specific GPU. Despite a lot of customer frustration, this does make sense. For example, a vPC workload is different from a QvDWS or vApps workload, so why would you run them on the same GPU, or in some instances even the same physical server? We also need to remember that this isn't about cramming as many users onto a single physical server as is humanly possible; it's about giving the best user experience whilst still delivering a cost-effective solution. If anyone wants to cram users on to the platform to the detriment of the user experience, then simply remove the GPU from the system altogether and you can do just that.
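To illustrate both the one-Profile-per-GPU rule and the Performance vs Density choice, here's a toy placement model. It is not vCenter's actual placement algorithm; it assumes 16GB T4s, never powers VMs off (so a GPU's Profile lock never clears), and simply picks the emptiest eligible GPU (breadth-first / Performance) or the fullest one (depth-first / Density).

```python
from dataclasses import dataclass
from typing import List, Optional

T4_FRAMEBUFFER_GB = 16


@dataclass
class Gpu:
    name: str
    profile: Optional[str] = None  # locked by the first vGPU placed on it
    used_gb: int = 0

    def can_host(self, profile: str, fb_gb: int) -> bool:
        # A GPU only accepts vGPUs of the Profile already running on it.
        same_profile = self.profile in (None, profile)
        return same_profile and self.used_gb + fb_gb <= T4_FRAMEBUFFER_GB


def place(gpus: List[Gpu], profile: str, fb_gb: int,
          depth_first: bool) -> Optional[Gpu]:
    """Place one VM. depth_first=True fills the fullest eligible GPU
    (Density); False uses the emptiest eligible GPU (Performance)."""
    eligible = [g for g in gpus if g.can_host(profile, fb_gb)]
    if not eligible:
        return None  # no GPU can take this Profile right now
    pick = max if depth_first else min
    target = pick(eligible, key=lambda g: g.used_gb)
    target.profile = profile
    target.used_gb += fb_gb
    return target


def demo(depth_first: bool) -> List[int]:
    gpus = [Gpu("gpu0"), Gpu("gpu1")]
    for _ in range(4):
        place(gpus, "T4-2B", 2, depth_first)
    return [g.used_gb for g in gpus]


print(demo(depth_first=True))   # [8, 0]  Density: pack one GPU first
print(demo(depth_first=False))  # [4, 4]  Performance: spread the load

mixed = [Gpu("gpu0"), Gpu("gpu1")]
place(mixed, "T4-8A", 8, depth_first=True)         # gpu0 now locked to T4-8A
place(mixed, "T4-2B", 2, depth_first=True)         # has to go to gpu1
print(place(mixed, "T4-4Q", 4, depth_first=True))  # None: no GPU left for T4-4Q
```

Note the last line: there's plenty of free framebuffer across the two GPUs, but the T4-4Q VM still can't start, which is exactly why splitting Profiles into dedicated Clusters keeps things predictable.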

Right, I've rattled on for long enough. I hope at least some of that is useful. As said, recalculate some of those numbers and apply what's useful to your own environment and hardware. Let me know if you'd like more detail about anything ...

Regards

MG

#2
Posted 02/19/2020 05:11 PM   
MG

Thanks so much for the information - helps massively.

HP...yes, we have a mix of HP DL380 Gen9 and Gen8 ... ideally I'd be looking to get the budget for a number of new machines and buy brand-new, fully configured HP Gen10s with 5 T4s each, as that seems to be the sweet spot.

We currently have 0 GPU offering for our users and a handful are 'not very happy' (to quote you, above, 'If anyone wants to cram users on to the platform to the detriment of the user experience, then simply remove the GPU from the system altogether and you can do just that.') - that's pretty much where we are.

For the VAST majority of users what we've given them is just fine, but as we've been rolling out we've been pushing 'problem use cases' to the end, and, well, we're pretty close to the end and have to revisit the last 10%!

If I could get the budget for at least two machines (we have two sites), then from the above I should be able to get in the order of 40 users happy.

From what you've said above, though, it sounds like I should be pushing for four machines: 2 for an RDSH 2019 'Cluster' (for 2 sites) that will support 40-ish users, and a further 2 for dedicated VMs with, again, 40 (or 80 depending on config) per machine.

There are no 3D users as such, just planners who like complicated maps ... I'll try to get more info on the spec of the physical machines they'd expect to run this on.

Is there a URL of which you are aware that explains these vGPU profiles?

Oh I meant to say - it's all on Nimble Flash storage (which is good :-) )

Seriously, thank you again.

Pat

#3
Posted 02/20/2020 08:51 PM   
Hi Pat

No worries, glad the info is useful.

When designing each cluster, try and keep the server hardware as similar as possible (ideally identical) as this keeps performance, functionality and density the same no matter which VM the users connect to.

Use the numbers I’ve mentioned above as a guide to get started. Your best bet would be to work with your technology supplier to get a DL380 Gen10 on evaluation, run a small POC on it with each type of VM, work out which spec of each is correct for you, then reconfigure it and purchase accordingly.

The main vGPU Documentation Site is available here: https://docs.nvidia.com/grid/index.html

The current release is vGPU 10.1. If you select that, it will open up all the documentation for that specific version. The document you’re looking for with all the vGPU Profiles is called “Virtual GPU Software User Guide”. Open that and scroll down a few pages and you’ll see all the Profiles listed in a few big tables.

If you’re looking at physical workstation specs, it’s well worth doing a little monitoring to check utilisation to make sure you cover off any unforeseen requirements. This is my favourite monitoring software: https://github.com/JeremyMain/GPUProfiler/releases It’s free to use and will give you plenty of accurate metrics to work with.
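When you review the captured metrics, don’t size from the average alone. A minimal sketch, assuming you’ve got a list of GPU utilisation percentages out of whatever tool you use (no particular GPUProfiler export format is assumed here; the samples are made up), comparing mean, 95th percentile (nearest-rank method) and peak:

```python
import math
from typing import Dict, List


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile (pct in 1-100) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarise(samples: List[float]) -> Dict[str, float]:
    return {
        "mean": sum(samples) / len(samples),
        "p95": percentile(samples, 95),
        "peak": max(samples),
    }


# Made-up GPU utilisation %, one sample per minute over 20 minutes
samples = [10, 12, 15, 11, 9, 40, 85, 90, 35, 20,
           18, 14, 12, 10, 60, 65, 22, 15, 11, 10]
print(summarise(samples))  # {'mean': 28.2, 'p95': 85, 'peak': 90}
```

Here a ~28% average hides bursts into the 80-90% range, which is exactly the kind of unforeseen requirement worth catching before choosing a Profile.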

To save you a bit of troubleshooting, when you GPU enable your RDSH VMs, you must set this GPO for them, otherwise the RDSH VMs won’t use the GPU:

Computer Configuration\Administrative Templates\Windows Components\Remote Desktop Services\Remote Desktop Session Host\Remote Session Environment: Use the hardware default graphics adapters for all Remote Desktop Services sessions
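If you want to confirm on an RDSH VM that the policy has actually landed, you can check the registry value behind it. This is a sketch under an assumption: the value name `bEnumerateHWBeforeSW` under `HKLM\SOFTWARE\Policies\Microsoft\Windows NT\Terminal Services` is the commonly cited registry equivalent of this GPO, so verify it against the ADMX for your Windows Server version. It uses Python’s standard `winreg` module and only reads, never writes.

```python
import sys
from typing import Optional

# Assumed registry location of the GPO (verify against your ADMX files).
POLICY_KEY = r"SOFTWARE\Policies\Microsoft\Windows NT\Terminal Services"
POLICY_VALUE = "bEnumerateHWBeforeSW"


def hardware_gpu_policy_enabled() -> Optional[bool]:
    """True/False if the value is set on this Windows machine, else None."""
    if sys.platform != "win32":
        return None  # not a Windows guest, nothing to check
    import winreg  # standard library, Windows only

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POLICY_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, POLICY_VALUE)
    except FileNotFoundError:
        return None  # key or value missing: policy not applied
    return value == 1


print(hardware_gpu_policy_enabled())
```

A result of `None` on an RDSH VM means the policy hasn’t been applied (run `gpupdate` or check GPO scoping); `True` means sessions should be offered the hardware adapter.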

Nice work with the All Flash! That’ll definitely help with performance and user experience!

Let me know if you have any questions on the vGPU Profiles.

Regards

MG

#4
Posted 02/20/2020 11:49 PM   