Hyper-V Server 2019 RDSH farm
Hi

We currently run a Server 2019 RDSH farm and would like to start adding dedicated graphics cards to our HPE ProLiant DL380s.

Requirements are to increase the user count on each VM by offloading load from the CPU for Chrome and Office applications, along with improving the look and feel of the RDSH session.

These requirements might be unattainable, as I have seen references to a maximum of 64 users/licenses on the M10. Is this the case?

Ideally we would be running 5-6 RDSH VMs on each host; currently we average 20 users per RDSH VM.

Our supplier is a bit unclear on how much and what we need to license as well, so any help on that would be appreciated.

Many thanks

#1
Posted 08/17/2020 07:33 PM   
Hi

As an example, depending on the type of applications and how they're used, you should expect about 15-25 users per GPU on an M10 before framebuffer exhaustion. The M10 has 4x 8GB GPUs on it and you'd allocate 1 GPU per RDSH VM. User density varies between deployments because the applications and how they're used vary, as do environment optimisations, monitor counts and resolutions, all of which affect framebuffer usage. 15-25 users per GPU would give you between 60-100 users per M10, which should be achievable.
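To make that arithmetic explicit, here is a minimal Python sketch. The function name is made up for illustration, and the 15-25 users per GPU range is only the rough estimate quoted above, not a measured figure.

```python
def users_per_card(gpus_per_card, users_per_gpu_low, users_per_gpu_high):
    """Return the (low, high) user estimate for one card.

    Assumes one GPU is passed through to each RDSH VM, so the per-card
    figure is simply the per-GPU estimate multiplied by the GPU count.
    """
    return gpus_per_card * users_per_gpu_low, gpus_per_card * users_per_gpu_high


# An M10 carries 4 GPUs; 15-25 users per GPU is the rough range quoted above.
low, high = users_per_card(4, 15, 25)
print(f"M10: {low}-{high} users per card")
```

The same function works for a single-GPU card by passing `gpus_per_card=1`.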

However, in any new GPU / vGPU deployment, I'd recommend the T4 GPU over the M10 due to the stage of their lifecycle (M10s were released in 2016). As has been shown many times, and supported by feedback on here, framebuffer is typically the limiting factor in RDSH deployments, not encoding or processing. The 8GB per GPU on the M10 isn't really sufficient for modern and, more importantly, future RDSH deployments, which forces customers to scale their deployments out rather than up. Hence my recommendation to avoid it for new installations and go for the T4.

As you're using Hyper-V, you'll need to use Passthrough (Hyper-V doesn't support vGPU), so if you went with the T4, you'd need multiple GPUs per DL380: each T4 card holds a single GPU and each RDSH VM needs its own.

Regarding licensing, you'd want to use "vApps". This is licensed per Concurrent User (not total user), so match it to your CALs and you should be fine.

Which generation of DL380 do you have?

How many concurrent users do you support in your RDSH environment?

Regards

MG

#2
Posted 08/18/2020 08:21 AM   
Hi

Thanks for the reply.

We have a mixture, but predominantly Gen10.

Across the farm currently we are at around 1,000 users, with aspirations for 2,000 within two years.

Ideally we'd want 30 - 40 users per RDSH VM for this exercise to be worthwhile.

So host-wise, running 5 x RDSH VMs at 30 users per VM would be 150 users per ProLiant DL380 host.
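As a rough sizing sketch in Python, the figures above can be turned into VM and host counts. This assumes all 1,000 (or 2,000) users are concurrent and leaves out any spare capacity for failover; the function name is hypothetical.

```python
import math


def hosts_needed(total_users, users_per_vm, vms_per_host):
    """Return (vms, hosts) needed, rounding up at each step."""
    vms = math.ceil(total_users / users_per_vm)
    hosts = math.ceil(vms / vms_per_host)
    return vms, hosts


# Current farm size and the two-year target, at 30 users per VM and 5 VMs per host.
for total in (1000, 2000):
    vms, hosts = hosts_needed(total, users_per_vm=30, vms_per_host=5)
    print(f"{total} users: {vms} RDSH VMs across {hosts} hosts")
```

With GPU passthrough at one GPU per RDSH VM, the VM count is also the number of GPUs required.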

We currently have 2 x 10Gb NICs in the chassis as well.

I had seen advice to look at Quadro GPUs, as they don't require licensing?

#3
Posted 08/18/2020 10:11 AM   
Hi

Great, thanks for the information!

Depending on your Riser configuration, you can physically fit up to 3x M10s or 7x T4s per DL380 G10. Here's NVIDIA's Certified Server URL that collates information from the Server OEMs: https://www.nvidia.com/en-us/data-center/resources/vgpu-certified-servers/

Retrofitting GPUs into any Server can raise complications with the original Server configuration. These can be things like CPU choice, PSU choice (of which there are prerequisites that must be met) and RAM, so you need to make sure you have all the requirements met before installing the GPUs. You will also need a GPU enablement Kit, which is typically a pair of low profile CPU heatsinks (to provide adequate cooling for the GPUs as they're Passive) and additional Power Cables for the GPUs that need them (M10s need one, T4s do not).

The licensing for vApps really isn't expensive, and for your environment, I'm pretty sure you can't avoid it due to the options available. Basically, the Quadro options you have won't give you the density you need. The largest single slot Quadro is an RTX4000 (8GB). If you move to a 16GB Quadro then you're looking at the RTX5000, which is dual slot, so you halve the density in your server vs a T4, which has the same amount of Framebuffer and much lower power requirements. If you go above that, you're into RTX6000 / 8000, which is not suitable for this type of deployment.
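One way to see the density point is to compare framebuffer per PCIe slot. The Python sketch below uses only the framebuffer sizes and slot widths mentioned above; the dictionary layout and function name are just for illustration.

```python
# Framebuffer size and slot width for each card, as described above.
cards = {
    "Quadro RTX4000": {"framebuffer_gb": 8, "slots": 1},
    "Quadro RTX5000": {"framebuffer_gb": 16, "slots": 2},
    "T4": {"framebuffer_gb": 16, "slots": 1},
}


def framebuffer_per_slot(card):
    """Return GB of framebuffer delivered per PCIe slot occupied."""
    return card["framebuffer_gb"] / card["slots"]


for name, card in cards.items():
    print(f"{name}: {framebuffer_per_slot(card):.0f} GB per slot")
```

Both Quadro options come out at half the T4's framebuffer per slot, which is the halved density described above.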

5x T4s per DL380 with 30+ users per RDSH should be very achievable (pending a small POC for application validation of course).

When purchasing your new DL380s as you scale later, you may want to revise the specification (CPU / RAM / GPU) so that you can install more T4s to further improve that density.

Regards

MG

#4
Posted 08/18/2020 11:53 AM   
Brilliant advice MG, I will look into the T4s. Is there any ceiling (max users) for T4s, or is it purely based on usage requirements and a matter of trial and error?

#5
Posted 08/18/2020 12:56 PM   
Hi

Unfortunately there are too many variables to give an accurate answer on that, and there is no hard limit with RDSH (a VDI deployment is obviously different). Things like the number of monitors a user has and the resolution of those monitors (1080P vs 4K) make a huge difference to both Framebuffer utilisation and encoding. The type (and version) of applications installed and how they are used (we've all seen users with many multiples of tabs open in a browser before) makes a big difference too. Operating System configuration and optimisation also play a part.

I've heard of RDSH numbers of 10 - 11 users maxing out 8GB on an M10 due to sub-optimal applications being used, and then others right up to 30 users on a single 8GB M10 GPU. Everyone's environment is different, so the end results are usually different.

As an estimate in terms of absolute numbers, if you have the right type of applications installed, with a well optimised Operating System and users that behave themselves, then I'd be expecting over 40 users on a single T4. It's also important to leave a little headroom on all components to give a consistent experience and level of performance. No environment runs well when it's running at maximum utilisation, so find your maximum then dial it back about 10 - 15% per GPU / RDSH. Also, when you purchase new servers to scale your environment, I would be expecting better density due to a more optimised selection of components being chosen to work together (CPU / RAM / GPU / Storage).
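The headroom rule can be sketched in Python as below. The 40-user maximum is only the estimate mentioned above, not a measured value, and the function name is hypothetical; integer percentages are used to avoid floating-point rounding surprises.

```python
def planned_users(measured_max, headroom_percent):
    """Return the user count to plan for after dialling back from the measured maximum."""
    return measured_max * (100 - headroom_percent) // 100


# Assumed example: a POC finds 40 users is the maximum on one T4 / RDSH VM.
measured_max = 40
for headroom in (10, 15):
    print(f"{headroom}% headroom: plan for {planned_users(measured_max, headroom)} users")
```

In practice `measured_max` would come from your own POC rather than an estimate.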

The numbers you're asking for are very realistic, and a small POC is always the best way to make sure you're making a positive change to the environment. You only need 1x T4 to test initially, which means the initial cost of the POC should be very low.

Regards

MG

#6
Posted 08/18/2020 01:52 PM   
Understood, appreciate the feedback.

Will go for a T4 and see how we get on.

#7
Posted 08/18/2020 03:39 PM   