Artificial Intelligence Computing Leadership from NVIDIA
Help with setup
We are planning on building a new Remote Desktop solution.
There will be two physical hosts.
HP ML 350 Gen 10.
Dual CPU
>150 GB RAM.

RDS will be Windows Server 2019 Standard.
Installed on bare metal, no virtual servers.
We are only using RemoteApp through RDWEB.
Up to 100 users can be logged in across the two RDS servers (50 per server).

We don't use AutoCAD, but we view a lot of drawings in different applications.
So we need to boost GPU performance.

We are looking at Tesla M10.
One for each RDS.

Will that work?
Must we use another configuration? If so, which one?
Will one M10 per server be enough?
What license do we need?
Are there any other GPUs that would serve us better?

All input is welcome.

Thanks!

#1
Posted 04/04/2020 05:23 PM   
Hi

Your choice in hardware would be OK (although I'd replace the M10s with T4s) if you were using virtualisation, but as you've said you want to use bare-metal, it's not appropriate. I can't actually remember the last bare-metal install I did; it would have been over 10 years ago. Just out of interest, why don't you want to use virtualisation? You've already picked out appropriate hardware for it.

The simplest, most cost-effective solution would be to use the hardware you have selected, deploy a Hypervisor on each of the ML350s and run an RDSH VM on each of the GPUs on the M10 (4 VMs per ML350). That will easily handle 50 users per ML350, with capacity for a few more if needed. However, what I would do is the same design, but as the M10 is pretty old now, I'd replace 1 M10 with 2 T4s. Depending on your Hypervisor choice, you could then either allocate an entire T4 to each RDSH VM (2 RDSH VMs per ML350), or use vGPU to split each T4 in half and run 2 VMs per T4 (4 VMs per ML350). Virtualisation with T4s is my recommendation.
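The three per-host layouts described above can be tabulated with a small Python sketch. The figures (four 8 GB GPUs on an M10, one 16 GB GPU per T4, two T4s replacing one M10, a T4 split in half with vGPU) come from this reply; the `Layout` class and its field names are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """One way of populating a single ML350 (hypothetical helper)."""
    name: str
    boards: int          # GPU boards installed in the host
    gpus_per_board: int  # physical GPUs on each board
    fb_per_gpu_gb: int   # framebuffer per physical GPU, in GB
    vms_per_gpu: int     # RDSH VMs sharing one physical GPU (1 = whole GPU)

    def vm_count(self) -> int:
        return self.boards * self.gpus_per_board * self.vms_per_gpu

    def fb_per_vm_gb(self) -> float:
        return self.fb_per_gpu_gb / self.vms_per_gpu


LAYOUTS = [
    Layout("1x M10, one VM per GPU", 1, 4, 8, 1),
    Layout("2x T4, whole T4 per VM", 2, 1, 16, 1),
    Layout("2x T4, vGPU split in half", 2, 1, 16, 2),
]

for layout in LAYOUTS:
    print(f"{layout.name}: {layout.vm_count()} VMs x {layout.fb_per_vm_gb():g} GB")
```

The M10 layout and the split-T4 layout end up with the same VM shape (four 8 GB VMs); the difference the reply points to is the newer GPU and the option of running whole 16 GB T4s instead.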

Now let's look at what happens when you run bare-metal. Firstly, your RDSH use case and hardware selection are going to be problematic. Multi-GPU with RDSH is a bad configuration and should be avoided as much as possible, due to the nature of the workload and how RDSH uses the GPU. Your RDSH Server will not share the workload over 2 or more GPUs, if that's what you were hoping for. Despite the M10 having four 8GB GPUs totalling 32GB, your RDSH Server will use only 1 of them, meaning that the system will only have 8GB of usable framebuffer to cater for your 50 users (not to mention encoding and processing). Basically, you're going to run out of GPU resources before you get anywhere near your 50-user total. To be clear, certain applications can use Multi-GPU configurations, but they are not suited to RDSH deployments.
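To put the single-GPU limitation into numbers, here is a small Python sketch that divides usable framebuffer evenly across concurrent users. The even split is a simplifying assumption (real sessions use very different amounts, and encoding also consumes GPU resources); only the 4 × 8 GB M10 layout and the 50-user target come from the thread, and the function name is hypothetical.

```python
# Assumption from the reply above: a bare-metal RDSH OS uses only one
# physical GPU, so only one of the M10's four 8 GB GPUs is usable.
M10_GPUS = 4        # physical GPUs on one Tesla M10 board
M10_GPU_FB_GB = 8   # framebuffer per M10 GPU, in GB


def fb_per_user_mb(usable_gpus: int, fb_per_gpu_gb: int, users: int) -> float:
    """Average framebuffer per concurrent user in MB (1 GB = 1024 MB)."""
    total_mb = usable_gpus * fb_per_gpu_gb * 1024
    return total_mb / users


if __name__ == "__main__":
    users = 50
    advertised = fb_per_user_mb(M10_GPUS, M10_GPU_FB_GB, users)
    bare_metal = fb_per_user_mb(1, M10_GPU_FB_GB, users)
    print(f"advertised: {advertised:.0f} MB/user")  # 655 MB/user
    print(f"bare metal: {bare_metal:.0f} MB/user")  # 164 MB/user
```

Even as a rough average, the bare-metal figure is a quarter of what the board's 32 GB suggests.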

Problem 1 with bare-metal is that your RDSH Server will only make use of 1 GPU, so you're going to need a bigger GPU. That said, I'm not sure I'd want to put 50 users on any currently available single GPU. I've never even heard of anyone doing that, and I think it would be cost prohibitive.

Problem 2 with bare-metal is that, because of the user count, you're going to need something powerful (and expensive). Because you can't use the appropriate low-end hardware, you have to use high-end hardware for a lower-end use case to support your user density. I could suggest something like a Quadro P5000 or RTX 5000, but I'm not sure 16GB would be enough for 50 users. Realistically, you're going to be looking at something with 24GB or more, such as a P6000, P40, RTX 6000 or above. (It just gets silly for this type of use case.)

Alternatively, you could purchase 2 extra ML350s, downgrade the spec of each of them and aim for 25 users per RDSH, and then opt for a 16GB GPU (you wouldn't use a smaller GPU in this configuration because you'd have no flexibility). Again, the reason you'd do this is that when you run bare-metal, the OS will only use 1 GPU, so you need more Hosts.

You already have the correct hardware for virtualisation (albeit with a GPU change to a pair of T4s), so unless there's a solid justification for not using it, in my opinion you either need to scale out and purchase more ML350s, or scale up with a larger single GPU and accept the risk of overloading its resources.

Regards

MG

#2
Posted 04/05/2020 10:48 AM   
Thank you for that wonderful answer. :)
Spot on what I needed to hear.

I am open to an ESXi solution.

How about this configuration?
2 x HP ML 350 Gen 10.
Dual CPU.
>150 GB RAM.

Installed with VMware Essentials Plus (latest version).
vCenter on a separate machine.

One Virtual RDSH (Windows Server 2019) per ESXi-host (to start with).
One "Tesla T4 - 16 GB GDDR6 - PCIe 3.0 x16" per host (to start with).

Is that configuration ok?

Or would it be enough to have just one "HP ML 350 Gen 10", boost the RAM, add more T4s and install 3-4 virtual Server 2019 instances?
Could that handle up to 90 RDSH users?

What license do I need for the T4-card?

#3
Posted 04/05/2020 05:14 PM   
Sorry, I'm a total newbie in this area.

But I think I understand what you are writing.

Each M10 (board) has 4 GPUs of 8 GB each.
Each virtual RDSH can use one GPU (8 GB).
Therefore I must install four virtual RDSH servers on each Host to be able to use all four GPUs.

Or I could use a T4, which could be used by one single virtual RDSH.
Or install two T4s on one Host and then install two virtual RDSH servers.

My options are:
1.
One ESXi-host.
One M10 on that host.
Four virtual RDSH servers (each gets one 8 GB GPU).

2.
One ESXi-host.
Two T4 on that host.
Two or four virtual RDSH servers (if four, then use vGPU to split each T4).

Do you think one Host can manage 90 users?

As I said, we are not using AutoCAD or any high-end application.
But we are opening a lot of drawings through various applications.
We are using Office 365, web browsing, a lot of drawings in PDF, DWG TrueView, Bluebeam Revu and some smaller apps.

Still my question about license remains.

We only use published RemoteApp programs through RDWEB.

I would really appreciate it if you (or someone else) could give me some more answers.

#4
Posted 04/05/2020 07:19 PM   
Hi

Great, virtualisation is back on the table! That makes things much easier!

Yes, the M10 has four 8GB GPUs, and because you're running RDSH you would use 1 of those for each RDSH VM, totalling 4 VMs. Depending on the resource requirements of your Apps and how your users use the system, each 8GB RDSH VM would support approximately 20 - 25 concurrent users. This is an industry average and your mileage will vary (+/-), which is why you should definitely run a POC before making any firm design decisions.
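Applied to the question of 90 users on one host, the 20 - 25 users per 8 GB VM estimate gives a range rather than a single number. A minimal Python sketch, assuming one host with the four 8 GB RDSH VMs of an M10; the helper names are hypothetical and only a POC can replace the estimate with real figures.

```python
# Industry-average estimate quoted in the reply: 20-25 concurrent users
# per 8 GB RDSH VM. Treat it as a starting point, not a guarantee.
USERS_PER_8GB_VM = (20, 25)  # (low, high)


def host_user_range(vms_per_host: int, per_vm=USERS_PER_8GB_VM) -> tuple[int, int]:
    """(low, high) concurrent users for a host running `vms_per_host` 8 GB RDSH VMs."""
    low, high = per_vm
    return vms_per_host * low, vms_per_host * high


def verdict(target_users: int, vms_per_host: int) -> str:
    """Classify a user target against the estimated range."""
    low, high = host_user_range(vms_per_host)
    if target_users <= low:
        return "fits"
    if target_users <= high:
        return "borderline: validate in a POC"
    return "exceeds the estimate"


print(host_user_range(4))  # (80, 100)
print(verdict(90, 4))      # borderline: validate in a POC
print(verdict(50, 4))      # fits
```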

VMware (vSphere) is top of the line, but it's not cheap. For vGPU, you'll need vSphere Enterprise Plus. This is why it's important to understand the difference between Passthrough and vGPU, and also to understand your workload and how it will use the resources. If you were to use the M10, then because of your workload and the fact that the M10 is a Multi-GPU board, you'll be assigning an entire 8GB GPU to each VM; does it matter if that GPU is Passthrough or vGPU? The answer is, it depends on the amount of flexibility and features you want from the system. If you just want to provide access to a GPU-enabled RDSH VM and don't need any additional features, then you can get away with Passthrough and Essentials licensing with the M10. If you were to use the T4, you would have to assign the whole T4 (due to not having VMware Enterprise Plus licensing), making them 16GB RDSH VMs, but you would need fewer of them to support your density. Does that make sense?

For any VMware deployment, you really should be using vCenter. Without it, administering your vSphere Hosts becomes really limited. For vGPU, however, it's a prerequisite. If you're only using Passthrough you can get away without it, but it's not something I'd ever recommend.

Regarding your ML350 specs, which CPUs are you looking at using?

To get started for your POC, you'd be looking at something like this for each RDSH VM with an M10 running in either Passthrough or vGPU:

8 vCPU
32GB RAM
8GB GPU
SSD / All Flash Storage

That's 128GB RAM total, leaving the rest for the Hypervisor, and you'll be able to run 4 of those per ML350.
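The RAM arithmetic above can be checked per host with a short Python sketch. `HOST_RAM_GB = 160` is a hypothetical value (the original post only says ">150 GB"), and the helper name is illustrative. The thread does not state how much the hypervisor itself needs, so the sketch only reports what is left over.

```python
# Hypothetical host size for illustration; the original post says ">150 GB".
HOST_RAM_GB = 160


def ram_headroom_gb(vm_count: int, ram_per_vm_gb: int, host_ram_gb: int = HOST_RAM_GB) -> int:
    """RAM left for the hypervisor after sizing every RDSH VM (negative = overcommitted)."""
    return host_ram_gb - vm_count * ram_per_vm_gb


# Starting POC sizes from the reply:
print(ram_headroom_gb(4, 32))  # M10: 4 VMs x 32 GB -> 32 GB left
print(ram_headroom_gb(2, 48))  # 2x T4 passthrough: 2 VMs x 48 GB -> 64 GB left
```

If POC results push the per-VM RAM up, the headroom is the first thing to recheck before adding VMs.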

If you were to go down the T4 in Passthrough route, then you'd be looking at a scaled up version of the above with something like this:

12 vCPUs
48GB RAM
16GB GPU
SSD / All Flash Storage

In theory, you should need fewer of them as they'll support a higher user density, and you'd tailor the vCPUs and RAM based on the POC results. If you were to use vGPU with the T4, then you'd be looking at the same specs as with the M10, but with each T4 split into two 8GB profiles supporting 2 VMs.

Those specs are not definitive, and the final specs will be based on your POC results.

If your ML350 is spec'd highly enough it will manage the 90 users (and above) without issue. But remember you should be providing N+1. If anything fails, you'll impact a large percentage of users.
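The N+1 point can be made concrete with a Python sketch: size the cluster so the target user count still fits after any one host fails. The per-host capacity of 80 users is an assumption (4 RDSH VMs × 20 users, the low end of the estimate earlier in the thread); replace it with POC results. The function names are hypothetical.

```python
import math


def hosts_for_n_plus_1(target_users: int, users_per_host: int) -> int:
    """Hosts needed so target_users still fit after losing any one host."""
    return math.ceil(target_users / users_per_host) + 1


def surviving_capacity(hosts: int, users_per_host: int) -> int:
    """Concurrent-user capacity left while one host is down."""
    return (hosts - 1) * users_per_host


# Assumption: ~80 users per host (4 VMs x 20 users); not a measured figure.
PER_HOST = 80
print(hosts_for_n_plus_1(90, PER_HOST))  # 3
print(surviving_capacity(2, PER_HOST))   # 80, below the 90-user target
```

Under this assumption, a single host running all 90 users has no failover at all, and even two hosts leave the surviving one short during an outage.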

Licensing:

Depending on your final configuration, you'll need a combination of the following:

vCenter Standard
vSphere Essentials / Enterprise Plus
Windows Server Standard
RDS CALs
vGPU vApps

If you're running Passthrough and have no intention of ever wanting vGPU, you can look at other Quadro GPUs and not pay any vGPU licensing. Because we're now using virtualisation, we can have multiple GPUs in the same physical host (whereas with bare-metal, we can't). You can now look at multiple Quadro P4000 or Quadro RTX 4000 cards, as these are single-slot 8GB GPUs. You can run these in Passthrough and there is no NVIDIA licensing. However, these are still a bit much for pure RDSH workloads, but you wouldn't use anything smaller.

If you wanted to use the M10 or T4, you would be looking at vApps licensing (regardless of Passthrough or vGPU). With the exception of vCompute, all other vGPU licensing is per Concurrent User, so you'll need 100 vApps licenses.
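The licensing rules scattered through this reply can be collected into one checklist. The Python sketch below encodes only what the thread states (as of April 2020): vGPU needs vSphere Enterprise Plus and vCenter, Passthrough can run on Essentials, Quadro in Passthrough needs no NVIDIA license, and M10/T4 need vApps per concurrent user in either mode. The helper is hypothetical. Licensing terms change, so confirm current requirements with VMware, Microsoft and NVIDIA. RDS CALs are listed without a count because the thread does not size them.

```python
# Hypothetical checklist encoding the rules stated in this thread (April 2020).
GPUS = ("M10", "T4", "Quadro")
MODES = ("passthrough", "vgpu")


def required_licenses(gpu: str, mode: str, concurrent_users: int) -> list[str]:
    """Return the licenses the thread lists for a GPU and assignment mode."""
    if gpu not in GPUS or mode not in MODES:
        raise ValueError(f"unsupported combination: {gpu}/{mode}")
    if gpu == "Quadro" and mode == "vgpu":
        # The thread only considers Quadro cards in Passthrough.
        raise ValueError("Quadro is only covered here in Passthrough")

    licenses = [
        # Strongly recommended in every case, and a prerequisite for vGPU.
        "vCenter Standard",
        # vGPU requires Enterprise Plus; Passthrough can run on Essentials.
        "vSphere Enterprise Plus" if mode == "vgpu" else "vSphere Essentials",
        "Windows Server Standard",
        "RDS CALs",
    ]
    if gpu in ("M10", "T4"):
        # vApps applies to M10/T4 regardless of mode, per concurrent user.
        licenses.append(f"NVIDIA vApps x {concurrent_users}")
    return licenses


for item in required_licenses("T4", "vgpu", 100):
    print(item)
```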

I hope that helps.

Overall, and despite all the configurable options, my recommendation remains the same as my first post. Don't try and skimp on features and functionality if you can afford them. Use the T4s running 8A vGPU Profiles.

Regards

MG

#5
Posted 04/06/2020 08:56 AM   
Hello,

I am reading this thread because I am trying to find the best approach.
Currently we have a customer with about 50 users on an RDSH server. It is working, but graphical performance in, say, Google Maps is poor.
So we want to offer a new RDSH. We like RDSH because of the easy central management.
What would you advise? Which setup? One Hypervisor and 2 RDSH machines?
Basically, we would start with an HPE DL380 Gen10, 128GB memory, dual CPU and 480GB mixed-use SSDs in RAID 50.
Using Windows Server 2019 Standard.

#6
Posted 11/11/2020 10:47 AM   