Artificial Intelligence Computing Leadership from NVIDIA
Looking for advice on optimal config for latest-gen Citrix Xenapp vGPU solution
Just for the record: in my POC testing last year, using the 2012R2 golden image, I consistently noticed that whenever the total framebuffer memory exceeded 8GB (at that time I used the 8A profile), the virtual machine entered a 'hung' state and could only be recovered by a full hard reset. People's active Citrix sessions were lost as a result.

Now in production, using 16A profiles and a brand-new 2019 golden image, I have so far never dared to allow or force the virtual machine to exceed the 16GB total framebuffer memory. The highest I've seen was 95% load.

@MrGRID


For the record, what is the expected behaviour according to your knowledge/experience? Would the fact that I now assign a complete T4 card with its 16GB of memory exclusively to one virtual machine (as opposed to an 8A profile) make any difference?
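
For anyone wanting to watch framebuffer load the way described above, here is a minimal sketch. It assumes `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits` is available in the guest or on the host; the sample string below stands in for a live query, and the 90% threshold is just the headroom figure discussed in this thread, not an official limit:

```python
# Sketch: compute framebuffer utilisation from nvidia-smi CSV output.
# The sample string stands in for the output of:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits

def framebuffer_pct(csv_line: str) -> float:
    """Parse one 'used, total' CSV line (values in MiB) and return percent used."""
    used, total = (float(x) for x in csv_line.split(","))
    return 100.0 * used / total

# Example: a full-card T4 VM reporting 15565 MiB used of 16384 MiB total.
sample = "15565, 16384"
pct = framebuffer_pct(sample)
if pct > 90.0:  # headroom threshold discussed in this thread
    print(f"WARNING: framebuffer at {pct:.1f}%")
```

Polling this once a minute and alerting above the threshold would flag a VM approaching the hung/unusable state long before sessions are lost.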

#46
Posted 09/11/2020 10:26 AM   
Hi Athomsen

Sorry for the delay in responding, busy time at the moment ...

Regarding Supported vs Unsupported: Supported should always give fewer issues, as that configuration will have been validated to work together, and it's what you would always want in a production environment for that peace of mind. Unfortunately, what you're experiencing at the moment isn't a matter of support; it's a physical, technical limitation, and the correct way to resolve it is to scale out.

If you have any budget available, a more cost-effective way to resolve this while maintaining support could be to look at using C240 M4s with a couple of M10s installed. M4 architecture is really cheap now because it's been superseded, plus you won't need any local disks (again helping to reduce cost), so this shouldn't cost you too much to do. It would give you 8 VMs per C240 (in 2U, compared to B200s which, if you're using a UCS 5108, is 6U), meaning that in 6U you could have 24x 8GB GPUs instead of 8.

Connect them up to your AFF and it may be a viable solution to see you through until your next platform upgrade, which, if this was installed in 2017, should typically be only another 1-2 years maximum depending on how you run your technology lifecycle. It depends on whether you'd want to start the next upgrade a little early, or top this one up to keep it going and then upgrade the entire thing in one go with a new technology stack. But adding C240 M4s is definitely a cost-effective way to add more density whilst maintaining support at this stage.
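
As a sanity check on those density numbers, the arithmetic can be sketched out (figures taken from the post: an M10 board carries 4 GPUs of 8GB each, two boards per 2U C240 M4):

```python
# Sanity check on the rack-density comparison above.
M10_GPUS_PER_BOARD = 4   # M10 = 4 GPUs x 8 GB on one board
BOARDS_PER_C240 = 2      # "a couple of M10s" per server
C240_RACK_UNITS = 2
TOTAL_RACK_UNITS = 6     # same footprint as a UCS 5108 blade chassis

servers = TOTAL_RACK_UNITS // C240_RACK_UNITS           # 3 servers in 6U
gpus = servers * BOARDS_PER_C240 * M10_GPUS_PER_BOARD   # 24 x 8 GB GPUs
print(f"{servers} servers, {gpus} x 8 GB GPUs in {TOTAL_RACK_UNITS}U")
```

Three 2U servers in the same 6U footprint as the blade chassis yields the 24x 8GB GPUs quoted above.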


Hi Profundido

I've not personally had machines crash when they run out of resources, but the performance of the machine is severely impacted, to the point where it becomes unusable for anyone connected. I don't run any of my environments consistently above 85-90% framebuffer usage; this leaves a little headroom when required.

Regarding 8GB vs 16GB: the end result when you run out of resources should be the same. It comes down purely to how the environment is designed and whether you want to scale up or out, the type of applications being used, and so on. 2x 8GB should give you the same total density as 1x 16GB. However, you need to account for additional CPU and RAM for the extra VM, as there are economies of scale in scaling up vs out, not forgetting Windows licensing costs, management overhead, and so on ...
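
The scale-up vs scale-out trade-off above can be sketched as a quick comparison. All per-VM overhead figures here are hypothetical placeholders for illustration, not vendor sizing numbers:

```python
# Illustrative sketch of the scale-up vs scale-out trade-off described above.
# The per-VM defaults are hypothetical placeholders, not vendor figures.
def host_overhead(vm_count: int, vcpus_per_vm: int = 8,
                  ram_gb_per_vm: int = 64, os_licences_per_vm: int = 1) -> dict:
    """Total host-side resources consumed by a given number of VMs."""
    return {
        "vcpus": vm_count * vcpus_per_vm,
        "ram_gb": vm_count * ram_gb_per_vm,
        "os_licences": vm_count * os_licences_per_vm,
    }

# Same 16 GB of total framebuffer either way, but 2x 8 GB VMs double
# the CPU/RAM/licensing overhead of 1x 16 GB VM:
scale_up = host_overhead(1)   # one 16 GB VM
scale_out = host_overhead(2)  # two 8 GB VMs
```

Whatever the real per-VM figures are in a given shop, the overhead scales linearly with VM count while the framebuffer total stays fixed, which is the economy of scale being described.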

Regards

MG

#47
Posted 09/14/2020 08:57 AM   
We will not be buying additional hardware for our current platform. Instead, the plan is to look forward and plan for our next platform.

At the moment the M10 GPU seems like the best choice for maximum density/cost, but it's an old GPU. Designing a new platform around the M10 for the next 4-5 years would not be ideal. It's also the only current GPU on the now-"old" Maxwell architecture.

Is there a replacement on the way? Maybe on the latest Ampere architecture?

#48
Posted 09/14/2020 01:03 PM   
Yes, the only reason I listed the M10 was to maintain your existing (Cisco) M4 architecture throughout, so you could extend capacity within the same ecosystem without upgrading. Pascal and newer GPUs require M5+ architecture. You certainly wouldn't design anything new with Maxwell today (Maxwell has been superseded four times!), and this is something I routinely mention on here.

The nearest replacement for the M10 is the T4. For configuration information on that, start from the beginning of this topic as we discuss it in detail. We'll have to see if anything gets announced at GTC in a few weeks regarding any replacements. But even before looking at that, it's worth asking Cisco if or when their M6 architecture is going to arrive so you can make sure you have the latest tech from all vendors.

Regards

MG

#49
Posted 09/14/2020 02:03 PM   
[quote=""]We will not be buying additional hardware for our current platform. Instead the plan is to look forward and plan for our next platform. At the moment the M10 GPU seems like the best choice for maximum density/cost, but it's an old GPU. Designing a new platform with the M10 for the next 4-5 year would not be ideal. It's also the only current GPU on the now "old" Maxwell architecture. Is there a replacement on the way? Maybe on the latest ampere architecture? [/quote] I'm officially in the dark as much as you are of course but since rumors/announcements of an Ampere-based new Quadro card are already starting to emerge, my 6th sense can already smell a T4 successor (with more memory) on the Horizon. That's what you should design around imho and the best platform to support it up till today is the one I'm having in production right now. It's just a matter of time now for that Ampere-based T4 successor. Mark my words.
[quote]We will not be buying additional hardware for our current platform. Instead, the plan is to look forward and plan for our next platform.

At the moment the M10 GPU seems like the best choice for maximum density/cost, but it's an old GPU. Designing a new platform around the M10 for the next 4-5 years would not be ideal. It's also the only current GPU on the now-"old" Maxwell architecture.

Is there a replacement on the way? Maybe on the latest Ampere architecture?[/quote]

I'm officially as much in the dark as you are, of course, but since rumours/announcements of a new Ampere-based Quadro card are already starting to emerge, my sixth sense can already smell a T4 successor (with more memory) on the horizon. That's what you should design around imho, and the best platform to support it to date is the one I have in production right now.

It's just a matter of time now for that Ampere-based T4 successor. Mark my words.

#50
Posted 09/16/2020 09:08 AM   
[quote=""]We will not be buying additional hardware for our current platform. Instead the plan is to look forward and plan for our next platform. At the moment the M10 GPU seems like the best choice for maximum density/cost, but it's an old GPU. Designing a new platform with the M10 for the next 4-5 year would not be ideal. It's also the only current GPU on the now "old" Maxwell architecture. Is there a replacement on the way? Maybe on the latest ampere architecture? [/quote] aaaaaaand my predictive words from yesterday are barely cold or I stumble upon this today: https://imgur.com/a/yVi0vrZ https://www.nvidia.com/en-us/gtc/session-catalog/?search.language=1594320459782001LCjF&search=&tab.liveorondemand=1583520458947001NJiE
[quote]We will not be buying additional hardware for our current platform. Instead, the plan is to look forward and plan for our next platform.

At the moment the M10 GPU seems like the best choice for maximum density/cost, but it's an old GPU. Designing a new platform around the M10 for the next 4-5 years would not be ideal. It's also the only current GPU on the now-"old" Maxwell architecture.

Is there a replacement on the way? Maybe on the latest Ampere architecture?[/quote]

aaaaaaand my predictive words from yesterday are barely cold before I stumble upon this today:


https://imgur.com/a/yVi0vrZ



https://www.nvidia.com/en-us/gtc/session-catalog/?search.language=1594320459782001LCjF&search=&tab.liveorondemand=1583520458947001NJiE

#51
Posted 09/17/2020 03:36 PM   
Yeah, I've already signed up for that.
But it does mention the A100 specifically, so I'm not sure that means a new GPU will be revealed as well.

#52
Posted 09/21/2020 06:36 AM   