NVIDIA
Performance issue about GPU Passthrough with XS 6.5 and XD 7.6
I have set up a PoC server and passed the K1 card through to a VM, but performance is poor: running Unigine Valley, I get only about 1/4 of the FPS of my laptop (equipped with a GTX 850M).

The bare metal:

cpu: E5-2620
memory: 32GB
motherboard: Supermicro X9DRG-HF
GPU: Grid K1



VM:

vcpu: 4 cores in one socket
memory: 4GB
GPU Driver: 332.76
OS: Windows 7 Ultimate 64bit
VDI: XenDesktop 7.6 VDA with HDX 3D Pro


Any advice will be greatly appreciated!

#1
Posted 04/08/2015 09:32 AM   
I think that is expected.
The GRID K1 is a low-end (and high-priced) card: 4x entry-level Kepler GK107 GPUs, very similar to the "GT 630"/"Quadro K600"/"Quadro K1000M". Add to that the virtualization overhead (time-shared units with a frame rate limiter if you use vGPU instead of passthrough), encoding, power management and other penalties.
The GTX 850M has more to offer: a Maxwell (newer architecture) GM107 with 3x more units and a faster core clock.

http://www.techpowerup.com/gpudb/1699/grid-k1.html
http://www.techpowerup.com/gpudb/2538/geforce-gtx-850m.html
http://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units
http://www.videocardbenchmark.net/compare.php?cmp[]=2617&cmp[]=2859

#2
Posted 04/08/2015 06:51 PM   
Hi, mcerveny

Thanks very much for the reply, it helped me a lot!

#3
Posted 04/09/2015 02:53 AM   
Hi

What are the specs of your laptop in comparison (CPU, RAM, disk, screen resolution), and what FPS are you getting? (Hi / Low; it says at the end of the benchmark.)

Have you updated the Firmware on the Hypervisor host?
Have you tuned the BIOS for Maximum Performance?
Have you tuned XenServer for performance?
Are there any other VMs running on the XenServer host?
How are you connecting to your VM to run the benchmark?
What's your network speed (to desk + backend)?
Are you connecting over a LAN or WAN?
What FPS are you getting (Hi / Low)?
What are you using for storage?
What screen resolution is the VM running?
Any Citrix policies applied?

The K1 is capable of running that benchmark, but there are limitations and also things you can do to make it perform better. It's not just a case of throwing a GPU in a server, assigning it to a VM and away you go. There are more variables to consider to get the best out of it. In order for us to help, you'll have to give us a little more information than you have, hence all the questions above ;-)

Regards

Ben

#4
Posted 04/09/2015 11:40 AM   
Hi, Benji

Thanks for your reply!


The specs of the laptop:

CPU: Intel Core I7-4710MQ
RAM: 8GB
disk: WDC WD10SPCX 7.2K SATA disk
OS: Windows 7 Ultimate
screen resolution: 1920 * 1080 with full screen mode (the same as the VM)



1. Have you updated the Firmware on the Hypervisor host?
No. Could the BIOS be an issue?

BIOS Information
Vendor: American Megatrends Inc.
Version: 3.0b
Release Date: 01/02/2014
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 12288 kB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Function key-initiated network boot is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 3.11



2. Have you tuned the BIOS for Maximum Performance?
Yes! I also tried disabling Hyper-Threading (enabled or disabled, no luck either way).

3. Have you tuned XenServer for performance?
Yes, I also turned Turbo Mode on.

4. Are there any other VMs running on the XenServer host?
No, only 1 VM is running on the XenServer host.

5. How are you connecting to your VM to run the benchmark?
Citrix Receiver on the laptop, XenDesktop 7.6 VDA + HDX 3D Pro on the VM side

6. What's your network speed (to desk + backend)?
1GB Ethernet

7. Are you connecting over a LAN or WAN?
LAN

8. What FPS are you getting (Hi / Low)?
On the laptop I get 34.1 FPS, and on the VM I get 9.1 FPS (I disabled the FRL)!

9. What are you using for storage?
Intel SSDSC2BB12. I also checked the I/O load on Dom0 while the benchmark was running:


Linux 3.10.0+2 (localhost)      04/10/2015

avg-cpu: %user %nice %system %iowait %steal %idle
0.26 0.00 0.44 0.01 0.11 99.17

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 39.78 1313.05 836.88 5352310 3411323



10. What screen resolution is the VM running?
1920 * 1080, with full screen mode

11. Any Citrix policies applied?
Yes, I googled for advice and went from no policy to the following, but with no luck:


Desktop Composition Redirection:  Disabled
HDX3DPro quality settings: 6553680 (I changed the min and max values, but it doesn't take effect)
Lossy compression level: None
Lossy compression threshold value: 10240 Kbps
Minimum image quality: Very High
Moving image compression: Disabled
Queuing and tossing: Disabled
Target frame rate: 60fps
Target minimum frame rate: 20 fps
Visual quality: High

#5
Posted 04/10/2015 03:22 AM   
Hi Weizhang

Ok, the laptop has a much better spec than your VM:

*CPU - 2.5GHz compared to 2.0GHz (2.0GHz is very slow these days) (Yes, I realize there’s more to it than pure MHz, but this should not be overlooked)
*RAM - 8GB compared to 4GB
*GPU - GTX 850M compared to K1 Passthrough

And to top it all off, it’s all running locally without a Hypervisor in the way.

Unless otherwise advised by the server hardware vendor, NVIDIA or Citrix, yes, you should absolutely be running the latest firmware / BIOS.

The Hypervisor relies on the BIOS being configured appropriately, otherwise it cannot make use of any of the performance or specific features. Firmware should be up to date for functionality. When tuning the BIOS for performance, don’t forget the cooling. If not configured correctly, this can throttle back the overall performance on some servers.

Personally, I always leave Hyperthreading enabled and have always had positive results. There are some recommendations out there to disable it, but this is application specific and you should check with the application vendor about which is best. Certainly for Unigine benchmarks, leaving it enabled will be fine.

1GB to desk is ok, nothing less than that though… and make sure it’s hardwired for consistency.

So the FPS generated by the VM there is pretty low. Is that a maximum FPS?

If you’re running Passthrough, disabling FRL won’t make any difference; it only applies to vGPU profiles.

Ignore the FPS for a second… What did the benchmark actually look like when you ran it on your laptop? At over 30FPS, I’m guessing it ran quite smoothly and looked ok? So why the need to try and run it at 60FPS on the VM? What I’m getting at here, is that just because there’s an option to run it at 60FPS, doesn’t mean it needs to be run that high. Don’t focus on the numbers, focus on the quality of what you’re seeing and the user experience and whether it’s good enough.

You can lose (disable) most of those Citrix policies, as with your current configuration they won’t really be helping. You’re looking to conserve bandwidth, not try to consume it with 60FPS :-)

When you run the Unigine benchmark, what settings are you configuring? Do you run it as default, or do you configure the quality? I'd be expecting around 20 – 25FPS without much tuning (although differing hardware setups will vary those results), which, although not exactly “amazing!”, is much more watchable than your 9.1FPS will be :-) Don’t get me wrong, it will never be in the same league as a GRID K2 (which will push FPS straight into the hundreds on this particular benchmark), as they are designed for completely different things, but as a basic GPU, if used for the correct tasks, it is certainly adequate.
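
To put the figures quoted in this thread side by side, here is a minimal Python sketch. The numbers (34.1 FPS on the laptop, 9.1 FPS in the VM, a rough 20 – 25 FPS expectation) come from the posts above; the variable and function names are only illustrative.

```python
# Figures reported in this thread (Unigine Valley, 1920x1080).
laptop_fps = 34.1                   # laptop with GTX 850M
vm_fps = 9.1                        # VM with K1 passthrough (average FPS)
expected_vm_fps = (20.0, 25.0)      # rough expectation without much tuning


def ratio(measured: float, reference: float) -> float:
    """Return measured FPS as a fraction of a reference FPS."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return measured / reference


print(f"VM vs laptop: {ratio(vm_fps, laptop_fps):.2f}")
for target in expected_vm_fps:
    print(f"VM vs expected {target:.0f} FPS: {ratio(vm_fps, target):.2f}")
```

The VM reaches roughly a quarter of the laptop's result and well under half of the expected range, which is why it is worth looking beyond the GPU itself.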

Regards

Ben

#6
Posted 04/10/2015 09:36 AM   
Benji said:

You can lose (disable) most of those Citrix policies, as with your current configuration they won’t really be helping. You’re looking to conserve bandwidth, not try to consume it with 60FPS :-)



The target FPS has no bearing on bandwidth, unless you're actually achieving that level in the VM. Where Weizhang is only achieving 9fps it will only be transmitting 9fps.

What that level of setting allows for is:

1. Ensuring that what the VM is generating is what the end user sees, as it's the same as the FRL and the max the protocol allows.
2. It helps to mitigate lag with software cursors. 60fps is around 17ms per frame, 30fps around 33ms, 10fps is 100ms.

So there's good reason to set that value there.
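
The per-frame delays behind those figures are simply 1000 / fps. A minimal Python sketch (the function name is illustrative, not part of any Citrix or NVIDIA tool):

```python
def frame_interval_ms(fps: float) -> float:
    """Return the time between consecutive frames, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


for fps in (60, 30, 10):
    print(f"{fps} fps -> {frame_interval_ms(fps):.1f} ms per frame")
```

The lower the delivered frame rate, the longer a software cursor waits for the next frame to show its new position.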

The policy looks like one I posted here, and on the Citrix forums (where there's explanation of the reasoning for each) and there's a large amount in there for fallback purposes in the event a user connects in legacy mode due to client capability.

The key values though are

DCR - Off
Visual Quality - High
Target FPS - 60

Though none of those will affect benchmark scores. The Unigine benchmark will be affected by CPU clock speed (it's single threaded) and GPU resources.

What applications are going to be delivered? Unigine Heaven and Valley don't reflect typical enterprise usage, so I'd suggest picking a more appropriate benchmarking tool, and a set of measures that more truly represents the intended use case.

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#7
Posted 04/10/2015 09:53 PM   
Hi, all

Thanks for all your kind help!

The 9.1 FPS is the average FPS, and I used the default configuration when running Valley. Maybe the resolution (1920 * 1080) was too high for a good user experience and I overloaded the K1!

We plan to deliver an app that will handle about one million triangular facets (a human 3D model), and it may need a more powerful GPU. Would the K2 or K5 be the right choice?

#8
Posted 04/13/2015 02:02 AM   
Jason - That's fair enough, I stand corrected. Every day's a school day :-)

That's a good point, if the GPU's not hitting the desired FPS, then at this stage the bandwidth has nothing to do with it. I overlooked that bit, and was thinking of the issues I've experienced with the K2 and bandwidth.

Weizhang - As Jason mentions above, it's best to test with applications that are going to be at least similar to, if not the same as, the application that is going to be used. Otherwise you can get caught up in chasing things that bear no relevance to production usage. If you're going to be delivering anything like the human 3D model, you'll absolutely want the K2. Try it with the K1 as well, though, so you have a reference between the 2 cards (Have a play with this: http://www.nvidia.co.uk/coolstuff/demos#!/lifelike-human-face-rendering). You may also want to revise your server specs if that's possible at this stage...

1920x1080 is not too high for the K1 if you use it for its intended purpose, but the K1's purpose is not to run benchmarks like that; that's the K2's job ;-) That being said, the screen resolution plays a massive part in the overall requirements and experience (drop Unigine Valley down to 1024x768 and see the difference). What screen resolution do you plan to run in production?
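
As a rough illustration of why resolution matters, the Python sketch below compares the pixels per frame at the two resolutions mentioned above. Treating pixel count as a proxy for GPU load is an assumption; actual scaling depends on the scene and on where the bottleneck sits.

```python
def pixel_count(width: int, height: int) -> int:
    """Return the number of pixels rendered per frame."""
    return width * height


full_hd = pixel_count(1920, 1080)
xga = pixel_count(1024, 768)

print(f"1920x1080: {full_hd} pixels per frame")
print(f"1024x768:  {xga} pixels per frame")
print(f"ratio:     {full_hd / xga:.2f}x")
```

At 1920x1080 the GPU renders, and HDX 3D Pro has to encode, more than two and a half times as many pixels per frame as at 1024x768.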

Regards

Ben

#9
Posted 04/13/2015 07:37 AM   
Hi Benji,

Unfortunately the app is still under development, and I need to make sure that this platform can meet our performance demands.

I will try the new benchmark tools, and thanks for your kind help :)

#10
Posted 04/14/2015 01:39 AM   
Sorry for the repeated reply :)

#11
Posted 04/14/2015 01:44 AM   
No worries, happy to try and help where I can :-)

If you plan on running 3D models though, I'd definitely look at a revised server spec at this stage, even if it's just to rule it out. It's good to know what all the options are before production.

Let us know how you get on with your testing...

Regards

Ben

#12
Posted 04/14/2015 11:06 AM   