Horizon - Video playback performance
Hello,

I have performance problems with video playback in my VDI machines. I'm using Blast Extreme accelerated by an NVIDIA P40 GPU. Playback always stutters a little. I usually get a better result from my Linux VDI pool, which isn't GPU accelerated. When I check with YouTube, the problem seems to be related to encoding, since the video player doesn't report any dropped frames.

What is your experience in fullscreen video playback with the P series?

Cristiano

#1
Posted 02/26/2018 12:13 PM   
Hi,

I'm having no issues and get super smooth playback with the P40, even with 4K and YouTube. Which browser are you using?
If you were using a Maxwell board I would assume the issue is related to decoding, but Pascal already supports VP9 hardware decoding (used by YouTube), so this cannot be the issue.
What OS and resolution are you using?

Regards

Simon

#2
Posted 02/26/2018 01:16 PM   
Hello Simon,

I'm using Windows 10 64-bit 1709 with 3 CPUs and 5 GB of RAM; storage is all flash. The profile is just a P40-1Q since I'm going for density. I've tried Edge, Firefox and Chrome: same issue. Decoding the video from YouTube shouldn't be the issue, as I don't see dropped frames. The issue is not limited to YouTube; every video app is a little jerky. I also tried the NVIDIA lab, which is available with registration, and playback is smooth there. I also have an SR open with VMware, so I'm working on multiple fronts.

regards

Cristiano

#3
Posted 02/26/2018 02:14 PM   
Hi Cristiano,

Could you please change the scheduler from the default (equal share) to best effort for your deployment and test once again? I'm curious to hear if it makes any difference.

http://docs.nvidia.com/grid/5.0/grid-vgpu-user-guide/index.html#changing-vgpu-scheduling-policy-all-gpus
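For reference, the linked section of the vGPU user guide changes the policy through the RmPVMRL registry key of the NVIDIA host driver. The sketch below only builds the ESXi command string and does not run it. The key values and the esxcli syntax are recalled from that guide rather than stated in this thread, so treat them as assumptions and check them against the guide for your GRID release. The helper name is hypothetical.

```python
# Assumption: RmPVMRL values as recalled from the NVIDIA vGPU user guide.
# Verify against the guide for your GRID release before using them.
SCHEDULER_POLICIES = {
    "best_effort": 0x00,
    "equal_share": 0x01,
    "fixed_share": 0x11,
}


def esxi_scheduler_command(policy: str) -> str:
    """Build (not run) the ESXi command that sets the policy for all GPUs."""
    value = SCHEDULER_POLICIES[policy]
    return (
        "esxcli system module parameters set -m nvidia "
        f'-p "NVreg_RegistryDwords=RmPVMRL=0x{value:02x}"'
    )


if __name__ == "__main__":
    print(esxi_scheduler_command("best_effort"))
```

The new policy only takes effect after the host is restarted, which is why it needs a downtime window.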

regards

Simon

#4
Posted 02/27/2018 08:03 AM   
Hello Simon,

I will try that when I upgrade to 5.2, since it requires a restart. Right now it is difficult for me to find a window for downtime.

Regards

Cristiano

#5
Posted 02/28/2018 04:53 PM   
Hello Simon,


I didn't switch the scheduler mode, but GPU-wise it seems OK. These are the statistics:

# GPU  Session  Process  Codec  H     V     Average  Average
# Idx  Id       Id       Type   Res   Res   FPS      Latency(us)
  1    11       996      H.264  1710  1008  92       4469
  1    11       996      H.264  1710  1008  77       5875
  1    11       996      H.264  1710  1008  82       4556
  1    11       996      H.264  1710  1008  73       4962
  1    11       996      H.264  1710  1008  71       4394
  1    11       996      H.264  1710  1008  80       4421
  1    11       996      H.264  1710  1008  82       4129
  1    11       996      H.264  1710  1008  93       3806
  1    11       996      H.264  1710  1008  83       4308
  1    11       996      H.264  1710  1008  76       4450


So it seems that encoding is fine. Isn't 80-90 FPS a little too much?
Decoding should also be fine, as YouTube doesn't report dropped frames.
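To put numbers on "encoding is fine", the samples can be summarized with a few lines of Python. This is a minimal sketch that assumes the in-VM column layout shown in the post (the last two fields are average FPS and average latency in microseconds). The function names are illustrative, not part of any NVIDIA tool.

```python
from statistics import mean

# The ten samples from the post above. Columns: GPU idx, session id,
# process id, codec, H res, V res, average FPS, average latency (us).
SAMPLES = """\
1 11 996 H.264 1710 1008 92 4469
1 11 996 H.264 1710 1008 77 5875
1 11 996 H.264 1710 1008 82 4556
1 11 996 H.264 1710 1008 73 4962
1 11 996 H.264 1710 1008 71 4394
1 11 996 H.264 1710 1008 80 4421
1 11 996 H.264 1710 1008 82 4129
1 11 996 H.264 1710 1008 93 3806
1 11 996 H.264 1710 1008 83 4308
1 11 996 H.264 1710 1008 76 4450
"""


def parse_rows(text):
    """Return (fps, latency_us) pairs, skipping blank and '#' header lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        rows.append((int(fields[-2]), int(fields[-1])))
    return rows


def summarize(rows):
    """Basic statistics over the parsed samples."""
    fps = [r[0] for r in rows]
    latency = [r[1] for r in rows]
    return {
        "samples": len(rows),
        "fps_min": min(fps),
        "fps_max": max(fps),
        "fps_mean": mean(fps),
        "latency_mean_us": mean(latency),
    }


if __name__ == "__main__":
    print(summarize(parse_rows(SAMPLES)))
```

For these samples the mean comes out at about 81 FPS with a mean encode latency of about 4.5 ms.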

Regards

Cristiano

#6
Posted 03/05/2018 09:14 AM   
Hi Cristiano,

It depends on the scheduler. The default scheduler has no frame rate limiter (FRL), so it renders more than 60 fps.
You can switch to best effort to get FRL = 60 fps, which is more than sufficient. And I agree that the GPU cannot be the issue in your case, as we render more than enough frames :)
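A quick back-of-the-envelope check supports this. The sketch below compares the time available per frame at 60 fps with the mean encode latency of the ten samples posted earlier in the thread (4537 us). It assumes frames are encoded one at a time, which is a simplification and not something the thread states.

```python
FRL_FPS = 60                   # frame rate limit quoted in the reply above
MEAN_ENCODE_LATENCY_US = 4537  # mean of the ten latency samples posted earlier

# Time available for each frame at the limit, in microseconds.
frame_budget_us = 1_000_000 / FRL_FPS

# Share of that budget consumed by encoding one frame.
budget_used = MEAN_ENCODE_LATENCY_US / frame_budget_us

# Rough ceiling if frames were encoded back to back (assumption: serial encode).
encoder_ceiling_fps = 1_000_000 / MEAN_ENCODE_LATENCY_US

if __name__ == "__main__":
    print(f"frame budget: {frame_budget_us:.0f} us")
    print(f"budget used by encode: {budget_used:.0%}")
    print(f"encoder ceiling: {encoder_ceiling_fps:.0f} fps")
```

Encoding uses roughly a quarter of the 60 fps frame budget, so the encoder has ample headroom.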

Regards

Simon

#7
Posted 03/05/2018 11:56 AM   
I have had the same problem for many years (gracefully ignored by NVIDIA). It is not a problem of the video decoder (NVDEC) or the 3D renderer (DX/OpenGL); the problem is in the interaction between the compositing window manager (DWM.exe, with a composer like Aero or newer) and the public NVIDIA Capture SDK. It simply does not trigger the "new frame" event in the public NVIDIA Capture SDK (functions like NvFBCToSysGrabFrame()), and the rendered frame is lost.
I do not know which version of the public or NDA NVIDIA Capture SDK is used in your Blast Extreme version. I do not have any usable version of the public NVIDIA Capture SDK because the K1/K2/K520/K340 are no longer supported. Check the FPS problems (also shown in a video) at https://gridforums.nvidia.com/default/topic/1149/#4247.

#8
Posted 03/05/2018 06:30 PM   
Try using the GRID 5.1 drivers instead.

#9
Posted 03/10/2018 01:40 AM   
I saw better results in the latest GRID release when I set the parameter "sw_vsync_enabled=1" for the virtual machine (see https://gridforums.nvidia.com/default/topic/258/). Try it ...

#10
Posted 03/11/2018 08:21 AM   
I'm experiencing the same problem that Cristiano mentioned before.

We have Tesla P40 GPUs in a Horizon 7.4.0 environment on ESXi 6.5 U1 hosts (HPE DL380 G10).
I've already switched the scheduler to best effort, but did not see a difference.
I am using two Full HD monitors and the Blast Extreme protocol.
The vGPU profile I am using is "grid_p40-2q".

This is the output I get from the hypervisor shell when I run "nvidia-smi vgpu -es".
It was generated while playing a YouTube video in Firefox.
During the first "7 fps" lines the video had not started yet; it started later, when the FPS counters rose.
It seems like the GPU is rendering way too many frames, but I thought it was limited by the best effort scheduler...

# GPU  vGPU    Session  Process  Codec  H     V     Average  Average
# Idx  Id      Id       Id       Type   Res   Res   FPS      Latency(us)
  0    230381  34       2264     H.264  1920  1080  7        333
  0    230381  35       2264     H.264  1920  1080  7        443
  0    230381  34       2264     H.264  1920  1080  7        333
  0    230381  35       2264     H.264  1920  1080  7        339
  0    230381  34       2264     H.264  1920  1080  7        332
  0    230381  35       2264     H.264  1920  1080  7        348
  0    230381  34       2264     H.264  1920  1080  15       958
  0    230381  35       2264     H.264  1920  1080  8        838
  0    230381  34       2264     H.264  1920  1080  126      735
  0    230381  35       2264     H.264  1920  1080  43       1253
  0    230381  34       2264     H.264  1920  1080  1188     646
  0    230381  35       2264     H.264  1920  1080  30       516
  0    230381  34       2264     H.264  1920  1080  122      706
  0    230381  35       2264     H.264  1920  1080  27       475
  0    230381  34       2264     H.264  1920  1080  108      1160
  0    230381  35       2264     H.264  1920  1080  12       473
  0    230381  34       2264     H.264  1920  1080  78       1234
  0    230381  35       2264     H.264  1920  1080  7        279
  0    230381  34       2264     H.264  1920  1080  58       1167
  0    230381  35       2264     H.264  1920  1080  7        229
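Two session IDs (34 and 35) alternate in this output. With two monitors attached it is plausible that each session belongs to one display, but the thread does not confirm this, so treat it as an assumption. Splitting the samples per session makes the pattern easier to read. The sketch below assumes the hypervisor column layout shown above, where the third field is the session ID. The function name is illustrative.

```python
from collections import defaultdict
from statistics import mean

# Samples from the post above (hypervisor layout). Columns: GPU idx, vGPU id,
# session id, process id, codec, H res, V res, average FPS, average latency (us).
SAMPLES = """\
0 230381 34 2264 H.264 1920 1080 7 333
0 230381 35 2264 H.264 1920 1080 7 443
0 230381 34 2264 H.264 1920 1080 7 333
0 230381 35 2264 H.264 1920 1080 7 339
0 230381 34 2264 H.264 1920 1080 7 332
0 230381 35 2264 H.264 1920 1080 7 348
0 230381 34 2264 H.264 1920 1080 15 958
0 230381 35 2264 H.264 1920 1080 8 838
0 230381 34 2264 H.264 1920 1080 126 735
0 230381 35 2264 H.264 1920 1080 43 1253
0 230381 34 2264 H.264 1920 1080 1188 646
0 230381 35 2264 H.264 1920 1080 30 516
0 230381 34 2264 H.264 1920 1080 122 706
0 230381 35 2264 H.264 1920 1080 27 475
0 230381 34 2264 H.264 1920 1080 108 1160
0 230381 35 2264 H.264 1920 1080 12 473
0 230381 34 2264 H.264 1920 1080 78 1234
0 230381 35 2264 H.264 1920 1080 7 279
0 230381 34 2264 H.264 1920 1080 58 1167
0 230381 35 2264 H.264 1920 1080 7 229
"""


def fps_by_session(text):
    """Map session id -> list of average-FPS samples, in order of appearance."""
    sessions = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        session_id = int(fields[2])   # third column in the hypervisor layout
        sessions[session_id].append(int(fields[-2]))
    return dict(sessions)


if __name__ == "__main__":
    for session_id, fps in sorted(fps_by_session(SAMPLES).items()):
        print(f"session {session_id}: max {max(fps)} fps, mean {mean(fps):.1f} fps")
```

Only session 34 shows the very high rates; session 35 never goes above 43 fps in these samples.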


Any advice on how to fix this?

#11
Posted 06/07/2018 09:38 AM   
Hi prinz,

Could you run "nvidia-smi encodersessions" within the VM?

For me it works as expected with P40-2Q and best effort when playing a Full HD YouTube video in full screen:

# GPU  Session  Process  Codec  H     V     Average  Average
# Idx  Id       Id       Type   Res   Res   FPS      Latency(us)
  0    6        1516     H.264  1920  1200  59       2664
  0    6        1516     H.264  1920  1200  37       2762
  0    6        1516     H.264  1920  1200  56       2965
  0    6        1516     H.264  1920  1200  58       2887
  0    6        1516     H.264  1920  1200  60       2851
  0    6        1516     H.264  1920  1200  59       3063
  0    6        1516     H.264  1920  1200  60       2993
  0    6        1516     H.264  1920  1200  60       3104

Regards

Simon

#12
Posted 06/08/2018 08:54 AM   
I wonder about the effectiveness of the 1Q profile for full 4K video. I've been looking at the difference between P6-1Q, P6-2Q and P6-4Q profiles and video performance. 4Q is super smooth, but with 2Q there is roughly a 25% drop in performance and with a 1Q profile a further 25% drop from there.

I have default scheduler settings and have put this down to simple frame buffer capacity but something doesn't sit right with me in that the software encoding engine (Blast in this case) is still doing the same number of pixels (the same video is being used) so why should the frame buffer make such a difference?

#13
Posted 06/08/2018 07:04 PM   
Hello Simon,

I ran the command "nvidia-smi encodersessions" within the VM while playing a 1080p YouTube video and got this output:

# GPU  Session  Process  Codec  H     V     Average  Average
# Idx  Id       Id       Type   Res   Res   FPS      Latency(us)
  0    9        1056     H.264  1920  1080  0        0
  0    10       1056     H.264  1920  1080  31       840
  0    9        1056     H.264  1920  1080  0        0
  0    10       1056     H.264  1920  1080  35       768
  0    9        1056     H.264  1920  1080  0        0
  0    10       1056     H.264  1920  1080  31       805
  0    9        1056     H.264  1920  1080  696      1232
  0    10       1056     H.264  1920  1080  31       703
  0    9        1056     H.264  1920  1080  36       741
  0    9        1056     H.264  1920  1080  143      1190
  0    10       1056     H.264  1920  1080  32       708
  0    9        1056     H.264  1920  1080  722      1113
  0    10       1056     H.264  1920  1080  32       750
  0    9        1056     H.264  1920  1080  119      1137
  0    10       1056     H.264  1920  1080  36       781
  0    9        1056     H.264  1920  1080  145      1144
  0    10       1056     H.264  1920  1080  38       811
  0    9        1056     H.264  1920  1080  116      1170
  0    10       1056     H.264  1920  1080  34       723

If I connect to the desktop pool with only one active monitor and watch the exact same video in full screen, I get this output:

# GPU  Session  Process  Codec  H     V     Average  Average
# Idx  Id       Id       Type   Res   Res   FPS      Latency(us)
  0    7        13076    H.264  1920  1080  740      1339
  0    7        13076    H.264  1920  1080  749      1324
  0    7        13076    H.264  1920  1080  0        0
  0    7        13076    H.264  1920  1080  110      999
  0    7        13076    H.264  1920  1080  130      1201
  0    7        13076    H.264  1920  1080  0        0
  0    7        13076    H.264  1920  1080  0        0
  0    7        13076    H.264  1920  1080  0        0
  0    7        13076    H.264  1920  1080  0        0
  0    7        13076    H.264  1920  1080  0        0
  0    7        13076    H.264  1920  1080  0        0
  0    7        13076    H.264  1920  1080  0        0
  0    7        13076    H.264  1920  1080  0        0
  0    7        13076    H.264  1920  1080  0        0
  0    7        13076    H.264  1920  1080  0        0
  0    7        13076    H.264  1920  1080  121      1111
  0    7        13076    H.264  1920  1080  256      3903
  0    7        13076    H.264  1920  1080  0        0
  0    7        13076    H.264  1920  1080  0        0
  0    7        13076    H.264  1920  1080  0        0

On the hypervisor it looks like this when I play the video with one active monitor:

# GPU  vGPU    Session  Process  Codec  H     V     Average  Average
# Idx  Id      Id       Id       Type   Res   Res   FPS      Latency(us)
  0    855153  17       13076    H.264  1920  1080  115      1469
  0    855153  17       13076    H.264  1920  1080  115      1372
  0    855153  17       13076    H.264  1920  1080  144      6922
  0    855153  17       13076    H.264  1920  1080  128      1252
  0    855153  17       13076    H.264  1920  1080  120      1168
  0    855153  17       13076    H.264  1920  1080  116      1657
  0    855153  17       13076    H.264  1920  1080  1736     135
  0    855153  17       13076    H.264  1920  1080  115      1240
  0    855153  17       13076    H.264  1920  1080  122      1337
  0    855153  17       13076    H.264  1920  1080  124      1346

If I run "nvidia-smi vgpu -q" on the hypervisor, I can see the line "Frame Rate Limit: 60FPS", but it does not seem to limit anything, if I understand this correctly...
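The mismatch between the reported limit and the measured rates can be checked mechanically. The sketch below parses the one line quoted from "nvidia-smi vgpu -q" and compares it with the ten hypervisor FPS samples listed just above. Only that single line of the query output appears in the thread, so the regular expression is an assumption about its format, and the function name is illustrative.

```python
import re

# The only line of the "nvidia-smi vgpu -q" output quoted in the post.
QUERY_LINE = "Frame Rate Limit: 60FPS"

# Average FPS column of the ten hypervisor samples above (one active monitor).
OBSERVED_FPS = [115, 115, 144, 128, 120, 116, 1736, 115, 122, 124]


def parse_frame_rate_limit(text):
    """Return the limit in FPS, or None if no such line is found."""
    match = re.search(r"Frame Rate Limit\s*:\s*(\d+)\s*FPS", text)
    return int(match.group(1)) if match else None


def samples_over_limit(samples, limit):
    """Return the samples that exceed the reported limit."""
    return [fps for fps in samples if fps > limit]


if __name__ == "__main__":
    limit = parse_frame_rate_limit(QUERY_LINE)
    over = samples_over_limit(OBSERVED_FPS, limit)
    print(f"limit {limit} fps, {len(over)} of {len(OBSERVED_FPS)} samples above it")
```

Every one of the ten samples is above the reported limit, which matches the impression that the limiter is not taking effect.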

Regards,
Dominik

#14
Posted 06/11/2018 05:15 AM   
Hi Dominik,

I agree, this looks like you don't have FRL in place. Could you please double-check that you're running the best effort scheduler? Try the latest GRID 6.1 package and you should have best effort by default...

Regards

Simon

#15
Posted 06/17/2018 07:24 AM   