NVIDIA
Dell 730 with Single Tesla - Receiving ESXi Purple Screen
Hello,
We are currently encountering ESXi purple screens when using the M60 0Q profile to run multiple video streams across multiple Windows 7 VMs (6 streams per desktop). Each desktop (Windows VM) is running the GRID 2.0 Windows driver (a separate NVIDIA post states that this is the correct driver to use). The purple screen occurs after the 12 streams have been running for about 5 minutes. We have also seen it occur sooner (within minutes) when running 18 streams across three desktops.

Thanks,
MHL

#1
Posted 03/16/2016 11:55 PM   
Hello MHL,

Can you please explain which application you are using, what a stream is in this context, and what your build looks like?

Thanks,
Erik

Erik Bohnhorst | GRID Performance Architect
NVIDIA Corporation

#2
Posted 03/17/2016 11:13 PM   
Hello,
Yes, we are using Genetec Security Center 5.3 to generate the video streams. We receive the purple screen when receiving 12 streams across two thin clients (6 streams per Windows 7 client).

Here is some additional data from the dump files:

2016-03-16T22:07:54.568Z cpu0:36485)WARNING: NMI: 911: NMI received; attempting to diagnose...
NVRM: GPU at 0000:84:00.0 has fallen off the bus.
NVRM: GPU is on Board 0323015037010.
2016-03-16T22:07:54.568Z cpu0:36485)World: 9729: PRDA 0x418040000000 ss 0x4018 ds 0x4018 es 0x4018 fs 0x0 gs 0x0
2016-03-16T22:07:54.568Z cpu0:36485)World: 9731: TR 0x4000 GDT 0xfffffffffc60a000 (0xffff) IDT 0xfffffffffc608000 (0xffff)
2016-03-16T22:07:54.568Z cpu0:36485)World: 9732: CR0 0x8005003b CR3 0x20feb42000 CR4 0x42660
NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
NVRM: GPU at 0000:85:00.0 has fallen off the bus.
NVRM: GPU is on Board 0323015037010.
NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.

We initially encountered the purple screen using GRID 2.1 (host driver NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver_352.70-1OEM.600.0.0.2494585.vib and Windows driver 354.56_grid_win8_win7_64bit_international.exe). We then tried the GRID 2.0 Windows driver (354.13_grid_win8_win7_64bit_international.exe) with the same VIB and encountered the same purple screen.
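
When comparing several crash dumps, the NVRM lines quoted above can be pulled out mechanically. The following is only a minimal Python sketch, assuming the log text follows the format shown in this post; the function name `find_fallen_gpus` and the `sample` string are hypothetical and not part of any NVIDIA or VMware tool.

```python
import re

# Matches lines such as "NVRM: GPU at 0000:84:00.0 has fallen off the bus."
# and captures the PCI address (domain:bus:device.function).
FALLEN_RE = re.compile(
    r"NVRM: GPU at "
    r"([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
    r" has fallen off the bus"
)


def find_fallen_gpus(log_text):
    """Return the PCI addresses reported as fallen off the bus.

    Addresses are returned in order of first appearance, without duplicates.
    """
    seen = []
    for match in FALLEN_RE.finditer(log_text):
        addr = match.group(1)
        if addr not in seen:
            seen.append(addr)
    return seen


# Sample input copied from the NVRM lines in the dump above.
sample = """\
NVRM: GPU at 0000:84:00.0 has fallen off the bus.
NVRM: GPU is on Board 0323015037010.
NVRM: GPU at 0000:85:00.0 has fallen off the bus.
NVRM: GPU is on Board 0323015037010.
"""

print(find_fallen_gpus(sample))
```

In the dump quoted here, both addresses (0000:84:00.0 and 0000:85:00.0) are reported on the same board number, so both GPUs on the board dropped off the bus in the same event.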

#3
Posted 03/21/2016 07:57 PM   
Encountered another purple screen with 14 streams across 2 thin clients (screenshot attached). Should we update to GRID 2.2?

Thanks,
Marty

#4
Posted 03/22/2016 08:02 PM   
As per Erik's request, can you describe your build, what VDI stack you're using, the VM config, etc.?

Why did you select the 0Q profile? Does the same issue occur with a 1Q or larger profile?

What does 14 streams across 2 thin clients mean?

14 video streams being decoded in 2 VMs with 0Q profiles, with each VM then being accessed remotely from a thin client?
14 discrete VMs with 0Q profiles being delivered to 2 thin clients?
Something else?

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#5
Posted 03/24/2016 08:37 AM   
1. As per Erik's request, can you describe your build, what VDI stack you're using, VM config etc.
a. I assume you are referring to the ESXi build? If so, ESXi 6.0.0, 3380124. Each VM has 8 GB of memory with 6 vCPUs (1 socket / 6 cores). The M60 has been added to each VM as a shared PCI device via the VMware vSphere 6.0 web client. I'm not sure what you mean by "VDI stack".

2. Why did you select the 0Q profile? Does the same issue occur with a 1Q or larger profile?
a. We selected 0Q because it (along with 0B) allows the greatest number of vGPUs for the entire board (32). We are trying to determine the maximum number of video streams that can be reached across the maximum number of VMs. No, we have not tried other profiles.

3. What does 14 streams across 2 thin clients mean?
a. It means each thin client (running Windows 7 Enterprise) is attached to 7 surveillance cameras, each streaming its own surveillance video via Genetec Security Center version 5.3.

#6
Posted 03/24/2016 02:56 PM   
Just checking: are there any updates on my responses above? Thanks. Marty

#7
Posted 04/01/2016 02:34 PM   
MHL,

These forums are not staffed full time and are not a formal route to support, so responses take time. For GRID 2.0 you have the support portal that you gained access to when you purchased the licenses. This forum is maintained by engineers, developers and architects across NVIDIA who answer questions if and when they have time. If you have urgent support needs, you should follow the support route.

To follow up on your responses.

MHL said:
1. As per Erik's request, can you describe your build, what VDI stack you're using, VM config etc.
a. I assume you are referring to the ESXi build? If so, ESXi 6.0.0, 3380124. Each VM has 8 GB of memory with 6 vCPUs (1 socket / 6 cores). The M60 has been added to each VM as a shared PCI device via the VMware vSphere 6.0 web client. I'm not sure what you mean by "VDI stack".


You haven't told us what remoting protocol you are using to access the VM. Is it Horizon View, RDP, RemoteFX or something else?


MHL said:
2. Why did you select the 0Q profile? Does the same issue occur with a 1Q or larger profile?
a. We selected 0Q because it (in addition to 0b), allows the greatest number of vGPUs for the entire board (32). We are trying to determine the maximum number of video streams that can be reached across the maximum number of VMs. No, we have not tried other profiles.


You've selected the smallest profile, with 512 MB of frame buffer, and are attempting to run 6 streams per VM. It's likely that the resources allocated to the VM are inadequate. My first recommendations are:
1. Reduce the number of streams
2. Increase the available frame buffer.

The 512 MB profile is really only aimed at users of core desktop applications such as MS Office, web browsers and the like. It's not intended for applications that demand more from the GPU.

MHL said:
3. What does 14 streams across 2 thin clients mean?
a. It means each thin client (running Windows 7 Enterprise), is attached to 7 surveillance cameras that are each streaming their own surveillance video stream, via Genetec Security Center version 5.3.


The system requirements for the application suggest at least a K620 with 2 GB of graphics memory, which indicates that the application is more demanding of graphics memory / frame buffer than of GPU resource.

"Minimum of 2 GB of video RAM recommended."

I'd suggest, based on this and your experiences above, increasing to the 2Q profile and repeating the test.
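
As a rough illustration of the arithmetic behind this suggestion, here is a minimal Python sketch. It assumes the board total is the 32 vGPUs x 512 MB mentioned earlier in the thread, that a 2Q profile provides 2048 MB (matching the 2 GB recommendation), and that frame buffer is shared evenly between streams. The even split is a simplification for illustration only; actual usage depends on the application. The function names are hypothetical.

```python
# Board total inferred from the thread: 32 vGPUs at 512 MB each (assumption).
BOARD_FB_MB = 32 * 512


def vgpus_per_board(profile_fb_mb):
    """Number of vGPUs of a given frame buffer size that fit on the board."""
    return BOARD_FB_MB // profile_fb_mb


def fb_per_stream_mb(profile_fb_mb, streams):
    """Frame buffer per stream, assuming an even split (a simplification)."""
    return profile_fb_mb / streams


# 0Q = 512 MB per the thread; 2Q = 2048 MB is an assumption (see lead-in).
for name, fb_mb in (("0Q", 512), ("2Q", 2048)):
    print(name, vgpus_per_board(fb_mb), round(fb_per_stream_mb(fb_mb, 6), 1))
```

Under these assumptions, moving from 0Q to 2Q cuts the number of VMs per board from 32 to 8, but each of the 6 streams in a VM has roughly four times the frame buffer available.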

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#8
Posted 04/03/2016 10:06 AM   
Thanks. We will go down the support portal route.

#9
Posted 04/05/2016 04:19 PM   
Did you test with the application's recommended configuration of 2 GB of frame buffer?

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#10
Posted 04/05/2016 08:09 PM   
It's worth checking the known issues in the KB that could cause a PSOD (purple screen of death) when these occur: http://nvidia.custhelp.com/app/answers/list/st/5/kw/grid%20psod/page/1

#11
Posted 05/25/2016 08:20 PM   