NVIDIA
GRID K1 on Dell R720 not detected in BIOS nor VMware, broken?
A new GRID K1 graphics card is not detected by the BIOS when installed. Is it broken?

The setting to disable the Embedded Video Controller is grayed out, indicating that no additional GPU is detected.
In VMware no card is found, and the drivers will not load because no NVIDIA card is present.

The riser card slot was tested with another, simple graphics card, which works perfectly.

The GPU installation kit power cable is connected from the riser card socket to the GPU.
Memory Mapped I/O above 4GB is Disabled.
No errors in iDRAC log. No errors during POST.


Dell support indicates that redundant 1100 W power supplies are needed. Is that really necessary just to detect the card? We have two 900 W supplies now.
Current power usage is 170 W.
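
The power question above can be put into rough numbers. The sketch below is only an illustration: the 130 W figure for the GRID K1 is an assumption taken from NVIDIA's published board power and is not stated in this thread, and the function name is made up for the example. With redundant supplies the budget is taken as a single PSU, since the second one is there for failover.

```python
# Assumption: GRID K1 maximum board power, not a value given in this thread.
K1_BOARD_POWER_W = 130


def power_headroom_watts(psu_watts, current_draw_watts, added_load_watts):
    """Return the watts left on one PSU after adding a new load.

    In a redundant pair only one supply's capacity should be counted.
    """
    return psu_watts - (current_draw_watts + added_load_watts)


if __name__ == "__main__":
    # Values from the post: one 900 W PSU, 170 W current draw.
    print(power_headroom_watts(900, 170, K1_BOARD_POWER_W))
```

Idle draw is not peak draw, so this only suggests that raw wattage is unlikely to be what stops the card from being enumerated. It says nothing about what Dell requires for a supported configuration.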

Info:
Dell PowerEdge R720
CPU 2 x Intel Xeon E5-2680 v2

BIOS version 2.5.2
Firmware 2.20.20.20

Next step is to test the K1 in another R720 server. No high hopes for that to work...

What could be wrong except the graphics card? Please help.

#1
Posted 11/11/2015 12:33 PM   
The R720 does require dual 1100 W PSUs for certification and support. You'll need to resolve that before you can get formal support from either Dell or NVIDIA.


Where in "VMware" are you looking? The card should appear as a resource to be configured for PCI passthrough, unless you've installed the vGPU .vib.

If you've installed the vGPU .vib, have you followed the troubleshooting instructions in the documentation, checking that the services are started and whether nvidia-smi provides any output?

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#2
Posted 11/11/2015 02:06 PM   
So, the redundant 1100 W is mostly for making sure all possible configurations will work. I understand and will try to get that solved.

In VMware the command
esxcli hardware pci list -c 0x0300 -m 0xff
should return an NVIDIA graphics card. It does not.
If the hardware can't be recognized by that command, no driver module will load and no resource will be available.
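
As a cross-check that does not depend on the class filter, the full PCI listing can be searched for NVIDIA's PCI vendor ID (0x10de). The following is a minimal sketch, assuming the output of `esxcli hardware pci list` has been saved to a text file and that each device is an unindented address line followed by indented "Key: Value" lines including a "Vendor ID" field. The function names are hypothetical, and the layout is an assumption that may differ between ESXi versions.

```python
NVIDIA_VENDOR_ID = 0x10DE  # PCI vendor ID assigned to NVIDIA


def parse_pci_records(text):
    """Split esxcli-style output into one dict per device.

    Assumed layout: an unindented address line starts a device, and the
    indented "Key: Value" lines that follow belong to it.
    """
    records = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: start of a new device record.
            current = {"Address": line.strip()}
            records.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return records


def find_nvidia_devices(text):
    """Return the records whose Vendor ID field equals 0x10de."""
    found = []
    for record in parse_pci_records(text):
        try:
            vendor = int(record.get("Vendor ID", ""), 16)
        except ValueError:
            continue  # field missing or not a hex number
        if vendor == NVIDIA_VENDOR_ID:
            found.append(record)
    return found
```

An empty result from `find_nvidia_devices` on the unfiltered listing would suggest the card is not being enumerated on the PCI bus at all, which points away from the driver and toward the slot, power, or the card itself.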

nvidia-smi and gpuvm return nothing.

esxcli system module load -m nvidia
returns an error, and the vmkernel log is quite clear that there is no adapter:

2015-11-11T16:23:27.428Z cpu9:106029)Loading module nvidia ...
2015-11-11T16:23:27.433Z cpu9:106029)Elf: 1861: module nvidia has license NVIDIA
2015-11-11T16:23:27.544Z cpu9:106029)module heap: Initial heap size: 8388608, max heap size: 68476928
2015-11-11T16:23:27.544Z cpu9:106029)vmklnx_module_mempool_init: Mempool max 68476928 being used for module: 4205
2015-11-11T16:23:27.544Z cpu9:106029)vmk_MemPoolCreate passed for 2048 pages
2015-11-11T16:23:27.544Z cpu9:106029)module heap: using memType 2
2015-11-11T16:23:27.544Z cpu9:106029)module heap vmklnx_nvidia: creation succeeded. id = 0x411332245000
NVRM: vmk_MemPoolCreate passed for 4194304 pages.
NVRM: No NVIDIA graphics adapter found!
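
For anyone checking several hosts, the same symptom can be picked out of a saved vmkernel log automatically. This is a small sketch that only looks for the NVRM lines shown above. The function names are made up for illustration, and the log text is assumed to have been copied off the host into a string or file first.

```python
import re

# Matches the driver's own messages, which are prefixed with "NVRM:".
NVRM_LINE = re.compile(r"NVRM:\s*(.*)")


def nvrm_messages(log_text):
    """Return the text of every NVRM message found in the log."""
    messages = []
    for line in log_text.splitlines():
        match = NVRM_LINE.search(line)
        if match:
            messages.append(match.group(1).strip())
    return messages


def adapter_missing(log_text):
    """True if the driver reported that it found no NVIDIA adapter."""
    return any(
        "No NVIDIA graphics adapter found" in message
        for message in nvrm_messages(log_text)
    )
```

Calling `adapter_missing` on the excerpt above would flag the host, since the final NVRM line is exactly the message it searches for.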

The error is clearly in hardware, and the errors in VMware all stem from the fact that no card is detected.

Is there any way to examine what is actually present in a PCI slot, and to get its status?
I'm not experienced with that kind of low-level hardware access. I'm sure there is someone out there just waiting to show off mad PCI skills. ;-)

#3
Posted 11/11/2015 03:38 PM   
Card tested in another Dell PowerEdge R720 server with dual 750 W PSUs. Same result.
New 1100 W PSUs requested from the supplier.

#4
Posted 11/12/2015 09:09 AM   
Hi all. We are experiencing the exact same problem and are wondering whether replacing or adding power supplies was the solution for you. We too are on 5.5U3 with a GRID K1 card, attempting to virtualize the GPU. Please advise. Many thanks.

#5
Posted 03/09/2017 03:01 AM   
Regardless of the above or what anyone else decides to do, if you're just getting into GPU virtualisation, it's best to get the basics in place, not cut corners, and do it correctly from the outset.

For a Dell R720 with GPU, you should:

Install 2x 1100 W PSUs.
Install the GPU Enablement Kit (to allow sufficient airflow / cooling).
Update the R720 firmware to latest version.
Update the R720 BIOS to latest version.
Configure BIOS for Maximum Performance.
Configure MMIO accordingly.

If it doesn't work after that, come back to us.

Regards

#6
Posted 04/04/2017 01:44 PM   