NVIDIA P40 and gpumodeswitch
The documentation seems to indicate that the P40 does not require a gpumodeswitch. However, after installing the NVIDIA GRID VIB, I see the following (dmesg):

2017-11-07T00:15:17.689Z cpu15:69668)NVRM: loading NVIDIA UNIX x86_64 Kernel Module 384.73 Mon Aug 21 15:16:25 PDT 2017
2017-11-07T00:15:17.689Z cpu15:69668)
2017-11-07T00:15:17.689Z cpu15:69668)Device: 191: Registered driver 'nvidia' from 91
2017-11-07T00:15:17.690Z cpu15:69668)Mod: 4968: Initialization of nvidia succeeded with module ID 91.
2017-11-07T00:15:17.690Z cpu15:69668)nvidia loaded successfully.
2017-11-07T00:15:17.691Z cpu13:66219)IOMMU: 2176: Device 0000:3b:00.0 placed in new domain 0x4304cc3e8af0.
2017-11-07T00:15:17.691Z cpu13:66219)DMA: 945: Protecting DMA engine 'NVIDIADmaEngine'. Putting parent PCI device 0000:3b:00.0 in IOMMU domain 0x4304cc3e8af0.
2017-11-07T00:15:17.691Z cpu13:66219)DMA: 646: DMA Engine 'NVIDIADmaEngine' created using mapper 'DMAIOMMU'.
2017-11-07T00:15:17.691Z cpu13:66219)NVRM: This is a 64-bit BAR mapped above 16 TB by the system
NVRM: BIOS or the VMware ESXi kernel. This PCI I/O region assigned
NVRM: to your NVIDIA device is not supported by the kernel.
NVRM: BAR1 is 32768M @ 0x3820$

This is with vSphere 6.5 Enterprise Plus. I am unable to install the gpumodeswitch VIB to even try it out...

Am I missing a step on the install?
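
For reference, this is roughly the install sequence I used for the GRID VIB (the filename below is just a placeholder for the actual driver package):

# put the host into maintenance mode, install the host driver VIB, then reboot
esxcli system maintenanceMode set --enable true
esxcli software vib install -v /vmfs/volumes/datastore1/NVIDIA-VMware_ESXi_6.5_Host_Driver_384.73.vib
reboot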

#1
Posted 11/07/2017 01:23 AM   
Check your BIOS settings. You need to modify the IOMMU settings to support the big BAR size of 8GB...

Which hardware are you using? Check with the OEM.
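
Once the BIOS is changed, you can verify from the ESXi shell that the BAR warning is gone, for example:

# after the reboot, the NVRM BAR error should no longer appear in the vmkernel log
dmesg | grep NVRM
# and the driver should now see the card
nvidia-smi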

#2
Posted 11/07/2017 07:57 PM   
Thank you for pointing me in the right direction. On a Dell R740xd server, I had to do the following in the BIOS -> Integrated Devices section:

SR-IOV Global Enable -> Enabled (Default was Disabled)
Memory Mapped I/O Base -> 512 GB (Default was 56 TB)

I appear to be on my way, as nvidia-smi is returning values now.
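
For anyone hitting this later, a quick way to double-check that the BAR1 region is now mapped properly is to query the memory sections with nvidia-smi, e.g.:

# -q -d MEMORY prints the FB and BAR1 memory usage for each GPU
nvidia-smi -q -d MEMORY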

#3
Posted 11/08/2017 06:53 AM   
Perfect. Thanks for sharing. I'm sure other customers will run into the same issue with this new hardware. I will try to get in contact with Dell to make this the default config for GPU-enabled systems.

Regards

Simon

#4
Posted 11/08/2017 01:00 PM   
Thank you for this. I had exactly the same problem with a Dell R740 and an NVIDIA Tesla P40.

#5
Posted 05/23/2018 09:18 AM   
Thank you for sharing. I had the same problem and solved it with the above BIOS changes.
After changing:
SR-IOV Global Enable to Enabled
Memory Mapped I/O Base to 512 GB
the server recognizes the GPU card and the nvidia-smi command works fine.
But when trying to power on a VM with a shared PCI device, I get the following error:
could not initialize plugin '/usr/lib64/vmware/plugin/libnvidia-vgx.so' for vgpu 'grid_p40-2q'

Has anyone else encountered this issue?

#6
Posted 07/26/2018 12:11 PM   
You need to disable ECC on the P40 first.
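
In case it helps, the usual way to do that from the ESXi shell is something like:

# check the current ECC state, then disable ECC on all GPUs and reboot the host
nvidia-smi -q | grep -i ecc
nvidia-smi -e 0
reboot

The -e 0 change only takes effect after a reboot; use -i <index> if you only want to change one GPU.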

#7
Posted 07/26/2018 12:39 PM   
Yeah... you're right, got it now. Trying...
Thank you

#8
Posted 07/26/2018 12:45 PM   