VMware sanity check
Hello all,

I am trying to set up a vGPU cluster using Tesla V100 32GB GPUs on several HP ProLiant DL380 Gen10 servers running ESXi 6.7 U3. So far I have set up vSAN and installed the NVIDIA .vib (version 11.2), and I can successfully run `nvidia-smi` on each host.
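
For reference, the host-side steps looked roughly like the following (the VIB path and filename below are illustrative, not the exact ones I used):

```
# on each ESXi host, with the host in maintenance mode
esxcli software vib install -v /vmfs/volumes/datastore1/NVIDIA-VMware-450.xx.vib
# reboot, then confirm the host driver sees the physical GPUs
nvidia-smi
```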

I am now attempting to configure a VM. I have tried adding a single `V100D-16C` vGPU to the VM and installing the 450.89 GRID driver, with and without DKMS, but I cannot seem to load the kernel module. Dmesg tells me "NVRM: The PCI I/O region assigned to your device is invalid" along with the supposed address of the GPU. I get the same error on Debian Buster and on Ubuntu Server Bionic and Focal.
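
Inside the guest I have been checking along these lines (the PCI address below is illustrative; yours will differ):

```
# does the kernel module load at all?
lsmod | grep nvidia
# look for the NVRM error and the device address it reports
dmesg | grep -i nvrm
# inspect the BARs the guest actually sees for the vGPU device
lspci -vv -s 02:00.0
```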

I have checked the hidden settings in the HP BIOS, and "PCI Express 64-bit BAR Support" is enabled. ECC should be working fine on a "C"-series vGPU.
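
To rule out an ECC mismatch, the ECC state of the physical GPUs can also be queried on each host:

```
# query current and pending ECC mode for the physical GPUs
nvidia-smi -q -d ECC
```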

Is there anything I am missing?

#1
Posted 11/19/2020 10:04 PM   