Is it possible to present multiple vGPUs to a single VM from a Tesla T4 card on ESXi 6.7?
Hi guys, does anyone know why I can't present more than one vGPU to my Server 2019 VM (VM hardware version 15)?

I can only add one and the VM starts fine; if I add a second vGPU, the VM fails to start with the error: Could not initialize plugin '/usr/lib64/vmware/plugin/libnvidia-vgx.so' for vGPU 'grid_t4-4c'.

Spec background: 1 x Tesla T4 16GB card in a Dell VxRail V570, vCenter 6.7 with an Enterprise Plus license, running a single 64GB-memory VM (Server 2019). The GRID driver installed successfully. The host graphics device is set to Shared Direct (vendor shared passthrough graphics).
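
For reference, the same setting can be checked from the ESXi shell; as far as I know these commands report the host's default graphics type and which devices are in shared passthrough mode (exact output fields may vary by build):

[root@vxesxi5:~] esxcli graphics host get
[root@vxesxi5:~] esxcli graphics device list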

Host ECC has been disabled. The VM has these two settings set (toggling them on or off makes no difference):
pciPassthru.use64bitMMIO=TRUE
pciPassthru.64bitMMIOSizeGB=64
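
For context, this is roughly what the relevant advanced settings end up looking like in the VM's .vmx once the vGPUs are added as Shared PCI devices; the second device below is the one that fails to start. Key names are from memory, so treat this as a sketch rather than a copy of my file:

pciPassthru.use64bitMMIO = "TRUE"
pciPassthru.64bitMMIOSizeGB = "64"
pciPassthru0.present = "TRUE"
pciPassthru0.vgpu = "grid_t4-4c"
pciPassthru1.present = "TRUE"
pciPassthru1.vgpu = "grid_t4-4c"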

Below is the output of a few of our favourite commands.


[root@vxesxi5:~] nvidia-smi -i 00000000:3B:00.0 -e 0
ECC support is already Disabled for GPU 00000000:3B:00.0.
All done.
[root@vxesxi5:~] nvidia-smi
Wed Jul 8 09:56:45 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.87 Driver Version: 440.87 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:3B:00.0 Off | Off |
| N/A 38C P8 17W / 70W | 79MiB / 16383MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
[root@vxesxi5:~] esxcli software vib list | grep -i nvidia
NVIDIA-VMware_ESXi_6.7_Host_Driver 440.87-1OEM.670.0.0.8169922 NVIDIA VMwareAccepted 2020-07-08
[root@vxesxi5:~] lspci -n | grep 10de
0000:3b:00.0 Class 0302: 10de:1eb8 [vmgfx0]

Any help appreciated, this is doing my head in! I understood this card could present up to 4 vGPUs to a single VM.

#1
Posted 07/08/2020 11:33 AM   
[root@vxesxi5:~] nvidia-smi -q

==============NVSMI LOG==============

Timestamp : Wed Jul 8 11:36:19 2020
Driver Version : 440.87
CUDA Version : Not Found

Attached GPUs : 1
GPU 00000000:3B:00.0
Product Name : Tesla T4
Product Brand : Tesla
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1561120009254
GPU UUID : GPU-166df7a5-7a83-f1ac-bc58-313305b331d5
Minor Number : 0
VBIOS Version : 90.04.38.00.03
MultiGPU Board : No
Board ID : 0x3b00
GPU Part Number : 900-2G183-0100-001
Inforom Version
Image Version : G183.0200.00.02
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : Host VGPU
Host VGPU Mode : Non SR-IOV
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x3B
Device : 0x00
Domain : 0x0000
Device Id : 0x1EB810DE
Bus Id : 00000000:3B:00.0
Sub System Id : 0x12A210DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 16383 MiB
Used : 86 MiB
Free : 16297 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Disabled
Pending : Disabled
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Temperature
GPU Current Temp : 38 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : 85 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 17.26 W
Power Limit : 70.00 W
Default Power Limit : 70.00 W
Enforced Power Limit : 70.00 W
Min Power Limit : 60.00 W
Max Power Limit : 70.00 W
Clocks
Graphics : 300 MHz
SM : 300 MHz
Memory : 405 MHz
Video : 540 MHz
Applications Clocks
Graphics : 585 MHz
Memory : 5001 MHz
Default Applications Clocks
Graphics : 585 MHz
Memory : 5001 MHz
Max Clocks
Graphics : 1590 MHz
SM : 1590 MHz
Memory : 5001 MHz
Video : 1470 MHz
Max Customer Boost Clocks
Graphics : 1590 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None

#2
Posted 07/08/2020 11:36 AM   
Hi SSD

You need to have multiple physical GPUs to be able to use multi-vGPU. So in your case, you'd need more than one T4 in your server. You could then allocate 2, 3, 4, etc. T4-16* profiles to a single VM.
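
As a rough illustration (assuming two physical T4s in the host, and from what I recall this also needs vSphere 6.7 U3 or later plus the full-GPU profiles), the VM's .vmx would end up with two vGPU entries along these lines; treat the exact key names as a sketch:

pciPassthru0.present = "TRUE"
pciPassthru0.vgpu = "grid_t4-16q"
pciPassthru1.present = "TRUE"
pciPassthru1.vgpu = "grid_t4-16q"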

Even if it were possible, adding more than one vGPU from the same physical GPU wouldn't gain you anything, as it's still the same physical GPU underneath.

Make sure you're not confusing multiple GPUs on a single VM with multiple VMs on a single GPU ...

Regards

MG

#3
Posted 07/08/2020 11:52 AM   
Ah ok, I'm used to the GRID K1 cards having multiple GPUs.

So the solution we have bought can't present multiple vGPUs to a single VM then. That means we either buy more T4 cards to split up the processing, or present the GPU to other VMs to share it, but as you say they will be fighting for contention/resources if both VMs hit it at the same time, correct?
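
If we did go down the shared route, I'm assuming the host-side "nvidia-smi vgpu" subcommand would let us watch per-vGPU utilisation and see that contention (going from the docs rather than something I've run here):

[root@vxesxi5:~] nvidia-smi vgpu
[root@vxesxi5:~] nvidia-smi vgpu -q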

So it sounds like it's not the preferred GPU card for this solution then; can you recommend a card that would do what I'm asking of it?

#4
Posted 07/08/2020 12:22 PM   
Hi MG, so the solution for us is to replace the single Tesla T4 card with 2 x RTX-5000 cards in the host and simply do a direct passthrough assignment, presenting them to the one VM.
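
In case it helps anyone else, my understanding is that with two 16GB cards in passthrough the VM will still need the 64-bit MMIO settings, sized to cover both cards' BARs. The 128 below is only a guess (combined framebuffer rounded up generously) and should be checked against VMware's passthrough guidance before use:

pciPassthru.use64bitMMIO = "TRUE"
pciPassthru.64bitMMIOSizeGB = "128"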

Thanks for your assistance to date.

Regards,
SSD

#5
Posted 07/09/2020 06:11 AM   