vGPU of Tesla T4 not seen on ESX 6.7
Hi,

On an ESX 6.7 host, I installed this driver:
NVIDIA-VMware_ESXi_6.7_Host_Driver-440.53-1OEM.670.0.0.8169922.x86_64.vib

but I can't get the NVIDIA GPU to show up in my VMs,

and the nvidia-smi vgpu command answers:
[root@localhost:/vmfs] nvidia-smi vgpu
Not supported devices in vGPU mode

However, the nvidia-smi command says:
[root@localhost:/vmfs] nvidia-smi
Thu Feb 13 09:36:42 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.53       Driver Version: 440.53       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:5E:00.0 Off |                    0 |
| N/A   37C    P8    17W /  70W |     92MiB / 15359MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0   2104612      G   Xorg                                         5MiB |
+-----------------------------------------------------------------------------+


Could it be a problem with the driver?
If that's the case, is there another driver I have to use?

Thanks for your help!
John

#1
Posted 02/13/2020 09:37 AM   
Which host? Have you checked the vCenter settings for Shared Direct?

#2
Posted 02/14/2020 09:51 AM   
The computer is a Supermicro. Supermicro has already confirmed that it is compatible with the Tesla T4.
The VM runs Windows 10.

Thanks

#3
Posted 03/11/2020 09:27 AM   
"vCenter settings checked for shared direct?" I don't understand. And I thought that vCenter doesn't work with 6.7 (?)

#4
Posted 03/11/2020 11:20 AM   
By the way, the result of the nvidia-smi -q command is:


==============NVSMI LOG==============

Timestamp : Wed Mar 11 16:21:30 2020
Driver Version : 440.53
CUDA Version : Not Found

Attached GPUs : 1
GPU 00000000:5E:00.0
Product Name : Tesla T4
Product Brand : Tesla
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1322419111424
GPU UUID : GPU-34d6d925-61d7-ca33-9c9b-34420d8614c9
Minor Number : 0
VBIOS Version : 90.04.38.00.03
MultiGPU Board : No
Board ID : 0x5e00
GPU Part Number : 900-2G183-0000-001
Inforom Version
Image Version : G183.0200.00.02
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : Host VSGA
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x5E
Device : 0x00
Domain : 0x0000
Device Id : 0x1EB810DE
Bus Id : 00000000:5E:00.0
Sub System Id : 0x12A210DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 15359 MiB
Used : 92 MiB
Free : 15267 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Temperature
GPU Current Temp : 47 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : 85 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 18.36 W
Power Limit : 70.00 W
Default Power Limit : 70.00 W
Enforced Power Limit : 70.00 W
Min Power Limit : 60.00 W
Max Power Limit : 70.00 W
Clocks
Graphics : 300 MHz
SM : 300 MHz
Memory : 405 MHz
Video : 540 MHz
Applications Clocks
Graphics : 585 MHz
Memory : 5001 MHz
Default Applications Clocks
Graphics : 585 MHz
Memory : 5001 MHz
Max Clocks
Graphics : 1590 MHz
SM : 1590 MHz
Memory : 5001 MHz
Video : 1470 MHz
Max Customer Boost Clocks
Graphics : 1590 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 2100289
Type : G
Name : Xorg
Used GPU Memory : 5 MiB
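
The field in this dump that matters for the vGPU question is `Virtualization Mode`, which reads `Host VSGA` here. A minimal sketch of pulling that field out of saved `nvidia-smi -q` text follows. The function names are hypothetical, and treating `Host VGPU` as the value reported once the host is in vGPU mode is an assumption based on NVIDIA's vGPU documentation, not something shown in this thread.

```python
import re


def virtualization_mode(nvidia_smi_q_output):
    """Return the 'Virtualization Mode' value from `nvidia-smi -q` text, or None."""
    match = re.search(
        r"^\s*Virtualization Mode\s*:\s*(.+?)\s*$",
        nvidia_smi_q_output,
        flags=re.MULTILINE,
    )
    return match.group(1) if match else None


def host_in_vgpu_mode(nvidia_smi_q_output):
    """True only if the reported mode is 'Host VGPU' (assumed vGPU-mode value)."""
    mode = virtualization_mode(nvidia_smi_q_output)
    return mode is not None and mode.lower() == "host vgpu"


# Excerpt copied from the log above.
sample = """\
GPU Virtualization Mode
Virtualization Mode : Host VSGA
Host VGPU Mode : N/A
"""

print(virtualization_mode(sample))
print(host_in_vgpu_mode(sample))
```

With the excerpt above, the check reports that the host is not in vGPU mode, which is consistent with `nvidia-smi vgpu` answering "Not supported devices in vGPU mode".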

#5
Posted 03/12/2020 09:25 AM   
You need to change the GPU mode to "Shared Direct" in vCenter. Otherwise it won't work. Please follow our documentation.

#6
Posted 03/12/2020 10:26 AM   
OK, thanks, but I don't know how to do that.

#7
Posted 03/12/2020 11:48 AM   
OK, I get it! (Sorry, a matter of RTFM.)

By the way, is Shared Direct different from the usual passthrough? What we want is a shared vGPU rather than simple passthrough.

All the best

#8
Posted 03/12/2020 01:24 PM   
What we want to do is attach several VMs (at least 8) on one server to a vGPU with the T4_2Q profile, using an NVIDIA GRID vDWS license.
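
As a rough capacity check for this plan, the sketch below divides the board's frame buffer by the profile size. It assumes vGPU slots are carved from the Tesla T4's nominal 16 GB (not the 15359 MiB that nvidia-smi reports above) and that the 2Q profile gives each VM 2 GB. The function names are hypothetical, and the authoritative per-board maximum is the one listed in NVIDIA's vGPU documentation.

```python
T4_NOMINAL_FRAMEBUFFER_GB = 16  # assumption: nominal board memory of a Tesla T4


def vgpus_per_board(profile_gb, board_gb=T4_NOMINAL_FRAMEBUFFER_GB):
    """How many vGPUs of a given frame-buffer size fit on one board."""
    if profile_gb <= 0 or board_gb <= 0:
        raise ValueError("sizes must be positive")
    return board_gb // profile_gb


def boards_needed(vm_count, profile_gb, board_gb=T4_NOMINAL_FRAMEBUFFER_GB):
    """How many boards are needed to give every VM one vGPU of this size."""
    per_board = vgpus_per_board(profile_gb, board_gb)
    if per_board == 0:
        raise ValueError("profile is larger than the board")
    return -(-vm_count // per_board)  # ceiling division


print(vgpus_per_board(2))   # 8 vGPUs with a 2 GB profile
print(boards_needed(8, 2))  # 1 board for 8 VMs
```

Under these assumptions, 8 VMs with a 2 GB profile exactly fill one T4, so a ninth VM would need a second board or a smaller profile.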

#9
Posted 03/12/2020 02:11 PM   
Apparently the virtualization mode isn't active!
How can I enable it?

#10
Posted 03/12/2020 04:01 PM   
Shared direct is the vGPU mode. You should now be able to add a vGPU profile to a VM. Be aware that you must not add Passthrough devices!!!

#11
Posted 03/13/2020 11:52 AM   
The point is that after doing that (deactivating passthrough), when I create a VM, I do not see any vGPU, or I don't know how to add a vGPU profile!
In other words, when I create a new VM, what should I do to add a vGPU?

#12
Posted 03/13/2020 01:58 PM   
Hi,
just follow our documentation: https://docs.nvidia.com/grid/10.0/grid-software-quick-start-guide/index.html#attaching-grid-vgpu-profile-vm
You need to make sure that you have an Enterprise Plus license in place to add vGPU profiles.

Regards
Simon

#13
Posted 03/14/2020 10:23 AM   
The point is that I never see any Shared PCI Device option!
Do I have to activate Shared Direct? And if so, how? (I didn't find a way to do it in the vSphere web UI under the host options.)

#14
Posted 03/14/2020 02:01 PM   
By the way, thanks Simon!

And the nvidia-smi vgpu on the hypervisor says:

[root@localhost:~] nvidia-smi vgpu
Not supported devices in vGPU mode

What is wrong?

#15
Posted 03/14/2020 03:35 PM   