vSphere 6.7, Linux Guest, V100 vGPU and memory used problem
Overview of my system and environment:

Ubuntu 18.10 VM running on a VMware vSphere 6.7 hypervisor. The host provides one GPU, an NVIDIA Tesla V100-PCIE-32GB (we do not use PCI passthrough; we use NVIDIA vGPU technology). The Ubuntu VM is configured with one "NVIDIA GRID vGPU" device using the grid_v100d-16q profile.

Right after a fresh boot of the Ubuntu VM, 1 GB (of 16 GB) of GPU memory is always shown as used (FB). I cannot find the reason, and no running processes are listed. I have read that some processes may not be listed, but I am very new to NVIDIA vGPU technology and I fear this may be caused by a tricky misconfiguration.

I want to run GPU-accelerated CUDA processes on this Ubuntu VM. I assume the 'used' memory is not available for computation, so I would like to reduce the used FB space.

Question:

Why is 1 GB used for no apparent reason? Where should I look for evidence?

Sub-questions:

Could this memory usage be caused by a wrong Xorg configuration?
Could it be a problem with the selected vGPU profile (16q)?
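For the Xorg sub-question, one quick check is whether any display server is running in the guest at all (the nvidia-smi -q output further down already reports Display Active: Disabled and no processes). This is a minimal sketch that scans /proc/<pid>/comm on Linux; the set of process names is an assumption and should be adjusted for your desktop, and find_display_processes is a hypothetical helper name.

```python
import os
from typing import Dict, Iterable

# Assumed names of display-server processes; adjust for your setup.
DISPLAY_SERVER_NAMES = {"Xorg", "X", "Xwayland", "gnome-shell"}


def find_display_processes(proc_root: str = "/proc",
                           names: Iterable[str] = DISPLAY_SERVER_NAMES) -> Dict[int, str]:
    """Return {pid: command name} for running processes whose comm is in `names`."""
    wanted = set(names)
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/self or /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                comm = fh.read().strip()
        except OSError:
            continue  # process exited meanwhile or comm is not readable
        if comm in wanted:
            found[int(entry)] = comm
    return found
```

On the VM, print(find_display_processes()) returning an empty dict would suggest no X server is holding framebuffer inside the guest.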

Regards

Other details about my current host/VM configuration:

The vSphere and Ubuntu Linux NVIDIA drivers come from the NVIDIA-GRID_vSphere-6.7-418.66-418.70-425.31.zip package.

Installed Linux driver (filename): NVIDIA-GRID_vSphere-6.7-418.66-418.70-425.31/NVIDIA-Linux-x86_64-418.70-grid.run

Rebooting does not solve the issue; 1 GB is always used.

nvidia-smi output for the vSphere host:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.66 Driver Version: 418.66 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... On | 00000000:3B:00.0 Off | Off |
| N/A 34C P0 26W / 250W | 16402MiB / 32767MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2338177 C+G XXX-ubuntu-XXX 16352MiB |
+-----------------------------------------------------------------------------+



nvidia-smi output for Guest Ubuntu VM:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.70 Driver Version: 418.70 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GRID V100D-16Q On | 00000000:02:02.0 Off | N/A |
| N/A N/A P0 N/A / N/A | 1040MiB / 16384MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
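To track the FB numbers from a script instead of reading the table, the guest's nvidia-smi can print just the memory columns. This sketch assumes the standard --query-gpu=memory.total,memory.used,memory.free and --format=csv,noheader,nounits options (values in MiB); FbUsage, parse_fb_usage and query_fb_usage are hypothetical helper names. It avoids capture_output so it also runs on the Python 3.6 shipped with Ubuntu 18.x.

```python
import subprocess
from typing import List, NamedTuple

QUERY = ["nvidia-smi",
         "--query-gpu=memory.total,memory.used,memory.free",
         "--format=csv,noheader,nounits"]


class FbUsage(NamedTuple):
    total_mib: int
    used_mib: int
    free_mib: int

    @property
    def used_percent(self) -> float:
        return 100.0 * self.used_mib / self.total_mib


def parse_fb_usage(csv_text: str) -> List[FbUsage]:
    """Parse one 'total, used, free' line per GPU into FbUsage rows."""
    rows = []
    for line in csv_text.strip().splitlines():
        total, used, free = (int(field.strip()) for field in line.split(","))
        rows.append(FbUsage(total, used, free))
    return rows


def query_fb_usage() -> List[FbUsage]:
    """Run nvidia-smi and return the parsed framebuffer usage per GPU."""
    result = subprocess.run(QUERY, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_fb_usage(result.stdout)
```

With the values from the table above (16384, 1040, 15344), used_percent comes out at about 6.3%.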


The chosen profile for the "NVIDIA GRID vGPU" device is grid_v100d-16q.

This is the relevant part of the gridd.conf file in the Ubuntu VM:

# Description: Set Feature to be enabled
# Data type: integer
# Possible values:
# 0 => for unlicensed state
# 1 => for GRID vGPU
# 2 => for Quadro Virtual Datacenter Workstation
# 4 => for NVIDIA vComputeServer
# All other values reserved
FeatureType=4

# Description: Parameter to enable or disable Grid Licensing tab in nvidia-settings
# Data type: boolean
# Possible values: TRUE or FALSE, default is FALSE
#EnableUI=TRUE

# Description: Set license borrow period in minutes
# Data type: integer
# Possible values: 10 to 10080 mins(7 days), default is 1440 mins(1 day)
#LicenseInterval=1440

# Description: Set license linger period in minutes
# Data type: integer
# Possible values: 0 to 10080 mins(7 days), default is 0 mins
#LingerInterval=10
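Note that this file sets FeatureType=4 (NVIDIA vComputeServer), while the nvidia-smi -q output below reports the licensed product as Quadro Virtual Data Center Workstation. A small parser makes it easy to see which settings are actually active rather than commented out. This is a sketch: the FeatureType mapping is copied from the comments in the file itself, the helper names are hypothetical, and /etc/nvidia/gridd.conf is assumed to be the file's location.

```python
from typing import Dict

# Mapping taken from the comments in the gridd.conf excerpt above.
FEATURE_TYPES = {
    0: "unlicensed state",
    1: "GRID vGPU",
    2: "Quadro Virtual Datacenter Workstation",
    4: "NVIDIA vComputeServer",
}


def parse_gridd_conf(text: str) -> Dict[str, str]:
    """Return active Key=Value settings, skipping comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # commented-out lines such as #EnableUI=TRUE are inactive
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def feature_type_name(settings: Dict[str, str]) -> str:
    """Translate the FeatureType value into the name listed in the file's comments."""
    raw = settings.get("FeatureType")
    if raw is None:
        return "not set"
    return FEATURE_TYPES.get(int(raw), "reserved value")
```

For example, feature_type_name(parse_gridd_conf(open("/etc/nvidia/gridd.conf").read())) can then be compared with the "GRID Licensed Product" section of nvidia-smi -q.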



I have rebooted the VM many times and cannot find any process that consumes GPU memory.

For convenience, here is the full nvidia-smi -q output:

==============NVSMI LOG==============

Timestamp : Wed Jul 3 12:26:42 2019
Driver Version : 418.70
CUDA Version : 10.1

Attached GPUs : 1
GPU 00000000:02:02.0
Product Name : GRID V100D-16Q
Product Brand : Grid
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-0edbdf6b-28c2-11b2-812a-b44a2987914b
Minor Number : 0
VBIOS Version : 00.00.00.00.00
MultiGPU Board : No
Board ID : 0x202
GPU Part Number : N/A
Inforom Version
Image Version : N/A
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : VGPU
GRID Licensed Product
Product Name : Quadro Virtual Data Center Workstation
License Status : Licensed
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x02
Device : 0x02
Domain : 0x0000
Device Id : 0x1DB610DE
Bus Id : 00000000:02:02.0
Sub System Id : 0x12C310DE
GPU Link Info
PCIe Generation
Max : N/A
Current : N/A
Link Width
Max : N/A
Current : N/A
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : N/A
Replay Number Rollovers : N/A
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons : N/A
FB Memory Usage
Total : 16384 MiB
Used : 1040 MiB
Free : 15344 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 0 MiB
Free : 256 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : N/A
GPU Shutdown Temp : N/A
GPU Slowdown Temp : N/A
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : N/A
Power Draw : N/A
Power Limit : N/A
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 135 MHz
SM : 135 MHz
Memory : 877 MHz
Video : 555 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : N/A
SM : N/A
Memory : N/A
Video : N/A
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None

#1
Posted 07/04/2019 09:31 AM   
Why do you think this is an issue?
The operating system itself needs FB to run properly...
I don't see any issue in your description, so I would like to understand it better. I have not tested 18.10 yet, as it is not supported at all, but I don't think this is related to 18.10.

[Edit] A quick test with 18.04 shows the same result: 1050 MB reserved...

regards
Simon

#2
Posted 07/04/2019 10:22 AM   
Thanks,
You are right, the use case is important. In this Ubuntu VM I want to run CUDA-accelerated processes.

I thought that 1 GB used in FB means 1 GB less for my GPU-accelerated processes. Can CUDA-accelerated processes also use this 'used' 1 GB?
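One way to answer this empirically is to ask the CUDA driver how much memory a context can actually see. This is a minimal sketch, assuming the guest driver's libcuda.so.1 is loadable; it calls the driver API functions cuInit, cuDeviceGet, cuCtxCreate_v2, cuMemGetInfo_v2 and cuCtxDestroy_v2 through ctypes, and cuda_mem_info and summarize are hypothetical helper names. Creating the context itself allocates some framebuffer, so the free value is expected to be somewhat lower than the Free figure nvidia-smi shows beforehand.

```python
import ctypes
from typing import Optional, Tuple

MIB = 1024 * 1024


def cuda_mem_info(device_index: int = 0) -> Optional[Tuple[int, int]]:
    """Return (free_bytes, total_bytes) seen by a CUDA context, or None without libcuda."""
    try:
        cuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None  # NVIDIA driver library not installed on this machine

    def check(status: int, call: str) -> None:
        if status != 0:  # 0 is CUDA_SUCCESS
            raise RuntimeError(f"{call} failed with CUresult {status}")

    check(cuda.cuInit(0), "cuInit")
    device = ctypes.c_int()
    check(cuda.cuDeviceGet(ctypes.byref(device), device_index), "cuDeviceGet")
    context = ctypes.c_void_p()
    check(cuda.cuCtxCreate_v2(ctypes.byref(context), 0, device), "cuCtxCreate_v2")
    try:
        free, total = ctypes.c_size_t(), ctypes.c_size_t()
        check(cuda.cuMemGetInfo_v2(ctypes.byref(free), ctypes.byref(total)),
              "cuMemGetInfo_v2")
        return free.value, total.value
    finally:
        cuda.cuCtxDestroy_v2(context)  # release the context and its allocations


def summarize(free_bytes: int, total_bytes: int) -> str:
    """Format the driver's numbers in MiB, like nvidia-smi does."""
    used = total_bytes - free_bytes
    return (f"total {total_bytes // MIB} MiB, free {free_bytes // MIB} MiB, "
            f"in use {used // MIB} MiB")
```

On the VM, info = cuda_mem_info() followed by print(summarize(*info)) when info is not None shows how much of the 16 GB profile CUDA can allocate.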

Also, doesn't 1 GB of framebuffer seem rather high?
Where in the Ubuntu Linux guest configuration can I look to reduce the FB usage?
Regards,

[Edit] Interesting indeed that you see the same reserved memory with Ubuntu 18.04.

#3
Posted 07/04/2019 10:36 AM   