vGPU of Tesla T4 not seen on ESXi 6.7
Please run nvidia-smi without the vgpu argument and post the output.

#16
Posted 03/15/2020 12:36 PM   
Here you are...

[root@localhost:~] nvidia-smi
Mon Mar 16 03:25:04 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.53 Driver Version: 440.53 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:5E:00.0 Off | 0 |
| N/A 38C P8 17W / 70W | 92MiB / 15359MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2100302 G Xorg 5MiB |
+-----------------------------------------------------------------------------+

#17
Posted 03/16/2020 03:23 AM   
One other point:
Due to the pandemic situation in France, we are experiencing some difficulties communicating and obtaining information about this problem, which is becoming urgent!

I should also note that I tried version 440.53 of the driver as well as 430.83 (which, according to NVIDIA's site, is a more recent release than the 440), with exactly the same results!

The output of the nvidia-smi -a command is below; it reports the GPU virtualization mode as Host VSGA, which is not what we want!

==============NVSMI LOG==============

Timestamp : Mon Mar 16 17:24:11 2020
Driver Version : 430.83
CUDA Version : Not Found

Attached GPUs : 1
GPU 00000000:5E:00.0
Product Name : Tesla T4
Product Brand : Tesla
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1322419111424
GPU UUID : GPU-34d6d925-61d7-ca33-9c9b-34420d8614c9
Minor Number : 0
VBIOS Version : 90.04.38.00.03
MultiGPU Board : No
Board ID : 0x5e00
GPU Part Number : 900-2G183-0000-001
Inforom Version
Image Version : G183.0200.00.02
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : Host VSGA
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x5E
Device : 0x00
Domain : 0x0000
Device Id : 0x1EB810DE
Bus Id : 00000000:5E:00.0
Sub System Id : 0x12A210DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 15359 MiB
Used : 92 MiB
Free : 15267 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Temperature
GPU Current Temp : 39 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : 85 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 17.41 W
Power Limit : 70.00 W
Default Power Limit : 70.00 W
Enforced Power Limit : 70.00 W
Min Power Limit : 60.00 W
Max Power Limit : 70.00 W
Clocks
Graphics : 300 MHz
SM : 300 MHz
Memory : 405 MHz
Video : 540 MHz
Applications Clocks
Graphics : 585 MHz
Memory : 5001 MHz
Default Applications Clocks
Graphics : 585 MHz
Memory : 5001 MHz
Max Clocks
Graphics : 1590 MHz
SM : 1590 MHz
Memory : 5001 MHz
Video : 1470 MHz
Max Customer Boost Clocks
Graphics : 1590 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 2100900
Type : G
Name : Xorg
Used GPU Memory : 5 MiB


Please, we would appreciate some quick help with this matter!
Thanks

#18
Posted 03/16/2020 05:30 PM   
Hi John

It's not a problem with the driver.

On your vSphere Host, uninstall the 430.83 (don't upgrade), reboot the Host and re-install 440.53 so you're running the most up-to-date version. That takes care of that.
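For reference, the remove / reinstall can also be done from the ESXi shell. A rough sketch below; the VIB name and datastore path are examples only, so confirm the real name with the list command first:

# Confirm the exact name of the installed NVIDIA host driver VIB
esxcli software vib list | grep -i nvidia

# Enter maintenance mode before changing the driver
esxcli system maintenanceMode set --enable true

# Remove the old driver (use the VIB name reported by the list command)
esxcli software vib remove -n NVIDIA-VMware_ESXi_6.7_Host_Driver

# Reboot, then install the 440.53 host VIB (path below is a placeholder)
esxcli software vib install -v /vmfs/volumes/datastore1/NVIDIA-VMware-440.53.vib

Reboot once more after the install so the new module loads.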

Once the driver has been reinstalled, make sure vCenter is still configured to "Shared Direct". Now that side of the install is taken care of and there's no need to revisit it.
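Once it's back up, it's worth a quick sanity check from the ESXi shell that the vGPU driver actually loaded (a sketch; output will vary):

# The NVIDIA kernel module should be listed
vmkload_mod -l | grep nvidia

# And nvidia-smi should report the T4 without errors
nvidia-smi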

Have you made any changes to the Server BIOS? If no, please review these articles to make sure your BIOS is configured correctly:

NVIDIA Support Article: https://nvidia.custhelp.com/app/answers/detail/a_id/4119/~/incorrect-bios-settings-on-a-server-when-used-with-a-hypervisor-can-cause-mmio

Use Page 23: https://images.nvidia.com/content/pdf/vgpu/guides/vgpu-deployment-guide-horizon-on-vsphere-final.pdf

Please also confirm that you are running Enterprise Plus licensing on your vSphere Hosts (vCenter is fine with Standard licensing) ...

Let us know how you get on

Regards

MG

#19
Posted 03/16/2020 06:01 PM   
Hi MG,

It's still not working..
1. I uninstalled 430.83
2. rebooted
3. installed 440.53
4. I don't know how to make sure it's running in Shared Direct
5. The only BIOS change I made (before installing ESXi 6.7 the first time) was to check that SR-IOV was enabled (it already was). Otherwise the BIOS is still at factory settings.

I read the BIOS link in your post but, because of the COVID-19 confinement in France, I can't get to my workplace in front of the server right now, and I don't know when I will be able to...

I can confirm for sure that I'm using the Enterprise Plus version of ESX:
" VMware vSphere with Operations Management 6 Enterprise Plus "

But still:
[root@localhost:~] nvidia-smi
Mon Mar 16 21:06:43 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.53 Driver Version: 440.53 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:5E:00.0 Off | 0 |
| N/A 37C P8 17W / 70W | 92MiB / 15359MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2100312 G Xorg 5MiB |
+-----------------------------------------------------------------------------+
which means still no GRID!

and also:
[root@localhost:~] nvidia-smi vgpu
Not supported devices in vGPU mode



I'm lost!! Help...
Regards

#20
Posted 03/16/2020 09:07 PM   
Hi

Ok, great. So you're running the correct vGPU driver and you're running the correct vSphere licensing. There's no need to change anything there again. Forget about them and look at other areas.

The BIOS is a really important step. I'm not saying it's the issue, but it could certainly cause an issue and does need to be set correctly if it isn't already. Do you not have any out-of-band management for the Server hardware? Is there no way to remotely get to the BIOS on server boot? How do you remotely power it on? No remote management console to watch its progress?

Regarding vCenter, first I want you to double check something ...

1: Log into vCenter as administrator.
2: Locate the vSphere Host that has the T4 installed and select it.
3: In the centre, click Configure, under Hardware select PCI Devices.
4: In the tab that says Passthrough-enabled devices, make sure the T4 is not listed here.
5: If it is listed here, it must be removed. Click All PCI devices and Configure Passthrough. Scroll down and unselect the T4 (there are multiple T4 options in here, none of them should be selected).
6: Reboot the Host afterwards.
7: Once rebooted, using the above steps ensure that the T4 is no longer showing in the Passthrough-enabled devices.
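If it's easier, the same ownership check can be made from the ESXi shell (a sketch; to my knowledge the last column reads vmkernel when the card is available for vGPU, and passthru when it's still passthrough-enabled):

# Show which owner the NVIDIA PCI device is bound to
vmkchdev -l | grep -i nvidia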

Now follow these steps:

1: Log into vCenter as administrator (if you aren't already).
2: Locate the vSphere Host that has the T4 installed and select it.
3: In the centre, click Configure, under Hardware select Graphics.
4: You now have 2 Tabs (Graphics Devices and Host Graphics), select Host Graphics.
5: Select the Edit tab to the right.
6: In the window that opens there are 2 sets of 2 options; at the moment you're only interested in the top ones (Shared and Shared Direct). You must select Shared Direct.
7: Reboot the vSphere Host. Once it comes back up, using the steps above make sure that Shared Direct has been accepted and is still set.
8: Connect to the Host using SSH and run nvidia-smi vgpu. If vGPU is available, it will list the T4.
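As an aside, if vCenter access is a problem, the Host Graphics default can also be set directly from the ESXi shell; a minimal sketch (SharedPassthru is the CLI name for Shared Direct):

# Show the current default graphics type
esxcli graphics host get

# Switch the host to Shared Direct, then restart Xorg (or reboot) to apply
esxcli graphics host set --default-type SharedPassthru
/etc/init.d/xorg restart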

Try that and see how you get on

Regards

MG

#21
Posted 03/17/2020 08:15 AM   
Hi MG,

This project is in the POC stage, so unfortunately I don't have out-of-band management (Supermicro IPMI, HP iLO, etc.) to reach the BIOS remotely.

On the other point, I don't have vCenter (I can't install it on my Windows 10 machine), so I can only use the hypervisor's web UI, and I couldn't find a way to configure the PCI device hardware through that interface (that's what I meant earlier about Shared Direct; I'm not sure I ever enabled it).

Is there a way to do this using the web UI? And if not, how can I install vCenter on my Windows 10 machine?

Thanks
Regards

John

#22
Posted 03/17/2020 08:33 AM   
Apparently the vSphere Client is no longer available for ESXi versions later than 6.0, and I'm using 6.7..

#23
Posted 03/17/2020 08:53 AM   
Only the web UI version..

#24
Posted 03/17/2020 08:54 AM   
Hi

You need vCenter for vGPU.

I don't mean any offence, but it sounds like you've not used VMware before, or at the very least are not familiar with it or its components. With that in mind, as you've mentioned this is urgent, I would strongly advise that you forget about using vGPU and just use the T4 in Passthrough to a single RDSH VM and give all users access to that. That's the quickest way to build a usable VM and give multiple users a platform to work from (if that is your objective). You can get vCenter installed and look at vGPU after that's done when your users are working.

Passthrough doesn't need vCenter and you can configure that directly on the vSphere Host. If this is an acceptable alternative, then reverse the steps I mentioned above about checking for Passthrough-enabled devices. Please note that the T4 will still require a license in Passthrough, so make sure you have the NVIDIA License Server up and running.
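For a Linux VM, the license client is configured with a small file; a sketch, assuming the default gridd.conf location and a placeholder server address (on Windows the same settings live in the NVIDIA Control Panel's licensing page):

# /etc/nvidia/gridd.conf -- example values only
ServerAddress=license-server.example.com
ServerPort=7070
# FeatureType=2 requests a Quadro vDWS license
FeatureType=2

Restart the nvidia-gridd service (or the VM) after editing the file.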

If you still want to go down the vGPU route, then you will need vCenter. In which case, deploy it on another physical Server in your environment. Download the vCenter .iso; this will allow you to deploy the vCenter Server Appliance (VCSA) to your other Server, but you'll need to know how to configure it.

As you're unfamiliar with VMware and you've stated this is a priority, I would advise you just run the T4 in Passthrough and go down that route for the time being, unless there is a specific reason for wanting vGPU in the current situation.

Regards

MG

#25
Posted 03/17/2020 08:59 AM   
Hi MG,

To be perfectly clear, the POC was already done, but using a KVM hypervisor. Before deploying it to production, the company IT asked us to do the same with VMware. We have to create a Windows 10 VM using vGPU and hand the result to the IT team, which will configure it for security-group compliance before the wider deployment (we are planning to deploy 15 VMs on these 3 ESX servers). As you say, I'm not familiar with VMware (and unfortunately our ESX experts have only a little knowledge of GRID and vGPU).
But I don't want to give up on this project. The goal is to give our colleagues the opportunity to launch specific applications (gas industry) and use a kind of VNC (DCV, actually) to see the screen on their own computers.

I will try the vCenter route... Many thanks for your advice.

John

#26
Posted 03/17/2020 11:42 AM   
Hi, I already advised you to check your vCenter settings in my first post. If you are not familiar with VMware, it is very hard to give you proper advice.

#27
Posted 03/17/2020 11:54 AM   
Hi

The reason I suggested using Passthrough for now is that, as you're unfamiliar with VMware (which is no problem at all; there are plenty of technologies I'm unfamiliar with), it will be the quickest way to give your users access to a GPU-accelerated desktop and allow them to work, albeit from the same RDSH VM. While they're working on that RDSH VM, you can look at how to install and set up vCenter and, in the background, build the Windows 10 VMs that you'll migrate them to, rather than giving them nothing in the short term until this issue is resolved.

This approach is the easiest and quickest way to bring up service :-)

When you say "gaz industry", which Applications are your users running? Petrel? Kingdom? ...

Regards

MG

#28
Posted 03/17/2020 12:17 PM   
Petrel, Techlog, Eclipse, GOCAD ... (Petrel is very graphics-intensive!)

#29
Posted 03/17/2020 01:56 PM   
Hi

Very interesting ....

Depending on how your testing goes, you may want to seriously consider looking at other GPUs. Although it may work, the T4 isn't really suited to seismic interpretation or Eclipse runs, and certainly not with multiple users running on it.

Personally, I've found Petrel to be CPU limited, so it's worth making sure you have plenty of high speed CPU Cores for each VM as well, but I guess this depends on your workflow and which modules you're using.

The minimum GPU I'd be looking at for these types of workloads would be a P40, then either an RTX 6000 or 8000 if you need more performance than that. If you're doing more computational processing than 3D, then a V100 / V100S may be a better option than an RTX (although a V100 / V100S will do a great job with 3D as well if needed, and an RTX will do a great job with computational work, but they do have their more specific use cases). Obviously, once vCenter is sorted and you're able to use vGPU, you can put multiple users on all of those, so they're not a 1:1 relationship.

How are you testing the performance? Seagull?

Regards

MG

#30
Posted 03/17/2020 03:28 PM   