NVIDIA
ESXi 6.5 + Tesla M60 - Not working anymore after driver update
Hi and hello,

we have several XL190 Gen8 servers with Tesla M60 adapters running vSphere 6.5.
The cards have not been in use until now - we are preparing for a PoC.
The adapters were listed in the vSphere client and by nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.106 Driver Version: 367.106 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 On | 0000:89:00.0 Off | Off |
| N/A 36C P8 24W / 150W | 19MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 On | 0000:8A:00.0 Off | Off |
| N/A 31C P8 24W / 150W | 19MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

There were no signs that the cards were in compute mode, and we also had a VM running that used the card.


Then we updated the driver to this version:
NVIDIA-kepler-VMware_ESXi_6.5_Host_Driver 367.128-1OEM.650.0.0.4598673

What we did was a procedure that worked well in our other datacenter:

- Host -> maintenance

- esxcli software vib remove -n NVIDIA-VMware_ESXi_6.5_Host_Driver
Removal Result
Message: Operation finished successfully.
Reboot Required: false
VIBs Installed:
VIBs Removed: NVIDIA_bootbank_NVIDIA-VMware_ESXi_6.5_Host_Driver_367.106-1OEM.650.0.0.4598673
VIBs Skipped:

- reboot

- installed new driver NVIDIA-kepler-VMware_ESXi_6.5_Host_Driver 367.128-1OEM.650.0.0.4598673

- reboot

But after that:

[root@VI:~] nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

[root@VI:~] esxcli software vib list | grep -i nvidia
NVIDIA-kepler-VMware_ESXi_6.5_Host_Driver 367.128-1OEM.650.0.0.4598673 NVIDIA VMwareAccepted 2018-12-13

In the vSphere client, the "Graphics Adapter" has changed from "NVIDIA Tesla M60" to "GM204GL [Tesla M60]"

[root@VI:~] lspci -n | grep 10de
0000:89:00.0 Class 0300: 10de:13f2
0000:8a:00.0 Class 0300: 10de:13f2

This seems to show that the card is still in graphics mode, IIRC.
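
The mode guess above can be sketched as a small check on the PCI class code. This is only a sketch under an assumption: class 0300 (VGA controller) is taken to mean graphics mode and class 0302 (3D controller) compute mode. The function name `gpu_mode_from_lspci` is made up for illustration, and the sample lines are copied from the lspci output above.

```shell
#!/bin/sh
# Classify NVIDIA devices from `lspci -n` output by PCI class code.
# Assumption (not confirmed in this thread):
#   0300 (VGA controller) = graphics mode
#   0302 (3D controller)  = compute mode
gpu_mode_from_lspci() {
    # Reads `lspci -n` lines on stdin, prints "<bus-id> <mode>" per NVIDIA device.
    awk '
        $4 ~ /^10de:/ {
            class = $3
            sub(/:$/, "", class)      # "0300:" -> "0300"
            mode = "unknown"
            if (class == "0300") mode = "graphics"
            if (class == "0302") mode = "compute"
            print $1, mode
        }'
}

# Sample input copied from the post; on a host, pipe `lspci -n` into the function instead.
printf '%s\n' \
    '0000:89:00.0 Class 0300: 10de:13f2' \
    '0000:8a:00.0 Class 0300: 10de:13f2' | gpu_mode_from_lspci
```

With the two sample lines, both devices are reported as `graphics`.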

Please help!


Kind regards
ZPPO

#1
Posted 12/18/2018 04:10 PM   
ZPPO said:

- installed new driver NVIDIA-kepler-VMware_ESXi_6.5_Host_Driver 367.128-1OEM.650.0.0.4598673

Here is your issue! You cannot run the old Kepler host driver on Maxwell boards!
vGPU requires software licenses, so the correct drivers are only available in the GRID license portal, where you need to create an account first:
nvidia.com/grideval
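
A quick way to spot this kind of mismatch before installing could look like the sketch below. It assumes that the "-kepler-" part of the VIB name reliably marks the Kepler-only package and that PCI ID 10de:13f2 identifies the Tesla M60, as in the lspci output quoted earlier. The function name `check_vib_for_m60` is hypothetical.

```shell
#!/bin/sh
# Flag a host driver VIB whose name marks it as the Kepler package
# while the host has a Tesla M60 (PCI ID 10de:13f2 in the lspci output).
# Assumption: the "-kepler-" substring in the VIB name is a reliable marker.
check_vib_for_m60() {
    vib_line=$1     # one line of `esxcli software vib list`
    lspci_out=$2    # output of `lspci -n`
    case $lspci_out in
        *10de:13f2*) ;;                          # M60 present, keep checking
        *) echo "no-m60"; return 0 ;;
    esac
    case $vib_line in
        *-kepler-*) echo "mismatch"; return 1 ;; # Kepler package on a Maxwell board
        *)          echo "ok";       return 0 ;;
    esac
}

# Sample values copied from the first post.
check_vib_for_m60 \
    'NVIDIA-kepler-VMware_ESXi_6.5_Host_Driver 367.128-1OEM.650.0.0.4598673 NVIDIA VMwareAccepted 2018-12-13' \
    '0000:89:00.0 Class 0300: 10de:13f2' || true
```

For the VIB line from the first post this prints `mismatch`; the previously installed NVIDIA-VMware_ESXi_6.5_Host_Driver name would print `ok`.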

regards

Simon

#2
Posted 12/18/2018 05:59 PM   
Hi Simon,

thanks for your reply!

The driver's release notes included in
NVIDIA-vGPU-kepler-vSphere-6.5-367.128-370.28.zip
tell me it would work and they even contain a part regarding M60:

"GRID SOFTWARE FOR VMWARE
VSPHERE VERSION 367.128/370.28
RN-07347-001 _v4.7 | July 2018
Release Notes

...

2.1. Supported NVIDIA GPUs and Validated Server Platforms
This release of NVIDIA GRID software provides support for the following NVIDIA
GPUs on VMware vSphere, running on validated server hardware platforms:
- GRID K1
- GRID K2
- Tesla M6
- Tesla M10
- Tesla M60
For a list of validated server platforms, refer to NVIDIA GRID Certified Servers.
Tesla M60 and M6 GPUs support compute mode and graphics mode. GRID vGPU
requires GPUs that support both modes to operate in graphics mode.
Recent Tesla M60 GPUs and M6 GPUs are supplied in graphics mode. However, your
GPU might be in compute mode if it is an older Tesla M60 GPU or M6 GPU, or if its
mode has previously been changed.
To configure the mode of Tesla M60 and M6 GPUs, use the gpumodeswitch tool
provided with GRID software releases..."

So why would this info be incorrect?


Regards,
ZPPO

#3
Posted 12/19/2018 11:35 AM   
Well,
you shouldn't mix up different things. The same driver is available for Kepler and Maxwell (but only in the GRID license portal). I assume they just didn't take the effort to create a separate release note for Kepler only, as it is stated very clearly on the public download page that the driver you downloaded is only for GRID K1/K2.
The release notes above refer to the R367 branch, which certainly also supports Maxwell GPUs.
Once again:
Starting with the Maxwell generation, vGPU requires software licensing, so you need to create an account in our GRID licensing portal in order to download the appropriate software!

Regards

Simon

#4
Posted 12/21/2018 07:17 AM   