NVIDIA
Supported Servers with Tesla M60 & ESXi 6.0 (Dell PowerEdge R720xd)
I want to confirm or find a list of supported servers for the Tesla M60 card with ESXi 6.0. I've installed the 352.54 VIB on a Dell PowerEdge R720 and am getting the "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." message. Using vmkload_mod -l, I've validated that the driver does not appear to be starting either. I've checked dmesg, though I'm unsure what to look for there, and am not seeing much. I've also checked whether the card shows up using "lspci | grep -i vga" and other variations, but do not see it, which is why I suspect that an R720 will not work.

The driver installs fine, and I've done the necessary steps of maintenance mode, install, reboot, and exit maintenance mode, now multiple times, to no avail. I have an R730 that I can try next, but I want to validate that it is worth the effort first.

I'm hoping it's a misconfiguration and that I've missed a cable connection in the chassis, as I really would like to get this R720 to work. Please let me know.
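
A minimal check sketch for this situation, assuming the M60 enumerates as a 3D controller rather than a VGA-class device (typical for Tesla boards), which would make "lspci | grep -i vga" miss it even when the host does see the card:

# lspci | grep -i nvidia
# esxcli hardware pci list | grep -i NVIDIA

If neither command returns the M60, the host isn't enumerating the card at all, which points at the chassis/BIOS/power side rather than the driver VIB.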

#1
Posted 12/30/2015 10:38 PM   

http://www.nvidia.com/object/grid-certified-servers.html

You can filter the list of certified servers by card type.

The R720 is not certified by Dell; the R730 is, as long as you have the relevant PSUs, power cables, etc. It's best to check with Dell on the specific requirements to retrofit the card.

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#2
Posted 12/31/2015 10:15 AM   
Jason,

Thank you for that. That is extremely helpful.

Now for the next part: is there a difference in supported R730s, i.e., an R730xd versus an R730? You have listed an R730, and I can get my hands on one of those in the future, but I have an R730xd now that appears to be exhibiting the same behavior. Would some form of logs be useful?

#3
Posted 12/31/2015 03:44 PM   
You should check with Dell. It's possible that it's simply a BIOS issue, or it may well not be supported in the xd chassis.

Do you have the enablement kit for the R730, including power cables and the relevant PSUs?

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#4
Posted 01/01/2016 11:39 AM   
Could you direct me to information on the "enablement kit"?

#5
Posted 01/04/2016 08:04 PM   
You need to speak to Dell.

Most servers don't ship with the required PSUs, PCIe risers, cables, etc., and some may require modified heatsinks or airflow baffles. Each OEM has a different set of additional components that may be required for a retrofit. In some cases it's just a power cable; in others it's a complete set of PSUs, risers, baffles, heatsinks, and cables, so what you need to acquire depends on what you already have.

The OEM (in your case Dell) are the best people to ask for the details of what you require.

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#6
Posted 01/05/2016 09:03 AM   
JRR,

Did you have any success in getting the M60 to work on the R720xd? I have an R720 and I am experiencing the same problem.

Thanks

David

#7
Posted 05/17/2016 11:20 AM   
JRR,

Did you have any success in getting the M60 to work on the R720xd? I have an R720 and I am experiencing the same problem.

Thanks

David

#8
Posted 05/17/2016 11:21 AM   
I know that for the R720xd Dell chose not to certify, whereas they did the R720. The R720xd has some extra room for storage, which makes everything else a bit more squashed and affected the thermal cooling, IIRC.

For anyone with a new M60: I would strongly advise checking that it is in graphics mode and not compute mode, as per: http://nvidia.custhelp.com/app/answers/detail/a_id/4106/kw/m60

You might want to search the KB database for other reasons nvidia-smi fails: http://nvidia.custhelp.com/app/answers/detail/a_id/4119/kw/nvidia-smi

BUT as you are dealing with a possibly unsupported server, I think, as Jason suggests, you really need to talk to Dell and your hypervisor vendor, as you could be left unsupported even if it works.
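
For reference, a rough sketch of checking and switching the mode with the gpumodeswitch utility (syntax as described in the gpumodeswitch user guide; confirm it against the version you download):

# gpumodeswitch --listgpumodes
# gpumodeswitch --gpumode graphics

The first command reports the current mode of each GPU; the second switches to graphics mode and requires a host reboot afterwards for the change to take effect.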

#9
Posted 05/17/2016 02:44 PM   
Hi Rachel,

I was able to install the gpumodeswitch VIB on my ESXi 6.0 U2 host and successfully change the mode over to graphics. Oddly, gpumodeswitch is able to see the cards. After the mode was changed I installed the software and rebooted the system, but I get nothing when I run "vmkload_mod -l | grep nvidia".

David
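
A minimal sketch of the checks worth running at this point, assuming the vGPU Manager VIB is what should be loading (the log path is the ESXi 6.x default):

# esxcli software vib list | grep -i NVIDIA
# vmkload_mod -l | grep nvidia
# grep -i nvrm /var/log/vmkernel.log

The first confirms the VIB is actually installed, the second shows whether the module loaded, and the third pulls any NVRM (NVIDIA kernel driver) messages out of the vmkernel log, which is where a load failure would normally leave a trace.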

#10
Posted 05/17/2016 03:07 PM   
As it is uncertified, I think you need to go back to the server OEM and talk to them. I'm afraid this can happen with uncertified configurations.

#11
Posted 05/17/2016 03:20 PM   
Hi All,

I have two questions. I was able to install the GRID 3.0 VIB (NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver_361.40-1OEM.600.0.0.2494585.vib) into ESXi 6.0 U2 with no issues, and everything came up properly. However, after the installation the guide said that I should use gpumodeswitch to switch modes.

Interestingly, the instructions in the gpumodeswitch doc said to remove any NVIDIA drivers, which was a bit weird, but I did that.

I tried to install the modeswitch VIB (NVIDIA-GpuModeSwitch-1OEM.600.0.0.2494585.x86_64.vib) and it gave me an InstallationError saying the VIB does not contain a signature. I lowered the acceptance level to community supported, but still had no luck installing it.

Any thoughts?

Thanks
Segreen
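
A sketch of what has unblocked unsigned-VIB installs in other setups, assuming the .vib file itself isn't corrupted or truncated (worth re-copying it to the host first); the path below is a placeholder, and skipping signature checks is not something to rely on for a production host:

# esxcli software acceptance set --level=CommunitySupported
# esxcli software acceptance get
# esxcli software vib install --no-sig-check -v /full/path/to/NVIDIA-GpuModeSwitch-1OEM.600.0.0.2494585.x86_64.vib

esxcli needs the absolute path to the .vib file, and --no-sig-check skips signature verification entirely rather than just lowering the acceptance level.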

#12
Posted 05/22/2016 12:09 AM   
Did you follow the process in the documentation exactly?

1. Put the ESXi host into maintenance mode.
# vim-cmd hostsvc/maintenance_mode_enter
2. If an NVIDIA driver is already installed on the ESXi host, remove the driver.
a) Get the name of the VIB package that contains the NVIDIA driver.
# esxcli software vib list | grep -i nvidia
b) Remove the VIB package that contains the NVIDIA driver.
# esxcli software vib remove -n NVIDIA-driver-package
NVIDIA-driver-package is the VIB package name that you got in the previous step.
3. Run the esxcli command to install the VIB.
# esxcli software vib install -v /NVIDIA-GpuModeSwitch-1OEM.600.0.0.2494585.x86_64.vib
4. Take the host out of maintenance mode.
# vim-cmd hostsvc/maintenance_mode_exit
5. Reboot the ESXi host.

There are several versions of the modeswitch utility and the .vib version does require the removal of the vGPU Manager.

I personally would not recommend using the .vib version unless you are unable to use the bootable ISO tool via the host's remote management software. The reason is that the ISO is a much simpler tool to work with and only requires 2 reboots: one to start the ISO and one to switch back to the hypervisor.

Using the .vib you need to

remove vGPU manager
restart
install mode switch .vib
restart
switch mode
restart
remove modeswitch .vib
restart
install vGPU .vib
restart

That's 4 extra restarts to allow you to stay within ESXi. I find it so much faster to simply boot to the ISO.

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.
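
Whichever route you take, one way to sanity-check the end state after that final reboot, using the same commands seen earlier in the thread:

# vmkload_mod -l | grep nvidia
# nvidia-smi

If the nvidia module shows up in vmkload_mod and nvidia-smi reports the M60s, the host side of the vGPU stack is back in place.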

#13
Posted 05/22/2016 08:45 AM   