NVIDIA
[SOLVED] M10 with ESXi 6.5 - vGPU: Device not supported
Hello All,

We are currently evaluating the vGPU feature for a customer in order to get a 3D CAD VDI environment up and running.
However, I'm stuck at the basic installation/configuration of the NVIDIA driver.

I successfully installed the newest version (NVIDIA-VMware_ESXi_6.5_Host_Driver_384.73-1OEM.650.0.0.4598673.vib).
The output of 'nvidia-smi' also looks fine:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.73                 Driver Version: 384.73                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M10           On   | 00000000:0A:00.0 Off |                  N/A |
| N/A   38C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M10           On   | 00000000:0B:00.0 Off |                  N/A |
| N/A   40C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M10           On   | 00000000:0C:00.0 Off |                  N/A |
| N/A   33C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M10           On   | 00000000:0D:00.0 Off |                  N/A |
| N/A   35C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     68391      G   Xorg                                           4MiB |
|    1     68412      G   Xorg                                           4MiB |
|    2     68428      G   Xorg                                           4MiB |
|    3     68446      G   Xorg                                           4MiB |
+-----------------------------------------------------------------------------+



But when I try to power on a VM with a vGPU assigned, it won't start.
After some research, it seems that the M10 is not supported:

[root@HV04:~] nvidia-smi vgpu
#0, Device not supported
#1, Device not supported
#2, Device not supported
#3, Device not supported
Not supported on the device(s)
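
For anyone checking several hosts, the output above can be summarized with a small script. This is only a minimal sketch, assuming the `nvidia-smi vgpu` output looks exactly like the lines pasted in this thread; the function name `unsupported_gpus` is made up for illustration and is not part of any NVIDIA tool.

```python
import re
from typing import List

# Matches lines such as "#0, Device not supported" (format taken from this thread).
_NOT_SUPPORTED = re.compile(r"^#(\d+),\s*Device not supported$")


def unsupported_gpus(vgpu_output: str) -> List[int]:
    """Return the GPU indices that nvidia-smi reported as not supported."""
    indices = []
    for raw_line in vgpu_output.splitlines():
        match = _NOT_SUPPORTED.match(raw_line.strip())
        if match:
            indices.append(int(match.group(1)))
    return indices


# Sample text copied from the post above.
sample = """#0, Device not supported
#1, Device not supported
#2, Device not supported
#3, Device not supported
Not supported on the device(s)"""

print(unsupported_gpus(sample))  # prints [0, 1, 2, 3]
```

An empty list would mean that no GPU was reported with this message.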



So what can I do now? Can somebody please help or give a hint?

Cheers
Benjamin

#1
Posted 10/11/2017 06:15 PM   
Hi

Is it a brand new M10?

If yes, have you changed the GPU from "Compute Mode" to "Graphics Mode"?

Regards

#2
Posted 10/11/2017 06:55 PM   
Hi,

thanks for your reply.
Yes it is a brand new M10.

I hadn't checked the GPU mode, because the documentation only mentions this for the M60 and M6, not for the M10.
But now I've tried it, without success:

-----------Begin cli output-----------
[root@HV04:~] gpumodeswitch --listgpumodes

NVIDIA GPU Mode Switch Utility Version 1.23.0
Copyright (C) 2015, NVIDIA Corporation. All Rights Reserved.


ERROR: Read card info failed by using character device based.

[root@HV04:~] gpumodeswitch --gpumode graphics --auto

NVIDIA GPU Mode Switch Utility Version 1.23.0
Copyright (C) 2015, NVIDIA Corporation. All Rights Reserved.


ERROR: Read card info failed by using character device based.
-----------End cli output-----------

Because I had to remove the host driver package in order to use the gpumodeswitch tool, I reinstalled it afterwards.
Then I tested it BEFORE a reboot:


-----------Begin cli output-----------

[root@HV04:~] nvidia-smi
Thu Oct 12 09:43:54 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.73                 Driver Version: 384.73                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M10           On   | 00000000:0A:00.0 Off |                  N/A |
| N/A   37C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M10           On   | 00000000:0B:00.0 Off |                  N/A |
| N/A   38C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M10           On   | 00000000:0C:00.0 Off |                  N/A |
| N/A   33C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M10           Off  | 00000000:0D:00.0 Off |                  N/A |
| N/A   35C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     68925      G   Xorg                                           4MiB |
|    1     68945      G   Xorg                                           4MiB |
|    2     68965      G   Xorg                                           4MiB |
+-----------------------------------------------------------------------------+
[root@HV04:~] nvidia-smi vgpu
#0, Device not supported
#1, Device not supported
#2, Device not supported
Thu Oct 12 09:44:00 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.73                 Driver Version: 384.73                    |
|-------------------------------+--------------------------------+------------+
| GPU  Name                     | Bus-Id                         | GPU-Util   |
|      vGPU ID    Name          | VM ID    VM Name               | vGPU-Util  |
|===============================+================================+============|
|   3  Tesla M10                | 00000000:0D:00.0               |     0%     |
+-------------------------------+--------------------------------+------------+


[root@HV04:~] nvidia-smi vgpu -s
#0, Device not supported
#1, Device not supported
#2, Device not supported
GPU 00000000:0D:00.0
GRID M10-0B
GRID M10-0Q
GRID M10-1A
GRID M10-1B
GRID M10-1Q
GRID M10-2A
GRID M10-2Q
GRID M10-4A
GRID M10-4Q
GRID M10-8A
GRID M10-8Q
-----------End cli output-----------
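
The `nvidia-smi vgpu -s` listing above can be grouped per GPU in the same way. Again just a sketch, assuming the layout pasted in this thread (a `GPU <bus-id>` line followed by one `GRID ...` line per vGPU type); `supported_vgpu_types` is a hypothetical helper name.

```python
from typing import Dict, List


def supported_vgpu_types(output: str) -> Dict[str, List[str]]:
    """Map each GPU bus ID to the vGPU type names listed below it."""
    types_by_gpu: Dict[str, List[str]] = {}
    current_gpu = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.startswith("GPU "):
            # A new GPU section starts, e.g. "GPU 00000000:0D:00.0".
            current_gpu = line[len("GPU "):].strip()
            types_by_gpu[current_gpu] = []
        elif line.startswith("GRID ") and current_gpu is not None:
            types_by_gpu[current_gpu].append(line)
        # Anything else (e.g. "#0, Device not supported") is ignored.
    return types_by_gpu


# Shortened sample taken from the output above.
sample = """#0, Device not supported
#1, Device not supported
#2, Device not supported
GPU 00000000:0D:00.0
GRID M10-0B
GRID M10-0Q
GRID M10-8Q"""

for bus_id, names in supported_vgpu_types(sample).items():
    print(bus_id, len(names), "vGPU types")
```

GPUs that only show "Device not supported" do not appear in the result at all, which matches what is seen here: only the GPU at 0D:00.0 lists any vGPU types.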

It seems that one of the four GPUs is now working.
But after a reboot everything is back as it was before: all of the GPUs are in the "not supported" state.
I have the impression that it has something to do with the Xorg process.

BR
Benjamin

#3
Posted 10/12/2017 09:57 AM   
Just use the .iso and boot the server from that? No need to remove any .vibs then.

So 1 of the GPUs is now working and the other 3 aren't? ... Try running the changemode utility again and verify all 4 GPUs using the utility afterwards.

Regards

#4
Posted 10/12/2017 11:20 AM   
Hi,

it seems the problem is not related to the GPU mode that is set with the CLI tool "gpumodeswitch".

Based on the results of the last steps, I changed my Google search and found quite a helpful post (by user "jmain"):
https://gridforums.nvidia.com/default/topic/1030/nvidia-virtual-gpu-technology/nvidia-vmware-vsphere-6-5/post/3713/#3713

After changing the setting in vCenter (Flash version!), everything is up and running now.

Thanks for your Help!

BR
Benjamin

PS @Nvidia
Please update your documentation (as already requested nearly a year ago).

#5
Posted 10/12/2017 12:03 PM   
@Ben:

There is no GPUModeSwitch for the M10. It is clearly documented that this is not necessary for the M10, as it is a pure graphics board. BTW, even the Tesla M60 has been delivered in graphics mode for more than a year now, so it shouldn't be necessary to use GPUModeSwitch at all.

@Benjamin:
Please let me know which documentation you think is not accurate and I will ask the right people to update it.
When I check, for example, our quick start guide, it is correctly documented that GPUmodeSwitch is only valid for the M6 and M60...
http://docs.nvidia.com/grid/5.0/grid-software-quick-start-guide/index.html


Regards

Simon

#6
Posted 10/13/2017 11:43 AM   
Simon said:
@Benjamin:
Please let me know which documentation you think is not accurate and I will ask the right people to update it.
When I check, for example, our quick start guide, it is correctly documented that GPUmodeSwitch is only valid for the M6 and M60...

http://docs.nvidia.com/grid/5.0/grid-software-quick-start-guide/index.html


Yes, I was referring to the quick start guide, and yes, you are right: the information about GPUmodeSwitch in there is correct.
But this was neither the problem nor the solution, and it is not what I meant.

Read this post by 'jmain' (which I already mentioned before):
https://gridforums.nvidia.com/default/topic/1030/nvidia-virtual-gpu-technology/nvidia-vmware-vsphere-6-5/post/3713/#3713
There he says that since an update (specifically for ESXi 6.5) you have to change some GPU settings:

jmain said:
Procedure:
- Select the ESXi 6.5 host in vCenter 6.5, then select the “Configure” tab and scroll down to “Graphics”.
- Highlight each GPU that you want to use for vGPU and then select the edit icon to modify the graphics device settings.
- Select “Shared Direct” for vGPU.
- The host needs to be rebooted for the changes to take effect; after that, your vGPU VMs should start normally.
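
The setting from the steps above can probably also be checked from the host shell instead of the vCenter UI. The sketch below rests on assumptions that are not confirmed anywhere in this thread: that `esxcli graphics host get` exists on the host and prints a line starting with `Default Graphics Type:`, and that "Shared Direct" in the UI corresponds to the value `SharedPassthru`. The function names are hypothetical, and the script only parses text that you paste in; it does not run any command on the host.

```python
from typing import Optional


def default_graphics_type(host_get_output: str) -> Optional[str]:
    """Extract the value of the 'Default Graphics Type' line, if present."""
    for line in host_get_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Default Graphics Type":
            return value.strip()
    return None


def ready_for_vgpu(host_get_output: str) -> bool:
    """True if the host default is the (assumed) vGPU value 'SharedPassthru'."""
    return default_graphics_type(host_get_output) == "SharedPassthru"


# Assumed example output, NOT copied from a real host in this thread.
sample = """   Default Graphics Type: Shared
   Shared Passthru Assignment Policy: Performance"""

print(default_graphics_type(sample))
print(ready_for_vgpu(sample))
```

If the check returns False, the procedure quoted above (set the graphics type, then reboot the host) is the change that fixed it in this thread.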

So this procedure should be added to the quick start guide.
I hope I expressed myself a little better this time (sorry about that, I'm not a native English speaker).

BR
Benjamin

#7
Posted 10/17/2017 12:39 PM   
Hi Benjamin,

thanks for your comments and clarification.
I will ask whether this is something that should be in the quick start guide. We have this documented in the user guide, so I don't think there is a need to add it to the quick start guide as well.

http://docs.nvidia.com/grid/5.0/grid-vgpu-user-guide/index.html


BTW: In case you are not aware of it yet, we have this new documentation page:


http://docs.nvidia.com/grid


Best regards

Simon

#8
Posted 10/18/2017 11:52 AM   