NVIDIA
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver
Hello, I have a strange problem. Everything was working well, but at some point new VMs stopped starting. When I tried to check the problem, the "nvidia-smi" command returned "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver...". I reinstalled the driver once and the problem was solved, but the next day the problem came back and reinstalling the driver doesn't help. What can it be?

#1
Posted 12/09/2016 02:41 PM   
You've shared too little information (IT Crowd - Have You Tried Turning It Off And On Again? -> YouTube). You should check the runtime logs, with something like "dmesg | grep NVRM" or "dmesg | grep nvidia" ...
From your previous posts I suppose that you have an M60 on ESXi 6.5. You should be able to contact NVIDIA support directly.
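
A minimal sketch of the log check suggested above, assuming a POSIX shell with `grep` on the host. `filter_nvidia_lines` is a hypothetical helper name, not part of any NVIDIA or VMware tool; it matches both `NVRM` and `nvidia` case-insensitively, so one pass covers both greps.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines that mention the NVIDIA driver.
# It reads stdin, so it can be fed from dmesg or from a saved log file.
filter_nvidia_lines() {
    grep -i -E 'nvrm|nvidia'
}

# Example use on the host (commented out so the snippet has no side effects):
#   dmesg | filter_nvidia_lines
#   vmkload_mod -l | filter_nvidia_lines   # is an NVIDIA kernel module loaded at all?
```

If the second command prints nothing, the host driver module is probably not loaded, which would match the nvidia-smi error.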

#2
Posted 12/09/2016 02:59 PM   
Always worth a search in the Knowledge Base - http://nvidia.custhelp.com/app/home/

#3
Posted 12/09/2016 03:31 PM   
I didn't find anything in the Knowledge Base that can help me.

I have ESXi 6.5 and a Tesla M60.

[root@localhost:~] dmesg | grep nvidia
2016-12-09T14:02:27.706Z cpu13:66686)Starting service nvidia-vgpu
2016-12-09T14:02:27.706Z cpu13:66686)Activating Jumpstart plugin nvidia-vgpu.
2016-12-09T14:02:27.908Z cpu13:66686)Jumpstart plugin nvidia-vgpu activated.
[root@localhost:~] nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

[root@localhost:~] dmesg | grep NVRM

I don't know how to troubleshoot this correctly...

#4
Posted 12/12/2016 06:50 AM   
[root@localhost:/etc/init.d] esxcli hardware pci list -c 0x300 -m 0xff
0000:07:00.0
Address: 0000:07:00.0
Segment: 0x0000
Bus: 0x07
Slot: 0x00
Function: 0x0
VMkernel Name:
Vendor Name: ASPEED Technology, Inc.
Device Name: ASPEED Graphics Family
Configured Owner: Unknown
Current Owner: VMkernel
Vendor ID: 0x1a03
Device ID: 0x2000
SubVendor ID: 0x1043
SubDevice ID: 0x85f9
Device Class: 0x0300
Device Class Name: VGA compatible controller
Programming Interface: 0x00
Revision ID: 0x30
Interrupt Line: 0x05
IRQ: 255
Interrupt Vector: 0x00
PCI Pin: 0x00
Spawned Bus: 0x00
Flags: 0x3221
Module ID: -1
Module Name: None
Chassis: 0
Physical Slot: 4294967295
Slot Description:
Passthru Capable: true
Parent Device: PCI 0:6:0:0
Dependent Device: PCI 0:6:0:0
Reset Method: Bridge reset
FPT Sharable: true

0000:83:00.0
Address: 0000:83:00.0
Segment: 0x0000
Bus: 0x83
Slot: 0x00
Function: 0x0
VMkernel Name: vmgfx0
Vendor Name: NVIDIA Corporation
Device Name: NVIDIATesla M60
Configured Owner: VM Passthru
Current Owner: VM Passthru
Vendor ID: 0x10de
Device ID: 0x13f2
SubVendor ID: 0x10de
SubDevice ID: 0x115e
Device Class: 0x0300
Device Class Name: VGA compatible controller
Programming Interface: 0x00
Revision ID: 0xa1
Interrupt Line: 0x05
IRQ: 255
Interrupt Vector: 0x00
PCI Pin: 0x00
Spawned Bus: 0x00
Flags: 0x3401
Module ID: 20
Module Name: pciPassthru
Chassis: 0
Physical Slot: 4294967295
Slot Description: Chassis slot 8; function 0; relative bdf 01:00.0
Passthru Capable: true
Parent Device: PCI 0:130:8:0
Dependent Device: PCI 0:131:0:0
Reset Method: Bridge reset
FPT Sharable: true

0000:84:00.0
Address: 0000:84:00.0
Segment: 0x0000
Bus: 0x84
Slot: 0x00
Function: 0x0
VMkernel Name: vmgfx1
Vendor Name: NVIDIA Corporation
Device Name: NVIDIATesla M60
Configured Owner: VM Passthru
Current Owner: VM Passthru
Vendor ID: 0x10de
Device ID: 0x13f2
SubVendor ID: 0x10de
SubDevice ID: 0x115e
Device Class: 0x0300
Device Class Name: VGA compatible controller
Programming Interface: 0x00
Revision ID: 0xa1
Interrupt Line: 0x05
IRQ: 255
Interrupt Vector: 0x00
PCI Pin: 0x00
Spawned Bus: 0x00
Flags: 0x3401
Module ID: 20
Module Name: pciPassthru
Chassis: 0
Physical Slot: 4294967295
Slot Description: Chassis slot 8; function 0; relative bdf 02:00.0
Passthru Capable: true
Parent Device: PCI 0:130:16:0
Dependent Device: PCI 0:132:0:0
Reset Method: Bridge reset
FPT Sharable: true

#5
Posted 12/12/2016 07:10 AM   
How can I see the active graphics card?

#6
Posted 12/12/2016 07:11 AM   
I suppose that you configured the card as "VM pass-through", i.e. vDGA. The NVIDIA driver (and nvidia-smi) in ESXi cannot service vDGA cards. You should configure the card as vGPU (for an explanation of Soft3D, vSGA, vGPU and vDGA see, for example, http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-horizon-view-graphics-acceleration-deployment.pdf or http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/horizon/grid-vgpu-deployment-guide.pdf or http://us.download.nvidia.com/Windows/Quadro_Certified/GRID/348.07/346.68-348.07-nvidia-grid-quick-start-guide.pdf or newer).
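
To spot this condition quickly, here is a rough sketch that condenses the `esxcli hardware pci list` output posted earlier into one line per NVIDIA device. It assumes a POSIX shell with `awk` and the field names exactly as they appear in that output; `nvidia_owner_summary` is a hypothetical helper name, not an esxcli feature.

```shell
#!/bin/sh
# Hypothetical helper: summarize NVIDIA devices from
# `esxcli hardware pci list` output read on stdin.
# Prints one line per device: <address>: owner=<configured owner>, module=<module name>
nvidia_owner_summary() {
    awk '
        /^[ \t]*Address:/     { addr = $NF }
        /^[ \t]*Vendor Name:/ { nvidia = ($0 ~ /NVIDIA/) }
        /^[ \t]*Configured Owner:/ {
            sub(/^[ \t]*Configured Owner:[ \t]*/, "")
            sub(/[ \t]+$/, "")
            owner = $0
        }
        /^[ \t]*Module Name:/ {
            sub(/^[ \t]*Module Name:[ \t]*/, "")
            sub(/[ \t]+$/, "")
            # Module Name comes after the vendor and owner fields in each record.
            if (nvidia) print addr ": owner=" owner ", module=" $0
        }
    '
}

# Example use on the host (commented out so the snippet has no side effects):
#   esxcli hardware pci list -c 0x300 -m 0xff | nvidia_owner_summary
```

A device reported with owner "VM Passthru" and module "pciPassthru", as in the output above, is the passthrough case described here, so nvidia-smi on the host has nothing to talk to.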

#7
Posted 12/12/2016 08:42 PM   