M60 vGPU with Xorg "(EE) No devices detected"
We have a new M60 in our Dell R720, VMware ESXi 6.0/vSphere 6. vGPU profiles
work fine with a Windows 10 VM, but not in CentOS (tried 6.8 and 7), where Xorg.0.log
always says (EE) No devices detected. and exits:

[ 13572.243] (II) LoadModule: "glx"
[ 13572.243] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[ 13572.248] (II) Module glx: vendor="NVIDIA Corporation"
[ 13572.248] compiled for 4.0.2, module version = 1.0.0
[ 13572.248] Module class: X.Org Server Extension
[ 13572.248] (II) NVIDIA GLX Module 361.45.09 Tue May 10 08:44:16 PDT 2016
[ 13572.248] (II) LoadModule: "nvidia"
[ 13572.249] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[ 13572.249] (II) Module nvidia: vendor="NVIDIA Corporation"
[ 13572.249] compiled for 4.0.2, module version = 1.0.0
[ 13572.249] Module class: X.Org Video Driver
[ 13572.249] (II) NVIDIA dlloader X Driver 361.45.09 Tue May 10 08:22:21 PDT 2016
[ 13572.249] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[ 13572.249] (--) using VT number 7

[ 13572.252] (EE) No devices detected.
[ 13572.252] (EE)
Fatal server error:
[ 13572.252] (EE) no screens found(EE)
[ 13572.252] (EE)


This is running the 361.45.09 drivers, which appear fine on both the hypervisor side and in the guest VM.
In the guest VM, nvidia-smi sees a GRID M60-4Q vGPU profile. xorg.conf was generated by `nvidia-xconfig --enable-all-gpus --use-display-device=none`. Licensing appears correctly set up. The GPU has been put into graphics mode. The kernel module is loaded, and dmesg shows nothing untoward. The R720 was previously running with a K1 and K2, which have now been removed to keep things simple. And to reiterate, the Win10 VM works, with the OpenGL renderer string showing the M60 vGPU profile. I've exhausted my forum / internet searching.
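
For reference, the basic sanity checks inside the guest were along these lines (a rough sketch; the commands are standard but the outputs are paraphrased, not copied verbatim):

lsmod | grep -e nvidia -e nouveau   # nvidia loaded, nouveau absent
nvidia-smi -L                       # lists the GRID M60-4Q vGPU profile
nvidia-xconfig --enable-all-gpus --use-display-device=none   # generated the failing xorg.conf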

Anyone have ideas to try?

nvidia-bug-report.log.gz:
https://mft.opentext.com/MFT/Transfer?action=GetFile&name=37b52528-58cc-4060-9cfc-f8d2fb458dcf&TID=e2f8e371-e0a5-40e2-b833-68e5b2c77f14&nojava=true
vmware.log.gz:
https://mft.opentext.com/MFT/Transfer?action=GetFile&name=b9093aab-0050-4c95-a234-373f5874d76b&TID=e2f8e371-e0a5-40e2-b833-68e5b2c77f14&nojava=true
(URLs will expire on 2016-09-13)

Thanks

#1
Posted 08/30/2016 09:36 PM   
I don't think CentOS is an OS officially supported by VMware (and consequently by NVIDIA), so you might want to consider that. CentOS is supported by Citrix Linux VDA at the moment, and given its similarity to RHEL I would expect it to work from our side. However, you should look carefully at which OSs, and even which versions, VMware/Citrix support.

There are a few common setup issues with CentOS / RHEL detailed in our knowledge base; could you have a look at them: http://nvidia.custhelp.com/app/answers/list/st/5/kw/centos%20grid/page/1

And see if anything rings a bell?

Rachel

#2
Posted 08/31/2016 10:37 AM   
That configuration will run CentOS 7 perfectly well; whilst it may not be "supported" by every vendor in the stack, it should still work fine.

However,

First observation - the M60 is not certified in the Dell R720; you need the R730.

Second - Double-check that nouveau is completely disabled and not being reloaded after you installed the NVIDIA driver. You need to block it in several locations to ensure it isn't capturing the hardware and preventing it from being detected properly.
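
For reference, fully disabling nouveau on CentOS/RHEL typically means something like the following (a generic sketch of the usual steps, not specific to this particular VM):

# /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0

# rebuild the initramfs so nouveau isn't pulled in at boot (CentOS/RHEL)
dracut --force

# optionally also block it on the kernel command line in grub:
#   rd.driver.blacklist=nouveau nouveau.modeset=0

# after a reboot this should print nothing:
lsmod | grep nouveau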

Also, what remoting solution are you using?

You reference vSphere as the underlying hypervisor, but there's no mention of the remoting solution, without which there are no display devices attached. Horizon should add this to xorg.conf when the agent is installed.

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#3
Posted 08/31/2016 10:56 AM   
Also, just to confirm that you're using the Linux driver from the bundle downloaded from https://nvidia.flexnetoperations.com/control/nvda/login

These are the drivers required for the M60.

You should also amend the license settings in gridd.conf (though that's not directly relevant to this issue).
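
For reference, the relevant entries in /etc/nvidia/gridd.conf look roughly like this (illustrative values only; the server address is a placeholder and the FeatureType has to match your own licensed feature):

# /etc/nvidia/gridd.conf
ServerAddress=license-server.example.com   # placeholder hostname
ServerPort=7070
FeatureType=1        # 1 = GRID vGPU
EnableUI=FALSE

# then restart the licensing daemon
service nvidia-gridd restart    # or: systemctl restart nvidia-gridd on CentOS 7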

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#4
Posted 08/31/2016 11:50 AM   
Thanks for the replies.

RachelBerry said:
I don't think CentOS is an OS officially supported by VMware (and consequently by NVIDIA) so you might want consider that.


Ah, thanks. The GRID Release Notes explicitly include CentOS, and that VMware link does include both 6.x and 7 for ESXi 6 U2, but I guess you're saying vGPU profiles under VMware are a separate support issue. I've been unable to separate hypervisor support from Horizon support (which we're not using, see below) in the VMware links I've found. Do you have a more explicit link? Anyway, this is a bit tangential, since Jason says it *should* work on CentOS 7.

RachelBerry said:
There are a few common setup issues with CentOS / RHEL detailed in our knowledge base, could you have a look at them: http://nvidia.custhelp.com/app/answers/list/st/5/kw/centos%20grid/page/1



I've reviewed that (and seen most of those posts already) but don't see anything relevant.

JasonSouthern said:
Also, just to confirm that you're using the Linux driver from the bundle downloaded from


https://nvidia.flexnetoperations.com/control/nvda/login



Yes, this is where we got the drivers.

JasonSouthern said:
You should also amend the license settings in gridd.conf (though that's not directly relevant to this issue).


Also done, and the license server sees the VM has a license registered.
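
(A guest-side way to double-check this, in case it helps anyone later: nvidia-gridd logs its license checkout to syslog on CentOS, so something like the line below should show it; the exact message wording varies by driver version.)

grep -i gridd /var/log/messages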


JasonSouthern said:
First observation - M60 are not certified in the Dell R720, you need the R730


This was my mistake; it is the R730.

JasonSouthern said:
Second - Double check that nouveau is completely disabled and not restarting after you installed the NVIDIA driver. You need to block it in several locations to ensure it's not capturing the hardware and preventing it being detected properly.


Yes, nouveau is blacklisted and the nvidia module is loaded.

JasonSouthern said:
Also , what remoting solution are you using?

You reference vSphere as the underlying hypervisor, but there's no mention of the remoting solution, without which there are no display devices attached. Horizon should add this to xorg.conf when the agent is installed.


This is perhaps the issue. We (OpenText) are an ISV with our own remoting solution (ETX). Perhaps this is my misunderstanding, since previously we ran bare metal with K1/K2: are you saying the NVIDIA driver will not load in X.org without a special xorg.conf? I.e. nvidia-xconfig --use-display-device=none doesn't work with vGPU like it does for bare metal?

I only want the headless X.org to run, and all remoting is my own.

Thanks.

#5
Posted 08/31/2016 03:18 PM   
After a bunch of testing, the solution is to not use nvidia-xconfig to generate xorg.conf. X.org won't start with the generated ServerLayout, Monitor and Screen sections (even with UseDisplayDevice "None"). The Device section also needs an explicit BusID entry added.

A minimal working config is, e.g.:

Section "DRI"
Mode 0666
EndSection

Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GRID M60-4Q"
BusID "PCI:2:0:0"
EndSection
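
The BusID has to match the PCI address the vGPU appears at inside the guest (02:00.0 here, hence PCI:2:0:0), and it may differ between VMs. Two ways to look it up, for reference:

lspci | grep -i nvidia            # prints the address in hex, e.g. 02:00.0
nvidia-xconfig --query-gpu-info   # reports the BusID in xorg.conf "PCI:x:y:z" form

Note that xorg.conf wants decimal values, so a device at 0a:00.0 would be PCI:10:0:0.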

#6
Posted 09/06/2016 04:00 PM   
[quote="NathanKidd"]After a bunch of testing, the solution is to not use [i]nvidia-xonfig[/i] to generate xorg.conf. X.org won't start with the generated [i]ServerLayout[/i], [i]Monitor[/i] and [i]Screen[/i] sections (even with [i]UseDisplayDevice "None"[/i]). The device section also needs an explicit [i]BusID[/i] device added. A minimal working config is, e.g.: [code] Section "DRI" Mode 0666 EndSection Section "Device" Identifier "Device0" Driver "nvidia" VendorName "NVIDIA Corporation" BoardName "GRID M60-4Q" BusID "PCI:2:0:0" EndSection [/code] [/quote] I ran into the exact same problem when installing headless xorg for the Amazon g3 instance and this works like magic. Could you share the full working config? Thanks!
NathanKidd said: After a bunch of testing, the solution is to not use nvidia-xconfig to generate xorg.conf. X.org won't start with the generated ServerLayout, Monitor and Screen sections (even with UseDisplayDevice "None"). The Device section also needs an explicit BusID entry added.

A minimal working config is, e.g.:

Section "DRI"
Mode 0666
EndSection

Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GRID M60-4Q"
BusID "PCI:2:0:0"
EndSection




I ran into the exact same problem when setting up headless Xorg on an Amazon g3 instance, and this works like magic. Could you share the full working config? Thanks!

#7
Posted 08/14/2017 05:47 PM   