NVIDIA
GRID K2 NOT working on XenServer 6.5SP1
We have two XenServer hosts in a K2 pool, but one of them does not work. Only K2P machines can start on it. K200, K220, K240, K260 and K280 machines cannot start on it. The error message from XenServer is: Internal error: xenopsd internal error: Device.Ioemu_failed("vgpu exited unexpectedly"). I tried two driver versions, 367.43 and 367.64, and neither of them worked. The log files show the following:

Jan 17 11:24:18 xxxxxxxxxxen032 kernel: [417949.279318] block tda: sector-size: 512/512 capacity: 629145600
Jan 17 11:24:18 xxxxxxxxxxen032 kernel: [417949.478882] device vif428.0 entered promiscuous mode
Jan 17 11:24:18 xxxxxxxxxxen032 kernel: [417949.538774] IPv6: ADDRCONF(NETDEV_UP): vif428.0: link is not ready
Jan 17 11:24:49 xxxxxxxxxxen032 kernel: [417980.526518] xapi7: port 2(vif428.0) entered disabled state
Jan 17 11:24:49 xxxxxxxxxxen032 kernel: [417980.526628] device vif428.0 left promiscuous mode
Jan 17 11:24:49 xxxxxxxxxxen032 kernel: [417980.526630] xapi7: port 2(vif428.0) entered disabled state
Jan 17 11:24:56 xxxxxxxxxxen032 kernel: [417987.488401] NVRM: RmInitAdapter failed! (0x24:0x40:1035)

Has anyone encountered this kind of issue before?
Thank you.

#1
Posted 01/17/2017 06:37 AM   
Apple said: Jan 17 11:24:56 xxxxxxxxxxen032 kernel: [417987.488401] NVRM: RmInitAdapter failed! (0x24:0x40:1035)

I suppose that the host/Dom0 driver does not start correctly. The host/Dom0 driver is needed for vGPU virtualization (e.g. Kxxx).
- NVIDIA should decode the "(0x24:0x40:1035)" error.
- You can try "nvidia-smi" or "nvidia-smi --debug=logfile".
- You can "grep" the relevant system logs for "NVRM" or "nvidia" tags.
- A Google search finds a similar problem (the same error) here https://devtalk.nvidia.com/default/topic/957827/geforce-maxwell-titan-x-and-pascal-titan-x-in-same-machine-/ and that thread concludes with some HW incompatibility problem.
- You can compare the PCI resource assignment ("lspci -nv | more" or "lspci -nvv | more") between the host machines (search for the "10de:11bf" K2 GPU chips and compare "Memory"/"Region" ...). For example, you may have forgotten to enable "64bit" PCI resource assignment in the BIOS, but your server HW, config and BIOS version are unknown, so I cannot give a more specific hint.
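The lspci comparison in the last point can be sketched in a few lines of Python. This is only a rough illustration, not a tested tool: it assumes you have saved the "lspci -nv" (or "lspci -nvv") output from each host as text, and the two sample strings below are made up for the example rather than taken from the poster's servers. The function names are hypothetical.

```python
import re

# Header line of an "lspci -nv" entry, e.g. "05:00.0 0300: 10de:11bf (rev a1)".
DEVICE_RE = re.compile(r"^(\S+)\s+[0-9a-f]{4}:\s+([0-9a-f]{4}:[0-9a-f]{4})")
# Memory line; also matches the "Region N: Memory at ..." form printed by -vv.
MEMORY_RE = re.compile(
    r"Memory at (\S+) \((\d+)-bit, ((?:non-)?prefetchable)\)(?:.*\[size=(\w+)\])?"
)


def gpu_memory_regions(lspci_text, device_id="10de:11bf"):
    """Map PCI slot -> list of (bits, kind, size, assigned) for matching devices."""
    regions = {}
    current = None
    for line in lspci_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            slot, found_id = header.groups()
            current = slot if found_id == device_id else None
            if current is not None:
                regions[current] = []
            continue
        if current is None:
            continue  # line belongs to a device we are not interested in
        mem = MEMORY_RE.search(line)
        if mem:
            address, bits, kind, size = mem.groups()
            # lspci prints a placeholder such as "<unassigned>" instead of a hex address
            assigned = re.fullmatch(r"[0-9a-f]+", address) is not None
            regions[current].append((int(bits), kind, size or "unknown", assigned))
    return regions


def layouts(lspci_text, device_id="10de:11bf"):
    """Slot-independent view of the region layout, so two hosts can be compared."""
    return sorted(gpu_memory_regions(lspci_text, device_id).values())


# Made-up sample text (assumption), shaped like "lspci -nv" output.
WORKING_HOST = """\
00:1f.2 0106: 8086:1d02 (rev 06)
\tMemory at f7d00000 (32-bit, non-prefetchable) [size=2K]

05:00.0 0300: 10de:11bf (rev a1)
\tMemory at f6000000 (32-bit, non-prefetchable) [size=16M]
\tMemory at e0000000 (64-bit, prefetchable) [size=128M]
"""

FAILING_HOST = """\
06:00.0 0300: 10de:11bf (rev a1)
\tMemory at f6000000 (32-bit, non-prefetchable) [size=16M]
\tMemory at <unassigned> (64-bit, prefetchable) [size=128M]
"""

print("layouts match:", layouts(WORKING_HOST) == layouts(FAILING_HOST))
```

Addresses themselves are deliberately left out of the comparison, since they can legitimately differ between hosts; the sketch only flags differences in width, prefetchability, size, and whether a region got an address at all.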

#2
Posted 01/17/2017 08:51 AM   
I think it might be best to post on the Citrix forums or raise a support case with them, as the errors are all xapi...

http://discussions.citrix.com/forum/523-gpu-technologies/ might be a good place, as the XS team monitors it.

Checking the host license and host driver and updating the host BIOS is the best I can think of.

#3
Posted 01/17/2017 03:49 PM   
It is worth checking your hypervisor version: on 32-bit (XS 6.2 and lower), >4GB MMIO needs disabling: http://discussions.citrix.com/topic/351335-every-4th-guest-fails-with-error-vgpu-exited-unexpectedly/
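If you want a quick look at whether any memory region is currently mapped above 4 GB, something like the Python sketch below could be run over saved "lspci -nv" output. It is only an illustration under assumptions: it takes the addresses lspci reports at face value, the sample text is invented, and the function name is hypothetical.

```python
import re

FOUR_GB = 1 << 32
# Only lines with a real hex address are considered; placeholders are skipped.
BAR_RE = re.compile(r"Memory at ([0-9a-f]+) \(\d+-bit")


def regions_above_4gb(lspci_text):
    """Return the hex addresses of memory regions mapped at or above 4 GB."""
    hits = []
    for line in lspci_text.splitlines():
        match = BAR_RE.search(line)
        if match and int(match.group(1), 16) >= FOUR_GB:
            hits.append(match.group(1))
    return hits


# Made-up sample text (assumption), shaped like "lspci -nv" output.
SAMPLE = """\
05:00.0 0300: 10de:11bf (rev a1)
\tMemory at f6000000 (32-bit, non-prefetchable) [size=16M]
\tMemory at 3800000000 (64-bit, prefetchable) [size=128M]
"""

print(regions_above_4gb(SAMPLE))
```

An empty result would suggest nothing is mapped above 4 GB at the moment; it says nothing about how the BIOS option itself is set.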

#4
Posted 01/17/2017 03:51 PM   
This error can also arise from a power cable issue: https://support.citrix.com/article/CTX210153

#5
Posted 01/17/2017 03:52 PM   
You can also get this error if the host has not been rebooted: http://www.florisvanderploeg.com/upgrading-the-nvidia-grid-vgpu-driver-on-xenserver/

#6
Posted 01/17/2017 03:53 PM   