NVIDIA
Problem with K1 and VMware View: Device 8:0.0 is already in use.
I have a Dell R720 with a K1 board in it, on which I am testing vDGA with VMware View 6.

The K1 gives me only two options: assign all GPUs to PCIe passthrough, or none. I am not sure whether that is expected.

My problem is that when I assign the PCIe passthrough video cards to VMs, the first VM boots fine, but all subsequent VMs refuse to start and display the error: Device 8:0.0 is already in use.

VM1 is assigned to 7:0.0
VM2 is assigned to 8:0.0

I have tried moving VM2 to 9:0.0 and A:0.0 with the same result: only one VM can run at any given time.

Has anyone else had this problem and been able to shed some light on it?

#1
Posted 08/18/2014 09:04 PM   
Hi Jeremy,

Can you post some screenshots of the VM settings in vSphere, and also of the PCI device settings under the host?

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#2
Posted 08/20/2014 07:25 PM   
Other information:

BIOS settings on R720
VT = Enabled
Memory Mapped I/O above 4GB = Enabled
I/OAT DMA Engine = Enabled

PCIe Passthrough
[IMG]http://i62.tinypic.com/2e2pbti.png[/IMG]

Startup error
[IMG]http://i58.tinypic.com/rbjnnk.png[/IMG]

VM01
[IMG]http://i59.tinypic.com/afcvuu.png[/IMG]

VM06
[IMG]http://i58.tinypic.com/30if1oi.png[/IMG]

#3
Posted 08/21/2014 01:39 AM   
Hi Jeremy,

The hypervisor and VM settings appear correct, but just to eliminate one potential issue, can you change this BIOS setting

Memory Mapped I/O above 4GB = Enabled

to Disabled and retest, please?

Thanks

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#4
Posted 08/23/2014 08:21 AM   
Hi Jeremy,

If you look in Device Manager in the VM are you seeing all 4 GPUs showing up there?

It sounds like the one VM is claiming all of the GPUs instead of just the one.

To confirm, you haven't installed the NVIDIA VIB for vSGA, correct?

-Mike

Mike Barnett
Sr. Systems Engineer, End User Computing Specialist
VMware
Twitter: MikeBarnett_

#5
Posted 08/23/2014 06:15 PM   
Hey, thanks for the responses, sorry I was out for the weekend.

I tried with Memory Mapped IO to Enabled as well, to no avail.

Confirmed that the VIB is installed.

In the device manager for the VM it only shows a single card. Would the card have multiple entries if it is claiming all?

#6
Posted 08/26/2014 01:42 AM   
If you are attempting to use passthrough, then you don't want the VIB installed, as vSGA is likely claiming some of the GPUs. I would try uninstalling it and then attempting passthrough again.

I would expect all of them to show in the VM if they were all being claimed.

-Mike

Mike Barnett
Sr. Systems Engineer, End User Computing Specialist
VMware
Twitter: MikeBarnett_

#7
Posted 08/26/2014 05:17 AM   
Things went a bit crazy again, just got a chance to do all those things.

Removed VIB, did not change the problem.

Only a single VM could grab control of the card still.

Maybe a firmware issue?
Is there a good way to find the firmware of this card?

Any other thoughts would be great.

#8
Posted 09/12/2014 04:16 PM   
How long have you had the card? There were firmware updates but those were almost a year or so ago and nothing since. Also, how old is the server? Then some basics, which power supplies do you have in the server? Only one K1, right, and all 6 power pins are connected on the back of the card? I assume you reseated the card?

Regards,

Luke Wignall
Performance Engineering Manager
NVIDIA | Worldwide Sales - GRID Computing
http://www.linkedin.com/in/lukewignall/
https://twitter.com/lwignall

#9
Posted 09/13/2014 02:22 PM   
I'm more or less having the same issue in a XenServer 6.2 pool...same messages...same hardware configs.

#10
Posted 09/15/2014 07:54 PM   
Hello,
Is there any solution for this? We have exactly the same problem with all our GRID K1 cards. We have now tested it in 3x R720 (newest BIOS) + GRID K1.

The hypervisor always shows "the device is already in use".

#11
Posted 04/04/2016 10:34 AM   
inverted_2000 said:I'm more or less having the same issue in a XenServer 6.2 pool...same messages...same hardware configs.


XenServer handles GPU passthrough differently, so we'd need to see screenshots and error messages.

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#12
Posted 04/04/2016 10:44 AM   
[img]https://picload.org/image/rggaogio/001.png[/img]
[img]https://picload.org/image/rggaogic/002.png[/img]
[img]https://picload.org/image/rggaogip/003.png[/img]
[img]https://picload.org/image/rggaogpg/4.png[/img]

vmware.log
2015-11-25T16:44:19.686Z| vmx| I120: PCIPassthru: Failed to register device 0000:08:00.0 error = 0x10
2015-11-25T16:44:19.686Z| vmx| I120: Msg_Post: Error
2015-11-25T16:44:19.686Z| vmx| I120: [msg.pciPassthru.createAdapterFailedDeviceInUse] Device 008:00.0 is already in use.
2015-11-25T16:44:19.686Z| vmx| I120: ----------------------------------------
2015-11-25T16:44:19.687Z| vmx| I120: Vigor_MessageRevoke: message 'msg.pciPassthru.createAdapterFailedDeviceInUse' (seq 53295) is revoked
2015-11-25T16:44:19.687Z| vmx| I120: Module DevicePowerOn power on failed.

#13
Posted 04/04/2016 10:51 AM   
Jeremy said:

Removed VIB, did not change the problem.



Just to confirm this, can you run at the ESXi shell

esxcli software vib list | grep -i nvidia

then also

vmkload_mod -l | grep nvidia

After that run

nvidia-smi

and post the output from each here.

Whilst this shouldn't have any impact, it seems that the hypervisor is blocking access to the PCI devices, so eliminating anything else that could be blocking the resource first should help pin it down.

Jason Southern, Regional Lead for ProVis Sales - EMEA: NVIDIA Ltd.

#14
Posted 04/04/2016 10:54 AM   
We opened a ticket for this problem with VMware on 18.11.2015. They can't find any fault and now say "please contact NVIDIA"; the configuration is correct.

We have uninstalled the VIBs, as described in the NVIDIA documentation for vDGA!

I can give you the output from:
[root@esxi-06:~] esxcli hardware pci list -c 0x0300 -m 0xff
0000:07:00.0
Address: 0000:07:00.0
Segment: 0x0000
Bus: 0x07
Slot: 0x00
Function: 0x0
VMkernel Name:
Vendor Name: NVIDIA Corporation
Device Name: GK107GL [GRID K1]
Configured Owner: VM Passthru
Current Owner: VM Passthru
Vendor ID: 0x10de
Device ID: 0x0ff2
SubVendor ID: 0x10de
SubDevice ID: 0x1012
Device Class: 0x0300
Device Class Name: VGA compatible controller
Programming Interface: 0x00
Revision ID: 0xa1
Interrupt Line: 0x0f
IRQ: 255
Interrupt Vector: 0x41
PCI Pin: 0x00
Spawned Bus: 0x00
Flags: 0x0401
Module ID: 19
Module Name: pciPassthru
Chassis: 0
Physical Slot: 4294967295
Slot Description: PCI6; relative bdf 01:00.0
Passthru Capable: true
Parent Device: PCI 0:6:8:0
Dependent Device: PCI 0:5:0:0
Reset Method: Bridge reset
FPT Sharable: true

0000:08:00.0
Address: 0000:08:00.0
Segment: 0x0000
Bus: 0x08
Slot: 0x00
Function: 0x0
VMkernel Name:
Vendor Name: NVIDIA Corporation
Device Name: GK107GL [GRID K1]
Configured Owner: VM Passthru
Current Owner: VM Passthru
Vendor ID: 0x10de
Device ID: 0x0ff2
SubVendor ID: 0x10de
SubDevice ID: 0x1012
Device Class: 0x0300
Device Class Name: VGA compatible controller
Programming Interface: 0x00
Revision ID: 0xa1
Interrupt Line: 0x0e
IRQ: 255
Interrupt Vector: 0x00
PCI Pin: 0x00
Spawned Bus: 0x00
Flags: 0x0401
Module ID: 19
Module Name: pciPassthru
Chassis: 0
Physical Slot: 4294967295
Slot Description: PCI6; relative bdf 02:00.0
Passthru Capable: true
Parent Device: PCI 0:6:9:0
Dependent Device: PCI 0:5:0:0
Reset Method: Bridge reset
FPT Sharable: true

0000:09:00.0
Address: 0000:09:00.0
Segment: 0x0000
Bus: 0x09
Slot: 0x00
Function: 0x0
VMkernel Name:
Vendor Name: NVIDIA Corporation
Device Name: GK107GL [GRID K1]
Configured Owner: VM Passthru
Current Owner: VM Passthru
Vendor ID: 0x10de
Device ID: 0x0ff2
SubVendor ID: 0x10de
SubDevice ID: 0x1012
Device Class: 0x0300
Device Class Name: VGA compatible controller
Programming Interface: 0x00
Revision ID: 0xa1
Interrupt Line: 0x0f
IRQ: 255
Interrupt Vector: 0x00
PCI Pin: 0x00
Spawned Bus: 0x00
Flags: 0x0401
Module ID: 19
Module Name: pciPassthru
Chassis: 0
Physical Slot: 4294967295
Slot Description: PCI6; relative bdf 03:00.0
Passthru Capable: true
Parent Device: PCI 0:6:16:0
Dependent Device: PCI 0:5:0:0
Reset Method: Bridge reset
FPT Sharable: true

0000:0a:00.0
Address: 0000:0a:00.0
Segment: 0x0000
Bus: 0x0a
Slot: 0x00
Function: 0x0
VMkernel Name:
Vendor Name: NVIDIA Corporation
Device Name: GK107GL [GRID K1]
Configured Owner: VM Passthru
Current Owner: VM Passthru
Vendor ID: 0x10de
Device ID: 0x0ff2
SubVendor ID: 0x10de
SubDevice ID: 0x1012
Device Class: 0x0300
Device Class Name: VGA compatible controller
Programming Interface: 0x00
Revision ID: 0xa1
Interrupt Line: 0x0e
IRQ: 255
Interrupt Vector: 0x00
PCI Pin: 0x00
Spawned Bus: 0x00
Flags: 0x0401
Module ID: 19
Module Name: pciPassthru
Chassis: 0
Physical Slot: 4294967295
Slot Description: PCI6; relative bdf 04:00.0
Passthru Capable: true
Parent Device: PCI 0:6:17:0
Dependent Device: PCI 0:5:0:0
Reset Method: Bridge reset
FPT Sharable: true

0000:11:00.0
Address: 0000:11:00.0
Segment: 0x0000
Bus: 0x11
Slot: 0x00
Function: 0x0
VMkernel Name:
Vendor Name: Matrox Electronics Systems Ltd.
Device Name: G200eR2
Configured Owner: Unknown
Current Owner: VMkernel
Vendor ID: 0x102b
Device ID: 0x0534
SubVendor ID: 0x1028
SubDevice ID: 0x048c
Device Class: 0x0300
Device Class Name: VGA compatible controller
Programming Interface: 0x00
Revision ID: 0x00
Interrupt Line: 0x0b
IRQ: 255
Interrupt Vector: 0x00
PCI Pin: 0x00
Spawned Bus: 0x00
Flags: 0x0221
Module ID: -1
Module Name: None
Chassis: 0
Physical Slot: 4294967295
Slot Description: Embedded Video
Passthru Capable: true
Parent Device: PCI 0:16:0:0
Dependent Device: PCI 0:16:0:0
Reset Method: Bridge reset
FPT Sharable: true
[root@esxi-06:~]
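
The listing above is long, so a small script can make the entries easier to compare. The following is a minimal sketch in Python and is not part of the original post. It assumes the `esxcli hardware pci list` output has been saved as text; `parse_pci_list` and `group_by_dependent` are hypothetical helper names, and `SAMPLE` is a shortened copy of the listing. It groups device addresses by their `Dependent Device` field, which in the output above is `PCI 0:5:0:0` for all four GRID K1 GPUs. Whether that shared value is related to the error is not established in this thread.

```python
from collections import defaultdict

# Shortened copy of the listing above (assumption: saved to text beforehand).
SAMPLE = """\
0000:07:00.0
   Address: 0000:07:00.0
   VMkernel Name:
   Device Name: GK107GL [GRID K1]
   Current Owner: VM Passthru
   Dependent Device: PCI 0:5:0:0

0000:08:00.0
   Address: 0000:08:00.0
   VMkernel Name:
   Device Name: GK107GL [GRID K1]
   Current Owner: VM Passthru
   Dependent Device: PCI 0:5:0:0

0000:11:00.0
   Address: 0000:11:00.0
   Device Name: G200eR2
   Current Owner: VMkernel
   Dependent Device: PCI 0:16:0:0
"""


def parse_pci_list(text):
    """Split the listing into one dict of fields per device (hypothetical helper)."""
    devices = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and shell prompts
        key, sep, value = line.partition(": ")
        if sep:
            if devices:
                devices[-1][key] = value  # ordinary "Key: value" field
        elif line.endswith(":"):
            if devices:
                devices[-1][line[:-1]] = ""  # field with an empty value
        else:
            devices.append({"Header": line})  # bare address starts a new device
    return devices


def group_by_dependent(devices):
    """Map each 'Dependent Device' value to the addresses that report it."""
    groups = defaultdict(list)
    for dev in devices:
        groups[dev.get("Dependent Device", "")].append(dev.get("Address", dev["Header"]))
    return dict(groups)


if __name__ == "__main__":
    for dependent, addresses in group_by_dependent(parse_pci_list(SAMPLE)).items():
        print(dependent, "->", ", ".join(addresses))
```

The same grouping could be done on `Current Owner` or `Reset Method` by changing the field name passed to `dev.get`.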


As you can see, all GPU cores are presented to the hypervisor correctly. The first VM starts with no problems. But if you start the second one, the vSphere client only shows "device already in use" and the ESXi log shows this:

2015-11-25T16:44:19.686Z| vmx| I120: PCIPassthru: Failed to register device 0000:08:00.0 error = 0x10
2015-11-25T16:44:19.686Z| vmx| I120: Msg_Post: Error
2015-11-25T16:44:19.686Z| vmx| I120: [msg.pciPassthru.createAdapterFailedDeviceInUse] Device 008:00.0 is already in use.
2015-11-25T16:44:19.686Z| vmx| I120: ----------------------------------------
2015-11-25T16:44:19.687Z| vmx| I120: Vigor_MessageRevoke: message 'msg.pciPassthru.createAdapterFailedDeviceInUse' (seq 53295) is revoked
2015-11-25T16:44:19.687Z| vmx| I120: Module DevicePowerOn power on failed.
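
When comparing several hosts or VMs, it can help to pull the failing device addresses out of each `vmware.log` automatically. The following is a minimal sketch in Python and is not part of the original post. The regular expression is based only on the log lines quoted above, `find_passthru_failures` is a hypothetical helper name, and other ESXi versions may word the message differently.

```python
import re

# Pattern derived from the log lines quoted above (assumption: same wording elsewhere).
FAIL_RE = re.compile(
    r"PCIPassthru: Failed to register device (\S+) error = (0x[0-9a-fA-F]+)"
)

# Two of the lines from the excerpt above, used as sample input.
SAMPLE_LOG = """\
2015-11-25T16:44:19.686Z| vmx| I120: PCIPassthru: Failed to register device 0000:08:00.0 error = 0x10
2015-11-25T16:44:19.687Z| vmx| I120: Module DevicePowerOn power on failed.
"""


def find_passthru_failures(log_text):
    """Return (timestamp, device address, error code) for each failed registration."""
    failures = []
    for line in log_text.splitlines():
        match = FAIL_RE.search(line)
        if match:
            timestamp = line.split("|", 1)[0].strip()  # text before the first '|'
            failures.append((timestamp, match.group(1), match.group(2)))
    return failures


if __name__ == "__main__":
    for timestamp, device, code in find_passthru_failures(SAMPLE_LOG):
        print(timestamp, device, code)
```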


I think only a few people will have this problem, because Enterprise Plus customers use vGPU. We have tested this procedure with three identical Dell R720 servers.

#15
Posted 04/04/2016 11:02 AM   