Tesla K80 Initial Setup Problem
Hi all, I am new to the Tesla community. I recently set up a machine for CAD combined with CFD and FEA. The machine specs are:

Dual Xeon Silver 4216
128 GB Memory
Asus WS C621E SAGE main board
Drives are all solid-state NVMe
Asus GeForce RTX2070 video card
PSU: Corsair AX1600i (main) Corsair RX850 (for a little extra power)
Latest BIOS installed

I bought three Tesla K80s off Amazon. They were used and recertified, so I bought the insurance with them.

I am having issues getting the machine to boot after installing the first card. I have Above 4G Decoding enabled, and the machine POSTs and begins loading Windows, but then hangs and says it could not start Windows. So I am trying to narrow down what the issue could be:

1) No NVIDIA Tesla drivers are loaded yet... I can't load them until a Tesla is installed and Windows starts.

2) Location on the main board... The GeForce RTX 2070 is in slot 7 (bottom) and I installed the first Tesla in slot 5, the closest they can physically be. The other cards will go into slots 1 and 3 once the first card is working.

3) Not compatible with the motherboard, or the combination of GeForce and Tesla on the same board isn't supported (god I hope not).

4) Thermal issue... the card was seriously hot. I have read that these heat up and need active cooling, but it sounded like that's only the case when you are really taxing them. I know I have to add active cooling, and I am 3D printing a duct to move a lot of air, but the duct is not done yet.

5) Bad card?

I have plenty of power and it's wired up right. Maybe I just have to wait until the air duct is done to cool it and try again, in case the heat is causing it to shut down. I have not tried putting it into a different slot.
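
As a rough sanity check on "plenty of power", the sketch below adds up nominal figures for this build. Only the 300 W per K80 comes from this thread; the RTX 2070 (175 W), Xeon Silver 4216 (100 W each) and the 150 W allowance for everything else are assumptions to replace with your own numbers.

```python
# Rough power-budget check for the build described above.
# Only the 300 W per K80 comes from the thread; the other wattages are
# assumed nominal figures and should be replaced with real ones.
K80_W = 300        # per card (figure quoted in the thread)
RTX2070_W = 175    # assumed reference board power
XEON_4216_W = 100  # assumed TDP per CPU (two CPUs in this build)
OTHER_W = 150      # hypothetical allowance for board, RAM, NVMe and fans

PSU_CAPACITY_W = {"AX1600i": 1600, "RX850": 850}


def system_load(num_k80: int) -> int:
    """Estimated peak load in watts for a given number of K80s."""
    return num_k80 * K80_W + RTX2070_W + 2 * XEON_4216_W + OTHER_W


if __name__ == "__main__":
    total_capacity = sum(PSU_CAPACITY_W.values())
    for n in (1, 3):
        print(f"{n} x K80: ~{system_load(n)} W of {total_capacity} W combined")
```

With these assumed figures it reports about 825 W for one K80 and about 1425 W for three. The combined rating only helps if the load is split sensibly between the two supplies, since each PSU can only deliver what its own cables feed.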

Thanks
Scott

#1
Posted 05/08/2020 05:50 PM   
Hi

Unfortunately I don't think that's going to work.

You're using a current generation GeForce GPU with a really old Tesla GPU and they have massively different driver requirements and they're about as far apart in physical architecture as you can get.

If you're trying to benefit from Graphics and Compute, using 2 different GPU models is quite an old way of doing it. This used to be called NVIDIA Maximus, but it's no longer used due to advances in architecture. The last time I used this approach was back in 2015, and I was using a Quadro and a Tesla, not a GeForce: I was running an M6000 and a K80, and that worked. Today, I'd just use a single current-generation GPU, which would massively outperform that configuration.

You 100% need a fan to cool the K80; it's not optional. Whether you're running workloads on it or it's sitting idle, it needs cooling, otherwise you'll damage it. It needs very good airflow, just as any other passive GPU does, or an appropriate chassis that's designed for passive GPUs. Your insurance is unlikely to cover thermal damage.
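
Once a driver is loaded, one way to keep an eye on the K80 while the cooling is sorted out is to poll `nvidia-smi`. This is a sketch, not a tested tool: it assumes `nvidia-smi` is on the PATH, and the 85 °C alert threshold is a hypothetical value, not an NVIDIA specification. Each K80 carries two GPUs, so one card shows up as two devices.

```python
import subprocess
import time

ALERT_C = 85  # hypothetical alert threshold in degrees C, not an NVIDIA spec


def parse_temps(csv_text: str) -> list[tuple[int, str, int]]:
    """Parse 'index, name, temperature' rows from nvidia-smi CSV output."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, name, temp = (field.strip() for field in line.split(","))
        if temp.isdigit():  # skip GPUs that report "[N/A]"
            rows.append((int(index), name, int(temp)))
    return rows


def read_temps() -> list[tuple[int, str, int]]:
    """Query every visible NVIDIA GPU once (needs the driver and nvidia-smi)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temps(result.stdout)


def too_hot(rows, limit: int = ALERT_C) -> list[tuple[int, str, int]]:
    """Return the rows at or above the alert limit."""
    return [row for row in rows if row[2] >= limit]


def watch(interval_s: float = 5.0) -> None:
    """Print a warning whenever any GPU reaches the limit; stop with Ctrl+C."""
    while True:
        for index, name, temp in too_hot(read_temps()):
            print(f"GPU {index} ({name}) is at {temp} C")
        time.sleep(interval_s)
```

Calling `watch()` from a console prints a line for each GPU at or above the threshold every five seconds.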

Regardless of whether the system boots or not, I don't think this will work. I'm pretty sure you need a Quadro driver to support the Tesla, and you can't use that because you have a GeForce in there.

Being completely honest, to save you a lot of time, hassle, frustration and disappointment at the K80s' performance, here's a bit of guidance: return the RTX 2070 and the 3 K80s and get a refund, then purchase 1 Quadro GP100. It's based on the Pascal architecture and has very good Graphics and Compute capability in a single GPU, so it will be much better for what you're trying to do. It has 16GB of HBM2 memory and is a seriously powerful bit of kit! It's also an "Active" GPU (meaning it has a fan built in), so there's no need to mess around building your own cooling, and it will use less power and generate less heat.

You can buy GP100s on eBay for a little over £1K if you shop around, which is probably not much more than the cost of all your existing GPUs (plus insurance) together. If you go down this path, make sure you get the GP100, not the P100. The GP100 is Active and has display outputs on the back so you can connect your monitors; it's designed to go in a Workstation (just like yours). The P100, on the other hand, is headless and Passive; it's designed to go in a Server. Performance between them is the same, but if you buy the P100, you won't be able to use it unless you use virtualisation. Just something to be aware of.

More modern GPUs are available, like the GV100, which is built on Volta, but these are more expensive. You could use a Titan V (again Volta), but then you're back to using the GeForce gaming driver, not Quadro, so I'd avoid it. The Titan V has less memory than the GP100 (12GB), but it's still HBM2, so it's very fast! But if you're doing CAD / CFD / FEA, driver support is very important. For reference, NVIDIA's Titan product line is classed as "Prosumer", so it's a step up in hardware from (Consumer) GeForce, but it still uses the GeForce driver.

The easiest, most cost-effective way to do what you're after is to use the GP100.

That's the best bit of advice I can offer.

Regards

MG

#2
Posted 05/10/2020 11:34 AM   
Thanks for responding. I was a bit afraid this could be the case, but I also figured I could run the GPUs independently. The GP100 is just about beyond what I'd like to invest in this machine right now; in the States I'm looking at $2,700 or so. I have seen several posts on other forums from people who were successful and satisfied running a GTX 1080 and a K80 together, so I figured there is a way to get an RTX 2070 and a K80 to run together.

That was a recent post, but if it doesn't work for me, I'll rethink and see about going down the road you are suggesting. I do appreciate your notes.
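
If the two cards do end up running side by side, one common way to keep compute jobs off the display card is the `CUDA_VISIBLE_DEVICES` environment variable, together with `CUDA_DEVICE_ORDER=PCI_BUS_ID` so CUDA numbers devices in the same PCI bus order that `nvidia-smi` uses. This sketch assumes the index list comes from `nvidia-smi --query-gpu=index,name --format=csv,noheader`; whether a given CAD/CFD/FEA package honours these variables (rather than its own GPU selection setting) is an assumption to check.

```python
import os


def parse_gpus(csv_text: str) -> list[tuple[int, str]]:
    """Parse 'index, name' rows from nvidia-smi CSV output."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, name = (field.strip() for field in line.split(",", 1))
        rows.append((int(index), name))
    return rows


def compute_env(gpus, match: str = "K80", base=None) -> dict[str, str]:
    """Copy the environment, exposing only GPUs whose name contains `match`."""
    env = dict(os.environ if base is None else base)
    # Number CUDA devices in PCI bus order so they line up with nvidia-smi indices.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = ",".join(
        str(index) for index, name in gpus if match in name
    )
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching a solver, e.g. `subprocess.run(["solver.exe", "case.inp"], env=compute_env(gpus))`, where `solver.exe` and `case.inp` are hypothetical placeholders. If nothing matches, the variable is set to an empty string, which hides every GPU from CUDA.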

Cooling: no doubt! At 300W I expected that. I have designed a duct that fits in my case and combines the flow from two 120mm fans into three ports to supply air to all three cards, while dumping about 20 percent to the board controller. Just waiting to get it printed.
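
The duct's split can be checked with simple arithmetic. The sketch assumes a hypothetical 60 CFM per fan and ignores duct pressure losses, which in practice will reduce the delivered flow; only the two fans, the three card ports and the roughly 20 percent to the board controller come from the post.

```python
def duct_split(fan_cfm: float, n_fans: int = 2, n_cards: int = 3,
               controller_share: float = 0.20) -> tuple[float, float]:
    """Return (airflow per card port, airflow to the board controller) in CFM.

    Assumes free-air fan ratings and an even split between card ports,
    i.e. no duct losses; real delivered airflow will be lower.
    """
    total = fan_cfm * n_fans
    controller = total * controller_share
    per_card = (total - controller) / n_cards
    return per_card, controller


if __name__ == "__main__":
    per_card, controller = duct_split(60.0)  # 60 CFM per fan is a hypothetical rating
    print(f"{per_card:.0f} CFM per card, {controller:.0f} CFM to the controller")
```

Under those assumptions each card port gets about 32 CFM and the controller about 24 CFM, so each K80 sees roughly a quarter of the combined fan rating.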

#3
Posted 05/13/2020 02:35 AM   
Hi

If you were planning to add all 3 K80s in there as well, that's going to be pretty expensive to cool and run! (Unless you have a good electricity rate where you are :-) )
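
To put a number on "expensive to run", the sketch below estimates energy cost from wattage, hours of use and a tariff. The 300 W per K80 is the figure quoted in this thread and is the card's maximum, so the result is an upper bound; the 8 hours per day and the rate of 0.13 per kWh are hypothetical placeholders.

```python
def running_cost(watts: float, hours_per_day: float,
                 rate_per_kwh: float, days: int = 30) -> float:
    """Energy cost over `days`, assuming a constant draw of `watts`."""
    kwh = watts / 1000 * hours_per_day * days
    return kwh * rate_per_kwh


if __name__ == "__main__":
    # Three K80s at their 300 W maximum; hours and tariff are hypothetical.
    monthly = running_cost(3 * 300, hours_per_day=8, rate_per_kwh=0.13)
    print(f"about {monthly:.2f} per 30 days")
```

With those placeholders the three cards use 216 kWh over 30 days, about 28.08 in whatever currency the rate is quoted in, before counting the extra cooling fans.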

I didn't try it with a GeForce GPU / GeForce driver, but if others are saying that they have it working, then it must work. The GP100 route would still be my recommendation; I just think it will be better for you in the longer term, and if you wanted more performance at some point in the future, you could purchase a second one and run them with NVLink. The route you're going down with this setup is very limited and will be expensive to run due to the K80(s).

Assuming you have it installed, you could try uninstalling the existing GeForce driver. Make sure the RTX is in the primary PCIe slot, as that's the card the system will be using most of the time (it will be providing the visuals), and install the K80 in the second PCIe slot. Then boot the system and, if possible, install the latest NVIDIA driver with the Clean Install box checked, just to make sure it's a clean install. Since your system gets past POST and starts booting, I can't think of anything else at this point.

Try that and see how you get on.

Regards

MG

#4
Posted 05/13/2020 07:54 AM   