PSODs when using grid K1 cards in HP Proliant DL585 G7
Hello,

I'm experiencing severe issues when trying to use an NVIDIA GRID K1 card in an HP ProLiant DL585 G7 server. After a few seconds the system ends in a PSOD.

Has anyone experienced similar issues?

Unfortunately it is not possible to upload pictures in this forum, so I have to find a way to share my screenshots.


With best regards

Augustinus

#1
Posted 04/07/2014 01:56 PM   
That server is not certified for the GRID cards; see this site for the current list of certified platforms: http://www.nvidia.com/buygrid


My assumption is that either heat or insufficient power is causing your PSOD, but it could also be a BIOS issue. It is critical to use a certified server.

There is an attachment option as well as the ability to insert images; check the WYSIWYG menu at the top of the comment window.

Regards,

Luke Wignall
Performance Engineering Manager
NVIDIA | Worldwide Sales - GRID Computing
http://www.linkedin.com/in/lukewignall/
https://twitter.com/lwignall

#2
Posted 04/07/2014 02:47 PM   
Hello Luke,

Thanks a lot for your response.

I guess it is a BIOS or driver issue (or both), not a power problem. The system is certified to host NVIDIA GPU accelerator cards with about the same power consumption (four of them) and is equipped with four 1200 W power supplies.

The problem is that I have to buy rack servers (ProLiant DL series) and cannot buy blades or SL-series servers. And the DL380 does not offer enough free PCI slots once the GRID cards are installed.

If I understand the picture function correctly, I first need to find a platform where I can publicly upload my image and then post the URL in img tags.

Even if there is no technical solution: if there is a roadmap behind the certification of these systems, I kindly ask that four-socket rack servers also be taken into account.

Thank you very much.

With kind regards

Andreas

#3
Posted 04/16/2014 09:59 AM   
Working on a roadmap as we sort out the needs. We are certainly aware of the need for broader server certification, but that is driven by the server OEMs, so letting them know what you want will help all of us. But the DL585 is AMD-based, which we do not support. For this to work you will need to choose an Intel platform from the HCL (use the link I sent before).

I honestly have never tried to upload an image, so let me give it a try:
[Image]

Regards,

Luke Wignall
Performance Engineering Manager
NVIDIA | Worldwide Sales - GRID Computing
http://www.linkedin.com/in/lukewignall/
https://twitter.com/lwignall

#4
Posted 04/16/2014 08:27 PM   
Well, that picture did not load correctly...

Regards,

Luke Wignall
Performance Engineering Manager
NVIDIA | Worldwide Sales - GRID Computing
http://www.linkedin.com/in/lukewignall/
https://twitter.com/lwignall

#5
Posted 04/16/2014 08:28 PM   
Dear Luke,

yes, the server certification list is a little sparse, especially when it comes to the big server vendors that are usually found in large enterprises.

My problem is that the certifications seem to be focused on blade systems and small 2U servers. But if I look at such a system, I end up with 6-8 PCI slots (usually only 2-4 usable at full length), and if I have to add some 10 GbE cards, FC HBAs, the GRID card, and maybe an APEX accelerator, it gets really dense. Not to forget PCIe SSDs in the future.
Even if you are not responsible for server certifications, maybe you can hint to your partners that, for flexibility, some customers also demand certification of the bigger systems. A DL58x-series server comes with 12 PCI slots (4 of them PEG), at least 10 usable at full length.

What I don't understand is why you don't support servers running AMD Opteron CPUs. I mean: what does the CPU have to do with your graphics card? You write a driver which runs on top of the ESXi OS; the CPU should not matter to you. Even if you compete with them in some areas, there is no competition between NVIDIA and AMD in the area of x86-64 server CPUs.
And Intel is becoming your competitor as well.

I mean: you can manage to get your GeForce cards running on every low-cost mainboard with an overnight-written BIOS, with any CPU you can imagine, on several different operating systems. But you can't get your GRID card running on a very limited subset of these?

With best regards

Andreas

#6
Posted 04/25/2014 08:56 AM   
We do not certify the servers; the OEMs do. So while we push to speed this up and broaden the available platforms, it is the manufacturers that decide on certification. We are actively working with HP and the rest to encourage them to bring more platforms onto the list. Keep in mind the list is growing at a very quick rate: we have gone from one Dell server approximately a year ago to the long list today. As for the focus on 2U and blade systems, this is also driven by the OEMs, and I assume it is based on their sales in the VDI space. I know of one other customer using DL58x servers for VDI, and while I am sure there are others off my radar, the list of those demanding 2U and blades is substantial. I would recommend letting your HP rep know and adding to any pressure on them to add this platform.

Regards,

Luke Wignall
Performance Engineering Manager
NVIDIA | Worldwide Sales - GRID Computing
http://www.linkedin.com/in/lukewignall/
https://twitter.com/lwignall

#7
Posted 04/26/2014 04:15 PM   