Can't use vGPU Manager 8.x drivers with the latest CentOS kernel
Hi,

I can't use either NVIDIA-vGPU-rhel-7.6-418.66.x86_64.rpm or NVIDIA-vGPU-rhel-7.6-418.92.x86_64.rpm with the latest CentOS kernel (kernel-3.10.0-957.27.2.el7.x86_64). weak-modules complains about symbol versions (visible after enabling "set -x" in the weak-modules script):

++ echo depmod: WARNING: /tmp/weak-modules.qqZEox/3.10.0-957.27.2.el7.x86_64/weak-updates/nvidia/nvidia-vgpu-vfio.ko disagrees about version of symbol vfio_pin_pages
++ echo depmod: WARNING: /tmp/weak-modules.qqZEox/3.10.0-957.27.2.el7.x86_64/weak-updates/nvidia/nvidia-vgpu-vfio.ko disagrees about version of symbol vfio_unpin_pages
++ echo depmod: WARNING: /tmp/weak-modules.qqZEox/3.10.0-957.27.2.el7.x86_64/weak-updates/nvidia/nvidia-vgpu-vfio.ko disagrees about version of symbol vfio_register_notifier
++ echo depmod: WARNING: /tmp/weak-modules.qqZEox/3.10.0-957.27.2.el7.x86_64/weak-updates/nvidia/nvidia-vgpu-vfio.ko disagrees about version of symbol vfio_unregister_notifier

I only debugged this with 418.92, but 418.66 must fail the same way (no symlinks were created in the /lib/modules/3.10.0-957.27.2.el7.x86_64/weak-updates/ directory).
Kernel 3.10.0-957.21.3.el7.x86_64 is OK.
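The depmod warnings presumably mean that the CRC recorded in nvidia-vgpu-vfio.ko for each vfio symbol no longer matches the CRC the 957.27.2 kernel exports for it, so weak-modules treats the module as incompatible and skips the symlink. To see which symbols changed, a minimal sketch like the one below could compare the two sets of CRCs. It assumes kmod's `modprobe --dump-modversions` prints one `0xCRC<TAB>symbol` line per versioned symbol, and that the kernel package installs its export list as `/boot/symvers-<version>.gz` (CentOS 7 kernels normally do). The script name `check_modversions.py` and its helper functions are hypothetical.

```python
"""Hypothetical helper: compare a module's recorded symbol CRCs with a kernel's.

Assumes the output/file formats described in the lead-in; not an official tool.
"""
import gzip
import subprocess
import sys


def parse_crcs(text):
    """Map symbol -> CRC from lines whose first two fields are '0xCRC' and 'symbol'.

    Both `modprobe --dump-modversions` output and symvers files are assumed to
    start each line with the CRC followed by the symbol name.
    """
    crcs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].startswith("0x"):
            continue  # skip blank or unexpected lines
        crcs[fields[1]] = int(fields[0], 16)
    return crcs


def find_mismatches(module_crcs, kernel_crcs):
    """Return (symbol, module_crc, kernel_crc) for symbols whose CRCs differ.

    Symbols the kernel does not export at all (e.g. ones provided by another
    module such as nvidia.ko) are skipped; only version disagreements count.
    """
    return [
        (sym, crc, kernel_crcs[sym])
        for sym, crc in sorted(module_crcs.items())
        if sym in kernel_crcs and kernel_crcs[sym] != crc
    ]


def main(module_path, kernel_version):
    # CRCs baked into the module's __versions section, as reported by kmod.
    dump = subprocess.run(
        ["modprobe", "--dump-modversions", module_path],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    ).stdout
    # CRCs exported by the target kernel (path layout is an assumption).
    with gzip.open("/boot/symvers-{}.gz".format(kernel_version), "rt") as f:
        symvers = f.read()
    for sym, mod_crc, kern_crc in find_mismatches(parse_crcs(dump), parse_crcs(symvers)):
        print("{}: module 0x{:08x}, kernel 0x{:08x}".format(sym, mod_crc, kern_crc))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: check_modversions.py MODULE.ko KERNEL_VERSION", file=sys.stderr)
```

Running it once against 3.10.0-957.21.3.el7.x86_64 and once against 3.10.0-957.27.2.el7.x86_64, with the installed module's path (for example as reported by `modinfo -n nvidia-vgpu-vfio` while booted into the working kernel), should show whether only the four vfio symbols from the log changed between the two kernels.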
My hardware is:
18:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)
3b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)
86:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)
af:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)

What can I do (besides booting an older kernel)?

thx
matthias

#1
Posted 08/16/2019 05:28 PM   