NVIDIA GPU — Linux Admin Handbook

Overview

RHEL 10 supports NVIDIA GPUs through NVIDIA's official repositories and the RHEL Extensions repository. The recommended approach is to use Red Hat's precompiled open kernel modules from Extensions.

# Prerequisites: enable CodeReady Builder, Extensions, and EPEL
sudo subscription-manager repos --enable=codeready-builder-for-rhel-10-$(arch)-rpms
sudo subscription-manager repos --enable=rhel-10-for-$(arch)-extensions-rpms
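# epel-release is not shipped in the RHEL repositories; install it from the EPEL project
sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm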

# Add NVIDIA CUDA repository (x86_64 or aarch64)
# On aarch64 the repository directory is named sbsa/ rather than aarch64/
CUDA_ARCH="$(uname -m)"
[ "$CUDA_ARCH" = "aarch64" ] && CUDA_ARCH="sbsa"
sudo dnf config-manager --add-repo "https://developer.download.nvidia.com/compute/cuda/repos/rhel10/${CUDA_ARCH}/cuda-rhel10.repo"

# Install NVIDIA driver with precompiled open kernel module (recommended)
sudo dnf install -y kmod-nvidia-open nvidia-driver-cuda

# Alternative: DKMS kernel modules (if precompiled not available for your kernel)
# kmod-nvidia-open-latest-dkms — open kernel module (DKMS)
# kmod-nvidia-latest-dkms      — proprietary closed-source (DKMS)

# Load kernel module and verify
sudo modprobe nvidia
sudo nvidia-smi
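
The architecture mapping used when adding the repository can be wrapped in a small helper so that an unexpected architecture fails loudly instead of producing a dead URL. This is a minimal sketch: the function names `cuda_repo_arch` and `cuda_repo_url` are hypothetical, and only the two mappings mentioned above (x86_64 and aarch64) are assumed to exist.

```shell
# Map the output of `uname -m` to the directory name used under
# .../cuda/repos/rhel10/. Only the two mappings described above are handled.
cuda_repo_arch() {
    case "$1" in
        x86_64)  echo "x86_64" ;;
        aarch64) echo "sbsa" ;;
        *)       echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Build the .repo URL for a given architecture (hypothetical helper).
cuda_repo_url() {
    local arch
    arch="$(cuda_repo_arch "$1")" || return 1
    echo "https://developer.download.nvidia.com/compute/cuda/repos/rhel10/${arch}/cuda-rhel10.repo"
}
```

It could then be used as: sudo dnf config-manager --add-repo "$(cuda_repo_url "$(uname -m)")"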

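# Add NVIDIA CUDA repository (skip if it was already added during driver installation;
# on aarch64, replace $(arch) with sbsa)
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel10/$(arch)/cuda-rhel10.repo
sudo dnf clean all

# Install CUDA toolkit
sudo dnf install -y cuda-toolkit

# Set environment
# The ${VAR:+...} form avoids a trailing colon (an empty entry, meaning the
# current directory) when LD_LIBRARY_PATH is not already set
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}' >> ~/.bashrc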
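
Re-running the environment setup appends the same export lines to ~/.bashrc each time. One way to keep it idempotent is sketched below; `append_once` is a hypothetical helper name, not part of any NVIDIA or Red Hat tooling.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -x matches whole lines, -F treats the text literally (no regex)
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example usage (commented out so that sourcing this file changes nothing):
# append_once 'export PATH=/usr/local/cuda/bin:$PATH' ~/.bashrc
```

After opening a new shell, nvcc --version should report the installed toolkit release.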

Persistence Mode

# Enable NVIDIA Persistence Daemon (keeps the driver initialized while no clients are attached, so the first GPU call starts faster)
sudo systemctl enable --now nvidia-persistenced

# Verify
systemctl status nvidia-persistenced
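
The service status shows that the daemon is running, not that every GPU is actually in persistence mode. A per-GPU check can be sketched by filtering the output of nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader, which is assumed here to print lines such as "0, Enabled". The function name `gpus_without_persistence` is hypothetical.

```shell
# Read "index, persistence_mode" lines on stdin and print the index of every
# GPU whose mode is not "Enabled". Exits non-zero if any such GPU is found.
gpus_without_persistence() {
    awk -F', *' '$2 != "Enabled" { print $1; bad = 1 } END { exit bad + 0 }'
}

# Example usage on a host with the driver loaded (commented out):
# nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader | gpus_without_persistence
```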

GPU Passthrough to KVM VMs

# 1. Identify GPU PCI address and vendor:device IDs (-nn prints the numeric IDs in brackets)
lspci -nn | grep -i nvidia
# Example: 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation ... [10de:2231]

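# 2. Bind GPU to vfio-pci
# Replace the example IDs with your GPU's vendor:device IDs as shown by lspci -nn;
# include the GPU's audio function so the whole device is bound
echo 'options vfio-pci ids=10de:2231,10de:1aef' | sudo tee /etc/modprobe.d/vfio-pci.conf
sudo grubby --update-kernel=ALL --args='rd.driver.blacklist=nvidia modprobe.blacklist=nvidia vfio-pci.ids=10de:2231,10de:1aef'
# The IOMMU must also be enabled in firmware; Intel hosts typically need intel_iommu=on in the kernel arguments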

# 3. Reboot, then verify that vfio-pci claimed the device
sudo dmesg | grep vfio-pci
lspci -nnk -s 01:00.0   # expect "Kernel driver in use: vfio-pci"

# 4. Attach to VM via virsh or virt-manager
# In XML: <hostdev mode='subsystem' type='pci' managed='yes'>
#   <source><address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/></source>
# </hostdev>
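
Steps 2 and 4 both involve copying values by hand: the vendor:device IDs for vfio-pci and the PCI address for the libvirt XML. The sketch below derives both from command-line input. The function names `nvidia_pci_ids` and `hostdev_xml` are hypothetical, and the input is assumed to be in the format printed by lspci -nn.

```shell
# Read `lspci -nn` output on stdin and print the NVIDIA (vendor 10de)
# vendor:device IDs as a sorted, comma-separated list for vfio-pci ids=...
nvidia_pci_ids() {
    grep -oiE '\[10de:[0-9a-f]{4}\]' | tr -d '[]' | tr 'A-F' 'a-f' | sort -u | paste -sd, -
}

# Print a libvirt <hostdev> element for a PCI address given as
# bus:slot.function (for example 01:00.0) or domain:bus:slot.function.
hostdev_xml() {
    local addr="$1" domain rest bus slotfn slot fn
    # Prepend the default PCI domain when only bus:slot.function is given.
    case "$addr" in
        *:*:*) ;;
        *) addr="0000:$addr" ;;
    esac
    domain="${addr%%:*}"
    rest="${addr#*:}"
    bus="${rest%%:*}"
    slotfn="${rest#*:}"
    slot="${slotfn%%.*}"
    fn="${slotfn#*.}"
    cat <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x${domain}' bus='0x${bus}' slot='0x${slot}' function='0x${fn}'/>
  </source>
</hostdev>
EOF
}
```

For example, lspci -nn | nvidia_pci_ids prints the list to use in step 2, and hostdev_xml 01:00.0 > gpu.xml followed by sudo virsh attach-device myvm gpu.xml --config attaches the device, where myvm is a placeholder VM name.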

GPU in Containers

# Install nvidia-container-toolkit
sudo dnf install -y nvidia-container-toolkit

# Generate CDI device definitions
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Podman reads CDI specs from /etc/cdi and /var/run/cdi by default, so no configuration change is needed
# Confirm that the generated devices are visible
nvidia-ctk cdi list

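# Run container with all GPUs (label=disable avoids SELinux denials on the GPU device nodes)
podman run --rm --device=nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi

# Specific GPU
podman run --rm --device=nvidia.com/gpu=0 --security-opt=label=disable ubuntu nvidia-smi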
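
To pass a subset of GPUs rather than one or all, the --device flag can be repeated once per CDI device name. The helper below is a small sketch of that pattern; `gpu_device_flags` is a hypothetical name, and the indices are assumed to match the device names in the generated CDI spec.

```shell
# Print one --device flag per GPU index given as an argument.
gpu_device_flags() {
    local flags="" gpu
    for gpu in "$@"; do
        flags="${flags}${flags:+ }--device=nvidia.com/gpu=${gpu}"
    done
    printf '%s\n' "$flags"
}

# Example usage (commented out; the unquoted expansion is deliberate so that
# the shell splits the output into separate flags):
# podman run --rm $(gpu_device_flags 0 1) ubuntu nvidia-smi
```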
