[Bug]: #1743

@NASANAUT

Description

Describe the bug

CUDA initialization fails only inside containers, while the same host uses NVENC successfully outside of them.

The key symptoms are:

  • nvidia-smi works inside the container
  • FFmpeg inside the container shows h264_nvenc and cuda
  • but actual CUDA initialization fails with:
cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error

I first noticed this with Plex in Docker, but I can reproduce the same failure with a plain FFmpeg container, so this does not appear to be Plex-specific.
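One extra narrowing step that may be useful (a minimal sketch, assuming the same container image): FFmpeg's generic `-init_hw_device` option makes it call `cuInit()` before any encoder is opened, which separates CUDA context creation from anything NVENC-specific. The device name `cu` here is arbitrary.

```shell
# Probe CUDA context creation alone: -init_hw_device makes ffmpeg call
# cuInit() up front, so NVENC is taken out of the picture entirely.
# Prints a single status line either way.
probe_cuda_init() {
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg not found"
  elif ffmpeg -hide_banner -init_hw_device cuda=cu:0 \
         -f lavfi -i color=black:size=64x64 -frames:v 1 -f null - >/dev/null 2>&1; then
    echo "cuda-init: ok"
  else
    echo "cuda-init: failed"
  fi
}

probe_cuda_init
```

In my container this reports `cuda-init: failed`, consistent with the `cuInit(0)` error above.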


To Reproduce

  1. Verify the host GPU works:
nvidia-smi -L
  2. Verify host FFmpeg NVENC works:
ffmpeg -hide_banner -encoders | grep nvenc
ffmpeg -hide_banner -decoders | grep cuvid
ffmpeg -hide_banner -hwaccels
ffmpeg -hide_banner -f lavfi -i testsrc=size=1280x720:rate=30 -t 5 -c:v h264_nvenc -f null -

On the host, this completes successfully.

  3. Verify basic GPU visibility in containers:
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
  4. Reproduce the failure in a plain FFmpeg container:
docker run --rm \
  --runtime=nvidia \
  --gpus all \
  --entrypoint /bin/bash \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
  linuxserver/ffmpeg:latest \
  -lc '
    echo "=== NVIDIA-SMI ==="
    nvidia-smi || true
    echo
    echo "=== ENCODERS (nvenc) ==="
    ffmpeg -hide_banner -encoders | grep nvenc || true
    echo
    echo "=== HWACCELS ==="
    ffmpeg -hide_banner -hwaccels || true
    echo
    echo "=== NVENC TEST ==="
    ffmpeg -hide_banner -f lavfi -i testsrc=size=1280x720:rate=30 -t 5 -c:v h264_nvenc -f null - || true
  '
  5. Observe the failure:
[h264_nvenc @ ...] dl_fn->cuda_dl->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error
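For completeness, a device-node check inside the failing container may help pinpoint the cause. In my understanding, a missing /dev/nvidia-uvm is a classic way to get CUDA_ERROR_UNKNOWN while nvidia-smi still works, since nvidia-smi only needs /dev/nvidiactl and the per-GPU node. A sketch (the node list assumes a single-GPU, typical driver install):

```shell
# List the device nodes the CUDA driver stack commonly expects; a missing
# /dev/nvidia-uvm inside the container is a frequent cause of
# cuInit() -> CUDA_ERROR_UNKNOWN even when nvidia-smi works.
check_nvidia_nodes() {
  for node in /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools; do
    if [ -e "$node" ]; then
      echo "present: $node"
    else
      echo "missing: $node"
    fi
  done
}

check_nvidia_nodes
```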

Expected behavior

I expect CUDA initialization to succeed in the container, just like it does on the host.

Specifically, I expect the FFmpeg NVENC test inside the container to complete successfully instead of failing at cuInit(0).


Environment (please provide the following information):

  • nvidia-container-toolkit version: 1.19.0
  • NVIDIA Driver Version: 590.48.01
  • Host OS: Arch Linux
  • Kernel Version: 6.14.0-arch1-1
  • Container Runtime Version: Docker with nvidia-container-runtime configured
  • CPU Architecture: x86_64
  • GPU Model(s): NVIDIA RTX 2000 Ada Generation (AD107GL)
  • CUDA Version: 13.1 (as reported by nvidia-smi)

Not applicable:

  • Kubernetes Distro and Version
  • NVIDIA GPU Operator version

Additional context

I originally saw this with Plex in Docker. Plex first failed with Cannot load libcuda.so.1, but after simplifying the setup and using the NVIDIA libraries already present in /usr/lib inside the container, Plex can now load libcuda.so.1.

However, Plex then fails at the same point as the plain FFmpeg container:

DEBUG - [GPU] Got device: NVIDIA AD107GL [RTX 2000 / 2000E Ada Generation], nvidia@unknown, default true, best true, ID 10de:28b0:10de:1870@0000:15:00.0, DevID [10de:28b0:10de:1870], flags 0xe8
ERROR - [GPU] Cuda function failed with error unknown error
DEBUG - [Req#e0/Transcode] [FFMPEG] - Loaded lib: libcuda.so.1
ERROR - [Req#e0/Transcode] [FFMPEG] -  -> CUDA_ERROR_UNKNOWN: unknown error

So the important point is that this is reproducible outside Plex as well.
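To rule out a linker-cache problem (the failure mode behind Plex's earlier "Cannot load libcuda.so.1" error), the container's dynamic-linker cache can be queried directly. A minimal sketch:

```shell
# Distinguish "libcuda.so.1 is not resolvable" (Plex's first error) from
# "libcuda.so.1 loads but cuInit() fails" (the current error) by asking
# the dynamic linker cache.
check_libcuda() {
  if ldconfig -p 2>/dev/null | grep -q 'libcuda\.so\.1'; then
    echo "libcuda.so.1: in linker cache"
  else
    echo "libcuda.so.1: not in linker cache"
  fi
}

check_libcuda
```

In my case the library is in the cache and loads fine, so the problem is past the dynamic-linking stage.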

What currently works:

  • host nvidia-smi
  • host FFmpeg NVENC
  • container nvidia-smi
  • container reports h264_nvenc
  • container reports cuda in ffmpeg -hwaccels

What fails:

  • container FFmpeg actual CUDA initialization (cuInit(0))
  • Plex hardware transcoding in the container for the same reason
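One more host-side datapoint worth attaching: whether the nvidia_uvm kernel module is loaded, since (as far as I understand) the /dev/nvidia-uvm node that cuInit() relies on only exists once that module is up. A sketch, to be run on the host:

```shell
# Host-side check: cuInit() depends on the nvidia-uvm kernel module;
# if it is not loaded, containers can end up with a /dev that lets
# nvidia-smi work while CUDA fails with CUDA_ERROR_UNKNOWN.
check_nvidia_uvm() {
  if lsmod 2>/dev/null | grep -q '^nvidia_uvm'; then
    echo "nvidia_uvm: loaded"
  else
    echo "nvidia_uvm: not loaded"
  fi
}

check_nvidia_uvm
```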

Information to attach

Output of nvidia-smi

Host:

GPU 0: NVIDIA RTX 2000 Ada Generation (UUID: GPU-8de1bc21-ab8b-71ba-c6af-9f5895aaabaf)

Container:

Tue Mar 24 08:33:14 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.48.01              Driver Version: 590.48.01      CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
|   0  NVIDIA RTX 2000 Ada Gene...    Off |   00000000:15:00.0 Off |                  Off |
+-----------------------------------------------------------------------------------------+

Container logs

FFmpeg container output:

=== NVIDIA-SMI ===
Tue Mar 24 08:33:14 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.48.01              Driver Version: 590.48.01      CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
|   0  NVIDIA RTX 2000 Ada Gene...    Off |   00000000:15:00.0 Off |                  Off |
+-----------------------------------------------------------------------------------------+

=== ENCODERS (nvenc) ===
 V....D av1_nvenc
 V....D h264_nvenc
 V....D hevc_nvenc

=== HWACCELS ===
Hardware acceleration methods:
vdpau
cuda
vaapi
qsv
drm
opencl
vulkan

=== NVENC TEST ===
[h264_nvenc @ 0x55ceb1999d40] dl_fn->cuda_dl->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error
[vost#0:0/h264_nvenc @ 0x55ceb1999740] [enc:h264_nvenc @ 0x55ceb1999cc0] Error while opening encoder - maybe incorrect parameters such as bit_rate, rate, width or height.
[out#0/null @ 0x55ceb19993c0] Nothing was written into output file, because at least one of its streams received no packets.
Conversion failed!

Additional diagnostic information

nvidia-ctk / CDI:

NVIDIA Container Toolkit CLI version 1.19.0

INFO[0000] Found 3 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=GPU-8de1bc21-ab8b-71ba-c6af-9f5895aaabaf
nvidia.com/gpu=all

Docker runtime info:

Runtimes: io.containerd.runc.v2 nvidia runc
Default Runtime: runc

/etc/docker/daemon.json:

{
    "data-root": "/srv/docker",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

/etc/nvidia-container-runtime/config.toml relevant parts:

disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"

[nvidia-container-cli]
environment = []
ldconfig = "@/sbin/ldconfig"
load-kmods = true

[nvidia-container-runtime]
log-level = "info"
mode = "auto"
runtimes = ["runc", "crun"]

[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]

Metadata

Labels: bug, needs-triage