This repository was archived by the owner on Oct 27, 2023. It is now read-only.

Running nvidia-container-runtime with podman is blowing up. #85

@rhatdan

Description

  1. Issue or feature description
    Rootless and rootful podman do not work with the nvidia plugin.

  2. Steps to reproduce the issue
    Install the nvidia plugin and configure it to run with podman.
    Execute the podman command and check whether the devices are configured correctly.

  3. Information to attach (optional if deemed irrelevant)

    Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
    Kernel version from uname -a
    Fedora 30 and later
    Any relevant kernel output lines from dmesg
    Driver information from nvidia-smi -a
    Docker version from docker version
    NVIDIA packages version from dpkg -l 'nvidia' or rpm -qa 'nvidia'
    NVIDIA container library version from nvidia-container-cli -V
    NVIDIA container library logs (see troubleshooting)
    Docker command, image and tag used

I am reporting this on behalf of other users who hit the problem. This is what they reported:

We discovered that the Ubuntu 18.04 machine needed a configuration change to get rootless podman working with nvidia:
"no-cgroups = true" was set in /etc/nvidia-container-runtime/config.toml
Unfortunately this config change did not work on CentOS 7, but it did change the rootless error to:
nvidia-container-cli: initialization error: cuda error: unknown error
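For reference, the workaround described above corresponds to the following fragment of /etc/nvidia-container-runtime/config.toml. This is a sketch of just the relevant section; the rest of the file is left at its defaults, and the exact set of surrounding keys may differ between package versions:

```toml
# /etc/nvidia-container-runtime/config.toml (excerpt)
[nvidia-container-cli]
# Required for rootless podman: the unprivileged user cannot
# manage cgroups, so device cgroup setup must be skipped.
no-cgroups = true
```

Note that, as described below, setting this globally affects rootful podman as well, since both read the same config file.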

This config change breaks podman running from root, with the error:
Failed to initialize NVML: Unknown Error

Interestingly, root on Ubuntu gets the same error even though rootless works.
