
k-candidate commented Sep 27, 2025

Summary

Update the Amazon Linux 2023 (AL2023) GPU AMI build script to use the Container Device Interface (CDI) for NVIDIA GPU support and to rely on native Docker runtime registration.
This should resolve #483.
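The following is a hypothetical sketch, not the code in this PR, of what "CDI plus native Docker runtime registration" can look like at the level of Docker's `/etc/docker/daemon.json`. It assumes Docker Engine 25 or later, where CDI is enabled through the `features.cdi` flag. It also assumes a runtime named `nvidia` that points at `nvidia-container-runtime`, roughly what `nvidia-ctk runtime configure --runtime=docker` writes. The helper name `enable_nvidia_cdi` is made up for illustration.

```python
import json


def enable_nvidia_cdi(config: dict) -> dict:
    """Return a copy of a daemon.json-style dict with an 'nvidia' runtime
    registered and Docker's CDI feature flag turned on.

    Hypothetical helper: the runtime name and binary path are assumptions.
    """
    # Deep-copy via a JSON round trip so the caller's dict is left untouched.
    merged = json.loads(json.dumps(config))

    # Register the NVIDIA runtime directly with Docker (no oci-add-hooks wrapper).
    runtimes = merged.setdefault("runtimes", {})
    runtimes["nvidia"] = {"path": "nvidia-container-runtime", "args": []}

    # Enable CDI device requests (assumes Docker Engine 25+).
    features = merged.setdefault("features", {})
    features["cdi"] = True
    return merged


if __name__ == "__main__":
    existing = {"log-driver": "json-file"}
    print(json.dumps(enable_nvidia_cdi(existing), indent=2, sort_keys=True))
```

On a real host, the CDI spec itself would still need to be generated (for example with `nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml`). A container could then request GPUs with `docker run --device nvidia.com/gpu=all ...`.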

Implementation details

Testing

New tests cover the changes: N/A

Description for the changelog

Switch AL2023 GPU AMI to CDI-based NVIDIA runtime registration

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

k-candidate requested a review from a team as a code owner on September 27, 2025 22:08
k-candidate changed the title from "Cdi" to "Switch AL2023 GPU AMI from legacy OCI hooks to CDI" on Sep 27, 2025
harishxr (Contributor) commented Oct 1, 2025

Hi @k-candidate,

Thanks for submitting this PR, we are tracking this issue internally and will pick this up shortly.

k-candidate (Author) commented

@harishxr, version 1.18.0 of nvidia-container-toolkit is out: https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.18.0. It takes care of CDI.

k-candidate closed this on Nov 8, 2025
sparrc (Contributor) commented Nov 10, 2025

Hi @k-candidate, why have you closed this PR? Wouldn't we still need to remove oci-add-hooks in the case that we upgrade to nvidia-container-toolkit 1.18?

k-candidate reopened this on Dec 2, 2025
k-candidate (Author) commented

> Hi @k-candidate, why have you closed this PR? Wouldn't we still need to remove oci-add-hooks in the case that we upgrade to nvidia-container-toolkit 1.18?

Hi @sparrc,
I interpreted harishxr's comment ("we are tracking this issue internally and will pick this up shortly") to mean that he would handle it independently of this PR.
I have reopened this PR.
Please let me know if you need anything else from my side.

Thank you.

k-candidate (Author) commented

Per #483 (comment), the ECS team is waiting for the AL2023 team to act.
@danehlim, I have opened an issue with the AL2023 team to expedite this change: see amazonlinux/amazon-linux-2023#1044.
Given that this PR has been open for three months, one alternative is to stop depending on the AL2023 team: I can install nvidia-container-toolkit 1.18.1 directly in this PR. What do you think?


Development

Successfully merging this pull request may close these issues.

Use CDI mode for the AL2023 GPU ECS-optimized AMI

3 participants