Epic: Support RPM-Based Distributions (Rocky Linux, Fedora, RHEL)
Summary
Extend Holodeck's support to include RPM-based Linux distributions such as Rocky Linux, Fedora, and RHEL. This involves adapting all provisioning templates to use DNF/YUM package managers and ensuring compatibility with RPM-based system conventions.
Motivation
Current Holodeck templates are Debian/Ubuntu-focused, using apt-get and Debian-specific paths. Many enterprise environments use RPM-based distributions:
- Rocky Linux: Free RHEL-compatible enterprise Linux
- Fedora: Cutting-edge features, good for testing newer software
- RHEL/CentOS Stream: Enterprise deployments
- Amazon Linux 2023: AWS-optimized, uses DNF
Supporting RPM distributions enables:
- Enterprise environment testing
- AWS-native Amazon Linux support
- Broader user base coverage
- Testing across different package ecosystems
Scope
Target Distributions
| Distribution | Priority | Package Manager | Notes |
| --- | --- | --- | --- |
| Rocky Linux 9 | High | DNF | RHEL 9 compatible |
| Rocky Linux 8 | Medium | DNF | RHEL 8 compatible |
| Fedora 40+ | Medium | DNF | Latest features |
| Amazon Linux 2023 | High | DNF | AWS default |
| RHEL 9 | Low | DNF | License required |
Components to Adapt
- Common functions and utilities
- NVIDIA Driver installation
- Container runtime installation (containerd, Docker, CRI-O)
- NVIDIA Container Toolkit installation
- Kubernetes installation (kubeadm)
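The NVIDIA driver component is one place where distribution-specific paths are likely to matter, because NVIDIA's CUDA package repositories are organized by distribution directory (for example rhel9 or fedora39). The following is a minimal sketch of such a mapping, not Holodeck's implementation. The function names `cuda_repo_id` and `cuda_repo_url` are hypothetical, the `amzn` mapping is an assumption, and not every Fedora release has a matching repository directory.

```shell
# Hypothetical helper: map the ID and VERSION_ID fields of /etc/os-release
# to the distribution directory used in NVIDIA's CUDA repository URLs.
cuda_repo_id() {
  local id="$1" version="$2"
  local major="${version%%.*}"   # "9.4" -> "9"
  case "$id" in
    rhel|rocky) echo "rhel${major}" ;;    # Rocky reuses the RHEL repository
    fedora)     echo "fedora${major}" ;;  # a given release may not be published
    amzn)       echo "amzn${major}" ;;    # assumption: e.g. amzn2023
    *)          echo "unsupported distribution: $id" >&2; return 1 ;;
  esac
}

# Build the .repo file URL for a distribution (architecture defaults to x86_64).
cuda_repo_url() {
  local repo_id arch="${3:-x86_64}"
  repo_id="$(cuda_repo_id "$1" "$2")" || return 1
  echo "https://developer.download.nvidia.com/compute/cuda/repos/${repo_id}/${arch}/cuda-${repo_id}.repo"
}
```

A driver template could then pass the result of `cuda_repo_url rocky 9.4` to `dnf config-manager --add-repo` before installing the driver packages.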
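Subtasks
Phase 1: Infrastructure
- Implement OS family detection
- Implement template variant selection
Phase 2: Common Functions (RPM)
Phase 3: NVIDIA Driver (RPM)
- Implement RPM driver template
- Handle distribution-specific paths: rhel9 (RHEL 9), rhel9 (compatible distributions such as Rocky Linux 9), fedora39 or similar (Fedora)
Phase 4: Container Runtime - containerd (RPM)
Phase 5: Container Runtime - Docker (RPM)
Phase 6: Container Runtime - CRI-O (RPM)
Phase 7: NVIDIA Container Toolkit (RPM)
Phase 8: Kubernetes (RPM)
- Implement RPM kubeadm template
- Handle SELinux considerations
Phase 9: Firewall Configuration
Phase 10: Testing
- Create RPM-based test environments
- Integration tests per distribution
- E2E tests
Phase 11: Documentation
- Update prerequisites
- Create distribution-specific guides
- Update examples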
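Phase 1 (Infrastructure) needs some way to tell RPM-based hosts from Debian-based ones before a template can be chosen. One common approach, sketched here as an assumption and not as Holodeck's actual logic, is to classify the host from the ID and ID_LIKE fields of /etc/os-release. The function names are hypothetical.

```shell
# Classify a host from the ID and ID_LIKE fields of /etc/os-release.
# Prints "rpm", "deb", or "unknown".
os_family() {
  local id="$1" id_like="$2" token
  # ID_LIKE is a space-separated list, so word splitting here is intentional.
  for token in $id $id_like; do
    case "$token" in
      rhel|fedora|centos|rocky|amzn) echo "rpm"; return 0 ;;
      debian|ubuntu)                 echo "deb"; return 0 ;;
    esac
  done
  echo "unknown"
}

# Convenience wrapper that reads the file on the target host.
detect_os_family() {
  local file="${1:-/etc/os-release}"
  # Source in a subshell so os-release variables do not leak into the caller.
  ( . "$file" && os_family "${ID:-}" "${ID_LIKE:-}" )
}
```

Template variant selection could then branch on the printed family, for example choosing an RPM or a Debian variant of each provisioning script.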
Example Configurations
Rocky Linux 9
apiVersion: holodeck.nvidia.com/v1alpha1
kind: Environment
metadata:
  name: rocky-gpu
spec:
  provider: aws
  auth:
    keyName: my-key
    publicKey: ~/.ssh/id_rsa.pub
    privateKey: ~/.ssh/id_rsa
  instance:
    type: g4dn.xlarge
    region: us-west-2
    os: rocky-9
  nvidiaDriver:
    install: true
  containerRuntime:
    install: true
    name: containerd
  nvidiaContainerToolkit:
    install: true
    enableCDI: true
  kubernetes:
    install: true
    installer: kubeadm
    version: v1.31.1
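With the Kubernetes version set to v1.31.1 as in this example, an RPM-based kubeadm template would need a package repository for the matching minor release. The sketch below generates a repo definition in the layout that the upstream Kubernetes installation guide describes for pkgs.k8s.io. The function name `k8s_repo_file` is hypothetical, and the way Holodeck would wire this in is an assumption.

```shell
# Print a dnf/yum repo definition for the Kubernetes packages matching a
# version such as "v1.31.1". pkgs.k8s.io publishes one repo per minor release.
k8s_repo_file() {
  local version="$1" minor
  minor="$(echo "$version" | cut -d. -f1,2)"   # "v1.31.1" -> "v1.31"
  cat <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/${minor}/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/${minor}/rpm/repodata/repomd.xml.key
EOF
}
```

On a target host the output would typically be written to /etc/yum.repos.d/kubernetes.repo, followed by `dnf install -y kubelet kubeadm kubectl`. The GPG key is referenced from the repo file itself, so no separate key import step is needed.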
Amazon Linux 2023
apiVersion: holodeck.nvidia.com/v1alpha1
kind: Environment
metadata:
  name: al2023-gpu
spec:
  provider: aws
  auth:
    keyName: my-key
    publicKey: ~/.ssh/id_rsa.pub
    privateKey: ~/.ssh/id_rsa
  instance:
    type: g4dn.xlarge
    region: us-west-2
    os: amazon-linux-2023
  nvidiaDriver:
    install: true
  containerRuntime:
    install: true
    name: containerd
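Both examples select the image through the os field (rocky-9, amazon-linux-2023) and give no SSH username, so the provisioner would have to derive one from that value. The lookup below is a rough sketch: the function name is hypothetical, os identifiers other than rocky-9 and amazon-linux-2023 are invented for illustration, and the usernames are the common defaults of each distribution's cloud images, which can differ per AMI publisher.

```shell
# Hypothetical lookup from a Holodeck os value to the default SSH login user.
# Usernames are assumptions based on common cloud image defaults.
default_ssh_user() {
  case "$1" in
    rocky-*)        echo "rocky" ;;
    fedora-*)       echo "fedora" ;;
    amazon-linux-*) echo "ec2-user" ;;
    rhel-*)         echo "ec2-user" ;;
    ubuntu-*)       echo "ubuntu" ;;
    *)              echo "unknown os: $1" >&2; return 1 ;;
  esac
}
```

Returning a non-zero status for unknown values lets the caller fail early instead of attempting an SSH connection with a guessed user.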
Technical Considerations
Package Manager Differences
| Feature | APT (Debian) | DNF (RHEL) |
| --- | --- | --- |
| Update cache | apt-get update | dnf makecache |
| Install | apt-get install -y | dnf install -y |
| Add repo | add-apt-repository | dnf config-manager --add-repo |
| GPG keys | apt-key add | Built into repo file |
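One way to keep templates readable across both ecosystems is a thin wrapper that translates an abstract action into the family-specific command from the table. This is a sketch under that assumption. The name `pkg_cmd` and the action names are hypothetical, and the function only prints the command so it has no side effects.

```shell
# Print the command for an abstract package action on a given OS family
# ("deb" or "rpm"). A real template would execute the command instead.
pkg_cmd() {
  local family="$1" action="$2"
  shift 2
  case "$family:$action" in
    deb:refresh)  echo "apt-get update" ;;
    rpm:refresh)  echo "dnf makecache" ;;
    deb:install)  echo "apt-get install -y $*" ;;
    rpm:install)  echo "dnf install -y $*" ;;
    deb:add-repo) echo "add-apt-repository $*" ;;
    rpm:add-repo) echo "dnf config-manager --add-repo $*" ;;
    *) echo "unsupported: $family $action" >&2; return 1 ;;
  esac
}
```

For example, `pkg_cmd rpm install containerd.io` prints `dnf install -y containerd.io`. Note that DNF 5, used by recent Fedora releases, changed the config-manager syntax, so the add-repo mapping may need a per-distribution variant.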
SELinux
- Enabled by default on RHEL, Rocky Linux, and Fedora
- May need to be set to permissive for some components
- Long-term: Proper SELinux policies for GPU workloads
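As a short-term measure, switching to permissive mode usually involves two steps: `setenforce 0` for the running system and an edit to the SELINUX= line of /etc/selinux/config so the change survives a reboot. The sketch below covers the persistent part. The function name is hypothetical, and the file path is a parameter so the logic can be tried on a copy first.

```shell
# Rewrite the SELINUX= line of an SELinux config file to the given mode.
# Lines such as SELINUXTYPE= and comments are left untouched.
selinux_set_config_mode() {
  local file="$1" mode="$2" tmp
  case "$mode" in
    enforcing|permissive|disabled) ;;
    *) echo "invalid SELinux mode: $mode" >&2; return 1 ;;
  esac
  tmp="$(mktemp)" || return 1
  sed "s/^SELINUX=.*/SELINUX=${mode}/" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through the existing file so ownership and permissions stay the same.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

On a target host this would be run as root, for example `selinux_set_config_mode /etc/selinux/config permissive`, alongside `setenforce 0`.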
Firewall
- firewalld (RPM-based distributions) vs ufw (Debian/Ubuntu)
- Port opening differs between systems
- Consider documenting required ports
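To make the difference concrete, the sketch below prints the command each tool would need to open a TCP port, using the control-plane ports listed in the Kubernetes "Ports and Protocols" reference. The function and variable names are hypothetical, and nothing is executed.

```shell
# Print the command that opens one TCP port (or a range such as 2379-2380)
# for the given firewall tool.
open_port_cmd() {
  local tool="$1" port="$2"
  case "$tool" in
    firewalld) echo "firewall-cmd --permanent --add-port=${port}/tcp" ;;
    # ufw writes port ranges with a colon, e.g. 2379:2380
    ufw)       echo "ufw allow $(echo "$port" | sed 's/-/:/')/tcp" ;;
    *)         echo "unknown firewall tool: $tool" >&2; return 1 ;;
  esac
}

# TCP ports the Kubernetes documentation lists for a control-plane node.
CONTROL_PLANE_PORTS="6443 2379-2380 10250 10257 10259"

# Print all commands needed for a control-plane node.
control_plane_firewall_cmds() {
  local tool="$1" port
  for port in $CONTROL_PLANE_PORTS; do
    open_port_cmd "$tool" "$port" || return 1
  done
  # firewalld applies permanent rules only after a reload.
  [ "$tool" = "firewalld" ] && echo "firewall-cmd --reload"
  return 0
}
```

Worker nodes need a different set (10250 and the NodePort range 30000-32767 by default), which could be handled with a second port list.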
Acceptance Criteria
- os: rocky-9 auto-resolves AMI and username
Dependencies
Labels
feature compatibility linux-support