Epic: Provision Core Dependencies from Multiple Sources
Summary
Extend Holodeck's provisioning capabilities to install all core dependencies from multiple sources:
(a) Distribution packages (current behavior - default)
(b) A specific git reference (commit, branch, or tag)
(c) A moving "latest" alias tracking a branch (e.g., main)
Scope
This epic covers flexible installation for:
NVIDIA Driver - Support for different branches, runfile installers, or package versions
Container Runtime - containerd, Docker, CRI-O from specific versions or source
Kubernetes - kubeadm/kubelet/kubectl from specific commits or versions
NVIDIA Container Toolkit - Covered in a separate epic: "[Epic] NVIDIA Container Toolkit Installation from Multiple Sources" (#566)
Motivation
Testing GPU infrastructure requires validating different component combinations:
Driver validation: Test specific driver branches or versions for bug fixes
Runtime compatibility: Verify containerd/CRI-O HEAD against stable drivers
Kubernetes pre-release: Test alpha/beta Kubernetes features
Regression testing: Bisect issues across dependency versions
Reproducibility: Pin exact versions for consistent test environments
Proposed Schema
apiVersion: holodeck.nvidia.com/v1alpha1
kind: Environment
spec:
  # NVIDIA Driver with source selection
  nvidiaDriver:
    install: true
    source: package | runfile | git  # default: package
    package:
      branch: "560"  # driver branch
      version: "560.35.03"  # exact version (optional)
    runfile:
      url: https://download.nvidia.com/...driver.run
      checksum: sha256:...
    git:
      repo: https://github.com/NVIDIA/open-gpu-kernel-modules.git
      ref: refs/tags/560.35.03
  # Container Runtime with source selection
  containerRuntime:
    install: true
    name: containerd | docker | crio
    source: package | git | latest  # default: package
    package:
      version: "1.7.23"
    git:
      repo: https://github.com/containerd/containerd.git
      ref: refs/tags/v1.7.23
    latest:
      track: main
  # Kubernetes with source selection
  kubernetes:
    install: true
    installer: kubeadm | kind | microk8s
    source: package | release | git  # default: release (dl.k8s.io)
    release:
      version: v1.31.1
    git:
      repo: https://github.com/kubernetes/kubernetes.git
      ref: refs/heads/master  # test latest k8s
Subtasks
Phase 1: Common Infrastructure
Define generic source specification pattern
type SourceSpec struct {
    Type    SourceType         `json:"source,omitempty"`
    Package *PackageSourceSpec `json:"package,omitempty"`
    Git     *GitSourceSpec     `json:"git,omitempty"`
    Latest  *LatestSourceSpec  `json:"latest,omitempty"`
}

type GitSourceSpec struct {
    Repo     string        `json:"repo,omitempty"`
    Ref      string        `json:"ref"`
    Build    *BuildSpec    `json:"build,omitempty"`
    PreBuilt *PreBuiltSpec `json:"preBuilt,omitempty"`
}
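To illustrate how the generic source pattern might be enforced, here is a minimal sketch of a validation method. The `Validate` method, the `SourceType` constants, and the fields of the placeholder sub-specs are assumptions made for illustration and are not part of the proposal. Rejecting sub-specs that do not match the selected source is one possible policy; ignoring them would be another.

```go
package main

import (
	"errors"
	"fmt"
)

// SourceType names where a component is installed from.
type SourceType string

// Assumed constant names; the values mirror the `source` field in the schema.
const (
	SourcePackage SourceType = "package"
	SourceGit     SourceType = "git"
	SourceLatest  SourceType = "latest"
)

// Placeholder sub-specs with only the fields this sketch needs.
type PackageSourceSpec struct {
	Version string `json:"version,omitempty"`
}

type GitSourceSpec struct {
	Repo string `json:"repo,omitempty"`
	Ref  string `json:"ref"`
}

type LatestSourceSpec struct {
	Track string `json:"track,omitempty"`
}

type SourceSpec struct {
	Type    SourceType         `json:"source,omitempty"`
	Package *PackageSourceSpec `json:"package,omitempty"`
	Git     *GitSourceSpec     `json:"git,omitempty"`
	Latest  *LatestSourceSpec  `json:"latest,omitempty"`
}

// Validate (hypothetical) defaults an empty Type to "package" and rejects
// sub-specs that do not belong to the selected source.
func (s *SourceSpec) Validate() error {
	if s.Type == "" {
		s.Type = SourcePackage
	}
	switch s.Type {
	case SourcePackage:
		if s.Git != nil || s.Latest != nil {
			return errors.New("source is package but git or latest is set")
		}
	case SourceGit:
		if s.Git == nil || s.Git.Ref == "" {
			return errors.New("source is git but git.ref is missing")
		}
		if s.Package != nil || s.Latest != nil {
			return errors.New("source is git but package or latest is set")
		}
	case SourceLatest:
		if s.Package != nil || s.Git != nil {
			return errors.New("source is latest but package or git is set")
		}
	default:
		return fmt.Errorf("unknown source %q", s.Type)
	}
	return nil
}

func main() {
	spec := SourceSpec{Type: SourceGit, Git: &GitSourceSpec{Ref: "refs/tags/v1.7.23"}}
	fmt.Println("valid:", spec.Validate() == nil)
}
```

Defaulting inside `Validate` keeps the "package is the default" rule in one place, at the cost of mutating the spec during validation; a separate defaulting step would avoid that.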
Implement generic ref resolver
Reusable across all components
Support for GitHub, GitLab, and generic git repos
Cache resolved refs for efficiency
Implement generic build infrastructure
Common build environment setup
Go, C/C++ toolchain detection
Build artifact management
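The ref resolver item above could be sketched roughly as follows. This is an illustration under assumptions, not the planned implementation: `RefResolver`, `LookupFunc`, and `NewRefResolver` are hypothetical names, and the lookup is injected so the sketch runs without network access. A real lookup might shell out to `git ls-remote <repo> <ref>`, which works against GitHub, GitLab, and generic git hosts alike.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// LookupFunc resolves a ref in a repo to a commit SHA. Hypothetical hook:
// a real one might run `git ls-remote <repo> <ref>`.
type LookupFunc func(repo, ref string) (string, error)

// fullSHA matches a full 40-character lowercase hex commit ID.
var fullSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)

// RefResolver caches resolved refs so repeated lookups hit the remote once.
type RefResolver struct {
	mu     sync.Mutex
	cache  map[string]string
	lookup LookupFunc
}

func NewRefResolver(lookup LookupFunc) *RefResolver {
	return &RefResolver{cache: make(map[string]string), lookup: lookup}
}

// Resolve returns the commit SHA for ref. A full SHA is returned unchanged;
// anything else is looked up once and then served from the cache.
func (r *RefResolver) Resolve(repo, ref string) (string, error) {
	if fullSHA.MatchString(ref) {
		return ref, nil
	}
	key := repo + "@" + ref
	r.mu.Lock()
	defer r.mu.Unlock()
	if sha, ok := r.cache[key]; ok {
		return sha, nil
	}
	sha, err := r.lookup(repo, ref)
	if err != nil {
		return "", fmt.Errorf("resolve %s in %s: %w", ref, repo, err)
	}
	r.cache[key] = sha
	return sha, nil
}

func main() {
	// Fake lookup standing in for a remote query.
	fake := func(repo, ref string) (string, error) {
		return "0123456789abcdef0123456789abcdef01234567", nil
	}
	resolver := NewRefResolver(fake)
	sha, err := resolver.Resolve("https://github.com/containerd/containerd.git", "refs/heads/main")
	fmt.Println(sha, err)
}
```

Because branch refs move, a cache like this is only safe for the lifetime of a single provisioning run; the sketch holds the lock during the lookup for simplicity, which serializes concurrent resolutions.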
Component: NVIDIA Driver
Phase 2: Driver Schema
Extend NVIDIADriver spec
type NVIDIADriver struct {
    Install bool         `json:"install"`
    Source  DriverSource `json:"source,omitempty"` // package, runfile, git
    // Package source (default)
    Package *DriverPackageSpec `json:"package,omitempty"`
    // Runfile source (manual installer)
    Runfile *DriverRunfileSpec `json:"runfile,omitempty"`
    // Git source (open-gpu-kernel-modules)
    Git *DriverGitSpec `json:"git,omitempty"`
}

type DriverPackageSpec struct {
    Branch  string `json:"branch,omitempty"`  // 560, 550, etc.
    Version string `json:"version,omitempty"` // exact version
}

type DriverRunfileSpec struct {
    URL      string `json:"url"`
    Checksum string `json:"checksum,omitempty"`
}

type DriverGitSpec struct {
    Repo string `json:"repo,omitempty"`
    Ref  string `json:"ref"`
}
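The `Checksum` field of `DriverRunfileSpec` implies a verification step before the runfile is executed. A minimal sketch of that check, assuming the `sha256:<hex digest>` form shown in the proposed schema; `VerifyChecksum` is a hypothetical helper name, not an existing Holodeck function.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// VerifyChecksum (hypothetical helper) checks data against a checksum written
// as "sha256:<hex digest>". Only sha256 is handled in this sketch.
func VerifyChecksum(data []byte, checksum string) error {
	algo, want, ok := strings.Cut(checksum, ":")
	if !ok || algo != "sha256" {
		return fmt.Errorf("unsupported checksum %q, want sha256:<hex>", checksum)
	}
	sum := sha256.Sum256(data)
	got := hex.EncodeToString(sum[:])
	if !strings.EqualFold(got, want) {
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, want)
	}
	return nil
}

func main() {
	payload := []byte("pretend this is a driver runfile")
	sum := sha256.Sum256(payload)
	checksum := "sha256:" + hex.EncodeToString(sum[:])

	fmt.Println("match:", VerifyChecksum(payload, checksum) == nil)
	fmt.Println("tampered:", VerifyChecksum(append(payload, '!'), checksum))
}
```

Driver runfiles are large, so a real implementation would more likely stream the file through `sha256.New()` with `io.Copy` rather than hold it in memory as this sketch does.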
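Phase 3: Driver Installation Paths
Package installation (enhanced)
Runfile installation
Open kernel modules build
Component: Container Runtime
Phase 4: Runtime Schema
Phase 5: Containerd Installation Paths
Package installation (enhanced)
apt-get install containerd.io=${VERSION}
Git/release installation
Phase 6: Docker Installation Paths
Package installation (current)
Moby from source
Phase 7: CRI-O Installation Paths
Package installation (enhanced)
Source build
Component: Kubernetes
Phase 8: Kubernetes Schema
Phase 9: Kubernetes Installation Paths
Release installation (enhanced)
Git source installation
Kind from git
Phase 10: Provenance & Status
Track component sources in status
Display in CLI
Phase 11: Testing
Unit tests
Integration tests per component
E2E matrix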
Phase 12: Documentation
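For Phase 10 (Provenance & Status), one way to record where each component came from is a small status record per component. The sketch below is only an assumption about what such a record could look like: `ComponentProvenance`, its fields, and the `Summary` format are hypothetical and not part of the proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ComponentProvenance (hypothetical) records how one component was installed.
type ComponentProvenance struct {
	Name    string `json:"name"`
	Source  string `json:"source"`
	Version string `json:"version,omitempty"`
	Repo    string `json:"repo,omitempty"`
	Ref     string `json:"ref,omitempty"`
	Commit  string `json:"commit,omitempty"`
}

// short truncates a commit SHA to 12 characters for display.
func short(sha string) string {
	if len(sha) > 12 {
		return sha[:12]
	}
	return sha
}

// Summary renders a one-line description suitable for CLI output.
func (p ComponentProvenance) Summary() string {
	switch {
	case p.Commit != "":
		return fmt.Sprintf("%s: %s %s@%s", p.Name, p.Source, p.Ref, short(p.Commit))
	case p.Version != "":
		return fmt.Sprintf("%s: %s %s", p.Name, p.Source, p.Version)
	default:
		return fmt.Sprintf("%s: %s", p.Name, p.Source)
	}
}

func main() {
	status := []ComponentProvenance{
		{Name: "containerd", Source: "package", Version: "1.7.23"},
		{
			Name:   "kubernetes",
			Source: "git",
			Repo:   "https://github.com/kubernetes/kubernetes.git",
			Ref:    "refs/heads/master",
			// Placeholder commit ID, not a real Kubernetes commit.
			Commit: "0123456789abcdef0123456789abcdef01234567",
		},
	}
	for _, p := range status {
		fmt.Println(p.Summary())
	}
	out, err := json.MarshalIndent(status, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Storing the resolved commit alongside the requested ref matters most for moving sources such as `refs/heads/master`, where the ref alone does not say what was actually installed.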
Example Configurations
All Latest (Testing Bleeding Edge)
spec:
  nvidiaDriver:
    install: true
    source: git
    git:
      ref: refs/heads/main  # open-gpu-kernel-modules main
  containerRuntime:
    install: true
    name: containerd
    source: latest
    latest:
      track: main
  nvidiaContainerToolkit:
    install: true
    source: latest
    latest:
      track: main
  kubernetes:
    install: true
    source: git
    git:
      ref: refs/heads/master  # k8s master
All Pinned (Reproducible Environment)
spec:
  nvidiaDriver:
    install: true
    source: package
    package:
      version: "560.35.03"
  containerRuntime:
    install: true
    name: containerd
    source: package
    package:
      version: "1.7.20"
  nvidiaContainerToolkit:
    install: true
    source: package
    package:
      version: "1.17.3-1"
  kubernetes:
    install: true
    source: release
    release:
      version: v1.31.1
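The difference between the two example configurations can be expressed as a simple check: a source is reproducible only if it names a fixed artifact. The sketch below is an assumption about how such a check might look; `IsPinned` is a hypothetical helper, and the `runfile` source is not covered.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// commitSHA matches a full 40-character lowercase hex commit ID.
var commitSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)

// IsPinned (hypothetical) reports whether a source selection names a fixed
// artifact. versionOrRef is the package/release version or the git ref.
func IsPinned(source, versionOrRef string) bool {
	switch source {
	case "package", "release":
		// Pinned only when an exact version is given.
		return versionOrRef != ""
	case "git":
		// Tags and full commit IDs are treated as fixed; branches move.
		return strings.HasPrefix(versionOrRef, "refs/tags/") ||
			commitSHA.MatchString(versionOrRef)
	default:
		// "latest" and unknown sources are treated as moving.
		return false
	}
}

func main() {
	fmt.Println(IsPinned("package", "560.35.03"))
	fmt.Println(IsPinned("git", "refs/heads/master"))
	fmt.Println(IsPinned("latest", "main"))
}
```

Git tags can be moved after the fact, so a full commit ID is the stronger guarantee; treating `refs/tags/` as pinned is a pragmatic choice in this sketch.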
Acceptance Criteria
Related Issues
Labels
feature dependency-management flexibility