Improve helm prereqs setup, preflight, and health checks #1389
Open
benhuntley wants to merge 1 commit into NVIDIA:main
Add a more flexible setup flow for helm-prereqs deployments:
- Add setup.sh flags for --skip-core, --skip-rest, custom core values,
custom MetalLB config, site overlay application, and debug tracing.
- Make registry pull credentials optional so public, preloaded, or
externally managed imagePullSecrets can be used.
- Create imagepullsecret only when REGISTRY_PULL_SECRET is set, derive
the registry server from NCX_IMAGE_REGISTRY, and support custom
REGISTRY_PULL_USERNAME.
- Skip REST-specific env and repo checks when --skip-rest is used, and
skip Core image tag validation when --skip-core is used.
- Expand preflight validation for custom MetalLB manifests or kustomize
dirs, referenced imagePullSecrets, cluster reachability, schedulable
nodes, per-node sysctls, and per-node DNS.
- Sync Vault AppRole credentials through Vault KV and External Secrets
instead of patching the Kubernetes Secret from a hook sidecar.
- Wait for PostgreSQL credentials and Vault AppRole credentials before
deploying NCX Core.
- Harden Vault unseal handling and setup failure cleanup prompts.
- Add health-check.sh for post-install validation of prereqs, Vault,
PostgreSQL, Core pods, PKI, ExternalSecrets, LoadBalancer VIPs, DNS,
NTP, and in-cluster service connectivity.
- Update helm-prereqs documentation for the new setup modes, optional
pull secret contract, custom values/config flags, and health check.
Signed-off-by: Ben Huntley <bhuntley@nvidia.com>
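
The optional pull-secret contract described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `registry_server` and `pull_secret_command` and the fallback username `example-user` are hypothetical, and the sketch only prints the `kubectl` command it would run (with the password masked) instead of touching a cluster. The variable names `REGISTRY_PULL_SECRET`, `NCX_IMAGE_REGISTRY`, and `REGISTRY_PULL_USERNAME` and the secret name `imagepullsecret` come from the description.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the optional pull-secret logic; not the PR's setup.sh.

# Derive the registry host from an image registry path, e.g.
# "registry.example.com/org/project" -> "registry.example.com".
registry_server() {
  local registry="${1#*://}"      # drop an optional scheme such as https://
  printf '%s\n' "${registry%%/*}" # keep everything before the first slash
}

# Print the command that would create the pull secret. Prints nothing when
# REGISTRY_PULL_SECRET is unset or empty, so public, preloaded, or externally
# managed imagePullSecrets are left alone.
pull_secret_command() {
  if [ -z "${REGISTRY_PULL_SECRET:-}" ]; then
    return 0
  fi
  local server
  server="$(registry_server "${NCX_IMAGE_REGISTRY:-}")"
  # "example-user" is a placeholder default, assumed for illustration only.
  printf 'kubectl create secret docker-registry imagepullsecret --docker-server=%s --docker-username=%s --docker-password=***\n' \
    "$server" "${REGISTRY_PULL_USERNAME:-example-user}"
}
```

Calling `pull_secret_command` with `REGISTRY_PULL_SECRET` unset produces no output, which mirrors the "create only when set" behavior; with it set, the printed command shows the server derived from `NCX_IMAGE_REGISTRY`.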
a73296c to 7f32e1c
ajf requested changes on May 8, 2026
 {{- if .Values.externalSecrets.enabled }}
 ## =============================================================================
-## forge-roots — sync site-root CA cert to all carbide-managed namespaces.
+## forge-roots — sync site-root CA cert to all NCX-managed namespaces.
Collaborator
NCX Infra Controller became NVIDIA infra controller, so maybe just "infra controller" makes sense here? There are a few NCX references in this PR that should be updated where possible (some are breaking changes; those can be done more methodically).