TopoLVM needs startup buffer to wait on dns pod startup #104
ggiguash merged 4 commits into microshift-io:main from
Walkthrough
Patch TopoLVM manifests: adjust controller liveness/readiness probe timing and add startupProbe; add startupProbe to topolvm-node DaemonSet; ensure Deployment replicas remain set to 1 and include new Changes
Sequence Diagram(s)
sequenceDiagram
participant KubeAPI as Kubernetes API
participant ControllerPod as topolvm-controller Pod
participant NodePod as topolvm-node Pod
participant Kubelet as Kubelet
rect rgb(235,245,255)
KubeAPI->>ControllerPod: Create Pod (Deployment replicas=1)
Kubelet->>ControllerPod: startupProbe GET /healthz (period=60s, timeout=3s, failures=3)
alt startup succeeds
Kubelet->>ControllerPod: readinessProbe GET /healthz (period=60s, timeout=3s, failures=3)
Kubelet->>ControllerPod: livenessProbe (failureThreshold=3)
else startup fails
Kubelet->>KubeAPI: Restart Pod
end
end
rect rgb(245,255,235)
KubeAPI->>NodePod: Create Pod (DaemonSet)
Kubelet->>NodePod: startupProbe GET /healthz (period=2s, timeout=3s, failures=60)
alt startup succeeds
Kubelet->>NodePod: readiness/liveness probes (existing)
else startup fails
Kubelet->>KubeAPI: Restart Pod
end
end
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/topolvm/assets/03-topolvm.yaml (3)
989-1002: Consider adding startupProbe to topolvm-controller. The topolvm-controller now has an enhanced livenessProbe and readinessProbe, but lacks a startupProbe. If it also depends on DNS during startup (like topolvm-node), it may benefit from a similar startup grace period to prevent early backoff loops.
767-774: lvmd container: Consider adding startupProbe for consistency. The lvmd container still uses the older pattern (initialDelaySeconds) without a startupProbe. If it shares DNS dependencies with topolvm-node, adding a startupProbe would improve consistency and resilience during early pod initialization.
885-891: csi-registrar: Old probe pattern retained. The csi-registrar container still uses initialDelaySeconds: 10 without a startupProbe. If this is intentional (e.g., csi-registrar has no DNS dependency), document the rationale. Otherwise, align it with the modern probe pattern for consistency.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/topolvm/assets/03-topolvm.yaml(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: quick-start-and-clean
- GitHub Check: ubuntu-rpm2deb
- GitHub Check: fedora-bootc
- GitHub Check: isolated-network-kindnet
- GitHub Check: isolated-network-ovnk
- GitHub Check: centos10-bootc
- GitHub Check: centos9-bootc
🔇 Additional comments (2)
src/topolvm/assets/03-topolvm.yaml (2)
850-856: Good startup grace period for DNS initialization. The startupProbe configuration provides a reasonable 120-second window (60 failures × 2-second period) to allow DNS pod readiness. The httpGet probe on /healthz is consistent with the livenessProbe endpoint.
843-849: Verify endpoint path change for livenessProbe. The livenessProbe now explicitly uses the /healthz path and adds failureThreshold: 3. Confirm this endpoint exists and behaves as expected during normal container operation (not just startup).
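For reference, a startupProbe stanza with the values quoted in this review (GET /healthz, 2-second period, 3-second timeout, 60 failures) might look like the fragment printed by the sketch below. This is only an illustration: the port name `healthz` is a placeholder assumption, and the actual manifest may differ.

```shell
#!/usr/bin/env bash

# Illustrative only: a startupProbe stanza built from the values quoted in
# the review comments. The port name "healthz" is an assumption, not taken
# from the real 03-topolvm.yaml.
probe_yaml=$(cat <<'EOF'
startupProbe:
  httpGet:
    path: /healthz
    port: healthz        # assumed port name
  periodSeconds: 2       # probe every 2 seconds
  timeoutSeconds: 3      # each attempt may take up to 3 seconds
  failureThreshold: 60   # tolerate 60 consecutive failures
EOF
)

printf '%s\n' "${probe_yaml}"
```

While a startupProbe is still failing within its threshold, the kubelet holds off the liveness and readiness probes, which is what gives the pod room to wait for DNS.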
@copejon, we have a generate_manifests.sh script for TopoLVM manifest generation.
b6dc853 to 150c847
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/topolvm/generate_manifests.sh (1)
73-89: Remove duplicate kustomization.yaml generation. Lines 73-80 and 82-89 generate identical content to the same file; the second write is redundant. Remove the duplicate block.
 cat >"${ASSETS_DIR}/kustomization.yaml" <<'EOF'
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 resources:
   - 01-namespace.yaml
   - 02-cert-manager.yaml
   - 03-topolvm.yaml
 EOF
-
-cat >"${ASSETS_DIR}/kustomization.yaml" <<'EOF'
-apiVersion: kustomize.config.k8s.io/v1beta1
-kind: Kustomization
-resources:
-  - 01-namespace.yaml
-  - 02-cert-manager.yaml
-  - 03-topolvm.yaml
-EOF
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/topolvm/generate_manifests.sh(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: isolated-network-kindnet
- GitHub Check: isolated-network-ovnk
- GitHub Check: centos10-bootc
- GitHub Check: centos9-bootc
- GitHub Check: ubuntu-rpm2deb
- GitHub Check: fedora-bootc
🔇 Additional comments (1)
src/topolvm/generate_manifests.sh (1)
40-55: Verify yq syntax produces correct manifests. The topolvm-controller patch uses chained pipe assignments; confirm the generated manifest reflects all probe configuration changes as expected.
150c847 to 0f5f022
0f5f022 to 8be40e4
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/topolvm/generate_manifests.sh (1)
70-87: Duplicate kustomization.yaml generation. Lines 71–78 and 80–87 generate identical kustomization.yaml files; the second write overwrites the first. Remove the duplicate (lines 80–87).
Apply this diff:
 cat >"${ASSETS_DIR}/kustomization.yaml" <<'EOF'
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 resources:
   - 01-namespace.yaml
   - 02-cert-manager.yaml
   - 03-topolvm.yaml
 EOF
-
-cat >"${ASSETS_DIR}/kustomization.yaml" <<'EOF'
-apiVersion: kustomize.config.k8s.io/v1beta1
-kind: Kustomization
-resources:
-  - 01-namespace.yaml
-  - 02-cert-manager.yaml
-  - 03-topolvm.yaml
-EOF
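With the duplicate removed, the relevant part of the script would reduce to a single heredoc write. The sketch below is a standalone approximation, not the actual generate_manifests.sh: `ASSETS_DIR` points at a throwaway temporary directory here, and the trailing grep checks are added purely for illustration.

```shell
#!/usr/bin/env bash

# Assumption: a temporary directory stands in for the script's real ASSETS_DIR.
ASSETS_DIR="$(mktemp -d)"

# Write kustomization.yaml exactly once.
cat >"${ASSETS_DIR}/kustomization.yaml" <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - 01-namespace.yaml
  - 02-cert-manager.yaml
  - 03-topolvm.yaml
EOF

# Illustrative sanity check on the generated file (not in the original script):
# count the Kustomization kind line and the listed resources.
kind_count=$(grep -c '^kind: Kustomization$' "${ASSETS_DIR}/kustomization.yaml")
resource_count=$(grep -c '^  - ' "${ASSETS_DIR}/kustomization.yaml")
echo "kind entries: ${kind_count}, resources: ${resource_count}"
```

Because `>` truncates the target file, the duplicated block never changed the output; dropping it only removes dead code from the script.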
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/topolvm/generate_manifests.sh(1 hunks)
🔇 Additional comments (2)
src/topolvm/generate_manifests.sh (2)
41-55: Well-structured probe configuration for DNS initialization delay. The startup and readiness probe tuning (failureThreshold=3, periodSeconds=60) provides a ~180-second window for topolvm-controller to initialize, addressing the PR objective of preventing DNS timing issues. The with() syntax correctly targets the specific container and applies all mutations in one operation.
57-68: Verify DaemonSet startup probe settings align with controller probe logic. The topolvm-node DaemonSet startupProbe uses a different total window (failureThreshold=60 × periodSeconds=2 ≈ 120 seconds) than the controller (failureThreshold=3 × periodSeconds=60 ≈ 180 seconds). Confirm this asymmetry is intentional (e.g., node workloads may start faster) or if both should use the same retry window.
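The two retry windows discussed in these comments can be worked out as failureThreshold × periodSeconds. The sketch below uses a hypothetical helper, `probe_window`, which is not part of generate_manifests.sh; it ignores initialDelaySeconds and per-attempt timeouts, so the results are approximations.

```shell
#!/usr/bin/env bash

# Hypothetical helper: approximate number of seconds a probe may keep failing
# before the kubelet gives up, computed as failureThreshold * periodSeconds.
probe_window() {
  local failure_threshold="$1" period_seconds="$2"
  echo $(( failure_threshold * period_seconds ))
}

controller_window=$(probe_window 3 60)   # values quoted for topolvm-controller
node_window=$(probe_window 60 2)         # values quoted for topolvm-node

echo "controller: ${controller_window}s, node: ${node_window}s"
```

The node probe polls far more often (every 2 seconds against every 60), so it should notice a healthy pod sooner even though its total window is the shorter of the two.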
The failure in the |
Resolves #94
TopoLVM pods need a startup grace period to allow the DNS pod to become ready. Otherwise, they fall into a backoff loop very early and can take up to half an hour to stabilize.