CP-35716: fix defaults.image.pullSecrets not applying to workload templates #606
dmepham reviewed Jan 5, 2026
Adding comprehensive per-type unit tests for defaults.* properties revealed that defaults.image.pullSecrets was not being applied to most workload templates. Only config-loader-job.yaml and helmless-job.yaml used the correct generateImagePullSecrets helper; six other templates used legacy helpers that did not fall back to defaults.image.pullSecrets.

Functional Change:
Before: Setting defaults.image.pullSecrets had no effect on the agent-deploy, aggregator-deploy, agent-daemonset, webhook-deploy, backfill-job, or init-cert-job templates. Users had to set the deprecated top-level imagePullSecrets or configure each component individually.
After: Setting defaults.image.pullSecrets applies to all workload templates. Backwards compatibility with the deprecated imagePullSecrets is preserved.

Root Cause:
The six templates used legacy imagePullSecrets helpers with different fallback chains:
1. cloudzero-agent.server.imagePullSecrets - only checked .Values.imagePullSecrets
2. cloudzero-agent.insightsController.server.imagePullSecrets - checked insightsController.server.imagePullSecrets -> imagePullSecrets
3. cloudzero-agent.initBackfillJob.imagePullSecrets - checked backFillValues.imagePullSecrets -> insightsController.server.imagePullSecrets -> imagePullSecrets
4. cloudzero-agent.initCertJob.imagePullSecrets - similar chain
None of these helpers included defaults.image.pullSecrets in its fallback chain.

Solution:
1. Updated the generateImagePullSecrets helper (_helpers.tpl:1137-1142) to use the deprecated imagePullSecrets as the final fallback: .image.pullSecrets | default .root.Values.defaults.image.pullSecrets | default .root.Values.imagePullSecrets
2. Updated six templates to use generateImagePullSecrets:
   - agent-deploy.yaml (line 310)
   - aggregator-deploy.yaml (line 159)
   - agent-daemonset.yaml (line 190)
   - webhook-deploy.yaml (line 86)
   - backfill-job.yaml (line 122)
   - init-cert-job.yaml (line 54)
3. Also fixed aggregator-service.yaml to use generateLabels for consistency with other templates (this removes app.kubernetes.io/component, which was inconsistently applied across resources).

Validation:
- All 395 Helm unit tests pass (8 new tests added for defaults.image.pullSecrets)
- New tests verify that defaults.image.pullSecrets applies to Deployment, DaemonSet, Job, and CronJob resources
- New tests verify backwards compatibility with the deprecated imagePullSecrets
- Manual helm template verification confirms that imagePullSecrets renders correctly when defaults.image.pullSecrets is set
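To make the precedence of the updated generateImagePullSecrets chain concrete, here is a minimal Go sketch. It models each `x | default y` step the way Sprig's `default` behaves (keep x unless it is empty), so the first non-empty value wins. The function names are hypothetical, and secrets are simplified to plain name strings rather than the chart's actual pullSecrets entries.

```go
package main

import "fmt"

// firstNonEmpty models a chain of Sprig `default` calls such as
//   .image.pullSecrets | default .root.Values.defaults.image.pullSecrets | default .root.Values.imagePullSecrets
// Each `x | default y` keeps x unless x is empty, so the effective order is:
// component value, then defaults.image.pullSecrets, then the deprecated
// top-level imagePullSecrets. A nil or zero-length slice counts as empty.
func firstNonEmpty(candidates ...[]string) []string {
	for _, c := range candidates {
		if len(c) > 0 {
			return c
		}
	}
	return nil
}

// resolvePullSecrets is a hypothetical stand-in for the updated helper's
// precedence; it is not part of the chart.
func resolvePullSecrets(component, defaults, deprecated []string) []string {
	return firstNonEmpty(component, defaults, deprecated)
}

func main() {
	// Only defaults.image.pullSecrets is set: it now reaches every workload.
	fmt.Println(resolvePullSecrets(nil, []string{"registry-creds"}, nil)) // [registry-creds]
	// Only the deprecated top-level value is set: still honored.
	fmt.Println(resolvePullSecrets(nil, nil, []string{"legacy-creds"})) // [legacy-creds]
	// A component-level value always wins.
	fmt.Println(resolvePullSecrets(
		[]string{"agent-creds"}, []string{"registry-creds"}, []string{"legacy-creds"},
	)) // [agent-creds]
}
```

By contrast, the legacy cloudzero-agent.server.imagePullSecrets helper behaved like calling firstNonEmpty with only the deprecated value, which is why defaults.image.pullSecrets never reached those templates.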
dmepham approved these changes Jan 5, 2026
evan-cz added a commit that referenced this pull request Jan 6, 2026
CP-36332: Update docs to use new Kubernetes label selectors

PR #606 code review identified that the documentation contains outdated Kubernetes label selectors. The label system was refactored in CP-35429 (commit c7a0b6f) to follow Kubernetes recommended labels best practices, but the documentation was not updated to reflect the new selector patterns.

Functional Change:
Before: Documentation examples combined `app.kubernetes.io/component=server` with `app.kubernetes.io/name=cloudzero-agent` to query agent resources, a selector that no longer matches the labels on deployed resources.
After: Documentation examples combine `app.kubernetes.io/part-of=cloudzero-agent` with `app.kubernetes.io/name=server` (or aggregator, webhook-server, etc.), which matches the current label schema.

Solution:
1. Updated helm/docs/troubleshooting-guide.md (~50 instances) - main troubleshooting documentation with extensive kubectl examples
2. Updated helm/docs/deploy-validation.md (6 instances) - deployment validation guide
3. Updated helm/docs/cert-trouble-shooting.md - example output showing new labels
4. Updated helm/docs/upgrades.md - job deletion command selector
5. Updated docs/wiki/Debugging-Guide.md (5 instances) - debugging procedures
6. Updated docs/wiki/Installation-FAQ.md (7 instances) - FAQ kubectl examples
7. Updated docs/wiki/CloudZero-Agent-Replicas-and-Resources.md (2 instances)
8. Updated app/domain/transform/dcgm/CLAUDE.md and README.md - aggregator examples
9. Updated app/functions/helmless/README.md - helmless job log retrieval
10. Deleted obsolete docs/wiki/Webhook-Server.md.bak backup file

Label transformation pattern applied throughout:
- `app.kubernetes.io/component=server,app.kubernetes.io/name=cloudzero-agent` became `app.kubernetes.io/part-of=cloudzero-agent,app.kubernetes.io/name=server`
- kube-state-metrics uses `app.kubernetes.io/name=kube-state-metrics` (sub-chart, no part-of label)

Validation:
- Searched all documentation files with grep for `app.kubernetes.io/component`
- Confirmed remaining occurrences are intentional (label export config, test fixtures, _helpers.tpl source code)
- No functional changes to code - documentation only
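The reason the old selectors stopped matching can be sketched with a simplified Go model of equality-based label selectors, where every comma-separated key=value pair must be present (a logical AND). The label set in main is illustrative, assuming the agent server carries the part-of and name labels described in the commit; set-based and != requirements are deliberately not handled, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSelector parses a simplified equality-based selector such as
// "app.kubernetes.io/part-of=cloudzero-agent,app.kubernetes.io/name=server".
// Terms without "=" are ignored; set-based and != forms are not supported.
func parseSelector(sel string) map[string]string {
	reqs := make(map[string]string)
	for _, term := range strings.Split(sel, ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(term), "=")
		if ok {
			reqs[key] = value
		}
	}
	return reqs
}

// matches reports whether every key=value requirement is satisfied by labels.
func matches(sel string, labels map[string]string) bool {
	for k, v := range parseSelector(sel) {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	// Illustrative labels for the agent server under the new schema.
	serverLabels := map[string]string{
		"app.kubernetes.io/part-of": "cloudzero-agent",
		"app.kubernetes.io/name":    "server",
	}
	oldSel := "app.kubernetes.io/component=server,app.kubernetes.io/name=cloudzero-agent"
	newSel := "app.kubernetes.io/part-of=cloudzero-agent,app.kubernetes.io/name=server"
	fmt.Println(matches(oldSel, serverLabels)) // false: name is now "server", component is absent
	fmt.Println(matches(newSel, serverLabels)) // true
}
```

Under this model the old selector fails on two counts: the name label now identifies the component rather than the chart, and the component label is no longer present.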
github-merge-queue bot pushed a commit that referenced this pull request Jan 6, 2026
* CP-36332: Update docs to use new Kubernetes label selectors

* Standardize namespace to cloudzero-agent in documentation

Documentation used inconsistent namespace names (`cza`, `cz-agent`, `cz-webhook-test`, `cloudzero`) in kubectl command examples. This confuses users following the documentation and may cause commands to fail if users copy them directly.

Functional Change:
Before: Documentation examples used various short namespace names like `-n cza`, `-n cz-agent`, `-n cz-webhook-test`, or `-n cloudzero` inconsistently across files.
After: All documentation examples consistently use `-n cloudzero-agent`, which is the default namespace for the CloudZero Agent Helm chart installation.

Solution:
1. Updated helm/DEVELOPMENT.md - HPA verification commands (2 instances)
2. Updated app/functions/helmless/README.md - ConfigMap extraction commands (2 instances)
3. Updated app/functions/certifik8s/README.md - ServiceAccount YAML examples (2 instances)
4. Updated tests/load/README.md - Load testing kubectl commands (8 instances)
5. Updated tests/testkube/README.md - TestKube debug commands (2 instances)
6. Updated tests/webhook/README.md - Webhook metrics port-forward (1 instance)

Namespace transformation pattern applied:
- `cza` -> `cloudzero-agent`
- `cz-agent` -> `cloudzero-agent`
- `cz-webhook-test` -> `cloudzero-agent`

Validation:
- Verified with grep that no instances of `-n cza`, `-n cz-agent`, `-n cz-webhook-test`, or `-n cloudzero` (without -agent) remain in markdown files
- External component namespaces intentionally preserved (cert-manager, kube-system, istio-system, monitoring, etc.)
- No functional changes to code - documentation only
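The namespace rewrite and the grep-style check can be sketched in Go as follows. The regular expression and function name are assumptions, not the commands actually used. The pattern includes `cloudzero` because the Before and Validation text mention `-n cloudzero`, even though the transformation pattern lists only three mappings, and it does not handle the long `--namespace` form.

```go
package main

import (
	"fmt"
	"regexp"
)

// legacyNamespace matches "-n <short-name>" for the inconsistent names listed
// in the commit message. The trailing group requires whitespace or end of text,
// so "-n cloudzero-agent" and external namespaces such as "-n cert-manager"
// are left alone (Go's RE2 engine has no lookahead).
var legacyNamespace = regexp.MustCompile(`-n (cza|cz-agent|cz-webhook-test|cloudzero)(\s|$)`)

// normalizeNamespace rewrites legacy namespace flags to -n cloudzero-agent,
// preserving the character that followed the old name.
func normalizeNamespace(line string) string {
	return legacyNamespace.ReplaceAllString(line, "-n cloudzero-agent$2")
}

func main() {
	fmt.Println(normalizeNamespace("kubectl get pods -n cza"))
	fmt.Println(normalizeNamespace("kubectl get hpa -n cz-agent -o wide"))
	fmt.Println(normalizeNamespace("kubectl get pods -n cloudzero-agent"))      // unchanged
	fmt.Println(legacyNamespace.MatchString("kubectl get pods -n cert-manager")) // false
}
```

Requiring whitespace or end of text after the name is what keeps an already-correct `-n cloudzero-agent` from being rewritten a second time.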