Skip to content

PC 1061: Return error on pat exchange failure#361

Merged
sandipanpanda merged 1 commit into
mainfrom
pc-1061-fail-zxp-helm
Apr 23, 2026
Merged

PC 1061: Return error on pat exchange failure#361
sandipanpanda merged 1 commit into
mainfrom
pc-1061-fail-zxp-helm

Conversation

@sandipanpanda
Copy link
Copy Markdown
Contributor

@sandipanpanda sandipanpanda commented Apr 23, 2026

Fixing zxporter to crash-loop on PAT exchange failure

📚 Description of Changes

Provide an overview of your changes and why they’re needed. Link to any related issues (e.g., "Fixes #123"). If your PR fixes a bug, resolves a feature request, or updates documentation, please explain how.

  • What Changed:
    (Describe the modifications, additions, or removals.)

  • Why This Change:
    (Explain the problem this PR addresses or the improvement it provides.)
    Before PAT exchange on hitting cluster limit would fail silently

  • Affected Components:
    (Which component does this change affect? - put x for all components)

  • Compose

  • K8s

  • Other (please specify)

❓ Motivation and Context

Why is this change required? What problem does it solve?

  • Context:
    (Provide background information or link to related discussions/issues.)

  • Relevant Tasks/Issues:
    (e.g., Fixes: #GitHub Issue)

🔍 Types of Changes

Indicate which type of changes your code introduces (check all that apply):

  • BUGFIX: Non-breaking fix for an issue.
  • NEW FEATURE: Non-breaking addition of functionality.
  • BREAKING CHANGE: Fix or feature that causes existing functionality to not work as expected.
  • ENHANCEMENT: Improvement to existing functionality.
  • CHORE: Changes that do not affect production (e.g., documentation, build tooling, CI).

🔬 QA / Verification Steps

Describe the steps a reviewer should take to verify your changes:

  1. (Step one: e.g., "Run make test to verify all tests pass.")
  2. (Step two: e.g., "Deploy to a Kind cluster with make create-kind && make deploy.")
  3. (Additional steps as needed.)

✅ Global Checklist

Please check all boxes that apply:

  • I have read and followed the CONTRIBUTING guidelines.
  • My code follows the code style of this project.
  • I have updated the documentation as needed.
  • I have added tests that cover my changes.
  • All new and existing tests have passed locally.
  • I have run this code in a local environment to verify functionality.
  • I have considered the security implications of this change.

Summary by Gitar

  • Code Refactoring:
    • Changed K8sClient type from *kubernetes.Clientset to kubernetes.Interface for better abstraction.
    • Added dakrClientFactory to EnvBasedController to allow dependency injection of the client.

This will update automatically on new commits.

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 23, 2026

CI failed: The test suite failed because the controller manager pod is crashing during startup, likely due to an unhandled or fatal error introduced by the recent changes in PAT exchange logic.

Overview

The 'Test Metrics Server Lifecycle' test is failing because the controller-manager pod is unable to pass its readiness and liveness probes. Analysis indicates that the binary is crashing before it can bind to the health check port, strongly suggesting a regression in the initialization logic introduced by the PR.

Failures

Controller Manager Readiness/Liveness Probe Failure (confidence: medium)

  • Type: test
  • Affected jobs: 72702735189
  • Related to change: yes
  • Root cause: The controller manager is exiting prematurely during startup. This correlates with the changes in internal/controller/custom.go where a new error-handling path for PAT exchange was introduced; it is likely triggering a fatal exit that prevents the manager from serving health probes.
  • Suggested fix: Review the initialization flow in internal/controller/custom.go. Verify if the PAT exchange failure is causing a panic or log.Fatal that interrupts the manager's lifecycle. Ensure that errors in this block are handled gracefully so the manager can continue to initialize or enter a managed error state instead of crashing.

Summary

  • Change-related failures: 1, specifically the regression in controller initialization causing probe failures.
  • Infrastructure/flaky failures: 0.
  • Recommended action: Audit the new error handling in internal/controller/custom.go for any premature exit signals and ensure they do not halt the controller manager's startup sequence.
Code Review ✅ Approved

Adds explicit error handling for PAT exchange failures, ensuring the application properly propagates authentication errors. No issues found.

Tip

Comment Gitar fix CI to trigger a fix.

Was this helpful? React with 👍 / 👎 | Gitar

@sandipanpanda sandipanpanda merged commit aaa917e into main Apr 23, 2026
26 of 28 checks passed
@sandipanpanda sandipanpanda deleted the pc-1061-fail-zxp-helm branch April 23, 2026 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants