
PC-902 Netmon healthcheck + endpoints #303

Merged
mutantkeyboard merged 5 commits into main from netmon_healthcheck on Feb 26, 2026

@mutantkeyboard (Contributor) commented Feb 26, 2026

[Add health checks to zxporter-netmon]

📚 Description of Changes

  • Create the HealthManager (internal/health/manager.go) to track and manage the health status of all dakr components. Ensure thread-safe registration, status updates, and report building for component health.
  • We need to report component-level health in the Dakr system. This should follow an approach similar to zxporter's health check.

Context
The current health check only provides basic information. We aim to enhance this by tracking other important components within the netmon system.

  • The issue involves replacing the existing probes with HealthManager-backed endpoints. The goal is to upgrade the /healthz and /readyz endpoints according to the design logic described below.

Context
This task follows the approach used in zxporter and focuses on improving the health probes.

/healthz returns HTTP 503 when a fatal component fails.
/readyz reports readiness based on the configuration, collector, and transport components.
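A minimal sketch of the probe rules above, assuming a simplified component tracker in place of the real HealthManager. Only the 503-on-fatal-failure rule for /healthz and the config/collector/transport set for /readyz come from this PR; the probeState and readinessComponents names, and which components are marked fatal (config, in the usage example), are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// probeState is a hypothetical stand-in for the HealthManager: it records
// whether each component is healthy and whether its failure is fatal.
type probeState struct {
	mu      sync.RWMutex
	healthy map[string]bool
	fatal   map[string]bool
}

func newProbeState() *probeState {
	return &probeState{healthy: map[string]bool{}, fatal: map[string]bool{}}
}

// Set records a component's current health and whether a failure is fatal.
func (p *probeState) Set(name string, healthy, fatal bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.healthy[name] = healthy
	p.fatal[name] = fatal
}

// readinessComponents mirrors the components named for /readyz.
var readinessComponents = []string{"config", "collector", "transport"}

// healthz returns 503 only when a component marked fatal is unhealthy.
func (p *probeState) healthz(w http.ResponseWriter, _ *http.Request) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	for name, fatal := range p.fatal {
		if fatal && !p.healthy[name] {
			http.Error(w, "fatal component failed: "+name, http.StatusServiceUnavailable)
			return
		}
	}
	fmt.Fprintln(w, "ok")
}

// readyz returns 200 only when config, collector, and transport are all healthy.
// A component that has never reported counts as not ready.
func (p *probeState) readyz(w http.ResponseWriter, _ *http.Request) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	for _, name := range readinessComponents {
		if !p.healthy[name] {
			http.Error(w, "not ready: "+name, http.StatusServiceUnavailable)
			return
		}
	}
	fmt.Fprintln(w, "ready")
}

func (p *probeState) mux() *http.ServeMux {
	m := http.NewServeMux()
	m.HandleFunc("/healthz", p.healthz)
	m.HandleFunc("/readyz", p.readyz)
	return m
}

// probe issues an in-process GET and returns the HTTP status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	p := newProbeState()
	p.Set("config", true, true)
	p.Set("collector", true, false)
	h := p.mux()
	// prints "200 503": nothing fatal has failed, but transport has not reported yet.
	fmt.Println(probe(h, "/healthz"), probe(h, "/readyz"))
}
```

Keeping liveness and readiness separate means a non-fatal failure (for example, the collector going down) takes the pod out of service via /readyz without causing Kubernetes to restart it.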

  • Develop a goroutine in zxporter that builds a health report from the HealthManager every 60 seconds and sends it to Dakr via the ReportHealth RPC. Handle RPC failures gracefully and log warnings as needed (following the same approach used in zxporter).

  • What Changed:
    (Describe the modifications, additions, or removals.)

  • Why This Change:
    (Explain the problem this PR addresses or the improvement it provides.)

  • Affected Components:
    (Which component does this change affect? - put x for all components)

  • Compose

  • K8s

  • Other (please specify)

❓ Motivation and Context

Why is this change required? What problem does it solve?

🔍 Types of Changes

Indicate which type of changes your code introduces (check all that apply):

  • BUGFIX: Non-breaking fix for an issue.
  • NEW FEATURE: Non-breaking addition of functionality.
  • BREAKING CHANGE: Fix or feature that causes existing functionality to not work as expected.
  • ENHANCEMENT: Improvement to existing functionality.
  • CHORE: Changes that do not affect production (e.g., documentation, build tooling, CI).

🔬 QA / Verification Steps

Describe the steps a reviewer should take to verify your changes:

  1. (Step one: e.g., "Run make test to verify all tests pass.")
  2. (Step two: e.g., "Deploy to a Kind cluster with make create-kind && make deploy.")
  3. (Additional steps as needed.)

✅ Global Checklist

Please check all boxes that apply:

  • I have read and followed the CONTRIBUTING guidelines.
  • My code follows the code style of this project.
  • I have updated the documentation as needed.
  • I have added tests that cover my changes.
  • All new and existing tests have passed locally.
  • I have run this code in a local environment to verify functionality.
  • I have considered the security implications of this change.

Antonio Nesic added 2 commits February 26, 2026 10:32
Move HealthManager creation and health server startup to before
ctrl.NewManager() so K8s probes are answered immediately, preventing
connection refused and 503 errors during slow initialization.

gitar-bot (bot) commented Feb 26, 2026

Important

Upgrade your plan to unlock code review, CI analysis, custom rules, and more.

@mutantkeyboard enabled auto-merge (squash) February 26, 2026 11:39
@mutantkeyboard merged commit cecd778 into main Feb 26, 2026
25 checks passed
@mutantkeyboard deleted the netmon_healthcheck branch February 26, 2026 17:04
Parthiba-Hazra pushed a commit that referenced this pull request May 5, 2026
* fix: start health server before manager init to prevent 503 on upgrades

Move HealthManager creation and health server startup to before
ctrl.NewManager() so K8s probes are answered immediately, preventing
connection refused and 503 errors during slow initialization.

* Netmon healthcheck

* Resync zxporter main with main branch

* Suppress golint-ci

* Lint fix


2 participants