feat(health): add configurable retry logic for health checks #8

Merged
pamungkaski merged 1 commit into main from ki/health-retry on Jan 23, 2026
Conversation

@pamungkaski
Collaborator

Add a health_check_max_failures config option so that nodes are not marked unhealthy on transient failures. A node must now fail health_check_max_failures consecutive health checks (default: 3) before it is marked unhealthy, which makes health status more resilient to temporary network issues and momentary lag spikes.

Key changes:

  • Add health_check_max_failures to Global config (default: 3)
  • Add consecutive_failures tracking to ElNodeState and ClNodeState
  • Update calculate_el_health() and calculate_cl_health() to implement retry logic
  • Reset failure counter on successful health check
  • Update all test helpers and add comprehensive retry/recovery tests

The retry logic prevents health status flapping while maintaining quick recovery when nodes become healthy again.
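The counting behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it assumes the project is written in Rust, and `GlobalConfig`, `NodeState`, `is_healthy`, and `apply_health_check` are hypothetical stand-ins for the real Global config, `ElNodeState`/`ClNodeState`, and `calculate_el_health()`/`calculate_cl_health()`. Only `health_check_max_failures`, `consecutive_failures`, and the default of 3 come from the description. It also assumes the node flips to unhealthy on the failure that reaches the threshold.

```rust
/// Hypothetical stand-in for the Global config section.
#[derive(Debug, Clone)]
pub struct GlobalConfig {
    /// Consecutive failed checks required before a node is marked unhealthy.
    pub health_check_max_failures: u32,
}

impl Default for GlobalConfig {
    fn default() -> Self {
        // Default of 3 as stated in the PR description.
        Self { health_check_max_failures: 3 }
    }
}

/// Simplified node state (assumption: the real ElNodeState and ClNodeState
/// carry more fields than shown here).
#[derive(Debug, Clone)]
pub struct NodeState {
    pub is_healthy: bool,
    pub consecutive_failures: u32,
}

impl NodeState {
    pub fn new() -> Self {
        Self { is_healthy: true, consecutive_failures: 0 }
    }
}

/// Apply the result of one health check to the node state.
/// A pass resets the counter and marks the node healthy immediately;
/// a failure only marks it unhealthy once the threshold is reached.
pub fn apply_health_check(state: &mut NodeState, check_passed: bool, cfg: &GlobalConfig) {
    if check_passed {
        state.consecutive_failures = 0;
        state.is_healthy = true;
    } else {
        state.consecutive_failures = state.consecutive_failures.saturating_add(1);
        if state.consecutive_failures >= cfg.health_check_max_failures {
            state.is_healthy = false;
        }
    }
}
```

With the default threshold, two failures followed by a success leave the node healthy with the counter back at 0, while three failures in a row mark it unhealthy. A single later success restores it, which is the "quick recovery" mentioned above.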

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@pamungkaski merged commit 50d8359 into main on Jan 23, 2026
6 checks passed

2 participants