
feat: Add retry mechanism with exponential backoff for transient GitHub API failures #1070

@myakove

Summary

Add a retry mechanism with exponential backoff for transient GitHub API failures (HTTP 500/502/503) to prevent webhook processing failures during temporary GitHub outages.

Problem / Motivation

The webhook server currently has no retry logic for transient GitHub API failures. When GitHub's API has temporary issues, every API call fails as soon as urllib3's built-in retries are exhausted. That happens very quickly, because there are only ~3 retries and no meaningful backoff.

This was observed in production on the RedHat webhook server for the mtv-api-tests repository, where multiple check_run and pull_request webhooks failed with:

HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /repos/RedHatQE/mtv-api-tests/pulls/420/requested_reviewers (Caused by ResponseError('too many 500 error responses'))

A cascade of operations failed: label management, assignee updates, reviewer requests, and Compare API calls.

Requirements

  1. Add tenacity library as a dependency for retry logic
  2. Create a utility wrapper function (e.g., github_api_call()) that wraps asyncio.to_thread() calls with retry + exponential backoff
  3. Retry ONLY on transient errors: HTTP 500, 502, and 503 responses, ConnectionError, and urllib3's MaxRetryError and ResponseError
  4. Do NOT retry on permanent errors: 401 (Unauthorized), 403 (Forbidden), 404 (Not Found), 422 (Validation)
  5. Use exponential backoff, e.g., 2s → 4s → 8s → 16s (at most ~4 retries after the initial attempt, ~30s of total wait)
  6. Log each retry attempt with warning level
  7. Replace raw asyncio.to_thread() calls across handlers with the new retry wrapper

Deliverables

  • Add tenacity to pyproject.toml dependencies
  • Create retry utility in webhook_server/utils/github_retry.py
  • Replace asyncio.to_thread() calls in webhook_server/libs/github_api.py with retry wrapper
  • Replace asyncio.to_thread() calls in handler files (labels_handler.py, pull_request_handler.py, issue_comment_handler.py, owners_files_handler.py, check_run_handler.py, pull_request_review_handler.py, runner_handler.py) with retry wrapper
  • Add unit tests for the retry utility
  • Ensure all existing tests pass
  • Verify mypy type checking passes
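When call sites are swapped over, the wrapper can only retry HTTP requests that actually run inside the callable it is given. As far as we know, PyGithub's `get_*` list methods (e.g. `get_labels()`) return a lazy `PaginatedList` that requests pages only when iterated. Wrapping the bare method would therefore let a 5xx escape later, outside any retry. The sketch below shows the difference using a hypothetical, network-free `FakePullRequest` stand-in and plain `asyncio.to_thread()`. `fetch_lazily` and `fetch_eagerly` are illustrative names only.

```python
from __future__ import annotations

import asyncio
from typing import Iterator


class FakePullRequest:
    """Hypothetical stand-in: get_labels() is lazy, like a PaginatedList."""

    def __init__(self, fail: bool) -> None:
        self.fail = fail
        self.requests_made = 0

    def get_labels(self) -> Iterator[str]:
        def fetch_pages() -> Iterator[str]:
            # Runs only when iterated: this is where the "HTTP request" happens.
            self.requests_made += 1
            if self.fail:
                raise ConnectionError("simulated 502 from api.github.com")
            yield from ("bug", "lgtm")

        return fetch_pages()


async def fetch_lazily(pr: FakePullRequest) -> Iterator[str]:
    # Returns immediately: no request ran in the thread, so nothing to retry.
    return await asyncio.to_thread(pr.get_labels)


async def fetch_eagerly(pr: FakePullRequest) -> list[str]:
    # The request (and any failure) happens inside the thread-wrapped callable.
    return await asyncio.to_thread(lambda: list(pr.get_labels()))


if __name__ == "__main__":
    pr = FakePullRequest(fail=True)
    asyncio.run(fetch_lazily(pr))
    print("requests after lazy call:", pr.requests_made)  # 0: failure deferred
    try:
        asyncio.run(fetch_eagerly(pr))
    except ConnectionError as exc:
        print("raised inside wrapped call:", exc)
```

With the retry wrapper, the eager form would read `await github_api_call(lambda: list(pull_request.get_labels()))`.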
