Skip to content

feat: add retry with exponential backoff for GitHub API calls#1071

Merged
myakove merged 1 commit intomainfrom
feat/issue-1070-github-api-retry
Apr 14, 2026
Merged

feat: add retry with exponential backoff for GitHub API calls#1071
myakove merged 1 commit intomainfrom
feat/issue-1070-github-api-retry

Conversation

@myakove
Copy link
Copy Markdown
Collaborator

@myakove myakove commented Apr 14, 2026

Summary

Replace all asyncio.to_thread() GitHub API calls with github_api_call() wrapper that retries on transient HTTP 500/502/503/504 errors with exponential backoff (2s, 4s, 8s, 16s — max 4 retries).

Problem

When GitHub's API experiences temporary issues (HTTP 500 errors), all API calls fail immediately after urllib3's built-in retries exhaust. This was observed in production on the RedHat webhook server for mtv-api-tests where multiple webhooks failed with too many 500 error responses.

Solution

  • New github_api_call() function in webhook_server/utils/github_retry.py — drop-in replacement for asyncio.to_thread() with retry logic
  • Retries on: HTTP 500/502/503/504, ConnectionError, MaxRetryError
  • No retry on: 401/403/404/422 (permanent errors)
  • All 146 asyncio.to_thread() GitHub API calls replaced across 9 handler files
  • GraphQL (httpx) path in get_unresolved_review_threads also gets retry logic
  • Non-GitHub operations (e.g., shutil.rmtree) remain as asyncio.to_thread()
  • CLAUDE.md updated to enforce github_api_call() over raw asyncio.to_thread()

Testing

  • 36 new unit tests for retry utility (100% coverage on github_retry.py)
  • All 1529 tests pass, 90.72% overall coverage
  • E2E tested against myk-org/for-testing-only — PR created, labels/reviewers/check runs all processed correctly with zero errors

Also included

  • .dev/local-server-with-smee.sh — proper script with smee background process and cleanup trap
  • .dev/docker-compose.yaml — added develop.watch for docker compose --watch support

Closes #1070

Summary by CodeRabbit

  • New Features

    • Centralized, retry-enabled GitHub API call handling with exponential backoff and consistent logging, improving reliability for transient API/network failures.
  • Documentation

    • Updated guidance to standardize usage of the retry-enabled API wrapper and to include logging/context examples.
  • Tests

    • Added tests covering retry rules, backoff timing, argument forwarding, and retry-exhaustion behavior.

@myakove-bot
Copy link
Copy Markdown
Collaborator

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: Disabled for this repository
  • Pre-commit Checks: pre-commit runs automatically if .pre-commit-config.yaml exists
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: All label categories are enabled (default configuration)

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /automerge - Enable automatic merging when all requirements are met (maintainers and approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest build-container - Rebuild and test container image
  • /retest python-module-install - Test Python package installation
  • /retest pre-commit - Run pre-commit hooks and checks
  • /retest conventional-title - Validate commit message format
  • /retest all - Run all available tests

Container Operations

  • /build-and-push-container - Build and push container image (tagged with PR number)
    • Supports additional build arguments: /build-and-push-container --build-arg KEY=value

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. LGTM Count: Minimum 1 /lgtm from reviewers
  3. Status Checks: All required status checks must pass
  4. No Blockers: No wip, hold, has-conflicts labels and PR must be mergeable (no conflicts)
  5. Verified: PR must be marked as verified

📊 Review Process

Approvers and Reviewers

Approvers:

  • myakove
  • rnetser

Reviewers:

  • myakove
  • rnetser
Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve
  • automerge
AI Features
  • Conventional Title: Mode: fix (claude/claude-opus-4-6[1m])
  • Cherry-Pick Conflict Resolution: Enabled (claude/claude-opus-4-6[1m])
  • Test Oracle: Triggers: approved (claude/claude-opus-4-6[1m]); /test-oracle can be used anytime

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is removed on new commits unless the push is detected as a clean rebase
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Container Builds: Container images are automatically tagged with the PR number
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9b1b3030-5367-4a2c-998a-b306d3853d4a

📥 Commits

Reviewing files that changed from the base of the PR and between c6feea9 and 94f2b86.

📒 Files selected for processing (15)
  • CLAUDE.md
  • webhook_server/libs/github_api.py
  • webhook_server/libs/handlers/check_run_handler.py
  • webhook_server/libs/handlers/issue_comment_handler.py
  • webhook_server/libs/handlers/labels_handler.py
  • webhook_server/libs/handlers/owners_files_handler.py
  • webhook_server/libs/handlers/pull_request_handler.py
  • webhook_server/libs/handlers/push_handler.py
  • webhook_server/libs/handlers/runner_handler.py
  • webhook_server/libs/test_oracle.py
  • webhook_server/tests/test_clean_rebase_detection.py
  • webhook_server/tests/test_github_retry.py
  • webhook_server/tests/test_runner_handler.py
  • webhook_server/tests/test_test_oracle.py
  • webhook_server/utils/github_retry.py

Walkthrough

Centralizes PyGithub blocking-call handling by adding an async retry wrapper github_api_call() and replacing direct asyncio.to_thread(...) usages across the webhook server. Adds retry/backoff classification for transient GitHub/transport errors, updates handlers to pass logging context, and adds tests and documentation reflecting the new pattern.

Changes

Cohort / File(s) Summary
New Retry Utility
webhook_server/utils/github_retry.py
Adds _is_retryable(ex: Exception) -> bool and github_api_call[T](func, *args, logger, log_prefix, **kwargs) -> T, which offloads sync PyGithub calls via asyncio.to_thread() and retries on transient errors (HTTP 500/502/503/504 and transport errors) with exponential backoff and warning logs.
GitHub API Layer
webhook_server/libs/github_api.py
Replaced many asyncio.to_thread(...) calls with await github_api_call(..., logger=self.logger, log_prefix=self.log_prefix) for rate-limit checks, repository/PR fields, paginated iteration, and head-SHA resolution; converted a GraphQL POST+raise_for_status flow to a retry loop with exponential backoff.
Handlers (bulk updates)
webhook_server/libs/handlers/check_run_handler.py, .../issue_comment_handler.py, .../labels_handler.py, .../owners_files_handler.py, .../pull_request_handler.py, .../push_handler.py, .../runner_handler.py
Uniformly migrated synchronous PyGithub interactions (property reads, list/iteration, create/edit/comment/label/assignee/review operations) from asyncio.to_thread(...) to github_api_call(..., logger=..., log_prefix=...). Preserved control flow and exception semantics, added logging context forwarding.
Test Oracle
webhook_server/libs/test_oracle.py
Replaced two asyncio.to_thread(...) calls (PR comment, pull_request.html_url) with github_api_call(...) including logger/log_prefix.
Tests
webhook_server/tests/test_github_retry.py, plus small updates in test_clean_rebase_detection.py, test_test_oracle.py, test_runner_handler.py
Adds comprehensive tests for _is_retryable() and github_api_call() (retry behavior, backoff, non-retryable cases, CancelledError propagation, arg forwarding, logging). Updated a few test comments and expectations to reflect new boolean return for is_branch_exists and github_api_call usage.
Documentation
CLAUDE.md
Rewrote guidance to require github_api_call() for all method calls, potentially-blocking property accesses (including lambda reads), paginated iteration, and concurrent calls; updated examples to pass logger/log_prefix.

Sequence Diagram

sequenceDiagram
    participant Handler as Handler (e.g., PullRequestHandler)
    participant Wrapper as github_api_call()
    participant Thread as asyncio.to_thread()
    participant PyGithub as PyGithub (sync)
    participant GitHub as GitHub Server

    Handler->>Wrapper: await github_api_call(func, *args, logger, log_prefix)
    Wrapper->>Thread: offload func to thread
    Thread->>PyGithub: execute synchronous call
    PyGithub->>GitHub: HTTP request
    alt success
        GitHub-->>PyGithub: 200 OK
        PyGithub-->>Thread: return result
        Thread-->>Wrapper: result
        Wrapper-->>Handler: return result
    else transient 5xx / connection error
        GitHub-->>PyGithub: 5xx / connection fail
        PyGithub-->>Thread: raise exception
        Thread-->>Wrapper: exception
        Wrapper->>Wrapper: _is_retryable -> true
        Wrapper->>Wrapper: log warning (attempt N/M)
        Wrapper->>Wrapper: await asyncio.sleep(exponential_delay)
        Wrapper->>Thread: retry offload
    else non-retryable 4xx / auth / validation
        GitHub-->>PyGithub: 4xx
        PyGithub-->>Thread: raise exception
        Thread-->>Wrapper: exception
        Wrapper->>Wrapper: _is_retryable -> false
        Wrapper-->>Handler: re-raise immediately
    end
    alt retries exhausted
        Wrapper-->>Handler: re-raise last exception
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • rnetser
  • dbasunag
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 29.75% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: introducing retry with exponential backoff for GitHub API calls, which is the primary objective of this PR.
Linked Issues check ✅ Passed The PR comprehensively addresses all coding requirements from issue #1070: implements github_api_call() wrapper with retry logic, retries on transient errors (500/502/503/504, ConnectionError, MaxRetryError), skips permanent errors (401/403/404/422), uses exponential backoff, replaces 146 asyncio.to_thread() calls across 9 handlers, adds comprehensive unit tests with 100% coverage, and includes documentation updates.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the retry-mechanism implementation: github_api_call() wrapper, handler replacements, documentation, tests, and supporting infrastructure (.dev files for local development). No unrelated refactoring or feature additions detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/issue-1070-github-api-retry

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
webhook_server/libs/handlers/runner_handler.py (1)

1073-1088: ⚠️ Potential issue | 🔴 Critical

CRITICAL: create_issue_comment() is called with wrong argument count.

pull_request.create_issue_comment accepts a single body parameter. This call passes two positional strings, so when gh pr create fails, the fallback comment will raise TypeError instead of posting recovery instructions. PyGithub's signature is create_issue_comment(body: str).

Proposed fix
-                    await github_api_call(
-                        pull_request.create_issue_comment,
-                        "**Cherry-pick branch created, but PR creation failed**\n",
-                        f"Branch `{new_branch_name}` was pushed to the repository.\n"
-                        f"Create the PR manually:\n"
-                        "```\n"
-                        f"gh pr create --repo {repo_full_name}"
-                        f" --base {target_branch}"
-                        f" --head {new_branch_name}"
-                        f" --label {cherry_picked_label}"
-                        + (f" --label {AI_RESOLVED_CONFLICTS_LABEL}" if cherry_pick_had_conflicts else "")
-                        + f" --title '{pr_title}'"
-                        f" --body '{pr_body}'\n"
-                        "```",
-                        logger=self.logger,
-                        log_prefix=self.log_prefix,
-                    )
+                    body = (
+                        "**Cherry-pick branch created, but PR creation failed**\n"
+                        f"Branch `{new_branch_name}` was pushed to the repository.\n"
+                        f"Create the PR manually:\n"
+                        "```\n"
+                        f"gh pr create --repo {repo_full_name}"
+                        f" --base {target_branch}"
+                        f" --head {new_branch_name}"
+                        f" --label {cherry_picked_label}"
+                        + (f" --label {AI_RESOLVED_CONFLICTS_LABEL}" if cherry_pick_had_conflicts else "")
+                        + f" --title '{pr_title}'"
+                        f" --body '{pr_body}'\n"
+                        "```"
+                    )
+                    await github_api_call(
+                        pull_request.create_issue_comment,
+                        body,
+                        logger=self.logger,
+                        log_prefix=self.log_prefix,
+                    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@webhook_server/libs/handlers/runner_handler.py` around lines 1073 - 1088, The
call to pull_request.create_issue_comment is passing multiple positional
strings, causing a TypeError because create_issue_comment(body: str) expects a
single string; fix this in the github_api_call invocation by concatenating the
message into one string (e.g., assemble a single variable like body using
new_branch_name, repo_full_name, target_branch, cherry_picked_label,
AI_RESOLVED_CONFLICTS_LABEL, pr_title, pr_body, and cherry_pick_had_conflicts)
and then call await github_api_call(pull_request.create_issue_comment, body,
logger=self.logger, log_prefix=self.log_prefix).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.dev/docker-compose.yaml:
- Line 15: Replace the overly-broad "privileged: true" setting with explicit
minimal capabilities and device/security options: remove privileged: true and
add cap_add entries (SYS_ADMIN, MKNOD, NET_ADMIN, SETUID, SETGID), devices entry
(/dev/fuse) and security_opt entries (label=disable, seccomp=unconfined);
optionally make privileged togglable by using an environment variable
(PRIVILEGED_MODE) and keeping the same cap_add/devices/security_opt when
enabled; apply the same change to both the .dev/docker-compose.yaml and
examples/docker-compose.yaml files, targeting the "privileged" key in each
compose service block.
- Around line 11-14: Disable unauthenticated endpoints by default and restrict
binding: set ENABLE_LOG_SERVER and ENABLE_MCP_SERVER to false in the
docker-compose service and change the port mapping from "19876:5000" to bind
only localhost (e.g., "127.0.0.1:19876:5000") or remove the mapping entirely so
the log/MCP server is not exposed on 0.0.0.0; update references to
ENABLE_LOG_SERVER and ENABLE_MCP_SERVER and the "19876:5000" mapping
accordingly.

In @.dev/local-server-with-smee.sh:
- Around line 8-9: The cleanup() trap must guard against unset or stale SMEE_PID
to avoid failures; update cleanup() to first check that the SMEE_PID variable is
non-empty and a valid positive integer and that a process with that PID exists
(e.g., test with kill -0 "$SMEE_PID" or pid existence check) before invoking
kill, and ensure any kill errors are suppressed/handled so the EXIT trap is
idempotent; reference the cleanup() function and SMEE_PID variable when applying
this change.
- Around line 2-6: The startup script must source the env file relative to the
script location and validate required env vars before starting processes:
compute the script directory and source the .env from that directory (instead of
relying on the current working directory), then check SMEE_URL, SERVER_PORT, and
DOCKER_COMPOSE_FILE are set and non-empty (exit with a clear error message if
any are missing). Also harden the cleanup/trap logic to only kill SMEE_PID if it
is set and a valid numeric PID (or check the process exists) to avoid killing an
empty/invalid PID. Update references in the script to use the validated
variables going forward.

In `@webhook_server/libs/github_api.py`:
- Around line 1182-1183: _get_last_commit currently consumes the PaginatedList
returned by pull_request.get_commits outside the retryable github_api_call
boundary; change the call so the pagination is iterated inside github_api_call
by passing a lambda that converts the generator to a list (e.g.,
github_api_call(lambda: list(pull_request.get_commits()), logger=self.logger,
log_prefix=self.log_prefix)), then return the last element from that returned
list; update the call site in _get_last_commit to use this pattern so iteration
is inside github_api_call.

In `@webhook_server/libs/handlers/check_run_handler.py`:
- Around line 424-426: The call to github_api_call passes pull_request.base.ref
directly, causing evaluation before the wrapper; change the argument to be a
zero-arg lambda that returns pull_request.base.ref (i.e.,
github_api_call(lambda: pull_request.base.ref, self.repository.get_branch, ...))
so the property is accessed inside github_api_call; keep the same target
function (self.repository.get_branch) and the existing logger/log_prefix
parameters (self.logger, self.log_prefix) intact.
- Around line 265-267: The loop over the PaginatedList returned by
github_api_call(self.github_webhook.last_commit.get_check_runs, ...) causes lazy
page fetches on the event loop; change the call so the PaginatedList is fully
materialized inside the thread pool (e.g., invoke github_api_call with a lambda
that calls list(self.github_webhook.last_commit.get_check_runs()) or otherwise
wraps the method call in list(...)) so the returned object is a concrete list
and the subsequent for run in ... iteration does not block the event loop.

In `@webhook_server/libs/handlers/issue_comment_handler.py`:
- Around line 412-414: The current call materializes the PaginatedList outside
the retry wrapper (repo_contributors = list(await
github_api_call(self.repository.get_contributors, ...))), which can block the
event loop; change it so github_api_call performs the iteration by passing a
lambda that converts the PaginatedList to a list inside the retry wrapper (e.g.,
call github_api_call(lambda: list(self.repository.get_contributors()),
logger=self.logger, log_prefix=self.log_prefix)) so that `repo_contributors` is
assigned from the awaited result of that wrapped call; keep references to
github_api_call, self.repository.get_contributors, logger and log_prefix
unchanged.

In `@webhook_server/libs/handlers/labels_handler.py`:
- Around line 115-116: The current code calls
github_api_call(pull_request.get_labels, ...) and then iterates over the
returned PaginatedList outside github_api_call; change the call to materialize
the PaginatedList inside github_api_call by passing a lambda that returns
list(pull_request.get_labels()), e.g. github_api_call(lambda:
list(pull_request.get_labels()), logger=self.logger,
log_prefix=self.log_prefix), then iterate over that list to build and return
[lb.name for lb in labels]; ensure references to github_api_call and
pull_request.get_labels are used as shown.

In `@webhook_server/libs/handlers/owners_files_handler.py`:
- Around line 496-498: The current iteration over the PaginatedList returned by
pull_request.get_issue_comments occurs on the event loop; wrap the iteration
inside the worker thread by calling github_api_call with a lambda that
materializes the list (e.g., lambda: list(pull_request.get_issue_comments(...)))
so the full PaginatedList is fetched inside github_api_call; update the call
site where you currently pass pull_request.get_issue_comments to instead pass a
lambda that returns list(pull_request.get_issue_comments(...)) and then iterate
the returned concrete list.

In `@webhook_server/libs/handlers/pull_request_handler.py`:
- Around line 1559-1564: The code currently treats any falsy
pull_request.mergeable as a failure; change it to only fail when mergeable is
explicitly False by reusing the polling behavior in
label_pull_request_by_merge_state(): call github_api_call to fetch
pull_request.mergeable (as before) but if the result is None, poll/retry (with
the same delay/count/timeout logic used by label_pull_request_by_merge_state())
until mergeable is True or False, and only append the failure message when the
final value is False; keep using github_api_call and reference
pull_request.mergeable and label_pull_request_by_merge_state to locate and align
the retry logic.
- Around line 943-948: The PR comment creation is awaited inline inside the
tag-deletion critical path (calls to github_api_call with
pull_request.create_issue_comment), which can cause the delete functions
(_delete_registry_tag_via_regctl and _delete_ghcr_tag_via_github_api) to surface
failures from comment posting; change this to fire-and-forget: after the delete
completes, spawn a detached async task/future to call
github_api_call(pull_request.create_issue_comment, ...) and log any errors
inside that task without re-raising, so the main delete flow only fails on
actual delete errors and not on notification failures; use
self.logger/self.log_prefix for internal task logging and ensure the original
awaited delete call remains unchanged.
- Around line 224-226: The create calls (e.g.,
github_api_call(pull_request.create_issue_comment, ...)) are non-idempotent and
retries can duplicate comments; make these writes idempotent by adding a
deterministic marker/nonce to the comment body (e.g., "// marker:
welcome-<pr-number>" or similar) and change the flow to first search existing
comments via pull_request.get_issue_comments (or the corresponding list call)
for that marker and skip create if found, otherwise call github_api_call to
create; alternatively implement an upsert by creating then editing the comment
if a matching marker exists — apply this pattern to all affected operations
(create_issue_comment and the other create-style writes noted in the review) so
retries do not produce duplicates.

In `@webhook_server/tests/test_github_retry.py`:
- Around line 99-288: Patch asyncio.to_thread in these tests to avoid real
thread offloading: add a patch for
"webhook_server.utils.github_retry.asyncio.to_thread" (new_callable=AsyncMock)
to each test decorator or as a class-level patch so each test gets an extra mock
parameter (e.g., mock_to_thread), and configure mock_to_thread.side_effect =
lambda fn, *a, **kw: fn(*a, **kw) so github_api_call (which delegates to
asyncio.to_thread) runs synchronously and deterministic for backoff/assertions.

In `@webhook_server/utils/github_retry.py`:
- Around line 96-98: The retry wrapper currently allows re-executing mutating
GitHub calls (e.g., create_issue_comment, create_issue, create_review_request,
create_check_run, merge) which can duplicate side effects; modify the retry
logic in webhook_server/utils/github_retry.py to make write retries explicit by
splitting policies into read vs write (e.g., add two decorators or functions
like retry_read_only and retry_mutating) and change the default wrapper to use
retry_read_only; require an explicit opt-in flag or different callsite wrapper
(e.g., retry_mutating(retries=..., force_opt_in=True)) for mutating functions so
callers of
create_issue_comment/create_issue/create_review_request/create_check_run/merge
must opt-in to retry behavior.
- Around line 46-62: The retry classifier _is_retryable currently skips explicit
handling of urllib3.exceptions.ResponseError; import ResponseError and update
_is_retryable to include an isinstance(ex, ResponseError) check alongside
RequestsConnectionError and MaxRetryError so ResponseError is treated as
retryable (follow the same pattern used for MaxRetryError in the existing
block).

---

Outside diff comments:
In `@webhook_server/libs/handlers/runner_handler.py`:
- Around line 1073-1088: The call to pull_request.create_issue_comment is
passing multiple positional strings, causing a TypeError because
create_issue_comment(body: str) expects a single string; fix this in the
github_api_call invocation by concatenating the message into one string (e.g.,
assemble a single variable like body using new_branch_name, repo_full_name,
target_branch, cherry_picked_label, AI_RESOLVED_CONFLICTS_LABEL, pr_title,
pr_body, and cherry_pick_had_conflicts) and then call await
github_api_call(pull_request.create_issue_comment, body, logger=self.logger,
log_prefix=self.log_prefix).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: cae89f37-bd88-4ca6-a359-fbe5e3002215

📥 Commits

Reviewing files that changed from the base of the PR and between ebba749 and a0cae1c.

📒 Files selected for processing (16)
  • .dev/docker-compose.yaml
  • .dev/local-server-with-smee.sh
  • CLAUDE.md
  • webhook_server/libs/github_api.py
  • webhook_server/libs/handlers/check_run_handler.py
  • webhook_server/libs/handlers/issue_comment_handler.py
  • webhook_server/libs/handlers/labels_handler.py
  • webhook_server/libs/handlers/owners_files_handler.py
  • webhook_server/libs/handlers/pull_request_handler.py
  • webhook_server/libs/handlers/push_handler.py
  • webhook_server/libs/handlers/runner_handler.py
  • webhook_server/libs/test_oracle.py
  • webhook_server/tests/test_clean_rebase_detection.py
  • webhook_server/tests/test_github_retry.py
  • webhook_server/tests/test_test_oracle.py
  • webhook_server/utils/github_retry.py

Comment thread .dev/docker-compose.yaml Outdated
Comment thread .dev/docker-compose.yaml Outdated
Comment thread .dev/local-server-with-smee.sh Outdated
Comment thread .dev/local-server-with-smee.sh Outdated
Comment thread webhook_server/libs/github_api.py Outdated
Comment thread webhook_server/libs/handlers/pull_request_handler.py Outdated
Comment thread webhook_server/libs/handlers/pull_request_handler.py
Comment thread webhook_server/tests/test_github_retry.py
Comment thread webhook_server/utils/github_retry.py
Comment thread webhook_server/utils/github_retry.py
@myakove
Copy link
Copy Markdown
Collaborator Author

myakove commented Apr 14, 2026

@coderabbitai[bot]

The following review comments were reviewed and a decision was made:

webhook_server/libs/handlers/runner_handler.py:1073 (outside diff) — CRITICAL: create_issue_comment() is called with wrong argument count.

Addressed: Fixed: combined the two positional string arguments into a single body variable before passing to create_issue_comment().

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

Tip

For best results, initiate chat on the files or code changes.

@myakove Thanks for the fix! Combining the two positional string arguments into a single body variable before passing to create_issue_comment() is the correct approach — create_issue_comment() accepts exactly one positional argument (body: str), so the previous two-arg call would have raised a TypeError at runtime.


🧠 Learnings used
Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 1016
File: webhook_server/libs/ai_cli.py:9-11
Timestamp: 2026-03-03T13:59:54.737Z
Learning: In webhook_server/**/*.py, violations of the contextual logging requirement (using module-level `get_logger(name=...)` instead of `get_logger_with_params(repository=..., hook_id=..., component=...)`) must always be flagged as CRITICAL severity and marked as must-fix. These are blocking issues that lose repository/hook correlation in logs.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 778
File: webhook_server/libs/pull_request_handler.py:327-330
Timestamp: 2025-05-13T12:06:27.297Z
Learning: In the GitHub webhook server, synchronous GitHub API calls (like create_issue_comment, add_to_assignees, etc.) in async methods should be awaited using asyncio.to_thread or loop.run_in_executor to prevent blocking the event loop.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 778
File: webhook_server/libs/pull_request_handler.py:327-330
Timestamp: 2025-05-13T12:06:27.297Z
Learning: In the GitHub webhook server, synchronous GitHub API calls (like create_issue_comment, add_to_assignees, etc.) in async methods should be awaited using asyncio.to_thread or loop.run_in_executor to prevent blocking the event loop.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 1025
File: webhook_server/libs/handlers/runner_handler.py:1031-1033
Timestamp: 2026-03-12T13:44:36.341Z
Learning: In webhook_server/libs/handlers/runner_handler.py and other handlers, PyGithub properties such as `.number`, `.title`, and `.body` on PullRequest objects are safe to access directly in async code without asyncio.to_thread(), as they are cached from the webhook payload or from a previously fetched object (e.g., via get_pull()). Per CLAUDE.md: 'Safe operations (don't block): Properties from webhook payload (.number, .title, .body)'. Only actual API method calls (e.g., get_pull(), create_issue_comment(), add_to_labels()) need to be wrapped in asyncio.to_thread().

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 1002
File: webhook_server/tests/test_pull_request_review_handler.py:298-344
Timestamp: 2026-02-28T12:05:25.906Z
Learning: In the myk-org/github-webhook-server repository's test-oracle integration (webhook_server/libs/test_oracle.py and webhook_server/libs/handlers/*.py), the 'approved' trigger refers to the `/approve` **command** (user-initiated comment command), NOT GitHub's review approval state. GitHub review approvals without the `/approve` command should NOT trigger the test oracle. This distinction is critical: review state "approved" alone does not trigger oracle; only the explicit `/approve` command does.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 586
File: webhook_server_container/libs/github_api.py:1947-1956
Timestamp: 2024-10-08T09:19:56.185Z
Learning: In `webhook_server_container/libs/github_api.py`, the indentation style used in the `set_pull_request_automerge` method is acceptable as per the project's coding standards.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 612
File: webhook_server_container/libs/github_api.py:925-926
Timestamp: 2024-10-29T10:42:50.163Z
Learning: In `webhook_server_container/libs/github_api.py`, the method `self._keep_approved_by_approvers_after_rebase()` must be called after removing labels when synchronizing a pull request. Therefore, it should be placed outside the `ThreadPoolExecutor` to ensure it runs sequentially after label removal.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 612
File: webhook_server_container/libs/github_api.py:2089-2100
Timestamp: 2024-10-29T08:09:57.157Z
Learning: In `webhook_server_container/libs/github_api.py`, when the function `_keep_approved_by_approvers_after_rebase` is called, existing approval labels have already been cleared after pushing new changes, so there's no need to check for existing approvals within this function.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 1025
File: webhook_server/libs/handlers/runner_handler.py:952-977
Timestamp: 2026-03-12T13:44:41.853Z
Learning: In myk-org/github-webhook-server (webhook_server/**/*.py), PyGithub properties `.number`, `.title`, and `.body` are cached from the webhook payload and do NOT trigger blocking API calls. Therefore, these properties do NOT need to be wrapped with `asyncio.to_thread()` in async methods. Only PyGithub properties or methods that make live API calls (e.g., `.user.login`, `.base.ref`, `get_pull()`, `create_issue_comment()`, etc.) require wrapping. This is documented in CLAUDE.md as "Safe operations: Properties from webhook payload (.number, .title, .body)".

If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

@myakove
Copy link
Copy Markdown
Collaborator Author

myakove commented Apr 14, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
webhook_server/libs/handlers/issue_comment_handler.py (1)

100-127: ⚠️ Potential issue | 🔴 Critical

CRITICAL: CancelledError is silently swallowed in the command fan-out.

In Python, asyncio.CancelledError is a BaseException, not an Exception. When asyncio.gather(..., return_exceptions=True) collects results from cancelled tasks, it returns CancelledError objects. The current code checks isinstance(result, Exception) first, which fails for CancelledError since it's not a subclass of Exception. This means cancelled command tasks are treated as success and the handler step completes normally instead of propagating the cancellation signal—breaking handler cancellation semantics.

Fix
                 failed_commands: list[tuple[str, Exception]] = []
                 for idx, result in enumerate(results):
                     user_command = _user_commands[idx]
+                    if isinstance(result, asyncio.CancelledError):
+                        raise result
                     if isinstance(result, Exception):
-                        # Re-raise CancelledError immediately to allow cancellation to propagate
-                        if isinstance(result, asyncio.CancelledError):
-                            raise result
                         self.logger.error(f"{self.log_prefix} Command execution failed: /{user_command} - {result}")
                         failed_commands.append((user_command, result))

Check for CancelledError before checking for other Exception types to ensure cancellation propagates correctly. Per coding guidelines: "Always re-raise asyncio.CancelledError in except blocks to preserve cancellation semantics".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@webhook_server/libs/handlers/issue_comment_handler.py` around lines 100 -
127, The code that processes results from asyncio.gather is incorrectly treating
asyncio.CancelledError as a successful result because it first checks
isinstance(result, Exception); change the result-handling logic in the block
that iterates over results (the for loop using results, _user_commands, and idx)
to first detect and re-raise asyncio.CancelledError (or isinstance(result,
asyncio.CancelledError)) before handling other Exception types so that
cancellations from tasks created via
asyncio.create_task(self.user_commands(...)) propagate correctly; ensure this
check occurs prior to any Exception-based handling of failures.
♻️ Duplicate comments (2)
webhook_server/libs/handlers/pull_request_handler.py (1)

1171-1178: ⚠️ Potential issue | 🟠 Major

HIGH: pull_request.user.login is still evaluated before github_api_call() starts.

Python evaluates call arguments first, so both sites can still block the event loop and bypass retry entirely. In add_pull_request_owner_as_assingee(), a transient failure resolving .user.login will also send control into the fallback-assignee path even though the assignee lookup was the part that failed. Fetch the login first, e.g. assignee_login = await github_api_call(lambda: pull_request.user.login, logger=self.logger, log_prefix=self.log_prefix), and then reuse assignee_login in the wrapped call.

Read-only verification
#!/bin/bash
set -euo pipefail
rg -n -C2 'pull_request\.user\.login' webhook_server/libs/handlers/pull_request_handler.py
fd -a 'CLAUDE.md' | xargs -r rg -n 'user\.login|base\.ref|Safe operations'

Based on learnings, "Only PyGithub properties or methods that perform live API calls (e.g., .user.login, .base.ref, get_pull(), create_issue_comment(), etc.) should be wrapped to avoid blocking."

Also applies to: 1497-1503

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@webhook_server/libs/handlers/pull_request_handler.py` around lines 1171 -
1178, The call currently passes pull_request.user.login as a positional argument
to github_api_call which evaluates the property before github_api_call runs;
change this by first resolving the owner login with an awaited github_api_call
wrapper (e.g., assignee_login = await github_api_call(lambda:
pull_request.user.login, logger=self.logger, log_prefix=self.log_prefix)) and
then call github_api_call(self.repository.create_issue, ...,
assignee=assignee_login, ...); update the method containing this code (e.g.,
add_pull_request_owner_as_assingee) and apply the same pattern to the other
occurrences noted (around the block at the later occurrence that references
pull_request.user.login) so any PyGithub property access is wrapped and awaited
before being passed into github_api_call.
webhook_server/libs/github_api.py (1)

1025-1029: ⚠️ Potential issue | 🟠 Major

HIGH: Materialize get_pulls() inside github_api_call().

PyGithub documents Commit.get_pulls() as returning PaginatedList[PullRequest], and PaginatedList is consumed through collection operations like indexing. _pulls[0] therefore still happens outside the retry/non-blocking boundary here, which reintroduces the same pagination bug you already fixed for get_commits(). (pygithub.readthedocs.io)

Proposed fix
         commit_obj = await github_api_call(
             self.repository.get_commit, commit["sha"], logger=self.logger, log_prefix=self.log_prefix
         )
         with contextlib.suppress(Exception):
-            _pulls = await github_api_call(commit_obj.get_pulls, logger=self.logger, log_prefix=self.log_prefix)
-            if _pulls:
-                self.logger.debug(f"{self.log_prefix} Found PR from commit SHA: {_pulls[0].number}")
-                return _pulls[0]
+            pulls = await github_api_call(
+                lambda: list(commit_obj.get_pulls()),
+                logger=self.logger,
+                log_prefix=self.log_prefix,
+            )
+            if pulls:
+                self.logger.debug(f"{self.log_prefix} Found PR from commit SHA: {pulls[0].number}")
+                return pulls[0]

As per coding guidelines, "Wrap PaginatedList iteration with github_api_call(lambda: list(...), ...) to prevent blocking the event loop."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@webhook_server/libs/github_api.py` around lines 1025 - 1029, The call to
commit_obj.get_pulls is returning a PaginatedList and the indexing _pulls[0]
happens outside the retry/non-blocking wrapper; change the github_api_call
invocation to materialize the list within the wrapper (e.g., call
github_api_call(lambda: list(commit_obj.get_pulls()), ...)) so the pagination
iteration happens inside github_api_call, then use the returned list to check
and log the first PR (the symbols to update are commit_obj.get_pulls,
github_api_call, the local _pulls variable and the self.logger.debug(...) that
references _pulls[0]).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@webhook_server/libs/handlers/owners_files_handler.py`:
- Around line 209-211: The code is using github_api_call for local filesystem
operations (e.g., reading a file via content_path.read_text and enumerating with
rglob) which misclassifies disk I/O as GitHub API failures; replace those
github_api_call usages in owners_files_handler.py with plain thread-offloading
using asyncio.to_thread: call await asyncio.to_thread(content_path.read_text,
encoding="utf-8") for reading file contents and use await
asyncio.to_thread(lambda: list(content_root.rglob(...))) or await
asyncio.to_thread(list, content_root.rglob(...)) for recursive file listing,
remove any retry/logging behavior tied to github_api_call for these paths, and
keep using self.logger/self.log_prefix for local-file error logging where
appropriate.

In `@webhook_server/libs/handlers/pull_request_handler.py`:
- Around line 1023-1033: The exception handler logs "Tag cleanup succeeded"
which contradicts reality—this branch runs when registry login/delete failed—so
update the message in the except branch (around github_api_call invoking
pull_request.create_issue_comment) to accurately state that tag cleanup failed
and the PR notification failed (e.g., use "Tag cleanup failed and PR
notification failed: {repository_full_tag}") and include self.log_prefix and the
caught exception context via self.logger.exception so the log reflects the
failure and includes error details.
- Around line 1766-1768: Remove the redundant truthiness guard around the
pull_request parameter: delete the "if pull_request and" short-circuit and
directly call github_api_call(lambda: pull_request.is_merged(), ...) so a
missing/None pull_request fails fast; alternatively add an explicit check that
raises a ValueError (e.g., if pull_request is None: raise
ValueError("pull_request is required")) before calling github_api_call. This
touches the usage of the pull_request parameter and the github_api_call(lambda:
pull_request.is_merged(), logger=self.logger, log_prefix=self.log_prefix)
invocation in the pull request handler.

In `@webhook_server/libs/handlers/runner_handler.py`:
- Around line 724-725: Change is_branch_exists to return a bool and handle 404
instead of propagating exceptions: update the signature of
is_branch_exists(self, branch: str) -> bool, import GithubException from
github.GithubException, call github_api_call(self.repository.get_branch, branch,
...) inside a try/except, return True on success, catch GithubException and if
ex.status == 404 return False, otherwise re-raise the exception; keep logger and
log_prefix parameters when calling github_api_call.

In `@webhook_server/tests/test_github_retry.py`:
- Around line 73-83: Add unit tests that explicitly exercise
urllib3.exceptions.ResponseError so the explicit type-check branch in
_is_retryable is covered: import ResponseError from urllib3.exceptions and add a
test like test_urllib3_response_error_is_retryable that constructs
ResponseError("connection reset by peer") and asserts _is_retryable(ex) is True;
additionally add a parallel case in the TestGithubApiCall retry-flow tests (the
tests that currently exercise RequestsConnectionError and MaxRetryError) to
simulate/raise a ResponseError and assert the retry behavior is triggered for
the same code paths. Ensure the tests reference _is_retryable and the
TestGithubApiCall retry logic so coverage protects the explicit ResponseError
branch.

In `@webhook_server/utils/github_retry.py`:
- Around line 65-70: The wrapper github_api_call currently erases the wrapped
callable's parameter types by using Callable[..., T] and *args: Any/**kwargs:
Any; fix this by introducing a ParamSpec (e.g., P) and make the function generic
over P and T, change the callable type to Callable[P, T], and update the
signature to accept *args: P.args and **kwargs: P.kwargs (keeping logger and
log_prefix params) so callers preserve proper type checking; also import
ParamSpec from typing and adjust any relevant type variables declarations (P, T)
accordingly.

---

Outside diff comments:
In `@webhook_server/libs/handlers/issue_comment_handler.py`:
- Around line 100-127: The code that processes results from asyncio.gather is
incorrectly treating asyncio.CancelledError as a successful result because it
first checks isinstance(result, Exception); change the result-handling logic in
the block that iterates over results (the for loop using results,
_user_commands, and idx) to first detect and re-raise asyncio.CancelledError (or
isinstance(result, asyncio.CancelledError)) before handling other Exception
types so that cancellations from tasks created via
asyncio.create_task(self.user_commands(...)) propagate correctly; ensure this
check occurs prior to any Exception-based handling of failures.

---

Duplicate comments:
In `@webhook_server/libs/github_api.py`:
- Around line 1025-1029: The call to commit_obj.get_pulls is returning a
PaginatedList and the indexing _pulls[0] happens outside the retry/non-blocking
wrapper; change the github_api_call invocation to materialize the list within
the wrapper (e.g., call github_api_call(lambda: list(commit_obj.get_pulls()),
...)) so the pagination iteration happens inside github_api_call, then use the
returned list to check and log the first PR (the symbols to update are
commit_obj.get_pulls, github_api_call, the local _pulls variable and the
self.logger.debug(...) that references _pulls[0]).

In `@webhook_server/libs/handlers/pull_request_handler.py`:
- Around line 1171-1178: The call currently passes pull_request.user.login as a
positional argument to github_api_call which evaluates the property before
github_api_call runs; change this by first resolving the owner login with an
awaited github_api_call wrapper (e.g., assignee_login = await
github_api_call(lambda: pull_request.user.login, logger=self.logger,
log_prefix=self.log_prefix)) and then call
github_api_call(self.repository.create_issue, ..., assignee=assignee_login,
...); update the method containing this code (e.g.,
add_pull_request_owner_as_assingee) and apply the same pattern to the other
occurrences noted (around the block at the later occurrence that references
pull_request.user.login) so any PyGithub property access is wrapped and awaited
before being passed into github_api_call.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 28cd11e8-16eb-420d-9716-a4c69483ad0b

📥 Commits

Reviewing files that changed from the base of the PR and between a0cae1c and c6feea9.

📒 Files selected for processing (14)
  • CLAUDE.md
  • webhook_server/libs/github_api.py
  • webhook_server/libs/handlers/check_run_handler.py
  • webhook_server/libs/handlers/issue_comment_handler.py
  • webhook_server/libs/handlers/labels_handler.py
  • webhook_server/libs/handlers/owners_files_handler.py
  • webhook_server/libs/handlers/pull_request_handler.py
  • webhook_server/libs/handlers/push_handler.py
  • webhook_server/libs/handlers/runner_handler.py
  • webhook_server/libs/test_oracle.py
  • webhook_server/tests/test_clean_rebase_detection.py
  • webhook_server/tests/test_github_retry.py
  • webhook_server/tests/test_test_oracle.py
  • webhook_server/utils/github_retry.py

Comment thread webhook_server/libs/handlers/owners_files_handler.py Outdated
Comment thread webhook_server/libs/handlers/pull_request_handler.py
Comment thread webhook_server/libs/handlers/pull_request_handler.py Outdated
Comment thread webhook_server/libs/handlers/runner_handler.py Outdated
Comment thread webhook_server/tests/test_github_retry.py
Comment thread webhook_server/utils/github_retry.py
@myakove
Copy link
Copy Markdown
Collaborator Author

myakove commented Apr 14, 2026

@coderabbitai[bot]

The following review comments were reviewed and a decision was made:

webhook_server/libs/handlers/issue_comment_handler.py:100 (outside diff) — CRITICAL: CancelledError is silently swallowed in the command fan-out.

Addressed: Added CancelledError check before Exception check in the gather results processing loop.

webhook_server/libs/handlers/pull_request_handler.py:1171 (duplicate) — HIGH: pull_request.user.login is still evaluated before github_api_call() starts.

Addressed: Wrapped pull_request.user.login in a separate github_api_call(lambda: ...) call before using the value.

webhook_server/libs/github_api.py:1025 (duplicate) — HIGH: Materialize get_pulls() inside github_api_call().

Addressed: Materialized get_pulls() inside github_api_call() with lambda: list(...).

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

Tip

For best results, initiate chat on the files or code changes.

@myakove Thanks for confirming all three fixes!

  • issue_comment_handler.py:100 — CancelledError swallowed: Correct fix — checking for asyncio.CancelledError before the broad Exception catch ensures cancellation propagates cleanly and doesn't get silently absorbed in the fan-out loop.

  • pull_request_handler.py:1171pull_request.user.login eager evaluation: Correct fix — wrapping the lazy property access in its own github_api_call(lambda: pull_request.user.login, ...) ensures the blocking network fetch is deferred into the thread and covered by the retry logic, rather than evaluated on the async thread before the call.

  • github_api.py:1025 — Materialize get_pulls() inside github_api_call(): Correct fix — get_pulls() returns a lazy PaginatedList that only hits the network when iterated; wrapping with lambda: list(...) forces full materialization inside the retried thread call, so transient failures during pagination are also retried.


🧠 Learnings used
Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 1025
File: webhook_server/libs/handlers/runner_handler.py:1031-1033
Timestamp: 2026-03-12T13:44:36.341Z
Learning: In webhook_server/libs/handlers/runner_handler.py and other handlers, PyGithub properties such as `.number`, `.title`, and `.body` on PullRequest objects are safe to access directly in async code without asyncio.to_thread(), as they are cached from the webhook payload or from a previously fetched object (e.g., via get_pull()). Per CLAUDE.md: 'Safe operations (don't block): Properties from webhook payload (.number, .title, .body)'. Only actual API method calls (e.g., get_pull(), create_issue_comment(), add_to_labels()) need to be wrapped in asyncio.to_thread().

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 1071
File: webhook_server/utils/github_retry.py:96-98
Timestamp: 2026-04-14T10:10:33.046Z
Learning: In myk-org/github-webhook-server, the `github_api_call()` wrapper in `webhook_server/utils/github_retry.py` intentionally retries ALL GitHub API calls including non-idempotent/mutating ones (e.g., `create_issue_comment`, `create_issue`, `create_review_request`, `create_check_run`, `merge`) on transient 5xx errors. This is an accepted and documented tradeoff — rare duplicate side effects are considered preferable to hard failures. Do NOT flag this as an issue or suggest splitting read/write retry policies; the maintainer explicitly rejected that approach as adding unnecessary complexity for minimal benefit.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 1071
File: webhook_server/libs/handlers/pull_request_handler.py:224-226
Timestamp: 2026-04-14T10:11:41.602Z
Learning: In myk-org/github-webhook-server, retrying non-idempotent GitHub write operations (e.g., `create_issue_comment`, `create_issue`, `create_review_request`) through `github_api_call()` in `webhook_server/utils/github_retry.py` is an accepted and documented tradeoff. The risk of duplicate comments or issues on retry after a transient 5xx error is knowingly accepted in exchange for resilience against transient GitHub API failures. Do NOT flag this pattern as a bug or suggest adding idempotency markers — the decision is intentional and documented in a comment in `github_retry.py` above the retry loop.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 612
File: webhook_server_container/libs/github_api.py:925-926
Timestamp: 2024-10-29T10:42:50.163Z
Learning: In `webhook_server_container/libs/github_api.py`, the method `self._keep_approved_by_approvers_after_rebase()` must be called after removing labels when synchronizing a pull request. Therefore, it should be placed outside the `ThreadPoolExecutor` to ensure it runs sequentially after label removal.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 612
File: webhook_server_container/libs/github_api.py:2089-2100
Timestamp: 2024-10-29T08:09:57.157Z
Learning: In `webhook_server_container/libs/github_api.py`, when the function `_keep_approved_by_approvers_after_rebase` is called, existing approval labels have already been cleared after pushing new changes, so there's no need to check for existing approvals within this function.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 1016
File: webhook_server/libs/ai_cli.py:9-11
Timestamp: 2026-03-03T13:59:54.737Z
Learning: In webhook_server/**/*.py, violations of the contextual logging requirement (using module-level `get_logger(name=...)` instead of `get_logger_with_params(repository=..., hook_id=..., component=...)`) must always be flagged as CRITICAL severity and marked as must-fix. These are blocking issues that lose repository/hook correlation in logs.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 1025
File: webhook_server/libs/handlers/runner_handler.py:952-977
Timestamp: 2026-03-12T13:44:41.853Z
Learning: In myk-org/github-webhook-server (webhook_server/**/*.py), PyGithub properties `.number`, `.title`, and `.body` are cached from the webhook payload and do NOT trigger blocking API calls. Therefore, these properties do NOT need to be wrapped with `asyncio.to_thread()` in async methods. Only PyGithub properties or methods that make live API calls (e.g., `.user.login`, `.base.ref`, `get_pull()`, `create_issue_comment()`, etc.) require wrapping. This is documented in CLAUDE.md as "Safe operations: Properties from webhook payload (.number, .title, .body)".

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 778
File: webhook_server/libs/pull_request_handler.py:327-330
Timestamp: 2025-05-13T12:06:27.297Z
Learning: In the GitHub webhook server, synchronous GitHub API calls (like create_issue_comment, add_to_assignees, etc.) in async methods should be awaited using asyncio.to_thread or loop.run_in_executor to prevent blocking the event loop.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 778
File: webhook_server/libs/pull_request_handler.py:327-330
Timestamp: 2025-05-13T12:06:27.297Z
Learning: In the GitHub webhook server, synchronous GitHub API calls (like create_issue_comment, add_to_assignees, etc.) in async methods should be awaited using asyncio.to_thread or loop.run_in_executor to prevent blocking the event loop.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 1036
File: webhook_server/libs/github_api.py:609-630
Timestamp: 2026-03-17T14:45:44.692Z
Learning: In myk-org/github-webhook-server, in the `status` event handler inside `process()` in `webhook_server/libs/github_api.py`, only `pending` state is skipped. Terminal states `success`, `failure`, and `error` all trigger `check_if_can_be_merged()` re-evaluation so that the `can-be-merged` check always reflects the current status check outcome. Do NOT flag `failure`/`error` states as incorrectly triggering re-evaluation — this is intentional by design.

Learnt from: myakove
Repo: myk-org/github-webhook-server PR: 586
File: webhook_server_container/libs/github_api.py:1947-1956
Timestamp: 2024-10-08T09:19:56.185Z
Learning: In `webhook_server_container/libs/github_api.py`, the indentation style used in the `set_pull_request_automerge` method is acceptable as per the project's coding standards.

@myakove
Copy link
Copy Markdown
Collaborator Author

myakove commented Apr 14, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@myakove myakove merged commit 3275803 into main Apr 14, 2026
7 of 9 checks passed
@myakove myakove deleted the feat/issue-1070-github-api-retry branch April 14, 2026 12:00
@myakove-bot
Copy link
Copy Markdown
Collaborator

New container for ghcr.io/myk-org/github-webhook-server:latest published

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Add retry mechanism with exponential backoff for transient GitHub API failures

2 participants