
[chores:fix] Fixed CI failure bot retry and narrowed down transient error identifiers #626

Merged
nemesifier merged 2 commits into master from fix-ci-failure-bot-retry
Mar 16, 2026

Conversation

@nemesifier
Member

Checklist

  • I have read the OpenWISP Contributing Guidelines.
  • I have manually tested the changes proposed in this pull request.
  • I have written new test cases for new code and/or updated existing tests for changes to existing code.
  • N/A I have updated the documentation.

Reference to Existing Issue

Related to #616

Description of Changes

Fixed the buggy retry logic of the CI failure bot, which caused infinite retries, and narrowed down the transient error identifiers to avoid accidental triggering.

@coderabbitai

coderabbitai Bot commented Mar 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 150a714a-7fe9-4735-99fc-4f0e2ef83eaf

📥 Commits

Reviewing files that changed from the base of the PR and between 30816ea and 13cea34.

📒 Files selected for processing (3)
  • .github/actions/bot-ci-failure/analyze_failure.py
  • .github/actions/bot-ci-failure/test_analyze_failure.py
  • .github/workflows/reusable-bot-ci-failure.yml
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Python==3.12 | django~=5.0.0
  • GitHub Check: Python==3.11 | django~=5.2.0
  • GitHub Check: Python==3.11 | django~=4.2.0
  • GitHub Check: Python==3.13 | django~=5.2.0
  • GitHub Check: Python==3.10 | django~=4.2.0
  • GitHub Check: Python==3.13 | django~=5.1.0
  • GitHub Check: Python==3.10 | django~=5.0.0
  • GitHub Check: Python==3.12 | django~=5.1.0
  • GitHub Check: Python==3.12 | django~=4.2.0
  • GitHub Check: Python==3.12 | django~=5.2.0
  • GitHub Check: Python==3.10 | django~=5.1.0
  • GitHub Check: Python==3.11 | django~=5.0.0
  • GitHub Check: Python==3.11 | django~=5.1.0
  • GitHub Check: Python==3.10 | django~=5.2.0
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2026-03-14T20:44:14.568Z
Learnt from: CR
Repo: openwisp/openwisp-utils PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2026-03-14T20:44:14.568Z
Learning: Changes: Update tests to cover non-trivial changes and ensure proper validation of modified behavior

Applied to files:

  • .github/actions/bot-ci-failure/test_analyze_failure.py
📚 Learning: 2026-03-14T20:44:14.568Z
Learnt from: CR
Repo: openwisp/openwisp-utils PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2026-03-14T20:44:14.568Z
Learning: Bug Fixes: Ensure the test is deterministic and not flaky - flag tests that depend on timing, sleeps, specific timezones, system time, randomness without fixed seed, race conditions, concurrency timing, network access, external services, filesystem state, environment-specific configuration, execution order, shared global state, hardcoded ports, or unawaited async operations

Applied to files:

  • .github/actions/bot-ci-failure/test_analyze_failure.py
📚 Learning: 2026-03-14T20:44:14.568Z
Learnt from: CR
Repo: openwisp/openwisp-utils PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2026-03-14T20:44:14.568Z
Learning: Features: Add tests for new features and ensure coverage does not decrease significantly; prefer Selenium browser tests for UI-impacting features

Applied to files:

  • .github/actions/bot-ci-failure/test_analyze_failure.py
🔇 Additional comments (5)
.github/workflows/reusable-bot-ci-failure.yml (1)

146-150: LGTM!

The refactored RETRY_COUNT extraction correctly handles the case where grep -cF returns exit code 1 when no matches are found. Using -F for fixed-string matching is appropriate since the marker contains special characters. The inline comment adequately explains why the jq-based approach was replaced.
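
The counting behavior described above can be sketched in Python. This is only an illustration, not the workflow's actual shell code: `RETRY_MARKER`, `count_retries` and the sample comments are hypothetical names invented for the example. Like `grep -cF`, the sketch counts lines that contain the marker as a literal string, and a count of zero is an ordinary result rather than an error.

```python
# Hypothetical marker: the real marker text used by the bot is not shown here.
RETRY_MARKER = "<!-- ci-failure-bot: retry -->"


def count_retries(comment_bodies, marker=RETRY_MARKER):
    """Count lines containing the marker, matched as a fixed string."""
    count = 0
    for body in comment_bodies:
        for line in body.splitlines():
            # `in` does plain substring matching, so characters such as
            # "<", "!" or "." in the marker carry no special meaning.
            if marker in line:
                count += 1
    return count


comments = [
    "Tests failed, retrying.\n<!-- ci-failure-bot: retry -->",
    "Looks good to me!",
    "Second attempt.\n<!-- ci-failure-bot: retry -->",
]
retry_count = count_retries(comments)
no_retries = count_retries(["Just a regular comment."])
print(retry_count, no_retries)
```

In shell, `grep -c` prints `0` but exits with status 1 when nothing matches, so a script has to tolerate that status explicitly or a missing marker looks like a failure.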

.github/actions/bot-ci-failure/analyze_failure.py (1)

21-32: LGTM!

The narrowed transient failure markers effectively reduce false positives by requiring more specific patterns:

  • marionette.errors instead of generic marionette
  • ConnectionRefusedError (Python exception name) instead of connection refused
  • Full Coveralls URL pattern instead of just coveralls

This prevents commit messages containing these keywords from accidentally triggering auto-retry behavior, as validated by the new test case.
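
A minimal sketch of how such narrowed markers might be applied is shown below. It assumes case-sensitive substring matching; `is_transient_failure` is a hypothetical helper name, and the Coveralls URL is an assumed value because the exact pattern in `analyze_failure.py` is not shown in this review.

```python
# Marker values are assumptions based on the review summary;
# the real tuple in analyze_failure.py may differ.
TRANSIENT_FAILURE_MARKERS = (
    "marionette.errors",
    "ConnectionRefusedError",
    "https://coveralls.io/api/v1/jobs",  # assumed URL, for illustration only
)


def is_transient_failure(log_text, markers=TRANSIENT_FAILURE_MARKERS):
    """Return True if any marker appears verbatim in the log text."""
    return any(marker in log_text for marker in markers)


log = (
    "Traceback (most recent call last):\n"
    "ConnectionRefusedError: [Errno 111] Connection refused\n"
)
print(is_transient_failure(log))
```

Matching on the exception class name rather than the phrase "connection refused" ties the detection to real traceback output instead of free-form text.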

.github/actions/bot-ci-failure/test_analyze_failure.py (3)

173-198: LGTM!

The test cases are properly updated to use the new, more specific transient failure patterns. Each test validates detection of actual error output (e.g., marionette.errors.MarionetteException, ConnectionRefusedError: [Errno 111], full Coveralls URL) rather than generic keywords.


206-213: Good addition of a negative test case.

This test effectively validates that commit messages containing transient-related keywords (like "marionette", "coveralls", "connection refused") do NOT trigger false positive transient detection. This directly tests the PR's objective of narrowing down transient error identifiers.
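
A negative test of this kind could look roughly like the sketch below. The detector is a stand-in written for the example: `MARKERS`, `is_transient_failure`, the test names and the commit message are all hypothetical and are not taken from `test_analyze_failure.py`.

```python
import unittest

# Stand-in detector; marker values are assumptions for illustration.
MARKERS = (
    "marionette.errors",
    "ConnectionRefusedError",
    "https://coveralls.io/api/v1/jobs",  # assumed URL
)


def is_transient_failure(log_text):
    """Return True if any marker appears verbatim in the text."""
    return any(marker in log_text for marker in MARKERS)


class TestCommitMessageKeywords(unittest.TestCase):
    def test_keywords_in_commit_message_are_not_transient(self):
        # Generic keywords alone must not look like a transient failure.
        commit_message = (
            "[fix] Handle marionette timeouts, coveralls upload "
            "and connection refused errors"
        )
        self.assertFalse(is_transient_failure(commit_message))


# Run the suite programmatically so the example does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCommitMessageKeywords)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The test uses a fixed input string and no external state, which keeps it deterministic.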


255-282: LGTM!

The process_error_logs tests are correctly updated to use the new marker patterns, ensuring integration-level validation of the transient detection logic.


📝 Walkthrough

Walkthrough

This PR updates the CI failure detection logic for a bot that determines whether test failures are transient and should trigger retries. It modifies the set of error patterns recognized as transient failures, updates the corresponding test cases to match the new patterns, and refactors how the retry count is computed from pull request comments using a different parsing approach.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • [ci:fix] CI Failure bot refinements #624: Directly modifies the same CI failure bot files (TRANSIENT_FAILURE_MARKERS, test_analyze_failure.py, reusable-bot-ci-failure.yml) for related transient-failure detection updates.

Suggested labels

github_actions, helper-bots


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Bug Fixes | ❓ Inconclusive | Unable to examine pull request changes - shell command execution not available in current environment. | Review the git diff output manually or provide the file contents to assess if bugs mentioned in the PR are properly fixed. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title follows the required format with [chores:fix] prefix and clearly describes the main changes: fixing CI failure bot retry logic and narrowing transient error identifiers. |
| Description check | ✅ Passed | Description includes all required sections with sufficient detail, though documentation update is marked as N/A and no screenshot is provided (not applicable for this type of change). |

@nemesifier nemesifier changed the title from "[chroes:fix] Fixed CI failure bot retry and narrowed down transient error identifiers" to "[chores:fix] Fixed CI failure bot retry and narrowed down transient error identifiers" Mar 16, 2026
@coderabbitai coderabbitai Bot added the github_actions (Pull requests that update GitHub Actions code) and helper-bots (Helper bots, release management automation) labels Mar 16, 2026
@github-project-automation github-project-automation Bot moved this from In progress to Reviewer approved in OpenWISP Priorities for next releases Mar 16, 2026
@nemesifier nemesifier merged commit cc39994 into master Mar 16, 2026
36 checks passed
@nemesifier nemesifier deleted the fix-ci-failure-bot-retry branch March 16, 2026 20:04
@github-project-automation github-project-automation Bot moved this from Reviewer approved to Done in OpenWISP Priorities for next releases Mar 16, 2026
@coveralls

Coverage Status

Coverage: 97.348% (remained the same) when pulling 13cea34 on fix-ci-failure-bot-retry into 30816ea on master.


Labels

  • bug
  • github_actions: Pull requests that update GitHub Actions code
  • helper-bots: Helper bots, release management automation


2 participants