[chores:fix] Fixed CI failure bot retry and narrowed down transient error identifiers by nemesifier · Pull Request #626 · openwisp/openwisp-utils

nemesifier · 2026-03-16T19:59:01Z

Checklist

I have read the OpenWISP Contributing Guidelines.
I have manually tested the changes proposed in this pull request.
I have written new test cases for new code and/or updated existing tests for changes to existing code.
N/A I have updated the documentation.

Reference to Existing Issue

Related to #616

Description of Changes

Fixed the buggy CI failure bot retry logic, which caused infinite retries, and narrowed down the transient error identifiers to avoid accidental triggering.

coderabbitai · 2026-03-16T19:59:46Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 150a714a-7fe9-4735-99fc-4f0e2ef83eaf

📥 Commits

Reviewing files that changed from the base of the PR and between 30816ea and 13cea34.

📒 Files selected for processing (3)

.github/actions/bot-ci-failure/analyze_failure.py
.github/actions/bot-ci-failure/test_analyze_failure.py
.github/workflows/reusable-bot-ci-failure.yml

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)

GitHub Check: Python==3.12 | django~=5.0.0
GitHub Check: Python==3.11 | django~=5.2.0
GitHub Check: Python==3.11 | django~=4.2.0
GitHub Check: Python==3.13 | django~=5.2.0
GitHub Check: Python==3.10 | django~=4.2.0
GitHub Check: Python==3.13 | django~=5.1.0
GitHub Check: Python==3.10 | django~=5.0.0
GitHub Check: Python==3.12 | django~=5.1.0
GitHub Check: Python==3.12 | django~=4.2.0
GitHub Check: Python==3.12 | django~=5.2.0
GitHub Check: Python==3.10 | django~=5.1.0
GitHub Check: Python==3.11 | django~=5.0.0
GitHub Check: Python==3.11 | django~=5.1.0
GitHub Check: Python==3.10 | django~=5.2.0

🧰 Additional context used

🧠 Learnings (3)

📚 Learning: 2026-03-14T20:44:14.568Z

Learnt from: CR
Repo: openwisp/openwisp-utils PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2026-03-14T20:44:14.568Z
Learning: Changes: Update tests to cover non-trivial changes and ensure proper validation of modified behavior

Applied to files:

.github/actions/bot-ci-failure/test_analyze_failure.py

📚 Learning: 2026-03-14T20:44:14.568Z

Learnt from: CR
Repo: openwisp/openwisp-utils PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2026-03-14T20:44:14.568Z
Learning: Bug Fixes: Ensure the test is deterministic and not flaky - flag tests that depend on timing, sleeps, specific timezones, system time, randomness without fixed seed, race conditions, concurrency timing, network access, external services, filesystem state, environment-specific configuration, execution order, shared global state, hardcoded ports, or unawaited async operations

Applied to files:

.github/actions/bot-ci-failure/test_analyze_failure.py

📚 Learning: 2026-03-14T20:44:14.568Z

Learnt from: CR
Repo: openwisp/openwisp-utils PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2026-03-14T20:44:14.568Z
Learning: Features: Add tests for new features and ensure coverage does not decrease significantly; prefer Selenium browser tests for UI-impacting features

Applied to files:

.github/actions/bot-ci-failure/test_analyze_failure.py

🔇 Additional comments (5)

.github/workflows/reusable-bot-ci-failure.yml (1)

146-150: LGTM!

The refactored RETRY_COUNT extraction correctly handles the case where grep -cF returns exit code 1 when no matches are found. Using -F for fixed-string matching is appropriate since the marker contains special characters. The inline comment adequately explains why the jq-based approach was replaced.
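
The zero-match behaviour described above can be illustrated with a minimal Python sketch. This is not the workflow's shell code: `RETRY_MARKER` and `count_retries` are hypothetical names, and the marker text is a placeholder, since the real marker is not shown in this review. The sketch mirrors what `grep -cF` reports, namely the number of lines containing a literal string, and treats zero as a normal result rather than an error.

```python
# Placeholder marker (assumption); the real one in
# reusable-bot-ci-failure.yml is not shown in this review.
RETRY_MARKER = "<!-- hypothetical-retry-marker [1] -->"


def count_retries(comment_bodies: list[str], marker: str = RETRY_MARKER) -> int:
    """Count lines containing `marker` literally, as `grep -cF` would.

    Plain substring matching means characters such as `[` or `.` in the
    marker are never interpreted as a pattern.
    """
    return sum(
        marker in line
        for body in comment_bodies
        for line in body.splitlines()
    )


comments = [
    "Retrying the build.\n" + RETRY_MARKER,
    "Unrelated review comment",
]
print(count_retries(comments))  # 1
print(count_retries([]))        # 0, a valid count rather than a failure
```

In a shell step the same concern shows up as an exit status: `grep -c` still prints `0` when nothing matches but exits with status 1, so the calling script has to tolerate that status explicitly.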

.github/actions/bot-ci-failure/analyze_failure.py (1)

21-32: LGTM!

The narrowed transient failure markers effectively reduce false positives by requiring more specific patterns:

marionette.errors instead of generic marionette

ConnectionRefusedError (Python exception name) instead of connection refused

Full Coveralls URL pattern instead of just coveralls

This prevents commit messages containing these keywords from accidentally triggering auto-retry behavior, as validated by the new test case.
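
A minimal sketch of why the narrower markers help, assuming the check is a case-sensitive substring match over the log text. The function name `is_transient_failure` is hypothetical and the tuple below is only a subset: it lists the two markers named in this review and omits the Coveralls one, because the exact URL pattern is not given here.

```python
# Subset of markers named in the review (assumption: the real list in
# analyze_failure.py contains more entries, including a Coveralls URL).
TRANSIENT_FAILURE_MARKERS = (
    "marionette.errors",
    "ConnectionRefusedError",
)


def is_transient_failure(log_text: str) -> bool:
    """Return True if any marker appears verbatim in the log text.

    Assumption: matching is a case-sensitive substring check.
    """
    return any(marker in log_text for marker in TRANSIENT_FAILURE_MARKERS)


# Real error output contains the fully qualified names.
real_error = "marionette.errors.MarionetteException: session lost"
# A commit message mentions the same topics only in prose.
commit_message = "Fixed marionette and connection refused handling"

print(is_transient_failure(real_error))      # True
print(is_transient_failure(commit_message))  # False
```

With the older generic keywords (`marionette`, `connection refused`), the commit message above would have matched and could have triggered a retry.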

.github/actions/bot-ci-failure/test_analyze_failure.py (3)

173-198: LGTM!

The test cases are properly updated to use the new, more specific transient failure patterns. Each test validates detection of actual error output (e.g., marionette.errors.MarionetteException, ConnectionRefusedError: [Errno 111], full Coveralls URL) rather than generic keywords.

206-213: Good addition of a negative test case.

This test effectively validates that commit messages containing transient-related keywords (like "marionette", "coveralls", "connection refused") do NOT trigger false positive transient detection. This directly tests the PR's objective of narrowing down transient error identifiers.

255-282: LGTM!

The process_error_logs tests are correctly updated to use the new marker patterns, ensuring integration-level validation of the transient detection logic.

📝 Walkthrough

This PR updates the CI failure detection logic for a bot that determines whether test failures are transient and should trigger retries. It modifies the set of error patterns recognized as transient failures, updates the corresponding test cases to match the new patterns, and refactors how the retry count is computed from pull request comments using a different parsing approach.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

[ci:fix] CI Failure bot refinements #624: Directly modifies the same CI failure bot files (TRANSIENT_FAILURE_MARKERS, test_analyze_failure.py, reusable-bot-ci-failure.yml) for related transient-failure detection updates.

Suggested labels

github_actions, helper-bots

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Bug Fixes | ❓ Inconclusive | Unable to examine pull request changes - shell command execution not available in current environment. | Review the git diff output manually or provide the file contents to assess if bugs mentioned in the PR are properly fixed. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title follows the required format with [chores:fix] prefix and clearly describes the main changes: fixing CI failure bot retry logic and narrowing transient error identifiers. |
| Description check | ✅ Passed | Description includes all required sections with sufficient detail, though documentation update is marked as N/A and no screenshot is provided (not applicable for this type of change). |

coveralls · 2026-03-16T20:04:55Z

Coverage: 97.348% (remained the same) when pulling 13cea34 on fix-ci-failure-bot-retry into 30816ea on master.

nemesifier added 2 commits March 16, 2026 16:39

[chores:fix] Fixed counter in CI failure bot

a0c4923

[chores:fix] Made TRANSIENT_FAILURE_MARKERS more specific

13cea34

nemesifier self-assigned this Mar 16, 2026

nemesifier added the bug label Mar 16, 2026

nemesifier added this to OpenWISP Priorities for next releases Mar 16, 2026

github-project-automation Bot moved this to In progress in OpenWISP Priorities for next releases Mar 16, 2026

nemesifier changed the title ~~[chroes:fix] Fixed CI failure bot retry and narrowed down transient error identifiers~~ [chores:fix] Fixed CI failure bot retry and narrowed down transient error identifiers Mar 16, 2026

coderabbitai Bot added the github_actions and helper-bots labels Mar 16, 2026

coderabbitai Bot approved these changes Mar 16, 2026

github-project-automation Bot moved this from In progress to Reviewer approved in OpenWISP Priorities for next releases Mar 16, 2026

nemesifier merged commit cc39994 into master Mar 16, 2026
36 checks passed

nemesifier deleted the fix-ci-failure-bot-retry branch March 16, 2026 20:04

github-project-automation Bot moved this from Reviewer approved to Done in OpenWISP Priorities for next releases Mar 16, 2026

coderabbitai Bot mentioned this pull request Mar 16, 2026

[ci] CI Failure Bot: run only on pull requests + OSError transient marker #627

Merged

2 tasks

coderabbitai Bot mentioned this pull request Apr 10, 2026

[ci] Added Selenium connection failure to transient markers #648

Merged

4 tasks
