
[fix] Removed 'FAILED (' from strict markers to unblock auto-retry #655

Open
stktyagi wants to merge 5 commits into master from fix/transient-error-markers

Conversation

@stktyagi
Member

@stktyagi stktyagi commented Apr 24, 2026

Removed the 'FAILED' keyword from the strict failure markers, which was blocking the re-run mechanism.

Checklist

  • I have read the OpenWISP Contributing Guidelines.
  • I have manually tested the changes proposed in this pull request.
  • I have written new test cases for new code and/or updated existing tests for changes to existing code.
  • I have updated the documentation.

Description of Changes

The presence of the 'FAILED' keyword in the strict failure markers blocked the re-run mechanism, because that keyword occurs in the CI logs of every kind of test failure.

Removed 'FAILED' keyword from strict failure markers which caused blocking of re-run mechanism
@coderabbitai

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Walkthrough

Two changes in the CI failure analysis logic and its tests: the "FAILED (" string was removed from STRICT_TEST_FAILURE_MARKERS in .github/actions/bot-ci-failure/analyze_failure.py, and "selenium.common.exceptions.WebDriverException" was added to TRANSIENT_FAILURE_MARKERS. A new unit test was added to ensure process_error_logs does not treat a unittest terminal summary (FAILED (errors=1)) as a strict failure when the underlying logs indicate a transient WebDriver/infrastructure error.
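For readers without the diff handy, the marker-based classification described in the walkthrough can be sketched roughly as follows. This is a minimal stand-in, not the actual analyze_failure.py: the marker lists and the return shape are inferred from this PR's discussion and tests, and the real module may differ.

```python
# Minimal sketch of the marker-based CI failure classification.
# Marker lists approximate the post-change state described in this PR.
STRICT_TEST_FAILURE_MARKERS = [
    "FAIL:",
    "AssertionError",
    # "FAILED (" was removed: the unittest terminal summary line
    # ("FAILED (errors=1)") also appears for transient crashes, so it
    # blocked the auto-retry path on every failure.
]
TRANSIENT_FAILURE_MARKERS = [
    "selenium.common.exceptions.InvalidSessionIdException",
    "selenium.common.exceptions.WebDriverException",
    "about:neterror",
]


def process_error_logs(content):
    """Classify a CI log, returning (text, tests_failed, transient_only)."""
    has_strict_failure = any(m in content for m in STRICT_TEST_FAILURE_MARKERS)
    is_transient = any(m in content for m in TRANSIENT_FAILURE_MARKERS)
    # A transient marker forgives generic tracebacks unless a strict
    # failure marker proves a real test failure is also present.
    tests_failed = has_strict_failure
    transient_only = is_transient and not has_strict_failure
    return content, tests_failed, transient_only
```

With these markers, a log containing only a WebDriverException crash plus the `FAILED (errors=1)` summary classifies as transient, while any log with a `FAIL:` line or an `AssertionError` still blocks the retry.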

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

bug, github_actions, helper-bots

Suggested reviewers

  • nemesifier
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title follows the required [type] format with 'fix' prefix and clearly describes the main change: removing 'FAILED (' from strict markers to unblock auto-retry.
Description check ✅ Passed The description includes most required sections: checklist items marked, reference context provided, and changes described. However, the 'Reference to Existing Issue' section is missing the issue number link.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Bug Fixes ✅ Passed The pull request properly fixes the root cause by removing the problematic 'FAILED (' marker from STRICT_TEST_FAILURE_MARKERS and adding WebDriverException to transient markers. A regression test validates the exact bug scenario with deterministic hardcoded log data.


@coderabbitai coderabbitai Bot added bug github_actions Pull requests that update GitHub Actions code helper-bots Helper bots, release management automation labels Apr 24, 2026
@kilo-code-bot

kilo-code-bot Bot commented Apr 24, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Overview

The change removes "FAILED (" from the strict failure markers and "selenium.common.exceptions.WebDriverException" from transient markers to fix the auto-retry mechanism being incorrectly blocked.

Incremental Changes (since last review):

  • Also removed "selenium.common.exceptions.WebDriverException" from TRANSIENT_FAILURE_MARKERS
  • This broad exception class is no longer needed since more specific markers (InvalidSessionIdException, about:neterror) remain
  • Removed obsolete test test_transient_ignores_unittest_failed_summary from test_analyze_failure.py

Analysis:

  • The remaining markers ("FAIL:" and "AssertionError" for strict; InvalidSessionIdException, about:neterror for transient) are more specific indicators
  • This is a minimal, targeted fix that addresses the false positive issue described in the PR
  • No security vulnerabilities or critical bugs introduced
Files Reviewed (2 files)
  • .github/actions/bot-ci-failure/analyze_failure.py - Removes "FAILED (" from STRICT_TEST_FAILURE_MARKERS and "selenium.common.exceptions.WebDriverException" from TRANSIENT_FAILURE_MARKERS
  • .github/actions/bot-ci-failure/test_analyze_failure.py - Removes obsolete test for the deleted markers

Reviewed by kimi-k2.5-0127 · 134,461 tokens

coderabbitai[bot]
coderabbitai Bot previously approved these changes Apr 24, 2026
Ensures that the standard 'FAILED (errors=x)' summary appended by the unittest framework does not falsely override transient crash detection and block the auto-retry mechanism.
@coveralls

coveralls commented Apr 24, 2026

Coverage Status

Coverage remained the same at 97.529% when pulling fix/transient-error-markers into master.

@openwisp-companion

Test Failures in CI

Hello @stktyagi,
(Analysis for commit 3b35a46)

The CI pipeline failed due to a test failure in the bot-ci-failure action.

Failure:
The test test_transient_ignores_unittest_failed_summary in .github/actions/bot-ci-failure/test_analyze_failure.py failed with an AssertionError. The test expects tests_failed to be False, but it received True. This indicates that the process_error_logs function incorrectly identified a transient error (WebDriverException: Reached error page: about:neterror) as a regular test failure.

Fix:
Review the logic in process_error_logs within .github/actions/bot-ci-failure/analyze_failure.py to ensure that transient errors like WebDriverException are correctly handled and do not increment the tests_failed count. Specifically, the function should correctly identify and categorize the about:neterror page as a transient issue, preventing it from being counted as a failed test.

Added WebDriverException in transient failure error markers
Added WebDriverException in transient failure error markers
@openwisp-companion
Copy link
Copy Markdown

Test Failures in CI

Hello @stktyagi,
(Analysis for commit 925b0cf)

There are 3 test failures in the CI logs:

  1. TestIsTransientFailure.test_detects_coveralls: This test failed because it expected True but received False. The _is_transient_failure function likely did not correctly identify the Coveralls error message as transient.
  • Fix: Review the logic in _is_transient_failure within .github/actions/bot-ci-failure/test_analyze_failure.py to ensure it correctly handles Coveralls-related error messages.
  2. TestProcessErrorLogs.test_coveralls_only_is_transient: This test failed because it expected True for the transient flag but received False. This indicates that the process_error_logs function did not correctly classify the Coveralls error as transient.
  • Fix: Examine the process_error_logs function in .github/actions/bot-ci-failure/test_analyze_failure.py to ensure it correctly identifies and flags Coveralls errors as transient.
  3. TestProcessErrorLogs.test_transient_ignores_unittest_failed_summary: This test failed because it expected False for tests_failed but received True. This suggests that the process_error_logs function incorrectly identified a unittest failure summary as a transient error, overriding the actual test failure count.
  • Fix: Adjust the logic in process_error_logs to correctly differentiate between transient errors and standard test failures, ensuring that unittest failure summaries are not misinterpreted.

@stktyagi
Member Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

✅ Actions performed

Full review triggered.

@stktyagi
Member Author

@coderabbitai resolve

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

✅ Actions performed

Comments resolved and changes approved.

Member

@nemesifier nemesifier left a comment


@stktyagi I am thinking.. we are trying to figure out which of these 3 cases we're in:

  1. tests failed due to flakiness but also include real test failures
  2. tests failed only due to flakiness
  3. tests failed only due to real test failures

I think the problem is mainly discerning between 1 and 2. Implementing this with text patterns and making it work across different programming languages (Python, Lua, Node.js, etc.) will be super tricky.

Maybe we can make it work for Python, but I am skeptical it will also work for the Node.js repos.

Are you sure this is the right approach?
Why don't we give all the context to the LLM and ask it to tell us whether we are under case 1 or 2? The LLM should be good at this. We could do this in a separate API request whose sole goal is to determine whether we fall under case 1 or case 2; based on the response, we will either restart the build or not. What do you think of this?
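The LLM-based triage proposed above could look roughly like the sketch below. It is purely hypothetical: the prompt wording, the label vocabulary, and the decision helper are illustrative placeholders, and the actual model call is left out of scope.

```python
def build_retry_prompt(log_excerpt):
    """Build a prompt asking the model which failure case applies.

    The three labels mirror the cases listed in the comment above:
    MIXED (flaky + real failures), TRANSIENT (flaky only), REAL (real only).
    """
    return (
        "You are analyzing a CI failure log.\n"
        "Answer with exactly one word:\n"
        "  MIXED     - flaky/infrastructure errors AND real test failures\n"
        "  TRANSIENT - only flaky/infrastructure errors\n"
        "  REAL      - only real test failures\n\n"
        f"Log:\n{log_excerpt}\n"
    )


def should_retry(model_answer):
    """Restart the build only when the model says the failure is transient."""
    return model_answer.strip().upper() == "TRANSIENT"
```

Constraining the model to a one-word label keeps the decision step deterministic to parse, though as noted below it does not remove the risk of the label itself being wrong.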

@stktyagi
Member Author

what do you think of this?

My main concern with that approach would be handling hallucinations: we need to be deterministic about whether a CI failure is caused by an actual test failure or by flakiness. Also, while Node.js and Python format tracebacks differently, the infrastructure crashes we want to forgive are mostly consistent across all CIs, and even when we encounter new ones, adding them iteratively until we reach maximum coverage seems fine.

It feels like a trade-off: the non-deterministic approach would be language-versatile but might hallucinate, whereas the current approach is deterministic but requires us to keep monitoring for new transient errors and adding their markers over time. It depends on which seems better to us.

@openwisp-companion

All CI checks passed

Hello @stktyagi,
(Analysis for commit 6bb9def)

No failures were detected in the CI logs.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/actions/bot-ci-failure/test_analyze_failure.py:
- Around line 341-354: Add a negative regression test to ensure a unittest
summary "FAILED (errors=1)" does not cancel a real error classification when
there is a non-transient "ERROR:" traceback; create a new test (e.g.,
test_unittest_failed_with_real_error_blocks_retry) that builds a log string
containing a real "ERROR:"/traceback (no transient WebDriverException markers)
plus the "FAILED (errors=1)" summary, call process_error_logs and assert
tests_failed is True and transient_only is False; reference the existing
test_transient_ignores_unittest_failed_summary and the flags
has_strict_failure/is_transient in analyze_failure.py as the behavior you are
locking in.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 36263c46-7352-42ba-8d54-5612a5a4b122

📥 Commits

Reviewing files that changed from the base of the PR and between c1111f0 and 6bb9def.

📒 Files selected for processing (2)
  • .github/actions/bot-ci-failure/analyze_failure.py
  • .github/actions/bot-ci-failure/test_analyze_failure.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Python==3.13 | django~=5.1.0
  • GitHub Check: Python==3.11 | django~=5.0.0
  • GitHub Check: Python==3.10 | django~=5.2.0
  • GitHub Check: Python==3.13 | django~=5.2.0
  • GitHub Check: Python==3.12 | django~=5.0.0
  • GitHub Check: Python==3.10 | django~=4.2.0
  • GitHub Check: Python==3.11 | django~=5.2.0
  • GitHub Check: Python==3.10 | django~=5.1.0
  • GitHub Check: Python==3.12 | django~=5.2.0
  • GitHub Check: Python==3.12 | django~=4.2.0
  • GitHub Check: Python==3.11 | django~=5.1.0
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: openwisp/openwisp-utils PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2026-03-14T20:44:14.568Z
Learning: Bug Fixes: Ensure the test is deterministic and not flaky - flag tests that depend on timing, sleeps, specific timezones, system time, randomness without fixed seed, race conditions, concurrency timing, network access, external services, filesystem state, environment-specific configuration, execution order, shared global state, hardcoded ports, or unawaited async operations
Learnt from: pushpitkamboj
Repo: openwisp/openwisp-utils PR: 584
File: .github/workflows/reusable-bot-changelog.yml:49-49
Timestamp: 2026-03-05T09:38:10.320Z
Learning: In openwisp-utils, PR title prefixes are strictly limited to `[feature]`, `[fix]`, and `[change]` (exact bracketed tags, no scoping/sub-types). The regex `^\[(feature|fix|change)\]` in `.github/workflows/reusable-bot-changelog.yml` is intentional and correct — scoped variants like `[feature/bots]` are not valid and should not be matched.
Learnt from: CR
Repo: openwisp/openwisp-utils PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2026-03-14T20:44:14.568Z
Learning: Features: Add tests for new features and ensure coverage does not decrease significantly; prefer Selenium browser tests for UI-impacting features
Learnt from: CR
Repo: openwisp/openwisp-utils PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2026-03-14T20:44:14.568Z
Learning: Bug Fixes: If the bug affects the user interface, include a Selenium browser test; if missing, raise a warning
🔇 Additional comments (2)
.github/actions/bot-ci-failure/analyze_failure.py (2)

10-13: Removal of "FAILED (" from strict markers looks safe.

Genuine unittest failures still emit per-test FAIL: <test> or ERROR: <test> lines (caught by STRICT_TEST_FAILURE_MARKERS/GENERIC_TEST_FAILURE_MARKERS), and assertion bugs still surface via AssertionError. The terminal FAILED (errors=N, failures=M) line is purely a summary, so dropping it from the strict set unblocks the transient-retry path without weakening detection of real failures. The new regression test in test_analyze_failure.py pins this behavior.


37-37: Adding base-class WebDriverException as a transient marker — accepting the false-positive trade-off.

selenium.common.exceptions.WebDriverException is the base class of most Selenium exceptions (NoSuchElementException, ElementClickInterceptedException, TimeoutException, …). In practice that's fine for substring matching because Python tracebacks print the concrete subclass FQN, so this marker will only fire when the runtime raises WebDriverException directly (typical for browser/marionette crashes, about:neterror, session teardown).

However, it does mean any future code path or test helper that raises a bare WebDriverException — including legitimate test logic bugs — will now be classified as transient and will forgive co-located ERROR: / Traceback markers in the same job, potentially masking a real failure and triggering an auto-retry loop. This iterative pattern-matching trade-off has been accepted.

Minor note: selenium.common.exceptions.InvalidSessionIdException on line 36 is a WebDriverException subclass, but both entries are needed because tracebacks print the concrete subclass, so explicit subclass entries remain useful for matching.
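The substring argument above can be checked directly. This is illustrative only, but the point holds for Python's `in` operator: a traceback line for a Selenium subclass prints the concrete class FQN, which does not contain the base class name.

```python
marker = "selenium.common.exceptions.WebDriverException"

# A traceback for a subclass prints the concrete class name, so the
# base-class marker does not match it:
subclass_line = "selenium.common.exceptions.NoSuchElementException: Message: no such element"

# The marker only fires when WebDriverException is raised directly,
# e.g. a browser/marionette crash:
base_line = (
    "selenium.common.exceptions.WebDriverException: "
    "Message: Reached error page: about:neterror"
)

print(marker in subclass_line)  # False
print(marker in base_line)      # True
```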

Comment on lines +341 to +354

    def test_transient_ignores_unittest_failed_summary(self):
        """Ensure unittest's 'FAILED (errors=1)' summary does not override transient crashes."""
        content = (
            "===== JOB 5 =====\n"
            "Traceback (most recent call last):\n"
            "selenium.common.exceptions.WebDriverException: Message: Reached error page: about:neterror\n"
            "----------------------------------------------------------------------\n"
            "Ran 367 tests in 311.148s\n\n"
            "FAILED (errors=1)\n"
        )
        text, tests_failed, transient_only = process_error_logs(content)
        self.assertFalse(tests_failed)
        self.assertTrue(transient_only)


🧹 Nitpick | 🔵 Trivial

LGTM — regression test correctly pins the fix.

The scenario exercises exactly the buggy classification path: transient WebDriverException plus a generic Traceback plus the unittest FAILED (errors=1) summary. With the marker changes in analyze_failure.py, has_strict_failure is False, is_transient is True, so the transient branch forgives the generic traceback and the unittest summary, yielding tests_failed=False, transient_only=True as asserted.

One small follow-up worth considering (optional): also add a negative test where FAILED (errors=1) appears with a real ERROR: traceback but no transient marker, asserting tests_failed=True. test_pure_generic_bug_blocks_retry covers the spirit of this, but a variant that explicitly contains the FAILED (errors=N) summary would lock in the contract that the summary line alone never flips classification either way.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/actions/bot-ci-failure/test_analyze_failure.py around lines 341 -
354, Add a negative regression test to ensure a unittest summary "FAILED
(errors=1)" does not cancel a real error classification when there is a
non-transient "ERROR:" traceback; create a new test (e.g.,
test_unittest_failed_with_real_error_blocks_retry) that builds a log string
containing a real "ERROR:"/traceback (no transient WebDriverException markers)
plus the "FAILED (errors=1)" summary, call process_error_logs and assert
tests_failed is True and transient_only is False; reference the existing
test_transient_ignores_unittest_failed_summary and the flags
has_strict_failure/is_transient in analyze_failure.py as the behavior you are
locking in.
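The suggested negative test might look like the sketch below. It is self-contained for illustration only: the process_error_logs stand-in merely approximates the marker behavior described in this PR and is not the real analyze_failure.py implementation, and the test names and log lines are made up.

```python
import unittest

# Stand-in for analyze_failure.process_error_logs, approximating the
# described marker behavior so this sketch runs on its own.
STRICT = ["FAIL:", "AssertionError"]
TRANSIENT = ["selenium.common.exceptions.WebDriverException", "about:neterror"]


def process_error_logs(content):
    has_strict_failure = any(m in content for m in STRICT)
    is_transient = any(m in content for m in TRANSIENT)
    return content, has_strict_failure, is_transient and not has_strict_failure


class TestNegativeRegression(unittest.TestCase):
    def test_unittest_failed_with_real_error_blocks_retry(self):
        """A real ERROR: traceback plus the 'FAILED (errors=1)' summary
        must still classify as a genuine failure (no auto-retry)."""
        content = (
            "ERROR: test_device_create (sample_app.tests.TestDevice)\n"
            "Traceback (most recent call last):\n"
            "AssertionError: expected 200, got 500\n"
            "FAILED (errors=1)\n"
        )
        text, tests_failed, transient_only = process_error_logs(content)
        self.assertTrue(tests_failed)
        self.assertFalse(transient_only)
```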
