NO-JIRA: Fix flaky observability and telemetry test teardown #6578
Add retry logic to Loki queries in the observability tests to handle the race condition between the OTEL collector restart and Loki data ingestion.

Add a healthcheck to the telemetry suite teardown to prevent vg-manager CrashLoopBackOff from accumulating across rapid MicroShift restarts, which caused intermittent storage test failures in shared scenarios.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pre-commit.check-secrets: ENABLED
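As the description frames it, the teardown healthcheck lets MicroShift settle before the next restart so vg-manager failures do not pile up. A minimal Python sketch of that "block until healthy or time out" idea follows. `wait_until_healthy` and the `check_healthy` callable are hypothetical stand-ins for whatever the suite's healthcheck keyword actually runs, and the 300 s timeout and 5 s poll interval are assumptions, not values taken from the PR.

```python
import time
from typing import Callable


def wait_until_healthy(
    check_healthy: Callable[[], bool],
    timeout: float = 300.0,   # assumed overall deadline, in seconds
    interval: float = 5.0,    # assumed pause between polls, in seconds
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll check_healthy() until it returns True or `timeout` seconds elapse.

    Hypothetical model of a teardown healthcheck wait, not the suite's code.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if check_healthy():
            return  # healthy: the next suite may restart MicroShift
        if clock() >= deadline:
            raise TimeoutError(f"not healthy after {timeout} s")
        sleep(interval)  # let pods such as vg-manager recover before polling again
```

Using a monotonic clock keeps the deadline immune to wall-clock adjustments on the test host.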
@agullon: This pull request explicitly references no jira issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Walkthrough: Two Robot Framework test suites were updated with resilience improvements. The observability suite now retries Loki query verification up to 10 times over 50 seconds. The telemetry suite adds an ostree-health resource helper and waits for the MicroShift healthcheck in its teardown sequence.
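The Loki retry is a count-based retry: run the query, and if the expected logs are not there yet, wait and try again. In Robot Framework this is commonly written with the BuiltIn keyword `Wait Until Keyword Succeeds` (a retry count such as `10x` plus an interval such as `5s`); whether the suite uses exactly that keyword is an assumption. The Python sketch below models the same behavior with ten attempts spaced 5 s apart, one way to read "up to 10 times over 50 seconds". The `retry` helper is hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    attempts: int = 10,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn() up to `attempts` times, sleeping `interval` seconds between failures.

    Returns the first successful result and re-raises the last error if every
    attempt fails. Hypothetical model of the Loki query retry, not suite code.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as err:  # e.g. the expected log line is not in Loki yet
            last_error = err
            if attempt < attempts:
                sleep(interval)  # give the collector time to ship logs to Loki
    assert last_error is not None
    raise last_error
```

With these defaults, a query that never succeeds fails after ten attempts and nine 5 s waits (about 45 s of sleeping plus query time), and the final error is what the test reports.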
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 12 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/suites/telemetry/telemetry.robot`:
- Around lines 112-116: The teardown currently calls "Wait For MicroShift Healthcheck Success" directly, which can abort the teardown when the healthcheck fails. Update the teardown to run the healthcheck with continue-on-failure semantics (e.g., wrap "Wait For MicroShift Healthcheck Success" in a keyword such as "Run Keyword And Ignore Error" or "Run Keyword And Continue On Failure") so that "Logout MicroShift Host" and "Remove Kubeconfig" always execute. Modify the Teardown sequence to call the healthcheck via that wrapper while leaving "Logout MicroShift Host" and "Remove Kubeconfig" unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: df48e8af-9f98-48cf-8dac-80204ac18a81
📒 Files selected for processing (2)
test/suites/optional/observability.robot
test/suites/telemetry/telemetry.robot
/cherrypick release-4.22
@agullon: once the present PR merges, I will cherry-pick it on top of the branch requested above.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: agullon, kasturinarra.
/verified by CI
@agullon: This PR has been marked as verified.
/retest
/retest
/retest
@agullon: all tests passed!
@agullon: new pull request created: #6579
Summary
- … vg-manager CrashLoopBackOff from accumulating across rapid MicroShift restarts

Test plan
- el98-lrel@optional: observability Loki tests pass consistently
- el98-lrel@storage-telemetry: storage reboot test passes consistently

🤖 Generated with Claude Code