Increase Fedora VM startup timeout to 20 minutes #2184
30Gi DataVolume cloning can exceed the previous 10-minute hardcoded timeout, causing test flakes where the VM starts seconds after timeout. Add configurable StartupTimeout field to VmBackupRestoreCase and set 20 minutes for the Fedora todolist test. Other tests default to 10m.

Closes: openshift#2183

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Pull request overview
This PR reduces flakes in the KubeVirt VM e2e suite by making the “VM startup” wait configurable per test case, allowing slower-provisioning VMs (notably Fedora with a large cloned DataVolume) to have a longer startup window.
Changes:
- Added a `StartupTimeout` field to `VmBackupRestoreCase` and used it when waiting for a VM to reach `Running`.
- Defaulted the startup timeout to 10 minutes when `StartupTimeout` is not set.
- Set the Fedora todolist VM test to use a 20-minute startup timeout.
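The defaulting behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: only the `StartupTimeout` field name and the `VmBackupRestoreCase` type name come from the PR; the `Name` field, the `defaultStartupTimeout` constant, and the `effectiveStartupTimeout` helper are hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"time"
)

// defaultStartupTimeout mirrors the 10-minute default described in the PR.
// The constant name is an assumption made for this sketch.
const defaultStartupTimeout = 10 * time.Minute

// VmBackupRestoreCase is a trimmed-down stand-in for the real test case
// struct. Only the StartupTimeout field is taken from the PR.
type VmBackupRestoreCase struct {
	Name           string        // hypothetical field, for illustration only
	StartupTimeout time.Duration // zero value means "use the default"
}

// effectiveStartupTimeout (hypothetical helper) returns the per-case
// timeout when one is set and falls back to the default otherwise.
func effectiveStartupTimeout(c VmBackupRestoreCase) time.Duration {
	if c.StartupTimeout > 0 {
		return c.StartupTimeout
	}
	return defaultStartupTimeout
}

func main() {
	fedora := VmBackupRestoreCase{Name: "fedora-todolist", StartupTimeout: 20 * time.Minute}
	other := VmBackupRestoreCase{Name: "other-vm"}

	fmt.Println(fedora.Name, effectiveStartupTimeout(fedora)) // fedora-todolist 20m0s
	fmt.Println(other.Name, effectiveStartupTimeout(other))   // other-vm 10m0s
}
```

Treating the zero value as "unset" keeps existing test cases unchanged, since they never populate the new field.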
🧹 Nitpick comments (1)
tests/e2e/virt_backup_restore_suite_test.go (1)
102-106: 🏗️ Heavy lift
Add a DataVolume-ready gate before polling VM `Running` for clearer failures.
Line 106 currently waits only on VM state, so slow clone/import paths still surface as generic startup timeouts. A pre-check for DataVolume `Succeeded` (when the VM uses a DataVolume) would make this path much less flaky and far easier to diagnose.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/virt_backup_restore_suite_test.go` around lines 102 - 106, Before polling for VM Running, add a pre-check that, if the test VM uses a DataVolume, waits until that DataVolume reaches phase "Succeeded" (or returns an error if it fails) to avoid surfacing slow clone/imports as generic VM startup timeouts; locate the VM startup block around wait.PollUntilContextTimeout and, using the same startupTimeout and a 10s poll interval, poll the DataVolume object (via the CDI client used elsewhere in the test) by name/namespace and only proceed to the existing VM Running poll once DataVolume.Status.Phase == "Succeeded" (or fail early on an error phase), ensuring you reference brCase to get the VM spec/info and reuse the existing context and timeout variables.
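The suggested two-stage wait could look roughly like the sketch below. It is an assumption-laden illustration rather than the suite's code: the real test polls with `wait.PollUntilContextTimeout` and would read the DataVolume through a CDI client, whereas this version uses a stdlib-only `pollUntil` loop and plain functions that return phase strings. The names `pollUntil`, `waitForVM`, and `simulate` are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// condition reports done=true once the awaited state is reached.
// A non-nil error aborts the poll immediately.
type condition func(ctx context.Context) (done bool, err error)

// pollUntil is a stdlib-only stand-in for a Kubernetes polling helper.
// It checks cond right away, then once per interval until timeout.
func pollUntil(ctx context.Context, interval, timeout time.Duration, cond condition) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		done, err := cond(ctx)
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// waitForVM gates on the DataVolume phase before polling the VM status, so
// a slow or failed clone is reported as a DataVolume problem instead of a
// generic VM startup timeout. dvPhase and vmStatus are hypothetical getters.
func waitForVM(ctx context.Context, interval, timeout time.Duration, dvPhase, vmStatus func() string) error {
	err := pollUntil(ctx, interval, timeout, func(context.Context) (bool, error) {
		switch dvPhase() {
		case "Succeeded":
			return true, nil
		case "Failed":
			return false, errors.New("DataVolume entered Failed phase")
		}
		return false, nil
	})
	if err != nil {
		return fmt.Errorf("waiting for DataVolume: %w", err)
	}
	err = pollUntil(ctx, interval, timeout, func(context.Context) (bool, error) {
		return vmStatus() == "Running", nil
	})
	if err != nil {
		return fmt.Errorf("waiting for VM Running: %w", err)
	}
	return nil
}

// simulate replays a fixed sequence of DataVolume phases (the last one
// repeats) against a VM that is Running as soon as it is polled.
func simulate(phases []string) error {
	i := 0
	dvPhase := func() string {
		p := phases[i]
		if i < len(phases)-1 {
			i++
		}
		return p
	}
	vmStatus := func() string { return "Running" }
	return waitForVM(context.Background(), 5*time.Millisecond, 200*time.Millisecond, dvPhase, vmStatus)
}

func main() {
	fmt.Println(simulate([]string{"CloneInProgress", "CloneInProgress", "Succeeded"}))
	fmt.Println(simulate([]string{"CloneInProgress", "Failed"}))
}
```

In this sketch each stage gets the full timeout; a real implementation might instead share one deadline across both stages so the total wait stays within the configured startup timeout.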
/cherry-pick oadp-1.6
@kaovilai: once the present PR merges, I will cherry-pick it on top of the requested branch.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: kaovilai, shubham-pampattiwar, weshayutin
Merged commit d18cc6c into openshift:oadp-dev
@kaovilai: new pull request created: #2195
Why the changes were made
Fixes #2183
The Fedora todolist VM test clones a 30Gi DataVolume, which can take longer than the previous hardcoded 10-minute startup timeout. In the failing CI run, the VM reached "Running" state just 4 seconds after the timeout expired, causing a test flake.
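This PR:
- Adds a configurable `StartupTimeout` field to `VmBackupRestoreCase`
- Sets a 20-minute startup timeout for the Fedora todolist test
- Keeps the 10-minute default for other tests
How to test the changes made
TEST_VIRT=true make test-e2e
Note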
Responses generated with Claude
🤖 Generated with Claude Code