Skip to content

Increase Fedora VM startup timeout to 20 minutes#2184

Merged
openshift-merge-bot[bot] merged 1 commit into
openshift:oadp-devfrom
kaovilai:fix/kubevirt-disk-timeout
May 5, 2026
Merged

Increase Fedora VM startup timeout to 20 minutes#2184
openshift-merge-bot[bot] merged 1 commit into
openshift:oadp-devfrom
kaovilai:fix/kubevirt-disk-timeout

Conversation

@kaovilai
Copy link
Copy Markdown
Member

@kaovilai kaovilai commented Apr 30, 2026

Why the changes were made

Fixes #2183

The Fedora todolist VM test clones a 30Gi DataVolume, which can take longer than the previous hardcoded 10-minute startup timeout. In the failing CI run, the VM reached "Running" state just 4 seconds after the timeout expired, causing a test flake.

This PR:

  • Adds a configurable StartupTimeout field to VmBackupRestoreCase
  • Sets 20 minutes for the Fedora todolist test (large disk)
  • Defaults to 10 minutes for other tests (CirrOS, small disk)

How to test the changes made

  • Run kubevirt e2e tests: TEST_VIRT=true make test-e2e
  • Verify Fedora todolist test passes with the increased timeout
  • Verify CirrOS tests still use the default 10-minute timeout

Note

Responses generated with Claude

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Enhanced VM backup and restore test reliability with configurable startup timeout settings per test case.

30Gi DataVolume cloning can exceed the previous 10-minute hardcoded
timeout, causing test flakes where the VM starts seconds after timeout.
Add configurable StartupTimeout field to VmBackupRestoreCase and set
20 minutes for the Fedora todolist test. Other tests default to 10m.

Closes: openshift#2183

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Copilot AI review requested due to automatic review settings April 30, 2026 18:36
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 30, 2026

Walkthrough

A configurable StartupTimeout field is added to the VmBackupRestoreCase struct to allow per-test VM startup timeouts. The polling logic for VM status is updated to use this timeout, defaulting to 10 minutes. The Fedora todolist test case now specifies a 20-minute timeout to accommodate slower DataVolume provisioning.

Changes

Cohort / File(s) Summary
VM Startup Timeout Configuration
tests/e2e/virt_backup_restore_suite_test.go
Added StartupTimeout field to VmBackupRestoreCase struct and updated runVmBackupAndRestore to apply this timeout when polling VM status until Running. Fedora todolist test case configured with 20-minute timeout instead of the 10-minute default.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 9 | ❌ 3

❌ Failed checks (3 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Microshift Test Compatibility ⚠️ Warning 8 new Ginkgo test entries use KubeVirt APIs (kubevirt.io, cdi.kubevirt.io, hco.kubevirt.io) and OLM resources unavailable on MicroShift without protective mechanisms. Add [apigroup:kubevirt.io] tags or [Skipped:MicroShift] labels to exclude tests from MicroShift CI.
Ipv6 And Disconnected Network Test Compatibility ⚠️ Warning PR adds new e2e test file with ginkgo test cases requiring external internet connectivity to download.cirros-cloud.net Use pre-cached or internally-available CirrOS image, add conditional fallback for disconnected environments, or add [Skipped:Disconnected] annotation
✅ Passed checks (9 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: increasing Fedora VM startup timeout from 10 to 20 minutes, which is directly reflected in the code changes.
Description check ✅ Passed The description is well-structured, follows the template, explains the root cause, implementation approach, and provides testing instructions.
Linked Issues check ✅ Passed The PR directly addresses issue #2183 by implementing the primary recommended action: increasing the VM startup timeout to 20 minutes for large DataVolume tests.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the timeout issue. The PR adds configurable StartupTimeout field and applies it appropriately without introducing unrelated modifications.
Stable And Deterministic Test Names ✅ Passed All Ginkgo test names are stable and deterministic, using static strings that clearly describe test validation without dynamic information like timestamps or generated identifiers.
Test Structure And Quality ✅ Passed Pull request maintains solid test structure with proper Ginkgo patterns, timeouts, and backward compatibility.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR modifies timeout configurations for existing tests without adding new test definitions, so SNO compatibility check is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only test code to add configurable VM startup timeout. No Kubernetes scheduling constraints, deployment manifests, or operator controller code changes.
Ote Binary Stdout Contract ✅ Passed The PR does not violate the OTE Binary Stdout Contract. It only adds configuration fields to a struct and uses them conditionally within test execution code. Existing logging uses Go's standard log package which defaults to stderr, not stdout, and the PR introduces no new stdout writes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Review rate limit: 9/10 reviews remaining, refill in 6 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

@kaovilai kaovilai added AI-assisted A label to indicate that AI models were used in full or in part to create this PR. ai-generated-test ai-gen-bugfix labels Apr 30, 2026
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 30, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces flakes in the KubeVirt VM e2e suite by making the “VM startup” wait configurable per test case, allowing slower-provisioning VMs (notably Fedora with a large cloned DataVolume) to have a longer startup window.

Changes:

  • Added a StartupTimeout field to VmBackupRestoreCase and used it when waiting for a VM to reach Running.
  • Defaulted the startup timeout to 10 minutes when StartupTimeout is not set.
  • Set the Fedora todolist VM test to use a 20-minute startup timeout.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/e2e/virt_backup_restore_suite_test.go (1)

102-106: 🏗️ Heavy lift

Add a DataVolume-ready gate before polling VM Running for clearer failures.

Line 106 currently waits only on VM state, so slow clone/import paths still surface as generic startup timeouts. A pre-check for DataVolume Succeeded (when the VM uses a DataVolume) would make this path much less flaky and far easier to diagnose.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/virt_backup_restore_suite_test.go` around lines 102 - 106, Before
polling for VM Running, add a pre-check that, if the test VM uses a DataVolume,
waits until that DataVolume reaches phase "Succeeded" (or returns an error if it
fails) to avoid surfacing slow clone/imports as generic VM startup timeouts;
locate the VM startup block around wait.PollUntilContextTimeout and, using the
same startupTimeout and a 10s poll interval, poll the DataVolume object (via the
CDI client used elsewhere in the test) by name/namespace and only proceed to the
existing VM Running poll once DataVolume.Status.Phase == "Succeeded" (or fail
early on an error phase), ensuring you reference brCase to get the VM spec/info
and reuse the existing context and timeout variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/e2e/virt_backup_restore_suite_test.go`:
- Around line 102-106: Before polling for VM Running, add a pre-check that, if
the test VM uses a DataVolume, waits until that DataVolume reaches phase
"Succeeded" (or returns an error if it fails) to avoid surfacing slow
clone/imports as generic VM startup timeouts; locate the VM startup block around
wait.PollUntilContextTimeout and, using the same startupTimeout and a 10s poll
interval, poll the DataVolume object (via the CDI client used elsewhere in the
test) by name/namespace and only proceed to the existing VM Running poll once
DataVolume.Status.Phase == "Succeeded" (or fail early on an error phase),
ensuring you reference brCase to get the VM spec/info and reuse the existing
context and timeout variables.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c8e92d65-0c38-4477-a32a-c06d33c15ecc

📥 Commits

Reviewing files that changed from the base of the PR and between 36d616e and 3af1c02.

📒 Files selected for processing (1)
  • tests/e2e/virt_backup_restore_suite_test.go

@kaovilai
Copy link
Copy Markdown
Member Author

/cherry-pick oadp-1.6

@openshift-cherrypick-robot
Copy link
Copy Markdown
Contributor

@kaovilai: once the present PR merges, I will cherry-pick it on top of oadp-1.6 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick oadp-1.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 30, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai, shubham-pampattiwar, weshayutin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 30, 2026

@kaovilai: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.23-e2e-test-aws 3af1c02 link false /test 4.23-e2e-test-aws

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@kaovilai kaovilai added the lgtm Indicates that a PR is ready to be merged. label May 5, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit d18cc6c into openshift:oadp-dev May 5, 2026
18 of 19 checks passed
@openshift-cherrypick-robot
Copy link
Copy Markdown
Contributor

@kaovilai: new pull request created: #2195

Details

In response to this:

/cherry-pick oadp-1.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI-assisted A label to indicate that AI models were used in full or in part to create this PR. ai-gen-bugfix ai-generated-test approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

e2e Kubevirt disk timeout issues

5 participants