SPLAT-2715: Fixed broken test and simplified some logic#182

Merged
openshift-merge-bot[bot] merged 2 commits into openshift:main from vr4manta:SPLAT-2715
Apr 16, 2026

Conversation

@vr4manta
Contributor

@vr4manta vr4manta commented Apr 6, 2026

SPLAT-2715

Changes

  • Removed the unneeded check for the host ID status field, since BYO does not populate it
  • Simplified machine cleanup logic.
  • Changed By() logic to use GinkgoWriter for log messages with variables

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 6, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 6, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 6, 2026

@vr4manta: This pull request references SPLAT-2715 which is a valid jira issue.

Details

In response to this:

SPLAT-2715

Changes

  • Removed unneeded check for host id status field since BYO does not populate it
  • Simplified machine clean up logic.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci Bot commented Apr 6, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Apr 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

The dedicated-host e2e tests no longer use context-level and per-test deferred cleanup; each test now registers cleanupMachineAndNode(...) through DeferCleanup. waitForMachineDedicatedHostID(...) now returns the discovered dedicated host ID and asserts equality only when an expected ID is provided. The dynamic test retrieves the host ID via the modified wait, calls cleanup before verifying host state transitions, and treats "host not found" and invalid-host-type errors as acceptable.

Changes

Cohort / File(s): Dedicated host e2e test (openshift-tests/test/e2e/dedicated_hosts.go)
Summary: Removed context-level AfterEach and per-test defer cleanup blocks. Added cleanupMachineAndNode(...) and registered per-test cleanup via DeferCleanup immediately after machine creation. Changed waitForMachineDedicatedHostID(...) to return the discovered host ID and compare only when expectedHostID is provided. The BYO test no longer waits for provider host ID population. The dynamic test now retrieves the host ID with the new wait, invokes cleanup before asserting host release, and treats "not found"/invalid-host-type errors (checked via strings.Contains) as success while continuing to retry on other errors.
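
The "return the discovered ID, compare only when an expected ID is given" behaviour described above can be sketched without Ginkgo or the Machine API. This is a minimal standard-library illustration, not the code from the PR: waitForHostID, its lookup callback, and the host ID "h-0123" are hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForHostID polls lookup until it reports a non-empty host ID or the
// timeout elapses. When expected is non-empty the discovered ID must match it;
// when expected is empty any populated ID is accepted and returned.
// (Hypothetical helper for illustration; not the PR's implementation.)
func waitForHostID(lookup func() (string, error), expected string, timeout, interval time.Duration) (string, error) {
	deadline := time.Now().Add(timeout)
	for {
		id, err := lookup()
		if err == nil && id != "" {
			if expected != "" && id != expected {
				return id, fmt.Errorf("host ID %q does not match expected %q", id, expected)
			}
			return id, nil
		}
		// Lookup errors and empty IDs are both treated as "not ready yet".
		if time.Now().After(deadline) {
			return "", errors.New("timed out waiting for dedicated host ID")
		}
		time.Sleep(interval)
	}
}

func main() {
	calls := 0
	// Simulated lookup: the ID is only populated on the third poll.
	lookup := func() (string, error) {
		calls++
		if calls < 3 {
			return "", nil
		}
		return "h-0123", nil
	}

	// Empty expected value: discover the ID and hand it back to the caller.
	id, err := waitForHostID(lookup, "", time.Second, time.Millisecond)
	fmt.Println(id, err)
}
```

Returning the ID lets a caller that does not know the host ID up front (the dynamic case) reuse the same wait as a caller that does.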

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Comment @coderabbitai help to get the list of available commands and usage tips.

@vr4manta
Contributor Author

vr4manta commented Apr 6, 2026

/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-1of2
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-2of2

@openshift-ci
Contributor

openshift-ci Bot commented Apr 6, 2026

@vr4manta: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-1of2
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-2of2

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/83ed4120-31bd-11f1-8ce9-02ef516c0ad0-0

@vr4manta
Contributor Author

vr4manta commented Apr 6, 2026

/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-1of2
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-2of2

@openshift-ci
Contributor

openshift-ci Bot commented Apr 6, 2026

@vr4manta: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-1of2
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-2of2

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d9c12930-31e6-11f1-8710-50ce8d8e61a0-0

@vr4manta
Contributor Author

vr4manta commented Apr 8, 2026

/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-2of2

@openshift-ci
Contributor

openshift-ci Bot commented Apr 8, 2026

@vr4manta: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-2of2

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d2767c60-3340-11f1-9415-ca7b3f77fd00-0

@vr4manta vr4manta marked this pull request as ready for review April 8, 2026 16:59
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 8, 2026
@openshift-ci openshift-ci Bot requested review from chrischdi and racheljpg April 8, 2026 16:59

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openshift-tests/test/e2e/dedicated_hosts.go`:
- Around line 520-528: The current Eventually block masks all errors by
returning true whenever getDedicatedHostState(ctx, ec2Client, dynamicHostID)
returns an error; update the error handling so you only treat a "host not found"
/ "already deleted" error as a success and propagate or fail on any other
errors: call getDedicatedHostState, if err indicates a not-found (inspect error
type/message), log the not-found case and return true, but for other errors
log/return false (or surface the error via Expect/Fail) so transient
network/perm issues don't make the test pass; keep the state check for state ==
"released" || state == "pending" and reference getDedicatedHostState, ec2Client
and dynamicHostID when making the change.
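
One way to read this finding is as a three-way classification of each polling attempt: a confirmed not-found error counts as success, any other error counts as "not yet", and otherwise the state decides. The sketch below uses only the standard library; hostReleased, isHostGone, the matched substrings, and the sample error values are assumptions for illustration, not the helpers or error strings used in the PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Sample errors for the demonstration. The message texts are assumed,
// not captured from a real DescribeHosts call.
var (
	errHostGone  = errors.New("InvalidHostID.NotFound: the host ID does not exist")
	errThrottled = errors.New("RequestLimitExceeded: request limit exceeded")
)

// isHostGone reports whether err looks like a "host no longer exists" error.
// Substring matching mirrors the strings.Contains approach mentioned in the
// review; the exact substrings here are assumptions.
func isHostGone(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, "NotFound") || strings.Contains(msg, "not found")
}

// hostReleased evaluates one polling attempt. It returns true only when the
// host is confirmed gone or has reached an accepted state; every other error
// is logged and reported as false so the caller keeps polling.
func hostReleased(state string, err error) bool {
	if err != nil {
		if isHostGone(err) {
			return true
		}
		fmt.Printf("unexpected error while checking host state, will retry: %v\n", err)
		return false
	}
	return state == "released" || state == "pending"
}

func main() {
	fmt.Println(hostReleased("", errHostGone))
	fmt.Println(hostReleased("", errThrottled))
	fmt.Println(hostReleased("released", nil))
	fmt.Println(hostReleased("available", nil))
}
```

Matching on message text is fragile; where the SDK exposes typed error codes, checking those would likely be more robust.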
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 004ce3d7-c30e-4de0-8e38-613e5be6f829

📥 Commits

Reviewing files that changed from the base of the PR and between 37a0672 and 819295c.

📒 Files selected for processing (1)
  • openshift-tests/test/e2e/dedicated_hosts.go

Comment thread openshift-tests/test/e2e/dedicated_hosts.go

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
openshift-tests/test/e2e/dedicated_hosts.go (1)

521-534: ⚠️ Potential issue | 🟠 Major

Keep the Eventually poll retryable for non-not-found AWS errors.

Fail(...) inside the polling callback aborts the check on the first transient DescribeHosts error. That makes this release verification flaky under throttling/network blips instead of retrying until timeout. Log and return false for retryable errors; only treat confirmed not-found as success.
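
The difference between aborting and retrying can be shown with a small simulation. This is a hedged, framework-free sketch: pollUntil stands in for the retry behaviour of Eventually, and flakyDescribe is a hypothetical fake that fails twice with a throttling-style error before reporting "released". Neither name comes from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// pollUntil calls check every interval until it returns true or the timeout
// passes. It is a simplified stand-in for a polling assertion.
func pollUntil(check func() bool, timeout, interval time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if check() {
			return true
		}
		if !time.Now().Before(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

// flakyDescribe simulates a host-state lookup that returns a transient error
// for the first `failures` calls and then reports the host as released.
type flakyDescribe struct {
	failures int
	calls    int
}

func (f *flakyDescribe) state() (string, error) {
	f.calls++
	if f.calls <= f.failures {
		return "", errors.New("RequestLimitExceeded: simulated throttling")
	}
	return "released", nil
}

func main() {
	d := &flakyDescribe{failures: 2}
	ok := pollUntil(func() bool {
		state, err := d.state()
		if err != nil {
			// Log and return false: the poll continues instead of aborting.
			fmt.Printf("attempt %d: transient error, retrying: %v\n", d.calls, err)
			return false
		}
		return state == "released"
	}, time.Second, time.Millisecond)
	fmt.Println("released:", ok, "attempts:", d.calls)
}
```

Had the callback aborted on the first error, the two simulated throttling responses would have failed the check even though the host was released one poll later.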

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openshift-tests/test/e2e/dedicated_hosts.go` around lines 521 - 534, The
polling callback passed to Eventually aborts on any non-not-found error because
it calls Fail(...) inside the closure; change the logic in the closure that
calls getDedicatedHostState(ctx, ec2Client, dynamicHostID) so that only
recognized "not found" errors (strings.Contains checks) return true, while all
other errors are logged with GinkgoWriter.Printf (including error details and
dynamicHostID) and the closure returns false to allow Eventually to retry until
timeout; remove the Fail(...) call from this callback so transient DescribeHosts
errors don't abort the poll.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openshift-tests/test/e2e/dedicated_hosts.go`:
- Around line 893-920: The code captures nodeName from m.Status.NodeRef only
once before deletion, so if NodeRef is unset the helper skips waiting for node
removal and a late-registering node can leak; update the cleanup in the
dedicated hosts test to, when nodeName is empty, poll the Machine resource
(using client.Machines in machineutil.MachineAPINamespace and the machineName)
after deletion succeeds to detect if/when Status.NodeRef becomes non-nil (or
stop polling if the Machine itself becomes NotFound), set nodeName from
m.Status.NodeRef.Name once observed, and then proceed to call
kubeClient.CoreV1().Nodes().Get in the existing Eventually block to wait for the
node to be removed; ensure you still handle apierrors.IsNotFound for both
machine and node cases.
- Around line 396-397: Register the teardown immediately after a successful
Create instead of only at the end of the spec: right after the machine creation
succeeds, call/attach cleanupMachineAndNode(ctx, kubeConfig, kubeClient,
machineName) via the test framework's cleanup registration (e.g.,
t.Cleanup/DeferCleanup/framework.RegisterCleanupAction) so it will run on test
failures; apply the same pattern for the dedicated-host creation(s) that
currently only get cleaned at the end so both machine/node and dedicated-host
cleanups run even on early Expect failures.
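
The ordering concern in the second item (register teardown right after Create, not at the end of the spec) can be illustrated with a tiny stand-in for a cleanup registry. This is a sketch under stated assumptions: cleanupStack and runSpec are hypothetical and only mimic the idea behind DeferCleanup; they are not Ginkgo's implementation or the PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

// cleanupStack is a hypothetical, framework-free stand-in for a test
// framework's cleanup registry. Registered functions run in reverse order.
type cleanupStack struct {
	fns []func()
}

func (c *cleanupStack) register(fn func()) { c.fns = append(c.fns, fn) }

func (c *cleanupStack) run() {
	for i := len(c.fns) - 1; i >= 0; i-- {
		c.fns[i]()
	}
}

// runSpec simulates a spec that creates a machine, registers its teardown
// immediately, and then fails an early check. Because the teardown was
// registered before the failure, it still runs.
func runSpec(log *[]string) error {
	cleanups := &cleanupStack{}
	defer cleanups.run()

	*log = append(*log, "create machine")
	cleanups.register(func() { *log = append(*log, "delete machine") })

	// Simulated early failure, e.g. the machine never reached Running.
	return errors.New("machine never reached Running")
}

func main() {
	var log []string
	err := runSpec(&log)
	fmt.Println(log, err)
}
```

If the registration were placed after the failing check instead, the simulated machine would be left behind, which is the leak the review comment describes.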


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fbbfef5b-ac86-47ca-9ee9-ce67486a83cd

📥 Commits

Reviewing files that changed from the base of the PR and between 819295c and 60263b9.

📒 Files selected for processing (1)
  • openshift-tests/test/e2e/dedicated_hosts.go

Comment thread openshift-tests/test/e2e/dedicated_hosts.go Outdated
Comment thread openshift-tests/test/e2e/dedicated_hosts.go

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
openshift-tests/test/e2e/dedicated_hosts.go (1)

907-933: ⚠️ Potential issue | 🟠 Major

Cleanup can stall before deletion when NodeRef never appears.

At Line 907, cleanup waits for Status.NodeRef before deleting the Machine. If the node never registers, this times out and Line 930 is never reached, leaving the Machine undeleted and potentially blocking host release checks.

Suggested fix
-	// If nodeName is empty, poll the machine to detect late-registering nodes
-	if nodeName == "" {
-		By(fmt.Sprintf("Node ref not set initially, polling machine %s for late-registering node", machineName))
-		Eventually(func() bool {
-			m, err := client.Machines(machineutil.MachineAPINamespace).Get(ctx, machineName, metav1.GetOptions{})
-			if apierrors.IsNotFound(err) {
-				// Machine is gone, no node registered
-				return true
-			}
-			if err != nil {
-				// Other error, continue polling
-				return false
-			}
-			if m.Status.NodeRef != nil {
-				nodeName = m.Status.NodeRef.Name
-				return true
-			}
-			return false
-		}, defaultTestTimeout, defaultPollingInterval).Should(BeTrue())
-	}
-
 	// Delete the machine
 	By(fmt.Sprintf("Cleaning up test machine %s", machineName))
 	err = client.Machines(machineutil.MachineAPINamespace).Delete(ctx, machineName, metav1.DeleteOptions{})
 	if err != nil && !apierrors.IsNotFound(err) {
 		Expect(err).NotTo(HaveOccurred())
 	}
 
-	// Wait for machine to be deleted
+	// Wait for machine deletion and capture NodeRef if it appears before final removal.
 	By(fmt.Sprintf("Waiting for machine %s to be deleted", machineName))
 	Eventually(func() bool {
-		_, err := client.Machines(machineutil.MachineAPINamespace).Get(ctx, machineName, metav1.GetOptions{})
-		return apierrors.IsNotFound(err)
+		m, err := client.Machines(machineutil.MachineAPINamespace).Get(ctx, machineName, metav1.GetOptions{})
+		if apierrors.IsNotFound(err) {
+			return true
+		}
+		if err != nil {
+			return false
+		}
+		if nodeName == "" && m.Status.NodeRef != nil {
+			nodeName = m.Status.NodeRef.Name
+		}
+		return false
 	}, defaultTestTimeout, defaultPollingInterval).Should(BeTrue())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openshift-tests/test/e2e/dedicated_hosts.go` around lines 907 - 933, The
current polling loop that waits for m.Status.NodeRef (when nodeName == "" in the
block using client.Machines(...).Get and Eventually) can hang the cleanup if the
node never registers; change the logic so the test does not block deletion:
perform a bounded/non-blocking poll for NodeRef (use a short timeout or context)
and then proceed to call client.Machines(...).Delete for machineName regardless
of whether nodeName was discovered, ensuring the Delete is always attempted even
if m.Status.NodeRef remained nil.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2ab38731-aa57-44fe-b513-16b9dbab39c1

📥 Commits

Reviewing files that changed from the base of the PR and between 60263b9 and 59517de.

📒 Files selected for processing (1)
  • openshift-tests/test/e2e/dedicated_hosts.go


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openshift-tests/test/e2e/dedicated_hosts.go`:
- Around line 539-541: The code calls Fail(fmt.Sprintf(...)) inside the
Eventually check which aborts the test instead of allowing retries; change the
Fail(...) call in the block that checks host state to log the unexpected error
(e.g., using framework.Logf, GinkgoWriter, or t.Logf) and then return false so
Eventually can retry on transient AWS API errors — locate the block referencing
dynamicHostID and replace the Fail(...) invocation with a non-fatal log plus
return false.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3d903dd0-03d2-480f-bdd1-99bc1ecb4730

📥 Commits

Reviewing files that changed from the base of the PR and between 59517de and e613b26.

📒 Files selected for processing (1)
  • openshift-tests/test/e2e/dedicated_hosts.go

Comment thread openshift-tests/test/e2e/dedicated_hosts.go Outdated
Contributor

@mtulio mtulio left a comment

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 9, 2026
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 10, 2026
@vr4manta
Contributor Author

/retest

@nrb
Contributor

nrb commented Apr 14, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 14, 2026
@vr4manta
Contributor Author

/retest

@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 15, 2026

@vr4manta: This pull request references SPLAT-2715 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead.

Details

In response to this:

SPLAT-2715

Changes

  • Removed unneeded check for host id status field since BYO does not populate it
  • Simplified machine clean up logic.
  • Changed By() logic to use GinkgoWriter for log messages w/ variables

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Member

@damdo damdo left a comment

Thanks

/approve

One Q.

/assign @nrb

Comment thread openshift-tests/test/e2e/dedicated_hosts.go
@openshift-ci
Contributor

openshift-ci Bot commented Apr 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damdo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 15, 2026
@damdo
Member

damdo commented Apr 15, 2026

@vr4manta this looks ready to merge. Feel free to add the verified label once you are happy with the test working as expected.

@vr4manta
Contributor Author

/verified by @vr4manta

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Apr 15, 2026
@openshift-ci-robot
Contributor

@vr4manta: This PR has been marked as verified by @vr4manta.

Details

In response to this:

/verified by @vr4manta

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@nrb
Contributor

nrb commented Apr 15, 2026

A lot of the failed e2e tests are from sig-node; is there a known issue with the test suite that's causing this?

@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 59e6648 and 2 for PR HEAD 106e29e in total

@openshift-ci
Contributor

openshift-ci Bot commented Apr 16, 2026

@vr4manta: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit cba0ea7 into openshift:main Apr 16, 2026
14 checks passed
@vr4manta
Contributor Author

/cherry-pick release-4.22

@openshift-cherrypick-robot

@vr4manta: new pull request created: #184

Details

In response to this:

/cherry-pick release-4.22

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria
