SPLAT-2715: Fixed broken test and simplified some logic #182
openshift-merge-bot[bot] merged 2 commits into openshift:main from
Conversation
|
@vr4manta: This pull request references SPLAT-2715 which is a valid jira issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough: Replaced context-level and per-test deferred cleanup in the dedicated-host e2e tests with per-test cleanup calls.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-1of2 |
|
@vr4manta: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/83ed4120-31bd-11f1-8ce9-02ef516c0ad0-0 |
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-1of2 |
|
@vr4manta: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d9c12930-31e6-11f1-8710-50ce8d8e61a0-0 |
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-2of2 |
|
@vr4manta: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d2767c60-3340-11f1-9415-ca7b3f77fd00-0 |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openshift-tests/test/e2e/dedicated_hosts.go`:
- Around line 520-528: The current Eventually block masks all errors by
returning true whenever getDedicatedHostState(ctx, ec2Client, dynamicHostID)
returns an error; update the error handling so you only treat a "host not found"
/ "already deleted" error as a success and propagate or fail on any other
errors: call getDedicatedHostState, if err indicates a not-found (inspect error
type/message), log the not-found case and return true, but for other errors
log/return false (or surface the error via Expect/Fail) so transient
network/perm issues don't make the test pass; keep the state check for state ==
"released" || state == "pending" and reference getDedicatedHostState, ec2Client
and dynamicHostID when making the change.
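For readers following along without the diff, here is a minimal Go sketch of the shape this suggestion describes, reusing the names it cites (`getDedicatedHostState`, `ec2Client`, `dynamicHostID`, `GinkgoWriter`) plus the test file's `defaultTestTimeout`/`defaultPollingInterval`. It shows the retry-friendly option (log and return false), which later review rounds also converge on; the `strings.Contains` not-found check is illustrative rather than the PR's actual error handling:

```go
// Sketch only: treat a not-found host as successfully released, retry on any
// other error instead of masking it, and otherwise keep the state check.
Eventually(func() bool {
	state, err := getDedicatedHostState(ctx, ec2Client, dynamicHostID)
	if err != nil {
		if strings.Contains(err.Error(), "not found") {
			GinkgoWriter.Printf("dedicated host %s already deleted\n", dynamicHostID)
			return true
		}
		// Transient AWS/network errors: log and let Eventually retry.
		GinkgoWriter.Printf("failed to describe dedicated host %s: %v\n", dynamicHostID, err)
		return false
	}
	return state == "released" || state == "pending"
}, defaultTestTimeout, defaultPollingInterval).Should(BeTrue())
```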
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 004ce3d7-c30e-4de0-8e38-613e5be6f829
📒 Files selected for processing (1)
openshift-tests/test/e2e/dedicated_hosts.go
Actionable comments posted: 2
♻️ Duplicate comments (1)
openshift-tests/test/e2e/dedicated_hosts.go (1)
521-534: ⚠️ Potential issue | 🟠 Major: Keep the `Eventually` poll retryable for non-not-found AWS errors.
`Fail(...)` inside the polling callback aborts the check on the first transient `DescribeHosts` error. That makes this release verification flaky under throttling/network blips instead of retrying until timeout. Log and return `false` for retryable errors; only treat confirmed not-found as success.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openshift-tests/test/e2e/dedicated_hosts.go` around lines 521 - 534, The polling callback passed to Eventually aborts on any non-not-found error because it calls Fail(...) inside the closure; change the logic in the closure that calls getDedicatedHostState(ctx, ec2Client, dynamicHostID) so that only recognized "not found" errors (strings.Contains checks) return true, while all other errors are logged with GinkgoWriter.Printf (including error details and dynamicHostID) and the closure returns false to allow Eventually to retry until timeout; remove the Fail(...) call from this callback so transient DescribeHosts errors don't abort the poll.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openshift-tests/test/e2e/dedicated_hosts.go`:
- Around line 893-920: The code captures nodeName from m.Status.NodeRef only
once before deletion, so if NodeRef is unset the helper skips waiting for node
removal and a late-registering node can leak; update the cleanup in the
dedicated hosts test to, when nodeName is empty, poll the Machine resource
(using client.Machines in machineutil.MachineAPINamespace and the machineName)
after deletion succeeds to detect if/when Status.NodeRef becomes non-nil (or
stop polling if the Machine itself becomes NotFound), set nodeName from
m.Status.NodeRef.Name once observed, and then proceed to call
kubeClient.CoreV1().Nodes().Get in the existing Eventually block to wait for the
node to be removed; ensure you still handle apierrors.IsNotFound for both
machine and node cases.
- Around line 396-397: Register the teardown immediately after a successful
Create instead of only at the end of the spec: right after the machine creation
succeeds, call/attach cleanupMachineAndNode(ctx, kubeConfig, kubeClient,
machineName) via the test framework's cleanup registration (e.g.,
t.Cleanup/DeferCleanup/framework.RegisterCleanupAction) so it will run on test
failures; apply the same pattern for the dedicated-host creation(s) that
currently only get cleaned at the end so both machine/node and dedicated-host
cleanups run even on early Expect failures.
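A rough Ginkgo v2 sketch of what registering the teardown immediately after a successful create could look like, assuming the `cleanupMachineAndNode(ctx, kubeConfig, kubeClient, machineName)` signature quoted above and the same machine client used elsewhere in the file; `newMachine` is a placeholder for whatever spec the test actually builds:

```go
// Sketch only: register cleanup as soon as the Machine exists so teardown
// still runs if a later Expect fails partway through the spec.
machine, err := client.Machines(machineutil.MachineAPINamespace).Create(ctx, newMachine, metav1.CreateOptions{})
Expect(err).NotTo(HaveOccurred())
machineName := machine.Name

DeferCleanup(func() {
	cleanupMachineAndNode(ctx, kubeConfig, kubeClient, machineName)
})
```

The same pattern would apply to the dedicated-host allocation: allocate the host, assert success, then DeferCleanup its release before any further assertions run.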
---
Duplicate comments:
In `@openshift-tests/test/e2e/dedicated_hosts.go`:
- Around line 521-534: The polling callback passed to Eventually aborts on any
non-not-found error because it calls Fail(...) inside the closure; change the
logic in the closure that calls getDedicatedHostState(ctx, ec2Client,
dynamicHostID) so that only recognized "not found" errors (strings.Contains
checks) return true, while all other errors are logged with GinkgoWriter.Printf
(including error details and dynamicHostID) and the closure returns false to
allow Eventually to retry until timeout; remove the Fail(...) call from this
callback so transient DescribeHosts errors don't abort the poll.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fbbfef5b-ac86-47ca-9ee9-ce67486a83cd
📒 Files selected for processing (1)
openshift-tests/test/e2e/dedicated_hosts.go
♻️ Duplicate comments (1)
openshift-tests/test/e2e/dedicated_hosts.go (1)
907-933: ⚠️ Potential issue | 🟠 Major: Cleanup can stall before deletion when `NodeRef` never appears.
At Line 907, cleanup waits for `Status.NodeRef` before deleting the Machine. If the node never registers, this times out and Line 930 is never reached, leaving the Machine undeleted and potentially blocking host release checks.
Suggested fix
```diff
-	// If nodeName is empty, poll the machine to detect late-registering nodes
-	if nodeName == "" {
-		By(fmt.Sprintf("Node ref not set initially, polling machine %s for late-registering node", machineName))
-		Eventually(func() bool {
-			m, err := client.Machines(machineutil.MachineAPINamespace).Get(ctx, machineName, metav1.GetOptions{})
-			if apierrors.IsNotFound(err) {
-				// Machine is gone, no node registered
-				return true
-			}
-			if err != nil {
-				// Other error, continue polling
-				return false
-			}
-			if m.Status.NodeRef != nil {
-				nodeName = m.Status.NodeRef.Name
-				return true
-			}
-			return false
-		}, defaultTestTimeout, defaultPollingInterval).Should(BeTrue())
-	}
-
 	// Delete the machine
 	By(fmt.Sprintf("Cleaning up test machine %s", machineName))
 	err = client.Machines(machineutil.MachineAPINamespace).Delete(ctx, machineName, metav1.DeleteOptions{})
 	if err != nil && !apierrors.IsNotFound(err) {
 		Expect(err).NotTo(HaveOccurred())
 	}
-	// Wait for machine to be deleted
+	// Wait for machine deletion and capture NodeRef if it appears before final removal.
 	By(fmt.Sprintf("Waiting for machine %s to be deleted", machineName))
 	Eventually(func() bool {
-		_, err := client.Machines(machineutil.MachineAPINamespace).Get(ctx, machineName, metav1.GetOptions{})
-		return apierrors.IsNotFound(err)
+		m, err := client.Machines(machineutil.MachineAPINamespace).Get(ctx, machineName, metav1.GetOptions{})
+		if apierrors.IsNotFound(err) {
+			return true
+		}
+		if err != nil {
+			return false
+		}
+		if nodeName == "" && m.Status.NodeRef != nil {
+			nodeName = m.Status.NodeRef.Name
+		}
+		return false
 	}, defaultTestTimeout, defaultPollingInterval).Should(BeTrue())
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openshift-tests/test/e2e/dedicated_hosts.go` around lines 907 - 933, The current polling loop that waits for m.Status.NodeRef (when nodeName == "" in the block using client.Machines(...).Get and Eventually) can hang the cleanup if the node never registers; change the logic so the test does not block deletion: perform a bounded/non-blocking poll for NodeRef (use a short timeout or context) and then proceed to call client.Machines(...).Delete for machineName regardless of whether nodeName was discovered, ensuring the Delete is always attempted even if m.Status.NodeRef remained nil.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@openshift-tests/test/e2e/dedicated_hosts.go`:
- Around line 907-933: The current polling loop that waits for m.Status.NodeRef
(when nodeName == "" in the block using client.Machines(...).Get and Eventually)
can hang the cleanup if the node never registers; change the logic so the test
does not block deletion: perform a bounded/non-blocking poll for NodeRef (use a
short timeout or context) and then proceed to call client.Machines(...).Delete
for machineName regardless of whether nodeName was discovered, ensuring the
Delete is always attempted even if m.Status.NodeRef remained nil.
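If this bounded-poll route is preferred over the suggested-fix diff earlier in the thread, a sketch might look like the following; the ten-second interval and two-minute window are illustrative values, not taken from the PR, and `wait` is assumed to be `k8s.io/apimachinery/pkg/util/wait`:

```go
// Sketch only: wait briefly for a late-registering NodeRef, then proceed to
// delete the Machine whether or not a node ever showed up.
if nodeName == "" {
	_ = wait.PollUntilContextTimeout(ctx, 10*time.Second, 2*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			m, err := client.Machines(machineutil.MachineAPINamespace).Get(ctx, machineName, metav1.GetOptions{})
			if apierrors.IsNotFound(err) {
				return true, nil // machine already gone; nothing left to observe
			}
			if err != nil {
				return false, nil // transient error; keep polling within the window
			}
			if m.Status.NodeRef != nil {
				nodeName = m.Status.NodeRef.Name
				return true, nil
			}
			return false, nil
		})
}

// Deletion is attempted regardless of whether nodeName was discovered.
err = client.Machines(machineutil.MachineAPINamespace).Delete(ctx, machineName, metav1.DeleteOptions{})
```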
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2ab38731-aa57-44fe-b513-16b9dbab39c1
📒 Files selected for processing (1)
openshift-tests/test/e2e/dedicated_hosts.go
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openshift-tests/test/e2e/dedicated_hosts.go`:
- Around line 539-541: The code calls Fail(fmt.Sprintf(...)) inside the
Eventually check which aborts the test instead of allowing retries; change the
Fail(...) call in the block that checks host state to log the unexpected error
(e.g., using framework.Logf, GinkgoWriter, or t.Logf) and then return false so
Eventually can retry on transient AWS API errors — locate the block referencing
dynamicHostID and replace the Fail(...) invocation with a non-fatal log plus
return false.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3d903dd0-03d2-480f-bdd1-99bc1ecb4730
📒 Files selected for processing (1)
openshift-tests/test/e2e/dedicated_hosts.go
|
/retest |
|
/lgtm |
|
/retest |
|
@vr4manta: This pull request references SPLAT-2715 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: damdo. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment. |
|
@vr4manta this looks ready to merge. Feel free to add the verified label once you are happy with the test working as expected. |
|
/verified by @vr4manta |
|
@vr4manta: This PR has been marked as verified by @vr4manta.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
A lot of the failed e2e tests are from sig-node; is there a known issue with the test suite that's causing this? |
|
@vr4manta: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/cherry-pick release-4.22 |
|
@vr4manta: new pull request created: #184 |
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |