Add Windows BYOH provisioning support to Prow CI #71002
d471ef0 to
b64fb51
Compare
|
/pj-rehearse periodic-ci-terraform-windows-provisioner-main-e2e-aws-upi-winc-byoh |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@rrasouli: job(s): periodic-ci-terraform-windows-provisioner-main-e2e-aws-upi-winc-byoh either don't exist or were not found to be affected, and cannot be rehearsed |
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
b64fb51 to
230c4c4
Compare
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
230c4c4 to
c0e09b8
Compare
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 |
c0e09b8 to
20c6c58
Compare
|
/pj-rehearse abort |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
20c6c58 to
31234c7
Compare
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
31234c7 to
6316a1a
Compare
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse pull-ci-openshift-openshift-tests-private-release-4.21-debug-winc-azure-ipi |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse pull-ci-openshift-openshift-tests-private-release-4.21-debug-winc-vsphere-ipi |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.21-amd64-nightly-aws-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@rrasouli: job(s): periodic-ci-openshift-openshift-tests-private-release-4.21-amd64-nightly-aws-ipi-ovn-winc-f14 either don't exist or were not found to be affected, and cannot be rehearsed |
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
961b089 to
4a12503
Compare
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 pull-ci-openshift-openshift-tests-private-release-4.21-debug-winc-aws-ipi pull-ci-openshift-openshift-tests-private-release-4.21-debug-winc-azure-ipi pull-ci-openshift-openshift-tests-private-release-4.21-debug-winc-gcp-ipi pull-ci-openshift-openshift-tests-private-release-4.21-debug-winc-vsphere-ipi |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
4a12503 to
18d8b73
Compare
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 pull-ci-openshift-openshift-tests-private-release-4.21-debug-winc-aws-ipi pull-ci-openshift-openshift-tests-private-release-4.21-debug-winc-azure-ipi pull-ci-openshift-openshift-tests-private-release-4.21-debug-winc-gcp-ipi pull-ci-openshift-openshift-tests-private-release-4.21-debug-winc-vsphere-ipi |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@jianlinliu The image is now available and all implementation comments have been resolved:
- Bootstrap PR merged: #73680 built the terraform-windows-provisioner image at registry.ci.openshift.org/ci/terraform-windows-provisioner:latest
- The windows-byoh-provision step succeeded in 16m17s:
The debug-winc-* job failures are unrelated: they fail on a missing stable:tests-private image in the internal registry (an infrastructure issue), not on BYOH provisioning.
In https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/pr-logs/pull/openshift_release/71002/rehearse-71002-periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14/2013636132352299008/artifacts/aws-ipi-ovn-winc-f14/windows-byoh-provision/build-log.txt, some Windows machines are clearly created, but in https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/pr-logs/pull/openshift_release/71002/rehearse-71002-periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14/2013636132352299008/artifacts/aws-ipi-ovn-winc-f14/windows-byoh-destroy/build-log.txt, nothing is destroyed. That will leave orphaned resources on the cloud platforms.
cd "${WORK_DIR}" || exit 1

# Detect platform for terraform directory
PLATFORM=$(oc get infrastructure cluster -o=jsonpath="{.status.platformStatus.type}" | tr '[:upper:]' '[:lower:]' 2>/dev/null || echo "unknown")
The destroy step should not rely too heavily on the cluster. If the cluster breaks during testing, the destroy would never happen, and that would leave orphaned resources on the platforms.
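One possible way to address this concern, sketched under assumptions: the provision step records the platform in a file while the cluster is healthy, and the destroy step reads that file first and queries the cluster only as a fallback. The file name `windows_byoh_platform` and the function names `save_platform` and `detect_platform` are hypothetical and do not come from this PR.

```shell
#!/usr/bin/env bash
# Sketch only: the file name below is hypothetical, not taken from the PR.
PLATFORM_FILE="${SHARED_DIR:-/tmp}/windows_byoh_platform"

# Provision side: persist the platform while the cluster is known to be healthy.
save_platform() {
  printf '%s\n' "$1" > "${PLATFORM_FILE}"
}

# Destroy side: prefer the saved value, then the cluster, then "unknown".
detect_platform() {
  local platform=""
  if [ -s "${PLATFORM_FILE}" ]; then
    platform="$(cat "${PLATFORM_FILE}")"
  else
    # Errors from oc are swallowed so a broken cluster cannot abort the step.
    platform="$(oc get infrastructure cluster \
      -o=jsonpath='{.status.platformStatus.type}' 2>/dev/null || true)"
  fi
  platform="$(printf '%s' "${platform}" | tr '[:upper:]' '[:lower:]')"
  printf '%s\n' "${platform:-unknown}"
}
```

With this shape, a cluster that is unreachable at teardown still yields a usable platform name as long as the provision step ran `save_platform` earlier.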
This adds step-registry components and chains for provisioning Windows nodes via BYOH (Bring Your Own Host) using terraform-windows-provisioner.

Changes:
- Add windows-byoh-provision step (uses terraform-windows-provisioner)
- Add windows-byoh-destroy step with cloud CLI fallback mechanism
- Update all winc chains to include BYOH provisioning as mandatory step
- Add deprovision chains to destroy BYOH nodes before cluster teardown
- Add platform "none" (UPI AWS) chain for Windows BYOH
- Export instance info in WMCO BYOH format for e2e tests
- Use upi-installer image for cloud CLI access (aws, gcloud, az)

Platform coverage:
- AWS IPI (MachineSet + BYOH)
- Azure IPI (MachineSet + BYOH)
- GCP IPI (MachineSet + BYOH)
- vSphere IPI (MachineSet + BYOH)
- Nutanix IPI (MachineSet + BYOH)
- AWS UPI platform "none" (BYOH only)

Destroy fallback mechanism:
When Terraform state is unavailable, the destroy step uses cloud CLI to find and delete instances by name pattern, preventing orphaned instances that could block cluster network deletion.

BYOH is now the default Windows node provisioning method for all Windows Container CI testing.
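The destroy fallback described above could look roughly like the following on AWS. This is a minimal sketch and not the step's actual script: the function names and the idea of matching on the `Name` tag with a wildcard are assumptions made for illustration. Only `aws ec2 describe-instances` and `aws ec2 terminate-instances` are real CLI commands.

```shell
# Sketch only: function names and the name pattern are illustrative.
# List IDs of non-terminated instances whose Name tag matches a pattern.
find_byoh_instances() {
  local pattern="$1"
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=${pattern}" \
              "Name=instance-state-name,Values=pending,running,stopping,stopped" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text
}

# Terminate whatever the lookup returns; do nothing when the list is empty.
destroy_byoh_instances() {
  local ids
  ids="$(find_byoh_instances "$1")"
  if [ -z "${ids}" ]; then
    echo "no instances match $1"
    return 0
  fi
  # Word splitting is intended: the CLI prints IDs separated by whitespace.
  # shellcheck disable=SC2086
  aws ec2 terminate-instances --instance-ids ${ids}
}
```

A call such as `destroy_byoh_instances "mycluster-byoh-*"` (the pattern is a made-up example) needs no Terraform state and no reachable cluster, which is the property the fallback depends on.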
18d8b73 to
8ae42c2
Compare
|
[REHEARSALNOTIFIER]
A total of 99 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here.
Interacting with pj-rehearse
Comment:
Once you are satisfied with the results of the rehearsals, comment:
The debug-winc-* rehearsal jobs fail with a 'manifest unknown' error when trying to pull the tests-private image from the internal registry.

This fix adds tests-private as a base_image (imported from the ci namespace) for releases 4.20-4.23, where the config doesn't build this image itself. Versions 4.17-4.19 already build tests-private in their images section, so they don't need this base_image entry.

This allows debug-winc jobs to reference stable:tests-private without ImagePullBackOff errors.

Fixes: openshift#71002 (comment)
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse pull-ci-openshift-openshift-tests-private-main-debug-winc-gcp-ipi |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@rrasouli: The following tests failed, say
Add Windows BYOH Provisioning Support to Prow CI
This PR adds step-registry components for provisioning Windows nodes via BYOH (Bring Your Own Host) using https://github.com/openshift/terraform-windows-provisioner.
Overview
BYOH is the default Windows node provisioning method for Windows Container testing in OpenShift. This PR integrates BYOH provisioning into the Prow CI step-registry, enabling automated Windows node deployment across all supported platforms.
Changes
New Step-Registry Components
1. Windows BYOH Steps (windows/byoh/)
- upi-installer image for cloud CLI access (aws, gcloud, az)
2. Deprovision Chains (All Platforms)
Created deprovision chains with proper cleanup ordering:
- cucushift-installer-rehearse-aws-ipi-ovn-winc-deprovision
- cucushift-installer-rehearse-azure-ipi-ovn-winc-deprovision
- cucushift-installer-rehearse-gcp-ipi-ovn-winc-deprovision
- cucushift-installer-rehearse-vsphere-ipi-ovn-winc-deprovision
- cucushift-installer-rehearse-nutanix-ipi-ovn-winc-deprovision
- cucushift-installer-rehearse-aws-upi-ovn-winc-deprovision
Cleanup Order:
- (gather, gather-core-dump) - Collects logs while Windows nodes exist
- (windows-byoh-destroy) - Cleans up Windows instances
This ensures diagnostics are captured before destroying Windows nodes, while still cleaning up BYOH instances before final network teardown.
3. Updated Provision Chains (All Platforms)
Added windows-byoh-provision as mandatory step to all winc chains:
- cucushift-installer-rehearse-aws-ipi-ovn-winc-provision
- cucushift-installer-rehearse-azure-ipi-ovn-winc-provision
- cucushift-installer-rehearse-gcp-ipi-ovn-winc-provision
- cucushift-installer-rehearse-vsphere-ipi-ovn-winc-provision
- cucushift-installer-rehearse-nutanix-ipi-ovn-winc-provision
4. Updated Workflows (All Platforms)
Updated workflows to use new deprovision chains in post: section:
- cucushift-installer-rehearse-aws-ipi-ovn-winc-workflow
- cucushift-installer-rehearse-azure-ipi-ovn-winc-workflow
- cucushift-installer-rehearse-gcp-ipi-ovn-winc-workflow
- cucushift-installer-rehearse-vsphere-ipi-ovn-winc-workflow
- cucushift-installer-rehearse-nutanix-ipi-ovn-winc-workflow
5. New Platform "None" Chain and Workflow
- cucushift-installer-rehearse-aws-upi-ovn-winc-provision
- cucushift-installer-rehearse-aws-upi-ovn-winc-deprovision
- cucushift-installer-rehearse-aws-upi-ovn-winc-workflow
Platform Coverage
How It Works
Provision Step Flow
- CLUSTER_PROFILE_DIR (auto-provided by Prow)
- cloud-private-key secret (by terraform-windows-provisioner)
- byoh.sh apply
- ${SHARED_DIR}/<ip>_windows_instance.txt format for WMCO BYOH e2e tests
Destroy Step Flow
- byoh.sh destroy to clean up resources
- gcloud compute instances delete to find and remove instances by name pattern
- aws ec2 terminate-instances to find and remove instances by tag
- az vm delete to find and remove VMs in cluster resource group
Integration with WMCO Tests
The instance info format is compatible with existing WMCO BYOH e2e tests:
Consumed by windows-e2e-operator-test-byoh
Configuration
Default Values
All values are overridable via environment variables:
Platform-Specific Settings
All platform-specific configuration (instance types, disk sizes, network settings) is auto-detected from the cluster or can be overridden via environment variables. See https://github.com/openshift/terraform-windows-provisioner for details.
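The override behaviour described here is commonly implemented in shell with `${VAR:-default}` expansion. The sketch below only illustrates that pattern: the variable names `BYOH_INSTANCE_COUNT` and `BYOH_NAME_PREFIX` and their defaults are hypothetical placeholders, not the settings this step actually exposes.

```shell
# Sketch only: these variable names and default values are hypothetical.
# ${VAR:-default} keeps a value supplied by the job and falls back otherwise.
resolve_byoh_settings() {
  BYOH_INSTANCE_COUNT="${BYOH_INSTANCE_COUNT:-2}"
  BYOH_NAME_PREFIX="${BYOH_NAME_PREFIX:-byoh-winc}"
  printf 'count=%s prefix=%s\n' "${BYOH_INSTANCE_COUNT}" "${BYOH_NAME_PREFIX}"
}
```

A job that exports one of these variables before the step runs changes only that setting; anything left unset or empty keeps its default.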
Testing
These step-registry components can be tested using /pj-rehearse before merging.
Example Test Workflow
Test Results
Validated with /pj-rehearse on AWS IPI 4.17:
- periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 - SUCCEEDED
Updates
2025-11-12: Fixed deprovision step ordering across all platforms
- windows-byoh-destroy was running before gather-must-gather, preventing log collection from Windows nodes
Dependencies
This step-registry uses terraform-windows-provisioner for provisioning Windows nodes:
The provisioner is a standalone tool with:
Container Image
Both provision and destroy steps use the upi-installer image which includes:
This ensures cloud CLI commands are available for both normal operation and fallback cleanup.
Migration Path
This PR adds BYOH as a mandatory step to all winc chains. Existing Windows testing workflows will now provision nodes via both MachineSets (IPI platforms) and BYOH.
Future Work
Planned for separate PRs:
Files Changed
Related
Preview
You can preview the step-registry structure at:
- ci-operator/step-registry/windows/byoh/provision/ - Reusable BYOH provision step
- ci-operator/step-registry/windows/byoh/destroy/ - Reusable BYOH destroy step with fallback
- ci-operator/step-registry/cucushift/installer/rehearse/ - Updated chains and workflows with BYOH
/cc @jrvaldes @weinliu