Add Windows BYOH provisioning support to Prow CI #73785
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 pull-ci-openshift-openshift-tests-private-main-debug-winc-gcp-ipi pull-ci-openshift-openshift-tests-private-main-debug-vshphere-gcp-ipi |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@rrasouli: job(s): pull-ci-openshift-openshift-tests-private-main-debug-vshphere-gcp-ipi either don't exist or were not found to be affected, and cannot be rehearsed |
5bb51ef to
21ecb46
Compare
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 pull-ci-openshift-openshift-tests-private-main-debug-winc-gcp-ipi pull-ci-openshift-openshift-tests-private-release-4.19-debug-winc-azure-ipi pull-ci-openshift-openshift-tests-private-release-4.19-debug-winc-vspere-ipi |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@rrasouli: job(s): pull-ci-openshift-openshift-tests-private-release-4.19-debug-winc-vspere-ipi either don't exist or were not found to be affected, and cannot be rehearsed |
21ecb46 to
e91f31b
Compare
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 pull-ci-openshift-openshift-tests-private-main-debug-winc-gcp-ipi pull-ci-openshift-openshift-tests-private-release-4.19-debug-winc-azure-ipi eriodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-vsphere-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@rrasouli: job(s): eriodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-vsphere-ipi-ovn-winc-f14 either don't exist or were not found to be affected, and cannot be rehearsed |
This adds step-registry components and chains for provisioning Windows nodes via BYOH (Bring Your Own Host) using terraform-windows-provisioner.

Changes:
- Add windows-byoh-provision step (uses terraform-windows-provisioner)
- Add windows-byoh-destroy step with cloud CLI fallback mechanism
- Update all winc chains to include BYOH provisioning as mandatory step
- Add deprovision chains to destroy BYOH nodes before cluster teardown
- Add platform "none" (UPI AWS) chain for Windows BYOH
- Export instance info in WMCO BYOH format for e2e tests
- Use upi-installer image for cloud CLI access (aws, gcloud, az)

Platform coverage:
- AWS IPI (MachineSet + BYOH)
- Azure IPI (MachineSet + BYOH)
- GCP IPI (MachineSet + BYOH)
- vSphere IPI (MachineSet + BYOH)
- Nutanix IPI (MachineSet + BYOH)
- AWS UPI platform "none" (BYOH only)

Destroy fallback mechanism:
When Terraform state is unavailable, the destroy step uses cloud CLI to find and delete instances by name pattern, preventing orphaned instances that could block cluster network deletion.

BYOH is now the default Windows node provisioning method for all Windows Container CI testing.
e91f31b to
4bbde3a
Compare
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 pull-ci-openshift-openshift-tests-private-main-debug-winc-gcp-ipi pull-ci-openshift-openshift-tests-private-release-4.19-debug-winc-azure-ipi eriodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-vsphere-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/test all |
|
@rrasouli: job(s): eriodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-vsphere-ipi-ovn-winc-f14 either don't exist or were not found to be affected, and cannot be rehearsed |
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 pull-ci-openshift-openshift-tests-private-main-debug-winc-gcp-ipi pull-ci-openshift-openshift-tests-private-release-4.19-debug-winc-azure-ipi periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-vsphere-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
- Generate random 3-digit suffix for unique instance names (byoh-XXX)
- Save instance name to SHARED_DIR for destroy step to read
- Fix wait for Ready nodes logic (exclude NotReady status)
- Reduce timeout to 30 minutes (matches flexy-templates production)
- Fix destroy script indentation for shellcheck compliance
- Add automatic fallback to default name if shared file missing
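The "exclude NotReady status" fix in the commit message above is easy to get wrong, because a plain substring match on "Ready" also matches "NotReady". The following is a minimal sketch of one way to count Ready nodes safely, not the script from this PR. The function names, the 30-second poll interval, and the use of the kubernetes.io/os=windows label selector are assumptions made for illustration; the 1800-second default mirrors the 30-minute timeout mentioned above.

```shell
# Count nodes whose STATUS column is exactly "Ready".
# Expects `oc get nodes --no-headers` style output on stdin
# (columns: NAME STATUS ROLES AGE VERSION).
# An exact match on column 2 skips "NotReady" and "Ready,SchedulingDisabled".
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Hypothetical wait loop: poll until at least $1 Windows nodes are Ready.
# $2 is the timeout in seconds (default 1800), $3 the poll interval (default 30).
wait_for_ready_nodes() {
  local want="$1" timeout="${2:-1800}" interval="${3:-30}"
  local waited=0 ready=0
  while [ "$waited" -lt "$timeout" ]; do
    ready="$(oc get nodes -l kubernetes.io/os=windows --no-headers 2>/dev/null | count_ready_nodes)"
    if [ "$ready" -ge "$want" ]; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}
```

A provision script could call `wait_for_ready_nodes 2` after the provisioner finishes and fail the step on a non-zero return.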
902cde2 to
45c9276
Compare
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 pull-ci-openshift-openshift-tests-private-main-debug-winc-gcp-ipi pull-ci-openshift-openshift-tests-private-release-4.19-debug-winc-azure-ipi periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-vsphere-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/test step-registry-shellcheck |
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 pull-ci-openshift-openshift-tests-private-main-debug-winc-gcp-ipi pull-ci-openshift-openshift-tests-private-release-4.19-debug-winc-azure-ipi periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-vsphere-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
7258872 to
c335a5c
Compare
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 pull-ci-openshift-openshift-tests-private-main-debug-winc-gcp-ipi pull-ci-openshift-openshift-tests-private-release-4.19-debug-winc-azure-ipi periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-vsphere-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
c335a5c to
6dd9421
Compare
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 pull-ci-openshift-openshift-tests-private-main-debug-winc-gcp-ipi pull-ci-openshift-openshift-tests-private-release-4.19-debug-winc-azure-ipi eriodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-vsphere-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@rrasouli: job(s): eriodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-vsphere-ipi-ovn-winc-f14 either don't exist or were not found to be affected, and cannot be rehearsed |
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-aws-ipi-ovn-winc-f14 pull-ci-openshift-openshift-tests-private-main-debug-winc-gcp-ipi pull-ci-openshift-openshift-tests-private-release-4.19-debug-winc-azure-ipi eriodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-vsphere-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@rrasouli: job(s): eriodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-vsphere-ipi-ovn-winc-f14 either don't exist or were not found to be affected, and cannot be rehearsed |
|
/test debug-winc-vsphere-ipi |
|
@rrasouli: The specified target(s) for The following commands are available to trigger optional jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-vsphere-ipi-ovn-winc-f14 |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
- SHARED_DIR backed by K8s Secret with 3MB limit
- Tarring entire terraform_byoh/ dir exceeded limit (includes .terraform/ providers)
- Now only copy terraform.tfstate file (small, <100KB)
- Provision: cp to SHARED_DIR/terraform_byoh_${PLATFORM}.tfstate
- Destroy: cp back to ARTIFACT_DIR before terraform destroy
Fixes: error: failed to create/update secret: Request entity too large: limit is 3145728
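The state handoff described in this commit message can be sketched as two small helpers. This is an illustrative sketch under stated assumptions, not the code from the PR: the function names and the 1 MiB guard value are hypothetical, while the file name terraform_byoh_${PLATFORM}.tfstate and the 3145728-byte limit come from the text above.

```shell
# Guard value (assumption): refuse to copy a state file larger than 1 MiB,
# well below the 3145728-byte limit reported in the error above.
MAX_STATE_BYTES=1048576

# Provision side: copy only terraform.tfstate into the shared directory.
# Usage: save_tfstate <terraform work dir> <shared dir> <platform>
save_tfstate() {
  local work_dir="$1" shared_dir="$2" platform="$3"
  local src="${work_dir}/terraform.tfstate"
  local dst="${shared_dir}/terraform_byoh_${platform}.tfstate"
  local size
  if [ ! -f "$src" ]; then
    echo "no state file at ${src}" >&2
    return 1
  fi
  size=$(( $(wc -c < "$src") ))
  if [ "$size" -gt "$MAX_STATE_BYTES" ]; then
    echo "state file is ${size} bytes, refusing to copy" >&2
    return 1
  fi
  cp "$src" "$dst"
}

# Destroy side: copy the state back before running terraform destroy.
# Usage: restore_tfstate <shared dir> <terraform work dir> <platform>
restore_tfstate() {
  local shared_dir="$1" work_dir="$2" platform="$3"
  local src="${shared_dir}/terraform_byoh_${platform}.tfstate"
  if [ ! -f "$src" ]; then
    echo "no saved state at ${src}" >&2
    return 1
  fi
  mkdir -p "$work_dir"
  cp "$src" "${work_dir}/terraform.tfstate"
}
```

Copying the single state file instead of a tarball keeps the .terraform/ provider binaries out of the shared directory, which is what pushed the earlier approach over the limit.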
6dd9421 to
8e73558
Compare
|
[REHEARSALNOTIFIER]
A total of 99 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. |
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-vsphere-ipi-ovn-winc-f14 pull-ci-openshift-openshift-tests-private-main-debug-winc-gcp-ipi |
|
@rrasouli: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
Closing in favor of clean PR #73920 with single squashed commit and simplified description. |
|
@rrasouli: The following test failed, say
Add Windows BYOH Provisioning Support to Prow CI
This PR adds step-registry components for provisioning Windows nodes via BYOH (Bring Your Own Host) using https://github.com/openshift/terraform-windows-provisioner.
Overview
BYOH is the default Windows node provisioning method for Windows Container testing in OpenShift. This PR integrates BYOH provisioning into the Prow CI step-registry, enabling automated Windows node deployment across all supported platforms.
Changes
New Step-Registry Components
1. Windows BYOH Steps (windows/byoh/)
- windows-byoh-provision: Provisions Windows nodes using terraform-windows-provisioner
  - ${CLUSTER_PROFILE_DIR}/ssh-publickey
  - ${ARTIFACT_DIR}/terraform_byoh/ (SHARED_DIR is cluster-profile mount, not shared between steps)
- windows-byoh-destroy: Cleanup step for BYOH Windows nodes
  - ${ARTIFACT_DIR}/terraform_byoh/
2. Deprovision Chains (All Platforms)
Created deprovision chains with proper cleanup ordering:
- cucushift-installer-rehearse-aws-ipi-ovn-winc-deprovision
- cucushift-installer-rehearse-azure-ipi-ovn-winc-deprovision
- cucushift-installer-rehearse-gcp-ipi-ovn-winc-deprovision
- cucushift-installer-rehearse-vsphere-ipi-ovn-winc-deprovision
- cucushift-installer-rehearse-nutanix-ipi-ovn-winc-deprovision
- cucushift-installer-rehearse-aws-upi-ovn-winc-deprovision
Cleanup Order:
- gather, gather-aws-console - Collects logs while nodes exist
- windows-byoh-destroy - Cleans up Windows instances with proper arguments
3. Updated Provision Chains (All Platforms)
Added windows-byoh-provision as mandatory step to all winc chains:
- cucushift-installer-rehearse-aws-ipi-ovn-winc-provision
- cucushift-installer-rehearse-azure-ipi-ovn-winc-provision
- cucushift-installer-rehearse-gcp-ipi-ovn-winc-provision
- cucushift-installer-rehearse-vsphere-ipi-ovn-winc-provision
- cucushift-installer-rehearse-nutanix-ipi-ovn-winc-provision
4. Updated Workflows (All Platforms)
Updated workflows to use new deprovision chains in post: section:
- cucushift-installer-rehearse-aws-ipi-ovn-winc-workflow
- cucushift-installer-rehearse-azure-ipi-ovn-winc-workflow
- cucushift-installer-rehearse-gcp-ipi-ovn-winc-workflow
- cucushift-installer-rehearse-vsphere-ipi-ovn-winc-workflow
- cucushift-installer-rehearse-nutanix-ipi-ovn-winc-workflow
5. New Platform "None" Chain and Workflow
- cucushift-installer-rehearse-aws-upi-ovn-winc-provision
- cucushift-installer-rehearse-aws-upi-ovn-winc-deprovision
- cucushift-installer-rehearse-aws-upi-ovn-winc-workflow
Platform Coverage
How It Works
Provision Step Flow
- terraform-windows-provisioner container image with all dependencies
- Unique instance name (byoh-586, byoh-123) to avoid conflicts
- ${ARTIFACT_DIR}/byoh_instance_name.txt for destroy step
- ${CLUSTER_PROFILE_DIR}/ssh-publickey (same key used by WMCO)
- CLUSTER_PROFILE_DIR (auto-provided by Prow)
- byoh.sh apply <name> <num> "" <version>
- ${ARTIFACT_DIR}/terraform_byoh/<platform>/ (persistent shared volume)
- ${SHARED_DIR}/<ip>_windows_instance.txt format for WMCO BYOH e2e tests
Destroy Step Flow
- terraform-windows-provisioner container image
- ${ARTIFACT_DIR}/byoh_instance_name.txt (saved by provision)
- ${CLUSTER_PROFILE_DIR}/ssh-publickey (required by byoh.sh)
- ${ARTIFACT_DIR}/terraform_byoh/<platform>/ (persistent shared volume)
- byoh.sh destroy <name> <num> "" <version> with all arguments matching provision
Note: No cloud CLI fallback - terraform state is guaranteed to exist with ARTIFACT_DIR fix
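The instance-name handoff between the provision and destroy flows can be sketched as below. This is a hedged illustration, not the scripts from this PR: the helper names are hypothetical, and using "byoh-winc" as the fallback default is an assumption taken from the BYOH_INSTANCE_NAME value mentioned elsewhere in this description. The file name byoh_instance_name.txt and the byoh-NNN naming pattern come from the text above.

```shell
# File name taken from the PR description; the directory is passed in by the caller.
NAME_FILE="byoh_instance_name.txt"
# Fallback used when the name file is missing (assumed default).
DEFAULT_NAME="byoh-winc"

# Produce a name such as byoh-586: prefix plus a random number in 100..999.
generate_instance_name() {
  printf 'byoh-%d\n' "$((100 + RANDOM % 900))"
}

# Provision side: persist the generated name for the destroy step.
# Usage: save_instance_name <dir> <name>
save_instance_name() {
  local dir="$1" name="$2"
  printf '%s\n' "$name" > "${dir}/${NAME_FILE}"
}

# Destroy side: read the saved name, or fall back to the default
# if the file is missing or empty.
# Usage: load_instance_name <dir>
load_instance_name() {
  local dir="$1"
  local file="${dir}/${NAME_FILE}"
  if [ -s "$file" ]; then
    head -n 1 "$file"
  else
    printf '%s\n' "$DEFAULT_NAME"
  fi
}
```

With these helpers the destroy step can pass the result of `load_instance_name` as the `<name>` argument to `byoh.sh destroy`, so it matches the name used at provision time.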
Integration with WMCO Tests
The instance info format is compatible with existing WMCO BYOH e2e tests:
Consumed by windows-e2e-operator-test-byoh
Configuration
Default Values
All values are overridable via environment variables:
Platform-Specific Settings
All platform-specific configuration (instance types, disk sizes, network settings) is auto-detected from the cluster or can be overridden via environment variables. See https://github.com/openshift/terraform-windows-provisioner for details.
Review Comments Addressed
All feedback from PR #71002 has been addressed:
@jianlinliu Comments
✅ Chicken-and-egg problem: Fixed via 2-PR approach
- registry.ci.openshift.org/ci/terraform-windows-provisioner:latest
✅ No terraform downloads: Terraform 1.9.5 pre-installed in Dockerfile
✅ No git clone: Scripts pre-copied to /usr/local/share/byoh-provisioner/ in image
✅ State in ${SHARED_DIR}: CRITICAL FIX - SHARED_DIR was wrong!
- SHARED_DIR=/tmp/secret points to cluster-profile secret mount (container-local, NOT shared between steps)
- ARTIFACT_DIR=/logs/artifacts (actual persistent shared volume)
- ${ARTIFACT_DIR}/terraform_byoh/<platform>/
✅ Destroy not working: Fixed by:
- BYOH_WINDOWS_VERSION in destroy step
- byoh.sh destroy to match provision signature
- ${ARTIFACT_DIR}/byoh_instance_name.txt
✅ Unique Instance Naming: Prevents conflicts from previous failed runs
- byoh-586, byoh-123, etc.
- ${ARTIFACT_DIR}/byoh_instance_name.txt for destroy step
@jrvaldes Comments
Key Debugging & Fixes
Critical Issue: Secret Size Limit (3MB)
Problem Discovered: Provision succeeded but Secret update failed:
Root Cause: SHARED_DIR is backed by a Kubernetes Secret with a 3MB size limit. Initial approach tarred the entire terraform_byoh/ directory which includes:
- terraform.tfstate (~50-100KB)
- .terraform/ provider binaries and cache (several MB)
Investigation:
- ${SHARED_DIR}/terraform_byoh/${PLATFORM}/terraform.tfstate file is needed for destroy
Solution: Save only terraform.tfstate file to SHARED_DIR
- cp terraform.tfstate → ${SHARED_DIR}/terraform_byoh_${PLATFORM}.tfstate
- cp back to ${ARTIFACT_DIR}/terraform_byoh/${PLATFORM}/terraform.tfstate
- .terraform/ stays in ARTIFACT_DIR (preserved for debugging, not shared)
Impact: This was causing:
Other Fixes
- PROW_JOB_ID (not available) to random 3-digit number
- BYOH_INSTANCE_NAME="byoh-winc" from ref YAML (was overriding random suffix)
Testing
Bootstrap PR Status
- registry.ci.openshift.org/ci/terraform-windows-provisioner:latest
- sha256:a215c67c2d041d2bb6ee3640fed33d6e9ca9cb1a52c540a2be982aeb21904611
Rehearsal Results
Validated with rehearsals on AWS IPI 4.17:
✅ Provision step succeeded: 16m17s runtime
✅ Destroy fixes validated: All arguments now passed correctly
Dependencies
This step-registry uses terraform-windows-provisioner container image for provisioning Windows nodes:
- registry.ci.openshift.org/ci/terraform-windows-provisioner:latest
- ci-operator/config/openshift/terraform-windows-provisioner/
Container Image Includes:
Benefits of Pre-built Image:
Migration Path
This PR adds BYOH as a mandatory step to all winc chains. Existing Windows testing workflows will now provision nodes via both MachineSets (IPI platforms) and BYOH.
Future Work
Planned for separate PRs:
Files Changed
Related
/assign @jianlinliu
/cc @jrvaldes @weinliu