Skip to content

WINC-815: Add support for Windows Server 2022 on GCP clusters#1197

Merged
openshift-merge-robot merged 4 commits into
openshift:masterfrom
sebsoto:gcpCIReal
Aug 27, 2022
Merged

WINC-815: Add support for Windows Server 2022 on GCP clusters#1197
openshift-merge-robot merged 4 commits into
openshift:masterfrom
sebsoto:gcpCIReal

Conversation

@sebsoto
Copy link
Copy Markdown
Contributor

@sebsoto sebsoto commented Aug 20, 2022

This PR adds multiple commits around providing WMCO support for GCP clusters

  • Commits to have GCP to work with our test suite:

    • [controllers] Add ignore label to filter Machines
    • [e2e] Add GCP provider
  • Commits to add GCP functionality and docs to WMCO:

    • [secrets] Enable Administrator account if needed
    • [docs] Document GCP support

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 20, 2022
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 20, 2022
@sebsoto
Copy link
Copy Markdown
Contributor Author

sebsoto commented Aug 20, 2022

/test gcp-e2e-operator

@sebsoto
Copy link
Copy Markdown
Contributor Author

sebsoto commented Aug 20, 2022

/approve cancel

@openshift-ci openshift-ci Bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 20, 2022
@sebsoto
Copy link
Copy Markdown
Contributor Author

sebsoto commented Aug 20, 2022

/test gcp-e2e-operator

@sebsoto
Copy link
Copy Markdown
Contributor Author

sebsoto commented Aug 21, 2022

/test gcp-e2e-operator

@sebsoto sebsoto force-pushed the gcpCIReal branch 3 times, most recently from 38178c2 to f1b7ac4 Compare August 23, 2022 11:45
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 23, 2022
@sebsoto sebsoto changed the title [WIP] GCP Windows Server 2022 WINC-815: Add support for GCP Windows Server 2022 Aug 23, 2022
@sebsoto sebsoto changed the title WINC-815: Add support for GCP Windows Server 2022 WINC-815: Add support for Windows Server 2022 on GCP clusters Aug 23, 2022
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 23, 2022
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 23, 2022
@sebsoto
Copy link
Copy Markdown
Contributor Author

sebsoto commented Aug 23, 2022

/test gcp-e2e-operator

Copy link
Copy Markdown
Contributor

@aravindhp aravindhp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, @sebsoto. Looks good. Just some minor comments and splitting up the controller using DNS to another PR.

Comment thread controllers/controllers.go Outdated

// GetAddress returns a non-ipv6 address that can be used to reach a Windows node. This can be either an ipv4
// or dns address.
// or dns address. DNS will be preferred, if available.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change needs to be a separate PR to be on the safe side given it changes behavior across all clouds.

}

_, err := getInternalIPAddress(machine.Status.Addresses)
_, err := GetAddress(machine.Status.Addresses)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is GetAddress defined?

Comment thread controllers/windowsmachine_controller.go
Comment thread pkg/secrets/secrets.go
return [System.Web.Security.Membership]::GeneratePassword(16, 2)
}

# Check if the capi user exists, this will be the case on Azure, and will be used instead of Administrator
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside I wonder if we can do this on Azure also? It is also making me wonder if we should just create a new core user instead like we wanted to in the past.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can re-open https://issues.redhat.com/browse/WINC-430
This could cause issues with people using custom images

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could cause issues with people using custom images

Why would creating a core user cause issues?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of a case where the Administrator user has been set up with certain permissions in the image, or theres security software installed special casing the Administrator user. Its nothing we cant doc around.

Comment thread pkg/secrets/secrets.go
@sebsoto sebsoto force-pushed the gcpCIReal branch 2 times, most recently from 04b60a1 to d8f1ad4 Compare August 23, 2022 20:04
@sebsoto
Copy link
Copy Markdown
Contributor Author

sebsoto commented Aug 23, 2022

/test gcp-e2e-operator

Comment thread controllers/windowsmachine_controller.go
Comment thread pkg/secrets/secrets.go
return [System.Web.Security.Membership]::GeneratePassword(16, 2)
}

# Check if the capi user exists, this will be the case on Azure, and will be used instead of Administrator
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could cause issues with people using custom images

Why would creating a core user cause issues?

Comment thread pkg/secrets/secrets.go
@sebsoto
Copy link
Copy Markdown
Contributor Author

sebsoto commented Aug 23, 2022

/test gcp-e2e-operator

Comment thread pkg/secrets/secrets.go
# The capi user doesn't exist, ensure the Administrator account is enabled if it exists
# If neither users exist, an error will be written to the console, but the script will still continue
$UserAccount = Get-LocalUser -Name "Administrator"
if( ($UserAccount -ne $null) -and (!$UserAccount.Enabled) ) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit extra spaces after if( and ).

Comment thread controllers/windowsmachine_controller.go
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 24, 2022
Comment thread README.md Outdated

Not having these labels will result in the Windows node not being marked as a worker.

If the Machine spec has the label `windowsmachineconfig.openshift.io/ignore=true`, the Machine will be ignored by WMCO,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this information be moved to the hacking.md file? Wondering if this would be useful in the README if we don't expect users to use this other than for development and testing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, I was back and forth about this. I can move it there

We have been using the presence of the Windows label on Machines as the
differentiator between Machines used to test the Windows Machine
controller and those used to test the ConfigMap controller. This worked
because the Windows Machine controller would filter out any Machines
that did not have that label.

This is now an issue as this label is now being used by the MAPI Machine
controllers, indicating whether or not Windows specific steps should
occur when creating the backing VM. Without this label on the Machines
we are spinning up for BYOH, the VMs will not be usable. This is true
now for GCP, but it may also become true for other providers.

This commit introduces a new `ignore` label, this label will be used by
the WindowsMachine controller to filter out Machines for reconciliation,
allowing us to continue using the Machine API to create Windows Machines
for BYOH testing purposes.
Enables creating MachineSets on the GCP platform
Completes https://issues.redhat.com/browse/WINC-851
@sebsoto
Copy link
Copy Markdown
Contributor Author

sebsoto commented Aug 24, 2022

/test gcp-e2e-operator

Comment on lines +313 to 315
if tc.CloudProvider.GetType() == config.VSpherePlatformType ||
tc.CloudProvider.GetType() == config.GCPPlatformType || tc.CloudProvider.GetType() == config.AzurePlatformType {
powershellDefaultCommand = strings.ReplaceAll(command, "\\\"", "\"")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sebsoto the cloud provider is not the driving factor here. This modification is required due to Windows Server 2022.

Consider using the existing tc.windowsServerVersion flag in the condition statement for a more accurate decision. See this commit.

For example:

	if tc.windowsServerVersion != "2019" {
		powershellDefaultCommand = strings.ReplaceAll(command, "\\\"", "\"")
	}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrvaldes thats a good point, but I don't think that method is usable until the AWS 2022 PR is merged. If I try and do that now, the AWS 2022 jobs will fail

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is the jobs that are set for use with 2022 will fail due to the wrong powershell parsing being used for them

Copy link
Copy Markdown
Contributor

@jrvaldes jrvaldes Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The environment variable was introduced in openshift/release#29791, and is ready to use.

Copy link
Copy Markdown
Contributor Author

@sebsoto sebsoto Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but aws-e2e-ccm-install and the aws upgrade job have that env var set to ""
this will cause the test suite to detect them as running server 2022 and use the wrong powershell syntax because of that

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. the Windows Server 2019 is hard-coded.

My point is that we should stop using tc.CloudProvider.GetType() as criteria to tweak the PS commands. It was valid with AWS+WS2022 and is now a reality with GCP+WS2022.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cannot do powershellDefaultCommand = strings.ReplaceAll(command, "\\\"", "\"") on VMs running Windows server 2019. The AWS upgrade and ccm jobs are currently running 2019, but checking that environment variable will tell us that they are running 2022.

Copy link
Copy Markdown
Contributor

@saifshaikh48 saifshaikh48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff, just a few minor comments

Comment thread docs/HACKING.md Outdated
## Preventing Windows Machines from being configured by WMCO

If the Machine spec has the label `windowsmachineconfig.openshift.io/ignore=true`, the Machine will be ignored by WMCO,
and will not be configured into a Windows node. This can be helpful when debugging userdata changes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Machine will be ignored by WMCO and will not be configured into a Windows node

unless it is added as an entry to the windows-instances ConfigMap, right? Since this is hack readme, might be good to call this out.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm avoiding calling that out specifically because I dont want anyone to assume thats okay to do.
I'll change the language to specifically call out the Machine controller will ignore the Machine though.

Comment thread pkg/secrets/secrets.go Outdated
}

# Check if the capi user exists, this will be the case on Azure, and will be used instead of Administrator
if((Get-LocalUser |Where-Object {$_.Name -eq "capi"}) -eq $null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: please put a space after the pipe and remove the extra space before -eq (there's 2)

Comment thread docs/machineset-gcp.md

_\<zone_suffix\>_ should be replaced with the suffix of the chosen zone, such as `a`.

_\<project_id\>_ should be replaced with the GCP project this cluster was created under.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a gcloud command to fetch this info? gcloud config get-value project maybe?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could give the wrong value if the default project they have set up on their shell isn't the one that the cluster exists on

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I follow, is it safe to use it in the machineset.sh script?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think its fine to use there as that script is meant for our use, not for customers

This commit changes the Windows userdata secret to create the
Administrator account if the capi user doesn't exist, and if
the account is not enabled already. This is required to support GCP
clusters, as GCP standard Windows images have the Administrator account
disabled.
* Documents how a user can create a Windows MachineSet in a GCP cluster
* Updates pre-requisites to provide information on GCP
@jrvaldes
Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Aug 24, 2022
@sebsoto
Copy link
Copy Markdown
Contributor Author

sebsoto commented Aug 24, 2022

/test gcp-e2e-operator

@sebsoto
Copy link
Copy Markdown
Contributor Author

sebsoto commented Aug 24, 2022

/retest

@openshift-ci-robot
Copy link
Copy Markdown

/retest-required

Remaining retests: 2 against base HEAD cc18c77 and 8 for PR HEAD 4c2264c in total

@openshift-ci-robot
Copy link
Copy Markdown

/retest-required

Remaining retests: 1 against base HEAD cc18c77 and 7 for PR HEAD 4c2264c in total

@openshift-ci-robot
Copy link
Copy Markdown

/retest-required

Remaining retests: 0 against base HEAD cc18c77 and 6 for PR HEAD 4c2264c in total

@sebsoto
Copy link
Copy Markdown
Contributor Author

sebsoto commented Aug 25, 2022

/retest

@sebsoto
Copy link
Copy Markdown
Contributor Author

sebsoto commented Aug 25, 2022

Azure CI failing due to node logs test issues.
Opened #1214 to address this

@sebsoto
Copy link
Copy Markdown
Contributor Author

sebsoto commented Aug 25, 2022

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 25, 2022
@sebsoto
Copy link
Copy Markdown
Contributor Author

sebsoto commented Aug 26, 2022

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 26, 2022
@openshift-ci-robot
Copy link
Copy Markdown

/retest-required

Remaining retests: 2 against base HEAD 7d47218 and 5 for PR HEAD 4c2264c in total

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Aug 27, 2022

@sebsoto: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 44fc3ac into openshift:master Aug 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants