WINC-815: Add support for Windows Server 2022 on GCP clusters #1197
|
/test gcp-e2e-operator |
|
/approve cancel |
|
/test gcp-e2e-operator |
|
/test gcp-e2e-operator |
38178c2 to
f1b7ac4
Compare
|
/test gcp-e2e-operator |
|
|
||
| // GetAddress returns a non-ipv6 address that can be used to reach a Windows node. This can be either an ipv4 | ||
| // or dns address. | ||
| // or dns address. DNS will be preferred, if available. |
There was a problem hiding this comment.
This change needs to be a separate PR to be on the safe side given it changes behavior across all clouds.
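To make the behavior change under review concrete, here is a minimal sketch of a "DNS preferred, IPv4 fallback, never IPv6" selection. The `nodeAddress` type, the `pickAddress` name, and the `InternalDNS`/`InternalIP` type strings are assumptions for illustration; this is not the PR's actual `GetAddress` implementation.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// nodeAddress is a hypothetical stand-in for one address entry in a
// Machine's status; the real type in the PR may differ.
type nodeAddress struct {
	Type    string // assumed values: "InternalDNS" or "InternalIP"
	Address string
}

// pickAddress returns a DNS name if one is present, otherwise the first
// IPv4 address. IPv6 addresses are never returned.
func pickAddress(addresses []nodeAddress) (string, error) {
	ipv4 := ""
	for _, a := range addresses {
		switch a.Type {
		case "InternalDNS":
			if a.Address != "" {
				// DNS wins as soon as it is seen, regardless of ordering.
				return a.Address, nil
			}
		case "InternalIP":
			ip := net.ParseIP(a.Address)
			// To4 returns nil for addresses that are not IPv4.
			if ipv4 == "" && ip != nil && ip.To4() != nil {
				ipv4 = a.Address
			}
		}
	}
	if ipv4 == "" {
		return "", errors.New("no usable DNS or IPv4 address found")
	}
	return ipv4, nil
}

func main() {
	addr, err := pickAddress([]nodeAddress{
		{Type: "InternalIP", Address: "10.0.0.5"},
		{Type: "InternalDNS", Address: "win-node.internal"},
	})
	fmt.Println(addr, err)
}
```

Because a function like this would sit on the path for every platform, a change in preference order affects all clouds at once, which is the concern raised in the comment above.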
| } | ||
|
|
||
| _, err := getInternalIPAddress(machine.Status.Addresses) | ||
| _, err := GetAddress(machine.Status.Addresses) |
There was a problem hiding this comment.
Where is GetAddress defined?
| return [System.Web.Security.Membership]::GeneratePassword(16, 2) | ||
| } | ||
|
|
||
| # Check if the capi user exists, this will be the case on Azure, and will be used instead of Administrator |
There was a problem hiding this comment.
As an aside I wonder if we can do this on Azure also? It is also making me wonder if we should just create a new core user instead like we wanted to in the past.
There was a problem hiding this comment.
We can re-open https://issues.redhat.com/browse/WINC-430
This could cause issues with people using custom images
There was a problem hiding this comment.
This could cause issues with people using custom images
Why would creating a core user cause issues?
There was a problem hiding this comment.
I'm thinking of a case where the Administrator user has been set up with certain permissions in the image, or there's security software installed that special-cases the Administrator user. It's nothing we can't document around.
04b60a1 to
d8f1ad4
Compare
|
/test gcp-e2e-operator |
|
/test gcp-e2e-operator |
| # The capi user doesn't exist, ensure the Administrator account is enabled if it exists | ||
| # If neither users exist, an error will be written to the console, but the script will still continue | ||
| $UserAccount = Get-LocalUser -Name "Administrator" | ||
| if( ($UserAccount -ne $null) -and (!$UserAccount.Enabled) ) { |
There was a problem hiding this comment.
Nit: extra spaces after if( and before ).
|
|
||
| Not having these labels will result in the Windows node not being marked as a worker. | ||
|
|
||
| If the Machine spec has the label `windowsmachineconfig.openshift.io/ignore=true`, the Machine will be ignored by WMCO, |
There was a problem hiding this comment.
Should this information be moved to the hacking.md file? I'm wondering whether it is useful in the README if we don't expect users to use it for anything other than development and testing.
There was a problem hiding this comment.
That's true, I went back and forth on this. I can move it there.
We have been using the presence of the Windows label on Machines as the differentiator between Machines used to test the Windows Machine controller and those used to test the ConfigMap controller. This worked because the Windows Machine controller would filter out any Machines that did not have that label. This is now an issue because the label is also used by the MAPI Machine controllers to indicate whether Windows-specific steps should occur when creating the backing VM. Without this label on the Machines we spin up for BYOH, the VMs will not be usable. This is true now for GCP, and it may also become true for other providers. This commit introduces a new `ignore` label, which the Windows Machine controller uses to filter Machines out of reconciliation, allowing us to continue using the Machine API to create Windows Machines for BYOH testing purposes.
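The filtering described in this commit message can be sketched as a simple label check. The `machine` struct and the `shouldReconcile` and `filterMachines` helpers are hypothetical names for illustration; only the label key and its `true` value come from the PR text, and the real controller may apply the check through a different mechanism.

```go
package main

import "fmt"

// ignoreLabel is the label key named in the PR discussion.
const ignoreLabel = "windowsmachineconfig.openshift.io/ignore"

// machine is a hypothetical, trimmed-down stand-in for a Machine object.
type machine struct {
	Name   string
	Labels map[string]string
}

// shouldReconcile reports whether the Windows Machine controller should act
// on m. A Machine carrying ignore=true is skipped. Reading from a nil map is
// safe in Go and yields the empty string.
func shouldReconcile(m machine) bool {
	return m.Labels[ignoreLabel] != "true"
}

// filterMachines keeps only the Machines that should be reconciled.
func filterMachines(machines []machine) []machine {
	var kept []machine
	for _, m := range machines {
		if shouldReconcile(m) {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	machines := []machine{
		{Name: "win-worker", Labels: map[string]string{}},
		{Name: "byoh-test", Labels: map[string]string{ignoreLabel: "true"}},
	}
	for _, m := range filterMachines(machines) {
		fmt.Println(m.Name)
	}
}
```

Keying the filter on a dedicated label leaves the Windows label free for the MAPI Machine controllers, which is the separation the commit message argues for.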
Enables creating MachineSets on the GCP platform. Completes https://issues.redhat.com/browse/WINC-851
|
/test gcp-e2e-operator |
| if tc.CloudProvider.GetType() == config.VSpherePlatformType || | ||
| tc.CloudProvider.GetType() == config.GCPPlatformType || tc.CloudProvider.GetType() == config.AzurePlatformType { | ||
| powershellDefaultCommand = strings.ReplaceAll(command, "\\\"", "\"") |
There was a problem hiding this comment.
@sebsoto the cloud provider is not the driving factor here. This modification is required due to Windows Server 2022.
Consider using the existing tc.windowsServerVersion flag in the condition statement for a more accurate decision. See this commit.
For example:
    if tc.windowsServerVersion != "2019" {
        powershellDefaultCommand = strings.ReplaceAll(command, "\\\"", "\"")
    }
There was a problem hiding this comment.
@jrvaldes that's a good point, but I don't think that method is usable until the AWS 2022 PR is merged. If I try to do that now, the AWS 2022 jobs will fail.
There was a problem hiding this comment.
What I mean is the jobs that are set for use with 2022 will fail due to the wrong powershell parsing being used for them
There was a problem hiding this comment.
The environment variable was introduced in openshift/release#29791, and is ready to use.
There was a problem hiding this comment.
Yes, but aws-e2e-ccm-install and the AWS upgrade job have that env var set to "".
This will cause the test suite to detect them as running Server 2022 and use the wrong PowerShell syntax.
There was a problem hiding this comment.
I don't think so. Windows Server 2019 is hard-coded.
My point is that we should stop using tc.CloudProvider.GetType() as criteria to tweak the PS commands. It was valid with AWS+WS2022 and is now a reality with GCP+WS2022.
There was a problem hiding this comment.
We cannot do powershellDefaultCommand = strings.ReplaceAll(command, "\\\"", "\"") on VMs running Windows Server 2019. The AWS upgrade and ccm jobs are currently running 2019, but checking that environment variable will tell us that they are running 2022.
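The disagreement in this thread comes down to what a `!= "2019"` check does when the version value is empty. The sketch below uses hypothetical `needsUnescape` and `prepareCommand` helpers and assumes the version arrives as a plain string. Whether an empty value can actually reach the check is the disputed point, since one reviewer states that 2019 is hard-coded.

```go
package main

import (
	"fmt"
	"strings"
)

// needsUnescape mirrors the suggested condition: anything other than "2019"
// is treated as needing the unescaped form. An empty string also qualifies.
func needsUnescape(windowsServerVersion string) bool {
	return windowsServerVersion != "2019"
}

// prepareCommand replaces every backslash-quote pair with a bare quote when
// the version check says so, and otherwise returns the command untouched.
func prepareCommand(command, windowsServerVersion string) string {
	if needsUnescape(windowsServerVersion) {
		return strings.ReplaceAll(command, "\\\"", "\"")
	}
	return command
}

func main() {
	cmd := `Write-Output \"hi\"`
	for _, version := range []string{"2019", "2022", ""} {
		fmt.Printf("%q -> %s\n", version, prepareCommand(cmd, version))
	}
}
```

Under these assumptions an empty version string takes the same path as "2022", so a job that runs Windows Server 2019 but leaves the variable empty would receive the 2022-style command unless a default is applied before the comparison.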
saifshaikh48
left a comment
There was a problem hiding this comment.
Good stuff, just a few minor comments
| ## Preventing Windows Machines from being configured by WMCO | ||
|
|
||
| If the Machine spec has the label `windowsmachineconfig.openshift.io/ignore=true`, the Machine will be ignored by WMCO, | ||
| and will not be configured into a Windows node. This can be helpful when debugging userdata changes. |
There was a problem hiding this comment.
the Machine will be ignored by WMCO and will not be configured into a Windows node
unless it is added as an entry to the windows-instances ConfigMap, right? Since this is the hacking README, it might be good to call this out.
There was a problem hiding this comment.
I'm avoiding calling that out specifically because I don't want anyone to assume that's okay to do.
I'll change the language to specifically call out that the Machine controller will ignore the Machine, though.
| } | ||
|
|
||
| # Check if the capi user exists, this will be the case on Azure, and will be used instead of Administrator | ||
| if((Get-LocalUser |Where-Object {$_.Name -eq "capi"}) -eq $null) { |
There was a problem hiding this comment.
Nits: please put a space after the pipe and remove the extra space before -eq (there are 2).
|
|
||
| _\<zone_suffix\>_ should be replaced with the suffix of the chosen zone, such as `a`. | ||
|
|
||
| _\<project_id\>_ should be replaced with the GCP project this cluster was created under. |
There was a problem hiding this comment.
Is there a gcloud command to fetch this info? gcloud config get-value project maybe?
There was a problem hiding this comment.
This could give the wrong value if the default project set up in their shell isn't the one the cluster exists in.
There was a problem hiding this comment.
I follow. Is it safe to use it in the machineset.sh script?
There was a problem hiding this comment.
I think it's fine to use there, as that script is meant for our use, not for customers.
This commit changes the Windows userdata secret to enable the Administrator account if the capi user doesn't exist and the account is not already enabled. This is required to support GCP clusters, as GCP standard Windows images have the Administrator account disabled.
* Documents how a user can create a Windows MachineSet in a GCP cluster
* Updates pre-requisites to provide information on GCP
|
/lgtm |
|
/test gcp-e2e-operator |
|
/retest |
|
/retest |
|
Azure CI failing due to node logs test issues. |
|
/hold |
|
/hold cancel |
|
@sebsoto: all tests passed! |
This PR adds multiple commits that provide WMCO support for GCP clusters.
Commits to get GCP working with our test suite:
Commits to add GCP functionality and docs to WMCO: