feat: Enable Antrea plugin on Windows Nodes #3597
reachjainrahul wants to merge 3 commits into Azure:master
Force-pushed from 1684157 to 0ae30f8.
@jackfrancis could you please help assign this review to the right person?
parts/k8s/addons/antrea-windows.yaml
Outdated
# wins will rename the binary when executing it. So we need to copy the binary everytime before running it.
mkdir -force /host/k/antrea/bin
cp /k/antrea/bin/* /host/k/antrea/bin/
C:/k/antrea/utils/wins.exe cli process run --path /k/antrea/bin/antrea-agent.exe --args "--config=/k/antrea/etc/antrea-agent.conf --logtostderr=false --log_dir=/k/antrea/logs/ --alsologtostderr --log_file_max_size=100 --log_file_max_num=4" --envs "KUBERNETES_SERVICE_HOST=$env:KUBERNETES_SERVICE_HOST KUBERNETES_SERVICE_PORT=$env:KUBERNETES_SERVICE_PORT ANTREA_SERVICE_HOST=$env:ANTREA_SERVICE_HOST ANTREA_SERVICE_PORT=$env:ANTREA_SERVICE_PORT NODE_NAME=$env:NODE_NAME"
Can we get this configured, installed and running without wins? We don't use wins to configure the rest of the components and currently don't want to introduce the component.
Hi @jsturtevant, Antrea depends on wins to manage the antrea-agent and kube-proxy processes from a Pod.
It's also the officially recommended way: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/adding-windows-nodes/
Do you want to avoid introducing wins specifically, or any tool of this kind? Without a wins-like tool, we would have to start the antrea-agent process directly on the host and do a lot of configuration for it.
We don't want to introduce wins at this point for aks-engine. Wins is a different way of configuring a cluster and aks-engine comes from a time before that was a way of doing things. Introducing a second way of configuring things is out of scope for aks-engine at this point.
OK, got it. We will add scripts to help start antrea-agent without wins.
@jsturtevant Is there an alternative way to start a process on the host from a Pod, given that hostNetwork is not supported on Windows?
We have no other way to start a process on the host without wins.
We would prefer not to add an install script on the host to run Antrea, because Antrea would then be outside the control of the Kubernetes control plane.
CC @reachjainrahul
There currently isn't a way to start a process on the host from a Pod. Services running on the host still have access to the k8s control plane; this is how kubelet and the other CNIs currently run.
There is a draft KEP related to enabling privileged containers to solve this problem outside of k8s. Please add some comments to make sure it fits your needs: https://docs.google.com/document/d/12EUtMdWFxhTCfFrqhlBGWV70MkZZPOgxw0X-LTR0VAo/edit#heading=h.kpsimyb2q1sg
@jsturtevant We have added Antrea as a Windows service. Could you please take a look at the updated review? Hopefully we can check in soon.
I ran E2E tests on Windows and they passed with Antrea v0.9.0.
& Bcdedit.exe -set TESTSIGNING ON

# TODO: Discuss with AKS Folks, how to install Upstream OVS on Windows Node
# and sign it.
Can you explain more about the requirement for signing OVS? Will aks-engine users be able to use this as is, or is there another process they need to follow to get signed binaries?
Antrea uses the open source OVS as its forwarding component, and the open source OVS has no certificate from Microsoft. Hence, we have to enable "testsigning" mode on the Windows host to ensure the OVS datapath can forward packets correctly.
However, enabling "testsigning" mode on a Windows host may carry security risks, so we would like input from the AKS folks on whether we can get the open source OVS signed (in which case we don't need to enable testsigning mode), or whether the risk of "testsigning" mode is acceptable (in which case we could continue using the unsigned open source OVS).
I don't have any experience with OVS, so I have a few more questions:
- For folks who don't use AKS and only use aks-engine but would like to use OVS how will they get the signed cert?
- Do they need to rely on Microsoft to provide a certificate? Will they want to provide a cert that they sign?
- In the case of AKS, will customers also want to provide their own certs?
Want to make sure there is a path for teams and users that don't use AKS as well.
@AbelHu might have some additional questions or insight as well.
Without an official Microsoft certificate (https://docs.microsoft.com/en-us/windows-hardware/drivers/install/whql-release-signature), "testsigning" mode must be enabled even if the user generates a self-signed cert.
Acquiring an official certificate requires an organization. However, neither the open source OVS community nor the Linux Foundation currently provides a signed cert, so the OVS community recommends that users enable "testsigning" mode when using OVS on a Windows host.
An official certificate requires that all the binaries in OVS be signed, including the installers, catalog files and driver files. The action owner must be an organization with a Microsoft account that can trigger the signing process. A series of test playlists is then run against nodes that already have OVS deployed. Once all the tests have passed, a signed package can be created and uploaded to Microsoft for verification and further AV scanning.
So the binaries need to be signed, and there isn't a cert that needs to be provided? Does this mean Microsoft could sign the binaries out of band and distribute them to many customers? We also require our binaries and PowerShell scripts to be signed (#3441 for example).
I had initially thought there was a requirement for a private cert that needed to be installed on the node per customer.
Thanks, this makes sense. We do something similar for other packages. Given this isn't GA yet, we can leave the self-signing in place for folks to try it out. As it matures and demand grows, we can address signing. We will want to make sure that the components are not installed unless the CNI is installed, so that they don't get flagged for signing by any processes that might be checking.
Hi @jsturtevant, OVS being signed by Microsoft means WHQL certification for the OVS binaries.
The following are the steps to get the WHQL certification:
- You need to pass the HLK tests on the Windows 10 family and above (https://docs.microsoft.com/en-us/windows-hardware/test/hlk/) and the HCK tests on the Windows 8 family (https://docs.microsoft.com/en-us/windows-hardware/drivers/develop/run-the-hck-test-suites-in-the-wdk). These are a series of test playlists that are run against nodes that already have OVS deployed. The tests are a bit finicky (to say the least) and vary between versions and types (Server / Client: Home, Professional and Enterprise) of the Windows OS.
- Once you pass all the tests, you create a signed package, which you upload to Microsoft for verification and further AV scanning.
To note: the driver certificate that you use on the deployed nodes prior to HCK/HLK testing has to be signed using a different, more expensive certificate, an EV certificate (https://docs.microsoft.com/en-us/windows-hardware/drivers/dashboard/get-a-code-signing-certificate).
I'm not sure if they dropped this requirement, but it was a huge issue when people wanted to sign drivers for 8.1/2012 R2 and 10-RS4/Server 2016.
Again, an organization has to be behind that certificate.
WHQL signing means that Microsoft will cross sign the binaries that we pushed in our publisher package.
This will also allow the use of Secure Boot.
When we address the signing aspect as discussed above, we will also need to sign the AntreaStartWrapper and this script: https://github.com/vmware-tanzu/antrea/releases/download/v0.9.0/Start.ps1
I'm not sure if the aks-engine team can perform the type of signing needed here.
We currently perform Authenticode signing for many open source components (mainly Kubernetes binaries), but that is intended to provide assurance that these binaries were built by Microsoft and not modified.
We don't currently have the capabilities to sign with certificates that allow code / driver execution.
Also, enabling test signing on AKS will not be acceptable.
We'll need to figure out how to get OVS to be signed properly.
Force-pushed from 0ae30f8 to 6e41962.
@jackfrancis @jsturtevant Could you please help review the updated patch?
parts/k8s/windowsantreacnifunc.ps1
Outdated
)

Write-Log "Downloading Antrea Start Powershell script"
$StartPs = [Io.path]::Combine($KubeDir, "Start.ps1")
Can this be named AntreaStart on disk?
}

if a.OrchestratorProfile.KubernetesConfig.NetworkPlugin == NetworkPluginAntrea {
    return errors.Errorf("networkPlugin antrea for windows is not supported with ContainerRuntime=containerd")
Is support for containerd in the roadmap?
We haven't planned for containerd support yet.
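The restriction discussed here can be sketched in isolation. This is a minimal, self-contained Go sketch under stated assumptions, not the aks-engine implementation: the function name `validateAntreaRuntime`, its parameters, and the string constants are hypothetical, and it assumes the surrounding code has already established that the cluster has Windows nodes. Only the error message is taken from the diff above.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical constants for illustration; the real code uses
// NetworkPluginAntrea and a container runtime field on the API model.
const (
	pluginAntrea      = "antrea"
	runtimeContainerd = "containerd"
)

// validateAntreaRuntime rejects the one combination the PR does not support:
// Antrea on Windows nodes with the containerd runtime.
func validateAntreaRuntime(networkPlugin, containerRuntime string, hasWindows bool) error {
	if hasWindows && containerRuntime == runtimeContainerd && networkPlugin == pluginAntrea {
		return errors.New("networkPlugin antrea for windows is not supported with ContainerRuntime=containerd")
	}
	return nil
}

func main() {
	// Print the outcome for a few combinations.
	fmt.Println(validateAntreaRuntime(pluginAntrea, runtimeContainerd, true))
	fmt.Println(validateAntreaRuntime(pluginAntrea, "docker", true))
	fmt.Println(validateAntreaRuntime(pluginAntrea, runtimeContainerd, false))
}
```

Keeping the check as a pure function of three inputs makes it straightforward to cover every combination in a table-driven test.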
}

p.OrchestratorProfile.KubernetesConfig.NetworkPolicy = NetworkPolicyAntrea
if err := p.OrchestratorProfile.KubernetesConfig.validateNetworkPolicy(k8sVersion, true); err == nil {
Looking a few lines above, it seems the k8sVersion passed in is 1.7.9. This should stay the same, and a check for 1.18+ should be added.
Done. Updated the patch.
@jsturtevant I explicitly added a case for 1.17. Do you still see a need for a 1.18 case, given that 1.17 already covers the validation failure?
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report
@@ Coverage Diff @@
## master #3597 +/- ##
==========================================
- Coverage 73.69% 72.82% -0.87%
==========================================
Files 147 149 +2
Lines 23164 23183 +19
==========================================
- Hits 17070 16884 -186
- Misses 4979 5180 +201
- Partials 1115 1119 +4
Force-pushed from 6e41962 to 24f7b5a.
/azp run
Commenter does not have sufficient privileges for PR 3597 in repo Azure/aks-engine
/azp run pr-e2e
Azure Pipelines successfully started running 1 pipeline(s).
Force-pushed from 24f7b5a to fb642aa.
@jsturtevant can we merge this today?
}

if networkPolicy == NetworkPolicyAntrea && hasWindows && !common.IsKubernetesVersionGe(k8sVersion, "1.18.0") {
    return errors.New("networkPolicy antrea for windows requires kubernetes version of 1.18 or higher")
Can you add a test for this?
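A test for this gate could exercise versions on both sides of the 1.18 boundary, with and without Windows nodes. The sketch below is an assumption-laden illustration rather than the code in this PR: `validateAntreaOnWindows`, `versionGe`, and `parseVersion` are hypothetical stand-ins (the real check calls `common.IsKubernetesVersionGe`), and the simple parser only handles plain `major.minor.patch` strings, not pre-release suffixes.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits "major.minor.patch" into integers.
// Hypothetical helper; it does not handle pre-release suffixes.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// versionGe reports whether actual >= minimum, comparing field by field.
func versionGe(actual, minimum string) (bool, error) {
	a, err := parseVersion(actual)
	if err != nil {
		return false, err
	}
	m, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := range a {
		if a[i] != m[i] {
			return a[i] > m[i], nil
		}
	}
	return true, nil
}

// validateAntreaOnWindows mirrors the shape of the check in the diff:
// Antrea with Windows nodes requires Kubernetes 1.18 or higher.
func validateAntreaOnWindows(networkPolicy, k8sVersion string, hasWindows bool) error {
	if networkPolicy != "antrea" || !hasWindows {
		return nil
	}
	ok, err := versionGe(k8sVersion, "1.18.0")
	if err != nil {
		return err
	}
	if !ok {
		return errors.New("networkPolicy antrea for windows requires kubernetes version of 1.18 or higher")
	}
	return nil
}

func main() {
	// Table of cases on both sides of the version boundary.
	cases := []struct {
		version    string
		hasWindows bool
	}{
		{"1.17.9", true},
		{"1.18.0", true},
		{"1.19.1", true},
		{"1.15.0", false},
	}
	for _, c := range cases {
		err := validateAntreaOnWindows("antrea", c.version, c.hasWindows)
		fmt.Printf("version=%s hasWindows=%t err=%v\n", c.version, c.hasWindows, err)
	}
}
```

In the real repository the same table would more naturally live in a `_test.go` file and call the existing validation method directly.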
Need to have tests pass and two additional items:
/azp run pr-e2e
/azp run pr-e2e
Azure Pipelines successfully started running 1 pipeline(s).
I added antrea_windows.json because we wanted to test Windows with version 1.18 only. Antrea for Linux works from 1.15 onwards. I see you set the version to latest; I am OK with that change. Do you run E2E tests against a fixed version, or is it round robin, with one of the versions from 1.15 to 1.18 selected?
I hope the cluster is coming up with your latest change. If so, can we merge this today? @jackfrancis
@reachjainrahul the test was run against a 1.19 cluster. Here are some docs that can show you how you can run E2E tests locally, assuming you have an Azure subscription with quota to build a cluster.
Sorry, I missed your point. Is the cluster not coming up with 1.19? I verified on 1.18 and it worked for me earlier.
Another thing to note: this change includes a bootstrap script for Windows, and it needs to be signed before it can be installed. When I bring up the cluster with 1.18, I point it to my storage location so that the Antrea bootstrap script gets called. I'm not sure whether you are facing a problem with the 1.19 cluster because of this.
Can you describe that workflow in detail?
This is not specific to Antrea. I think any PowerShell script needs to be signed by Microsoft. For developers to test the addition of a PowerShell script like windowsantreacnifunc.ps1, we build the provisioning scripts and upload them to a storage location, and that location is then provided in cluster.json. This workflow is specific to AKS Engine.
We can't consider merging this until it works. As it stands, installing the antrea addon with a Windows node pool doesn't produce working Windows nodes.
@jsturtevant @jackfrancis what's the workflow for adding a PowerShell script like windowsantreacnifunc.ps1? How is the provisioning script called in the test environment? What happens when someone makes changes to the Windows provisioning script, and how do the E2E tests pick up the change? Also, Jack, I did verify the cluster with 1.18, and it works. I even ran the E2E tests on it.
@reachjainrahul I am testing 1.18 now
This is what I did. I am running the test with 1.19 now. Let's see.
@reachjainrahul so you're saying that this cluster config can't be tested as-is? https://github.com/Azure/aks-engine/pull/3597/files#diff-629aa0331a7eeabadc497a6ad58c16ac (That would explain why things are not testing out as working.) @jsturtevant @marosset are you aware of the need to have a custom built Is that requirement documented @reachjainrahul ?
The Windows provisioning script is changed in this PR. We introduced a new file, windowsantreacnifunc.ps1, for installing OVS and Antrea. As per this documentation (https://github.com/Azure/aks-engine/blob/master/docs/topics/windows-provisioning-scripts.md), in order to test the change we need to add the provisioningScriptsPackageURL config. I am not sure how this new change will be picked up in CI (E2E test). If the change is not picked up, Antrea won't be installed on the worker node.
@jackfrancis @jsturtevant @mboersma As part of this check-in, we are modifying master/parts/k8s/kuberneteswindowssetup.ps1, the bootstrap PowerShell script used to bring up a Windows node (it installs Docker, CNI, OpenSSL and so on). Antrea installation happens via that script. As per this documentation from AKS Engine, any PowerShell script needs to be signed, so kuberneteswindowssetup.ps1 is also signed by Microsoft and put at some predefined location. To test a change, you zip your files, put them in a storage location, and set 'provisioningScriptsPackageURL' in cluster.json to that location. This requirement comes from AKS Engine, not from Antrea. It would be good to understand whose responsibility it is to sign the files and upload them to storage when someone modifies kuberneteswindowssetup.ps1, @jsturtevant. I believe it's an out-of-band operation done manually. The problem @jackfrancis is facing with the E2E test is that the Antrea installation change is not picked up, because kuberneteswindowssetup.ps1 still points to the checked-in version rather than the new version from this change.
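To make the workflow described above concrete, here is a hedged Go sketch that writes and reads back the `provisioningScriptsPackageURL` setting in a cluster definition fragment. The field name comes from this thread; its placement under `properties.windowsProfile` is an assumption that should be checked against the linked aks-engine documentation, and the struct types, function names, and example URL are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal, hypothetical subset of a cluster definition.
// ASSUMPTION: the setting lives under properties.windowsProfile.
type windowsProfile struct {
	ProvisioningScriptsPackageURL string `json:"provisioningScriptsPackageURL,omitempty"`
}

type properties struct {
	WindowsProfile windowsProfile `json:"windowsProfile"`
}

type clusterDefinition struct {
	Properties properties `json:"properties"`
}

// withProvisioningScripts renders a JSON fragment that points the Windows
// nodes at a custom provisioning scripts package.
func withProvisioningScripts(url string) ([]byte, error) {
	def := clusterDefinition{
		Properties: properties{
			WindowsProfile: windowsProfile{ProvisioningScriptsPackageURL: url},
		},
	}
	return json.MarshalIndent(def, "", "  ")
}

// readProvisioningScriptsURL extracts the setting from a JSON fragment.
func readProvisioningScriptsURL(data []byte) (string, error) {
	var def clusterDefinition
	if err := json.Unmarshal(data, &def); err != nil {
		return "", err
	}
	return def.Properties.WindowsProfile.ProvisioningScriptsPackageURL, nil
}

func main() {
	// Placeholder URL; replace with the location of your own zipped scripts.
	out, err := withProvisioningScripts("https://example.invalid/windows-provisioning-scripts.zip")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A real cluster.json carries many more fields; the point here is only that the test cluster must reference the package built from the PR branch, otherwise the checked-in scripts are used and Antrea is never installed.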
Any updates?
@reachjainrahul sorry for the delay. Could you rebase the changes? Then I can generate a package to give to @jackfrancis for the e2e. If we get a good signal from that, we can merge and I will kick off the out-of-band signing process.
Antrea can be enabled as the networkPlugin and networkPolicy plugin on Windows nodes. The Antrea Windows plugin depends on Open vSwitch and requires a signed version of upstream OVS. Here, we install OVS with test signing mode enabled on the Windows node. It's not recommended for production. Also updating the Antrea version to v0.9.3.
Force-pushed from 26bca6e to ca8521d.
I rebased and ran E2E tests. All passed.
It looks like the CI provisioning script test was not related to these changes. Re-triggered.
@jackfrancis you can use https://pssigned11493.blob.core.windows.net/ps-signed/windows-provisioning-scripts-1604682769.zip with
@jackfrancis @jsturtevant If we need another way to test these changes, we can expose the cluster definition file used in the e2e tests as an Azure DevOps pipeline variable and trigger runs that test changes to the provisioning scripts along with different cluster shapes.
@reachjainrahul I'm not seeing anything scheduled on the Windows node in the test config: Is that expected?
Antrea for Windows doesn't run as containers (Pods). It runs as a native Windows process.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Reason for Change:
Antrea can be enabled as the networkPlugin and networkPolicy plugin
on Windows nodes. The Antrea Windows plugin depends on Open vSwitch
and requires a signed version of upstream OVS. Here, we install OVS
with test signing mode enabled on the Windows node. It's not recommended
for production.
Also updating the Antrea version to v0.9.3.