refactor: make cilium addon user-configurable #2480
wait_for_file 1200 1 $KUBELET_RUNTIME_CONFIG_SCRIPT_FILE || exit $ERR_FILE_WATCH_TIMEOUT
systemctlEnableAndStart kubelet || exit $ERR_KUBELET_START_FAIL
{{if HasCiliumNetworkPolicy}}
while [ ! -f /etc/cni/net.d/05-cilium.conf ]; do
This change gives the kubelet a little extra time to reconcile before we return success to the user. The current cilium implementation delivers this CNI dependency via a daemonset, which means container networking doesn't work for the first few minutes after the kubelet comes online.
if kubernetesConfig != nil {
	if kubernetesConfig.NetworkPlugin == NetworkPluginCilium {
		cloudInitFiles["systemdBPFMount"] = getBase64EncodedGzippedCustomScript(systemdBPFMount, cs)
Moved this to the cloudInitFiles variable object because it's more idiomatic, and it saves ARM variable overhead.
var err error
if !eng.ExpandedDefinition.Properties.HasLowPriorityScaleset() {
	nodeList, err = node.GetReady()
	nodes, err = node.GetWithRetry(1*time.Second, cfg.Timeout)
These changes came about while exercising cilium clusters with E2E tests and seeing flakes: a cilium cluster configuration triggers an OS reboot, so nodes may go offline during these tests and we need to retry.
@@ -1,59 +0,0 @@
{
This test doesn't make sense: there's only one cilium configuration, and it implements NetworkPolicy (plus cilium IPAM). There is no standalone networkPlugin=cilium cluster configuration.
{
  "name": "agentpool1",
  "count": 2,
  "vmSize": "Standard_D2_v3",
Cleaning up while I was here
Codecov Report
@@ Coverage Diff @@
## master #2480 +/- ##
==========================================
+ Coverage 72.55% 72.62% +0.07%
==========================================
Files 130 130
Lines 23932 24004 +72
==========================================
+ Hits 17363 17434 +71
- Misses 5542 5544 +2
+ Partials 1027 1026 -1
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jackfrancis, mboersma.
Referenced by a downstream merge commit (Rashmi/ciprod12042019 (#3)), whose squashed history for this PR (#2480) lists:
* refactor: make cilium addon user-configurable
* chore: clarify that cilium doesn't work w/ 1.16 and above, add validation
* test: addons UT
* test: go template UT
* ci: use Standard_D8_v3 for cilium test, only run NetworkPolicy tests
* fix: error message language
* chore: remove debug fmt.Println
* ci: revert back to Standard_D2_v3
Reason for Change:
This PR makes the `cilium` networkPolicy component a user-configurable addon, exposed via the existing `kubernetesConfig.addons` interface:
Strictly speaking, the above doesn't make sense in the absence of a `networkPolicy` with a value of `"cilium"` (validation has been added for that), but this change allows for per-container image overrides to aid future development of this surface area.
Manual testing showed that our current 1.16 and 1.17 implementations do not work, so validation (and documentation) has been added to that effect.
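A hedged Go sketch of the validation rules described here and in the following paragraph: an enabled cilium addon requires `"networkPolicy": "cilium"`, `"networkPlugin": "cilium"` is rejected without that same policy, and cilium is rejected on Kubernetes 1.16 and above. The `Addon` and `KubernetesConfig` types, `validateCilium`, and the version parsing are simplified assumptions, not the actual aks-engine API model or validator.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Addon and KubernetesConfig are simplified, hypothetical stand-ins for the
// api model types; only the fields needed for these checks are included.
type Addon struct {
	Name    string
	Enabled bool
}

type KubernetesConfig struct {
	NetworkPlugin string
	NetworkPolicy string
	Addons        []Addon
}

// majorMinor extracts the major and minor numbers from "major.minor.patch".
func majorMinor(version string) (int, int, error) {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// validateCilium sketches the three rules described in this PR.
func validateCilium(k KubernetesConfig, orchestratorVersion string) error {
	ciliumAddon := false
	for _, a := range k.Addons {
		if a.Name == "cilium" && a.Enabled {
			ciliumAddon = true
		}
	}
	if !ciliumAddon && k.NetworkPolicy != "cilium" && k.NetworkPlugin != "cilium" {
		return nil // cilium is not in use; nothing to check
	}
	if ciliumAddon && k.NetworkPolicy != "cilium" {
		return errors.New(`the cilium addon requires "networkPolicy": "cilium"`)
	}
	if k.NetworkPlugin == "cilium" && k.NetworkPolicy != "cilium" {
		return errors.New(`"networkPlugin": "cilium" is only allowed with "networkPolicy": "cilium"`)
	}
	major, minor, err := majorMinor(orchestratorVersion)
	if err != nil {
		return err
	}
	if major > 1 || (major == 1 && minor >= 16) {
		return fmt.Errorf("cilium is not supported on Kubernetes %s (1.16 and above)", orchestratorVersion)
	}
	return nil
}

func main() {
	cfg := KubernetesConfig{
		NetworkPlugin: "cilium",
		NetworkPolicy: "cilium",
		Addons:        []Addon{{Name: "cilium", Enabled: true}},
	}
	fmt.Println(validateCilium(cfg, "1.15.7"))
	fmt.Println(validateCilium(cfg, "1.16.4"))
}
```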
Additionally, the simple `"networkPlugin": "cilium"` input pattern has been deprecated, as it doesn't make sense: the cilium IPAM implementation depends upon artifacts delivered by the daemonset in the spec. It may be possible to decompose the current, monolithic spec into distinct IPAM and NetworkPolicy specs, but that is an exercise for the future. Because `networkPolicy` must concretely be `"cilium"` in order to deliver the cilium addon, we validate that `"networkPlugin": "cilium"` is not allowed as api model input without `"networkPolicy": "cilium"`. This avoids misleading folks into thinking it's possible to use cilium just for IPAM (again, at present, that is not possible).
Issue Fixed:
Related to #2251
Requirements:
Notes: