Bug 1868158: gcp, azure: Handle azure vips similar to GCP#2011
openshift-merge-robot merged 1 commit into openshift:master from squeed:azure-routes-controller
Conversation
|
@squeed: This pull request references Bugzilla bug 1868158, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Details
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/test e2e-azure |
|
/cc @sttts |
|
/test e2e-gcp-op |
|
oops, typo'd the image key. Good thing the tests mostly failed... |
|
/refresh |
|
/test e2e-azure |
|
/retest |
(This isn't a behavior change, just removing some dead code and a corresponding re-indentation)
It could happen if we somehow switch to RRDNS.
You rely on a fresh base image (i.e. a reboot) to remove the old static pod?
We always reboot for config changes today, yes.
Though this gets into #1190; in fact, due to the way the MCO works today, there will unfortunately be a window where both are running.
We probably need to change the new code to at least detect the case where the old static pod exists and exit.
I could also keep it as the same filename; the filename definitely doesn't matter.
The old static pod doesn't matter; it writes to /run/gcp-routes while the new one writes to /run/cloud-routes, so they can happily coexist (and should, until the service is swapped).
A separate commit with just the file copied from GCP would help with reviewing the differences.
It's pretty different from GCP, so it needs a review.
|
Azure quota limits.
/retest |
|
/hold Holding so this doesn't merge until it looks like Azure does what we want. |
yuqi-zhang
left a comment
I don't really have the background knowledge to validate the functionality, so I think I should defer the lgtm to someone with more networking knowledge.
In terms of the operation here I suppose we're really just extending the existing GCP watcher to also work on Azure, which seems fine to me.
Did you mean to continue here?
It'd be nice to also add some platform-specific descriptions of how the service operates on each platform, so it's clearer how the differences are handled.
Not sure what you mean exactly; apiserver-watcher is identical on Azure and GCP. I did add pointers to the cloud-provider-specific scripts, so maybe that's helpful?
Please always use http://redsymbol.net/articles/unofficial-bash-strict-mode/
Also, it's really unfortunate that we keep accumulating this nontrivial bash code; like I said in the OVS review, it is possible today to have this in the MCD, since we pull that binary and execute it on the host.
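For reference, the strict-mode header from that article looks like this (a minimal sketch; the trailing echo is just a demo, not part of the convention):

```shell
#!/usr/bin/env bash
# "Unofficial bash strict mode": abort on errors (-e), on use of unset
# variables (-u), and on a failure in any stage of a pipeline (pipefail).
set -euo pipefail
# Split words only on newlines and tabs, not spaces.
IFS=$'\n\t'

echo "strict mode enabled"
```

With this header, a typo'd variable or a silently failing pipeline stage aborts the script instead of letting it limp along.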
I agree; I don't like adding all this bash. If it helps, I extract it and run it through shellcheck automatically. I could probably add that to `make verify`.
For 4.7, should we add an item to rewrite all this in Go?
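A rough sketch of how such a check could be wired into `make verify` (the `lint_shell` helper, the directory layout, and the graceful skip are all assumptions for illustration, not what the MCO actually does):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run shellcheck over every *.sh file under a directory, skipping
# gracefully when shellcheck isn't installed so minimal CI images
# don't fail. Word-splitting on $files is fine for paths without spaces.
lint_shell() {
    local dir="$1" files
    if ! command -v shellcheck >/dev/null 2>&1; then
        echo "shellcheck not found; skipping shell lint"
        return 0
    fi
    files=$(find "$dir" -name '*.sh')
    if [ -z "$files" ]; then
        return 0
    fi
    # shellcheck disable=SC2086
    shellcheck $files
}

# Demo: an empty directory lints cleanly.
lint_shell "$(mktemp -d)"
```

A `make verify` target would then just call this script and fail the build on any shellcheck finding.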
I would love this to be in Go! I defer to @cgwalters / @runcom on whether waiting for 4.7 makes sense.
|
/approve |
|
/retest |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
|
/lgtm cancel I'll look at fixing it. |
This PR does the following things:
- Rename gcp-routes-controller to apiserver-watcher, since it is generic
- Remove the obsolete service-management mode from gcp-routes-controller
- Change the downfile directory from /run/gcp-routes to /run/cloud-routes
- Write $VIP.up as well as $VIP.down
- Add an Azure routes script that fixes hairpin

Background: Azure hosts cannot hairpin back to themselves over a load balancer. Thus, we need to redirect traffic for the apiserver VIP to ourselves via iptables. However, we should only do this when our local apiserver is running. The apiserver-watcher drops a $VIP.up or $VIP.down file, accordingly, depending on the state of the apiserver. Then, we add or remove iptables rules that short-circuit the load balancer. Unlike GCP, we don't need to do this for external traffic, only local clients.
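The up/down-file handoff described above could be sketched roughly like this (an illustration only, not the actual script from this PR; the `decide_action` helper, the demo VIP, and the temp-directory stand-in for /run/cloud-routes are assumptions, and the real iptables commands appear only in comments):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Decide what to do with the hairpin-redirect rule for one VIP, based on
# the $VIP.up / $VIP.down files the apiserver-watcher drops.
decide_action() {
    local vip="$1" rundir="$2"
    if [ -e "${rundir}/${vip}.up" ]; then
        # Local apiserver is healthy: short-circuit the load balancer,
        # e.g. iptables -t nat -A <chain> -d "$vip" -j REDIRECT
        echo "add"
    elif [ -e "${rundir}/${vip}.down" ]; then
        # Local apiserver is down: remove the rule so traffic reaches
        # another master via the load balancer,
        # e.g. iptables -t nat -D <chain> -d "$vip" -j REDIRECT
        echo "remove"
    else
        echo "none"
    fi
}

# Demo with a temporary directory standing in for /run/cloud-routes.
dir=$(mktemp -d)
touch "${dir}/10.0.0.4.up"
decide_action "10.0.0.4" "${dir}"    # prints "add"
rm "${dir}/10.0.0.4.up"
touch "${dir}/10.0.0.4.down"
decide_action "10.0.0.4" "${dir}"    # prints "remove"
rm -rf "${dir}"
```

The point of the `.up`/`.down` files is that the watcher and the routes script stay decoupled: one process observes apiserver health, the other only reacts to files.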
|
I'm holding the mutex 🔒 around force pushing updates here. |
|
/test e2e-azure |
|
/retest |
Thanks |
|
OK we have a green azure run here: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2011/pull-ci-openshift-machine-config-operator-master-e2e-azure/1304091886448807936 |
|
Confirmed we fixed the ordering cycle by looking at the journal from the current run versus the previous: /approve |
|
@squeed: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard.
Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
Eh, we had prior approvals on the old code and the new one just fixes systemd ordering issues, so… |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cgwalters, mfojtik, squeed. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@squeed: All pull requests linked via external trackers have merged: Bugzilla bug 1868158 has been moved to the MODIFIED state.
Details
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
YAY! |
The original introduction of this service probably used `gcpRoutesController`, which happens to be the same as the MCO image, because we didn't have a reference to it, and plumbing the image substitution through all the abstraction layers in the code is certainly not obvious. Prep for openshift#2011, which wants to abstract the GCP work to also handle Azure; it was confusing that `machine-config-daemon-pull.service` was referencing an image with a GCP name.
This PR does the following things:
- Rename gcp-routes-controller to apiserver-watcher, since it is generic
- Remove the obsolete service-management mode from gcp-routes-controller
- Change the downfile directory from /run/gcp-routes to /run/cloud-routes
- Write $VIP.up as well as $VIP.down
- Add an Azure routes script that fixes hairpin

Background: Azure hosts cannot hairpin back to themselves over a load balancer. Thus, we need to redirect traffic for the apiserver VIP to ourselves via iptables. However, we should only do this when our local apiserver is running.
The apiserver-watcher drops a $VIP.up and $VIP.down file, accordingly, depending on the state of the apiserver. Then, we add or remove iptables rules that short-circuit the load balancer.
Unlike GCP, we don't need to do this for external traffic, only local clients.
- How to verify it
Install on Azure; ensure connections to the internal API load balancer are reliable, both when the local apiserver process is running and when it is stopped.
- Description for the changelog
Masters on Azure can now reliably connect to the apiserver service without encountering hairpin issues.