
Drop AWS UPI control-plane Machines and compute MachineSets#1649

Merged
openshift-merge-robot merged 5 commits into openshift:master from wking:remove-machine-sets
May 2, 2019

Conversation

@wking
Member

@wking wking commented Apr 18, 2019

I'm still testing this, but folks keep talking about it, so I'm pushing my current status somewhere public ;).

@openshift-ci-robot added the do-not-merge/work-in-progress label (PR should not merge because it is a work in progress) on Apr 18, 2019
@openshift-ci-robot added the size/M (changes 30-99 lines, ignoring generated files) and approved (approved by an approver from all required OWNERS files) labels on Apr 18, 2019
Comment thread on docs/user/aws/install_upi.md (outdated)
Contributor


nit: provide instructions for using their favorite editor to edit the specific file, and also an example of how it would look.

Member Author


> nit provide instructions of using their favorite editor to edit the specific file

What's their favorite editor, sed? ;)

> and also example of how it would look.

If I make this a sed command, can I skip the example? I want to minimize the examples we have to update whenever we bump something in the install-config.

Member


For in-repo docs I think sed is fine, provided it's easy enough to follow and we feel it won't create a maintenance headache. For openshift-docs we need to have an example. So decide whether it's easier to have them look approximately the same or not.

Contributor


Since this is the only line we want them to change in the install-config.yaml file in this flow, I vote for sed.

Member


OK, if you're fine with sed, I'm fine with it.

Member Author


I've pushed a sed approach with 5e34eb130 -> cc7d56dea (also rebasing onto master). It's a bit fiddly, for reasons discussed in the cc7d56dea commit message. Is it worth it? I'm not sure what else we'd do for POSIX-compliant YAML edits, but we could always give an example in... Go? Python?... that makes a YAML-aware edit, if we don't mind asking folks without whatever tool we pick to translate for themselves instead of copy/pasting.

Comment thread on docs/user/aws/install_upi.md (outdated)
@wking wking force-pushed the remove-machine-sets branch 2 times, most recently from 84e5a80 to 5f7554b on April 18, 2019 at 23:38
Comment thread on upi/aws/cloudformation/01_vpc.yaml (outdated)
@wking wking force-pushed the remove-machine-sets branch from 5f7554b to 5e34eb1 on April 18, 2019 at 23:52
@openshift-ci-robot added the size/L label (changes 100-499 lines, ignoring generated files) and removed the size/M label on Apr 18, 2019
@cuppett
Member

cuppett commented Apr 19, 2019 via email

@wking wking force-pushed the remove-machine-sets branch 2 times, most recently from cc7d56d to 0dbbc1f on April 23, 2019 at 21:59
@wking
Member Author

wking commented Apr 23, 2019

cc7d56dea -> 0dbbc1f80 gets us closer, but the cluster is still stuck on the authentication operator waiting for a route:

$ oc get -o jsonpath='{.status.conditions[?(@.type == "Failing")].message}{"\n"}' clusteroperator authentication
Failing: error checking current version: unable to check route health: failed to GET route: EOF

I'm tearing down to take another run...

@wking wking force-pushed the remove-machine-sets branch from 0dbbc1f to aa5238f on April 23, 2019 at 22:46
@wking
Member Author

wking commented Apr 23, 2019

> In both cases, burrowing cluster specific names in at the VPC level will cause harm.

Summarizing some already-brief out-of-band discussion, my main concern is with pushing broken manifests into the cluster. The three approaches to avoiding that are:

a. Matching your environment to the installer's existing expectations (the route I'm taking here).
b. Running create manifests and removing the manifests you don't need (I'm taking this route here too).
c. Running create manifests and editing your manifests to match your environment.

My main concern with (c), and to a lesser extent with the currently-unavoidable (b), is that the manifests target is not stable per here and here. We need to revisit that now that we're no longer SemVering the installer, but by having the example flow stick to (a) as much as possible, with words addressing (b) and (c), we will hopefully be more stable than by building the example flow around (c) with words addressing (a) and (b).
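Mechanically, approach (b) is just a glob removal between the `create manifests` and `create ignition-configs` invocations. A minimal sketch (mock files stand in for the installer's rendered manifests, so this runs without openshift-install itself; the `example` directory name is arbitrary):

```shell
# Mock the control-plane Machine manifests that
# 'openshift-install create manifests --dir example' would render.
mkdir -p example/openshift
touch example/openshift/99_openshift-cluster-api_master-machines-0.yaml \
      example/openshift/99_openshift-cluster-api_master-machines-1.yaml \
      example/openshift/99_openshift-cluster-api_master-machines-2.yaml

# Approach (b): drop the Machine manifests before running
# 'openshift-install create ignition-configs', so the cluster never
# receives them.
rm -f example/openshift/99_openshift-cluster-api_master-machines-*.yaml

ls example/openshift
```

After the removal, `ls` prints nothing: no broken Machine manifests are left behind to confuse the machine API.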

@wking wking force-pushed the remove-machine-sets branch from aa5238f to c94286b on April 24, 2019 at 04:53
@wking wking changed the title from "WIP: Drop AWS UPI control-plane Machines" to "Drop AWS UPI control-plane Machines" on Apr 24, 2019
@openshift-ci-robot removed the do-not-merge/work-in-progress label on Apr 24, 2019
@wking
Member Author

wking commented Apr 24, 2019

OK, everything went through on this pass. So I've pulled the WIP tag off the subject and dropped the FIXME note with aa5238fd2 -> c94286bc9. Ready for another round of review :).

Comment thread on docs/user/aws/install_upi.md (outdated)
Contributor


Should be rm -f openshift/99_openshift-cluster-api_master-*.yaml; otherwise you hit an error like this:
$ ./openshift-install create ignition-configs --dir ./upi_2019-04-24-23-57-58
FATAL failed to fetch Bootstrap Ignition Config: failed to load asset "Master Machines": master machine manifests are required if you also provide openshift/99_openshift-cluster-api_master-user-data-secret.yaml

Member Author


$ ./openshift-install create ignition-configs --dir ./upi_2019-04-24-23-57-58
FATAL failed to fetch Bootstrap Ignition Config: failed to load asset "Master Machines": master machine manifests are required if you also provide openshift/99_openshift-cluster-api_master-user-data-secret.yaml

What installer version are you using? We dropped that check in March.

Contributor

@jianlinliu jianlinliu Apr 25, 2019


# ./openshift-install version
./openshift-install v4.1.0-201904211700-dirty
built from commit f3b726cc151f5a3d66bc7e23e81b3013f1347a7e
release image registry.svc.ci.openshift.org/ocp/release:4.1.0-rc.0

Member Author


That's a broken commit hash:

$ git show f3b726cc151f5a3d66bc7e23e81b3013f1347a7e
fatal: bad object f3b726cc151f5a3d66bc7e23e81b3013f1347a7e

Maybe https://bugzilla.redhat.com/show_bug.cgi?id=1694183. That makes it hard to know what code was included in your installer. Checking 4.1.0-rc.0:

$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.1.0-rc.0 | grep ' installer '
  installer                                     https://github.com/openshift/installer                                     49092083ce354636d63dd25140ba10e113b54452

That commit exists in our repository, and it's well after #1493 landed:

$ git log --first-parent --format='%ad %h %d %s' --date=iso -96 49092083ce354636d63dd25140ba10e113b54452 | tail -n1
2019-03-28 19:03:31 -0700 b4403e685  Merge pull request #1493 from abhinavdahiya/fix_machines_platform

Pulling the installer out of that release with a recent oc gives a nominal version that matches yours, which suggests that your buggy installer does include #1493:

$ oc version --client
Client Version: version.Info{Major:"4", Minor:"0+", GitVersion:"v4.0.0-alpha.0+c0c0569-2185", GitCommit:"c0c0569cfd", GitTreeState:"clean", BuildDate:"2019-04-25T19:48:12Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
$ oc adm release extract --command openshift-install quay.io/openshift-release-dev/ocp-release:4.1.0-rc.0
$ sha256sum openshift-install
f8ee2f8a466fc551347b8fa8fad897a08a408fe85ca7ebab97dcb08fe6e1ce73  openshift-install
$ ./openshift-install version
./openshift-install v4.1.0-201904211700-dirty
built from commit f3b726cc151f5a3d66bc7e23e81b3013f1347a7e
release image quay.io/openshift-release-dev/ocp-release@sha256:345ec9351ecc1d78c16cf0853fe0ef2d9f48dd493da5fdffc18fa18f45707867

Pulling the ART-build tarball from the mirrors gives the same version information:

$ wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.1.0-rc.0/openshift-install-linux-4.1.0-rc.0.tar.gz
$ sha256sum openshift-install-linux-4.1.0-rc.0.tar.gz 
b187f874dbfd81a378fe83bba11b81882d50f2855dd6c6bb2d9c4a7724708009  openshift-install-linux-4.1.0-rc.0.tar.gz
$ tar xvf openshift-install-linux-4.1.0-rc.0.tar.gz 
openshift-install
$ sha256sum openshift-install
f8ee2f8a466fc551347b8fa8fad897a08a408fe85ca7ebab97dcb08fe6e1ce73  openshift-install
$ ./openshift-install version
./openshift-install v4.1.0-201904211700-dirty
built from commit f3b726cc151f5a3d66bc7e23e81b3013f1347a7e
release image quay.io/openshift-release-dev/ocp-release@sha256:345ec9351ecc1d78c16cf0853fe0ef2d9f48dd493da5fdffc18fa18f45707867

so at least we're consistent. It's not clear to me why your version output includes release image registry.svc.ci.openshift.org/ocp/release:4.1.0-rc.0; oc adm release extract is supposed to be pinning that by digest, not by a potentially-floating tag. But I don't see your string in this installer:

$ strings openshift-install | grep 'required if you also provide'
...no hits...

And I cannot reproduce your error:

$ mkdir testing
$ cat <<EOF >testing/install-config.yaml
> apiVersion: v1
> baseDomain: devcluster.openshift.com
> metadata:
>   name: wking
> platform:
>   aws:
>     region: us-west-2
> pullSecret: |
>   {
>     REDACTED
>   }
> EOF
$ AWS_PROFILE=openshift-dev ./openshift-install --dir testing create manifests
INFO Consuming "Install Config" from target directory
$ rm -fv testing/openshift/99_openshift-cluster-api_master-machines-*.yaml
removed ‘testing/openshift/99_openshift-cluster-api_master-machines-0.yaml’
removed ‘testing/openshift/99_openshift-cluster-api_master-machines-1.yaml’
removed ‘testing/openshift/99_openshift-cluster-api_master-machines-2.yaml’
$ AWS_PROFILE=openshift-dev ./openshift-install --dir testing create ignition-configs
INFO Consuming "Openshift Manifests" from target directory 
INFO Consuming "Common Manifests" from target directory 
INFO Consuming "Master Machines" from target directory 
INFO Consuming "Worker Machines" from target directory 
$ echo "$?"
0

So, where are we diverging?

Contributor

@jianlinliu jianlinliu Apr 26, 2019


> It's not clear to me why your version output includes release image registry.svc.ci.openshift.org/ocp/release:4.1.0-rc.0; oc adm release extract is supposed to be pinning that by digest, not by a potentially-floating tag.

The only difference I see is that you were extracting the installer from quay.io, while I was using an internal nightly pre-release build to extract the installer.

I removed my previous installer binary and extracted it again. This time, I cannot reproduce it either. Really interesting.
$ oc adm release extract --command='openshift-install' registry.svc.ci.openshift.org/ocp/release:4.1.0-rc.0

$ rm -f ./test1/openshift/99_openshift-cluster-api_master-machines-*.yaml

$ ./openshift-install create ignition-configs --dir test1
INFO Consuming "Master Machines" from target directory
INFO Consuming "Openshift Manifests" from target directory
INFO Consuming "Worker Machines" from target directory
INFO Consuming "Common Manifests" from target directory

Contributor


Per my testing, both 'api' and 'api-int' are required.

Member Author


Everything inside the cluster should use api-int. Are you sure you picked up the other related changes when testing?

Contributor


Ah, let me check whether my local shell script was updated accordingly.

Contributor


Yeah, my fault; I forgot to change the IgnitionLocation URL accordingly. Ignore this comment.

Contributor


Hmm, it seems using 'api-int' hits an x509 issue. See https://bugzilla.redhat.com/show_bug.cgi?id=1697968#c13 for more details.

@wking wking force-pushed the remove-machine-sets branch from 79b05ec to cf205b7 on April 30, 2019 at 08:06
@wking wking changed the title from "Drop AWS UPI control-plane Machines" to "Drop AWS UPI control-plane Machines and compute MachineSets" on Apr 30, 2019
@wking
Member Author

wking commented Apr 30, 2019

Based on some offline discussion, I've pushed 79b05ec63 -> cf205b774, removing the compute MachineSets as well. That gets us down to a single track (CloudFormation-launched compute nodes) in the docs and simplifies things (e.g. we no longer need the hairy sed for zeroing compute replicas). Folks are still free to opt in to the machine API if they want, but that will be up to docs outside this repository (e.g. those in flight with openshift/machine-api-operator#306 or in openshift/openshift-docs).

@cuppett
Member

cuppett commented Apr 30, 2019

> Based on some offline discussion, I've pushed 79b05ec -> cf205b7 removing the compute MachineSets as well. That gets us down to a single track (CloudFormation-launched compute nodes) in the docs and simplifies things (e.g. we no longer need the hairy sed for zeroing compute replicas). Folks are still free to opt-in to the machine API if they want, but that will be up to docs outside this repository (e.g. those in flight with openshift/machine-api-operator#306 or in openshift/openshift-docs).

By doing this, we no longer need the changes to 01_vpc.yaml. Can we revert those? Keeping the VPC to a minimal requirement (having the kubernetes.io = shared tag on the subnets), we identify the needs of the ingress operator without being prescriptive about subnet naming or locking it to a cluster (the default manifests in the machine API were what was driving it).

@trown

trown commented Apr 30, 2019

/retest

@trown

trown commented Apr 30, 2019

/test e2e-openstack

wking added 2 commits May 2, 2019 09:27
Folks are free to opt in to the machine API during a UPI flow, but
creating Machine(Set)s that match their host environment requires
matching a few properties (subnet, securityGroups, ...).  Our default
templates are unlikely to do that out of the box, so just remove them
with the standard flow.  Users who want to wade in can do so, and I've
adjusted our CloudFormation templates to set the same tags as our IPI
assets to make this easier.  But with the rm call, other folks don't
have to worry about broken Machine(Set)s in their cluster confusing
the machine API or other admins.

The awkward join syntax for subnet names is because YAML doesn't
support nesting !s [1]:

  You can't nest short form functions consecutively, so a pattern like
  !GetAZs !Ref is invalid.

Also fix a few unrelated nits, e.g. the unused VpcId property in
06_cluster_worker_node.yaml.

[1]: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference-getavailabilityzones.html#w2ab1c21c24c36c17b8
Catching up with 13e4b70 (data/aws: create an api-int dns name,
2019-04-11, openshift#1601), now that 052fcee (asset/manifests: use internal
apiserver name, 2019-04-17, openshift#1633) has moved some internal assets over
to that name.
@wking wking force-pushed the remove-machine-sets branch from cf205b7 to 3ed1eb7 on May 2, 2019 at 16:28
Comment thread on upi/aws/cloudformation/01_vpc.yaml (outdated)
wking added 2 commits May 2, 2019 10:26
The point of these templates is to provide the bare minimum needed to
get our cluster off the ground.  Things like resource names and
auxiliary tags are nice to have in a production deploy for admin
orientation, but you can have a healthy cluster without them, so I'm
culling them in this commit.  Users, who may not be using our
CloudFormation templates at all, should now have an easier time seeing
what they need to set, and where they can go their own way.

We need to keep the kubernetes.io/cluster/... tags on instances to
avoid:

  May 01 21:31:53 ip-10-0-57-198 hyperkube[2311]: E0501 21:31:53.061462    2311 tags.go:95] Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.
To make it easier to recover these for:

  $ openshift-install gather bootstrap ...
@wking wking force-pushed the remove-machine-sets branch from 4d3a5c4 to d23d02a on May 2, 2019 at 17:28
@abhinavdahiya
Contributor

/lgtm

@openshift-ci-robot added the lgtm label (PR is ready to be merged) on May 2, 2019
This isn't strictly required, because we're removing the resulting
MachineSets right afterwards.  It's setting the stage for a future
where 'replicas: 0' means "no MachineSets" instead of "we'll make you
some dummy MachineSets".  And we can always remove the sed later if
that future ends up not happening.

The sed is based on [1], to replace 'replicas' only for the compute
pool (and not the control-plane pool).  While it should be
POSIX-compliant (and not specific to GNU sed or other
implementations), it is a bit finicky for a few reasons:

* The range matching will not detect matches in the first line, but
  'replicas' will always follow its parent 'compute', so we don't have
  to worry about first-line matches.

* 'compute' sorts before 'controlPlane', so we don't have to worry
  about their 'replicas: ' coming first.

* 'baseDomain' is the only other property that sorts before 'compute',
  but 'replicas: ' is not a legal substring for its domain-name value,
  so we don't have to worry about accidentally matching that.

* While all of the above mean we're safe for now, this approach could
  break down if we add additional properties in the future that sort
  before 'compute' but do allow 'replicas: ' as a valid substring.

[1]: https://stackoverflow.com/a/33416489
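The caveats above can be seen on a toy install-config. This is a sketch consistent with the commit message, not necessarily the exact command that landed in the docs; it assumes the installer emits keys in the alphabetized order described:

```shell
# Toy install-config.yaml with keys in the alphabetized order described
# above (values are placeholders, not a working config).
cat > install-config.yaml <<'EOF'
apiVersion: v1
baseDomain: example.com
compute:
- name: worker
  replicas: 3
controlPlane:
  name: master
  replicas: 3
EOF

# Range-match from the 'compute' line to the first 'replicas' line and
# zero only that replicas value; the control-plane pool's replicas,
# which comes later, is untouched.
sed -e '/compute/,/replicas/ s/replicas: .*/replicas: 0/' install-config.yaml
```

The compute pool comes out with `replicas: 0` while controlPlane keeps `replicas: 3`, relying on 'compute' sorting before 'controlPlane' exactly as the caveats above describe.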
@wking wking force-pushed the remove-machine-sets branch from d23d02a to c22d042 on May 2, 2019 at 18:41
@openshift-ci-robot removed the lgtm label on May 2, 2019
@abhinavdahiya
Contributor

/lgtm

@openshift-ci-robot added the lgtm label on May 2, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abhinavdahiya
Contributor

/retest

@openshift-merge-robot openshift-merge-robot merged commit 708d7dd into openshift:master May 2, 2019
@wking wking deleted the remove-machine-sets branch May 4, 2019 02:53
wking added a commit to wking/openshift-installer that referenced this pull request Sep 24, 2019
We grew this in c22d042 (docs/user/aws/install_upi: Add 'sed' call
to zero compute replicas, 2019-05-02, openshift#1649) to set the stage for
changing the 'replicas: 0' semantics from "we'll make you some dummy
MachineSets" to "we won't make you MachineSets".  But that hasn't
happened yet, and since 64f96df (scheduler: Use schedulable masters
if no compute hosts defined, 2019-07-16, openshift#2004) 'replicas: 0' for
compute has also meant "add the 'worker' role to control-plane nodes".
That leads to racy problems when ingress comes through a load
balancer, because Kubernetes load balancers exclude control-plane
nodes from their target set [1,2] (although this may get relaxed
soonish [3]).  If the router pods get scheduled on the control plane
machines due to the 'worker' role, they are not reachable from the
load balancer and ingress routing breaks [4].  Seth says:

> pod nodeSelectors are not like taints/tolerations.  They only have
> effect at scheduling time.  They are not continually enforced.

which means that attempting to address this issue as a day-2 operation
would mean removing the 'worker' role from the control-plane nodes and
then manually evicting the router pods to force rescheduling.  So
until we get the changes from [3], it's easier to just drop this
section and keep the 'worker' role off the control-plane machines
entirely.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1671136#c1
[2]: kubernetes/kubernetes#65618
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1744370#c6
[4]: https://bugzilla.redhat.com/show_bug.cgi?id=1755073
wking added a commit to wking/openshift-installer that referenced this pull request Oct 2, 2019
We grew replicas-zeroing in c22d042 (docs/user/aws/install_upi: Add
'sed' call to zero compute replicas, 2019-05-02, openshift#1649) to set the
stage for changing the 'replicas: 0' semantics from "we'll make you
some dummy MachineSets" to "we won't make you MachineSets".  But that
hasn't happened yet, and since 64f96df (scheduler: Use schedulable
masters if no compute hosts defined, 2019-07-16, openshift#2004) 'replicas: 0'
for compute has also meant "add the 'worker' role to control-plane
nodes".  That leads to racy problems when ingress comes through a load
balancer, because Kubernetes load balancers exclude control-plane
nodes from their target set [1,2] (although this may get relaxed
soonish [3]).  If the router pods get scheduled on the control plane
machines due to the 'worker' role, they are not reachable from the
load balancer and ingress routing breaks [4].  Seth says:

> pod nodeSelectors are not like taints/tolerations.  They only have
> effect at scheduling time.  They are not continually enforced.

which means that attempting to address this issue as a day-2 operation
would mean removing the 'worker' role from the control-plane nodes and
then manually evicting the router pods to force rescheduling.  So
until we get the changes from [3], we can either drop the zeroing [5]
or adjust the scheduler configuration to remove the effect of the
zeroing.  In both cases, this is a change we'll want to revert later
once we bump Kubernetes to pick up a fix for the service load-balancer
targets.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1671136#c1
[2]: kubernetes/kubernetes#65618
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1744370#c6
[4]: https://bugzilla.redhat.com/show_bug.cgi?id=1755073
[5]: openshift#2402
jhixson74 pushed a commit to jhixson74/installer that referenced this pull request Dec 6, 2019