
WIP bootkube.sh populate complete list of etcd endpoints during bootstrap #2998

Closed
hexfusion wants to merge 2 commits into openshift:master from hexfusion:fix-4.4-rhcos+ceo


@hexfusion
Contributor

@hexfusion hexfusion commented Jan 28, 2020

This PR attempts to reduce the initial bootstrap complexity caused by populating only the bootstrap etcd endpoint. Feeding the apiserver the entire endpoint list during bootstrap avoids the following scenario: cluster-etcd-operator completes scaling up to 4 members, and as a result the host-etcd service is adjusted to reflect all of the scaled etcd endpoints. Meanwhile, cluster-kube-apiserver-operator has not yet rolled out the new static pod assets at the correct revision. When we then reap the bootstrap node, the apiserver is left with a single backend endpoint pointing at a bootstrap node that is no longer alive.
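As a rough illustration of the idea (not the actual bootkube.sh change), the sketch below assembles a comma-separated endpoint list of the kind kube-apiserver accepts through `--etcd-servers`. The function name `build_etcd_endpoints`, the `example.test` host names, and the choice of port 2379 are assumptions made for this example only.

```shell
#!/usr/bin/env bash
# Sketch only: host names and the helper name are hypothetical,
# not values taken from the installer.

# build_etcd_endpoints HOST...
# Prints one comma-separated line of https URLs on port 2379,
# in the order the hosts were given.
build_etcd_endpoints() {
    local port=2379
    local list="" host
    for host in "$@"; do
        # ${list:+...} adds the separating comma only when list is non-empty.
        list="${list:+${list},}https://${host}:${port}"
    done
    printf '%s\n' "${list}"
}

# Bootstrap endpoint first, then the three masters (hypothetical names).
build_etcd_endpoints bootstrap.example.test \
    etcd-0.example.test etcd-1.example.test etcd-2.example.test
```

With a list like this, the apiserver would already know about the master endpoints when the bootstrap node is reaped, instead of holding only the bootstrap address.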

With the previous version of the etcd client balancer, this would have been overly disruptive, but the new balancer handles round-robin failover across sub-connections gracefully.
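To make the failover behaviour concrete, here is a toy sketch of round-robin selection over an endpoint list. It is not the etcd client balancer's implementation; `pick_endpoint` and `probe_alive` are hypothetical helpers, and the probe simply treats any endpoint whose name contains "bootstrap" as dead so the example runs without a network.

```shell
#!/usr/bin/env bash
# Toy illustration only; the real balancer lives in the etcd Go client.

# probe_alive ENDPOINT
# Hypothetical health check: pretends the bootstrap endpoint is gone.
probe_alive() {
    case "$1" in
        *bootstrap*) return 1 ;;
        *) return 0 ;;
    esac
}

# pick_endpoint START PROBE ENDPOINT...
# Walks the endpoints in round-robin order beginning at index START and
# prints the first one for which PROBE succeeds. Returns 1 if none do.
pick_endpoint() {
    local start="$1" probe="$2"
    shift 2
    local count="$#" i=0 first endpoint
    [ "${count}" -gt 0 ] || return 1
    # Rotate the positional list so that entry START comes first.
    while [ "${i}" -lt $((start % count)) ]; do
        first="$1"
        shift
        set -- "$@" "${first}"
        i=$((i + 1))
    done
    for endpoint in "$@"; do
        if "${probe}" "${endpoint}"; then
            printf '%s\n' "${endpoint}"
            return 0
        fi
    done
    return 1
}

# The dead bootstrap endpoint is skipped and the next one is chosen.
pick_endpoint 0 probe_alive \
    https://bootstrap.example.test:2379 \
    https://etcd-0.example.test:2379 \
    https://etcd-1.example.test:2379
# prints https://etcd-0.example.test:2379
```

If the list contains only the bootstrap endpoint, `pick_endpoint` returns non-zero, which mirrors the stranded-apiserver case described above.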

We consider this a short-term solution while we improve the timings and reduce the complexity of this process.

Requires openshift/cluster-etcd-operator#60

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 28, 2020
@hexfusion
Contributor Author

/test e2e-gcp

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign smarterclayton
You can assign the PR to them by writing /assign @smarterclayton in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hexfusion
Contributor Author

/test e2e-azure

@hexfusion
Contributor Author

level=fatal msg="failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to fetch dependency of "Bootstrap Ignition Config": failed to fetch dependency of "Master Machines": failed to generate asset "Platform Credentials Check": validate AWS credentials: checking install permissions: error simulating policy: Throttling: Rate exceeded\n\tstatus code: 400, request id: 6cbb243d-b72a-4332-99f9-4d0f4f16e829"

Rate limit flake.

/test e2e-aws

@hexfusion
Contributor Author

/test e2e-gcp

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
@hexfusion
Contributor Author

/test e2e-gcp

@hexfusion
Contributor Author

/test e2e-aws-upgrade

@hexfusion
Contributor Author

/test e2e-gcp

One last try, but it appears we will need openshift/cluster-etcd-operator#60.

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 28, 2020
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
@hexfusion
Contributor Author

/test e2e-gcp

@hexfusion
Contributor Author

/test e2e-azure

@hexfusion
Contributor Author

After testing, we have decided to proceed with openshift/cluster-etcd-operator#58 and will continue that work in the installer via #3005.

@hexfusion hexfusion closed this Jan 28, 2020
@openshift-ci-robot
Contributor

@hexfusion: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-fips | 21aa688 | link | /test e2e-aws-fips |
| ci/prow/e2e-libvirt | 21aa688 | link | /test e2e-libvirt |
| ci/prow/e2e-gcp | 21aa688 | link | /test e2e-gcp |
| ci/prow/e2e-azure | 21aa688 | link | /test e2e-azure |
| ci/prow/e2e-aws-scaleup-rhel7 | 21aa688 | link | /test e2e-aws-scaleup-rhel7 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.


Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
