Skip to content

etcd: remove etcd entries from mdns#1556

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
mandre:etcd-records
Jun 9, 2020
Merged

etcd: remove etcd entries from mdns#1556
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
mandre:etcd-records

Conversation

@mandre
Copy link
Copy Markdown
Member

@mandre mandre commented Mar 11, 2020

Similar to openshift/installer#3265, this
removes the etcd entries from mdns config because etcd no longer uses
DNS.

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 11, 2020
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should stop using the etcd discovery domain all together...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find where to get the base domain from. Any suggestion?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the baremetal-runtime-cfg has a way to provide the cluster domain at runtime.. that might be an option?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it here to {{ .Cluster.Domain }} but many of the templates are not rendered via baremetal-runtimecfg, so I'd rather leave {{ .EtcdDiscoveryDomain }} alone until we decide to effectively get rid of it globally.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it here to {{ .Cluster.Domain }} but many of the templates are not rendered via baremetal-runtimecfg, so I'd rather leave {{ .EtcdDiscoveryDomain }} alone until we decide to effectively get rid of it globally.

api-int. is plain wrong imo, it works because that's the equal today. so i don't see why we would continue to use it..?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I misunderstood your first comment - I'll make the change.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just checking that the DNS server is responding. It doesn't matter what record we look up. api-int happens to be a known name that should always be there.

Copy link
Copy Markdown
Contributor

@abhinavdahiya abhinavdahiya Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just checking that the DNS server is responding. It doesn't matter what record we look up. api-int happens to be a known name that should always be there.

api-int.<cluster domain> is a known DNS name that should always be there, and not api-int.<etcd discovery domain>

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, Github ate the domain part of your comment. I thought you were objecting to the use of api-int in general. :-)

@deads2k
Copy link
Copy Markdown
Contributor

deads2k commented Mar 12, 2020

@mandre I'd be very happy to see this and backported to 4.4. It will reduce the risk of people accidentally depending on "naughty" things

@retroflexer
Copy link
Copy Markdown

/retest

@mandre
Copy link
Copy Markdown
Member Author

mandre commented Mar 12, 2020

/test e2e-openstack
/test e2e-ovirt
/test e2e-vsphere

/cc rgolangh celebdor cybertron jcpowermac

Could you have a look? This PR touches the BM, openStack, ovirt and vsphere mDNS config.

@cybertron
Copy link
Copy Markdown
Member

I've tested this on baremetal, and while the deployment works, it appears to be breaking our loadbalancing. This is because we repurposed the SRV record for getting the haproxy backends, and if we remove it we end up with no backends and thus no load balancing.

I'm looking into whether there is another way we can handle that, but because baremetal-runtimecfg runs so early in the deployment I'm not sure whether we can use the same methods that etcd is to get the node addresses. If any other platforms are also using haproxy for loadbalancing they should probably verify that this doesn't result in an invalid haproxy config too.

@jcpowermac
Copy link
Copy Markdown
Contributor

For vSphere IPI (starting with 4.5) we were following baremetal. So if baremetal works most likely vsphere will as well. The current test is for UPI only.

@rgolangh
Copy link
Copy Markdown
Contributor

@celebdor oVirt should fail on the same thing.

@mandre
Copy link
Copy Markdown
Member Author

mandre commented Mar 13, 2020

I've tested this on baremetal, and while the deployment works, it appears to be breaking our loadbalancing. This is because we repurposed the SRV record for getting the haproxy backends, and if we remove it we end up with no backends and thus no load balancing.

I'm looking into whether there is another way we can handle that, but because baremetal-runtimecfg runs so early in the deployment I'm not sure whether we can use the same methods that etcd is to get the node addresses. If any other platforms are also using haproxy for loadbalancing they should probably verify that this doesn't result in an invalid haproxy config too.

Indeed, I hadn't notice the broken haproxy config. Putting this on hold for now since it breaks on all platforms.
/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 13, 2020
@mandre
Copy link
Copy Markdown
Member Author

mandre commented Apr 20, 2020

SRV records should no longer be a dependency for haproxy config thanks to openshift/baremetal-runtimecfg#52.

@cybertron
Copy link
Copy Markdown
Member

I verified that this works on baremetal now so should be good to go from our end.

@mandre
Copy link
Copy Markdown
Member Author

mandre commented Apr 22, 2020

I've also checked that it worked for OpenStack. I was waiting for #1666 to merge to avoid merge conflicts but could push the rebased patch so everyone can try with their platform.

@mandre
Copy link
Copy Markdown
Member Author

mandre commented Apr 23, 2020

/retest

@mandre
Copy link
Copy Markdown
Member Author

mandre commented Apr 23, 2020

/test e2e-openstack
/test e2e-ovirt
/test e2e-vsphere

@rgolangh
Copy link
Copy Markdown
Contributor

rgolangh commented May 3, 2020

/lgtm ovirt

Thanks @mandre !

@dougsland
Copy link
Copy Markdown
Contributor

/test e2e-aws
/test e2e-aws-scaleup-rhel7
/test e2e-vsphere
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 1, 2020
@dougsland
Copy link
Copy Markdown
Contributor

@mandre @rgolangh looks like we also need to update: templates/worker/00-worker/ovirt/files/ovirt-non-virtual-ip.yaml:121 ?

I see: api_vip, dns_vip, ingress_vip = sys.argv[1:4]

@jcpowermac
Copy link
Copy Markdown
Contributor

@patrickdillon

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use the https://github.com/openshift/api/blob/95abe2d2f4223d5931e418bf8e4d3773d16b42c0/config/v1/types_infrastructure.go#L74-L78

So that we are not assuming the api-int subdomain exists on EtcdDiscoveryDomain. Because that is technically not true or part of the API.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check no longer exists, cf #1666

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this check was removed in #1775

Similar to openshift/installer#3265, this
removes the etcd entries from mdns config because etcd no longer uses
DNS.
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jun 3, 2020
@mandre
Copy link
Copy Markdown
Member Author

mandre commented Jun 3, 2020

/test e2e-openstack
/test e2e-ovirt
/test e2e-vsphere

@mandre
Copy link
Copy Markdown
Member Author

mandre commented Jun 3, 2020

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 3, 2020
@mandre
Copy link
Copy Markdown
Member Author

mandre commented Jun 3, 2020

/test e2e-openstack

@dougsland
Copy link
Copy Markdown
Contributor

/test e2e-aws-scaleup-rhel7

Copy link
Copy Markdown
Member

@cybertron cybertron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for baremetal.

@dougsland
Copy link
Copy Markdown
Contributor

/test e2e-aws-scaleup-rhel7

@sdodson
Copy link
Copy Markdown
Member

sdodson commented Jun 9, 2020

/lgtm
@ericavonb PTAL, needs approved

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 9, 2020
@ericavonb
Copy link
Copy Markdown

/lgtm

@openshift-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cybertron, dougsland, ericavonb, mandre, rgolangh, sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 9, 2020
@openshift-merge-robot openshift-merge-robot merged commit 97d5ed6 into openshift:master Jun 9, 2020
@cybertron
Copy link
Copy Markdown
Member

/cherry-pick release-4.5

@openshift-cherrypick-robot
Copy link
Copy Markdown

@cybertron: new pull request created: #1917

Details

In response to this:

/cherry-pick release-4.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.