[baremetal & friends] Move on-prem api-int record to dnsmasq#2374

Closed
cybertron wants to merge 1 commit into openshift:master from cybertron:api-int-dnsmasq

Conversation

@cybertron
Member

This is in preparation for moving the cluster-hosted network services
to a separate operator. With coredns no longer running as a static
pod, it will not be usable for providing the api-int record needed
for the node to register.

We decided to use dnsmasq instead of /etc/hosts because when a
deployer wants to use an external loadbalancer it will be necessary
to change the api-int record. If it's in /etc/hosts, that will require
restarting many/all of the pods to pick up the change. Using dnsmasq
allows us to just change the record in dnsmasq and SIGHUP it.

To allow dnsmasq and coredns to coexist on the node, coredns is moved
to port 5333 and dnsmasq has a server entry added to send queries
for the cluster domain to coredns.

- Description for the changelog
Move api-int record for on-prem platforms to dnsmasq service.
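
The setup described above can be sketched as a dnsmasq drop-in. This is only an illustration; the cluster domain, VIP address, and file path are placeholders, not values from this PR:

```
# /etc/dnsmasq.d/on-prem.conf -- hypothetical sketch of the layout
# described above; domain and address are placeholders.

# Serve the api-int record directly from dnsmasq, so it can be
# changed and reloaded without restarting pods.
address=/api-int.mycluster.example.com/192.0.2.5

# Forward the remaining queries for the cluster domain to coredns,
# which now listens on port 5333 to avoid colliding with dnsmasq
# on port 53.
server=/mycluster.example.com/127.0.0.1#5333
```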

@cybertron
Member Author

/hold
/cc @yboaron @celebdor

The operator design is still in progress, but this is what we're currently planning. I've tested it with the operator and it works.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 29, 2021
@cybertron
Member Author

/test e2e-openstack
/test e2e-ovirt
/test e2e-vsphere

@darkmuggle

/hold

@crawford @miabbott looks like others had my idea too -- moving from /etc/hosts to dnsmasq.

From the perspective of the MCO, I think using dnsmasq is cleaner than manipulating /etc/hosts. I like this PR a lot better than the other PRs for manipulating hostnames.

Given the discussions about dnsmasq and its supportability, accepting this PR would make dnsmasq an explicit dependency.
Before we can accept it, we'll need ACKs from the RHCOS team that we're good to use dnsmasq.

Perhaps using this as an ExecStartPre and then using ExecStart=/usr/bin/dnsmasq -k?

Member Author


Makes sense. This is a holdover from the nodeip-configuration service I stole the pattern from, but in that case the podman call is the only thing being run.
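
For illustration, the reviewer's suggested split might look roughly like this in the unit file. The image reference, mount paths, and render command are hypothetical placeholders, not taken from this PR:

```ini
# Hypothetical sketch: render the configuration in ExecStartPre,
# then run dnsmasq in the foreground so systemd supervises it directly.
[Service]
Type=simple
# Render the dnsmasq config before the daemon starts (placeholder image).
ExecStartPre=/usr/bin/podman run --rm --net=host \
    -v /etc/dnsmasq.d:/etc/dnsmasq.d:z \
    example.com/render-image:latest render-dnsmasq-config
# -k keeps dnsmasq in the foreground instead of daemonizing.
ExecStart=/usr/bin/dnsmasq -k
```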

@cybertron
Member Author

> /hold
>
> @crawford @miabbott looks like others had my idea too -- moving from /etc/hosts to dnsmasq.
>
> From the perspective of the MCO, I think using dnsmasq is cleaner than manipulating /etc/hosts. I like this PR a lot better than the other PRs for manipulating hostnames.

One of our primary motivations for wanting this over /etc/hosts is that we need to be able to modify the records, potentially after initial deployment. With /etc/hosts I believe we'd have to restart all of the pods on the system to pick up changes. With dnsmasq, we just make the change, SIGHUP it (or the dbus equivalent), and all of the pods will use the new address immediately.
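
As a sketch of why that reload works (the file name and record below are hypothetical): dnsmasq re-reads /etc/hosts and any addn-hosts files on SIGHUP, but not its main configuration file, so the mutable record would live in a hosts-format file:

```
# Hypothetical hosts-format file, loaded via a line such as
#   addn-hosts=/etc/dnsmasq.d/api-int.hosts
# in the dnsmasq configuration. On SIGHUP, dnsmasq clears its cache
# and re-reads /etc/hosts and all addn-hosts files (the main config
# is not re-read), so editing this file and signaling the daemon
# updates the record in place:
192.0.2.5  api-int.mycluster.example.com
```

After editing the file, something like `systemctl kill -s HUP dnsmasq.service` (or the dbus equivalent mentioned above) would apply the change without restarting the daemon.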

@ashcrow
Member

ashcrow commented Feb 10, 2021

> Before we can accept this PR, we'll need ACKs from the RHCOS team that we're good to use dnsmasq.

I've spoken with a few folks and the consensus is that using dnsmasq is acceptable.

@darkmuggle

/unhold
/retest

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 10, 2021
@darkmuggle darkmuggle added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 10, 2021
@yboaron
Contributor

yboaron commented Feb 11, 2021

Don't you need to update the CoreDNS port for the 'friends' platform files as well (e.g. https://github.com/openshift/machine-config-operator/blob/master/templates/common/vsphere/files/coredns-corefile.yaml)?

@cybertron
Member Author

> Don't you need to update the CoreDNS port for the 'friends' platform files as well (e.g. https://github.com/openshift/machine-config-operator/blob/master/templates/common/vsphere/files/coredns-corefile.yaml)?

Hmm, that seems bad. It means both the on-prem and the vsphere template are going to be written to the same location. I guess it must happen to work out, but we should probably work with the vsphere team to converge those files.
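
For reference, the port move on the coredns side amounts to changing the listen port of the server block in the Corefile. This fragment is a generic illustration, not the rendered template from the repository:

```
# Hypothetical Corefile fragment: CoreDNS binds port 5333 instead of
# the default 53, leaving 53 free for dnsmasq on the node.
.:5333 {
    errors
    health
    forward . /etc/resolv.conf
    cache 30
}
```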

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cybertron
To complete the pull request process, please assign kikisdeliveryservice after the PR has been reviewed.
You can assign the PR to them by writing /assign @kikisdeliveryservice in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cybertron
Member Author

Okay, I moved the config rendering to Pre and removed the Reload command because we aren't using it and I'm not sure it was working correctly anyway. I also have #2410 up to de-dupe the Corefiles because I think that's something we should do anyway. We either need to merge that or I'll need to change the port in those configs as well.

/hold

@cybertron
Member Author

This is probably going to be superseded by #2450 but since that's a more significant change there's a possibility it will be nacked by the associated enhancement review.

@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2021

@cybertron: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-openstack | 50839585c39ebbd35ddd68a537bc09db01b511fc | link | /test e2e-openstack |
| ci/prow/e2e-ovirt | 50839585c39ebbd35ddd68a537bc09db01b511fc | link | /test e2e-ovirt |
| ci/prow/e2e-vsphere | 50839585c39ebbd35ddd68a537bc09db01b511fc | link | /test e2e-vsphere |
| ci/prow/e2e-metal-ipi | eb88780 | link | /test e2e-metal-ipi |
| ci/prow/okd-e2e-aws | eb88780 | link | /test okd-e2e-aws |
| ci/prow/e2e-agnostic-upgrade | eb88780 | link | /test e2e-agnostic-upgrade |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci Bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 19, 2021
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci Bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 19, 2021
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci Bot closed this Oct 19, 2021
@openshift-ci
Contributor

openshift-ci Bot commented Oct 19, 2021

@openshift-bot: Closed this PR.


In response to this:

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting /reopen.
> Mark the issue as fresh by commenting /remove-lifecycle rotten.
> Exclude this issue from closing again by commenting /lifecycle frozen.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
