Skip to content

Bug 1936022: Fix spurious reconciliation of DNS daemonset and service#243

Merged
openshift-merge-robot merged 4 commits intoopenshift:masterfrom
Miciah:log-diffs-when-reconciling-resources
Mar 23, 2021
Merged

Bug 1936022: Fix spurious reconciliation of DNS daemonset and service#243
openshift-merge-robot merged 4 commits intoopenshift:masterfrom
Miciah:log-diffs-when-reconciling-resources

Conversation

@Miciah
Copy link
Copy Markdown
Contributor

@Miciah Miciah commented Mar 4, 2021

Log diffs when reconciling resources

This PR improves log output when reconciling managed resources to log the diff between the current and updated resources when performing an update.

  • pkg/operator/controller/controller_cluster_role.go (updateDNSClusterRole):
  • pkg/operator/controller/controller_dns_daemonset.go (updateDNSDaemonSet):
  • pkg/operator/controller/controller_dns_service.go (updateDNSService):
  • pkg/operator/controller/controller_service_monitor.go (updateDNSServiceMonitor): Log the diff between the current and updated resources when performing an update.

daemonsetConfigChanged: Fix for terminationGracePeriodSeconds defaulting

When comparing daemonsets to determine whether an update is required, treat implicit and explicit default values for terminationGracePeriodSeconds as equal.

Before this PR, the update logic would keep trying to revert the default value that the API set for the daemonset's terminationGracePeriodSeconds field.

Follow-up to #205, which added the logic which reconciles terminationGracePeriodSeconds.

  • pkg/operator/controller/controller_dns_daemonset.go (daemonsetConfigChanged): Use the new cmpTerminationGracePeriodSeconds function to compare terminationGracePeriodSeconds values so that two daemonsets are considered equal if the only difference between them is that one does not specify terminationGracePeriodSeconds and the other daemonset has the default value.
    (cmpTerminationGracePeriodSeconds): New function.
  • pkg/operator/controller/controller_dns_daemonset_test.go (TestDaemonsetConfigChanged): Verify that daemonsetConfigChanged returns false if the only mutation is that terminationGracePeriodSeconds has been updated from empty to the default value.

serviceChanged: Fix for clusterIPs defaulting

Ignore the clusterIPs field when comparing services to determine whether an update is required.

Before this PR, the update logic would keep trying to revert the value that the API set for the service's clusterIPs field.

The clusterIPs field is new in Kubernetes 1.20 (OpenShift 4.7).

  • pkg/operator/controller/controller_dns_service.go (serviceChanged): Ignore the services' clusterIPs field.
  • pkg/operator/controller/controller_dns_service_test.go (TestDNSServiceChanged): Verify that serviceChanged returns false if the only mutation is that the clusterIPs field's value has changed.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2021
@Miciah
Copy link
Copy Markdown
Contributor Author

Miciah commented Mar 5, 2021

/test e2e-aws-operator

@sgreene570
Copy link
Copy Markdown
Contributor

This is fantastic! Should we look into doing something similar for the ingress operator?

@sgreene570
Copy link
Copy Markdown
Contributor

@Miciah Miciah force-pushed the log-diffs-when-reconciling-resources branch 2 times, most recently from e6cc6c6 to 0403d59 Compare March 6, 2021 00:25
@Miciah Miciah changed the title Log diffs when reconciling resources Bug 1936022: Fix spurious reconciliation of DNS daemonset and service Mar 6, 2021
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@Miciah: An error was encountered updating to the POST state for bug 1936022 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. code 32000: Red Hat Bugzilla's database reported a query serialization error. Most likely this occurred because another user or process attempted to change the same data that you were attempting to change. Please press Back and retry the transaction.
 UPDATE bugs SET bug_status = ? WHERE bug_id = ? at /var/www/html/bugzilla/Bugzilla/Object.pm line 544. 

Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

Details

In response to this:

Bug 1936022: Fix spurious reconciliation of DNS daemonset and service

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. label Mar 6, 2021
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@Miciah: This pull request references Bugzilla bug 1936022, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @lihongan

Details

In response to this:

Bug 1936022: Fix spurious reconciliation of DNS daemonset and service

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Mar 6, 2021
@Miciah
Copy link
Copy Markdown
Contributor Author

Miciah commented Mar 6, 2021

This is fantastic! Should we look into doing something similar for the ingress operator?

Thanks! Yeah, the ingress operator has similar diff output for deployments but not for other resources, and it would be useful to have it for all resources. I will probably make a similar change in the ingress operator for the resources other than deployments.

Looking at DNS operator logs with these changes
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-dns-operator/243/pull-ci-openshift-cluster-dns-operator-master-e2e-aws/1367511193526210560/artifacts/e2e-aws/gather-extra/artifacts/pods/openshift-dns-operator_dns-operator-5cb7fcdbbc-rz42g_dns-operator.log

There's an abundance of white-space in the diff outputs... anything we can do to help with that? (Or do we not really care?)

I don't know a good way to improve the whitespace with our logging solution. I used sed -ne 's/.*msg=\(".*"\)$/\1/gp' | jq -r . to extract and format the relevant information.

@Miciah
Copy link
Copy Markdown
Contributor Author

Miciah commented Mar 6, 2021

We should backport this to OpenShift 4.7, which added the clusterIPs field in the API and added the logic to reconcile terminationGracePeriodSeconds in the ingress operator.
/cherry-pick release-4.7

@openshift-cherrypick-robot
Copy link
Copy Markdown

@Miciah: once the present PR merges, I will cherry-pick it on top of release-4.7 in a new PR and assign it to you.

Details

In response to this:

We should backport this to OpenShift 4.7, which added the clusterIPs field in the API and added the logic to reconcile terminationGracePeriodSeconds in the ingress operator.
/cherry-pick release-4.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Miciah
Copy link
Copy Markdown
Contributor Author

Miciah commented Mar 8, 2021

/retest

@Miciah
Copy link
Copy Markdown
Contributor Author

Miciah commented Mar 8, 2021

/test e2e-upgrade

@sgreene570
Copy link
Copy Markdown
Contributor

I don't know a good way to improve the whitespace with our logging solution. I used sed -ne 's/.*msg=\(".*"\)$/\1/gp' | jq -r . to extract and format the relevant information.

Good enough for me 😁

{
description: "if the termination grace period is defaulted",
mutate: func(daemonset *appsv1.DaemonSet) {
thirty := int64(30)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get the API defaulted value right from the corev1 API?
(Not that I anticipate the value changing anytime soon, but IMO using the API constant would make the test cleaner)

https://github.com/kubernetes/api/blob/master/core/v1/types.go#L2601

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nifty! Thanks for the pointer! Done.

@sgreene570
Copy link
Copy Markdown
Contributor

sgreene570 commented Mar 8, 2021

This PR improves log output when reconciling managed resources to log the diff between the current and updated resources when performing an update.

  • pkg/operator/controller/controller_cluster_role.go (updateDNSClusterRole):
  • pkg/operator/controller/controller_dns_daemonset.go (updateDNSDaemonSet):
  • pkg/operator/controller/controller_dns_service.go (updateDNSService):
  • pkg/operator/controller/controller_service_monitor.go (updateDNSServiceMonitor): Log the diff between the current and updated resources when performing an update.

@Miciah any reason to not also print diffs for corefile config map updates?

Miciah added 4 commits March 21, 2021 23:22
This commit improves log output when reconciling managed resources to log
the diff between the current and updated resources when performing an
update.

* pkg/operator/controller/controller_cluster_role.go
(updateDNSClusterRole):
* pkg/operator/controller/controller_dns_configmap.go (updateDNSConfigMap):
* pkg/operator/controller/controller_dns_daemonset.go (updateDNSDaemonSet):
* pkg/operator/controller/controller_dns_service.go (updateDNSService):
* pkg/operator/controller/controller_service_monitor.go
(updateDNSServiceMonitor): Log the diff between the current and updated
resources when performing an update.
When comparing daemonsets to determine whether an update is required, treat
implicit and explicit default values for terminationGracePeriodSeconds as
equal.

Before this commit, the update logic would keep trying to revert the
default value that the API set for the daemonset's
terminationGracePeriodSeconds field.

Follow-up to commit f094ddf, which added
the logic which reconciles terminationGracePeriodSeconds.

This commit is related to bug 1936022.

https://bugzilla.redhat.com/show_bug.cgi?id=1936022

* pkg/operator/controller/controller_dns_daemonset.go
(daemonsetConfigChanged): Use the new cmpTerminationGracePeriodSeconds
function to compare terminationGracePeriodSeconds values so that two
daemonsets are considered equal if the only difference between them is that
one does not specify terminationGracePeriodSeconds and the other daemonset
has the default value.
(cmpTerminationGracePeriodSeconds): New function.
* pkg/operator/controller/controller_dns_daemonset_test.go
(TestDaemonsetConfigChanged): Verify that daemonsetConfigChanged returns
false if the only mutation is that terminationGracePeriodSeconds has been
updated from empty to the default value.
* pkg/operator/controller/controller_dns_daemonset.go (volumeDefaultMode):
Delete const.
(cmpConfigMapVolumeSource): Use corev1.ConfigMapVolumeSourceDefaultMode.
(cmpSecretVolumeSource): Use corev1.SecretVolumeSourceDefaultMode.
* pkg/operator/controller/controller_dns_daemonset_test.go
(TestDaemonsetConfigChanged): Use corev1.ConfigMapVolumeSourceDefaultMode
and corev1.SecretVolumeSourceDefaultMode.
Ignore the clusterIPs field when comparing services to determine whether an
update is required.

Before this commit, the update logic would keep trying to revert the value
that the API set for the service's clusterIPs field.

The clusterIPs field is new in Kubernetes 1.20 (OpenShift 4.7).

This commit is related to bug 1936022.

https://bugzilla.redhat.com/show_bug.cgi?id=1936022

* pkg/operator/controller/controller_dns_service.go
(serviceChanged): Ignore the services' clusterIPs field.
* pkg/operator/controller/controller_dns_service_test.go
(TestDNSServiceChanged): Verify that serviceChanged returns false if the
only mutation is that the clusterIPs field's value has changed.
@Miciah Miciah force-pushed the log-diffs-when-reconciling-resources branch from 0403d59 to d90ad01 Compare March 22, 2021 03:29
@Miciah
Copy link
Copy Markdown
Contributor Author

Miciah commented Mar 22, 2021

@Miciah any reason to not also print diffs for corefile config map updates?

Nope! Done. Thanks!

@sgreene570
Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah, sgreene570

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 22, 2021
@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

9 similar comments
@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 4d6ef5f into openshift:master Mar 23, 2021
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@Miciah: All pull requests linked via external trackers have merged:

Bugzilla bug 1936022 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1936022: Fix spurious reconciliation of DNS daemonset and service

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot
Copy link
Copy Markdown

@Miciah: new pull request created: #250

Details

In response to this:

We should backport this to OpenShift 4.7, which added the clusterIPs field in the API and added the logic to reconcile terminationGracePeriodSeconds in the ingress operator.
/cherry-pick release-4.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants