Skip to content

Bug 1895360: pkg/daemon: don't delete a file if its replaced with a dropin#2196

Merged
openshift-merge-robot merged 3 commits intoopenshift:masterfrom
vrutkovs:check-dropins-on-stale-removal
Nov 19, 2020
Merged

Bug 1895360: pkg/daemon: don't delete a file if its replaced with a dropin#2196
openshift-merge-robot merged 3 commits intoopenshift:masterfrom
vrutkovs:check-dropins-on-stale-removal

Conversation

@vrutkovs
Copy link
Copy Markdown
Contributor

@vrutkovs vrutkovs commented Nov 3, 2020

Some systemd service settings may be defined in .Storage.Files and in .Systemd.Units.Dropins. Dropins is preferable, so users may want to migrate to the way.

However if a file representing a dropin was defined in .Storage.Files and converted into .Systemd.Units.Dropin in new version of MC, it would be placed by Ignition's dropin implementation and then garbage-collected by MCD's deleteStaleData.

This change ensures we're checking Systemd.Units.Dropins before attempting to remove this file. This is crucial for OKD 4.5 -> 4.6 upgrades - kubelet MCO config was defined as a file in 4.5 and converted into a dropin in 4.6

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 3, 2020
@vrutkovs vrutkovs force-pushed the check-dropins-on-stale-removal branch 4 times, most recently from 360b99a to 92e733e Compare November 3, 2020 11:57
@kikisdeliveryservice
Copy link
Copy Markdown
Contributor

kikisdeliveryservice commented Nov 3, 2020

hey @vrutkovs this new func seems like it should have a test, can you add one?

@vrutkovs
Copy link
Copy Markdown
Contributor Author

vrutkovs commented Nov 3, 2020

Sure, will add a unit test (verifying that it fixes OKD upgrade first). Not sure how to approach unittesting here - do you know any test which checks deleteStaleFiles?

@vrutkovs vrutkovs force-pushed the check-dropins-on-stale-removal branch 3 times, most recently from b2212a7 to b26c28c Compare November 5, 2020 10:51
@vrutkovs
Copy link
Copy Markdown
Contributor Author

vrutkovs commented Nov 5, 2020

/retest

@vrutkovs vrutkovs changed the title WIP pkg/daemon: don't delete a file if its replaced with a dropin pkg/daemon: don't delete a file if its replaced with a dropin Nov 6, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 6, 2020
@vrutkovs
Copy link
Copy Markdown
Contributor Author

vrutkovs commented Nov 6, 2020

/cc @kikisdeliveryservice

Added a unit test, verified that OKD 4.5 can be upgraded using MCO with this patch

@vrutkovs
Copy link
Copy Markdown
Contributor Author

vrutkovs commented Nov 6, 2020

Flakes

/retest

@vrutkovs vrutkovs changed the title pkg/daemon: don't delete a file if its replaced with a dropin Bug 1895360: pkg/daemon: don't delete a file if its replaced with a dropin Nov 6, 2020
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Nov 6, 2020
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@vrutkovs: This pull request references Bugzilla bug 1895360, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
Details

In response to this:

Bug 1895360: pkg/daemon: don't delete a file if its replaced with a dropin

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vrutkovs
Copy link
Copy Markdown
Contributor Author

vrutkovs commented Nov 6, 2020

/cherry-pick release-4.6

@openshift-cherrypick-robot
Copy link
Copy Markdown

@vrutkovs: once the present PR merges, I will cherry-pick it on top of release-4.6 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-4.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vrutkovs
Copy link
Copy Markdown
Contributor Author

vrutkovs commented Nov 8, 2020

/skip

e2e-gcp-op is permafailing

@vrutkovs
Copy link
Copy Markdown
Contributor Author

vrutkovs commented Nov 8, 2020

/skip

Copy link
Copy Markdown
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this patch should fix the bug in question, however this doesn't seem very bulletproof. Consider the following scenarios:

  1. systemd units in the unit section would have this problem as well right? if a systemd unit that was originally being written as a file gets replaced by a unit section, we should also correctly handle that.
  2. the opposite case is ironically not handled as well, where a dropin that was replaced by a file would also not work (why would you ever do this though). I guess in general anything cross-dependency between unit and file sections is not handled well.
  3. if there was a dropin originally on the system (call it version A). which the MCD made a backup of when you wrote the file via a MC (call it version B), and then you try to replace with a dropin (call it version C), it looks like the logic will end up writing version A to the file instead of version C like we want, since we perform a backup and then a no-op. A rare edge case but maybe we should account for it anyways for correctness.

Those 3 scenarios are just some that came to mind (my assessment might not be entirely correct here). I am not opposed to merging this fix as is but we should consider the interactions between files and unit sections a bit deeper if that's the route we take. I'd be ok to merge this as is as well (maybe fix scenario 3?) provided we don't regress any behaviour

Comment thread pkg/daemon/update.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the diff looks somewhat confusing, but this line should be the essence of the change right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I had to rework the whole function as gosec was complaining about nested ifs in e79a186 (#2196)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way github is showing diff is indeed confusing

Comment thread pkg/daemon/update.go Outdated
Comment thread pkg/daemon/update.go Outdated
@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

19 similar comments
@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@vrutkovs
Copy link
Copy Markdown
Contributor Author

/hold

Not sure why gcp-op won't pass. Holding it for >4 hours, so that fresh release image would be built

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 18, 2020
@kikisdeliveryservice
Copy link
Copy Markdown
Contributor

/hold

Not sure why gcp-op won't pass. Holding it for >4 hours, so that fresh release image would be built

we are waiting for #2229 to merge so feel free to hold it and check it tomorrow or later so avoid endless retests

@vrutkovs
Copy link
Copy Markdown
Contributor Author

/retest
/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 19, 2020
@vrutkovs
Copy link
Copy Markdown
Contributor Author

/refresh

@openshift-merge-robot openshift-merge-robot merged commit 4fca79d into openshift:master Nov 19, 2020
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@vrutkovs: All pull requests linked via external trackers have merged:

Bugzilla bug 1895360 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1895360: pkg/daemon: don't delete a file if its replaced with a dropin

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot
Copy link
Copy Markdown

@vrutkovs: new pull request created: #2246

Details

In response to this:

/cherry-pick release-4.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants