
Bug 2070805: pkg/cvo/updatepayload: Prune previous payload downloads#760

Merged
openshift-merge-robot merged 1 commit into openshift:master from wking:robust-repeat-version-collection
Apr 4, 2022

Conversation


@wking wking commented Apr 4, 2022

Avoid:

mv: inter-device move failed: '/manifests' to '/etc/cvo/updatepayloads/HbO7IDc7tyIg9utw3sd_tg/manifests/manifests'; unable to remove target: Directory not empty

by pruning all scratch directories before downloading a new payload. Also shift the job pruning up here, so it's all collected together. The directory pruning keeps subsequent download jobs from getting stuck after a corrupted download attempt (e.g. if something terminates a download job part way through its mv call, later jobs would fail on the non-empty target directory, as in the error above). Reproducing this issue locally, with /tmp on a different filesystem:

$ mkdir -p a /tmp/a/a
$ /bin/mv a /tmp
/bin/mv: inter-device move failed: 'a' to '/tmp/a'; unable to remove target: Directory not empty
$ /bin/mv --version
mv (GNU coreutils) 8.31
...

The directory pruning will also avoid leaking previous sets of manifests. Before this commit, the /etc/cvo/updatepayloads directories on the control plane nodes would gradually accumulate more and more release manifest sets as new targets were downloaded for inspection. The manifest sets aren't that big:

$ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.10.6-x86_64
Extracted release payload from digest sha256:88b394e633e09dc23aa1f1a61ededd8e52478edf34b51a7dbbb21d9abde2511a created at 2022-03-23T07:34:52Z
$ du -hs manifests
6.5M    manifests

But still, better to have the pruning than leak forever.
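For concreteness, here is a minimal sketch of the kind of pre-download pruning described above. The function name, error text, and the idea of clearing everything under the base directory are illustrative assumptions, not the actual pkg/cvo/updatepayload code:

```go
// Hypothetical sketch: clear every previous payload directory before a new
// download job is created, so mv in the job never hits a non-empty target.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// pruneDownloadDirs removes every entry under baseDir. Illustrative only.
func pruneDownloadDirs(baseDir string) error {
	entries, err := os.ReadDir(baseDir)
	if err != nil {
		if os.IsNotExist(err) {
			return nil // nothing has been downloaded yet
		}
		return err
	}
	for _, entry := range entries {
		if err := os.RemoveAll(filepath.Join(baseDir, entry.Name())); err != nil {
			return fmt.Errorf("failed to prune update payload directory: %w", err)
		}
	}
	return nil
}

func main() {
	if err := pruneDownloadDirs("/etc/cvo/updatepayloads"); err != nil {
		fmt.Println(err)
	}
}
```

Pruning everything, rather than only known-stale entries, keeps the logic simple: the job about to be launched repopulates the one directory it actually needs.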

@openshift-ci openshift-ci Bot added bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Apr 4, 2022

openshift-ci Bot commented Apr 4, 2022

@wking: This pull request references Bugzilla bug 2070805, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.11.0) matches configured target release for branch (4.11.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @jiajliu


In response to this:

Bug 2070805: pkg/cvo/updatepayload: Prune previous payload downloads

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci Bot requested review from jiajliu, jottofar and vrutkovs April 4, 2022 06:17
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 4, 2022

wking commented Apr 4, 2022

/bugzilla refresh

@openshift-ci openshift-ci Bot removed the bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. label Apr 4, 2022

openshift-ci Bot commented Apr 4, 2022

@wking: This pull request references Bugzilla bug 2070805, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.11.0) matches configured target release for branch (4.11.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @jiajliu


In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci Bot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Apr 4, 2022
@wking wking force-pushed the robust-repeat-version-collection branch 6 times, most recently from 1ed364d to 925b7d9 on April 4, 2022 at 06:56
Avoid [1]:

  mv: inter-device move failed: '/manifests' to '/etc/cvo/updatepayloads/HbO7IDc7tyIg9utw3sd_tg/manifests/manifests'; unable to remove target: Directory not empty

by pruning all scratch directories before downloading a new payload.
Also shift the job pruning up here too, so it's all collected
together.

For jobs, pending and active jobs are now added to the deletion queue
too, and we pivot to not retaining any old jobs.  Either the older
jobs were for other targets, in which case we no longer care about
them, or they are looking at the same directory we're about to target
with the job we're about to launch, in which case they will likely
step on each other's toes.  If we want to return to keeping more jobs
around, we'd need to pivot to target directory names that are more
unique than just "pullspec hash", e.g. by including a job-launch
timestamp.

The directory pruning keeps subsequent download jobs from getting
stuck after a corrupted download attempt (e.g. if something terminates
a download job part way through its 'mv' call, later jobs would fail
on the non-empty target directory).
Reproducing this issue locally, with /tmp on a different filesystem:

  $ mkdir -p a /tmp/a/a
  $ /bin/mv a /tmp
  /bin/mv: inter-device move failed: 'a' to '/tmp/a'; unable to remove target: Directory not empty
  $ /bin/mv --version
  mv (GNU coreutils) 8.31
  ...

The directory pruning will also avoid leaking previous sets of
manifests.  Before this commit, the /etc/cvo/updatepayloads
directories on the control plane nodes would gradually accumulate more
and more release manifest sets as new targets were downloaded for
inspection.  The manifest sets aren't that big:

  $ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.10.6-x86_64
  Extracted release payload from digest sha256:88b394e633e09dc23aa1f1a61ededd8e52478edf34b51a7dbbb21d9abde2511a created at 2022-03-23T07:34:52Z
  $ du -hs manifests
  6.5M    manifests

But still, better to have the pruning than leak forever.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2070805#c0
@wking wking force-pushed the robust-repeat-version-collection branch from 925b7d9 to 507b474 on April 4, 2022 at 06:58

wking commented Apr 4, 2022

"pods should successfully create sandboxes by not timing out" is the only failure, and that's orthogonal.

/override ci/prow/e2e-agnostic-upgrade


openshift-ci Bot commented Apr 4, 2022

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade


In response to this:

"pods should successfully create sandboxes by not timing out" is the only failure, and that's orthogonal.

/override ci/prow/e2e-agnostic-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


openshift-ci Bot commented Apr 4, 2022

@wking: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Comment thread pkg/cvo/updatepayload.go
@@ -248,12 +255,6 @@ func (r *payloadRetriever) pruneJobs(ctx context.Context, retain int) error {
// Ignore jobs not beginning with operatorName
case !strings.HasPrefix(job.Name, r.operatorName+"-"):
break
Contributor


Why is it that we no longer have to ignore jobs that have not yet started or are still active? Will these never show up now because they will have been pruned?

Member Author


When we are getting ready to launch the new job, we have already given up on the old job, so we prune those old jobs, even if they were still pending or running. The new job will succeed, or not, and will remain around until the CVO decides to launch a successor.
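To make that concrete, here is a hedged sketch of pruning every previous download job with the operator prefix, including pending and active ones; the client-go wiring, parameter names, and deletion options are illustrative, not the actual pruneJobs implementation:

```go
// Hypothetical sketch of deleting all previous download jobs before creating
// a new one, regardless of whether they completed, are pending, or are active.
package payloadexample

import (
	"context"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func pruneAllDownloadJobs(ctx context.Context, client kubernetes.Interface, namespace, operatorName string) error {
	jobs, err := client.BatchV1().Jobs(namespace).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	policy := metav1.DeletePropagationBackground
	for _, job := range jobs.Items {
		// Only touch this operator's download jobs.
		if !strings.HasPrefix(job.Name, operatorName+"-") {
			continue
		}
		// Delete even pending/active jobs: either they target an older
		// release, or they would race with the job about to be launched.
		if err := client.BatchV1().Jobs(namespace).Delete(ctx, job.Name, metav1.DeleteOptions{
			PropagationPolicy: &policy,
		}); err != nil {
			return err
		}
	}
	return nil
}
```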

@jottofar

jottofar commented Apr 4, 2022

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 4, 2022

openshift-ci Bot commented Apr 4, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jottofar, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 779b102 into openshift:master Apr 4, 2022

openshift-ci Bot commented Apr 4, 2022

@wking: All pull requests linked via external trackers have merged:

Bugzilla bug 2070805 has been moved to the MODIFIED state.


In response to this:

Bug 2070805: pkg/cvo/updatepayload: Prune previous payload downloads

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking wking deleted the robust-repeat-version-collection branch April 4, 2022 22:32
wking added a commit to wking/cluster-version-operator that referenced this pull request Apr 18, 2022
I'd attempted to add this cleanup on the CVO side in 507b474
(pkg/cvo/updatepayload: Prune previous payload downloads, 2022-04-03, openshift#760).
But Evgeni pointed out that that doesn't work because the CVO's volume
mount is 'readOnly: true':

  2022-04-11T20:42:32.056428784Z W0411 20:42:32.056387       1 updatepayload.go:149] failed to prune update payload directory: unlinkat /etc/cvo/updatepayloads/brRTeZnZSIkZ2J2YMdQozg: read-only file system

This commit shifts removal into the job.

And because we want that removal to also cover "a previous job run
half-populated the target release directory", I've removed the
pre-fetch ValidateDirectory check.  In cases where we already had
valid content locally, that's one extra download, which seems like a
lower price than assuming a previous partial download was actually
complete.  If we see many extra downloads, we can use atomic renames or
something similar so it's easy for an incoming CVO to distinguish
content that was completely populated by a previous job run from
content that was incompletely populated.
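The readOnly mount is the crux: the CVO container can see the payload directories but cannot remove them, while the download job mounts the same path writable. A hypothetical sketch of the asymmetric mounts follows; only the mount path comes from the log above, the volume name and everything else is assumed, not the actual manifests:

```go
// Hypothetical sketch of the asymmetric mounts behind the EROFS failure above.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	cvoMount := corev1.VolumeMount{
		Name:      "etc-cvo-updatepayloads", // assumed volume name
		MountPath: "/etc/cvo/updatepayloads",
		ReadOnly:  true, // unlinkat from the CVO container fails: read-only file system
	}
	jobMount := corev1.VolumeMount{
		Name:      "etc-cvo-updatepayloads",
		MountPath: "/etc/cvo/updatepayloads",
		ReadOnly:  false, // the download job can prune and repopulate the directory
	}
	fmt.Println("CVO read-only:", cvoMount.ReadOnly, "job read-only:", jobMount.ReadOnly)
}
```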
wking added a commit to wking/cluster-version-operator that referenced this pull request Apr 19, 2022
I'd attempted to add this cleanup on the CVO side in 507b474
(pkg/cvo/updatepayload: Prune previous payload downloads, 2022-04-03, openshift#760).
But Evgeni pointed out that that doesn't work because the CVO's volume
mount is 'readOnly: true':

  2022-04-11T20:42:32.056428784Z W0411 20:42:32.056387       1 updatepayload.go:149] failed to prune update payload directory: unlinkat /etc/cvo/updatepayloads/brRTeZnZSIkZ2J2YMdQozg: read-only file system

This commit shifts removal into the job.

To avoid issues with loading half-populated directories, I've also
adjusted the job to populate {baseDir}/{targetName}-{randomSuffix},
and then rename that populated result to {baseDir}/{targetName}.  That
way, future targetUpdatePayloadDir calls using the ValidateDirectory
precheck will successfully distinguish "previous job got everything
for this release" (where we don't want a new download job) from
"previous job got some bits over, but not everything" (where we do
want a new download job).  See [1,2] for POSIX 2018 requiring mv to be
atomic where possible (e.g. the source and target are on the same
filesystem).

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/mv.html
[2]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/rename.html
wking added a commit to wking/cluster-version-operator that referenced this pull request Apr 19, 2022
I'd attempted to add this cleanup on the CVO side in 507b474
(pkg/cvo/updatepayload: Prune previous payload downloads, 2022-04-03, openshift#760).
But Evgeni pointed out that that doesn't work because the CVO's volume
mount is 'readOnly: true':

  2022-04-11T20:42:32.056428784Z W0411 20:42:32.056387       1 updatepayload.go:149] failed to prune update payload directory: unlinkat /etc/cvo/updatepayloads/brRTeZnZSIkZ2J2YMdQozg: read-only file system

This commit shifts removal into the job.

To avoid issues with loading half-populated directories, I've also
adjusted the job to populate {baseDir}/{targetName}-{randomSuffix},
and then rename that populated result to {baseDir}/{targetName}.  That
way, future targetUpdatePayloadDir calls using the ValidateDirectory
precheck will successfully distinguish "previous job got everything
for this release" (where we don't want a new download job) from
"previous job got some bits over, but not everything" (where we do
want a new download job).  POSIX 2018 requires mv to use rename [1].
While rename discusses atomicity in the informative Rationale section,
there's nothing in the normative description that mentions atomicity
[2].  Still, "atomic renames" are a common dance on Linux, especially
when renaming within a directory, where it is a single directory table
being edited.  [3] talks through the pattern, and points out the need
for a pre-rename fsync or similar if you want to avoid:

1. Open a/file, write to it, and close the file descriptor.
2. Kernel holding file contents in memory, not yet on disk.
3. Atomically move a/ to b/.
4. Crash, losing any kernel-memory caches.
5. b/file exists, but it may be empty or partial.

I think the risk of kernel crashes leaving us with corrupted manifests
is small enough that we can ignore it, to save the cost of calling
fsync.

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/mv.html
[2]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/rename.html
[3]: https://lwn.net/Articles/789600/
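Here is a minimal sketch of the populate-then-rename dance under the stated assumptions (the job has write access to baseDir and stale entries have been pruned); extractRelease is a hypothetical placeholder for whatever actually populates the manifests, not a real CVO helper:

```go
// Hypothetical sketch: populate {baseDir}/{targetName}-{randomSuffix}, then
// rename into {baseDir}/{targetName} so a later validity check never mistakes
// a partial download for a complete one.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// extractRelease is a placeholder for the real extraction step.
func extractRelease(dir string) error {
	return os.WriteFile(filepath.Join(dir, "image-references"), []byte("..."), 0o644)
}

func downloadPayload(baseDir, targetName string) error {
	scratch, err := os.MkdirTemp(baseDir, targetName+"-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(scratch) // no-op once the rename has succeeded

	if err := extractRelease(scratch); err != nil {
		return err
	}

	final := filepath.Join(baseDir, targetName)
	// Clear any stale target first; rename(2) within one filesystem is the
	// usual atomic-publish pattern, modulo the fsync caveat from LWN 789600.
	if err := os.RemoveAll(final); err != nil {
		return err
	}
	return os.Rename(scratch, final)
}

func main() {
	base, err := os.MkdirTemp("", "updatepayloads-")
	if err != nil {
		fmt.Println(err)
		return
	}
	if err := downloadPayload(base, "HbO7IDc7tyIg9utw3sd_tg"); err != nil {
		fmt.Println(err)
	}
}
```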
wking added a commit to wking/cluster-version-operator that referenced this pull request Apr 27, 2022
Some background on recent changes:

* 507b474 (pkg/cvo/updatepayload: Prune previous payload downloads,
  2022-04-03, openshift#760) attempted to add CVO-side directory removal, but
  that failed because the CVO mounts the shared volume 'readOnly:
  true'.

* a5af89d (pkg/cvo/updatepayload: Shift previous-download removal
  into the job, 2022-04-18, openshift#765) shifted removal into the job itself.
  As far as I can tell, this worked.

* c45a981 (pkg/cvo/updatepayload: Use initContainers instead of
  shell &&-chains, 2022-04-20, openshift#765) addressed concerns with unquoted
  shell arguments by pivoting to initContainers and dropping the
  shell.  This broke the * pathname expansion that rm depends on to
  find directories to remove.

This commit returns to using the shell to invoke the rm call, so we
get pathname expansion back [1].  But I avoid the possibility of
unquoted argument injection by using workingDir to bring in baseDir.

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_06
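A hedged sketch of that workingDir approach follows; the container name, image, and exact rm flags are assumptions, but it shows the shape: the shell expands ./* inside workingDir, so baseDir never appears inside the command string:

```go
// Hypothetical sketch (not the actual CVO job spec): let the shell perform
// pathname expansion inside workingDir instead of interpolating baseDir into
// the command string.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func cleanupContainer(image, baseDir string) corev1.Container {
	return corev1.Container{
		Name:       "cleanup", // assumed name
		Image:      image,
		Command:    []string{"sh", "-c", "rm -fR ./*"}, // shell expands ./* within workingDir
		WorkingDir: baseDir,                            // e.g. /etc/cvo/updatepayloads
	}
}

func main() {
	c := cleanupContainer("quay.io/example/cli:latest", "/etc/cvo/updatepayloads")
	fmt.Println(c.Command, "in", c.WorkingDir)
}
```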

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
