[fcos] retry pulling mcd image #1273

Closed
vrutkovs wants to merge 1 commit into openshift:fcos from vrutkovs:fcos-retry-pulling-mcd

@vrutkovs
Contributor

@vrutkovs vrutkovs commented Nov 19, 2019

- What I did

  • Changed the Type of machine-config-daemon-pull.service to simple (base FCOS has no kubelet, and the system hasn't pivoted yet)
  • Added systemd options to restart failed pulls
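
As a rough illustration of the two changes above, the unit could look something like the sketch below. It is not the actual diff: `pull-mcd-image.sh` is a hypothetical stand-in for the real image pull command, and the `Description=` text is assumed. Because systemd accepts more than one `ExecStart=` line only for `Type=oneshot`, the sketch chains the pull and the follow-up steps into a single shell command.

```ini
[Unit]
# Description text is assumed for this sketch
Description=Machine Config Daemon Pull (sketch)

[Service]
# simple: nothing on base FCOS has to wait for this unit to finish
Type=simple
# pull-mcd-image.sh is a hypothetical placeholder for the real pull step.
# A single ExecStart= is used because only Type=oneshot may list several.
ExecStart=/bin/sh -c '/usr/local/bin/pull-mcd-image.sh && /usr/bin/chmod +x /usr/local/bin/machine-config-daemon && /usr/sbin/restorecon /usr/local/bin/machine-config-daemon'
# Retry 30 seconds after a non-zero exit, for example a failed registry pull
Restart=on-failure
RestartSec=30
```

With `Restart=on-failure`, a clean exit is not retried, so the unit stops restarting once the pull succeeds.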

- How to verify it

This should cut the number of flakes caused by failed registry pulls

- Description for the changelog
Retry pulling MCD image on FCOS systems

TODO:

Nov 19 12:53:57 ip-10-0-2-229 bootkube.sh[1094]: error: /var/lib/containers/storage/overlay/ad28b6934666a530eb061fdedf6d552006cadac9c318559eb90eca0f4cbe4fdd/merged/srv/repo: opendir(/var/lib/containers/storage/overlay/ad28b6934666a530eb061fdedf6d552006cadac9c318559eb90eca0f4cbe4fdd/merged/srv/repo): No such file or directory
Nov 19 12:53:57 ip-10-0-2-229 bootkube.sh[1094]: error: error running ostree refs --repo /var/lib/containers/storage/overlay/ad28b6934666a530eb061fdedf6d552006cadac9c318559eb90eca0f4cbe4fdd/merged/srv/repo: : exit status 1

/cc @LorbusChris

It doesn't have to be oneshot, as there is no kubelet on FCOS
@openshift-ci-robot added the size/XS label (Denotes a PR that changes 0-9 lines, ignoring generated files.) Nov 19, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Nov 19, 2019
@vrutkovs
Contributor Author

/retest

@openshift-ci-robot
Contributor

@vrutkovs: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
|---|---|---|---|
| ci/prow/e2e-aws | b88f101 | link | /test e2e-aws |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

 [Service]
 # Need oneshot to delay kubelet
-Type=oneshot
+Type=simple
Member

(there is no kubelet on base FCOS and it hasn't pivoted yet)

I don't quite get how this is related? We need to block kubelet on RHCOS for sure.

Contributor Author

This targets the fcos branch; it is not going to master yet

Member

Right, I would just be curious to understand why you made this change. It's not necessary for Restart= is it?

 ExecStart=/usr/bin/chmod +x /usr/local/bin/machine-config-daemon
 ExecStart=/usr/sbin/restorecon /usr/local/bin/machine-config-daemon
+Restart=on-failure
+RestartSec=30
Member

Seems reasonable, though at some point we will need to bubble up provisioning failures.
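
One possible way to bubble up such failures, sketched here as an assumption rather than anything this PR implements, is to bound the retries with systemd's start rate limit and hook a unit onto the failure. `StartLimitIntervalSec=`, `StartLimitBurst=` and `OnFailure=` are standard `[Unit]` options; the `mcd-pull-failed.service` name and the numbers are hypothetical.

```ini
[Unit]
# Give up after 5 start attempts within 10 minutes (illustrative values)
StartLimitIntervalSec=600
StartLimitBurst=5
# Hypothetical unit that reports the provisioning failure; it is started
# once the pull unit enters the failed state, after retries are exhausted
OnFailure=mcd-pull-failed.service

[Service]
Restart=on-failure
RestartSec=30
```

The interval has to be long enough to cover all the retries, otherwise the limit is never reached and the unit keeps restarting.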

@vrutkovs
Contributor Author

Superseded by #1279

@vrutkovs vrutkovs closed this Nov 21, 2019