Bug 2072202: Check for reachability of API and API-Int URLs later in bootkube #6611

Merged
openshift-merge-robot merged 1 commit into openshift:master from sadasu:sadasu-2072202
Jan 26, 2023

Conversation

@sadasu
Contributor

@sadasu sadasu commented Nov 21, 2022

The checks for resolving and reaching the API and internal API server URLs were performed together within the bootkube service, right after MCO was started. This resulted in false negatives because the API server became periodically unavailable during the bootstrap process.
To make these checks more reliable, this PR splits them into two parts:

  1. URLs resolvable - This can be checked early, so it is the only check that happens before MCO is started.
  2. URLs reachable - This check was causing the false negatives and has therefore been moved to the end of the bootkube service, when we expect the API server to be stable.

This split results in the checks happening in two stages, "resolve-api(-int)-url" and "check-api(-int)-url", so the success or failure of each stage can be reported individually. Although a stage may report failure, it will not cause the bootstrap process to stop. The output from these stages is available in the analyse output to help diagnose any issues.
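
As a rough illustration of the split described above, the two stages could look something like the sketch below. This is not the code from the PR: the function names `url_host`, `resolve_url`, and `check_url`, the `API_SERVER_URL` variable, and the use of `getent` and `curl` are all assumptions made for the example. It also does not handle IPv6 literal hosts.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names are hypothetical, not taken from bootkube.sh.

# Extract the host from a URL such as https://api.example.com:6443/readyz.
# Assumption: no IPv6 literal hosts (e.g. [::1]) are passed in.
url_host() {
    local url="$1"
    url="${url#*://}"        # drop the scheme
    url="${url%%/*}"         # drop any path
    echo "${url%%:*}"        # drop the port
}

# Stage 1: is the URL's host resolvable? Safe to run early.
resolve_url() {
    local stage="$1" url="$2"
    if [ -z "${url}" ]; then
        echo "${stage}: URL not set, skipping"
        return 0
    fi
    if getent hosts "$(url_host "${url}")" >/dev/null 2>&1; then
        echo "${stage}: resolvable"
    else
        echo "${stage}: NOT resolvable"
        return 1
    fi
}

# Stage 2: is the URL reachable? Meant to run late, once the API server
# is expected to be stable.
check_url() {
    local stage="$1" url="$2"
    if [ -z "${url}" ]; then
        echo "${stage}: URL not set, skipping"
        return 0
    fi
    if curl --silent --insecure --max-time 10 --output /dev/null "${url}"; then
        echo "${stage}: reachable"
    else
        echo "${stage}: NOT reachable"
        return 1
    fi
}

# Early in the service: resolution only. "|| true" records the failure
# in the output without stopping the caller.
resolve_url "resolve-api-url" "${API_SERVER_URL:-}" || true

# ... the rest of the bootstrap work would run here ...

# At the end of the service: reachability.
check_url "check-api-url" "${API_SERVER_URL:-}" || true
```

Keeping the two stages as separate functions means each can report its own success or failure, and the `|| true` at the call sites reflects the intent that a failed stage is diagnostic only.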

@openshift-ci openshift-ci Bot added the bugzilla/severity-medium and bugzilla/invalid-bug labels Nov 21, 2022
@openshift-ci
Contributor

openshift-ci Bot commented Nov 21, 2022

@sadasu: This pull request references Bugzilla bug 2072202, which is invalid:

  • expected the bug to target the "4.13.0" release, but it targets "---" instead
  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is MODIFIED instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 2072202: Check for reachability of API and API-Int URLs later in bootkube

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci Bot requested review from barbacbd and r4f4 November 21, 2022 21:44
@sadasu
Contributor Author

sadasu commented Nov 21, 2022

/bugzilla refresh

@openshift-ci
Contributor

openshift-ci Bot commented Nov 21, 2022

@sadasu: This pull request references Bugzilla bug 2072202, which is invalid:

  • expected the bug to target the "4.13.0" release, but it targets "---" instead
  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is MODIFIED instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/bugzilla refresh

@sadasu sadasu changed the title from "Bug 2072202: Check for reachability of API and API-Int URLs later in bootkube" to "OCPBUGSM-42958: Check for reachability of API and API-Int URLs later in bootkube" Nov 21, 2022
@openshift-ci openshift-ci Bot removed the bugzilla/severity-medium and bugzilla/invalid-bug labels Nov 21, 2022
@openshift-ci
Contributor

openshift-ci Bot commented Nov 21, 2022

@sadasu: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

Details

In response to this:

OCPBUGSM-42958: Check for reachability of API and API-Int URLs later in bootkube

@sadasu
Contributor Author

sadasu commented Nov 21, 2022

/jira refresh

@openshift-ci-robot
Contributor

@sadasu: No Jira bug is referenced in the title of this pull request.
To reference a bug, add 'OCPBUGS-XXX:' to the title of this pull request and request another bug refresh with /jira refresh.

Details

In response to this:

/jira refresh

@sadasu sadasu changed the title from "OCPBUGSM-42958: Check for reachability of API and API-Int URLs later in bootkube" to "Bug 2072202: Check for reachability of API and API-Int URLs later in bootkube" Nov 22, 2022
@openshift-ci openshift-ci Bot added the bugzilla/severity-medium and bugzilla/invalid-bug labels Nov 22, 2022
@openshift-ci
Contributor

openshift-ci Bot commented Nov 22, 2022

@sadasu: This pull request references Bugzilla bug 2072202, which is invalid:

  • expected the bug to target the "4.13.0" release, but it targets "---" instead
  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is MODIFIED instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 2072202: Check for reachability of API and API-Int URLs later in bootkube

@sadasu
Contributor Author

sadasu commented Nov 22, 2022

/test e2e-aws-ovn

Contributor

@barbacbd barbacbd left a comment

I see some redundant checks in things like `if [[ ! -z "${API_INT_SERVER_URL}" ]] ; then`, where the function call inside will make the same check. The outside check doesn't appear necessary.
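
To make the redundancy concrete, here is a minimal sketch. `probe_url` is a hypothetical stand-in for whichever function the template calls, and its body is an assumption for illustration, not the code under review. Note that `[[ ! -z x ]]` is equivalent to `[[ -n x ]]`; the sketch uses the portable `[ -n x ]` form.

```shell
#!/usr/bin/env bash
# Hypothetical helper that already guards against an empty URL.
probe_url() {
    local url="$1"
    if [ -z "${url}" ]; then
        echo "skipped: no URL set"
        return 0
    fi
    echo "checking ${url}"
}

API_INT_SERVER_URL=""

# Redundant: the caller repeats the emptiness test that probe_url performs.
if [ -n "${API_INT_SERVER_URL}" ]; then
    probe_url "${API_INT_SERVER_URL}"
fi

# Simpler: call the function unconditionally and let it decide.
probe_url "${API_INT_SERVER_URL}"
```

The only observable difference in this sketch is that the unconditional call prints the skip message, while the guarded call prints nothing when the variable is empty.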

@r4f4
Contributor

r4f4 commented Nov 23, 2022

@sadasu The BZ linked in this PR was reported against OCP-4.9. Do you intend to backport this change? If so, we'll need to create a bug in Jira.

@r4f4
Contributor

r4f4 commented Nov 23, 2022

/bugzilla refresh

@openshift-ci
Contributor

openshift-ci Bot commented Nov 23, 2022

@r4f4: This pull request references Bugzilla bug 2072202, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is MODIFIED instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/bugzilla refresh

@r4f4
Contributor

r4f4 commented Nov 23, 2022

/bugzilla refresh

@openshift-ci openshift-ci Bot added the bugzilla/valid-bug label and removed the bugzilla/invalid-bug label Nov 23, 2022
@openshift-ci
Contributor

openshift-ci Bot commented Nov 23, 2022

@r4f4: This pull request references Bugzilla bug 2072202, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.13.0) matches configured target release for branch (4.13.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (yunjiang@redhat.com), skipping review request.

Details

In response to this:

/bugzilla refresh

@sadasu
Contributor Author

sadasu commented Dec 6, 2022

/retest-required

Comment thread data/data/bootstrap/files/usr/local/bin/bootkube.sh.template Outdated
@sadasu
Contributor Author

sadasu commented Jan 4, 2023

/retest-required

@sadasu
Contributor Author

sadasu commented Jan 18, 2023

/retest-required

Contributor

@patrickdillon patrickdillon left a comment

Can you add some context about the motivation for this PR and how this fixes the issue?

I assume the problem is we're trying to reduce false negatives. Pretty much every failed install I look at says the API server is down, but the install has progressed to a point where the API server must have been up.

I think this is due to the fact that the API server may become periodically unavailable throughout the bootstrap process. So perhaps we want to relax our requirements in this check and only resolve the api url rather than checking for success? WDYT?

@openshift-ci
Contributor

openshift-ci Bot commented Jan 18, 2023

@sadasu: This pull request references Bugzilla bug 2072202, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.13.0) matches configured target release for branch (4.13.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (yunjiang@redhat.com), skipping review request.

Details

In response to this:

Bug 2072202: Check for reachability of API and API-Int URLs later in bootkube

@sadasu
Contributor Author

sadasu commented Jan 18, 2023

> Can you add some context about the motivation for this PR and how this fixes the issue?
>
> I assume the problem is we're trying to reduce false negatives. Pretty much every failed install I look at says the API server is down, but the install has progressed to a point where the API server must have been up.
>
> I think this is due to the fact that the API server may become periodically unavailable throughout the bootstrap process. So perhaps we want to relax our requirements in this check and only resolve the api url rather than checking for success? WDYT?

Hopefully, #6611 (comment) provides the necessary context. Yes, we can just limit ourselves to just the check to see if the URLs are resolvable. This is an attempt to provide some additional diagnostics.

@patrickdillon
Contributor

> Hopefully, #6611 (comment) provides the necessary context.

Oh yeah, sorry I missed that comment. So do you want to do:

> Yes, we can just limit ourselves to just the check to see if the URLs are resolvable.

Or just do this first and then continue to step down if needed? Personally I think we should just do the resolution check and we can do more later if needed, but I do not feel strongly.

@sadasu
Contributor Author

sadasu commented Jan 19, 2023

> Or just do this first and then continue to step down if needed? Personally I think we should just do the resolution check and we can do more later if needed, but I do not feel strongly.

I would like to step down to just resolving the URL later, if these changes don't serve us.

@sadasu
Contributor Author

sadasu commented Jan 19, 2023

/retest-required

@patrickdillon
Contributor

/approve

@openshift-ci
Contributor

openshift-ci Bot commented Jan 20, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label Jan 20, 2023
Contributor

@r4f4 r4f4 left a comment

/lgtm

@openshift-ci openshift-ci Bot added the lgtm label Jan 20, 2023
@sadasu
Contributor Author

sadasu commented Jan 23, 2023

/retest-required

@patrickdillon
Contributor

/skip

@patrickdillon
Contributor

/override ci/prow/e2e-openstack

@openshift-ci
Contributor

openshift-ci Bot commented Jan 23, 2023

@patrickdillon: Overrode contexts on behalf of patrickdillon: ci/prow/e2e-openstack

Details

In response to this:

/override ci/prow/e2e-openstack

@sadasu
Contributor Author

sadasu commented Jan 25, 2023

/test ci/prow/e2e-aws-ovn

@openshift-ci
Contributor

openshift-ci Bot commented Jan 25, 2023

@sadasu: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test agent-integration-tests
  • /test aro-unit
  • /test e2e-agent-compact-ipv4
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-upi
  • /test e2e-azure-ovn
  • /test e2e-azure-ovn-upi
  • /test e2e-gcp-ovn
  • /test e2e-gcp-ovn-upi
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-openstack-ovn
  • /test e2e-vsphere-ovn
  • /test e2e-vsphere-upi
  • /test gofmt
  • /test golint
  • /test govet
  • /test images
  • /test okd-images
  • /test okd-scos-images
  • /test okd-unit
  • /test okd-verify-codegen
  • /test openstack-manifests
  • /test shellcheck
  • /test tf-lint
  • /test unit
  • /test verify-codegen
  • /test verify-vendor
  • /test yaml-lint

The following commands are available to trigger optional jobs:

  • /test e2e-agent-ha-dualstack
  • /test e2e-agent-sno-ipv6
  • /test e2e-alibaba
  • /test e2e-aws-ovn-disruptive
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-imdsv2
  • /test e2e-aws-ovn-proxy
  • /test e2e-aws-ovn-shared-vpc
  • /test e2e-aws-ovn-single-node
  • /test e2e-aws-ovn-upgrade
  • /test e2e-aws-ovn-workers-rhel8
  • /test e2e-aws-upi-proxy
  • /test e2e-azure-ovn-resourcegroup
  • /test e2e-azure-ovn-shared-vpc
  • /test e2e-azurestack
  • /test e2e-azurestack-upi
  • /test e2e-crc
  • /test e2e-gcp-ovn-shared-vpc
  • /test e2e-gcp-ovn-xpn
  • /test e2e-gcp-upgrade
  • /test e2e-gcp-upi-xpn
  • /test e2e-ibmcloud-ovn
  • /test e2e-libvirt
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-sdn
  • /test e2e-metal-ipi-sdn-swapped-hosts
  • /test e2e-metal-ipi-sdn-virtualmedia
  • /test e2e-metal-single-node-live-iso
  • /test e2e-nutanix-ovn
  • /test e2e-nutanix-sdn
  • /test e2e-openstack-kuryr
  • /test e2e-openstack-proxy
  • /test e2e-openstack-sdn-parallel
  • /test e2e-openstack-upi
  • /test e2e-ovirt-sdn
  • /test e2e-vsphere-upi-zones
  • /test e2e-vsphere-zones
  • /test okd-e2e-aws-ovn
  • /test okd-e2e-aws-ovn-upgrade
  • /test okd-e2e-gcp
  • /test okd-e2e-gcp-ovn-upgrade
  • /test okd-e2e-vsphere
  • /test okd-scos-e2e-aws-ovn
  • /test okd-scos-e2e-aws-upgrade
  • /test okd-scos-e2e-gcp
  • /test okd-scos-e2e-gcp-ovn-upgrade
  • /test okd-scos-e2e-vsphere
  • /test okd-scos-unit
  • /test okd-scos-verify-codegen
  • /test tf-fmt

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-installer-master-aro-unit
  • pull-ci-openshift-installer-master-e2e-agent-ha-dualstack
  • pull-ci-openshift-installer-master-e2e-aws-ovn
  • pull-ci-openshift-installer-master-e2e-aws-ovn-disruptive
  • pull-ci-openshift-installer-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-installer-master-e2e-aws-ovn-workers-rhel8
  • pull-ci-openshift-installer-master-e2e-vsphere-upi-zones
  • pull-ci-openshift-installer-master-gofmt
  • pull-ci-openshift-installer-master-golint
  • pull-ci-openshift-installer-master-govet
  • pull-ci-openshift-installer-master-images
  • pull-ci-openshift-installer-master-okd-e2e-aws-ovn
  • pull-ci-openshift-installer-master-okd-e2e-aws-ovn-upgrade
  • pull-ci-openshift-installer-master-okd-images
  • pull-ci-openshift-installer-master-okd-scos-e2e-aws-ovn
  • pull-ci-openshift-installer-master-okd-scos-e2e-aws-upgrade
  • pull-ci-openshift-installer-master-okd-scos-images
  • pull-ci-openshift-installer-master-okd-scos-unit
  • pull-ci-openshift-installer-master-okd-scos-verify-codegen
  • pull-ci-openshift-installer-master-okd-unit
  • pull-ci-openshift-installer-master-okd-verify-codegen
  • pull-ci-openshift-installer-master-shellcheck
  • pull-ci-openshift-installer-master-tf-fmt
  • pull-ci-openshift-installer-master-tf-lint
  • pull-ci-openshift-installer-master-unit
  • pull-ci-openshift-installer-master-verify-codegen
  • pull-ci-openshift-installer-master-verify-vendor
  • pull-ci-openshift-installer-master-yaml-lint
Details

In response to this:

/test ci/prow/e2e-aws-ovn

@sadasu
Contributor Author

sadasu commented Jan 25, 2023

/test e2e-aws-ovn

@openshift-ci
Contributor

openshift-ci Bot commented Jan 25, 2023

@sadasu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-agent-compact | 060a84a3db16eab6150d0088b78102158ecf1f49 | link | true | /test e2e-agent-compact |
| ci/prow/e2e-agent-sno | 060a84a3db16eab6150d0088b78102158ecf1f49 | link | false | /test e2e-agent-sno |
| ci/prow/e2e-agent-compact-ipv4 | 060a84a3db16eab6150d0088b78102158ecf1f49 | link | true | /test e2e-agent-compact-ipv4 |
| ci/prow/okd-scos-e2e-aws-ovn | 5832239 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-libvirt | 5832239 | link | false | /test e2e-libvirt |
| ci/prow/okd-e2e-aws-ovn-upgrade | 5832239 | link | false | /test okd-e2e-aws-ovn-upgrade |
| ci/prow/e2e-aws-ovn-disruptive | 5832239 | link | false | /test e2e-aws-ovn-disruptive |
| ci/prow/okd-scos-e2e-aws-upgrade | 5832239 | link | false | /test okd-scos-e2e-aws-upgrade |
| ci/prow/e2e-ibmcloud-ovn | 5832239 | link | false | /test e2e-ibmcloud-ovn |
| ci/prow/e2e-aws-ovn-upgrade | 5832239 | link | false | /test e2e-aws-ovn-upgrade |

Full PR test history. Your PR dashboard.

Details

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD a611870 and 2 for PR HEAD 5832239 in total

@openshift-merge-robot openshift-merge-robot merged commit 148d2fb into openshift:master Jan 26, 2023
@openshift-ci
Contributor

openshift-ci Bot commented Jan 26, 2023

@sadasu: All pull requests linked via external trackers have merged:

Bugzilla bug 2072202 has been moved to the MODIFIED state.

Details

In response to this:

Bug 2072202: Check for reachability of API and API-Int URLs later in bootkube

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/severity-medium: Referenced Bugzilla bug's severity is medium for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.

7 participants