Bug 1847185: fix: GetLabelsForVolume panic issue for azure disk PV #25121

Merged
mfojtik merged 1 commit into openshift:master from enxebre:bug-1847185
Jun 19, 2020

Conversation

@enxebre
Member

@enxebre enxebre commented Jun 16, 2020

This prevents GetAzureDiskLabels from panicking when c.DisksClient is nil.
The panic makes the API server crash loop on some ARO clusters, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1847185

kubernetes/kubernetes#92166
kubernetes/kubernetes#92167
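
For readers unfamiliar with this failure mode, the kind of guard described above can be sketched roughly as follows. This is not the code from this PR: `cloud`, `disksClient`, `getDiskLabels`, and `fakeDisks` are hypothetical stand-ins, and returning an error when the client is nil (rather than, for example, returning partial labels) is an assumption made purely for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// disksClient is a hypothetical stand-in for the Azure disks client interface.
type disksClient interface {
	Zone(diskName string) (string, error)
}

// cloud is a hypothetical, simplified stand-in for the cloud provider struct.
// DisksClient stays nil when the provider was created without credentials.
type cloud struct {
	DisksClient disksClient
}

// getDiskLabels checks for a nil client before using it, so the caller gets
// an error instead of a nil-dereference panic. (Assumption: the real fix may
// handle the nil case differently.)
func (c *cloud) getDiskLabels(diskName string) (map[string]string, error) {
	if c == nil || c.DisksClient == nil {
		return nil, errors.New("azure disks client is not initialized")
	}
	zone, err := c.DisksClient.Zone(diskName)
	if err != nil {
		return nil, err
	}
	return map[string]string{"zone": zone}, nil
}

// fakeDisks is a hypothetical client that always reports zone "1".
type fakeDisks struct{}

func (fakeDisks) Zone(string) (string, error) { return "1", nil }

func main() {
	// Without a client, the call returns an error rather than panicking.
	var c cloud
	_, err := c.getDiskLabels("disk-a")
	fmt.Println("nil client:", err)

	// With a client, labels are returned as usual.
	c.DisksClient = fakeDisks{}
	labels, err := c.getDiskLabels("disk-a")
	fmt.Println("with client:", labels, err)
}
```

The point of the guard is that a long-running caller such as an API server can log or surface the error and keep serving, instead of crashing and restarting in a loop.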

openshift-ci-robot added the bugzilla/severity-unspecified label ("Referenced Bugzilla bug's severity is unspecified for the PR.") on Jun 16, 2020
@openshift-ci-robot

@enxebre: This pull request references Bugzilla bug 1847185, which is invalid:

  • expected the bug to target the "4.6.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1847185: fix: GetLabelsForVolume panic issue for azure disk PV

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot added the bugzilla/invalid-bug label ("Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting.") on Jun 16, 2020
@enxebre
Member Author

enxebre commented Jun 16, 2020

/hold

openshift-ci-robot added the do-not-merge/hold label ("Indicates that a PR should not merge because someone has issued a /hold command.") on Jun 16, 2020
openshift-ci-robot added the vendor-update label ("Touching vendor dir or related files") on Jun 16, 2020
@enxebre
Member Author

enxebre commented Jun 16, 2020

/bugzilla refresh

openshift-ci-robot added the bugzilla/severity-urgent ("Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting.") and bugzilla/valid-bug ("Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.") labels, and removed the bugzilla/severity-unspecified ("Referenced Bugzilla bug's severity is unspecified for the PR.") and bugzilla/invalid-bug ("Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting.") labels on Jun 16, 2020
@openshift-ci-robot

@enxebre: This pull request references Bugzilla bug 1847185, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

/bugzilla refresh

@openshift-ci-robot

@enxebre: This pull request references Bugzilla bug 1847185, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1847185: fix: GetLabelsForVolume panic issue for azure disk PV

@enxebre
Member Author

enxebre commented Jun 16, 2020

/retest

@enxebre
Member Author

enxebre commented Jun 16, 2020

/cherry-pick release-4.5

@openshift-cherrypick-robot

@enxebre: once the present PR merges, I will cherry-pick it on top of release-4.5 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.5

@enxebre
Member Author

enxebre commented Jun 16, 2020

/retest

@enxebre
Member Author

enxebre commented Jun 16, 2020

/hold cancel

@enxebre
Member Author

enxebre commented Jun 16, 2020

cc @sttts @mfojtik @jim-minter PTAL

@jim-minter
Contributor

@enxebre I'm encouraging upstream to fix this more comprehensively if it's possible. Might be worth waiting a little to see what happens?

@enxebre
Member Author

enxebre commented Jun 16, 2020

/hold
as per #25121 (comment)

@enxebre
Member Author

enxebre commented Jun 16, 2020

/hold cancel
Added the latest upstream changes. I'd be happy to merge this as is: it fixes the panic, keeps any other cluster from getting into a crash-looping state, and lets us get it into 4.5 before the freeze and backport it to 4.4 ASAP.

Whether ARO still needs a way to instantiate the Azure client for the API server can then be discussed separately. I think going back to https://github.com/Azure/ARO-RP/pull/487/files#diff-f23154e33f71b30de1fae50fbf2b1dadL56-L60 might be an option. FWIW, cloud providers are planned to go out of tree in Kubernetes 1.21 / OCP 4.8.

openshift-ci-robot removed the do-not-merge/hold label ("Indicates that a PR should not merge because someone has issued a /hold command.") on Jun 16, 2020
@enxebre
Member Author

enxebre commented Jun 17, 2020

/retest

@enxebre
Member Author

enxebre commented Jun 17, 2020

/retest

@jim-minter
Contributor

Added the latest upstream changes. I'd be happy to merge this as is: it fixes the panic, keeps any other cluster from getting into a crash-looping state, and lets us get it into 4.5 before the freeze and backport it to 4.4 ASAP.

@enxebre I agree. We'd like to get this into 4.3 urgently.

@enxebre
Member Author

enxebre commented Jun 18, 2020

@jim-minter as per @mjudeikis this has been validated to prevent ARO from panicking.
@sttts @mfojtik can we get labels here?

@mfojtik
Contributor

mfojtik commented Jun 18, 2020

/lgtm

openshift-ci-robot added the lgtm label ("Indicates that a PR is ready to be merged.") on Jun 18, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre, mfojtik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@sttts
Contributor

sttts commented Jun 19, 2020

/retest

@mfojtik mfojtik merged commit a4ab462 into openshift:master Jun 19, 2020
@mfojtik
Contributor

mfojtik commented Jun 19, 2020

Merging with the button; the queue is hosed and this is a critical fix needed in ARO 4.5.

@openshift-ci-robot

@enxebre: All pull requests linked via external trackers have merged: openshift/origin#25121. Bugzilla bug 1847185 has been moved to the MODIFIED state.

In response to this:

Bug 1847185: fix: GetLabelsForVolume panic issue for azure disk PV

@openshift-cherrypick-robot

@enxebre: new pull request created: #25168

In response to this:

/cherry-pick release-4.5

@mfojtik
Contributor

mfojtik commented Jun 19, 2020

@eparis @jwforres @stevekuznetsov 3 days of retesting this simple fix for a bug that is actually breaking 40% of our ARO clusters... I would like to know how we can mitigate this in the future.

@openshift-ci-robot

@enxebre: The following tests failed, say /retest to rerun all failed tests:

Test name                Commit   Details  Rerun command
ci/prow/e2e-aws-serial   4691565  link     /test e2e-aws-serial
ci/prow/e2e-aws-fips     4691565  link     /test e2e-aws-fips
ci/prow/e2e-cmd          4691565  link     /test e2e-cmd
ci/prow/e2e-gcp          4691565  link     /test e2e-gcp
ci/prow/e2e-gcp-upgrade  4691565  link     /test e2e-gcp-upgrade

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
bugzilla/severity-urgent: Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting.
bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
lgtm: Indicates that a PR is ready to be merged.
vendor-update: Touching vendor dir or related files

8 participants