OCPBUGS-235: pkg/clusterconditions/promql: Cap PromQL queries at 5 minutes (#825)
In some clusters, these PromQL queries can hang for hours, possibly forever [1]. I think we have a 30s default KeepAlive timeout [2], but apparently there's enough socket traffic to keep from tripping that. This adds a 5m cap to the PromQL calls, although I'm not particularly attached to that particular number. We can always raise it if we start seeing timeouts in Insights for queries where taking that long seems reasonable.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2109374#c12
[2]: https://pkg.go.dev/github.com/prometheus/client_golang/api#pkg-variables
@openshift-cherrypick-robot: An error was encountered cloning bug for cherrypick for bug 2115564 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected; please see the full error message for details:
response code 401 not 200
Please contact an administrator to resolve this issue, then request a bug refresh.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@openshift-cherrypick-robot: No Bugzilla bug is referenced in the title of this pull request.
@openshift-cherrypick-robot: This pull request references [Jira Issue OCPBUGS-235](https://issues.redhat.com//browse/OCPBUGS-235), which is invalid:
[APPROVALNOTIFIER] This PR is APPROVED.
This pull request has been approved by: openshift-cherrypick-robot, wking.
The full list of commands accepted by this bot can be found here. The pull request process is described here.
openshift/release#31462 landed:
/jira refresh
@wking: This pull request references [Jira Issue OCPBUGS-235](https://issues.redhat.com//browse/OCPBUGS-235), which is valid. The bug has been moved to the POST state. 6 validation(s) were run on this bug
/label backport-risk-assessed
/label cherry-pick-approved
@openshift-cherrypick-robot: all tests passed!
@openshift-cherrypick-robot: All pull requests linked via external trackers have merged: [Jira Issue OCPBUGS-235](https://issues.redhat.com//browse/OCPBUGS-235) has been moved to the MODIFIED state.
This is an automated cherry-pick of #815
/assign wking