OCPBUGS-4190: 1sec #27574
@derekhiggins: This pull request references Jira Issue OCPBUGS-4190, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/jira refresh
@derekhiggins: This pull request references Jira Issue OCPBUGS-4190, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
/hold I don't think this will work out. Absence of disruption data usually means we lack the runs to make a determination; it's an explicit state, very different from just assuming one second. We'll exceed one second almost all the time, failing jobs widely. Whatever test is using the nil is probably where you need to make the update, so that it knows how to interpret a missing value.
The Jira unfortunately does not link to any job runs or include any stack traces. I'm guessing it was around line 186? It looks like that is where the fix is needed.
It looks like I missed this code path in 9d2719f, but you can see the approach taken in the synthetic tests, and we should be able to do something similar around line 186:

```go
ExpectNoDisruptionForDuration(
	f,
	*allowedDisruption, // panics if allowedDisruption is nil
	end.Sub(start),
	events,
	fmt.Sprintf("%s was unreachable during disruption: %v", t.backend.GetLocator(), disruptionDetails),
)
```

Thanks for noticing and going after a fix, Derek.
@derekhiggins: This pull request references Jira Issue OCPBUGS-4190, which is valid. 3 validation(s) were run on this bug
```go
allowedDisruption, disruptionDetails, err := t.getAllowedDisruption(f)
if allowedDisruption == nil {
	framework.Logf("Skipping: %s: No historical data", t.testName)
```
I also tried calling skipper.Skipf(...) here, but this resulted in the entire upgrade disruption test being skipped.
Yeah, the e2eskipper won't work here. You would have to write a FrameworkSkipf in test/extended/util/disruption/disruption.go to generate a skipped test, which probably requires adding that functionality to TestSummaries.
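One possible shape for the FrameworkSkipf idea, sketched under heavy assumptions: the real TestSummaries type is not reproduced here, so `testSummary`, `summaries`, and `frameworkSkipf` are hypothetical. The point is that the skip is recorded as a result for one disruption test and control returns to the caller, instead of aborting the whole upgrade suite the way e2eskipper does.

```go
package main

import "fmt"

// result is a hypothetical outcome for one disruption sub-test.
type result string

const (
	passed  result = "passed"
	skipped result = "skipped"
)

// testSummary is a hypothetical per-test record.
type testSummary struct {
	Name    string
	Result  result
	Message string
}

// summaries collects results for many sub-tests within one run.
type summaries struct {
	items []testSummary
}

// frameworkSkipf records a skip for a single test and returns normally,
// so the caller can stop its own assertions without ending the suite.
func (s *summaries) frameworkSkipf(name, format string, args ...any) {
	s.items = append(s.items, testSummary{
		Name:    name,
		Result:  skipped,
		Message: fmt.Sprintf(format, args...),
	})
}

func main() {
	s := &summaries{}
	s.frameworkSkipf("image-registry-new-connections", "%s: no historical data", "image-registry-new-connections")
	s.items = append(s.items, testSummary{Name: "kube-api-new-connections", Result: passed})
	for _, it := range s.items {
		fmt.Printf("%s: %s %s\n", it.Name, it.Result, it.Message)
	}
}
```

The other sub-test still records its own result after the skip, which is the behavior a suite-wide skip cannot provide.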
Yup, it was line 186. I couldn't tell how to skip the test once it had already started without skipping the entire set of tests, so I ended up
No prob.
/hold Also, the skip seems to trigger on AWS jobs (from pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade).

```go
stopCh := make(chan struct{})
defer close(stopCh)
```
```go
allowedDisruption, disruptionDetails, err := t.getAllowedDisruption(f)
```
I need to fix this: "err" isn't being checked in the correct place.
Here's an example I stumbled across: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.13-e2e-metal-ipi-upgrade-ovn-ipv6/1597863031415508992
I attempted a partial fix of the problem of alwaysAllowOneSecond returning nil #27574, but after looking at this, I think something like this approach is correct: if we get nil, that means no data.
/lgtm Held just so someone can check the optional presubmits; if they look healthy, feel free to clear the hold. I will try to remember. Thanks again for fixing this, Derek.
I'm still trying to figure out why my skip triggered on e2e-gcp-ovn-upgrade. Any ideas whether this would be expected? I'm deploying a GCP cluster at the moment to see if I can find out.
```shell
cat ./pkg/synthetictests/allowedbackenddisruption/query_results.json | jq '.[] | select(.Platform == "gcp") | select(.FromRelease == "4.13") | select(.BackendName == "image-registry-new-connections") | select(.Network == "ovn")'
```

This implies that we do not have the required 100 runs per 3 weeks for that particular combination. We do have minor upgrades, from 4.12 to 4.13, but not micro upgrades. Because the combination is not in the data file, it is expected that it would get skipped with your PR.
Test for allowedDisruption duration early in test and skip if not present.
That makes sense, but I'm not sure why these weren't failing with the same nil dereference. Anyway, I've fixed the whitespace in the PR.
/retest-required
@derekhiggins: The following tests failed, say
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: derekhiggins, dgoodwin.
/retest-required
/unhold
@derekhiggins: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-4190 has been moved to the MODIFIED state.
/cherry-pick release-4.12
@derekhiggins: new pull request created: #27610
Discovered that we have been getting no data for image registry since Dec 7th due to PR openshift#27574, which shut off the entire monitor if we have fewer than the required 100 runs to appear in the file of allowances to enforce. I guess at this time we did not have enough runs with image registry for 4.13 (which is unusual). Skip the test, but let the monitor run.
-- old
Returning nil (when no historical data is present) causes a nil pointer dereference in "Image registry" disruption tests expecting a value.