Bug 1970315: testPodSandboxCreation: skip sandbox errors for pods which were not deleted during network update #26208
Conversation
/test e2e-aws-upgrade
Branch updated: 92d2deb to 124818d
@vrutkovs: This pull request references Bugzilla bug 1970315, which is invalid.
/lgtm
pkg/synthetictests/networking.go (Outdated), excerpt:

```go
if deletionTime == nil {
	// mark sandboxes errors as flakes if networking is being updated
	// these pods eventually get created
	operatorsProgressing := intervalcreation.IntervalsFromEvents_OperatorProgressing(events, event.From, event.To)
```
Hrm, this could be O(N^2) on a pretty big N, have you verified that IntervalsFromEvents uses binary search?
Seems IntervalsFromEvents_OperatorProgressing is O(N)
The list should be sorted, so if you know `from`/`to` you can do a binary search (O(log n)) to find the start, and then the same for the end. Or maybe just do a single pass at the beginning and calculate all the intervals in which the operator is progressing (which should be a very small set), and then just do the smaller loop here?
(see intervals.go / monitor.go for a method that already uses sort.Search() to do this)
Reworked this to use monitorapi functions:
- `CopyAndSort` to create a copy of events and sort them by type
- `IntervalsFromEvents_OperatorProgressing` to build a list of operator progressing events
- `sort.Search` to find events for network/machine-config
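For illustration, here is a minimal self-contained sketch of that pattern: compute the progressing intervals once, keep them sorted by start time, and answer each per-pod lookup with `sort.Search`. All type and helper names below are hypothetical stand-ins, not the actual monitorapi API:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Interval is a hypothetical stand-in for a monitor interval during which
// an operator (e.g. network or machine-config) reported Progressing=True.
type Interval struct {
	From, To time.Time
}

// duringUpdate reports whether t falls inside any interval. It assumes the
// intervals are sorted by From and non-overlapping, so To is non-decreasing
// and the sort.Search predicate is monotone: each lookup is O(log n)
// instead of a linear scan per pod event.
func duringUpdate(intervals []Interval, t time.Time) bool {
	// index of the first interval that has not ended before t
	i := sort.Search(len(intervals), func(i int) bool {
		return !intervals[i].To.Before(t)
	})
	return i < len(intervals) && !intervals[i].From.After(t)
}

func main() {
	base := time.Date(2021, 6, 1, 0, 0, 0, 0, time.UTC)
	// built once per run from the operator Progressing events
	progressing := []Interval{
		{From: base, To: base.Add(5 * time.Minute)},
		{From: base.Add(20 * time.Minute), To: base.Add(30 * time.Minute)},
	}
	fmt.Println(duringUpdate(progressing, base.Add(2*time.Minute)))  // true
	fmt.Println(duringUpdate(progressing, base.Add(10*time.Minute))) // false
}
```

Building the interval list once and binary-searching per event keeps the overall cost near O(N log n), addressing the O(N^2) concern raised above.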
/bugzilla refresh
Recalculating validity in case the underlying Bugzilla bug has changed.
@openshift-bot: This pull request references Bugzilla bug 1970315, which is invalid.
/bugzilla refresh
The main branch will open for development of next OCP version. Recalculating validity of PRs linked to this PR.
@openshift-bot: This pull request references Bugzilla bug 1970315, which is invalid.
/bugzilla refresh
Recalculating validity in case the underlying Bugzilla bug has changed.
@openshift-bot: This pull request references Bugzilla bug 1970315, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug. Requesting review from QA contact.
/retest
1 similar comment
ravisantoshgudimetla left a comment:
/lgtm
Thank you for working on this @vrutkovs
/retest
[APPROVALNOTIFIER] This PR is APPROVED. Approval requirements bypassed by manually added approval. This pull-request has been approved by: petr-muller, ravisantoshgudimetla, vrutkovs
/retest
Please review the full test history for this PR and help us cut down flakes.
6 similar comments
/test e2e-aws-upgrade
/cherrypick release-4.8
@vrutkovs: once the present PR merges, I will cherry-pick it on top of release-4.8 in a new PR and assign it to you.
/skip
e2e-aws-upgrade failing due to openshift/release#19836
/retest
/test e2e-metal-ipi-ovn-ipv6
1 similar comment
/retest
Please review the full test history for this PR and help us cut down flakes.
@vrutkovs: The following tests failed.
/retest
Please review the full test history for this PR and help us cut down flakes.
1 similar comment
@vrutkovs: All pull requests linked via external trackers have merged: Bugzilla bug 1970315 has been moved to the MODIFIED state.
@vrutkovs: new pull request created: #26297
"pods should successfully create sandboxes" test should mark pod events as flakes if network is being updated.
During network update CNI binaries may be in the middle of update. This may cause sandbox errors like:
- error adding container to network "ovn-kubernetes": failed to send CNI request: Post "http://dummy/": EOF
- Multus: [openshift-dns/dns-default-nbkz2]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition
- error adding container to network "openshift-sdn": failed to find plugin "openshift-sdn" in path [/opt/multus/bin /var/lib/cni/bin /usr/libexec/cni]

"never deleted" search in 4.7 -> 4.8 upgrades for last 7 days
If these events are occurring during a network/machine-config update and the sandboxes eventually get created (i.e. the pod never gets deleted), these events are marked as flakes.
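As a rough sketch of that rule (hypothetical names, not the actual helper in pkg/synthetictests/networking.go):

```go
package synthetictests

// sandboxErrorIsFlake is a hypothetical sketch of the rule described above:
// a sandbox-creation error is downgraded to a flake only when the pod was
// never deleted (its sandbox eventually came up) and the error overlapped a
// network or machine-config operator update, when CNI binaries may be
// mid-rollout. Everything else remains a hard test failure.
func sandboxErrorIsFlake(podWasDeleted, duringOperatorUpdate bool) bool {
	return !podWasDeleted && duringOperatorUpdate
}
```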
Test runs: