Verify ready replicas in e2e test TestMinScale#6783
knative-prow-robot merged 1 commit into knative:master from
Conversation
Hi @MIBc. Thanks for your PR. I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
@MIBc thanks for doing this. The original issue contains a note that says:
Have you taken that into account? It seems like we're only checking ReadyReplicas from the deployment, but all of our components actually key off the Ready addresses of the revision's private service.
/assign @vagababov @tanzeeb
I have read the note carefully. Maybe we can check the endpoint count of the revision before it is ready. @markusthoemmes
Yes, please check endpoints, since that is what drives the internal logic, rather than deployment state.
markusthoemmes left a comment

Thanks, I think this is almost there! Some small touchups from my end.
    // Revision becomes ready
    t.Log("Waiting for revision becomes ready")

Suggested change:
    t.Log("Waiting for revision to become ready")
    // Route becomes ready
    t.Log("Waiting for route becomes ready")

Suggested change:
    t.Log("Waiting for route to become ready")
    // Before becoming ready, observe revision endpoints
    endpointName := revName + "-private"
    t.Log("Waiting for ready endponits to scale to minScale before revison becoming ready")

Suggested change:
    t.Log("Waiting for ready endpoints to scale to minScale before the revision becomes ready")
    func waitForDesiredEndpoints(t *testing.T, clients *test.Clients, endpointName string, cond func(int32) bool) error {
        endpoints := clients.KubeClient.Kube.CoreV1().Endpoints(test.ServingNamespace)

I think we can even go as far as using this as waitForDesiredScale instead of adding these calls in addition.
    return wait.PollImmediate(time.Second, 10*time.Minute, func() (bool, error) {

Suggested change:
    return wait.PollImmediate(time.Second, 1*time.Minute, func() (bool, error) {

To stay the same as above?
    // Before becoming ready, observe revision endpoints
    endpointName := revName + "-private"

Can we fetch this via API? See serving/test/e2e/autoscale_test.go, lines 283 to 302 in afbd55a.
    if len(endpoint.Subsets) == 0 || len(endpoint.Subsets[0].Addresses) == 0 {
        return false, nil
    }
    return cond(int32(len(endpoint.Subsets[0].Addresses))), nil

I think this should use the resources.ReadyAddressCount helper. See serving/test/e2e/autoscale_test.go, lines 283 to 302 in afbd55a.

Thanks for reviewing. I will improve it.
/retest
    sks, err := clients.NetworkingClient.ServerlessServices.Get(revName, metav1.GetOptions{})
    if err != nil {
        t.Fatalf("Error retrieving sks %q: %w", revName, err)
    }

%w works only with fmt.Errorf.

Suggested change:
    t.Fatalf("Error retrieving sks %q: %v", revName, err)
markusthoemmes left a comment
/lgtm
/approve
Thanks for the changes!
/hold
for the outstanding nits.
    endpoints := clients.KubeClient.Kube.CoreV1().Endpoints(test.ServingNamespace)

    return wait.PollImmediate(time.Second, 1*time.Minute, func() (bool, error) {
        deployment, err := deployments.Get(deploymentName, metav1.GetOptions{})

I don't think fetching the deployment is actually required anymore. Shall we remove all mentions of it and redo the error messages above to not include it as well?

Yes. If the endpoints satisfy the scale, the deployment does too.
    revName := latestRevisionName(t, clients, names.Config)
    deploymentName := revName + "-deployment"

    sks, err := clients.NetworkingClient.ServerlessServices.Get(revName, metav1.GetOptions{})

Oops, per the test failure this likely needs a loop to wait for the SKS to actually contain a PrivateServiceName and be there.

Yes. I have found that.
The following jobs failed:

Automatically retrying due to test flakiness...

/retest
markusthoemmes left a comment

I wanted to let the int64 nit pass and merge, but the SKS poll runs the risk of returning a wrong result, so... sorry, but here are my last two comments, hopefully 😂. Thanks for being patient!
    return cond(*deployment.Spec.Replicas), nil
    return cond(int32(resources.ReadyAddressCount(endpoint))), nil

Nit: Let's change the cond funcs to just take int64 to avoid unnecessary casts (doesn't matter performance-wise, but it will pollute the code eventually).
    if err != nil {
        t.Fatalf("Failed to get sks after it was seen to be live: %v", err)
    }
    return sks.Status.PrivateServiceName

Is this guaranteed to be set? 🤔 AFAIK we need to go through at least one reconciliation cycle to set this. I'd propose to cache it from the above like so:

func serverlessServicesName(t *testing.T, clients *test.Clients, revisionName string) string {
	var privateServiceName string
	if err := wait.PollImmediate(time.Second, 1*time.Minute, func() (bool, error) {
		sks, err := clients.NetworkingClient.ServerlessServices.Get(revisionName, metav1.GetOptions{})
		if err != nil {
			return false, nil
		}
		privateServiceName = sks.Status.PrivateServiceName
		if privateServiceName == "" {
			return false, nil
		}
		return true, nil
	}); err != nil {
		t.Fatalf("Error retrieving sks %q: %v", revisionName, err)
	}
	return privateServiceName
}

That should make this safe against potential inconsistencies. I'd also bump the poll timeout to 1 minute here, probably.
Fixes knative#6716

* Waiting for ready endpoints to scale to minScale
* Move `teardown` function in front of the `Fatalf`. Otherwise the k8s resources may not be deleted.
markusthoemmes left a comment
/lgtm
/approve
Noice! 🎉
[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: markusthoemmes, MIBc. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
/hold cancel
Fixes #6716