We observed multiple knative intermittents reconcile failure, that is the controller met some problems in-middle, and mark the knative service status as "Failed". But given the controllers will do retry, so the knative service itself becomes "ready=true" in the end.
Anyway, these intermittent errors gave the end-user a lot of confusion when using cli tool with the default "wait" approach, since the cli tool will report the fail whenever the "ready=false" detected (as reported in cli issue knative/client#1023 as well)
Below are the reconcile failure in-middle issues. We can look at the details, and then discuss further solution:
>> kn service create <ksvc-name> --image <image-name> ....
Creating service '<ksvc-name>' in namespace ...:
1.014s The Configuration is still working to reflect the latest desired specification.
30.665s Revision creation failed with message: Post "https://172.21.0.1:443/apis/serving.knative.dev/v1/namespaces/48sz6i7t4tc/revisions": http2: server sent GOAWAY and closed the connection; LastStreamID=355, ErrCode=NO_ERROR, debug="".
30.759s Configuration "<ksvc-name>" does not have any ready Revision.
31.539s Configuration "<ksvc-name>" is waiting for a Revision to become ready.
Error: RevisionFailed: Revision creation failed with message: Post "https://172.21.0.1:443/apis/serving.knative.dev/v1/namespaces/48sz6i7t4tc/revisions": http2: server sent GOAWAY and closed the connection; LastStreamID=355, ErrCode=NO_ERROR, debug="".
Run 'kn --help' for usage
'CreationFailed' Failed to create Revision: Internal error occurred: failed calling webhook \"webhook.serving.knative.dev\": Post https://webhook.knative-serving.svc:443/defaulting?timeout=10s: context deadline exceeded"
kubectl get ksvc <ksvc-name> -o json
{
"address": {
"url": "http://<ksvc-name>.48sz6i7t4tc.svc.cluster.local"
},
"conditions": [
{
"lastTransitionTime": "2021-01-03T16:08:41Z",
"status": "True",
"type": "ConfigurationsReady"
},
{
"lastTransitionTime": "2021-01-03T16:08:43Z",
"status": "True",
"type": "Ready"
},
{
"lastTransitionTime": "2021-01-03T16:08:43Z",
"status": "True",
"type": "RoutesReady"
}
],
"latestCreatedRevisionName": "<ksvc-name>-vxgph-1",
"latestReadyRevisionName": "<ksvc-name>-vxgph-1",
"observedGeneration": 1,
"traffic": [
{
"latestRevision": true,
"percent": 100,
"revisionName": "<ksvc-name>-vxgph-1"
}
],
"url": "http://<ksvc-name>.48sz6i7t4tc.test.xxx.xxx"
}
kn service create <ksvc-name> --image <image-name>
Creating service '<ksvc-name>' in namespace ...:
1.405s The Configuration is still working to reflect the latest desired specification.
6.503s The Route is still working to reflect the latest desired specification.
12.083s ...
15.248s Configuration "<ksvc-name>" is waiting for a Revision to become ready.
6.504s ...
9.279s Ingress has not yet been reconciled.
58.913s Ingress reconciliation failed
Error: ReconcileIngressFailed: Ingress reconciliation failed
Run 'kn --help' for usage
failed to create VirtualService: Internal error occurred: failed to call webhook "validation.istio.io": Post https://istiod.istio-system.svc:443/validate?timeout=30s: context deadline exceeded
kn service create <ksvc-name> --image <image-name>
Creating service '<ksvc-name>' in namespace ...:
0.234s The Configuration is still working to reflect the latest desired specification.
0.235s Revision creation failed with message: request declared a Content-Length of 1546 but only wrote 0 bytes.
0.289s The Route is still working to reflect the latest desired specification.
0.654s Configuration "<ksvc-name>" does not have any ready Revision.
1.011s Configuration "<ksvc-name>" is waiting for a Revision to become ready.
Error: RevisionFailed: Revision creation failed with message: request declared a Content-Length of 1546 but only wrote 0 bytes.
Run 'kn --help' for usage
Jan 1 22:33:01 controller-64b86bcdd4-dm72t controller debug {"level":"debug","ts":"2021-01-01T14:33:00.116Z","logger":"controller.service-controller.knative.dev-serving-pkg-reconciler-service.Reconciler","caller":"service/reconciler.go:330","msg":"Updating status with: v1.ServiceStatus{\n \tStatus: v1.Status{\n- \t\tObservedGeneration: 0,\n+ \t\tObservedGeneration: 1,\n- \t\tConditions: nil,\n+ \t\tConditions: v1.Conditions{\n+ \t\t\t{\n+ \t\t\t\tType: \"ConfigurationsReady\",\n+ \t\t\t\tStatus: \"Unknown\",\n+ \t\t\t\tLastTransitionTime: apis.VolatileTime{Inner: s\"2021-01-01 14:33:00.115982123 +0\"...},\n+ \t\t\t\tReason: \"OutOfDate\",\n+ \t\t\t\tMessage: \"The Configuration is still working to reflect the latest desired\"...,\n+ \t\t\t},\n+ \t\t\t{\n+ \t\t\t\tType: \"Ready\",\n+ \t\t\t\tStatus: \"Unknown\",\n+ \t\t\t\tLastTransitionTime: apis.VolatileTime{Inner: s\"2021-01-01 14:33:00.115982123 +0\"...},\n+ \t\t\t\tReason: \"OutOfDate\",\n+ \t\t\t\tMessage: \"The Configuration is still working to reflect the latest desired\"...,\n+ \t\t\t},\n+ \t\t\t{\n+ \t\t\t\tType: \"RoutesReady\",\n+ \t\t\t\tStatus: \"Unknown\",\n+ \t\t\t\tLastTransitionTime: apis.VolatileTime{Inner: s\"2021-01-01 14:33:00.115982123 +0\"...},\n+ \t\t\t},\n+ \t\t},\n \t\tAnnotations: nil,\n \t},\n \tConfigurationStatusFields: {},\n \tRouteStatusFields: {},\n }\n","commit":"0e65ea7","knative.dev/pod":"controller-64b86bcdd4-dm72t","knative.dev/controller":"service-controller","knative.dev/traceid":"fd00bc08-8f07-4793-af94-4813199901ff","knative.dev/key":"2y8lduk2d75/<ksvc-name>"}
Jan 1 22:33:01 controller-64b86bcdd4-dm72t controller info {"level":"info","ts":"2021-01-01T14:33:00.117Z","logger":"controller.service-controller.event-broadcaster","caller":"record/event.go:278","msg":"Event(v1.ObjectReference{Kind:\"Service\", Namespace:\"2y8lduk2d75\", Name:\"<ksvc-name>\", UID:\"433d081b-75d3-415b-94aa-ba277c7fe7e3\", APIVersion:\"serving.knative.dev/v1\", ResourceVersion:\"362985101\", FieldPath:\"\"}): type: 'Normal' reason: 'Created' Created Configuration \"<ksvc-name>\"","commit":"0e65ea7","knative.dev/pod":"controller-64b86bcdd4-dm72t","knative.dev/controller":"service-controller"}
Jan 1 22:33:01 controller-64b86bcdd4-dm72t controller debug {"level":"debug","ts":"2021-01-01T14:33:00.135Z","logger":"controller.configuration-controller.knative.dev-serving-pkg-reconciler-configuration.Reconciler","caller":"configuration/reconciler.go:330","msg":"Updating status with: v1.ConfigurationStatus{\n \tStatus: v1.Status{\n- \t\tObservedGeneration: 0,\n+ \t\tObservedGeneration: 1,\n- \t\tConditions: nil,\n+ \t\tConditions: v1.Conditions{\n+ \t\t\t{\n+ \t\t\t\tType: \"Ready\",\n+ \t\t\t\tStatus: \"False\",\n+ \t\t\t\tLastTransitionTime: apis.VolatileTime{Inner: s\"2021-01-01 14:33:00.134904845 +0\"...},\n+ \t\t\t\tReason: \"RevisionFailed\",\n+ \t\t\t\tMessage: \"Revision creation failed with message: request declared a Conten\"...,\n+ \t\t\t},\n+ \t\t},\n \t\tAnnotations: nil,\n \t},\n \tConfigurationStatusFields: {},\n }\n","commit":"0e65ea7","knative.dev/pod":"controller-64b86bcdd4-dm72t","knative.dev/controller":"configuration-controller","knative.dev/traceid":"794d6e85-0b19-49e1-9957-41634f6276d5","knative.dev/key":"2y8lduk2d75/<ksvc-name>"}
Jan 1 22:33:01 controller-64b86bcdd4-dm72t controller info {"level":"info","ts":"2021-01-01T14:33:00.135Z","logger":"controller.configuration-controller.event-broadcaster","caller":"record/event.go:278","msg":"Event(v1.ObjectReference{Kind:\"Configuration\", Namespace:\"2y8lduk2d75\", Name:\"<ksvc-name>\", UID:\"9a9913af-fee1-42f4-9769-b57f2b29c384\", APIVersion:\"serving.knative.dev/v1\", ResourceVersion:\"362985103\", FieldPath:\"\"}): type: 'Warning' reason: 'CreationFailed' Failed to create Revision: request declared a Content-Length of 1546 but only wrote 0 bytes","commit":"0e65ea7","knative.dev/pod":"controller-64b86bcdd4-dm72t","knative.dev/controller":"configuration-controller"}
Jan 1 22:33:01 controller-64b86bcdd4-dm72t controller debug {"level":"debug","ts":"2021-01-01T14:33:00.176Z","logger":"controller.service-controller.knative.dev-serving-pkg-reconciler-service.Reconciler","caller":"service/service.go:79","msg":"Configuration Conditions = v1.Conditions{apis.Condition{Type:\"Ready\", Status:\"False\", Severity:\"\", LastTransitionTime:apis.VolatileTime{Inner:v1.Time{Time:time.Time{wall:0x0, ext:63745108380, loc:(*time.Location)(0x3217da0)}}}, Reason:\"RevisionFailed\", Message:\"Revision creation failed with message: request declared a Content-Length of 1546 but only wrote 0 bytes.\"}}","commit":"0e65ea7","knative.dev/pod":"controller-64b86bcdd4-dm72t","knative.dev/controller":"service-controller","knative.dev/traceid":"6a78d5c7-ac6c-44a8-b48e-02e1170194a9","knative.dev/key":"2y8lduk2d75/<ksvc-name>"}
Ja
Jan 1 22:33:01 controller-64b86bcdd4-dm72t controller error {"level":"error","ts":"2021-01-01T14:33:00.178Z","logger":"controller.configuration-controller.knative.dev-serving-pkg-reconciler-configuration.Reconciler","caller":"configuration/reconciler.go:302","msg":"Returned an error","commit":"0e65ea7","knative.dev/pod":"controller-64b86bcdd4-dm72t","knative.dev/controller":"configuration-controller","knative.dev/traceid":"794d6e85-0b19-49e1-9957-41634f6276d5","knative.dev/key":"2y8lduk2d75/<ksvc-name>","targetMethod":"ReconcileKind","targetMethod":"ReconcileKind","error":"failed to create Revision: request declared a Content-Length of 1546 but only wrote 0 bytes","stacktrace":"knative.dev/serving/pkg/client/injection/reconciler/serving/v1/configuration.(*reconcilerImpl).Reconcile\n\tknative.dev/serving/pkg/client/injection/reconciler/serving/v1/configuration/reconciler.go:302\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/pkg@v0.0.0-20201103163404-5514ab0c1fdf/controller/controller.go:513\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/pkg@v0.0.0-20201103163404-5514ab0c1fdf/controller/controller.go:451"}
all of these 3 errors are related to network intermittent connection errors. As in a cloud environment, it is the nature that the network is not reliable all the time.
Given knative will do retry to tolerate the network issues, can we think out a better approach to facilitate the user experience from the serving side and client cmd to avoid the in-middle creation failure invisible to the end-user?
In what area(s)?
What version of Knative?
Expected Behavior
What we observed
We observed multiple knative intermittents reconcile failure, that is the controller met some problems in-middle, and mark the knative service status as "Failed". But given the controllers will do retry, so the knative service itself becomes "ready=true" in the end.
Anyway, these intermittent errors gave the end-user a lot of confusion when using cli tool with the default "wait" approach, since the cli tool will report the fail whenever the "ready=false" detected (as reported in cli issue knative/client#1023 as well)
Below are the reconcile failure in-middle issues. We can look at the details, and then discuss further solution:
Intermittent error 1
kn:kn, if we wait for a while, the final status is ready.Intermittent error 2
kn, if we wait for a while, the final status is ready.Intermittent error 3
kn, if we wait for a while, the final status is ready.Summary
all of these 3 errors are related to network intermittent connection errors. As in a cloud environment, it is the nature that the network is not reliable all the time.
Given knative will do retry to tolerate the network issues, can we think out a better approach to facilitate the user experience from the serving side and client cmd to avoid the in-middle
creation failureinvisible to the end-user?