Skip to content

Knative reconcile fails in-middle, but be ready eventually for service creation #10511

@ShiningTian

Description

@ShiningTian

In what area(s)?

/area API

What version of Knative?

0.18.x
0.19.x

Expected Behavior

What we observed

We observed multiple knative intermittents reconcile failure, that is the controller met some problems in-middle, and mark the knative service status as "Failed". But given the controllers will do retry, so the knative service itself becomes "ready=true" in the end.

Anyway, these intermittent errors gave the end-user a lot of confusion when using cli tool with the default "wait" approach, since the cli tool will report the fail whenever the "ready=false" detected (as reported in cli issue knative/client#1023 as well)

Below are the reconcile failure in-middle issues. We can look at the details, and then discuss further solution:

Intermittent error 1
  • Error Msg from kn:
>> kn service create <ksvc-name> --image <image-name> ....
Creating service '<ksvc-name>' in namespace ...:

  1.014s The Configuration is still working to reflect the latest desired specification.
 30.665s Revision creation failed with message: Post "https://172.21.0.1:443/apis/serving.knative.dev/v1/namespaces/48sz6i7t4tc/revisions": http2: server sent GOAWAY and closed the connection; LastStreamID=355, ErrCode=NO_ERROR, debug="".
 30.759s Configuration "<ksvc-name>" does not have any ready Revision.
 31.539s Configuration "<ksvc-name>" is waiting for a Revision to become ready.
Error: RevisionFailed: Revision creation failed with message: Post "https://172.21.0.1:443/apis/serving.knative.dev/v1/namespaces/48sz6i7t4tc/revisions": http2: server sent GOAWAY and closed the connection; LastStreamID=355, ErrCode=NO_ERROR, debug="".
Run 'kn --help' for usage
  • Key error detected in reconcile is the failure when posting to webhook.
'CreationFailed' Failed to create Revision: Internal error occurred: failed calling webhook \"webhook.serving.knative.dev\": Post https://webhook.knative-serving.svc:443/defaulting?timeout=10s: context deadline exceeded"
  • After we got the error from kn, if we wait for a while, the final status is ready.
kubectl get ksvc <ksvc-name> -o json
{
  "address": {
    "url": "http://<ksvc-name>.48sz6i7t4tc.svc.cluster.local"
  },
  "conditions": [
    {
      "lastTransitionTime": "2021-01-03T16:08:41Z",
      "status": "True",
      "type": "ConfigurationsReady"
    },
    {
      "lastTransitionTime": "2021-01-03T16:08:43Z",
      "status": "True",
      "type": "Ready"
    },
    {
      "lastTransitionTime": "2021-01-03T16:08:43Z",
      "status": "True",
      "type": "RoutesReady"
    }
  ],
  "latestCreatedRevisionName": "<ksvc-name>-vxgph-1",
  "latestReadyRevisionName": "<ksvc-name>-vxgph-1",
  "observedGeneration": 1,
  "traffic": [
    {
      "latestRevision": true,
      "percent": 100,
      "revisionName": "<ksvc-name>-vxgph-1"
    }
  ],
  "url": "http://<ksvc-name>.48sz6i7t4tc.test.xxx.xxx"
}
Intermittent error 2
  • Error Msg:
kn service create <ksvc-name> --image <image-name>  
Creating service '<ksvc-name>' in namespace ...:

  1.405s The Configuration is still working to reflect the latest desired specification.
  6.503s The Route is still working to reflect the latest desired specification.
 12.083s ...
 15.248s Configuration "<ksvc-name>" is waiting for a Revision to become ready.
  6.504s ...
  9.279s Ingress has not yet been reconciled.
 58.913s Ingress reconciliation failed
Error: ReconcileIngressFailed: Ingress reconciliation failed
Run 'kn --help' for usage
  • As reported in case 1, we observed the webhook connection failure as well in logs.
failed to create VirtualService: Internal error occurred: failed to call webhook "validation.istio.io": Post https://istiod.istio-system.svc:443/validate?timeout=30s: context deadline exceeded
  • After we got the above error from kn, if we wait for a while, the final status is ready.
Intermittent error 3
  • Error Msg:
kn service create <ksvc-name>  --image <image-name> 
Creating service '<ksvc-name>' in namespace ...:

  0.234s The Configuration is still working to reflect the latest desired specification.
  0.235s Revision creation failed with message: request declared a Content-Length of 1546 but only wrote 0 bytes.
  0.289s The Route is still working to reflect the latest desired specification.
  0.654s Configuration "<ksvc-name>" does not have any ready Revision.
  1.011s Configuration "<ksvc-name>" is waiting for a Revision to become ready.
Error: RevisionFailed: Revision creation failed with message: request declared a Content-Length of 1546 but only wrote 0 bytes.
Run 'kn --help' for usage
  • Below is the related logs. I can't tell the exact error, but it still looks like a network issue:
Jan 1 22:33:01 controller-64b86bcdd4-dm72t controller debug {"level":"debug","ts":"2021-01-01T14:33:00.116Z","logger":"controller.service-controller.knative.dev-serving-pkg-reconciler-service.Reconciler","caller":"service/reconciler.go:330","msg":"Updating status with:   v1.ServiceStatus{\n  \tStatus: v1.Status{\n- \t\tObservedGeneration: 0,\n+ \t\tObservedGeneration: 1,\n- \t\tConditions:         nil,\n+ \t\tConditions: v1.Conditions{\n+ \t\t\t{\n+ \t\t\t\tType:               \"ConfigurationsReady\",\n+ \t\t\t\tStatus:             \"Unknown\",\n+ \t\t\t\tLastTransitionTime: apis.VolatileTime{Inner: s\"2021-01-01 14:33:00.115982123 +0\"...},\n+ \t\t\t\tReason:             \"OutOfDate\",\n+ \t\t\t\tMessage:            \"The Configuration is still working to reflect the latest desired\"...,\n+ \t\t\t},\n+ \t\t\t{\n+ \t\t\t\tType:               \"Ready\",\n+ \t\t\t\tStatus:             \"Unknown\",\n+ \t\t\t\tLastTransitionTime: apis.VolatileTime{Inner: s\"2021-01-01 14:33:00.115982123 +0\"...},\n+ \t\t\t\tReason:             \"OutOfDate\",\n+ \t\t\t\tMessage:            \"The Configuration is still working to reflect the latest desired\"...,\n+ \t\t\t},\n+ \t\t\t{\n+ \t\t\t\tType:               \"RoutesReady\",\n+ \t\t\t\tStatus:             \"Unknown\",\n+ \t\t\t\tLastTransitionTime: apis.VolatileTime{Inner: s\"2021-01-01 14:33:00.115982123 +0\"...},\n+ \t\t\t},\n+ \t\t},\n  \t\tAnnotations: nil,\n  \t},\n  \tConfigurationStatusFields: {},\n  \tRouteStatusFields:         {},\n  }\n","commit":"0e65ea7","knative.dev/pod":"controller-64b86bcdd4-dm72t","knative.dev/controller":"service-controller","knative.dev/traceid":"fd00bc08-8f07-4793-af94-4813199901ff","knative.dev/key":"2y8lduk2d75/<ksvc-name>"}
Jan 1 22:33:01 controller-64b86bcdd4-dm72t controller info {"level":"info","ts":"2021-01-01T14:33:00.117Z","logger":"controller.service-controller.event-broadcaster","caller":"record/event.go:278","msg":"Event(v1.ObjectReference{Kind:\"Service\", Namespace:\"2y8lduk2d75\", Name:\"<ksvc-name>\", UID:\"433d081b-75d3-415b-94aa-ba277c7fe7e3\", APIVersion:\"serving.knative.dev/v1\", ResourceVersion:\"362985101\", FieldPath:\"\"}): type: 'Normal' reason: 'Created' Created Configuration \"<ksvc-name>\"","commit":"0e65ea7","knative.dev/pod":"controller-64b86bcdd4-dm72t","knative.dev/controller":"service-controller"}
Jan 1 22:33:01 controller-64b86bcdd4-dm72t controller debug {"level":"debug","ts":"2021-01-01T14:33:00.135Z","logger":"controller.configuration-controller.knative.dev-serving-pkg-reconciler-configuration.Reconciler","caller":"configuration/reconciler.go:330","msg":"Updating status with:   v1.ConfigurationStatus{\n  \tStatus: v1.Status{\n- \t\tObservedGeneration: 0,\n+ \t\tObservedGeneration: 1,\n- \t\tConditions:         nil,\n+ \t\tConditions: v1.Conditions{\n+ \t\t\t{\n+ \t\t\t\tType:               \"Ready\",\n+ \t\t\t\tStatus:             \"False\",\n+ \t\t\t\tLastTransitionTime: apis.VolatileTime{Inner: s\"2021-01-01 14:33:00.134904845 +0\"...},\n+ \t\t\t\tReason:             \"RevisionFailed\",\n+ \t\t\t\tMessage:            \"Revision creation failed with message: request declared a Conten\"...,\n+ \t\t\t},\n+ \t\t},\n  \t\tAnnotations: nil,\n  \t},\n  \tConfigurationStatusFields: {},\n  }\n","commit":"0e65ea7","knative.dev/pod":"controller-64b86bcdd4-dm72t","knative.dev/controller":"configuration-controller","knative.dev/traceid":"794d6e85-0b19-49e1-9957-41634f6276d5","knative.dev/key":"2y8lduk2d75/<ksvc-name>"}
Jan 1 22:33:01 controller-64b86bcdd4-dm72t controller info {"level":"info","ts":"2021-01-01T14:33:00.135Z","logger":"controller.configuration-controller.event-broadcaster","caller":"record/event.go:278","msg":"Event(v1.ObjectReference{Kind:\"Configuration\", Namespace:\"2y8lduk2d75\", Name:\"<ksvc-name>\", UID:\"9a9913af-fee1-42f4-9769-b57f2b29c384\", APIVersion:\"serving.knative.dev/v1\", ResourceVersion:\"362985103\", FieldPath:\"\"}): type: 'Warning' reason: 'CreationFailed' Failed to create Revision: request declared a Content-Length of 1546 but only wrote 0 bytes","commit":"0e65ea7","knative.dev/pod":"controller-64b86bcdd4-dm72t","knative.dev/controller":"configuration-controller"} 
Jan 1 22:33:01 controller-64b86bcdd4-dm72t controller debug {"level":"debug","ts":"2021-01-01T14:33:00.176Z","logger":"controller.service-controller.knative.dev-serving-pkg-reconciler-service.Reconciler","caller":"service/service.go:79","msg":"Configuration Conditions = v1.Conditions{apis.Condition{Type:\"Ready\", Status:\"False\", Severity:\"\", LastTransitionTime:apis.VolatileTime{Inner:v1.Time{Time:time.Time{wall:0x0, ext:63745108380, loc:(*time.Location)(0x3217da0)}}}, Reason:\"RevisionFailed\", Message:\"Revision creation failed with message: request declared a Content-Length of 1546 but only wrote 0 bytes.\"}}","commit":"0e65ea7","knative.dev/pod":"controller-64b86bcdd4-dm72t","knative.dev/controller":"service-controller","knative.dev/traceid":"6a78d5c7-ac6c-44a8-b48e-02e1170194a9","knative.dev/key":"2y8lduk2d75/<ksvc-name>"}
Ja
Jan 1 22:33:01 controller-64b86bcdd4-dm72t controller error {"level":"error","ts":"2021-01-01T14:33:00.178Z","logger":"controller.configuration-controller.knative.dev-serving-pkg-reconciler-configuration.Reconciler","caller":"configuration/reconciler.go:302","msg":"Returned an error","commit":"0e65ea7","knative.dev/pod":"controller-64b86bcdd4-dm72t","knative.dev/controller":"configuration-controller","knative.dev/traceid":"794d6e85-0b19-49e1-9957-41634f6276d5","knative.dev/key":"2y8lduk2d75/<ksvc-name>","targetMethod":"ReconcileKind","targetMethod":"ReconcileKind","error":"failed to create Revision: request declared a Content-Length of 1546 but only wrote 0 bytes","stacktrace":"knative.dev/serving/pkg/client/injection/reconciler/serving/v1/configuration.(*reconcilerImpl).Reconcile\n\tknative.dev/serving/pkg/client/injection/reconciler/serving/v1/configuration/reconciler.go:302\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/pkg@v0.0.0-20201103163404-5514ab0c1fdf/controller/controller.go:513\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/pkg@v0.0.0-20201103163404-5514ab0c1fdf/controller/controller.go:451"} 
  • After we got the above error from kn, if we wait for a while, the final status is ready.

Summary

all of these 3 errors are related to network intermittent connection errors. As in a cloud environment, it is the nature that the network is not reliable all the time.

Given knative will do retry to tolerate the network issues, can we think out a better approach to facilitate the user experience from the serving side and client cmd to avoid the in-middle creation failure invisible to the end-user?

Metadata

Metadata

Assignees

No one assigned

    Labels

    area/APIAPI objects and controllersgood first issueDenotes an issue ready for a new contributor, according to the "help wanted" guidelines.help wantedDenotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.kind/bugCategorizes issue or PR as related to a bug.triage/acceptedIssues which should be fixed (post-triage)

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions