
Failed to sync with ReplicaFailure in ksvc creation sometimes #9857

@cdlliuy


The problem occurs in release 0.17, but it is probably not a regression introduced in 0.17.

When creating a Knative application in a namespace that has a limit range with min/max specified (e.g. a minimum of 10m for cpu), I sometimes get the expected 'pod creation forbidden' error message, but at other times the application creation simply fails with ProgressDeadlineExceeded.

This is the output for the expected behaviour:

$ kn service create test3 --image docker.io/cdlliuy/kn-helloworld -n ca482111-7675 --request cpu=1m --limit cpu=1m  --force
Replacing service 'test3' in namespace 'ca482111-7675':
  0.363s Configuration "test3" is waiting for a Revision to become ready.
  2.084s Revision "test3-bqdlg-1" failed with message: pods "test3-bqdlg-1-deployment-7dcfc469f6-658tj" is forbidden: minimum cpu usage per Container is 10m, but request is 1m.
  2.121s Configuration "test3" does not have any ready Revision.
  2.315s ...
  2.356s Configuration "test3" is waiting for a Revision to become ready.
Error: RevisionFailed: Revision "test3-bqdlg-1" failed with message: pods "test3-bqdlg-1-deployment-7dcfc469f6-658tj" is forbidden: minimum cpu usage per Container is 10m, but request is 1m.
Run 'kn --help' for usage

But with a similar command (only the ksvc name differs), it hangs:

$ kn service create test4 --image docker.io/cdlliuy/kn-helloworld -n ca482111-7675 --request cpu=1m --limit cpu=1m  --force
Creating service 'test4' in namespace 'ca482111-7675':
  0.219s The Route is still working to reflect the latest desired specification.
  0.291s Configuration "test4" is waiting for a Revision to become ready.
^C

Checking the deployment status of the latter one shows that the ReplicaFailure condition is reported:

  - lastTransitionTime: "2020-10-19T05:33:57Z"
    lastUpdateTime: "2020-10-19T05:33:57Z"
    message: 'pods "test4-hdqhd-1-deployment-576b96bc76-rb6tj" is forbidden: minimum
      cpu usage per Container is 10m, but request is 1m'
    reason: FailedCreate
    status: "True"
    type: ReplicaFailure

But the revision status shows:

  - lastTransitionTime: "2020-10-19T05:33:57Z"
    reason: Deploying
    status: Unknown
    type: ContainerHealthy
  - lastTransitionTime: "2020-10-19T05:36:28Z"
    message: Initial scale was never achieved
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Ready
  - lastTransitionTime: "2020-10-19T05:36:28Z"
    message: Initial scale was never achieved
    reason: ProgressDeadlineExceeded
    status: "False"
    type: ResourcesAvailable
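
The mismatch between the two status blocks above can be written down as a small check. The following Go sketch is illustrative only: the `Condition` type and the helpers `replicaFailure` and `missedFailure` are hypothetical simplifications made up for this example. They are not Knative or Kubernetes APIs, and this is not the logic in reconcile_resources.go.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a status condition
// as shown in the deployment and revision YAML above.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// replicaFailure returns the ReplicaFailure condition if it is present
// with status "True".
func replicaFailure(conds []Condition) (Condition, bool) {
	for _, c := range conds {
		if c.Type == "ReplicaFailure" && c.Status == "True" {
			return c, true
		}
	}
	return Condition{}, false
}

// missedFailure reports whether the deployment has a ReplicaFailure that the
// revision's ResourcesAvailable condition does not reflect (same reason,
// status "False"). A missing ResourcesAvailable condition also counts.
func missedFailure(deploy, rev []Condition) bool {
	f, ok := replicaFailure(deploy)
	if !ok {
		return false
	}
	for _, c := range rev {
		if c.Type == "ResourcesAvailable" {
			return !(c.Status == "False" && c.Reason == f.Reason)
		}
	}
	return true
}

func main() {
	// Values copied from the test4 status output in this report.
	deploy := []Condition{{
		Type:    "ReplicaFailure",
		Status:  "True",
		Reason:  "FailedCreate",
		Message: `pods "test4-hdqhd-1-deployment-576b96bc76-rb6tj" is forbidden: minimum cpu usage per Container is 10m, but request is 1m`,
	}}
	rev := []Condition{
		{Type: "ContainerHealthy", Status: "Unknown", Reason: "Deploying"},
		{Type: "Ready", Status: "False", Reason: "ProgressDeadlineExceeded"},
		{Type: "ResourcesAvailable", Status: "False", Reason: "ProgressDeadlineExceeded"},
	}

	if missedFailure(deploy, rev) {
		f, _ := replicaFailure(deploy)
		fmt.Println("revision did not surface deployment failure:", f.Reason)
	}
}
```

With the test4 values, the check flags the revision because its ResourcesAvailable reason is ProgressDeadlineExceeded rather than the deployment's FailedCreate. In the test3 case the CLI output shows the "forbidden" message reaching the revision, so a check of this kind would presumably not fire there.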

The Knative controller does not log enough in https://github.com/knative/serving/blob/release-0.17/pkg/reconciler/revision/reconcile_resources.go#L62-L78, so from its log output it is hard to tell whether the deployment status change triggered a revision reconcile in the unexpected case.

I think this is some kind of race condition. Any insights?

Labels

area/API: API objects and controllers
good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/bug: Categorizes issue or PR as related to a bug.
triage/accepted: Issues which should be fixed (post-triage)
