Failing pod never times out #6504
Open
Labels
area/API, area/autoscale, help wanted, kind/bug, lifecycle/frozen, triage/accepted
In what area(s)?
/area autoscale
What version of Knative?
HEAD
Expected Behavior
When a pod fails to start properly it should eventually be terminated.
Actual Behavior
If a pod fails to start and it is the first pod the revision has ever created, the pod is eventually terminated. But if the revision's first pod starts ok and the revision then scales down to zero, a later pod that fails to start will crash-loop continually (which is expected) but is never terminated and never goes away.
It seems like there should be consistency between a "first time pod" and a "2+ time pod" w.r.t. what happens when it crashes.
Steps to Reproduce the Problem
You can reproduce this by running this bash script:
The image used will crash if it is started after the time (hour:min) given in the CRASH env var. So, in the case of `bugsvc`, we create the ksvc before CRASH time, let it scale down to zero, then hit it after CRASH time so that the pod fails. KnService `bugsvc2` crashes immediately to show how the pod will be removed (for me after about 2 minutes), while `bugsvc`'s pod seems to live forever (or at least a LOT longer).
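The original script is not reproduced here, so the following is only a rough sketch of the steps described above, not the reporter's script. It assumes the `kn` and `kubectl` CLIs are available, that the image compares CRASH against UTC time of day, and that `IMAGE` is replaced with a real image; the default value and the helper names `crash_time` and `pod_count` are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: placeholder image reference; substitute an image that exits at
# startup once the current time is past the HH:MM value in $CRASH.
IMAGE="${IMAGE:-example.invalid/crash-after-time}"

# Print the UTC time N minutes from now as HH:MM (GNU date, BSD date fallback).
crash_time() {
  local minutes="$1"
  date -u -d "+${minutes} minutes" +%H:%M 2>/dev/null ||
    date -u -v "+${minutes}M" +%H:%M
}

# Count the pods that currently belong to a Knative Service.
pod_count() {
  kubectl get pods -l "serving.knative.dev/service=$1" --no-headers 2>/dev/null | wc -l
}

main() {
  local crash
  crash="$(crash_time 3)"

  # bugsvc: created before CRASH time, so its first pod starts cleanly.
  kn service create bugsvc --image "$IMAGE" --env "CRASH=${crash}"

  # bugsvc2: CRASH is assumed to be already in the past, so its first pod
  # crashes right away; --no-wait avoids blocking on a service that never
  # becomes ready.
  kn service create bugsvc2 --image "$IMAGE" --env "CRASH=00:00" --no-wait

  # Wait until bugsvc has scaled to zero and the clock has passed CRASH time
  # (plain string comparison; midnight wrap-around is ignored).
  while [[ "$(pod_count bugsvc)" -gt 0 || ! "$(date -u +%H:%M)" > "$crash" ]]; do
    sleep 10
  done

  # Hit bugsvc after CRASH time so the newly created pod fails to start.
  curl -s -m 30 "$(kn service describe bugsvc -o url)" || true

  # Watch the pods: bugsvc2's pod should go away, bugsvc's should linger.
  kubectl get pods -w
}

# Only touch the cluster when invoked as: ./repro.sh run
if [[ "${1:-}" == "run" ]]; then
  main
fi
```

Run it with `IMAGE=<your-image> ./repro.sh run` (the file name is arbitrary). The 3-minute offset is a guess that leaves enough time for `bugsvc` to become ready before CRASH time; adjust it to your cluster's startup and scale-to-zero timings.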