Currently we have the `scale-to-zero-grace-period` flag, but its original purpose was to avoid scaling to 0 before the network is reprogrammed. Over time we have spent quite some effort making sure this value can be reduced, or even made 0 in some cases (and it is 0 if the revision has already been backed by the activator for that many seconds, as was requested before).
Unfortunately, the documentation of the flag led people to believe that it is a guaranteed time for keeping the last pod around after we'd want to scale to 0 (i.e. 0 requests over the `stable-window`).
One way to keep the last pod around longer is to increase the stable window, but that makes the autoscaler sluggish.
I have heard from users that they do in fact want pods to stay around for longer periods of time after `stable-window` seconds of 0 requests (e.g. if pod startup is very expensive).
So the suggestion is to introduce a knob for that, something like `last-pod-retention-time`, while preserving the functionality that `scale-to-zero-grace-period` provides.
The scale-to-0 time would then be `max(inferred-scale-to-zero-grace, last-pod-retention-time)`.
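
To make the proposed rule concrete, here is a minimal sketch of how the wait before scaling to 0 might be computed. It is not the autoscaler's actual code: the function name and its parameters are hypothetical, and it assumes the "inferred" grace is simply whatever part of `scale-to-zero-grace-period` has not yet been covered by the activator being in the request path.

```python
from datetime import timedelta


def scale_to_zero_delay(grace_period, time_on_activator, last_pod_retention):
    """Return how long to keep the last pod once scale to 0 is desired.

    All names here are hypothetical and only illustrate the proposal.
    """
    # Assumption: the grace period shrinks by the time the activator has
    # already been backing the revision, and never goes below zero.
    inferred_grace = max(grace_period - time_on_activator, timedelta(0))
    # Proposed rule: wait for whichever of the two is longer.
    return max(inferred_grace, last_pod_retention)


# Activator has covered the full grace period and no retention is set.
print(scale_to_zero_delay(timedelta(seconds=30), timedelta(seconds=30), timedelta(0)))
# Retention dominates when pod startup is expensive.
print(scale_to_zero_delay(timedelta(seconds=30), timedelta(seconds=10), timedelta(minutes=5)))
```

With a retention time of 0 this reduces to today's behaviour, so existing deployments would be unaffected unless they opt in.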
In what area(s)?
/area API
/area autoscale
/cc @ahmetb @mattmoor @markusthoemmes