Improve lag-based autoscaler config persistence #18745
Conversation
```java
AutoScalerConfig autoScalerError = mapper.convertValue(
    ImmutableMap.of(
        "enableTaskAutoScaler", "true",
        "taskCountMax", "1",
        "taskCountMin", "4"
    ),
    AutoScalerConfig.class
);
```
Check notice — Code scanning / CodeQL: Unread local variable (Note, test)
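The bounds in the quoted snippet are inverted (`taskCountMax` = 1 is below `taskCountMin` = 4), and the variable is named `autoScalerError`, which suggests the test constructs an intentionally invalid config. A minimal plain-Java sketch, assuming a bounds check of the kind such a config would be expected to fail (hypothetical helper, not Druid's actual `AutoScalerConfig`):

```java
// Hypothetical sketch of an autoscaler bounds check; not Druid's actual code.
public class AutoScalerBoundsCheck {
    // Reject configs where the max task count is below the min, as in the
    // quoted test values (taskCountMax = 1, taskCountMin = 4).
    static void validate(boolean enabled, int taskCountMax, int taskCountMin) {
        if (enabled && taskCountMax < taskCountMin) {
            throw new IllegalArgumentException(
                "taskCountMax [" + taskCountMax + "] must be >= taskCountMin ["
                + taskCountMin + "]");
        }
    }

    public static void main(String[] args) {
        boolean threw = false;
        try {
            validate(true, 1, 4); // inverted bounds, as in the snippet above
        } catch (IllegalArgumentException e) {
            threw = true;
        }
        assert threw;
        validate(true, 4, 1);  // valid: max >= min
        validate(false, 1, 4); // autoscaler disabled: bounds not checked
        System.out.println("ok");
    }
}
```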
Hi @Fly-Style, thanks for the change. I'm not sure the above makes sense to me.
Looks like you want a way to not "reset" the task count during resubmits, so that the value is sticky. I would prefer this not be the default behavior, but rather opt-in, since:
Similarly, have you tested what happens when a supervisor is terminated (tombstoned) and then resubmitted without a `taskCountStart`? It'd be good to make sure that we don't end up merging an old supervisor's data (with the same ID) with a potentially unrelated, new supervisor's data.
@Fly-Style , I think @jtuglu1 makes a fair point. I too would prefer that this behaviour be opt-in. How about the following approach instead?
This would ensure:
What do you think?
At our deployment, we do want supervisors to stick to the current task count when resubmitted. The reasons are:
I'm open to some way of offering both behaviors. I personally feel that sticking to the current task count on resubmit is the more intuitive default (it seems odd that changing the schema would reset the task count). But I can live with the current behavior as the default, as long as there's some way to get the stick-to-current behavior; currently, there isn't.
kfaraz
left a comment
Took an initial pass assuming that we continue with the original approach in this PR, i.e. update `taskCountStart` when an auto-scaling event occurs and carry forward the current value of `taskCountStart` when the supervisor is resubmitted.
Overall code flow makes sense. Left some nitpicks here and there.
@jtuglu1 , this is already handled since the …

@Fly-Style , let's try to add an embedded test for …
Would love to, but in a separate PR if you don't mind :)
A possibility:
This, I think, allows @jtuglu1 and me to both have the behavior we want: @jtuglu1 would set …
Thanks for the suggestion, @gianm ! I think that would work nicely and meet all our needs.
SGTM |
@Fly-Style , based on the decided approach, we should do the following:
- Continue persisting `taskCount` (and not `taskCountStart`) whenever an auto-scale event occurs.
- Leave `taskCountStart` as an immutable config.
- While merging the spec on resubmission, follow this order: provided `taskCountStart` > provided `taskCount` > existing `taskCount` > provided `taskCountMin`.
- Update docs and add a release note in the PR description to reflect this change in behaviour.
Please let me know if anything seems ambiguous.
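The agreed merge order can be sketched as a simple chain of null checks. This is a hypothetical helper for illustration only, not the actual `SeekableStreamSupervisorSpec` code; the method and parameter names are illustrative:

```java
// Illustrative sketch of the merge priority on supervisor resubmission:
//   provided taskCountStart > provided taskCount > existing taskCount > provided taskCountMin
// (Names are hypothetical; not Druid's actual implementation.)
public class TaskCountMerge {
    static int resolveStartingTaskCount(Integer providedTaskCountStart,
                                        Integer providedTaskCount,
                                        Integer existingTaskCount,
                                        int providedTaskCountMin) {
        if (providedTaskCountStart != null) {
            return providedTaskCountStart; // explicit, immutable starting count wins
        }
        if (providedTaskCount != null) {
            return providedTaskCount;      // explicit count in the resubmitted spec
        }
        if (existingTaskCount != null) {
            return existingTaskCount;      // sticky: carry forward the running supervisor's count
        }
        return providedTaskCountMin;       // final fallback: the autoscaler's lower bound
    }

    public static void main(String[] args) {
        // No explicit counts and no existing supervisor: fall back to taskCountMin.
        assert resolveStartingTaskCount(null, null, null, 2) == 2;
        // Existing supervisor's current taskCount wins over taskCountMin.
        assert resolveStartingTaskCount(null, null, 5, 2) == 5;
        // A provided taskCountStart overrides everything else.
        assert resolveStartingTaskCount(3, 4, 5, 2) == 3;
        System.out.println("ok");
    }
}
```

Run with `java -ea TaskCountMerge` to enable the assertions.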
force-pushed from 26b82e7 to fa86de5
```java
// Either if autoscaler is absent or taskCountStart is specified - just return.
if (thisAutoScalerConfig == null || thisAutoScalerConfig.getTaskCountStart() != null) {
  return;
}
```
Should we also return early if `this.ioConfig.getTaskCount()` is specified?
Not really, `taskCountStart` has higher priority :)
The priority: provided `taskCountStart` > provided `taskCount` > existing `taskCount` > provided `taskCountMin`.
force-pushed from 426d296 to 9feffbc
kfaraz
left a comment
Looks good, thanks for the changes @Fly-Style ! 🚀
force-pushed from 9feffbc to f9cd5ec
force-pushed from f9cd5ec to c7a1468
force-pushed from 93a1a37 to f3d20e0
provided `taskCountStart` > provided `taskCount` > existing `taskCount` > provided `taskCountMin`

Key changed/added classes in this PR:
- `SupervisorManager`
- `SeekableStreamSupervisorSpec`

This PR has: