Improve lag-based autoscaler config persistence #18745

Merged
kfaraz merged 9 commits into apache:master from Fly-Style:fix/70147-autoscaler-persisted-cfg on Dec 12, 2025

Conversation

@Fly-Style (Contributor) commented Nov 14, 2025

  • When a supervisor is updated via the API, the effective task count is resolved in this order:
    provided taskCountStart > provided taskCount > existing taskCount > provided taskCountMin.
Key changed/added classes in this PR
  • SupervisorManager
  • SeekableStreamSupervisorSpec
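The resolution order above can be sketched as a small helper. This is an illustrative sketch only; the class and method names are hypothetical, and the actual merge logic lives in SupervisorManager / SeekableStreamSupervisorSpec.

```java
// Hypothetical sketch of the task-count resolution order described above:
// provided taskCountStart > provided taskCount > existing taskCount > provided taskCountMin.
// Names are illustrative, not the real Druid API.
class TaskCountResolution
{
  static int resolve(
      Integer providedTaskCountStart,
      Integer providedTaskCount,
      Integer existingTaskCount,
      int providedTaskCountMin
  )
  {
    if (providedTaskCountStart != null) {
      return providedTaskCountStart; // immutable baseline always wins
    }
    if (providedTaskCount != null) {
      return providedTaskCount;      // explicit user-provided count
    }
    if (existingTaskCount != null) {
      return existingTaskCount;      // sticky: carry forward the persisted count
    }
    return providedTaskCountMin;     // fresh supervisor with no hints
  }
}
```

For example, resubmitting a spec with no taskCountStart and no taskCount keeps the previously persisted count, while a fresh supervisor with no hints starts at taskCountMin.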

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Comment on lines +569 to +578
AutoScalerConfig autoScalerError = mapper.convertValue(
    ImmutableMap.of(
        "enableTaskAutoScaler", "true",
        "taskCountMax", "1",
        "taskCountMin", "4"
    ),
    AutoScalerConfig.class
);

Check notice — Code scanning / CodeQL: Unread local variable (Note, test)

Variable 'AutoScalerConfig autoScalerError' is never read.
@Fly-Style Fly-Style marked this pull request as ready for review November 14, 2025 12:01
@jtuglu1 (Contributor) commented Nov 14, 2025

When the autoscaler does a scaling action, it updates ioConfig.autoScalerConfig.taskCountStart, not ioConfig.taskCount

Hi @Fly-Style, thanks for the change. Not sure the above makes sense to me. autoScalerConfig.taskCountStart is supposed to be an immutable value used on supervisor submit to allow the scaler to not have to start at the minimum task count (this makes having a smaller minimum easier/possible). Scaling actions should not touch this taskCountStart value (it is meant to be an immutable value in the spec that allows the supervisor to return to baseline on resubmit). In a way, you can expect this to be the "average" task count. From the docs:

Optional config to specify the number of ingestion tasks to start with. When you enable the autoscaler, Druid ignores the value of taskCount in ioConfig and, if specified, starts with the taskCountStart number of tasks. Otherwise, defaults to taskCountMin.

ioConfig.taskCount is the currently running taskCount and should be what is updated. There are other things that rely on the ioConfig.taskCount value being updated/accurate, like the stopTaskCount calculations (both fixed and variable).

Looks like you want to have a way to not "reset" the task count during re-submits, and have the value be sticky. I would prefer this not be the default behavior, but rather opt-in since:

  • Your "current" task count may not accurately reflect the true current load of the system. Resubmitting the supervisor should be (IMO) equivalent to resetting the supervisor.
  • It's nice to have a way to reset this value back to the expected baseline task count. For supervisors running large numbers (~1000s) of tasks, supervisors can often become bloated and not scale down fast enough. Resubmitting them allows the task count to return to baseline.

@jtuglu1 jtuglu1 self-requested a review November 14, 2025 17:19
@jtuglu1 (Contributor) commented Nov 14, 2025

Similarly, have you tested what happens when a supervisor is terminated (tombstoned) and then resubmitted without a taskCountStart? It'd be good to make sure that we don't end up merging an old supervisor (with same ID) data with potentially unrelated, new supervisor's data.

@kfaraz (Contributor) commented Dec 9, 2025

@Fly-Style , I think @jtuglu1 makes a fair point. I too would prefer that taskCountStart remain immutable and we retain the capability to reset the task count upon resubmission.

How about the following approach instead?

  • When submitting a supervisor with auto-scaler disabled, use the taskCount as is.
  • When submitting a (new or existing) supervisor with auto-scaler enabled, set taskCount = taskCountStart == null ? taskCountMin : taskCountStart, then persist the supervisor.
  • For any auto-scaling event, update the taskCount the same way that we do today.
  • When the supervisor starts, it should start with the taskCount as the starting number of tasks rather than taskCountStart.

This would ensure:

  • the issue with Overlord restarts is fixed
  • taskCountStart remains an immutable config
  • taskCount always reflects the current task count in the supervisor

What do you think?

@gianm (Contributor) commented Dec 9, 2025

At our deployment, we do want supervisors to stick to the current task count when resubmitted. The reasons are:

  • Supervisors need to be updated periodically to do stuff like change tuning configs, change schemas, etc. When we do a change like this, we don't want the task count to change from whatever is currently running.
  • With the way we use autoscaling, the taskCountStart is just some fixed starter value that is the same for all tables, not anything actually related to that particular table. We rely on the autoscaler to find the ideal value. So, reverting back to the taskCountStart is not desirable.

I'm open to some way of offering both behaviors. I personally feel like sticking to the current task count on resubmitting is more intuitive as a default (since it seems odd that changing schema would lead to a reset of the task count). But I can live with the current behavior as default, as long as there's some way to get the stick-to-current behavior. Currently, there isn't.

@kfaraz (Contributor) left a review comment

Took an initial pass assuming that we continue with the original approach in this PR, i.e. update taskCountStart when an auto-scaling event occurs and carry forward the current value of taskCountStart when the supervisor is resubmitted.

Overall code flow makes sense. Left some nitpicks here and there.

@kfaraz (Contributor) commented Dec 9, 2025

Similarly, have you tested what happens when a supervisor is terminated (tombstoned) and then resubmitted without a taskCountStart? It'd be good to make sure that we don't end up merging an old supervisor (with same ID) data with potentially unrelated, new supervisor's data.

@jtuglu1 , this is already handled since the SupervisorManager contains only the non-tombstoned latest supervisor versions in memory. But we could add an embedded test to verify the same.

@Fly-Style , let's try to add an embedded test for LagBasedAutoScaler similar to what you are doing in the other PR for cost-based auto-scaler. The test could have a method which verifies that we do not pick up the taskCountStart from tombstoned versions, even if the supervisor id is the same.

@Fly-Style (Contributor, Author) commented:

let's try to add an embedded test for LagBasedAutoScaler similar to what you are doing in the other PR for cost-based auto-scaler.

Would love to, but in a separate PR if you don't mind :)

@gianm (Contributor) commented Dec 9, 2025

I'm open to some way of offering both behaviors. I personally feel like sticking to the current task count on resubmitting is more intuitive as a default (since it seems odd that changing schema would lead to a reset of the task count). But I can live with the current behavior as default, as long as there's some way to get the stick-to-current behavior. Currently, there isn't.

A possibility:

  • When a supervisor is started through supervisor.start(), initial task count is always taken from taskCount. This keeps task count consistent through Overlord restarts, etc.
  • When a supervisor is posted through SupervisorResource#specPost, taskCount is set to:
    1. taskCountStart if that is nonnull
    2. else, the user-provided taskCount if that is nonnull
    3. else, the taskCount from the previous supervisor spec (in the DB) if one exists
    4. else, the user-provided taskCountMin

This I think allows @jtuglu1 and me to both have the behavior we want: @jtuglu1 would set taskCountStart and the task count would always reset to that. I would set neither taskCount nor taskCountStart, and the task count would start out at taskCountMin for a fresh supervisor, then retain the current count from then on.
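The two usage styles described above can be walked through with a small sketch. This is purely illustrative under the proposed precedence; the class and method names are hypothetical and not part of Druid's actual API.

```java
// Illustrative walk-through of the two behaviors, assuming the proposed
// precedence: taskCountStart > provided taskCount > persisted taskCount > taskCountMin.
// Names are hypothetical, not real Druid code.
class ResubmitScenarios
{
  static int onSpecPost(
      Integer taskCountStart,
      Integer providedTaskCount,
      Integer persistedTaskCount,
      int taskCountMin
  )
  {
    if (taskCountStart != null) {
      return taskCountStart;
    }
    if (providedTaskCount != null) {
      return providedTaskCount;
    }
    if (persistedTaskCount != null) {
      return persistedTaskCount;
    }
    return taskCountMin;
  }

  public static void main(String[] args)
  {
    // Reset style: taskCountStart=4 is set, so every resubmit resets to that
    // baseline, even if the autoscaler had grown the persisted count to 10.
    int resetStyle = onSpecPost(4, null, 10, 1);

    // Sticky style: neither taskCountStart nor taskCount is set, so a resubmit
    // sticks to whatever count the autoscaler last persisted (10 here).
    int stickyStyle = onSpecPost(null, null, 10, 1);

    // Fresh supervisor in sticky style: nothing persisted yet, start at taskCountMin.
    int freshStart = onSpecPost(null, null, null, 1);

    System.out.println(resetStyle + " " + stickyStyle + " " + freshStart); // 4 10 1
  }
}
```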

@kfaraz (Contributor) commented Dec 10, 2025

Thanks for the suggestion, @gianm ! I think that would work nicely and meet all our needs.

@jtuglu1 (Contributor) commented Dec 10, 2025

SGTM

@kfaraz (Contributor) left a review comment

@Fly-Style , based on the decided approach, we should do the following:

  • Continue persisting taskCount (and not taskCountStart) whenever an auto-scale event occurs.
  • Leave taskCountStart as an immutable config
  • While merging spec on resubmission, follow this order:
    provided taskCountStart > provided taskCount > existing taskCount > provided taskCountMin.
  • Update the docs and add a release note in the PR description to reflect this change in behaviour.

Please let me know if anything seems ambiguous.

@Fly-Style (Contributor, Author) commented:

@gianm, @kfaraz, @jtuglu1 thanks a lot for a productive discussion! Happy to implement this according to all requests!

@Fly-Style Fly-Style force-pushed the fix/70147-autoscaler-persisted-cfg branch from 26b82e7 to fa86de5 Compare December 10, 2025 12:00
@Fly-Style Fly-Style requested a review from kfaraz December 10, 2025 13:09
@kfaraz (Contributor) left a review comment

Minor final suggestions.

// If the autoscaler is absent or taskCountStart is specified, just return.
if (thisAutoScalerConfig == null || thisAutoScalerConfig.getTaskCountStart() != null) {
return;
}
@kfaraz (Contributor):
Should we also return early if this.ioConfig.getTaskCount() is specified?

@Fly-Style (Contributor, Author):

Not really; taskCountStart has higher priority :)

the priority: provided taskCountStart > provided taskCount > existing taskCount > provided taskCountMin.

@Fly-Style Fly-Style force-pushed the fix/70147-autoscaler-persisted-cfg branch from 426d296 to 9feffbc Compare December 11, 2025 16:55
@kfaraz (Contributor) left a review comment
Looks good, thanks for the changes @Fly-Style ! 🚀

@Fly-Style Fly-Style force-pushed the fix/70147-autoscaler-persisted-cfg branch from 9feffbc to f9cd5ec Compare December 11, 2025 17:04
@Fly-Style Fly-Style force-pushed the fix/70147-autoscaler-persisted-cfg branch from f9cd5ec to c7a1468 Compare December 11, 2025 17:08
@Fly-Style Fly-Style force-pushed the fix/70147-autoscaler-persisted-cfg branch from 93a1a37 to f3d20e0 Compare December 11, 2025 17:27
@kfaraz kfaraz merged commit 4aedee9 into apache:master Dec 12, 2025
55 checks passed
@kgyrtkirk kgyrtkirk added this to the 36.0.0 milestone Jan 19, 2026