Adjust cost-based autoscaler algorithm #18936
Commits: 2a79747, 632e2a7, 18c0c7d, 41ff87a, d3bbf4b, caf6aa2
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -29,24 +29,25 @@ | |
| public class WeightedCostFunction | ||
| { | ||
| private static final Logger log = new Logger(WeightedCostFunction.class); | ||
| | ||
| | ||
| /** | ||
| * Ideal idle ratio range boundaries. | ||
| * Idle ratio below MIN indicates tasks are overloaded (scale up needed). | ||
| * Idle ratio above MAX indicates tasks are underutilized (scale down needed). | ||
| * Maximum multiplier applied to the lag busy factor (and therefore to the lag-driven part of the cost) | ||
| * when lag per partition is high. Capping the amplification keeps significant partition lag from | ||
| * inflating the cost without bound, so cost-based autoscaling decisions stay stable. | ||
| */ | ||
| static final double IDEAL_IDLE_MIN = 0.2; | ||
| static final double IDEAL_IDLE_MAX = 0.6; | ||
| | ||
| private static final double LAG_AMPLIFICATION_MAX_MULTIPLIER = 2.0; | ||
| private static final long LAG_AMPLIFICATION_MAX_LAG_PER_PARTITION = 500_000L; | ||
| /** | ||
| * Checks if the given idle ratio is within the ideal range [{@value #IDEAL_IDLE_MIN}, {@value #IDEAL_IDLE_MAX}]. | ||
| * When idle is in this range, optimal utilization has been achieved and no scaling is needed. | ||
| * Denominator of the ramp formula in the cost computation logic: the difference between the maximum | ||
| * lag per partition (LAG_AMPLIFICATION_MAX_LAG_PER_PARTITION) and the extra-scaling lag threshold | ||
| * (CostBasedAutoScaler.EXTRA_SCALING_LAG_PER_PARTITION_THRESHOLD). | ||
| * <p> | ||
| * It determines how quickly lag amplification ramps up, and therefore how the cost model evaluates | ||
| * scaling decisions in high-lag scenarios. | ||
| */ | ||
| public static boolean isIdleInIdealRange(double idleRatio) | ||
| { | ||
| return idleRatio >= IDEAL_IDLE_MIN && idleRatio <= IDEAL_IDLE_MAX; | ||
| } | ||
| private static final double RAMP_DENOMINATOR = | ||
| LAG_AMPLIFICATION_MAX_LAG_PER_PARTITION - (double) CostBasedAutoScaler.EXTRA_SCALING_LAG_PER_PARTITION_THRESHOLD; | ||
| | ||
| /** | ||
| * Computes cost for a given task count using compute time metrics. | ||
@@ -104,12 +105,15 @@ public CostResult computeCost(CostMetrics metrics, int proposedTaskCount, CostBa | |
| return new CostResult(cost, lagCost, weightedIdleCost); | ||
| } | ||
| | ||
| | ||
| /** | ||
| * Estimates the idle ratio for a given task count using a capacity-based linear model. | ||
| * Estimates the idle ratio for a proposed task count. | ||
| * Includes lag-based adjustment to eliminate high lag and | ||
Contributor
Here again, I am wondering how generally applicable a constant for per-partition lag is in the real world. I have the same or similar questions as for the PPT scale limits computed in … Also, in addition to the above, I think adding this lag consideration does add some complexity here. Mainly, it starts us down the path of making the cost function harder for a newcomer to understand easily and quickly, IMO. If this added complexity is considered a negative, or "cost", the improved behavior should outweigh it. So that raises the question: how did we, or how are we going to, measure the improvement that this additional logic/computation provides?
Contributor
Author
Sometimes you want complex components in a project because they make some things work slightly better. A good example is the query planner/optimizer we get from Calcite: it is not easy to get into and hard to master, but that complexity gives you a good framework for using SQL with your database. Bringing in your question from the general comment:
We must do it, but the feature is not fully stabilized yet; even so, it already has a decent base.
That's a very good question, and I would answer it this way: the less time we spend scaling supervisors manually or fine-tuning an autoscaler, the better the result.
| * reduce predicted idle when work exists. | ||
| * <p> | ||
| * Formula: {@code predictedIdle = 1 - busyFraction / taskRatio} | ||
| * where {@code busyFraction = 1 - currentIdleRatio} and {@code taskRatio = targetTaskCount / currentTaskCount}. | ||
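| * Formulas: | ||
| * {@code linearPrediction = max(0, min(1, 1 - busyFraction / taskRatio))} | ||
| * {@code lagBusyFactor = 1 - exp(-lagPerTask / AGGRESSIVE_SCALING_LAG_PER_PARTITION_THRESHOLD)}, | ||
| * amplified by up to {@code LAG_AMPLIFICATION_MAX_MULTIPLIER} (capped at 1) when lag per partition | ||
| * reaches {@code EXTRA_SCALING_LAG_PER_PARTITION_THRESHOLD} | ||
| * {@code adjustedPrediction = linearPrediction × (1 - lagBusyFactor)} | ||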
| * | ||
| * @param metrics current system metrics containing idle ratio and task count | ||
| * @param taskCount target task count to estimate an idle ratio for | ||
@@ -119,7 +123,6 @@ private double estimateIdleRatio(CostMetrics metrics, int taskCount) | |
| { | ||
| final double currentPollIdleRatio = metrics.getPollIdleRatio(); | ||
| | ||
| // Handle edge cases | ||
| if (currentPollIdleRatio < 0) { | ||
| // No idle data available, assume moderate idle | ||
| return 0.5; | ||
@@ -130,13 +133,33 @@ private double estimateIdleRatio(CostMetrics metrics, int taskCount) | |
| return currentPollIdleRatio; | ||
| } | ||
| | ||
| // Capacity-based model: idle ratio reflects spare capacity per task | ||
| // Linear prediction (capacity-based) - existing logic | ||
| final double busyFraction = 1.0 - currentPollIdleRatio; | ||
| final double taskRatio = (double) taskCount / currentTaskCount; | ||
| final double predictedIdleRatio = 1.0 - busyFraction / taskRatio; | ||
| final double linearPrediction = Math.max(0.0, Math.min(1.0, 1.0 - busyFraction / taskRatio)); | ||
| | ||
| // Lag-based adjustment: more work per task → less idle | ||
| final double lagPerTask = metrics.getAggregateLag() / taskCount; | ||
| double lagBusyFactor = 1.0 - Math.exp(-lagPerTask / CostBasedAutoScaler.AGGRESSIVE_SCALING_LAG_PER_PARTITION_THRESHOLD); | ||
| final int partitionCount = metrics.getPartitionCount(); | ||
| | ||
| if (partitionCount > 0) { | ||
| final double lagPerPartition = metrics.getAggregateLag() / partitionCount; | ||
| // Lag-amplified idle decay | ||
| if (lagPerPartition >= CostBasedAutoScaler.EXTRA_SCALING_LAG_PER_PARTITION_THRESHOLD) { | ||
| double ramp = Math.max(0.0, | ||
| (lagPerPartition - CostBasedAutoScaler.EXTRA_SCALING_LAG_PER_PARTITION_THRESHOLD) | ||
| / RAMP_DENOMINATOR | ||
| ); | ||
| ramp = Math.min(1.0, ramp); | ||
| | ||
| final double multiplier = 1.0 + ramp * (LAG_AMPLIFICATION_MAX_MULTIPLIER - 1.0); | ||
| lagBusyFactor = Math.min(1.0, lagBusyFactor * multiplier); | ||
| } | ||
| } | ||
| | ||
| // Clamp to valid range [0, 1] | ||
| return Math.max(0.0, Math.min(1.0, predictedIdleRatio)); | ||
| return Math.max(0.0, linearPrediction * (1.0 - lagBusyFactor)); | ||
| } | ||
| | ||
| } | ||
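To make the new `estimateIdleRatio` logic easier to follow, here is a standalone sketch of the same arithmetic. It is an illustration under stated assumptions, not the PR's code: `CostMetrics` is replaced by plain parameters, the `IdleRatioSketch` class and its signature are hypothetical, the edge cases elided between the hunks are reduced to an assumed guard, and the two `CostBasedAutoScaler` threshold values are placeholders because the diff does not show them.

```java
/**
 * Standalone sketch of the idle-ratio estimate from the diff.
 * Hypothetical class; the threshold values below are PLACEHOLDERS.
 */
public class IdleRatioSketch
{
  // Values copied from the diff.
  static final double LAG_AMPLIFICATION_MAX_MULTIPLIER = 2.0;
  static final long LAG_AMPLIFICATION_MAX_LAG_PER_PARTITION = 500_000L;

  // Hypothetical placeholders: the real values live in CostBasedAutoScaler and are not shown in the diff.
  static final double EXTRA_SCALING_LAG_PER_PARTITION_THRESHOLD = 50_000;
  static final double AGGRESSIVE_SCALING_LAG_PER_PARTITION_THRESHOLD = 100_000;

  static final double RAMP_DENOMINATOR =
      LAG_AMPLIFICATION_MAX_LAG_PER_PARTITION - EXTRA_SCALING_LAG_PER_PARTITION_THRESHOLD;

  static double estimateIdleRatio(
      double currentIdleRatio,
      int currentTaskCount,
      int taskCount,
      double aggregateLag,
      int partitionCount
  )
  {
    if (currentIdleRatio < 0) {
      return 0.5; // no idle data: assume moderate idle, as in the diff
    }
    if (currentTaskCount <= 0 || taskCount <= 0) {
      return currentIdleRatio; // assumed guard; the diff elides its other edge cases
    }

    // Capacity-based linear prediction, clamped to [0, 1].
    double busyFraction = 1.0 - currentIdleRatio;
    double taskRatio = (double) taskCount / currentTaskCount;
    double linearPrediction = Math.max(0.0, Math.min(1.0, 1.0 - busyFraction / taskRatio));

    // Lag pressure: approaches 1 as lag per task grows.
    double lagPerTask = aggregateLag / taskCount;
    double lagBusyFactor = 1.0 - Math.exp(-lagPerTask / AGGRESSIVE_SCALING_LAG_PER_PARTITION_THRESHOLD);

    // Amplify lag pressure once lag per partition reaches the extra-scaling threshold.
    if (partitionCount > 0) {
      double lagPerPartition = aggregateLag / partitionCount;
      if (lagPerPartition >= EXTRA_SCALING_LAG_PER_PARTITION_THRESHOLD) {
        double ramp = (lagPerPartition - EXTRA_SCALING_LAG_PER_PARTITION_THRESHOLD) / RAMP_DENOMINATOR;
        ramp = Math.min(1.0, Math.max(0.0, ramp));
        double multiplier = 1.0 + ramp * (LAG_AMPLIFICATION_MAX_MULTIPLIER - 1.0);
        lagBusyFactor = Math.min(1.0, lagBusyFactor * multiplier);
      }
    }

    return Math.max(0.0, linearPrediction * (1.0 - lagBusyFactor));
  }

  public static void main(String[] args)
  {
    // 4 tasks at 40% idle, no lag, scaled down to 2 tasks: clamped to 0.0.
    System.out.println(estimateIdleRatio(0.4, 4, 2, 0, 8));
    // Same load spread over 8 tasks, no lag: about 0.7.
    System.out.println(estimateIdleRatio(0.4, 4, 8, 0, 8));
    // Same as above, but with 800,000 units of aggregate lag.
    System.out.println(estimateIdleRatio(0.4, 4, 8, 800_000, 8));
  }
}
```

With no lag, halving 4 tasks at 40% idle predicts 0.0 idle (a busy fraction of 0.6 cannot fit into half the capacity), while doubling them predicts about 0.7. Adding 800,000 units of lag to the doubling case pulls the estimate down to roughly 0.21 with these placeholder thresholds, which matches the intent of the change: backlog counts as work even when the poll idle ratio looks healthy.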
There are a lot of magic-looking constants here. Please add javadocs to these constants describing what they do, and consider whether any of them should be configurable to allow for experimentation without changing the code. (Of course, ideally, users do not need to configure anything.)
Will add javadocs. The specific numbers are based on common sense.
That's what I'm looking for.
Fixed: caf6aa2
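Since the thread centers on what the lag constants actually do, the sketch below tabulates the amplification multiplier that `LAG_AMPLIFICATION_MAX_MULTIPLIER`, `LAG_AMPLIFICATION_MAX_LAG_PER_PARTITION` and the ramp denominator produce for a few lag-per-partition values. The `LagAmplificationCurve` class is hypothetical, and the activation threshold of 50,000 is a placeholder for `CostBasedAutoScaler.EXTRA_SCALING_LAG_PER_PARTITION_THRESHOLD`, whose real value is not shown in the diff.

```java
/** Hypothetical sketch of the lag-amplification ramp; the activation threshold is a placeholder. */
public class LagAmplificationCurve
{
  static final double MAX_MULTIPLIER = 2.0;            // LAG_AMPLIFICATION_MAX_MULTIPLIER in the diff
  static final double MAX_LAG_PER_PARTITION = 500_000; // LAG_AMPLIFICATION_MAX_LAG_PER_PARTITION in the diff
  static final double ACTIVATION_THRESHOLD = 50_000;   // placeholder for EXTRA_SCALING_LAG_PER_PARTITION_THRESHOLD

  /** Multiplier applied to lagBusyFactor: 1.0 below the threshold, then a linear ramp capped at MAX_MULTIPLIER. */
  static double multiplier(double lagPerPartition)
  {
    if (lagPerPartition < ACTIVATION_THRESHOLD) {
      return 1.0; // amplification branch is not taken
    }
    double ramp = (lagPerPartition - ACTIVATION_THRESHOLD) / (MAX_LAG_PER_PARTITION - ACTIVATION_THRESHOLD);
    ramp = Math.min(1.0, Math.max(0.0, ramp));
    return 1.0 + ramp * (MAX_MULTIPLIER - 1.0);
  }

  public static void main(String[] args)
  {
    double[] lags = {10_000, 50_000, 275_000, 500_000, 2_000_000};
    for (double lag : lags) {
      System.out.printf("lag/partition=%,.0f -> multiplier=%.2f%n", lag, multiplier(lag));
    }
  }
}
```

With these placeholder numbers the multiplier stays at 1.00 up to and including the threshold, reaches 1.50 halfway (275,000), and caps at 2.00 from 500,000 onward. Because the ramp starts at 0 exactly at the threshold, the amplification has no jump when lag crosses it, and the cap bounds how far a single lag spike can push the cost.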