Dynamic auto scale Kinesis-Stream ingest tasks#10985
Dynamic auto scale Kinesis-Stream ingest tasks#10985clintropolis merged 17 commits intoapache:masterfrom
Conversation
|
Failed CI jobs are not this PR related. Maybe retry will be passed :) |
|
#10691 documents a few known flaky tests. Any ideas on how to make them less flaky will be much appreciated. I've re-triggered the failing tests - just to see if that fixes it |
Thanks for your help. Sure I will keep an eye on these jobs and try to do some tunning work if I can. |
|
This Change has been running in our Dev cluster for 2 weeks and works fine. So that I believe it is ready to be reviewed. Hi @pjain1 Sorry to bother you. Are you available to help me review this code? Since you have review the original design and may be more familiar with Stream Tasks autoscaler. Will appreciate it very much if you could lend me a hand :) |
|
@zhangyue19921010 any learnings from running in your dev cluster over the last few months? I've just started reviewing the change now. But hearing feedback first hand of anything you've noticed is very helpful! Thanks for your patience on this review. |
|
Hi @suneet-s Thanks a lot for your attention. As far as I know this patch works fine on our dev/stg cluster and will deploy to PRD cluster soon. |
techdocsmith
left a comment
There was a problem hiding this comment.
I added some suggestions as far as language and style the scaleOutThreshold and scaleInThreshold logic is confusing to me.
| | `lagCollectionRangeMillis` | The total time window of lag collection, Use with `lagCollectionIntervalMillis`,it means that in the recent `lagCollectionRangeMillis`, collect lag metric points every `lagCollectionIntervalMillis`. | no (default == 600000) | | ||
| | `scaleOutThreshold` | The Threshold of scale out action | no (default == 6000000) | | ||
| | `triggerScaleOutFractionThreshold` | If `triggerScaleOutFractionThreshold` percent of lag points are higher than `scaleOutThreshold`, then do scale out action. | no (default == 0.3) | | ||
| | `scaleInThreshold` | The Threshold of scale in action | no (default == 1000000) | |
There was a problem hiding this comment.
same comments as for scale out
clintropolis
left a comment
There was a problem hiding this comment.
code changes lgtm 👍
@techdocsmith do the doc changes need to be fixed or is it ok to do as a follow-up?
|
@clintropolis , assuming I didn't break anything w/ my suggested edits, docs changes LGTM. |
remove leading `
add missing `


Followed PR for #10524.
The core logic is the same, difference lies in the implementation of
computeLagStatsbetween Kafka and Kinesis.Description
This PR implements and documents the autoscaler based on
ingest/kinesis/lag/timemetrics for kinesis indexing service.Also only LagBased autoScalerStrategy is supported for now.
Key changed/added classes in this PR
KinesisSupervisor.javaKinesisSupervisorIOConfig.javaThis PR has: