Skip to content

Consider max lag for kinesis while autoscaling#16284

Merged
AmatyaAvadhanula merged 3 commits intoapache:masterfrom
ac9817:consider-max-lag-for-kinesis
Apr 17, 2024
Merged

Consider max lag for kinesis while autoscaling#16284
AmatyaAvadhanula merged 3 commits intoapache:masterfrom
ac9817:consider-max-lag-for-kinesis

Conversation

@ac9817
Copy link
Copy Markdown
Contributor

@ac9817 ac9817 commented Apr 15, 2024

Description

In kinesis, lag is computed in minutes rather than count of records as done in kafka. If each shard has a lag of 1 min and there are 10 shards, we were computing that there was 10 mins of lag for auto scaling decisions, but we should be only looking at the max and make the auto scaling decisions. On the other hand, for kafka, the lag considered for autoscaling is total as previously.

Release note

For kinesis streams, autoscaling is now done on max lag per shard rather than the total lag for all shards.


Key changed/added classes in this PR
  • LagStats.java
  • LagMetric.java
  • LabBasedAutoScaler.java
  • Supervisor.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@ac9817 ac9817 marked this pull request as draft April 15, 2024 14:03
@ac9817 ac9817 marked this pull request as ready for review April 15, 2024 14:04
long totalLags = lagStats.getTotalLag();
lagMetricsQueue.offer(totalLags > 0 ? totalLags : 0L);
long lag = lagStats.get(supervisor.getLagMetricForAutoScaler());
lagMetricsQueue.offer(lag > 0 ? lag : 0L);

Check notice

Code scanning / CodeQL

Ignored error status of call

Method run ignores exceptional return value of CircularFifoQueue<Long>.offer.
@kfaraz kfaraz requested a review from AmatyaAvadhanula April 15, 2024 19:28
Copy link
Copy Markdown
Contributor

@AmatyaAvadhanula AmatyaAvadhanula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM. Thank you @adithyachakilam

@AmatyaAvadhanula AmatyaAvadhanula merged commit 34237bc into apache:master Apr 17, 2024
@ac9817 ac9817 deleted the consider-max-lag-for-kinesis branch April 18, 2024 15:47
kfaraz pushed a commit that referenced this pull request Apr 20, 2024
Tries to address the comments made on #16284 after merged.

Changes:
- Remove method `Supervisor.getLagMetric()`
- Add method `Supervisor.computeLagForAutoScaler()`
- Remove classes `LagMetric` and `LagMetricTest`
ac9817 pushed a commit to ac9817/druid that referenced this pull request Apr 25, 2024
@ac9817 ac9817 mentioned this pull request Apr 25, 2024
10 tasks
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants